This note collects things I have found while working with K8s. I just take notes and links for resolving each problem; see the dedicated directory for details where a topic has one.
CronJobs --> create Jobs (triggered on schedule) --> Pods: in this flow, the Pod in K8s can use the volume and the mount succeeds before the script runs. But if you apply the same spec as a standalone Pod, it may not: your command can start faster than the mount finishes. Check it in this link.
Pods are not directly evicted due to high CPU pressure or usage alone. Instead, Kubernetes relies on CPU throttling mechanisms to manage and limit a pod's CPU usage, ensuring fair resource sharing among pods on the same node.
While high CPU usage by a pod can indirectly contribute to resource pressure and potentially lead to eviction due to memory or other resource shortages, CPU throttling is the primary mechanism used to manage CPU-intensive workloads.
Hands-on kubectl with Kubernetes
Configuring the Private Registry
This configuration is pretty simple but truly important for Kubernetes to pull images from a private registry - one of the best practices in an enterprise Kubernetes or container platform. Explore more at Pull an Image from a Private Registry
With kubectl you have two options to create the registry credential:
First, you can create it from an existing file
# You can create the Docker credentials file, usually at $HOME/.docker/config.json,
# with the command: docker login. Now you can create the secret from that file
kubectl create secret generic <secret-name> \
    --from-file=.dockerconfigjson=<path/to/.docker/config.json> \
    --type=kubernetes.io/dockerconfigjson
# Or use
kubectl create secret docker-registry <secret-name> \
    --from-file=.dockerconfigjson=path/to/.docker/config.json
Second, you can pass the authorization details directly to kubectl
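A minimal sketch of that second option; the server, username, password and email values are placeholders you supply yourself:
kubectl create secret docker-registry <secret-name> \
  --docker-server=<your-registry-server> \
  --docker-username=<your-name> \
  --docker-password=<your-password> \
  --docker-email=<your-email>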
# Delete a pod using the type and name specified in pod.json
kubectl delete -f ./pod.json
# Delete a pod with no grace period
kubectl delete pod unwanted --now
kubectl delete pods <pod> --grace-period=0
# Delete pods and services with same names "baz" and "foo"
kubectl delete pod,service baz foo
Edit YAML manifest
kubectl can help you change a manifest directly from your shell. If you are a Linux or macOS user, you can use nano or vim for this feature
# Edit the service named docker-registry
kubectl edit svc/docker-registry
# Use an alternative editor
KUBE_EDITOR="nano" kubectl edit svc/docker-registry
When you save and exit the editor, your workload or resource changes immediately
Health checks and interacting with the cluster, nodes and workloads
Use the events command to detect what is happening on the cluster nodes
# List Events sorted by timestamp
kubectl get events --sort-by=.metadata.creationTimestamp
# List all warning events
kubectl events --types=Warning
If the status of a workload is not available or running, you can use describe for a verbose check of the workload
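For example (the pod, node and namespace names are placeholders):
# Verbose-describe a pod to see events, probe failures and scheduling issues
kubectl describe pod <pod-name> -n <namespace>
# Describe a node to check conditions, capacity and allocated resources
kubectl describe node <node-name>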
If describing the workload does not reveal the problem, you can check the logs to extract more information
# dump pod logs (stdout)
kubectl logs my-pod
# dump pod logs (stdout) for a previous instantiation of a container. Usually used for CrashLoopBackOff
kubectl logs my-pod --previous
# dump pod container logs (stdout, multi-container case) for a previous instantiation of a container
kubectl logs my-pod -c my-container --previous
# stream pod logs (stdout)
kubectl logs -f my-pod
If you have checked the workload, especially pods and containers, without results, go back and check resource usage on the cluster.
# Show metrics for all nodes
kubectl top node
# Show metrics for a given node
kubectl top node my-node
# For a total overview, use the resource-capacity plugin
# print information including quantity available instead of percentage used
kubectl resource-capacity -a
# print information including resource utilization and pods in the output
kubectl resource-capacity --util -p
kubectl can also help you disable or manipulate a node with these commands
# Mark my-node as unschedulable
kubectl cordon my-node
# Drain my-node in preparation for maintenance
kubectl drain my-node
# Mark my-node as schedulable
kubectl uncordon my-node
Tips
To explore further, you can do lots of things with kubectl. To read and understand a command, use its manual via the --help flag
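For instance, to read the built-in manual of any subcommand:
kubectl get --help
kubectl rollout undo --help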
Install CRD
When you want to install an extension API for Kubernetes, Kubernetes usually provides the standard mechanism called CRDs (Custom Resource Definitions). But in some situations, the CRD you apply exceeds the maximum number of bytes allowed and causes this error
k apply -f rayjobs-crd.yaml
The CustomResourceDefinition "rayjobs.ray.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
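This happens because client-side kubectl apply stores the whole object in the kubectl.kubernetes.io/last-applied-configuration annotation. A common workaround (my addition, not from the original note) is to create the CRD or use server-side apply so that annotation is never written:
# Create the CRD without the last-applied-configuration annotation
kubectl create -f rayjobs-crd.yaml
# Or use server-side apply
kubectl apply --server-side -f rayjobs-crd.yaml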
Do not remove the StatefulSet workload itself: it will scale down to 0 and not come back up. Instead, just remove the pods; they will restart based on the StatefulSet strategy
Rolling out a StatefulSet does not work when the StatefulSet's rollout status is already complete
Deleting pods in a StatefulSet will not remove the associated volumes
Note
Deleting the PVC after the pods have terminated might trigger deletion of the backing Persistent Volumes depending on the storage class and reclaim policy. You should never assume ability to access a volume after claim deletion.
Note: Use caution when deleting a PVC, as it may lead to data loss.
Complete deletion of a StatefulSet
To delete everything in a StatefulSet, including the associated pods, you can run a series of commands similar to the following
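A sketch of that command series, following the upstream StatefulSet documentation; the label selector app.kubernetes.io/name=MyApp is a placeholder for your own app:
grace=$(kubectl get pods <stateful-set-pod> --template '{{.spec.terminationGracePeriodSeconds}}')
kubectl delete statefulset -l app.kubernetes.io/name=MyApp
sleep $grace
kubectl delete pvc -l app.kubernetes.io/name=MyApp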
As you can see, a CronJob is a scheduled Kubernetes workload that triggers at a set time to execute a specific Job. But sometimes, during working hours, your test job shouldn't run, so you will care about the suspend state of the job. You can update that state with this command
k patch -n <namespace> cronjobs.batch <cronjobs-name> -p '{"spec": {"suspend": true}}'
Enable it again by changing true to false
k patch -n <namespace> cronjobs.batch <cronjobs-name> -p '{"spec": {"suspend": false}}'
Furthermore, you can use patch for multiple purposes, as sketched right after this list:
Update a container's image
Partially update a node
Disable a deployment livenessProbe using json patch
Update a deployment's replica count
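Hedged sketches of those four patches; the pod, node and deployment names (and the container name) are placeholders:
# Update a container's image
kubectl patch pod valid-pod -p '{"spec":{"containers":[{"name":"kubernetes-serve-hostname","image":"new image"}]}}'
# Partially update a node (mark it unschedulable)
kubectl patch node k8s-node-1 -p '{"spec":{"unschedulable":true}}'
# Disable a deployment livenessProbe using a json patch
kubectl patch deployment valid-deployment --type json -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}]'
# Update a deployment's replica count
kubectl patch deployment my-deployment -p '{"spec":{"replicas":3}}'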
Updating resources
You can handle graceful restarts and version rollbacks with the rollout command
# Gracefully restart deployments, statefulsets and daemonsets
k rollout restart -n <namespace> <type-workload>/<name>
# Roll back to a previous version
kubectl rollout undo <type-workload>/<name>
kubectl rollout undo <type-workload>/<name> --to-revision=2
# Check the rollout status
kubectl rollout status -w <type-workload>/<name>
Kubernetes has metadata values that help distinguish services from each other, specify identifying attributes of objects, attach arbitrary non-identifying metadata to objects, ...
Label
Annotations
And you can update them with kubectl via the label and annotate commands
# Add a Label
kubectl label pods my-pod new-label=awesome
# Remove a label
kubectl label pods my-pod new-label-
# Overwrite an existing value
kubectl label pods my-pod new-label=new-value --overwrite
# Add an annotation
kubectl annotate pods my-pod icon-url=http://goo.gl/XXBTWq
# Remove annotation
kubectl annotate pods my-pod icon-url-
Next, you can set up autoscaling for a deployment with the autoscale command
kubectl autoscale deployment foo --min=2 --max=10
Setup metrics-server
The metrics server is a part you have to add yourself if you self-host your Kubernetes, which means you need to learn how to set up metrics-server, and this is quite easy. Read more about metrics-server at
Or you can use Helm to release the metrics-server chart
# Add the repo to your cluster
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
# Install metrics-server from the chart templates inside the repo
helm upgrade --install metrics-server metrics-server/metrics-server
Troubleshoot
Warning
Your metrics-server can get stuck because it fails TLS authentication: it cannot verify the serving certificates it gets when scraping the kubelets
But don't worry about it, you can bypass this with a small trick. Read more about the solution at
Therefore, the solution uses kubectl's edit command to modify the manifest of the metrics-server deployment in kube-system. You can do it like this
# First, set your editor to nano (optional); skip this step if you prefer vim
export KUBE_EDITOR="nano"
# Use edit to change the manifest of the deployment
kubectl edit deployments -n kube-system metrics-server
Next, scroll to the args of the metrics-server container and change them into
- args:
  - --cert-dir=/tmp
  - --secure-port=10250
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --metric-resolution=15s
  - --kubelet-insecure-tls=true # This will help you bypass authentication
In the end, your metrics-server will restart and be running after about 30 seconds
However, if you use Helm to deploy your metrics-server, it is easier to patch it by adding a --set option.
Note
Because args is a list, you need to pass --set with curly braces {}; this syntax tells Helm that the value is a list (array)
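For example, assuming the metrics-server chart installed above, a sketch would be:
helm upgrade --install metrics-server metrics-server/metrics-server \
  --set args={--kubelet-insecure-tls}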
Probes have a number of fields that you can use to more precisely control the behavior of startup, liveness and readiness checks
Liveness
Info
Liveness probes determine when to restart a container. For example, liveness probes could catch a deadlock, when an application is running, but unable to make progress.
If a container fails its liveness probe repeatedly, the kubelet restarts the container.
You can set up a liveness probe with a configuration like the following in the container spec
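A minimal sketch, assuming your application exposes an HTTP health endpoint at /healthz on port 8080 (both are placeholders):
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 3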
Readiness
Readiness probes determine when a container is ready to start accepting traffic. This is useful when waiting for an application to perform time-consuming initial tasks, such as establishing network connections, loading files, and warming caches.
If the readiness probe returns a failed state, Kubernetes removes the pod from all matching service endpoints.
Configuration for HTTP and TCP readiness probes also remains identical to liveness probes.
Info
Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.
Note
Readiness probes run on the container during its whole lifecycle.
Startup
Info
A startup probe verifies whether the application within a container is started. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.
If such a probe is configured, it disables liveness and readiness checks until it succeeds.
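A sketch under the same assumptions (the /healthz path and port 8080 are placeholders); here failureThreshold x periodSeconds gives the application up to 300 seconds to finish starting before liveness checks take over:
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10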
You can use Terraform with manifests to apply this configuration
elastic_snapshotter.tf
# https://www.linkedin.com/pulse/elastic-cloud-kubernetes-eck-quickstart-azure-repository-ajay-singh/
resource "kubernetes_secret" "azure_snapshot_secret" {
  metadata {
    name      = "azure-snapshot-secret"
    namespace = var.namespace
  }
  binary_data = {
    "azure.client.default.account" = base64encode(var.remote_state.backup_storage_account_name)
    "azure.client.default.key"     = base64encode(var.remote_state.backup_storage_account_key)
  }
  depends_on = [
    helm_release.elastic_operator
  ]
}

# Register the Azure snapshot with the Elasticsearch cluster
resource "kubectl_manifest" "elasticsearch_register_snapshot" {
  yaml_body = <<YAML
apiVersion: batch/v1
kind: Job
metadata:
  name: ${var.name}-register-snapshot
  namespace: ${var.namespace}
spec:
  template:
    spec:
      containers:
      - name: register-snapshot
        image: curlimages/curl:latest
        volumeMounts:
        - name: es-basic-auth
          mountPath: /mnt/elastic/es-basic-auth
        command:
        - /bin/sh
        args:
        # - -x # Can be used to debug the command, but don't use it in production as it will leak secrets.
        - -c
        - |
          curl -s -i -k -u "elastic:$(cat /mnt/elastic/es-basic-auth/elastic)" -X PUT \
            'https://${var.name}-es-http:9200/_snapshot/azure' \
            --header 'Content-Type: application/json' \
            --data-raw '{ "type": "azure", "settings": { "client": "default" } }' | tee /dev/stderr | grep "200 OK"
      restartPolicy: Never
      volumes:
      - name: es-basic-auth
        secret:
          secretName: ${var.name}-es-elastic-user
YAML
  depends_on = [kubectl_manifest.elasticsearch]
}

# Create the snapshotter cronjob.
resource "kubectl_manifest" "elasticsearch_snapshotter" {
  yaml_body = <<YAML
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ${var.name}-snapshotter
  namespace: ${var.namespace}
spec:
  schedule: "0 16 * * 0"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            pool: infrapool
          containers:
          - name: snapshotter
            image: curlimages/curl:latest
            volumeMounts:
            - name: es-basic-auth
              mountPath: /mnt/elastic/es-basic-auth
            command:
            - /bin/sh
            args:
            - -c
            - 'curl -s -i -k -u "elastic:$(cat /mnt/elastic/es-basic-auth/elastic)" -XPUT "https://${var.name}-es-http:9200/_snapshot/azure/%3Csnapshot-%7Bnow%7Byyyy-MM-dd%7D%7D%3E" | tee /dev/stderr | grep "200 OK"'
          restartPolicy: OnFailure
          volumes:
          - name: es-basic-auth
            secret:
              secretName: ${var.name}-es-elastic-user
YAML
  depends_on = [kubectl_manifest.elasticsearch_register_snapshot]
}

# Clean up old snapshots on a daily schedule.
resource "kubectl_manifest" "elastic_cleanup_snapshots" {
  yaml_body = <<YAML
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ${var.name}-cleanup-snapshotter
  namespace: ${var.namespace}
spec:
  schedule: "@daily"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      # ttlSecondsAfterFinished and backoffLimit are Job-level settings, so they belong here
      ttlSecondsAfterFinished: 86400
      backoffLimit: 3
      template:
        spec:
          nodeSelector:
            pool: infrapool
          containers:
          - name: clean-snapshotter
            image: debian:11.7
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - name: es-basic-auth
              mountPath: /mnt/elastic/es-basic-auth
            command:
            - /bin/sh
            args:
            - -c
            - |
              # Update the package index and install curl
              apt update && apt install -y curl
              # Compute the cutoff date used to mark snapshots for deletion
              deletionDate=$(date -d "$date -${var.retention_date} days" +%Y-%m-%d)
              # List the Elasticsearch snapshots matching the deletion date
              listElasticSnapshots=$(curl --insecure -X GET "https://elastic:$(cat /mnt/elastic/es-basic-auth/elastic)@${var.name}-es-http:9200/_cat/snapshots/azure" | awk '{print $1}' | grep -e "$deletionDate")
              # Check whether the snapshot list is empty
              if [ "$listElasticSnapshots" = "" ]; then
                # Nothing to delete if no snapshots match the deletion date
                echo "Not existing your deletion date"
                exit 0
              else
                # Remove one or multiple snapshots matching the deletion date
                for snapshot in $listElasticSnapshots; do
                  res=$(curl -X DELETE --insecure "https://elastic:$(cat /mnt/elastic/es-basic-auth/elastic)@${var.name}-es-http:9200/_snapshot/azure/$snapshot" 2> /dev/null || echo "false")
                  if [ "$res" != "false" ]; then
                    echo "Deleted $snapshot"
                  else
                    echo "Failed to delete $snapshot"
                  fi
                done
              fi
          restartPolicy: OnFailure
          volumes:
          - name: es-basic-auth
            secret:
              secretName: ${var.name}-es-elastic-user
YAML
  depends_on = [kubectl_manifest.elasticsearch_register_snapshot]
}
Assign Pods to Nodes
You have multiple ways to configure assigning pods to specific nodes depending on a couple of conditions, and this makes it easier to control the cluster, such as
If you set up a couple of labels on your nodes, you can use them with nodeSelector to select where pods are able to be scheduled
In the situation where you want to add more labels and supply them to your deployment, you can use kubectl label to handle that
# Add a Label
kubectl label pods my-pod new-label=awesome
# Remove a label
kubectl label pods my-pod new-label-
# Overwrite an existing value
kubectl label pods my-pod new-label=new-value --overwrite
View that with get command
kubectl get pods --show-labels
You can then modify or set nodeSelector to pick the node or resource matching the label
# Assumes the existence of the label node-role.kubernetes.io/master, and tries to assign the pod to the labelled node.
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-selector-simple
spec:
  containers:
  - command: ["sleep", "3600"]
    image: busybox
    name: pod-node-selector-simple-container
  nodeSelector:
    node-role.kubernetes.io/master: ""
nodeSelector is the simplest way to constrain Pods to nodes with specific labels. Affinity and anti-affinity expand the types of constraints you can define.
With Node affinity
You will have two types
requiredDuringSchedulingIgnoredDuringExecution: The scheduler can't schedule the Pod unless the rule is met. This functions like nodeSelector, but with a more expressive syntax.
preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod.
You can specify node affinities using the .spec.affinity.nodeAffinity field
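A sketch combining both types in one Pod; the zone value and the second label key/value are placeholders for your own node labels:
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - zone-a
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: registry.k8s.io/pause:3.9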
You can use the operator field to specify a logical operator for Kubernetes to use when interpreting the rules. You can use In, NotIn, Exists, DoesNotExist, Gt and Lt. Explore more about it at Operators
You can explore more extended things to do with affinity
A toleration "matches" a taint if the keys are the same and the effects are the same, and:
the operator is Exists (in which case no value should be specified), or
the operator is Equal and the values should be equal (a combined sketch follows the effect list below).
The allowed values for the effect field are:
NoExecute
This affects pods that are already running on the node as follows: pods that do not tolerate the taint are evicted immediately, while pods that tolerate the taint (without specifying tolerationSeconds) remain bound to the node.
NoSchedule
No new Pods will be scheduled on the tainted node unless they have a matching toleration. Pods currently running on the node are not evicted.
PreferNoSchedule
A "preference" or "soft" version of NoSchedule. The control plane will try to avoid placing a Pod that does not tolerate the taint on the node, but it is not guaranteed.
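For example, a hedged sketch of tainting a node and the matching toleration in a Pod spec (the node name, key and value are placeholders):
# Add a taint to a node
kubectl taint nodes node1 key1=value1:NoSchedule
# Remove it again
kubectl taint nodes node1 key1=value1:NoSchedule-

And in the Pod spec:
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"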
Warning
You can put multiple taints on the same node and multiple tolerations on the same pod. The way Kubernetes processes multiple taints and tolerations is like a filter: start with all of a node's taints, then ignore the ones for which the pod has a matching toleration; the remaining un-ignored taints have the indicated effects on the pod. In particular,
if there is at least one un-ignored taint with effect NoSchedule then Kubernetes will not schedule the pod onto that node
if there is no un-ignored taint with effect NoSchedule but there is at least one un-ignored taint with effect PreferNoSchedule then Kubernetes will try to not schedule the pod onto the node
if there is at least one un-ignored taint with effect NoExecute then the pod will be evicted from the node (if it is already running on the node), and will not be scheduled onto the node (if it is not yet running on the node).
If you want to explore use cases and examples, check out
In some situations, if you want to use an external resource such as MinIO, you can consider setting up a couple of Kubernetes methods that permit something like NAT networking
When you inspect kubectl commands and Kubernetes concepts, you will learn about the network structure inside Kubernetes, including
When you work with Kubernetes, you usually meet Service and Ingress for exposing services, but behind them it is Endpoints that define how the Service talks to the pods, so we can use these Endpoints to define an external service. Explore more at Service without selectors
Info
Services most commonly abstract access to Kubernetes Pods thanks to the selector, but when used with a corresponding set of EndpointSlices objects and without a selector, the Service can abstract other kinds of backends, including ones that run outside the cluster.
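A sketch of that pattern for the MinIO example mentioned earlier; the Service name, port 9000 and the backend IP address are assumptions about your environment:
apiVersion: v1
kind: Service
metadata:
  name: external-minio
spec:
  ports:
  - protocol: TCP
    port: 9000
    targetPort: 9000
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: external-minio-1
  labels:
    kubernetes.io/service-name: external-minio
addressType: IPv4
ports:
- name: ""        # empty name matches the unnamed Service port
  protocol: TCP
  port: 9000
endpoints:
- addresses:
  - "192.168.1.50" # IP of the external MinIO server (placeholder)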
Now you can use a Service to connect directly to your external service via Kubernetes components, and you can combine it with strategies for setting up an Ingress and mapping DNS for your external service via an ingress controller such as nginx, haproxy, ...
If you read the documentation carefully, you will see that Kubernetes has four Service types, and one of them is rarely known about: that's ExternalName
ExternalName lets you map a Service to a DNS name. Imagine you have a database with an FQDN: you can map your Service to that DNS name to resolve its location, similar to a CNAME record
An External Service pointing to httpbin.org, a simple HTTP request/response service. It's a valuable tool for testing and debugging as it can simulate various HTTP responses.
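A sketch of that External Service; the Service name httpbin is my assumption:
apiVersion: v1
kind: Service
metadata:
  name: httpbin
spec:
  type: ExternalName
  externalName: httpbin.org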
Longhorn maintenance
If you go to double-check Longhorn, you should consider double-checking a couple of resources
In my experience, I just combined multiple steps from the three sources above and gathered this workaround
Warning
This workaround only applies to a node with no disk inside. If the node has disks and replicas, you should follow Harvester - Evicting Replicas From a Disk (the CLI way) to evict all replicas and prevent a mismatch
After following the kubectl steps, Longhorn's DaemonSet application will install again and your node will return. If you want to know more about taints, read Kubernetes - Taints and Tolerations
Trick Solution
Quote
Sometimes the steps above will not make you feel comfortable, for example if your node has become bigger than ever (say 200 GB of memory reserved); you are 100% sure you don't want to touch anything on this node, to avoid causing downtime
That's why, from experience, I gave this trick a try. But first of all, you may get stuck on the step that deletes the Longhorn nodes, because in a couple of situations your node will get stuck in the validation of longhorn-webhook-validator. You can follow the solution to get past that, described in (BUG) Wrong nodeOrDiskEvicted collected in node monitor
Disable the validator through longhorn-webhook-validator; you just need to run the edit command with kubectl
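A hedged sketch, assuming the ValidatingWebhookConfiguration carries the longhorn-webhook-validator name mentioned above (verify the exact object name in your cluster first):
# List the validating webhooks to confirm the name
kubectl get validatingwebhookconfigurations
# Edit it and neutralize the rule that blocks the node deletion
kubectl edit validatingwebhookconfiguration longhorn-webhook-validator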
Now, if you are lucky, your node will be erased following the updated rule; if not, you can use kubectl or the Longhorn UI to delete the disk or whatever you want. In my situation, I deleted the stuck Longhorn node with this command
kubectl delete nodes.longhorn.io <name-node>
If you want to bring this node back again, there is a trick: delete the longhorn-manager pod on that node
# Find it via -o wide to see which manager is running on that node
kubectl get pods -o wide | grep -e "longhorn-manager"
# Next, once you have found it, delete this pod to restart it
kubectl delete pods <name-longhorn-managers>
Now your node will be added one more time, and the instance-manager will be installed for your Longhorn node
Lastly, you should regenerate the rule again by deleting the pods that manage it, via this command
kubectl delete pod -l app=longhorn-admission-webhook
Go back again and you will see your node added successfully. If needed, you can restart the longhorn-driver-deployer deployment to reinstall the driver on this node, but do so carefully
Once you've finished testing, you can press Ctrl+D to exit the terminal session in the Pod. The Pod will continue running afterwards. You can keep trying with the command in step 4, or delete it.
kubectl delete pod xxxx
Note
Usually, curlimages/curl is what gets used regularly. Try to create the new pod as fast as possible
# Normal command
kubectl run mycurlpod --image=curlimages/curl -i --tty -- sh
# Delete pod when finished or on exit, with the `--rm` option
kubectl run mycurlpod --image=curlimages/curl -i --tty --rm -- sh
wbitt/network-multitool - Includes multiple tools, also tcpdump and tcptraceroute (intermediate level)
nicolaka/netshoot - A wide range of superb tools, like iptables, tshark, ... (advanced level)
To play with these, you only need to spin up one of these pods in the Kubernetes cluster with a kubectl command
# If you want to spin up a throwaway container for debugging
kubectl run tmp-shell --rm -i --tty --image your_req_image -- /bin/bash
# If you want to spin up a container on the host's network namespace
kubectl run tmp-shell --rm -i --tty --overrides='{"spec": {"hostNetwork": true}}' --image your_req_image -- /bin/bash
OOM Killed
Info
OOM (Out of Memory) is one of the most popular types of error in a Kubernetes cluster, but do you know how many things this error can represent? Let's take a look below at techniques to investigate and resolve these problems
When you double-check, your OOM may not be handled at the Kubernetes layer, which means your pod will not restart when it hits the limit (NOTE: this is really strange; in my case our application ran multiple child processes under a parent, and only a child got killed instead of the parent)
You can use journalctl to double-check this OOM kill
journalctl --utc -ke
This command lets journalctl read information from the kernel and display it in UTC time. It is really easier for debugging than another tool like dmesg (same result)
Now you can see the OOM error and figure out what happened to your application. If you use RKE2, you can double-check which container your killed process corresponds to
ctr -a /run/k3s/containerd/containerd.sock -n k8s.io containers ls | grep -e "<id-container>"
RKE2 Network DNS Debugging
Question
If youβve already faced or are currently facing - Kubernetes DNS issues, you know they can create incredibly frustrating debugging moments that are far from easy. Consequently, I dedicated two days to learning and resolving the specific problem detailed below. This tutorial outlines precisely how I fixed it. Be sure to take note of this one!
In my experience, when attempting to self-host Kubernetes clusters, specifically on-premise solutions like K3s, RKE2, or other local Kubernetes setups, youβre likely to encounter a specific problem. Your pods might spin up, and components like CoreDNS, CNI, KubeProxy, and Kubelet appear to be functioning perfectly, yet your pods cannot communicate with services to resolve domains.
This issue then cascades, causing significant problems for health checks, InitContainers, Jobs, pre-hooks, and more, leaving you unsure where to even begin troubleshooting. Let's list a couple of potential reasons; I will separate them into three levels: Rare, Unique and Special
(Rare) Your CoreDNS has the wrong configuration and can't resolve your service domain with the current configuration. These issues are linked to
(Special) The TX checksum offload is misconfigured or does not fit the kernel version for your network interface; honestly, if you encounter this problem, it is not going to be easy to understand
(Unique) A firewall is turned on and some rules you set up conflict with RKE2 or K3s, including firewalld, ufw or iptables. This one is not simple if you don't understand what's going on when you turn any of these rules on or off
That's why you should make a DNS checklist for your RKE2 or any self-hosted Kubernetes to prevent it
Always check the firewall first, because running one alongside RKE2 and K3s really makes no sense; let them handle it for you
# Disable Uncomplicated Firewall
ufw disable
Check your iptables rules and, if possible, remove anything that blocks UDP/53 traffic. Specifically, if you find an iptables rule referencing IP 10.43.0.10 on port 53, this could potentially be a risky configuration
# Check iptables rules
sudo iptables -L -n -v --line-numbers
# If you see DROP with 53 and 10.43.0.10, e.g.
# Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
# num   pkts bytes target  prot opt in  out  source     destination
# 1     0    0     DROP    udp  --  *   *    0.0.0.0/0  10.43.0.10   udp dpt:53
# 2     ...
# you should delete this rule with
## Delete by specification
sudo iptables -D OUTPUT -p udp --dport 53 -d 10.43.0.10 -j DROP
## Delete by line number
sudo iptables -D <CHAIN_NAME> <LINE_NUMBER>
Warning
If you applied the rule with a higher-level firewall, like firewalld or ufw, you should handle the rule deletion in that firewall instead of iptables, to prevent conflicts
(Option 1) If the firewall isn't the culprit, you should look at the configuration of CoreDNS, which actually handles domain resolution, both external and internal
# Edit the configmap of your CoreDNS
kubectl edit cm -n kube-system rke2-coredns-rke2-coredns
# Enhance the log and change the default forward from
# /etc/resolv.conf --> 8.8.8.8
.:53 {
    log    # Enable log errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus 0.0.0.0:9153
    forward . 8.8.8.8   # Instead of /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
# Restart your CoreDNS
kubectl rollout restart deployment -n kube-system rke2-coredns-rke2-coredns
(Option 2) Re-apply your RKE2 after upgrading to a new kernel version via APT. Read more at Update Ubuntu new version
(Option 3) The TX checksum offload is actually the problem. I have mostly seen this problem related to Ubuntu Server 20.04 with the Calico or Flannel CNI, so it's up to you to try and test, but it is only a temporary solution because when your machine reboots it restores the default configuration. The best way is to write a script and apply it as a systemd unit on your system, as in Medium - Resolving Flannel-Related DNS and Metrics Server Issues in RKE2 Kubernetes on Ubuntu 22.04
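A hedged sketch of that temporary fix, assuming Flannel's flannel.1 VXLAN interface; rerun it (or wrap it in a systemd unit as the article describes) after every reboot:
# Disable TX checksum offload on the Flannel VXLAN interface
sudo ethtool -K flannel.1 tx-checksum-ip-generic off
# Verify the setting
sudo ethtool -k flannel.1 | grep tx-checksum-ip-generic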