To find more information and examples, you can check some manifest collections at
Use volumes with CronJobs
Purpose
This note collects things I find while working with K8s: quick notes and links for resolving problems. Topics with enough detail get their own directory.
CronJob --> creates Job (triggered on schedule) --> Pod
In this setup, the Pod created by the CronJob can use the volume: the mount completes before the script runs. If you apply the same thing as a bare Pod, it may not, because your command can start before the mount finishes. Check it in this link
apiVersion: batch/v1
kind: CronJob
metadata:
name: update-db
spec:
schedule: "*/1 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: update-fingerprints
image: python:3.6.2-slim
command: ["/bin/bash"]
args: ["-c", "python /client/test.py"]
volumeMounts:
- name: application-code
mountPath: /where/ever
restartPolicy: OnFailure
volumes:
- name: application-code
persistentVolumeClaim:
claimName: application-code-pv-claim
Do Kubernetes Pods Really Get Evicted Due to CPU Pressure?
Reference article: Do Kubernetes Pods Really Get Evicted Due to CPU Pressure?
Tip
Pods are not directly evicted due to high CPU pressure or usage alone. Instead, Kubernetes relies on CPU throttling mechanisms to manage and limit a pod's CPU usage, ensuring fair resource sharing among pods on the same node.
While high CPU usage by a pod can indirectly contribute to resource pressure and potentially lead to eviction due to memory or other resource shortages, CPU throttling is the primary mechanism used to manage CPU-intensive workloads.
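For context, throttling is driven by the resources.limits.cpu you set on a container; a minimal sketch (the names and values here are illustrative, not from this note):
apiVersion: v1
kind: Pod
metadata:
  name: cpu-limited-pod   # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "250m"    # the scheduler reserves this much CPU
      limits:
        cpu: "500m"    # above this the container is throttled, not evicted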
Restart StatefulSet workload
Related link
Notice
- Do not remove the statefulset workload itself: it will scale down to 0 and will not come back up. Instead, remove only the pods; they will restart based on the statefulset update strategy (see the commands after this list).
- Rollout restart of a statefulset does not work when the status of the statefulset is completed.
- Deleting pods in a statefulset will not remove the associated volumes.
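A short sketch of both approaches (namespace, StatefulSet name and pod label are placeholders):
# Preferred: graceful rolling restart, driven by the StatefulSet update strategy
kubectl rollout restart statefulset/<name> -n <namespace>
# Alternative: delete only the pods; the StatefulSet controller recreates them
kubectl delete pod -l app=<label> -n <namespace>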
Note
Deleting the PVC after the pods have terminated might trigger deletion of the backing Persistent Volumes depending on the storage class and reclaim policy. You should never assume ability to access a volume after claim deletion.
Note: Use caution when deleting a PVC, as it may lead to data loss.
- Complete deletion of a StatefulSet: to delete everything in a StatefulSet, including the associated pods, you can run a series of commands similar to the following:
grace=$(kubectl get pods <stateful-set-pod> --template '{{.spec.terminationGracePeriodSeconds}}')
kubectl delete statefulset -l app.kubernetes.io/name=MyApp
sleep $grace
kubectl delete pvc -l app.kubernetes.io/name=MyApp
Create troubleshooting pods
You can create standalone pods, without a Deployment, for purposes such as:
- Checking and validating node and cluster networking (DNS resolution, health checks)
- Restoring and backing up a DB
- Debugging or accessing internal services
To do that, you use kubectl:
- Use kubectl to generate the pod manifest
k run <name-pod> --image=debian:11.7 --dry-run=client -o yaml > pods.yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: <name-pod>
name: <name-pod>
spec:
containers:
- image: debian:11.7
name: <name-pod>
resources: {}
dnsPolicy: ClusterFirst
restartPolicy: Always
status: {}
- Customize your pod: to keep it alive, set the pod command to tail -f /dev/null
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: <name-pod>
name: <name-pod>
spec:
containers:
- image: debian:11.7
name: <name-pod>
resources: {}
# Another command: sleep 3600
command:
- tail
- -f
- /dev/null
dnsPolicy: ClusterFirst
restartPolicy: Always
status: {}
- Run the apply command with the manifest
k apply -f pods.yaml
- Wait a few seconds, then exec into the pod with
k exec --tty --stdin pods/xxxx -- /bin/bash
- Once you've finished testing, press Ctrl+D to leave the terminal session in the Pod. The Pod keeps running afterwards, so you can exec in again (step 4) or delete it.
kubectl delete pod xxxx
NOTE: curlimages/curl is commonly used. To create such a pod as quickly as possible:
kubectl run mycurlpod --image=curlimages/curl -i --tty -- sh
Stop or run the Cronjob with patch
A cronjob is a scheduled Kubernetes workload that triggers a specified job at set times. But sometimes, e.g. during working hours, your test job should not run, so you will care about the suspend state of the job. You can update that state with:
k patch -n <namespace> cronjobs.batch <cronjobs-name> -p '{"spec": {"suspend": true}}'
Enable it again by changing true → false
k patch -n <namespace> cronjobs.batch <cronjobs-name> -p '{"spec": {"suspend": false}}'
Furthermore, you can use patch for multiple purposes (examples follow this list):
- Update a container's image
- Partially update a node
- Disable a deployment livenessProbe using a JSON patch
- Update a deployment's replica count
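Hedged sketches for each of those purposes (the deployment, node and container names are placeholders):
# Update a container's image (the container name must match the one in the pod template)
kubectl patch deployment my-deploy -p '{"spec":{"template":{"spec":{"containers":[{"name":"app","image":"nginx:1.25"}]}}}}'
# Partially update a node (mark it unschedulable)
kubectl patch node my-node -p '{"spec":{"unschedulable":true}}'
# Disable a deployment livenessProbe using a JSON patch
kubectl patch deployment my-deploy --type json -p='[{"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe"}]'
# Update a deployment's replica count
kubectl patch deployment my-deploy -p '{"spec":{"replicas":3}}'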
Updating resources
You can handle graceful restarts and version rollbacks with the rollout command
# Gracefully restart deployments, statefulsets and daemonsets
k rollout restart -n <namespace> <type-workload>/<name>
# Rollback version
kubectl rollout undo <type-workload>/<name>
kubectl rollout undo <type-workload>/<name> --to-revision=2
# Check the rollout status
kubectl rollout status -w <type-workload>/<name>
Kubernetes has metadata values that help distinguish services from each other, specify identifying attributes of objects, and attach arbitrary non-identifying metadata to objects:
- Label
- Annotations
And you can update them with kubectl via the label and annotate commands
# Add a Label
kubectl label pods my-pod new-label=awesome
# Remove a label
kubectl label pods my-pod new-label-
# Overwrite an existing value
kubectl label pods my-pod new-label=new-value --overwrite
# Add an annotation
kubectl annotate pods my-pod icon-url=http://goo.gl/XXBTWq
# Remove annotation
kubectl annotate pods my-pod icon-url-
Next, you can set up autoscaling for a deployment with the autoscale command
kubectl autoscale deployment foo --min=2 --max=10
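The imperative command above creates a HorizontalPodAutoscaler behind the scenes; a declarative sketch of roughly the same thing, assuming a Deployment named foo and an illustrative 80% CPU target:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: foo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # assumed target; adjust to your workload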
Edit YAML manifest
kubectl can help you change a manifest directly from your shell. If you are a Linux or macOS user, you can use nano or vim for this feature
# Edit the service named docker-registry
kubectl edit svc/docker-registry
# Use an alternative editor
KUBE_EDITOR="nano" kubectl edit svc/docker-registry
When you save and close the editor, your workload or resource is updated immediately
Delete resource
Use the delete command:
# Delete a pod using the type and name specified in pod.json
kubectl delete -f ./pod.json
# Delete a pod with no grace period
kubectl delete pod unwanted --now
kubectl delete pods <pod> --grace-period=0
# Delete pods and services with same names "baz" and "foo"
kubectl delete pod,service baz foo
Health check and interact with cluster, node and workload
Use the events command to detect what is happening on the cluster nodes
# List Events sorted by timestamp
kubectl get events --sort-by=.metadata.creationTimestamp
# List all warning events
kubectl events --types=Warning
If the status of a workload is not available or running, you can use describe for a verbose check of the workload
# Describe commands with verbose output
kubectl describe nodes my-node
kubectl describe pods my-pod
When the problem is not visible from the workload status, you can check the logs to extract more information
# dump pod logs (stdout)
kubectl logs my-pod
# dump pod logs (stdout) for a previous instantiation of a container. Usually used for CrashLoopBackOff
kubectl logs my-pod --previous
# dump pod container logs (stdout, multi-container case) for a previous instantiation of a container
kubectl logs my-pod -c my-container --previous
# stream pod logs (stdout)
kubectl logs -f my-pod
If you have checked the workload, especially pods and containers, without results, go back and check resource usage on the cluster.
# Show metrics for all nodes
kubectl top node
# Show metrics for a given node
kubectl top node my-node
# For a total overview, use the resource-capacity plugin
# print the quantity available instead of the percentage used
kubectl resource-capacity -a
# include resource utilization and pods in the output
kubectl resource-capacity --util -p
kubectl can help you disable or manipulate nodes with these commands
# Mark my-node as unschedulable
kubectl cordon my-node
# Drain my-node in preparation for maintenance
kubectl drain my-node
# Mark my-node as schedulable
kubectl uncordon my-node
Tips
To explore more, you can do lots of things with kubectl. To read and understand a command, use its manual with the --help flag
Setup metrics-server
Running the metrics server is your job if you self-host your Kubernetes cluster, which means you need to learn how to set up metrics-server yourself, and it is quite easy. Read more about it at metrics-server
Via kubectl you can apply the manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Or you can use helm to install the metrics-server chart from helm
# Add repo to your cluster
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
# Install the metrics-server chart from the repo
helm upgrade --install metrics-server metrics-server/metrics-server
Warning
Your metrics-server may get stuck because it cannot verify the TLS certificates served by the kubelets (common on self-hosted clusters).
But don't worry about it, you can bypass this with a small trick. Read more about the solution at
The solution is to use the edit command of kubectl to edit the manifest of the metrics-server deployment, like this
# First, optionally configure your editor to nano; skip this step if you prefer vim
export KUBE_EDITOR="nano"
# Use edit to change manifest of deployment
kubectl edit deployments -n kube-system metrics-server
Now scroll to the args of the metrics-server container and change them to
- args:
- --cert-dir=/tmp
- --secure-port=10250
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls=true # This will help you bypass authentication
Now your metrics-server will restart and be running after about 30s
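To verify it is working, check the metrics APIService and try a top query:
# The metrics.k8s.io APIService should report Available=True
kubectl get apiservice v1beta1.metrics.k8s.io
# Metrics should start flowing shortly afterwards
kubectl top node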
To learn more about Kubernetes metrics, read the article Kubernetes' Native Metrics and States
Configure Liveness, Readiness and Startup Probes
Kubernetes implements multiple probe types to health-check your applications. See more at Liveness, Readiness and Startup Probes
If you want to learn about configuration, use this documentation
Tip
Probes have a number of fields that you can use to more precisely control the behavior of startup, liveness and readiness checks
Liveness
Info
Liveness probes determine when to restart a container. For example, liveness probes could catch a deadlock, when an application is running, but unable to make progress.
If a container fails its liveness probe repeatedly, the kubelet restarts the container.
You can set up a liveness probe with an exec command configuration
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-exec
spec:
containers:
- name: liveness
image: registry.k8s.io/busybox
args:
- /bin/sh
- -c
- touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
livenessProbe:
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 5
periodSeconds: 5
Or you can use a liveness probe with an HTTP request configuration
spec:
containers:
livenessProbe:
httpGet:
path: /healthz
port: 8080
httpHeaders:
- name: Custom-Header
value: Awesome
initialDelaySeconds: 3
periodSeconds: 3
You can use other protocols for liveness probes as well, such as TCP or gRPC.
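A TCP liveness probe sketch, assuming the container listens on port 8080:
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20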
Readiness
Info
Readiness probes determine when a container is ready to start accepting traffic. This is useful when waiting for an application to perform time-consuming initial tasks, such as establishing network connections, loading files, and warming caches.
If the readiness probe returns a failed state, Kubernetes removes the pod from all matching service endpoints.
You can configure a readiness probe with
readinessProbe:
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 5
periodSeconds: 5
Configuration for HTTP and TCP readiness probes also remains identical to liveness probes.
Info
Readiness and liveness probes can be used in parallel for the same container. Using both can ensure that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.
Note
Readiness probes run on the container during its whole lifecycle.
Startup
Info
A startup probe verifies whether the application within a container is started. This can be used to adopt liveness checks on slow starting containers, avoiding them getting killed by the kubelet before they are up and running.
If such a probe is configured, it disables liveness and readiness checks until it succeeds.
You can configure it for your pod with a configuration like
livenessProbe:
httpGet:
path: /healthz
port: liveness-port
failureThreshold: 1
periodSeconds: 10
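On its own, the block above is just the liveness probe; in the usual pattern (as in the Kubernetes docs) a startupProbe on the same endpoint sits alongside it and holds off liveness checks while the app boots, for up to failureThreshold × periodSeconds (300s here):
startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10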
Startup probes mostly help Kubernetes protect slow-starting containers
Note
This type of probe is only executed at startup, unlike readiness probes, which are run periodically
Set up a Snapshotter for Elasticsearch
Following the documentation about Elasticsearch snapshots on Azure Cloud; explore it at Elastic Cloud on Kubernetes (ECK) Quickstart with Azure Kubernetes Service, Istio and Azure Repository plugin
You can use terraform with the manifests below to apply this configuration
# https://www.linkedin.com/pulse/elastic-cloud-kubernetes-eck-quickstart-azure-repository-ajay-singh/
resource "kubernetes_secret" "azure_snapshot_secret" {
metadata {
name = "azure-snapshot-secret"
namespace = var.namespace
}
binary_data = {
"azure.client.default.account" = base64encode(var.remote_state.backup_storage_account_name)
"azure.client.default.key" = base64encode(var.remote_state.backup_storage_account_key)
}
depends_on = [
helm_release.elastic_operator
]
}
# Register the Azure snapshot with the Elasticsearch cluster
resource "kubectl_manifest" "elasticsearch_register_snapshot" {
yaml_body = <<YAML
apiVersion: batch/v1
kind: Job
metadata:
name: ${var.name}-register-snapshot
namespace: ${var.namespace}
spec:
template:
spec:
containers:
- name: register-snapshot
image: curlimages/curl:latest
volumeMounts:
- name: es-basic-auth
mountPath: /mnt/elastic/es-basic-auth
command:
- /bin/sh
args:
# - -x # Can be used to debug the command, but don't use it in production as it will leak secrets.
- -c
- |
curl -s -i -k -u "elastic:$(cat /mnt/elastic/es-basic-auth/elastic)" -X PUT \
'https://${var.name}-es-http:9200/_snapshot/azure' \
--header 'Content-Type: application/json' \
--data-raw '{
"type": "azure",
"settings": {
"client": "default"
}
}' | tee /dev/stderr | grep "200 OK"
restartPolicy: Never
volumes:
- name: es-basic-auth
secret:
secretName: ${var.name}-es-elastic-user
YAML
depends_on = [kubectl_manifest.elasticsearch]
}
# Create the snapshotter cronjob.
resource "kubectl_manifest" "elasticsearch_snapshotter" {
yaml_body = <<YAML
apiVersion: batch/v1
kind: CronJob
metadata:
name: ${var.name}-snapshotter
namespace: ${var.namespace}
spec:
schedule: "0 16 * * 0"
concurrencyPolicy: Forbid
jobTemplate:
spec:
template:
spec:
nodeSelector:
pool: infrapool
containers:
- name: snapshotter
image: curlimages/curl:latest
volumeMounts:
- name: es-basic-auth
mountPath: /mnt/elastic/es-basic-auth
command:
- /bin/sh
args:
- -c
- 'curl -s -i -k -u "elastic:$(cat /mnt/elastic/es-basic-auth/elastic)" -XPUT "https://${var.name}-es-http:9200/_snapshot/azure/%3Csnapshot-%7Bnow%7Byyyy-MM-dd%7D%7D%3E" | tee /dev/stderr | grep "200 OK"'
restartPolicy: OnFailure
volumes:
- name: es-basic-auth
secret:
secretName: ${var.name}-es-elastic-user
YAML
depends_on = [kubectl_manifest.elasticsearch_register_snapshot]
}
resource "kubectl_manifest" "elastic_cleanup_snapshots" {
yaml_body = <<YAML
apiVersion: batch/v1
kind: CronJob
metadata:
name: ${var.name}-cleanup-snapshotter
namespace: ${var.namespace}
spec:
schedule: "@daily"
ttlSecondsAfterFinished: 86400
backoffLimit: 3
concurrencyPolicy: Forbid
jobTemplate:
spec:
template:
spec:
nodeSelector:
pool: infrapool
containers:
- name: clean-snapshotter
image: debian:11.7
imagePullPolicy: IfNotPresent
volumeMounts:
- name: es-basic-auth
mountPath: /mnt/elastic/es-basic-auth
command:
- /bin/sh
args:
- -c
- |
# Update and install curl package
apt update && apt install -y curl
# Compute the deletion date (today minus the retention period); snapshots from that date will be removed
deletionDate=$(date -d "$date -${var.retention_date} days" +%Y-%m-%d)
# List the Elasticsearch snapshots whose names match the deletion date
listElasticSnapshots=$(curl --insecure -X GET "https://elastic:$(cat /mnt/elastic/es-basic-auth/elastic)@${var.name}-es-http:9200/_cat/snapshots/azure" | awk '{print $1}' | grep -e "$deletionDate")
# Check whether the snapshot list is empty
if [ "$listElasticSnapshots" = "" ]; then
# Nothing to delete for this date
echo "No snapshots found for deletion date $deletionDate"
exit 0
else
# Remove each snapshot matching the deletion date
for snapshot in $listElasticSnapshots;
do
res=$(curl -X DELETE --insecure "https://elastic:$(cat /mnt/elastic/es-basic-auth/elastic)@${var.name}-es-http:9200/_snapshot/azure/$snapshot" 2> /dev/null || echo "false")
if [ "$res" != "false" ]; then
echo "Deleted $snapshot"
else
echo "Failed to delete $snapshot"
fi
done
fi
restartPolicy: OnFailure
volumes:
- name: es-basic-auth
secret:
secretName: ${var.name}-es-elastic-user
YAML
depends_on = [kubectl_manifest.elasticsearch_register_snapshot]
}
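To verify the repository and snapshots manually, you can spin up a throwaway curl pod (as in the troubleshooting-pod section) and query the snapshot APIs; the service name and credentials below follow the same <name>-es-http / elastic-user pattern assumed in the Terraform above:
kubectl run escurl --rm -i --tty -n <namespace> --image=curlimages/curl -- sh
# inside the pod, with the elastic user's password exported as ES_PASSWORD
curl -k -u "elastic:$ES_PASSWORD" "https://<name>-es-http:9200/_snapshot/azure"
curl -k -u "elastic:$ES_PASSWORD" "https://<name>-es-http:9200/_cat/snapshots/azure?v"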
Maintain Node in Kubernetes
Following the article Linkedin - Node Maintenance Commands In Kubernetes, we can catch up on how to maintain one of the nodes inside your Kubernetes cluster
You will use two commands in this workflow
- kubectl drain: safely evicts all the pods from a node before you perform any maintenance operation on it
- kubectl cordon: marks a node as unschedulable, which means that no new pods will be scheduled on that node
A workflow would be (the commands are sketched after this list):
- Run kubectl cordon node-name to mark the node as unschedulable.
- Run kubectl drain node-name to safely evict the remaining pods from the node.
- Perform your maintenance tasks on the node.
- Run kubectl uncordon node-name to mark the node as schedulable again.
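The same workflow as shell commands (my-node is a placeholder; the drain flags shown are the ones commonly needed for DaemonSets and emptyDir volumes):
kubectl cordon my-node
kubectl drain my-node --ignore-daemonsets --delete-emptydir-data
# ... perform maintenance on the node ...
kubectl uncordon my-node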
In addition, you can apply some configuration as best practice:
- Configure a disruption budget (a sketch follows this list). Explore PodDisruptionBudget and how to configure a PodDisruptionBudget.
- You can also use the API to evict your workloads. Explore API-initiated eviction.
- Learn and practice upgrading your nodes. Explore Upgrading kubeadm clusters.
- For fun, you can use an operator inside the Kubernetes cluster via a CRD-based API. Explore node-maintenance-operator.
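A minimal PodDisruptionBudget sketch, assuming a workload whose pods carry the label app: myapp:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1          # keep at least one pod up during voluntary disruptions such as a drain
  selector:
    matchLabels:
      app: myapp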
Assign Pods to Nodes
There are multiple ways to assign pods to specific nodes depending on a couple of conditions, and they make it easier to control the cluster, such as:
Use node labels and pick them up with nodeSelector
Explore at: nodeSelector field matching against node labels
If you set up a couple of labels on your nodes, you can use nodeSelector to select which nodes pods are able to be scheduled onto.
If you want to add more labels and supply them to your deployment, you can use kubectl label to handle that
# Add a Label
kubectl label pods my-pod new-label=awesome
# Remove a label
kubectl label pods my-pod new-label-
# Overwrite an existing value
kubectl label pods my-pod new-label=new-value --overwrite
View them with the get command
kubectl get pods --show-labels
You can modify or set nodeSelector to pick nodes matching a label
# Assumes the existence of the label: node-role.kubernetes.io/master, and tries to assign the pod to the labelled node.
---
apiVersion: v1
kind: Pod
metadata:
name: pod-node-selector-simple
spec:
containers:
- command: ["sleep", "3600"]
image: busybox
name: pod-node-selector-simple-container
nodeSelector:
node-role.kubernetes.io/master: ""
Use affinity and anti-affinity
Documentation: Affinity and anti-affinity
Info
nodeSelector is the simplest way to constrain Pods to nodes with specific labels. Affinity and anti-affinity expand the types of constraints you can define.
With node affinity you have two types:
- requiredDuringSchedulingIgnoredDuringExecution: The scheduler can't schedule the Pod unless the rule is met. This functions like nodeSelector, but with a more expressive syntax.
- preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod.
You can specify node affinities using the .spec.affinity.nodeAffinity field in your Pod spec
apiVersion: v1
kind: Pod
metadata:
name: with-node-affinity
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- antarctica-east1
- antarctica-west1
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: another-node-label-key
operator: In
values:
- another-node-label-value
containers:
- name: with-node-affinity
image: registry.k8s.io/pause:3.8
Info
You can use the operator field to specify a logical operator for Kubernetes to use when interpreting the rules. You can use In, NotIn, Exists, DoesNotExist, Gt and Lt. Explore more about it at Operators
You can explore more extended topics with affinity:
- Node affinity weight
- Node affinity per scheduling profile
- Inter-pod affinity and anti-affinity (a sketch follows this list)
- matchLabelKeys and mismatchLabelKeys
- More practical use-cases
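As a taste of inter-pod anti-affinity, a pod-spec fragment like the following (assuming replicas labelled app: web) keeps pods of the same app off the same node:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        # at most one app=web pod per hostname, i.e. per node
        topologyKey: kubernetes.io/hostname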
Learn more with these articles
- Quan Huynh - DevOpsVN - Kubernetes Series - Bài 18 - Advanced scheduling: node affinity and pod affinity
- StackState - Mastering Node Affinity in Kubernetes
Use taint and toleration
Documentation: Taints and Tolerations
One more way to configure scheduling is to use taints and tolerations. They are the opposite of affinity, because a taint is used to repel a set of pods from a node.
But you can use a toleration to allow a workload to be scheduled onto a node whose taint it matches.
For example, you can taint a node like this:
# Add taint
kubectl taint nodes node1 key1=value1:NoSchedule
# Remove taint
kubectl taint nodes node1 key1=value1:NoSchedule-
To deploy your workload onto a node with a taint, you can set a toleration that matches the taint configuration, for example:
apiVersion: v1
kind: Pod
metadata:
name: nginx
labels:
env: test
spec:
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
# Configuration
tolerations:
- key: "key1"
operator: "Equal"
value: "value1"
effect: "NoSchedule"
# Another way
# tolerations:
# - key: "key1"
# operator: "Exists"
# effect: "NoSchedule"
Note
The default value for operator is Equal.
A toleration "matches" a taint if the keys are the same and the effects are the same, and:
- the operator is Exists (in which case no value should be specified), or
- the operator is Equal and the values are equal.
The allowed values for the effect field are:
NoExecute
This affects pods that are already running on the node: pods that do not tolerate the taint are evicted immediately, while pods that do tolerate it remain bound (unless they set a tolerationSeconds).
NoSchedule
No new Pods will be scheduled on the tainted node unless they have a matching toleration. Pods currently running on the node are not evicted.
PreferNoSchedule
A "preference" or "soft" version of NoSchedule. The control plane will try to avoid placing a Pod that does not tolerate the taint on the node, but it is not guaranteed.
Warning
You can put multiple taints on the same node and multiple tolerations on the same pod. The way Kubernetes processes multiple taints and tolerations is like a filter: start with all of a node's taints, then ignore the ones for which the pod has a matching toleration; the remaining un-ignored taints have the indicated effects on the pod. In particular,
- if there is at least one un-ignored taint with effect NoSchedule then Kubernetes will not schedule the pod onto that node
- if there is no un-ignored taint with effect NoSchedule but there is at least one un-ignored taint with effect PreferNoSchedule then Kubernetes will try to not schedule the pod onto the node
- if there is at least one un-ignored taint with effect NoExecute then the pod will be evicted from the node (if it is already running on the node), and will not be scheduled onto the node (if it is not yet running on the node).
If you want to explore use cases and examples, find out more at