Prerequisites
Ensure NVIDIA drivers are installed on each node before proceeding.
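One way to confirm the driver prerequisite on a node is to query nvidia-smi, which ships with the driver. The sketch below is only illustrative: the helper name check_gpu_driver is hypothetical, and it assumes nvidia-smi is on the PATH when the driver is installed.

```shell
# Sketch: report GPU model and driver version, or fail if the driver tools are missing.
# check_gpu_driver is a hypothetical helper name, not part of any NVIDIA tooling.
check_gpu_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; install the NVIDIA driver first" >&2
        return 1
    fi
    # Prints one CSV line per GPU: model name, then driver version
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

check_gpu_driver || echo "GPU driver check failed"
```

Running this on every node before Step 1 catches a missing driver early, before the container runtime is configured against it.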
Step 1: Install NVIDIA Container Runtime
Install the nvidia-container-runtime package on each node:
yum install nvidia-container-runtime
Step 2: Configure Docker
Edit /etc/docker/daemon.json to configure Docker to use the NVIDIA runtime:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Reload the systemd unit configuration, then restart the Docker service so it picks up the new runtime settings:
systemctl daemon-reload
systemctl restart docker
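A malformed daemon.json prevents Docker from starting, so it can be worth validating the file and then checking which runtime Docker reports as its default. This is a minimal sketch: validate_daemon_json and the DAEMON_JSON variable are hypothetical names, and it assumes python3 is available on the node for the JSON check.

```shell
# Sketch: confirm daemon.json parses, then ask Docker for its default runtime.
# DAEMON_JSON and validate_daemon_json are illustrative names (assumptions).
DAEMON_JSON="${DAEMON_JSON:-/etc/docker/daemon.json}"

validate_daemon_json() {
    # python3 -m json.tool exits non-zero on malformed or missing input
    python3 -m json.tool "$1" >/dev/null
}

if validate_daemon_json "$DAEMON_JSON" 2>/dev/null; then
    echo "daemon.json is valid JSON"
else
    echo "daemon.json is missing or malformed: $DAEMON_JSON" >&2
fi

if command -v docker >/dev/null 2>&1; then
    # Should report "nvidia" if the configuration above took effect
    docker info --format '{{.DefaultRuntime}}' || echo "docker info failed" >&2
fi
```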
Step 3: Deploy NVIDIA Device Plugin
Create a DaemonSet manifest named nvidia-device-plugin.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      priorityClassName: "system-node-critical"
      containers:
      - image: nvidia/k8s-device-plugin:1.11
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
Apply the configuration:
kubectl create -f nvidia-device-plugin.yaml
Verification
After deployment, verify the DaemonSet is running:
kubectl get daemonset -n kube-system -l name=nvidia-device-plugin-ds
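A running DaemonSet does not by itself show that GPUs were registered with the kubelet. One option is to list each node's allocatable nvidia.com/gpu count, as sketched below. The helper name list_node_gpus is hypothetical; the dot in the resource name is escaped so kubectl reads it as part of the key rather than as a path separator.

```shell
# Sketch: list allocatable GPUs per node.
# list_node_gpus is an illustrative helper name (assumption).
list_node_gpus() {
    kubectl get nodes \
        -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}

if command -v kubectl >/dev/null 2>&1; then
    list_node_gpus || echo "could not query nodes" >&2
fi
```

Nodes where the plugin has not registered any GPUs show no value in the GPUS column.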
Nodes with GPUs will now expose the nvidia.com/gpu resource, allowing pods to request GPU access via resource limits:
resources:
  limits:
    nvidia.com/gpu: 1
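The scheduler places each pod that requests nvidia.com/gpu on a node with enough unallocated GPUs. GPUs are requested in whole units and are not shared between containers, so a pod stays Pending if no node can satisfy its request.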
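To try the setup end to end, a short-lived pod that requests one GPU and runs nvidia-smi can serve as a smoke test. The sketch below only prints the manifest. The pod name gpu-smoke-test, the helper gpu_test_pod, the GPU_TEST_IMAGE variable, and the default image tag are all assumptions for illustration; substitute any CUDA-enabled image available to your cluster.

```shell
# Sketch: emit a one-shot pod manifest that requests a single GPU.
# Pod name, helper name, and image are illustrative assumptions,
# not part of the original guide.
gpu_test_pod() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: ${GPU_TEST_IMAGE:-nvidia/cuda:12.2.0-base-ubuntu22.04}
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
}

# Print the manifest for review
gpu_test_pod
```

After reviewing the output, it can be submitted with gpu_test_pod | kubectl create -f - and inspected with kubectl logs gpu-smoke-test once the pod completes.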