Introduction to KubeStateMetrics
Kube-state-metrics is a service that listens to the Kubernetes API server and generates metrics about the state of various objects like deployments, nodes, and pods. It transforms these object states into metrics that can be consumed by Prometheus.
Key capabilities of kube-state-metrics include:
- Collecting node status information such as CPU/memory capacity and allocatable resources, node conditions, and labels
- Monitoring pod status including container states, image information, and annotations
- Tracking controller status for Deployments, DaemonSets, StatefulSets, and ReplicaSets
- Providing service metrics including type, IP, and port information
- Gathering storage volume metrics about capacity and type
- Exposing object state only; API server request rates and response times come from the API server's own /metrics endpoint, which is scraped separately
These metrics enable effective cluster monitoring, issue detection, and proactive alerting.
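To make the idea of "object state as metrics" concrete, here is a minimal Python sketch that reads a few lines in the Prometheus text format and reports deployments with fewer available replicas than desired. The sample payload, the deployment names, and the helper names are made up for illustration; the two metric names are ones kube-state-metrics exposes. The parser is deliberately simplified and does not handle escaped quotes in label values.

```python
import re

# Hypothetical sample of what a /metrics response might contain.
SAMPLE = """\
# TYPE kube_deployment_spec_replicas gauge
kube_deployment_spec_replicas{namespace="default",deployment="web"} 3
kube_deployment_spec_replicas{namespace="default",deployment="api"} 2
# TYPE kube_deployment_status_replicas_available gauge
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 2
kube_deployment_status_replicas_available{namespace="default",deployment="api"} 2
"""

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (metric_name, labels, value) tuples for labelled samples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            samples.append((name, dict(LABEL_RE.findall(raw_labels)), float(value)))
    return samples


def under_replicated(text):
    """List (namespace, deployment) pairs with available < desired replicas."""
    desired, available = {}, {}
    for name, labels, value in parse_samples(text):
        key = (labels.get("namespace"), labels.get("deployment"))
        if name == "kube_deployment_spec_replicas":
            desired[key] = value
        elif name == "kube_deployment_status_replicas_available":
            available[key] = value
    return sorted(k for k, want in desired.items() if available.get(k, 0) < want)


print(under_replicated(SAMPLE))
```

In practice this comparison would normally be written as a Prometheus alerting rule; the script only shows the shape of the data.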
Deploying KubeStateMetrics
The deployment consists of the following Kubernetes resources: ServiceAccount, ClusterRole, ClusterRoleBinding, Service, and Deployment.
Setting up RBAC
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-collector
  namespace: monitoring
  labels:
    app: metrics-collector
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metrics-collector
  labels:
    app: metrics-collector
rules:
- apiGroups: [""]
  resources: ["configmaps","secrets","nodes","pods",
              "services","resourcequotas",
              "replicationcontrollers","limitranges",
              "persistentvolumeclaims","persistentvolumes",
              "namespaces","endpoints"]
  verbs: ["list","watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets","deployments","replicasets"]
  verbs: ["list","watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets","daemonsets","deployments","replicasets"]
  verbs: ["list","watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs","jobs"]
  verbs: ["list","watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list","watch"]
- apiGroups: ["authentication.k8s.io"]
  resources: ["tokenreviews"]
  verbs: ["create"]
- apiGroups: ["authorization.k8s.io"]
  resources: ["subjectaccessreviews"]
  verbs: ["create"]
- apiGroups: ["policy"]
  resources: ["poddisruptionbudgets"]
  verbs: ["list","watch"]
- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests"]
  verbs: ["list","watch"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses","volumeattachments"]
  verbs: ["list","watch"]
- apiGroups: ["admissionregistration.k8s.io"]
  resources: ["mutatingwebhookconfigurations","validatingwebhookconfigurations"]
  verbs: ["list","watch"]
- apiGroups: ["networking.k8s.io"]
  resources: ["networkpolicies","ingresses"]
  verbs: ["list","watch"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-collector
  labels:
    app: metrics-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: metrics-collector
subjects:
- kind: ServiceAccount
  name: metrics-collector
  namespace: monitoring
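The ClusterRole rules can be read as a simple matching table: a request is allowed when at least one rule lists its API group, its resource, and its verb. The following Python sketch models that idea on a shortened, hypothetical copy of the rules. It is a simplified illustration, not the real authorizer (it ignores resourceNames, non-resource URLs, and aggregation).

```python
# Shortened, hypothetical copy of the ClusterRole rules for illustration.
RULES = [
    {"apiGroups": [""], "resources": ["nodes", "pods", "services"], "verbs": ["list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["deployments", "statefulsets"], "verbs": ["list", "watch"]},
    {"apiGroups": ["authentication.k8s.io"], "resources": ["tokenreviews"], "verbs": ["create"]},
]


def _matches(allowed, wanted):
    """A rule field matches when it contains the value or the '*' wildcard."""
    return "*" in allowed or wanted in allowed


def is_allowed(rules, api_group, resource, verb):
    """Return True if any single rule covers group, resource, and verb together."""
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in rules
    )


print(is_allowed(RULES, "", "pods", "list"))      # covered by the first rule
print(is_allowed(RULES, "", "pods", "delete"))    # verb not granted
print(is_allowed(RULES, "apps", "pods", "list"))  # pods are not in the apps group
```

On a live cluster, the equivalent check is kubectl auth can-i list pods --as=system:serviceaccount:monitoring:metrics-collector.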
Deploying the Service and Deployment
apiVersion: v1
kind: Service
metadata:
  name: metrics-collector
  namespace: monitoring
  labels:
    k8s-app: metrics-collector
    app.kubernetes.io/name: metrics-collector
spec:
  type: ClusterIP
  ports:
  - name: http-metrics
    port: 8080
    targetPort: 8080
  - name: telemetry
    port: 8081
    targetPort: 8081
  selector:
    k8s-app: metrics-collector
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-collector
  namespace: monitoring
  labels:
    k8s-app: metrics-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: metrics-collector
  template:
    metadata:
      labels:
        k8s-app: metrics-collector
    spec:
      serviceAccountName: metrics-collector
      containers:
      - name: metrics-collector
        # The image lives under the kube-state-metrics/ repository path.
        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.2
        securityContext:
          runAsUser: 65534
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
Verifying the Deployment
kubectl -n monitoring get pods | grep metrics-collector
curl -s "$(kubectl -n monitoring get service metrics-collector -o jsonpath='{.spec.clusterIP}'):8080/metrics" | tail -20
Monitoring Cluster Components
Add the following scrape jobs to the scrape_configs section of your Prometheus configuration (prometheus-configmap.yaml in this guide) to monitor different cluster components:
Monitoring the API Server
When using HTTPS, you'll need TLS configuration. You can either specify a CA certificate path or set insecure_skip_verify: true to skip certificate verification. Additionally, specify the bearer_token_file to avoid authorization errors.
- job_name: kube-apiserver
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
    action: keep
    regex: default;kubernetes
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: replace
    target_label: endpoint
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: service
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
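The tls_config and bearer_token_file settings in the kube-apiserver job correspond to two separate things on the wire: how the server certificate is verified, and which Authorization header is sent. The Python sketch below mirrors both using only the standard library. The function names and the example token are hypothetical, and scrape() is defined but not called because it needs to run inside a pod with a mounted service account.

```python
import ssl
import urllib.request


def make_tls_context(ca_file=None, insecure_skip_verify=False):
    """Mirror tls_config: verify against a CA file, or skip verification."""
    if insecure_skip_verify:
        ctx = ssl.create_default_context()
        ctx.check_hostname = False       # must be disabled before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE  # equivalent of insecure_skip_verify: true
        return ctx
    return ssl.create_default_context(cafile=ca_file)


def build_request(url, token):
    """Mirror bearer_token_file: send the token as a Bearer credential."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def scrape(url, token_file, ca_file):
    """Fetch a metrics page the way the scrape job would (not called here)."""
    with open(token_file) as fh:
        token = fh.read().strip()
    ctx = make_tls_context(ca_file=ca_file)
    with urllib.request.urlopen(build_request(url, token), context=ctx, timeout=10) as resp:
        return resp.read().decode("utf-8")


# Offline demonstration with a placeholder token.
req = build_request("https://kubernetes.default.svc/metrics", "example-token")
print(req.get_header("Authorization"))
```

Without the Authorization header the API server typically answers with 401 or 403, which is the authorization error the bearer token avoids.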
Apply the configuration:
kubectl apply -f prometheus-configmap.yaml
Import Grafana dashboard template ID 15761 for visualization.
Monitoring the Controller Manager
First, check the controller-manager information:
kubectl describe pod -n kube-system kube-controller-manager-master1
The controller-manager typically binds to 127.0.0.1 by default. Modify the configuration on all master nodes by editing /etc/kubernetes/manifests/kube-controller-manager.yaml:
- command:
  - kube-controller-manager
  - --bind-address=0.0.0.0   # Change from 127.0.0.1
  #- --port=0                # Comment out this line
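Add the following configuration to Prometheus. The job targets the insecure metrics port 10252, which only exists on older Kubernetes releases; newer releases serve controller-manager metrics over HTTPS on port 10257 instead.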
- job_name: kube-controller-manager
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_component]
    regex: kube-controller-manager
    action: keep
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:10252
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: replace
    target_label: endpoint
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: service
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
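The rule that rewrites __address__ is the part of this job that decides where Prometheus actually connects. As a rough illustration, the Python sketch below emulates a single replace rule: source labels are joined with the separator, the regex must match the whole value, and capture groups are substituted into the replacement. The helper name and the pod IP are hypothetical, and Python's re module is only an approximation of the RE2 syntax Prometheus uses.

```python
import re


def relabel_replace(labels, source_labels, regex, target_label, replacement, separator=";"):
    """Emulate one 'replace' relabel rule on a dict of target labels."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # Prometheus anchors the regex at both ends
    if match is None:
        return dict(labels)  # no match: the target is left untouched
    # Translate $1 / ${1} references into Python's \g<1> syntax.
    template = re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# Hypothetical discovered pod target.
target = {"__meta_kubernetes_pod_ip": "10.0.0.5", "__address__": "10.0.0.5"}
rewritten = relabel_replace(
    target,
    source_labels=["__meta_kubernetes_pod_ip"],
    regex="(.+)",
    target_label="__address__",
    replacement="${1}:10252",
)
print(rewritten["__address__"])
```

Because (.+) requires at least one character, a target without a pod IP would keep its original __address__.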
Monitoring the Scheduler
Check the scheduler information:
kubectl describe pod -n kube-system kube-scheduler-master1
Similar to the controller-manager, modify the scheduler configuration on all master nodes by editing /etc/kubernetes/manifests/kube-scheduler.yaml:
- command:
  - kube-scheduler
  - --bind-address=0.0.0.0   # Change from 127.0.0.1
  #- --port=0                # Comment out this line
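Add the following configuration to Prometheus. The job targets the insecure metrics port 10251, which only exists on older Kubernetes releases; newer releases serve scheduler metrics over HTTPS on port 10259 instead.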
- job_name: kube-scheduler
  kubernetes_sd_configs:
  - role: pod
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_component]
    regex: kube-scheduler
    action: keep
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:10251
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: replace
    target_label: endpoint
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: service
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
Monitoring Kube-State-Metrics
- job_name: "kube-state-metrics"
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names: ["monitoring"]
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    regex: metrics-collector
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:8080
  - action: replace
    source_labels: [__meta_kubernetes_namespace]
    target_label: k8s_namespace
  - action: replace
    source_labels: [__meta_kubernetes_service_name]
    target_label: k8s_sname
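This job combines a keep rule, which selects the Service by its app.kubernetes.io/name label, with a labelmap rule, which copies every Service label onto the scraped series. The Python sketch below emulates both steps on a hypothetical discovered target; the helper names are illustrative, and labelmap is shown only for its default replacement of $1.

```python
import re


def keep(labels, source_labels, regex, separator=";"):
    """Emulate 'keep': the target survives only if the joined value matches fully."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, value) is not None


def labelmap(labels, regex):
    """Emulate 'labelmap': copy matching labels under the captured name."""
    mapped = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            mapped[match.group(1)] = value
    return mapped


# Hypothetical target; Prometheus has already sanitized the label
# app.kubernetes.io/name into app_kubernetes_io_name.
target = {
    "__meta_kubernetes_namespace": "monitoring",
    "__meta_kubernetes_service_label_app_kubernetes_io_name": "metrics-collector",
    "__meta_kubernetes_service_label_k8s_app": "metrics-collector",
}

kept = keep(target, ["__meta_kubernetes_service_label_app_kubernetes_io_name"], "metrics-collector")
mapped = labelmap(target, "__meta_kubernetes_service_label_(.+)")
print(kept)
print(sorted(k for k in mapped if not k.startswith("__")))
```

Labels that still start with a double underscore are dropped by Prometheus after relabeling, so only the copied names end up on the stored series.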
Monitoring CoreDNS
- job_name: coredns
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels:
    - __meta_kubernetes_service_label_k8s_app
    regex: kube-dns
    action: keep
  - source_labels: [__meta_kubernetes_pod_ip]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:9153
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: replace
    target_label: endpoint
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: service
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
Monitoring etcd
Check the etcd pod details:
kubectl describe pod -n kube-system etcd-master1
Recent etcd versions expose metrics on port 2381 via HTTP. For older versions requiring certificate-based monitoring:
Create a secret with etcd certificates:
kubectl create secret generic etcd-certs \
  --from-file=/etc/kubernetes/pki/etcd/ca.crt \
  --from-file=/etc/kubernetes/pki/etcd/server.crt \
  --from-file=/etc/kubernetes/pki/etcd/server.key \
  -n monitoring
Create Service and Endpoints resources:
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd
    app.kubernetes.io/name: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 10.10.10.11
  - ip: 10.10.10.12
  - ip: 10.10.10.13
  ports:
  - port: 2379
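Because the Service has no selector, the Endpoints object must list the etcd member IPs by hand. If the member list changes often, the manifest could be generated instead. The sketch below is one possible way to do that in Python, assuming the same object name, namespace, label, and port as above; the function name is hypothetical. It prints JSON, which kubectl apply accepts as well as YAML.

```python
import ipaddress
import json


def etcd_endpoints(ips, port=2379, name="etcd-k8s", namespace="monitoring"):
    """Build an Endpoints manifest for the given etcd member IPs."""
    for ip in ips:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"k8s-app": "etcd"},
        },
        "subsets": [
            {
                "addresses": [{"ip": ip} for ip in ips],
                "ports": [{"port": port}],
            }
        ],
    }


manifest = etcd_endpoints(["10.10.10.11", "10.10.10.12", "10.10.10.13"])
print(json.dumps(manifest, indent=2))
```

The output can be saved to a file and applied with kubectl apply -f, in the same way as the hand-written manifest.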
Add the following configuration to Prometheus:
- job_name: "kubernetes-etcd"
  scheme: https
  tls_config:
    ca_file: /certs/ca.crt
    cert_file: /certs/server.crt
    key_file: /certs/server.key
    insecure_skip_verify: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names: ["monitoring"]
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    regex: etcd
Mount the certificates in your Prometheus deployment:
        # Under the Prometheus container:
        volumeMounts:
        - name: certs
          readOnly: true
          mountPath: /certs
      # Under the pod spec:
      volumes:
      - name: certs
        secret:
          secretName: etcd-certs
Apply the changes:
kubectl apply -f prometheus-deployment.yaml
Monitoring Kubelet
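Kubelet runs on every node and serves metrics over HTTPS on port 10250. Add the following configuration to Prometheus; it scrapes the container metrics at /metrics/cadvisor (the kubelet's own metrics are served at /metrics):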
- job_name: kubelet
  metrics_path: /metrics/cadvisor
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: replace
    target_label: endpoint
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
Apply the configuration:
kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus-deployment.yaml
Monitoring Node Metrics
Node Exporter collects system-level metrics from each node. Deploy it as a DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    name: node-exporter
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.5.0
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        # No inner double quotes: they would become part of the regex.
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
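The mount-point exclusion argument is a regular expression, so it is worth checking which paths it actually covers. The Python sketch below tests the pattern against a few hypothetical mount points. It assumes the exporter applies the pattern as an unanchored search, and it uses Python's re module as a stand-in for Go's regexp package, which behaves the same for a pattern this simple.

```python
import re

# The exclusion pattern from the DaemonSet args, without surrounding quotes.
EXCLUDE = r"^/(sys|proc|dev|host|etc)($|/)"


def is_excluded(mount_point, pattern=EXCLUDE):
    """Return True if the filesystem collector would skip this mount point."""
    return re.search(pattern, mount_point) is not None


# Hypothetical mount points for illustration.
for mount_point in ["/", "/proc", "/sys/fs/cgroup", "/etc/hosts", "/data", "/devices"]:
    print(mount_point, is_excluded(mount_point))

# If literal double quotes end up inside the argument, the pattern starts
# with a quote character followed by ^, which can never match.
QUOTED = '"' + EXCLUDE + '"'
print(is_excluded("/proc", QUOTED))
```

The ($|/) group is what keeps /devices from being excluded along with /dev: the prefix must be followed by the end of the path or a slash.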
Apply the DaemonSet:
kubectl apply -f node-exporter.yaml
Add the following configuration to Prometheus:
- job_name: "kubernetes-nodes"
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__
    action: replace
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_endpoints_name]
    action: replace
    target_label: endpoint
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: pod
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: namespace
Apply the configuration:
kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus-deployment.yaml
Container Monitoring with cAdvisor
cAdvisor (Container Advisor) provides container resource usage and performance metrics. It's built into the Kubelet binary, so no separate deployment is needed.
Key features of cAdvisor:
- Collects, aggregates, processes, and exports information about running containers
- Native support for Docker containers with compatibility for other container runtimes
- Automatically integrated with Kubelet in Kubernetes deployments
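Many cAdvisor metrics, such as container_cpu_usage_seconds_total, are cumulative counters, so usage is derived from the difference between two samples rather than read directly. The worked example below shows that arithmetic in Python on made-up samples. It is a simplified stand-in for what a PromQL rate() query computes and, unlike rate(), it does not handle counter resets.

```python
def cpu_cores_used(samples):
    """Average CPU cores used between the first and last sample.

    samples: list of (timestamp_seconds, cumulative_cpu_seconds), ordered by time.
    """
    (t_start, v_start), (t_end, v_end) = samples[0], samples[-1]
    if t_end <= t_start:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v_end - v_start) / (t_end - t_start)


# Hypothetical scrapes taken 30 seconds apart for one container.
SAMPLES = [(0.0, 120.0), (30.0, 127.5), (60.0, 135.0)]

# 15 CPU-seconds consumed over 60 wall-clock seconds.
print(cpu_cores_used(SAMPLES))
```

A result of 0.25 means the container used a quarter of one core on average over the window.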
Add the following configuration to Prometheus to collect cAdvisor metrics:
- job_name: 'k8s-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
  metric_relabel_configs:
  - source_labels: [instance]
    separator: ;
    regex: (.+)
    target_label: node
    replacement: $1
    action: replace
Reload Prometheus to apply the configuration changes (the reload endpoint is only available when Prometheus runs with --web.enable-lifecycle):
curl -XPOST http://prometheus.monitoring.svc.cluster.local/-/reload