Diagnostic Workflow: The Three Core Techniques
----------------------------------------------

When troubleshooting issues in a Kubernetes cluster, such as unresponsive nodes, crashing Pods, or network failures, it's essential to move beyond surface-level symptoms. A structured diagnostic approach significantly improves resolution speed and accuracy. The following three techniques form the foundation of effective Kubernetes debugging:

1. Inspect logs
2. Examine resource details and events
3. Review live resource configurations (YAML/JSON)

If these steps don't yield clarity, advanced tools like `kubectl-debug` can provide deeper introspection. Once the root cause is identified, targeted remediation becomes straightforward.

### 1. Inspecting Logs

Logs are often the fastest path to understanding why a component or application is failing.

#### System Service Logs via `journalctl`

On systems using systemd (most modern Linux distributions), service logs for components like Docker or kubelet can be accessed with:

```
journalctl -u docker
journalctl -u kubelet -f   # follow kubelet logs in real time
```

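
The full journal for a unit can be long on a busy node. As a minimal sketch, the wrapper below narrows the output to recent error-level entries for one unit. The function name `unit_errors` and its default time window are hypothetical; `--since`, `-p`, and `--no-pager` are standard `journalctl` options.

```shell
# Hypothetical helper: show recent error-level journal entries for a systemd unit.
# Usage: unit_errors UNIT [SINCE]
unit_errors() {
    unit="$1"
    since="${2:-1 hour ago}"   # assumed default window; adjust as needed
    # -p err      : only entries of priority "err" or more severe
    # --no-pager  : print directly, suitable for scripts and piping
    journalctl -u "$unit" --since "$since" -p err --no-pager
}
```

For example, `unit_errors kubelet "30 min ago"` would limit the kubelet journal to errors from the last half hour.
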
#### Container Logs via `kubectl logs`
Applications and control plane components (e.g., `kube-apiserver`, `coredns`) run in Pods. Their logs are retrieved using:

```
kubectl logs [-f] [-p] (POD | TYPE/NAME) [-c CONTAINER] [options]
```

Key options include:

- `-f`: Stream logs continuously
- `-p`: Show logs from the previous (terminated) instance of a container
- `--since=24h`: Limit output to logs newer than the given duration
- `-l app=mssql`: Select Pods by label selector
- `-n kube-system`: Specify the namespace (critical for system Pods)

Examples:

```
kubectl logs my-app-pod-7d5b8c9f4-xk2v1
kubectl logs -l app=database --since=2h
kubectl logs etcd-control-plane -n kube-system --timestamps
```

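
For a Pod that keeps restarting, the logs of the previous container instance usually hold the actual error. The following is a minimal sketch, assuming a hypothetical helper named `crash_logs`: it tries `--previous` first and falls back to the current instance when no earlier one exists. For multi-container Pods, a `-c CONTAINER` argument would need to be added.

```shell
# Hypothetical helper: print the tail of a Pod's logs, preferring the
# previous (crashed) container instance when one exists.
# Usage: crash_logs POD [NAMESPACE] [LINES]
crash_logs() {
    pod="$1"
    ns="${2:-default}"
    lines="${3:-50}"
    # --previous fails if the container has never restarted, so fall back
    # to the logs of the current instance in that case.
    kubectl logs "$pod" -n "$ns" --previous --tail="$lines" 2>/dev/null ||
        kubectl logs "$pod" -n "$ns" --tail="$lines"
}
```
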
### 2. Examining Resource Details and Events
The `kubectl describe` command reveals resource state, scheduling decisions, and recorded events, which are often critical for diagnosing issues like unschedulable Pods or `NotReady` nodes.

Syntax:

```
kubectl describe (TYPE [NAME] | TYPE/NAME)
```

Common uses:

```
# Inspect a problematic node
kubectl describe node worker-03
# Diagnose a stuck Pod
kubectl describe pod frontend-deployment-6b78c4d5f9-2xklm
# Describe every Pod in all namespaces, including each Pod's events
kubectl describe pods --all-namespaces
```

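
`kubectl describe` shows events only alongside the resource being described. To list the events for a single object in order of creation, a sketch like the following may help. The helper name `pod_events` is hypothetical, while `--field-selector` and `--sort-by` are standard `kubectl get` options.

```shell
# Hypothetical helper: list events for a single Pod, sorted by creation time.
# Usage: pod_events POD [NAMESPACE]
pod_events() {
    pod="$1"
    ns="${2:-default}"
    kubectl get events -n "$ns" \
        --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
        --sort-by=.metadata.creationTimestamp
}
```
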
Events shown at the bottom of the output frequently indicate causes such as insufficient CPU/memory, image pull errors, or volume mount failures.

### 3. Reviewing Live Resource Configuration

Misconfigurations are a common source of failure. Use `kubectl get` with output formatting to inspect the actual applied configuration:

```
# View Pod spec in YAML
kubectl get pod my-pod -o yaml
# Export Deployment manifest
kubectl get deploy nginx -o yaml > nginx-deploy.yaml
# Inspect Service definition
kubectl get svc api-gateway -o json
```

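
Full YAML or JSON output can be long. When only one field is in question, a JSONPath query narrows the output to just that field. The following is a minimal sketch with a hypothetical helper name, `pod_images`:

```shell
# Hypothetical helper: print the container images a live Pod is configured
# with, one per line, for comparison against the intended manifest.
# Usage: pod_images POD [NAMESPACE]
pod_images() {
    pod="$1"
    ns="${2:-default}"
    kubectl get pod "$pod" -n "$ns" \
        -o jsonpath='{range .spec.containers[*]}{.image}{"\n"}{end}'
}
```
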
Inspecting the live object this way reveals discrepancies between intended and actual configurations, such as incorrect environment variables, missing volumes, or flawed probes.

Advanced Debugging: Interactive Container Inspection
----------------------------------------------------

When logs and manifests aren't sufficient, direct inspection of the runtime environment may be necessary.

### Using `kubectl exec`

Execute commands inside a running container to inspect files, network state, or processes:

```
# Check DNS configuration
kubectl exec my-pod -- cat /etc/resolv.conf
# Launch an interactive shell
kubectl exec -it my-pod -- sh
# Test connectivity
kubectl exec my-pod -- nslookup kubernetes.default
```

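
Checks like these can be wrapped for repeated use. As a sketch, the hypothetical helper `pod_nameservers` reads `/etc/resolv.conf` from a container and prints only the nameserver addresses. With the default DNS policy, this is expected to be the cluster DNS Service IP, though that depends on the cluster's configuration.

```shell
# Hypothetical helper: print the DNS nameservers configured inside a Pod.
# Usage: pod_nameservers POD [NAMESPACE]
pod_nameservers() {
    pod="$1"
    ns="${2:-default}"
    # Keep only the address column of "nameserver <address>" lines.
    kubectl exec "$pod" -n "$ns" -- cat /etc/resolv.conf |
        awk '$1 == "nameserver" { print $2 }'
}
```
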
### Using `kubectl-debug`
For cases where the target container lacks debugging tools (e.g., distroless images) or is crash-looping, `kubectl-debug` starts a separate debug container that joins the same namespaces (PID, network, IPC) as the target.

Installation (Linux):

```
# Download and install the client binary
curl -Lo kubectl-debug.tar.gz https://github.com/aylei/kubectl-debug/releases/latest/download/kubectl-debug_linux_amd64.tar.gz
tar -xzf kubectl-debug.tar.gz
sudo mv kubectl-debug /usr/local/bin/
# Deploy the debug agent DaemonSet to the cluster
kubectl apply -f https://raw.githubusercontent.com/aylei/kubectl-debug/master/scripts/agent_daemonset.yml
```

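Usage (recent kubectl versions ship a built-in `kubectl debug` command, which takes precedence over a plugin of the same name; in that case, invoke the binary directly as `kubectl-debug`):

```
# Debug a running Pod
kubectl debug my-pod
# Debug a crash-looping Pod by forking
kubectl debug my-pod --fork
# Use port-forward if node isn't directly reachable
kubectl debug my-pod --port-forward --daemonset-ns=kube-system --daemonset-name=debug-agent
```
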
Once inside, standard tools like `tcpdump`, `netstat`, `strace`, and `curl` become available without modifying the original container.

Targeted Remediation Based on Diagnosis
---------------------------------------
Effective fixes depend on accurate diagnosis:

- **Pod stuck in `Pending`**: Typically caused by insufficient resources or taint/toleration mismatches. Solutions: scale the cluster, adjust requests/limits, or fix node selectors and tolerations.
- **Pod in `Waiting` with `ImagePullBackOff`**: Indicates image fetch failure. Verify registry access, use private registry credentials, or switch to a mirror (e.g., replace `k8s.gcr.io` with a local proxy).
- **Pod in `CrashLoopBackOff`**: Application crashes on startup. Check application logs, validate startup scripts, and review liveness/readiness probe thresholds. Temporarily disabling probes can help isolate the issue.
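
The mapping from Pod status to a first diagnostic step can be captured in a small lookup. The sketch below is illustrative only: the function name `next_step` is hypothetical, and the suggested commands simply restate the techniques described in this section.

```shell
# Hypothetical helper: suggest a first diagnostic command for a Pod status.
# Usage: next_step STATUS POD
next_step() {
    status="$1"
    pod="$2"
    case "$status" in
        Pending)
            echo "kubectl describe pod $pod   # check scheduling events" ;;
        ImagePullBackOff|ErrImagePull)
            echo "kubectl describe pod $pod   # check image name and pull errors" ;;
        CrashLoopBackOff)
            echo "kubectl logs $pod --previous   # read the crashed instance's logs" ;;
        *)
            echo "kubectl get pod $pod -o yaml   # inspect the live configuration" ;;
    esac
}
```

Calling `next_step CrashLoopBackOff my-pod` prints the `kubectl logs` suggestion. The status string could be taken from the STATUS column of `kubectl get pod my-pod`.
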