Kubernetes Cluster Troubleshooting: Diagnostic Workflow and Core Techniques

Diagnostic Workflow: The Three Core Techniques
----------------------------------------------

When troubleshooting a Kubernetes cluster (unresponsive nodes, crashing Pods, network failures), it's essential to move beyond surface-level symptoms. A structured diagnostic approach improves both resolution speed and accuracy. Three techniques form the foundation of effective Kubernetes debugging:

1. Inspect logs
2. Examine resource details and events
3. Review live resource configurations (YAML/JSON)

If these steps don't yield clarity, advanced tools like `kubectl-debug` can provide deeper introspection. Once the root cause is identified, targeted remediation becomes straightforward.

### 1. Inspecting Logs

Logs are often the fastest path to understanding why a component or application is failing.

#### System Service Logs via `journalctl`

On systems using systemd (most modern Linux distributions), service logs for components like Docker or kubelet can be accessed with:

```
journalctl -u docker
journalctl -u kubelet -f   # follow kubelet logs in real time
```
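
A unit's journal can be long, so it often helps to limit the time window and keep only error lines. The sketch below is one way to do that; the helper names `unit_logs` and `klog_errors` are hypothetical, and the filter assumes the component writes klog-style text lines (the kubelet's default format), where errors start with `E` and fatals with `F`.

```shell
# Print journal entries for a systemd unit within a recent time window.
# Usage: unit_logs UNIT [SINCE]   e.g. unit_logs kubelet "30 min ago"
unit_logs() {
  journalctl -u "$1" --since "${2:-1 hour ago}" --no-pager
}

# Keep only klog-style error/fatal lines, e.g. "...: E0528 20:09:59.1 ...".
# Assumption: the component logs in the klog text format.
klog_errors() {
  grep -E ': [EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
}

# Example: unit_logs kubelet "30 min ago" | klog_errors
```

Filtering on the message prefix rather than on journal priority is deliberate here: output captured from a service's stderr is usually stored at a single default priority, so a priority filter may hide the very lines being searched for.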


#### Container Logs via `kubectl logs`

Applications and control plane components (e.g., `kube-apiserver`, `coredns`) typically run in Pods. Their logs are retrieved using:

```
kubectl logs [-f] [-p] (POD | TYPE/NAME) [-c CONTAINER] [options]
```

Key options include:

- `-f`: Stream logs continuously
- `-p`: Show logs from the previous instance of a container (useful after a restart)
- `--since=24h`: Limit output to logs newer than the given duration
- `-l app=mssql`: Select Pods by label
- `-n kube-system`: Specify the namespace (required for system Pods, which do not run in `default`)

Examples:

```
kubectl logs my-app-pod-7d5b8c9f4-xk2v1
kubectl logs -l app=database --since=2h
kubectl logs etcd-control-plane -n kube-system --timestamps
```
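
For a Pod that keeps restarting, the previous container instance's output is usually the interesting part. A minimal sketch that prints both, assuming a single-container Pod (add `-c CONTAINER` otherwise); the function name `crash_logs` and its defaults are illustrative only.

```shell
# Print the tail of the current and the previous container instance's logs.
# Usage: crash_logs POD [NAMESPACE] [LINES]
crash_logs() {
  pod="$1"
  ns="${2:-default}"
  lines="${3:-50}"
  echo "== current =="
  kubectl logs "$pod" -n "$ns" --tail="$lines" --timestamps
  echo "== previous =="
  # --previous fails if the container has never restarted; report and move on.
  kubectl logs "$pod" -n "$ns" --tail="$lines" --timestamps --previous \
    || echo "(no previous instance found)"
}

# Example: crash_logs my-app-pod-7d5b8c9f4-xk2v1 default 100
```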


### 2. Examining Resource Details and Events

The `kubectl describe` command reveals resource state, scheduling decisions, and recorded events, which are often critical for diagnosing issues like unschedulable Pods or `NotReady` nodes.

Syntax:

```
kubectl describe (TYPE [NAME] | TYPE/NAME)
```

Common uses:

```
# Inspect a problematic node
kubectl describe node worker-03

# Diagnose a stuck Pod
kubectl describe pod frontend-deployment-6b78c4d5f9-2xklm

# Describe every Pod in all namespaces (verbose; each entry ends with that Pod's events)
kubectl describe pods --all-namespaces
```
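
`kubectl describe` prints events only for the object being described. When the goal is the events themselves, they can be listed directly with `kubectl get events`. The following is a small sketch; the helper names `object_events` and `warning_events` are hypothetical wrappers, not kubectl features.

```shell
# List events for a single object, ordered by last occurrence.
# Usage: object_events NAME [NAMESPACE]
object_events() {
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# List only Warning events across all namespaces.
warning_events() {
  kubectl get events --all-namespaces --field-selector type=Warning
}

# Example: object_events frontend-deployment-6b78c4d5f9-2xklm
```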


Events shown at the bottom of `kubectl describe` output frequently indicate causes such as insufficient CPU/memory, image pull errors, or volume mount failures.

### 3. Reviewing Live Resource Configuration

Misconfigurations are a common source of failure. Use `kubectl get` with output formatting to inspect the configuration actually applied in the cluster:

```
# View Pod spec in YAML
kubectl get pod my-pod -o yaml

# Export Deployment manifest
kubectl get deploy nginx -o yaml > nginx-deploy.yaml

# Inspect Service definition
kubectl get svc api-gateway -o json
```
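
Full YAML output can be noisy when only one field is in question. As a sketch, a JSONPath expression can pull out specific fields, and `kubectl diff` can compare a local manifest with the live object. The helper names `pod_images` and `manifest_drift` are hypothetical.

```shell
# Print each container's name and image for a Pod, one per line.
# Usage: pod_images POD [NAMESPACE]
pod_images() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}

# Compare a local manifest with the live object.
# Exit status 0: no differences; 1: differences found; >1: error.
# Usage: manifest_drift FILE
manifest_drift() {
  kubectl diff -f "$1"
}

# Example: manifest_drift nginx-deploy.yaml
```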

Inspecting the live object reveals discrepancies between intended and actual configurations, such as incorrect environment variables, missing volumes, or flawed probes.

Advanced Debugging: Interactive Container Inspection
----------------------------------------------------

When logs and manifests aren’t sufficient, direct inspection of the runtime environment may be necessary.

### Using `kubectl exec`

Execute commands inside a running container to inspect files, network state, or processes:

```
# Check DNS configuration
kubectl exec my-pod -- cat /etc/resolv.conf

# Launch an interactive shell
kubectl exec -it my-pod -- sh

# Test in-cluster DNS resolution
kubectl exec my-pod -- nslookup kubernetes.default
```
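
In multi-container Pods, or Pods outside the current namespace, `kubectl exec` also needs `-c` and `-n`. A minimal sketch that bundles two read-only checks; `pod_netcheck` is a hypothetical helper, and it assumes the image ships `cat`.

```shell
# Print a container's resolver and hosts configuration.
# Usage: pod_netcheck POD [NAMESPACE] [CONTAINER]
pod_netcheck() {
  pod="$1"
  ns="${2:-default}"
  ctr="$3"
  # Build the shared argument list; pass -c only when a container was named.
  set -- exec "$pod" -n "$ns"
  if [ -n "$ctr" ]; then
    set -- "$@" -c "$ctr"
  fi
  kubectl "$@" -- cat /etc/resolv.conf
  kubectl "$@" -- cat /etc/hosts
}

# Example: pod_netcheck my-pod default app
```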

### Using `kubectl-debug`

For cases where the target container lacks debugging tools (e.g., distroless images) or is crash-looping, `kubectl-debug` runs a new debug container that joins the same namespaces (PID, network, IPC) as the target.

Installation (Linux):

```
# Download and install the plugin binary.
# Release asset names may include a version number; check the releases page if this URL fails.
curl -Lo kubectl-debug.tar.gz https://github.com/aylei/kubectl-debug/releases/latest/download/kubectl-debug_linux_amd64.tar.gz
tar -xzf kubectl-debug.tar.gz
sudo mv kubectl-debug /usr/local/bin/

# Deploy the debug agent DaemonSet to the cluster
kubectl apply -f https://raw.githubusercontent.com/aylei/kubectl-debug/master/scripts/agent_daemonset.yml
```

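Usage (invoke the plugin binary directly as `kubectl-debug`; on recent kubectl versions, `kubectl debug` resolves to the built-in subcommand, which does not accept these flags):

```
# Debug a running Pod
kubectl-debug my-pod

# Debug a crash-looping Pod by forking a copy of it
kubectl-debug my-pod --fork

# Use port-forward if the node isn't directly reachable
kubectl-debug my-pod --port-forward --daemonset-ns=kube-system --daemonset-name=debug-agent
```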
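
Recent kubectl releases also include a built-in `kubectl debug` subcommand based on ephemeral containers. It is a separate mechanism from the plugin, but it covers the same two cases. The sketch below assumes a cluster version with ephemeral containers enabled; the helper names are hypothetical, and the default image `nicolaka/netshoot` is only an example of a tool-rich debug image.

```shell
# Attach an ephemeral debug container that shares the target container's
# process namespace.
# Usage: debug_ephemeral POD TARGET_CONTAINER [IMAGE] [NAMESPACE]
debug_ephemeral() {
  kubectl debug -it "$1" -n "${4:-default}" \
    --image="${3:-nicolaka/netshoot}" --target="$2"
}

# Copy a (possibly crash-looping) Pod and add a debug container to the copy.
# The copy is named POD-debug and must be deleted manually afterwards.
# Usage: debug_copy POD [IMAGE] [NAMESPACE]
debug_copy() {
  kubectl debug "$1" -it -n "${3:-default}" \
    --image="${2:-nicolaka/netshoot}" --share-processes --copy-to="$1-debug"
}

# Example: debug_ephemeral my-pod app
```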


Once inside, standard tools like `tcpdump`, `netstat`, `strace`, and `curl` become available without modifying the original container.

Targeted Remediation Based on Diagnosis
---------------------------------------

Effective fixes depend on accurate diagnosis:

- **Pod stuck in `Pending`**: Usually caused by insufficient resources or taint/toleration mismatches. Solutions: scale the cluster, adjust requests/limits, or modify node selectors and tolerations.
- **Pod in `Waiting` with `ImagePullBackOff`**: Indicates image fetch failure. Verify registry access, use private registry credentials, or switch to a mirror (e.g., replace `k8s.gcr.io` with a local proxy).
- **Pod in `CrashLoopBackOff`**: Application crashes on startup. Check application logs, validate startup scripts, and review liveness/readiness probe thresholds. Temporarily disabling probes can help isolate the issue.
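
The mapping from status to first diagnostic step can be sketched as a small script. It reads the output of `kubectl get pods --no-headers` and prints a suggested command for each Pod that is not healthy. The function names are hypothetical, the suggestions are starting points rather than fixes, and namespaces are not handled.

```shell
# Map a Pod status (the STATUS column) to a suggested next command.
# Usage: triage_hint POD STATUS
triage_hint() {
  case "$2" in
    Pending)
      echo "$1: $2 -> kubectl describe pod $1 (scheduling events)" ;;
    ImagePullBackOff|ErrImagePull)
      echo "$1: $2 -> kubectl describe pod $1 (image pull events)" ;;
    CrashLoopBackOff)
      echo "$1: $2 -> kubectl logs $1 --previous" ;;
    *)
      echo "$1: $2 -> kubectl get pod $1 -o yaml" ;;
  esac
}

# Read `kubectl get pods --no-headers` output on stdin; print one hint per
# Pod whose status is neither Running nor Completed.
triage_pods() {
  while read -r name _ready status _rest; do
    [ -n "$name" ] || continue            # skip blank lines
    case "$status" in
      Running|Completed) ;;               # treated as healthy here
      *) triage_hint "$name" "$status" ;;
    esac
  done
}

# Example: kubectl get pods --no-headers | triage_pods
```

A `Running` Pod can still be failing its readiness probe, so this filter is intentionally coarse; the READY column would need a separate check.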

Tags: kubernetes troubleshooting kubectl debugging cluster-management

Posted on Thu, 28 May 2026 20:09:59 +0000 by upnxwood16