Resolving Network and Runtime Constraints for Kubernetes and KubeSphere on Cloud VMs

Public cloud virtual machines typically route traffic through NAT gateways, which disrupts Kubernetes control plane and etcd peer communication that expects direct layer-3 connectivity. Deploying a cluster on low-spec instances requires explicit runtime configuration, network address translation rules, and careful component selection.

Container Runtime Endpoint and Registry Configuration

Image pull failures during cluster bootstrap often stem from crictl attempting to connect to the deprecated dockershim socket. Configure the container runtime interface client to target the active containerd socket explicitly.

# /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false

If containerd fails to manage cgroups correctly, enable the systemd cgroup driver in its daemon configuration:

# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

Then restart the service to apply the change:

systemctl restart containerd

Registry timeouts during image synchronization can be mitigated by configuring mirror endpoints in the containerd hosts directory or Docker daemon configuration, depending on the chosen runtime.
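For containerd 1.6 and later, a mirror is declared per upstream registry in the hosts directory. The sketch below is illustrative: mirror.example.com is a placeholder for whatever mirror endpoint is actually available, and it assumes config_path has been set in config.toml.

```toml
# /etc/containerd/certs.d/docker.io/hosts.toml
# Requires the following in /etc/containerd/config.toml:
#   [plugins."io.containerd.grpc.v1.cri".registry]
#     config_path = "/etc/containerd/certs.d"
server = "https://registry-1.docker.io"

# Hypothetical mirror endpoint; replace with a reachable mirror.
[host."https://mirror.example.com"]
  capabilities = ["pull", "resolve"]
```

containerd tries hosts in order and falls back to the server entry, so a slow or unavailable mirror degrades gracefully rather than failing the pull.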

Etcd Binding and Cloud NAT Routing

Etcd clusters fail to form when nodes advertise public IP addresses that are not locally bound. Cloud providers map public IPs via 1:1 NAT, causing cluster ID mismatch or connection refused errors. Override the default etcd static pod manifest to bind strictly to loopback and private interfaces.

Edit /etc/kubernetes/manifests/etcd.yaml on the control plane node:

containers:
- command:
  - etcd
  - --listen-client-urls=https://127.0.0.1:2379,https://10.0.1.10:2379
  - --advertise-client-urls=https://10.0.1.10:2379
  - --listen-peer-urls=https://127.0.0.1:2380,https://10.0.1.10:2380
  - --initial-advertise-peer-urls=https://10.0.1.10:2380

To ensure cross-node pod traffic routes correctly through the cloud NAT, implement destination NAT rules and override Flannel's external IP annotation:

# Route internal cluster traffic destined for other nodes through their public IPs
iptables -t nat -A OUTPUT -d 10.0.1.15 -j DNAT --to-destination 203.0.113.15
iptables -t nat -A OUTPUT -d 10.0.1.16 -j DNAT --to-destination 203.0.113.16

# Update Flannel annotations to use public IPs for VXLAN routing
kubectl annotate node worker-01 flannel.alpha.coreos.com/public-ip-overwrite=203.0.113.15 --overwrite
kubectl annotate node worker-02 flannel.alpha.coreos.com/public-ip-overwrite=203.0.113.16 --overwrite
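The DNAT rules above do not survive a reboot. One way to persist them, assuming the rules have been saved to a file with iptables-save, is a oneshot systemd unit; the unit name and file path below are illustrative:

```ini
# /etc/systemd/system/k8s-nat-rules.service (illustrative)
[Unit]
Description=Restore Kubernetes NAT routing rules
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/iptables-restore /etc/iptables/k8s-nat.rules

[Install]
WantedBy=multi-user.target
```

Save the rules once with iptables-save > /etc/iptables/k8s-nat.rules and enable the unit. On Debian-based systems the iptables-persistent package achieves the same result.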

Cluster Bootstrap with kubeadm

Generate a baseline configuration and adjust it for cloud environments and resource constraints. Single-core instances require bypassing the CPU preflight check.

# kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 203.0.113.10
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  name: cp-node-01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.4
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
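The systemd cgroup driver enabled for containerd earlier must match the kubelet's driver. A KubeletConfiguration document can be appended to kubeadm-init.yaml to make this explicit; recent kubeadm releases default cgroupDriver to systemd, so this is a safeguard rather than a required change:

```yaml
# Appended to kubeadm-init.yaml as an additional YAML document
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```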

If initialization halts at the etcd phase due to stale state, terminate hanging processes, reset the runtime state, and resume initialization while skipping already-completed phases:

systemctl stop kubelet
pkill -9 -f "kube|etcd" || true

systemctl start kubelet
kubeadm init --config kubeadm-init.yaml --skip-phases=preflight,certs,kubeconfig,kubelet-start,control-plane,etcd --ignore-preflight-errors=NumCPU

Configure the local client context and deploy the overlay network:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
kubectl taint nodes --all node-role.kubernetes.io/control-plane-

Dynamic Storage and Cluster Metrics

KubeSphere and stateful workloads require a default StorageClass. An NFS-backed dynamic provisioner provides a lightweight solution for development clusters.

# nfs-provisioner-setup.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-provisioner-role
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes", "persistentvolumeclaims", "events", "endpoints", "services"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nfs-provisioner-binding
subjects:
  - kind: ServiceAccount
    name: nfs-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: nfs-provisioner-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-dynamic-provisioner
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-provisioner
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      serviceAccountName: nfs-provisioner
      containers:
        - name: provisioner
          image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          env:
            - name: PROVISIONER_NAME
              value: nfs-storage-provisioner
            - name: NFS_SERVER
              value: 10.0.1.50
            - name: NFS_PATH
              value: /exports/k8s-data
          volumeMounts:
            - name: nfs-volume
              mountPath: /persistentvolumes
      volumes:
        - name: nfs-volume
          nfs:
            server: 10.0.1.50
            path: /exports/k8s-data
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-default
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs-storage-provisioner
parameters:
  archiveOnDelete: "false"
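A minimal claim can confirm that the default class actually provisions volumes; the claim name below is arbitrary:

```yaml
# pvc-smoke-test.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-smoke-test
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```

After applying, kubectl get pvc nfs-smoke-test should report Bound within a few seconds if the provisioner is running and the NFS export is reachable; delete the claim afterward to release the volume.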

Enable resource monitoring by deploying the Metrics Server with TLS verification disabled for internal kubelet communication. This is acceptable for development clusters whose kubelet serving certificates are not signed by the cluster CA; production clusters should provision proper certificates instead:

# metrics-server-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        args:
          - --cert-dir=/tmp
          - --secure-port=4443
          - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
          - --kubelet-use-node-status-port
          - --kubelet-insecure-tls
          - --metric-resolution=15s

Apply the configuration and verify node/pod resource reporting via kubectl top nodes.

KubeSphere Installation and Worker Node CLI Access

When deploying KubeSphere on constrained hardware, avoid enabling all pluggable components simultaneously. Modify the ClusterConfiguration manifest to activate only core modules, then incrementally enable logging, DevOps, or service mesh features as resource capacity allows.
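As an illustration, the relevant fields of the ks-installer ClusterConfiguration look like the fragment below. Exact field names vary between KubeSphere releases, so verify against the manifest shipped with your version before applying:

```yaml
# Fragment of the ks-installer ClusterConfiguration (fields illustrative)
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
spec:
  devops:
    enabled: false       # enable later once capacity allows
  logging:
    enabled: false
  servicemesh:
    enabled: false
```

Re-applying the manifest with a component flipped to true triggers the installer to reconcile, so features can be enabled one at a time while watching node resource headroom.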

Worker nodes lack default API server credentials. Executing kubectl on compute nodes returns a localhost:8080 connection refused error because the client attempts to reach an unconfigured local endpoint. Distribute the administrator configuration from the control plane; note that admin.conf grants cluster-admin privileges, so for routine access a more narrowly scoped kubeconfig is preferable:

# On control plane
scp /etc/kubernetes/admin.conf worker-node-01:/etc/kubernetes/admin.conf

# On worker node
export KUBECONFIG=/etc/kubernetes/admin.conf
echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> ~/.bashrc
source ~/.bashrc

Verify connectivity by querying cluster resources from the compute node. kubectl now sends requests to the API server endpoint defined in the kubeconfig rather than the default loopback address.

Tags: kubernetes KubeSphere cloud-infrastructure containerd kubeadm

Posted on Fri, 15 May 2026 20:00:01 +0000 by coffejor