Public cloud virtual machines typically route traffic through NAT gateways, which disrupts Kubernetes control plane and etcd peer communication that expects direct layer-3 connectivity. Deploying a cluster on low-spec instances requires explicit runtime configuration, network address translation rules, and careful component selection.
Container Runtime Endpoint and Registry Configuration
Image pull failures during cluster bootstrap often stem from crictl attempting to connect to deprecated Docker shim sockets. Configure the container runtime interface to target the active containerd socket explicitly.
# /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
If containerd fails to manage cgroups correctly, typically because it defaults to the cgroupfs driver while the kubelet expects systemd, enable the systemd cgroup driver in its daemon configuration before restarting the service:
# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
systemctl daemon-reload && systemctl restart containerd
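After the restart, the active cgroup driver can be confirmed through the CRI endpoint (a quick check, assuming crictl is already pointed at the containerd socket as configured above):
crictl info | grep -i systemdcgroup    # expect "SystemdCgroup": true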
Registry timeouts during image synchronization can be mitigated by configuring mirror endpoints in the containerd hosts directory or Docker daemon configuration, depending on the chosen runtime.
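For containerd, a minimal sketch of a mirror definition using the hosts directory; the mirror URL is a placeholder, and config_path must reference the directory for the files to take effect:
# /etc/containerd/config.toml -- enable the hosts directory
[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/etc/containerd/certs.d"

# /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://registry-1.docker.io"

[host."https://registry-mirror.example.com"]
  capabilities = ["pull", "resolve"]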
Etcd Binding and Cloud NAT Routing
Etcd clusters fail to form when nodes advertise public IP addresses that are not locally bound. Cloud providers map public IPs via 1:1 NAT, causing cluster ID mismatch or connection refused errors. Override the default etcd static pod manifest to bind strictly to loopback and private interfaces.
Edit /etc/kubernetes/manifests/etcd.yaml on the control plane node:
containers:
- command:
  - etcd
  - --listen-client-urls=https://127.0.0.1:2379,https://10.0.1.10:2379
  - --advertise-client-urls=https://10.0.1.10:2379
  - --listen-peer-urls=https://127.0.0.1:2380,https://10.0.1.10:2380
  - --initial-advertise-peer-urls=https://10.0.1.10:2380
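After the kubelet recreates the static pod, member health can be verified against the loopback client URL. A hedged check, assuming etcdctl is installed on the node and the standard kubeadm certificate paths:
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health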
To ensure cross-node pod traffic routes correctly through the cloud NAT, add destination NAT rules for the peer private addresses and set Flannel's public-ip-overwrite annotation on each node:
# Route internal cluster traffic destined for other nodes through their public IPs
iptables -t nat -A OUTPUT -d 10.0.1.15 -j DNAT --to-destination 203.0.113.15
iptables -t nat -A OUTPUT -d 10.0.1.16 -j DNAT --to-destination 203.0.113.16
# Update Flannel annotations to use public IPs for VXLAN routing
kubectl annotate node worker-01 flannel.alpha.coreos.com/public-ip-overwrite=203.0.113.15 --overwrite
kubectl annotate node worker-02 flannel.alpha.coreos.com/public-ip-overwrite=203.0.113.16 --overwrite
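The OUTPUT-chain DNAT rules above do not survive a reboot. One way to persist them, assuming a Debian/Ubuntu host where the iptables-persistent package is available:
apt-get install -y iptables-persistent
netfilter-persistent save    # writes the active ruleset to /etc/iptables/rules.v4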
Cluster Bootstrap with kubeadm
Generate a baseline configuration and adjust it for cloud environments and resource constraints. Single-core instances require bypassing the CPU preflight check.
# kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 203.0.113.10
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  name: cp-node-01
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.4
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
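Optionally, pre-pull the control plane images through the configured mirror repository before initialization to surface registry problems early:
kubeadm config images pull --config kubeadm-init.yaml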
If initialization halts at the etcd phase due to stale state, terminate hanging processes, reset the runtime state, and resume initialization while skipping already-completed phases:
systemctl stop kubelet
pkill -9 -f "kube|etcd" || true
systemctl start kubelet
kubeadm init --config kubeadm-init.yaml --skip-phases=preflight,certs,kubeconfig,kubelet-start,control-plane,etcd --ignore-preflight-errors=NumCPU
Configure the local client context and deploy the overlay network:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
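Once the Flannel DaemonSet is running, the nodes should report Ready (current manifests deploy into the kube-flannel namespace; older revisions used kube-system):
kubectl get nodes -o wide
kubectl get pods -A -o wide | grep -i flannel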
Dynamic Storage and Cluster Metrics
KubeSphere and stateful workloads require a default StorageClass. An NFS-backed dynamic provisioner provides a lightweight solution for development clusters.
# nfs-provisioner-setup.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-provisioner
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-provisioner-role
rules:
- apiGroups: [""]
  resources: ["persistentvolumes", "persistentvolumeclaims", "events", "endpoints", "services"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
---
# Grants the provisioner's ServiceAccount the permissions defined above;
# without a binding the provisioner cannot create PersistentVolumes.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nfs-provisioner-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nfs-provisioner-role
subjects:
- kind: ServiceAccount
  name: nfs-provisioner
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-dynamic-provisioner
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-provisioner
  template:
    metadata:
      labels:
        app: nfs-provisioner
    spec:
      serviceAccountName: nfs-provisioner
      containers:
      - name: provisioner
        image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
        env:
        - name: PROVISIONER_NAME
          value: nfs-storage-provisioner
        - name: NFS_SERVER
          value: "10.0.1.50"
        - name: NFS_PATH
          value: /exports/k8s-data
        volumeMounts:
        - name: nfs-volume
          mountPath: /persistentvolumes
      volumes:
      - name: nfs-volume
        nfs:
          server: 10.0.1.50
          path: /exports/k8s-data
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-default
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs-storage-provisioner
parameters:
  archiveOnDelete: "false"
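To confirm dynamic provisioning works end to end, apply the manifest and create a throwaway claim against the new default class; the claim name below is only illustrative:
kubectl apply -f nfs-provisioner-setup.yaml
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: provisioner-smoke-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: nfs-default
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc provisioner-smoke-test    # should reach Bound once the provisioner creates a PV
kubectl delete pvc provisioner-smoke-test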
Enable resource monitoring by deploying the Metrics Server with TLS verification disabled for internal kubelet communication:
# metrics-server-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls
        - --metric-resolution=15s
Apply the configuration and verify node/pod resource reporting via kubectl top nodes.
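A possible sequence, assuming the Metrics Server is installed from the upstream v0.6.4 release manifest and then adjusted with the patch file above:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.4/components.yaml
kubectl patch deployment metrics-server -n kube-system --patch-file metrics-server-patch.yaml
kubectl top nodes
kubectl top pods -A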
KubeSphere Installation and Worker Node CLI Access
When deploying KubeSphere on constrained hardware, avoid enabling all pluggable components simultaneously. Modify the ClusterConfiguration manifest to activate only core modules, then incrementally enable logging, DevOps, or service mesh features as resource capacity allows.
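As an illustration, a trimmed excerpt of the ks-installer ClusterConfiguration with the heavier pluggable components left off; the field names follow the KubeSphere v3.x installer and should be checked against the manifest version actually deployed:
# Excerpt from the ks-installer ClusterConfiguration
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
spec:
  devops:
    enabled: false          # Jenkins-based pipelines; enable once capacity allows
  logging:
    enabled: false          # Elasticsearch/Fluent Bit stack is memory-hungry
  servicemesh:
    enabled: false          # Istio control plane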
Worker nodes lack default API server credentials. Executing kubectl on compute nodes returns a localhost:8080 connection refused error because the client attempts to reach an unconfigured local endpoint. Distribute the administrator configuration from the control plane:
# On control plane
scp /etc/kubernetes/admin.conf worker-node-01:/etc/kubernetes/admin.conf
# On worker node
export KUBECONFIG=/etc/kubernetes/admin.conf
echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> ~/.bashrc
source ~/.bashrc
Verify connectivity by querying cluster resources from the compute node. kubectl now targets the API server endpoint defined in the copied kubeconfig rather than the default loopback address.
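# On the worker node, with KUBECONFIG pointing at the copied admin.conf
kubectl get nodes
kubectl get pods -n kube-system -o wide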