Managing Distributed Storage on Kubernetes with Rook and Ceph

Cloud-native storage refers to storage architectures specifically engineered to run within containerized environments, primarily Kubernetes. Unlike traditional storage, cloud-native solutions are software-defined, distributed, and managed through the same orchestrator as the applications they serve. This integration enables dynamic provisioning, high availability, and horizontal scalability, ensuring that data persists even as containers are rescheduled across a cluster.

Understanding the Rook and Ceph Ecosystem

Ceph is a highly resilient, open-source distributed storage system that provides unified interfaces for block, object, and file storage. While powerful, Ceph is notoriously complex to deploy and maintain manually. This is where Rook comes in. Rook is a specialized Kubernetes operator that automates the deployment, configuration, and management of Ceph. By turning Ceph into a "cloud-native" service, Rook allows administrators to manage storage using standard Kubernetes Custom Resource Definitions (CRDs).

Deploying the Rook Operator

The first step in establishing a cloud-native storage layer is deploying the Rook operator, which acts as the "brain" for the storage cluster.

# Create the CRDs, the common resources (namespace, RBAC), and the operator
kubectl create -f https://raw.githubusercontent.com/rook/rook/release-1.12/deploy/examples/crds.yaml
kubectl create -f https://raw.githubusercontent.com/rook/rook/release-1.12/deploy/examples/common.yaml
kubectl create -f https://raw.githubusercontent.com/rook/rook/release-1.12/deploy/examples/operator.yaml
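
Before defining a cluster, it helps to confirm that the operator has finished rolling out. The following is a minimal sketch, assuming the default rook-ceph namespace and the rook-ceph-operator Deployment name used by the upstream operator.yaml; the helper name wait_for_operator is made up for illustration.

```shell
# Hypothetical helper: block until the Rook operator Deployment is rolled out.
# Assumes the default Deployment name from the upstream operator.yaml manifest.
wait_for_operator() {
  namespace="${1:-rook-ceph}"
  timeout="${2:-300s}"
  kubectl -n "$namespace" rollout status deployment/rook-ceph-operator --timeout="$timeout"
}

# Usage: wait_for_operator rook-ceph 300s
```

The command returns a non-zero exit code if the rollout does not complete within the timeout, which makes it usable as a gate in a setup script.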

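Once the operator is running, you can define the actual Ceph cluster. The following configuration provides a basic setup that uses every node in the cluster and every empty raw device Rook finds on those nodes. With three monitors and allowMultiplePerNode set to false, it needs at least three nodes.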

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: storage-backend
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.0
  dataDirHostPath: /var/lib/rook-data
  mon:
    count: 3
    allowMultiplePerNode: false
  storage:
    useAllNodes: true
    useAllDevices: true
  dashboard:
    enabled: true
    ssl: false

Save this manifest as cluster-config.yaml and apply it to start cluster creation: kubectl apply -f cluster-config.yaml
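
Cluster creation usually takes a few minutes while the monitors and OSDs start. One way to wait for it is to poll the health that Rook records in the status of the CephCluster resource. This is a sketch under assumptions: the helper name, retry count, and delay are arbitrary, and the cluster name and namespace are taken from the manifest above.

```shell
# Hypothetical helper: poll the CephCluster resource until Ceph reports HEALTH_OK.
# Arguments: cluster name, namespace, number of attempts, delay in seconds.
wait_for_ceph_health() {
  name="${1:-storage-backend}"
  namespace="${2:-rook-ceph}"
  attempts="${3:-60}"
  delay="${4:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Rook mirrors the Ceph health into .status.ceph.health of the resource.
    health=$(kubectl -n "$namespace" get cephcluster "$name" \
      -o jsonpath='{.status.ceph.health}' 2>/dev/null)
    if [ "$health" = "HEALTH_OK" ]; then
      echo "Cluster $name is healthy"
      return 0
    fi
    echo "Waiting for cluster $name (health: ${health:-unknown})"
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for cluster $name" >&2
  return 1
}

# Usage: wait_for_ceph_health storage-backend rook-ceph 60 10
```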

Provisioning Storage Resources

After the cluster is healthy, you need to define how Kubernetes pods will consume the storage. This is done by creating a CephBlockPool and a corresponding StorageClass.

1. Defining the Block Pool

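The pool defines the redundancy level. This example keeps three replicas of every object and, with failureDomain set to host, places each replica on a different node, so the cluster needs at least three nodes with OSDs.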

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: high-availability-pool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
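
Replication trades capacity for redundancy: with three replicas, every byte written consumes three bytes of raw disk. The arithmetic can be sketched as follows. The helper name is hypothetical, and the result is a rough upper bound that ignores Ceph's full-ratio thresholds and metadata overhead.

```shell
# Hypothetical helper: estimate usable capacity of a replicated pool.
# Rough estimate only: ignores full/nearfull ratios and metadata overhead.
usable_capacity_gib() {
  raw_gib="$1"
  replicas="${2:-3}"
  echo $((raw_gib / replicas))
}

usable_capacity_gib 3000 3   # prints 1000
```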

2. Creating the StorageClass

The StorageClass tells the Ceph CSI (Container Storage Interface) driver which cluster and pool to provision volumes from.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block-storage
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: high-availability-pool
  imageFormat: "2"
  imageFeatures: layering
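  csi.storage.k8s.io/fstype: ext4
  # Secrets created by the Rook operator; the CSI driver needs them to
  # provision, expand, and mount volumes.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph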
reclaimPolicy: Delete
allowVolumeExpansion: true
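
By default, Rook registers its RBD driver under a name prefixed with the operator namespace, so a StorageClass copied from another cluster can silently point at a provisioner that does not exist. A quick sanity check, sketched here with a hypothetical helper name, compares the provisioner recorded on the StorageClass with the expected value.

```shell
# Hypothetical helper: verify that a StorageClass points at the RBD provisioner
# of the given operator namespace (<namespace>.rbd.csi.ceph.com).
check_rbd_provisioner() {
  storage_class="${1:-ceph-block-storage}"
  operator_namespace="${2:-rook-ceph}"
  expected="${operator_namespace}.rbd.csi.ceph.com"
  actual=$(kubectl get storageclass "$storage_class" -o jsonpath='{.provisioner}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $storage_class uses $actual"
  else
    echo "Mismatch: expected $expected, got ${actual:-nothing}" >&2
    return 1
  fi
}

# Usage: check_rbd_provisioner ceph-block-storage rook-ceph
```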

Consuming Storage in Applications

Applications request storage using a PersistentVolumeClaim (PVC). Kubernetes will automatically provision a volume in Ceph based on the defined StorageClass.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-storage-pvc
spec:
  storageClassName: ceph-block-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
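
After applying the claim, you can wait for it to reach the Bound phase before starting the workload. The sketch below relies on the jsonpath form of kubectl wait, which is available in kubectl 1.23 and later; the helper name and the default timeout are assumptions.

```shell
# Hypothetical helper: wait until a PVC in the current namespace is Bound.
# Requires kubectl 1.23 or later for --for=jsonpath.
wait_for_pvc_bound() {
  claim="${1:-database-storage-pvc}"
  timeout="${2:-120s}"
  kubectl wait --for=jsonpath='{.status.phase}'=Bound "pvc/$claim" --timeout="$timeout"
}

# Usage: wait_for_pvc_bound database-storage-pvc 120s
```

If the claim stays Pending, kubectl describe pvc database-storage-pvc usually shows the provisioning error in its events.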

A Pod can then mount this volume to persist its data:

apiVersion: v1
kind: Pod
metadata:
  name: app-server
spec:
  containers:
  - name: web-app
    image: nginx
    volumeMounts:
    - name: data-disk
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data-disk
    persistentVolumeClaim:
      claimName: database-storage-pvc
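
A simple way to confirm that the volume is mounted and writable is to write a file through the Pod and read it back. This is only a sketch: the helper name and the marker text are made up, and the default path matches the mountPath in the Pod above.

```shell
# Hypothetical helper: write a marker file inside the Pod's volume and read it back.
write_and_read_marker() {
  pod="${1:-app-server}"
  file="${2:-/usr/share/nginx/html/index.html}"
  kubectl exec "$pod" -- sh -c "echo 'stored on Ceph' > '$file'"
  kubectl exec "$pod" -- cat "$file"
}

# Usage: write_and_read_marker app-server
```

Deleting the Pod, recreating it from the same manifest, and reading the file again should return the same content, since the data lives in the Ceph volume rather than in the container.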

Operational Management and Monitoring

Monitoring a distributed storage system is critical for maintaining uptime. Rook provides a toolbox container equipped with the full Ceph CLI suite.

# Deploy the toolbox
kubectl apply -f https://raw.githubusercontent.com/rook/rook/release-1.12/deploy/examples/toolbox.yaml

# Access the CLI
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status

Key commands for health checks include:

  • ceph health detail: Provides specific reasons for any non-optimal state.
  • ceph osd tree: Shows the CRUSH hierarchy of hosts and OSDs, including whether each OSD is up or down.
  • ceph df: Shows storage utilization and remaining capacity.
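
These checks can also be scripted, for example as a gate in a maintenance job. The sketch below maps the first word of the ceph health output to an exit code. The function names and the exit-code convention (0 for OK, 1 for warning, 2 for anything else) are assumptions, and it expects the toolbox Deployment from the step above.

```shell
# Hypothetical helper: translate a `ceph health` line into an exit code.
# 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = HEALTH_ERR or no usable answer.
ceph_health_exit_code() {
  case "$1" in
    HEALTH_OK*)   return 0 ;;
    HEALTH_WARN*) return 1 ;;
    *)            return 2 ;;
  esac
}

# Hypothetical helper: query the toolbox and return the mapped exit code.
check_ceph_health() {
  status=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health 2>/dev/null)
  echo "${status:-no response from toolbox}"
  ceph_health_exit_code "$status"
}

# Usage: check_ceph_health || echo "cluster needs attention"
```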

Best Practices for Production Environments

For production deployments, consider the following strategies to ensure performance and reliability:

  • Resource Isolation: Dedicate specific nodes for storage tasks and use Taints/Tolerations to prevent non-storage workloads from consuming disk I/O.
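  • Failure Domains: Configure the failureDomain at the 'rack' or 'zone' level if your cluster spans multiple physical locations, so that replicas land in different locations and losing one rack or zone does not take out every copy. Rook builds this hierarchy from node topology labels such as topology.kubernetes.io/zone.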
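  • Snapshots: Leverage the VolumeSnapshot feature in Kubernetes to create point-in-time copies of your volumes. Snapshots live in the same Ceph cluster as the source volume, so they complement off-cluster backups rather than replace them.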
  • Performance Tuning: Place the BlueStore WAL and DB on SSDs or NVMe drives to reduce write latency, particularly when the data devices are HDDs.
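
Two of these practices can be sketched with a few commands. The helpers below are illustrative only: the function names, the role=storage-node label, and the storage-node taint are example values, and the default snapshot class name csi-rbdplugin-snapclass assumes that a VolumeSnapshotClass with that name and the snapshot controller are already installed.

```shell
# Hypothetical helper: label and taint a node so that only Pods tolerating
# the taint are scheduled on it. Label and taint values are examples.
dedicate_storage_node() {
  node="$1"
  kubectl label node "$node" role=storage-node --overwrite
  kubectl taint node "$node" storage-node=true:NoSchedule --overwrite
}

# Hypothetical helper: print a VolumeSnapshot manifest for an existing PVC.
# The default class name is an assumption; adjust it to your cluster.
volume_snapshot_manifest() {
  claim="$1"
  snapshot="${2:-$1-snapshot}"
  class="${3:-csi-rbdplugin-snapclass}"
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: $snapshot
spec:
  volumeSnapshotClassName: $class
  source:
    persistentVolumeClaimName: $claim
EOF
}

# Usage:
#   dedicate_storage_node worker-1
#   volume_snapshot_manifest database-storage-pvc | kubectl apply -f -
```

Once nodes are tainted, the CephCluster needs matching node affinity and tolerations under spec.placement; otherwise the Ceph daemons themselves cannot be scheduled on the dedicated nodes.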

Tags: kubernetes Ceph Rook cloud-native distributed-storage

Posted on Tue, 09 Jun 2026 16:58:16 +0000 by outsidaz