Understanding GlusterFS: A Distributed File System Solution

Introduction to GlusterFS

GlusterFS is an open-source distributed file system composed of storage servers, clients, and optional NFS/Samba storage gateways. Unlike traditional distributed file systems that rely on metadata servers, GlusterFS operates without a metadata server component, enhancing performance, reliability, and stability.

Traditional distributed file systems typically use metadata servers to store directory information and structure. This makes directory browsing efficient, but it also introduces a single point of failure: if the metadata server fails, the entire storage system becomes unavailable no matter how much redundancy the storage nodes have. Because GlusterFS has no metadata server, it avoids this failure mode while still providing horizontal scalability and high reliability.

GlusterFS serves as the core of Gluster, a scale-out storage solution, offering powerful horizontal scalability for data storage. It can scale to support petabyte-level storage capacity and handle thousands of clients. GlusterFS aggregates physically distributed storage resources using TCP/IP or InfiniBand RDMA networks, providing unified storage services through a global namespace.

Key Features

  • Scalability and High Performance
    • Scale-out architecture allows increasing storage capacity and performance by simply adding storage nodes (disk, computing, and I/O resources can be independently scaled). Supports high-speed network interconnects like 10GbE and InfiniBand.
    • Gluster's Elastic Hash eliminates dependency on metadata servers, addressing single points of failure and performance bottlenecks, enabling truly parallel data access. The algorithm intelligently locates any data shard across the storage pool without requiring index lookups or metadata server queries.
  • High Availability
    • GlusterFS automatically replicates files through mirroring or multiple copies, ensuring data accessibility even during hardware failures.
    • The self-healing feature restores data to a consistent state when inconsistencies occur. Data repair is performed incrementally in the background with minimal performance impact.
    • Supports all storage types by using standard operating system file systems (EXT3, XFS, etc.) rather than proprietary formats. Data can be accessed using traditional disk access methods.
  • Global Unified Namespace

    Integrates all node namespaces into a unified namespace, combining all node storage capacities into a large virtual storage pool accessible to frontend hosts for data read/write operations.

  • Elastic Volume Management

    Data is stored in logical volumes independently partitioned from a logical storage pool. The logical storage pool can be expanded or removed online without service interruption. Logical volumes can be grown or shrunk online with load balancing across multiple nodes. Filesystem configurations can be modified and applied in real-time to adapt to workload changes or performance tuning.

  • Standard Protocol Support

    Gluster storage services support NFS, CIFS, HTTP, FTP, SMB, and Gluster native protocols, fully compatible with POSIX standards. Existing applications can access Gluster data without modifications, and dedicated APIs are also available.
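
The idea behind Elastic Hash, computing a file's location from its name instead of asking a metadata server, can be illustrated with a minimal sketch. This is not the algorithm GlusterFS actually uses (it has its own hash function and per-directory hash ranges); pick_brick is a hypothetical helper that borrows the cksum CRC purely for illustration.

```shell
#!/bin/bash
# Illustrative sketch only, not the real GlusterFS hashing scheme.
# pick_brick NAME BRICK... prints the brick that NAME hashes to.
pick_brick() {
    local name="$1"
    shift
    # cksum prints "<crc> <byte count>"; keep only the CRC
    local crc
    crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    # Skip (crc mod brick-count) bricks; the next one is the target
    shift $(( crc % $# ))
    printf '%s\n' "$1"
}

# Every client computes the same answer without any lookup service
pick_brick testfile1.log node1:/data/sdb1 node2:/data/sdb1
```

Because the result depends only on the file name and the brick list, any client can locate a file on its own, which is the property the feature list describes.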

Terminology

  • Brick: A dedicated partition on trusted hosts provided for physical storage. The basic storage unit in GlusterFS and the storage directory exposed by servers in the trusted pool. Format: SERVER:EXPORT (e.g., 192.168.232.10:/data/mydir/).
  • Volume: A collection of Bricks. A logical device for data storage, similar to LVM logical volumes. Most Gluster management operations are performed on volumes.
  • FUSE (Filesystem in Userspace): A kernel module that lets file systems be implemented in user space without modifying kernel code.
  • VFS (Virtual File System): The kernel's abstraction layer that gives user-space programs a uniform interface to different file systems.
  • Glusterd: The background management process that runs on each node in the storage cluster.
  • Stripe: A data distribution technique that splits files into fixed-size blocks and stores them in sequence across multiple bricks. The stripe block size is configurable (cluster.stripe-block-size), with a default of 128KB.

Workflow

  1. Clients or applications access data through GlusterFS mount points.
  2. The Linux kernel receives and processes requests via the VFS API.
  3. VFS hands the request to the FUSE kernel module, which passes it through the /dev/fuse device file to the GlusterFS client process in user space. FUSE acts as a proxy between the kernel and the client.
  4. The GlusterFS client processes data according to configuration settings.
  5. Processed data is transmitted over the network to remote GlusterFS servers and written to server storage devices.
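
On a client, the FUSE layer in this path is visible in the mount table: native GlusterFS mounts are listed with the filesystem type fuse.glusterfs. The following is a minimal sketch that lists them. list_gluster_mounts is a hypothetical helper, and the sample file is made up so the function can be tried on a machine with no GlusterFS mounts.

```shell
#!/bin/bash
# list_gluster_mounts [MOUNTS_FILE]
# Prints "source mountpoint" for every GlusterFS FUSE mount.
# Defaults to /proc/mounts, whose fields are: source, mountpoint, fstype, ...
list_gluster_mounts() {
    awk '$3 == "fuse.glusterfs" { print $1, $2 }' "${1:-/proc/mounts}"
}

# Try it against a made-up sample instead of the live mount table
sample=$(mktemp)
printf '%s\n' \
    '/dev/sda1 / xfs rw 0 0' \
    'node1:dist-vol /mnt/dist_test fuse.glusterfs rw,relatime 0 0' > "$sample"
list_gluster_mounts "$sample"
rm -f "$sample"
```

On a real client, calling list_gluster_mounts with no argument reads /proc/mounts.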

Volume Types

Distributed Volume

Files are distributed across the bricks using a hash algorithm. This is GlusterFS's default volume type. Files are not split: each file is stored whole on a single brick chosen by the hash, so the volume adds capacity but provides no redundancy. It resembles file-level RAID 0 without fault tolerance. Access to an individual file is no faster than on a single server and may be slower because of network overhead.

Characteristics:

  • Files distributed across different servers without redundancy.
  • Easier and cheaper to expand volume size.
  • The failure of a single brick makes the files stored on it unavailable.
  • Relies on the underlying storage (for example, RAID) for data protection.

Example:

#!/bin/bash
# Create a distributed volume named 'dist-vol'
# Files will be distributed based on HASH across:
# server1:/storage/dir1, server2:/storage/dir2, server3:/storage/dir3
gluster volume create dist-vol server1:/storage/dir1 server2:/storage/dir2 server3:/storage/dir3 force

Stripe Volume

Similar to RAID 0, files are split into data blocks that are distributed across multiple bricks in round-robin fashion. Storage is block-based, which suits large files and improves read throughput for them, but there is no redundancy. Stripe volumes are deprecated in later GlusterFS releases, so the stripe examples here apply to versions that still support them.

Striping: A data distribution technique that splits files into fixed-size blocks (stripes) and stores them in sequence across multiple bricks. The stripe block size is configurable and defaults to 128KB.

Characteristics:

  • Data is split into smaller blocks that are spread across the bricks of the volume.
  • Spreading the blocks reduces the load on each brick, and the smaller pieces can be read in parallel, which speeds up access to large files.
  • No data redundancy.
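
The round-robin placement described above comes down to integer arithmetic. The following is a minimal sketch, assuming a fixed stripe unit and bricks numbered from 0. stripe_brick_index is a hypothetical helper, and the 128 KB unit is only an illustrative value.

```shell
#!/bin/bash
# stripe_brick_index OFFSET UNIT COUNT
# Prints the 0-based index of the brick holding byte OFFSET when a file is
# cut into UNIT-byte chunks dealt round-robin across COUNT bricks.
stripe_brick_index() {
    local offset="$1" unit="$2" count="$3"
    echo $(( (offset / unit) % count ))
}

UNIT=$(( 128 * 1024 ))
stripe_brick_index 0 "$UNIT" 2                # first chunk  -> 0
stripe_brick_index "$UNIT" "$UNIT" 2          # second chunk -> 1
stripe_brick_index $(( 2 * UNIT )) "$UNIT" 2  # third chunk  -> 0
```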

Example:

#!/bin/bash
# Create a stripe volume named 'stripe-vol' with 2 stripes
# Files will be split and stored across:
# server1:/storage/dir1 and server2:/storage/dir2
gluster volume create stripe-vol stripe 2 transport tcp server1:/storage/dir1 server2:/storage/dir2 force

Replica Volume

Files are synchronized across multiple bricks, producing multiple copies of each file, equivalent to file-level RAID 1 with fault tolerance. Reads can be served from any copy, which improves read performance, but every write must reach all copies, which reduces write performance. Replica volumes provide redundancy, so data remains usable even if one node fails. The cost is lower disk utilization, because every file is stored more than once.

Characteristics:

  • Every brick in the volume holds a complete copy of each file.
  • The replica count is set when the volume is created; for a pure replica volume it must equal the number of bricks.
  • Requires at least two bricks.
  • Provides redundancy.
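
The lower disk utilization is easy to quantify: with equally sized bricks, usable capacity is the raw capacity divided by the replica count. The following is a minimal sketch of that arithmetic. usable_capacity_gb and the 20 GB brick size are hypothetical.

```shell
#!/bin/bash
# usable_capacity_gb BRICKS SIZE_GB REPLICA
# Prints the usable capacity of a volume built from BRICKS equally sized
# bricks of SIZE_GB each, with REPLICA copies of every file.
usable_capacity_gb() {
    local bricks="$1" size_gb="$2" replica="$3"
    echo $(( bricks * size_gb / replica ))
}

usable_capacity_gb 2 20 2   # pure replica volume        -> 20
usable_capacity_gb 4 20 2   # distributed replica volume -> 40
usable_capacity_gb 2 20 1   # distributed, no replicas   -> 40
```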

Example:

#!/bin/bash
# Create a replica volume named 'rep-vol' with 2 replicas
# Files will be stored as copies across:
# server1:/storage/dir1 and server2:/storage/dir2
gluster volume create rep-vol replica 2 transport tcp server1:/storage/dir1 server2:/storage/dir2 force

Distributed Stripe Volume

The number of bricks is a multiple of the stripe count (the number of bricks that each file's blocks are spread across). This type combines the features of distributed and stripe volumes and is intended mainly for large-file access. It requires at least 4 bricks.

Example layout:

File1 and File2 are located on Server1 and Server2 respectively through distributed volume functionality. In Server1, File1 is split into 4 segments, with segments 1 and 3 in Server1's exp1 directory and segments 2 and 4 in Server1's exp2 directory. In Server2, File2 is also split into 4 segments, with segments 1 and 3 in Server2's exp3 directory and segments 2 and 4 in Server2's exp4 directory.

Example:

#!/bin/bash
# Create a distributed stripe volume named 'dist-stripe-vol'
# When creating distributed stripe volumes, the number of storage servers in the volume's Bricks must be a multiple of the stripe count (≥2x)
# With 4 Bricks (server1:/storage/dir1, server2:/storage/dir2, server3:/storage/dir3, server4:/storage/dir4) and stripe count of 2
gluster volume create dist-stripe-vol stripe 2 transport tcp server1:/storage/dir1 server2:/storage/dir2 server3:/storage/dir3 server4:/storage/dir4 force

When creating volumes, if the number of storage servers equals the stripe or replica count, a stripe or replica volume is created. If the number of storage servers is 2x or more the stripe or replica count, a distributed stripe or distributed replica volume is created.
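
The rule above can be written as a small check. The following is a minimal sketch. volume_layout is a hypothetical helper that only mirrors the rule as stated here and does not call gluster.

```shell
#!/bin/bash
# volume_layout KIND GROUP_COUNT BRICK_COUNT
# KIND is "stripe" or "replica"; GROUP_COUNT is the stripe or replica count.
volume_layout() {
    local kind="$1" group="$2" bricks="$3"
    if [ "$bricks" -lt "$group" ] || [ $(( bricks % group )) -ne 0 ]; then
        echo "invalid: brick count must be a multiple of the $kind count"
    elif [ "$bricks" -eq "$group" ]; then
        echo "$kind"
    else
        echo "distributed $kind"
    fi
}

volume_layout stripe 2 2    # -> stripe
volume_layout replica 2 4   # -> distributed replica
volume_layout replica 2 3   # -> invalid: brick count must be a multiple of the replica count
```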

Distributed Replica Volume

The number of Brick Servers is a multiple of the replica count (number of data copies). Combines features of distributed and replica volumes, primarily for scenarios requiring redundancy.

Example layout:

The distribute layer places File1 and File2 on different replica sets. File1 is stored as two identical copies, one in Server1's exp1 directory and one in Server2's exp2 directory. File2 is likewise stored as two identical copies, one in Server3's exp3 directory and one in Server4's exp4 directory.

Example:

#!/bin/bash
# Create a distributed replica volume named 'dist-replica-vol'
# When creating distributed replica volumes, the number of storage servers in the volume's Bricks must be a multiple of the replica count (≥2x)
# With 4 Bricks (server1:/storage/dir1, server2:/storage/dir2, server3:/storage/dir3, server4:/storage/dir4) and replica count of 2
gluster volume create dist-replica-vol replica 2 transport tcp server1:/storage/dir1 server2:/storage/dir2 server3:/storage/dir3 server4:/storage/dir4 force
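
Bricks are grouped into replica sets in the order they appear on the command line, so with a replica count of 2 the first two bricks mirror each other, then the next two. The following is a minimal sketch that prints this grouping for a given brick list. replica_sets is a hypothetical helper and does not talk to a cluster.

```shell
#!/bin/bash
# replica_sets COUNT BRICK...
# Prints one line per replica set, taking COUNT consecutive bricks per set.
# Leftover bricks that do not fill a whole set are not printed.
replica_sets() {
    local count="$1" idx=0 n=0 line="" brick
    shift
    for brick in "$@"; do
        line="$line $brick"
        n=$(( n + 1 ))
        # A set is complete once COUNT bricks have been collected
        if [ $(( n % count )) -eq 0 ]; then
            echo "set $idx:$line"
            idx=$(( idx + 1 ))
            line=""
        fi
    done
}

replica_sets 2 server1:/storage/dir1 server2:/storage/dir2 \
               server3:/storage/dir3 server4:/storage/dir4
```

Listing bricks so that each set spans different servers keeps the two copies of a file on separate machines.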

Stripe Replica Volume

Similar to RAID10, combines features of stripe and replica volumes.

Distributed Stripe Replica Volume

A composite volume of the three basic types. When writing files, the system first splits files into stripes. These stripes are assigned to different storage nodes, with replicas created on other nodes. During reading, the system can retrieve different stripes from multiple nodes in parallel and reconstruct the file.

GlusterFS Deployment

Environment Setup

Node (hostname/IP)     Disks                                     Mount Points
node1/192.168.232.10   /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1   /data/sdb1 /data/sdc1 /data/sdd1 /data/sde1
node2/192.168.232.20   /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1   /data/sdb1 /data/sdc1 /data/sdd1 /data/sde1
node3/192.168.232.30   /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1   /data/sdb1 /data/sdc1 /data/sdd1 /data/sde1
node4/192.168.232.40   /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1   /data/sdb1 /data/sdc1 /data/sdd1 /data/sde1

Initial Preparation

Disable Firewall

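#!/bin/bash
# Stop the firewall for the current boot (suitable for a lab setup only)
systemctl stop firewalld
# Switch SELinux to permissive mode until the next reboot
setenforce 0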

Configure Hostname

#!/bin/bash
# Run on each node with its own name (node1, node2, node3, node4)
hostnamectl set-hostname node1

Refresh Disk Interfaces

#!/bin/bash
# Rescan every SCSI host so newly attached disks are detected without a reboot
for scan in /sys/class/scsi_host/host*/scan
do
   echo "- - -" > "$scan"
done

Verify that disks have been added on all four machines.

Disk Partitioning, Formatting, and Mounting

#!/bin/bash
# Script for disk partitioning, formatting, and mounting
PART_SCRIPT="/opt/disk_setup.sh"
cat << 'EOF' > "$PART_SCRIPT"
#!/bin/bash
# Discover whole-disk devices other than the system disk (sda)
DISKS=$(ls /dev/sd* | grep -o 'sd[b-z]' | sort -u)

for DISK in $DISKS
do
   # Skip disks that already have a first partition, so a rerun is harmless
   [ -b "/dev/${DISK}1" ] && continue

   # Create one primary partition spanning the whole disk
   echo -e "n\np\n\n\n\nw\n" | fdisk "/dev/$DISK" &> /dev/null

   # Format with XFS
   mkfs.xfs "/dev/${DISK}1" &> /dev/null

   # Create mount point
   mkdir -p "/data/${DISK}1"

   # Add to fstab
   echo "/dev/${DISK}1 /data/${DISK}1 xfs defaults 0 0" >> /etc/fstab
done

# Mount all filesystems listed in fstab; errors are left visible on purpose
mount -a
EOF

# Make script executable and run it
chmod +x "$PART_SCRIPT"
"$PART_SCRIPT"

# Verify mounting
df -h

Prepare Yum Repository

#!/bin/bash
# Set up a local yum repository (assumes the GlusterFS packages are already in /opt/gfsrepo)
cd /etc/yum.repos.d/ || exit 1
mkdir -p repo.bak
mv *.repo repo.bak

# Create repository configuration
cat << 'EOF' > /etc/yum.repos.d/glfs.repo
[glfs]
name=GlusterFS Repository
baseurl=file:///opt/gfsrepo
gpgcheck=0
enabled=1
EOF

# Clean and update repository cache
yum clean all && yum makecache

Install Server Software

#!/bin/bash
# Install GlusterFS packages
yum -y install glusterfs glusterfs-server glusterfs-fuse glusterfs-rdma

# Enable and start glusterd service
systemctl enable --now glusterd.service

# Verify service status
systemctl status glusterd.service

Add Nodes to Storage Trust Pool

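Run the probe commands from any one node. If you use hostnames rather than IP addresses, make sure every node can resolve them, for example through /etc/hosts.

#!/bin/bash
# Add all nodes to the trust pool (probing the local node is harmless but not required)
gluster peer probe node1
gluster peer probe node2
gluster peer probe node3
gluster peer probe node4

# Confirm that the peers are connected
gluster peer status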
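
The hostnames probed above must resolve on every node. The following is a minimal sketch that prints /etc/hosts entries for the four nodes, using the addresses from the environment table. print_hosts_entries is a hypothetical helper and writes nothing by itself.

```shell
#!/bin/bash
# Print /etc/hosts entries for node1..node4 (192.168.232.10 to .40)
print_hosts_entries() {
    local i
    for i in 1 2 3 4; do
        printf '192.168.232.%s0 node%s\n' "$i" "$i"
    done
}

print_hosts_entries
# To apply on a node (as root): print_hosts_entries >> /etc/hosts
```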

Create Volumes

Volume Name        Volume Type                  Bricks
dist-vol           Distributed Volume           node1(/data/sdb1), node2(/data/sdb1)
stripe-vol         Stripe Volume                node1(/data/sdc1), node2(/data/sdc1)
rep-vol            Replica Volume               node3(/data/sdb1), node4(/data/sdb1)
dist-stripe-vol    Distributed Stripe Volume    node1(/data/sdd1), node2(/data/sdd1), node3(/data/sdd1), node4(/data/sdd1)
dist-replica-vol   Distributed Replica Volume   node1(/data/sde1), node2(/data/sde1), node3(/data/sde1), node4(/data/sde1)

Distributed Volume

#!/bin/bash
# Create a distributed volume
gluster volume create dist-vol node1:/data/sdb1 node2:/data/sdb1 force

# Start the volume
gluster volume start dist-vol

# View volume information
gluster volume info dist-vol

# List all volumes
gluster volume list

Mounting: On a client with the glusterfs-fuse package installed, mount the volume with mount -t glusterfs. For a mount that persists across reboots, add a line such as "node1:dist-vol /mnt/dist_test glusterfs defaults,_netdev 0 0" to /etc/fstab.

Testing:

#!/bin/bash
# Mount the volume, then write test files through the mount point
MOUNT_POINT="/mnt/dist_test"
mkdir -p "$MOUNT_POINT"
mount -t glusterfs node1:dist-vol "$MOUNT_POINT"

# Create five 40 MB test files
for i in 1 2 3 4 5
do
   dd if=/dev/zero of="$MOUNT_POINT/testfile$i.log" bs=1M count=40
done
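
To see how the hash distributed the test files, count the files in each brick directory on its node. The following is a minimal sketch. count_files is a hypothetical helper that works on local directories, so on the cluster it would be run on node1 and on node2 against /data/sdb1. The demo below uses temporary directories as stand-ins.

```shell
#!/bin/bash
# count_files DIR...
# Prints "DIR: N" where N is the number of regular files directly inside DIR.
count_files() {
    local dir
    for dir in "$@"; do
        printf '%s: %s\n' "$dir" \
            "$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')"
    done
}

# Stand-in brick directories for a quick local trial
demo=$(mktemp -d)
mkdir "$demo/brick1" "$demo/brick2"
touch "$demo/brick1/testfile1.log" "$demo/brick1/testfile3.log" \
      "$demo/brick2/testfile2.log"
count_files "$demo/brick1" "$demo/brick2"
rm -rf "$demo"
```

On a distributed volume the counts across the bricks should add up to the number of files written, with each file present on exactly one brick.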

Stripe Volume

#!/bin/bash
# Create a stripe volume with 2 stripes
gluster volume create stripe-vol stripe 2 node1:/data/sdc1 node2:/data/sdc1 force

# Start the stripe volume
gluster volume start stripe-vol

# View stripe volume information
gluster volume info stripe-vol

Testing: Each file written to the volume is split roughly in half between the two bricks, with no replicas or redundancy.
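
The expected share per brick can be worked out in advance. The following is a minimal sketch, assuming round-robin placement of fixed-size chunks. stripe_bytes_per_brick is a hypothetical helper, the 128 KB unit is an illustrative value, and on-disk details such as sparse files are ignored.

```shell
#!/bin/bash
# stripe_bytes_per_brick SIZE UNIT COUNT
# Prints how many bytes of a SIZE-byte file land on each of COUNT bricks
# when UNIT-byte chunks are dealt round-robin.
stripe_bytes_per_brick() {
    local size="$1" unit="$2" count="$3"
    local full=$(( size / unit ))
    local rest=$(( size % unit ))
    local i=0 chunks bytes
    while [ "$i" -lt "$count" ]; do
        # Full chunks i, i+count, i+2*count, ... land on brick i
        chunks=$(( full / count + (i < full % count ? 1 : 0) ))
        bytes=$(( chunks * unit ))
        # The trailing partial chunk (if any) follows the last full chunk
        if [ "$i" -eq $(( full % count )) ]; then
            bytes=$(( bytes + rest ))
        fi
        echo "brick $i: $bytes bytes"
        i=$(( i + 1 ))
    done
}

# A 40 MB file on a 2-brick stripe volume: 20971520 bytes on each brick
stripe_bytes_per_brick $(( 40 * 1024 * 1024 )) $(( 128 * 1024 )) 2
```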

Replica Volume

#!/bin/bash
# Create a replica volume with 2 replicas
gluster volume create rep-vol replica 2 node3:/data/sdb1 node4:/data/sdb1 force

# Start the replica volume
gluster volume start rep-vol

# View replica volume information
gluster volume info rep-vol

Verification: Write a file through a client mount and confirm that a complete copy appears on both node3:/data/sdb1 and node4:/data/sdb1.
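
One way to check replication is to compare checksums of the same file on the two bricks. The following is a minimal sketch. file_sum is a hypothetical helper, and the demo compares two local stand-in files. On the cluster, the checksum would be taken on node3 and on node4 and the two values compared.

```shell
#!/bin/bash
# file_sum FILE prints a CRC checksum of the file's contents.
file_sum() {
    cksum < "$1" | cut -d' ' -f1
}

# Stand-ins for the two brick copies of one file
demo=$(mktemp -d)
printf 'replicated data\n' > "$demo/copy_on_node3"
cp "$demo/copy_on_node3" "$demo/copy_on_node4"

if [ "$(file_sum "$demo/copy_on_node3")" = "$(file_sum "$demo/copy_on_node4")" ]; then
    echo "copies match"
else
    echo "copies differ"
fi
rm -rf "$demo"
```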

Distributed Stripe Volume

#!/bin/bash
# Create a distributed stripe volume
# Specifying type as stripe with value 2, followed by 4 Brick Servers (2x the stripe count)
# creates a distributed stripe volume
gluster volume create dist-stripe-vol stripe 2 node1:/data/sdd1 node2:/data/sdd1 node3:/data/sdd1 node4:/data/sdd1 force

# Start the distributed stripe volume
gluster volume start dist-stripe-vol

# View volume information
gluster volume info dist-stripe-vol

Distributed Replica Volume

#!/bin/bash
# Create a distributed replica volume
# Specifying type as replica with value 2, followed by 4 Brick Servers (2x the replica count)
# creates a distributed replica volume
gluster volume create dist-replica-vol replica 2 node1:/data/sde1 node2:/data/sde1 node3:/data/sde1 node4:/data/sde1 force

# Start the distributed replica volume
gluster volume start dist-replica-vol

# View volume information
gluster volume info dist-replica-vol

Additional Maintenance Commands

#!/bin/bash
# List all GlusterFS volumes
gluster volume list

# View information for all volumes
gluster volume info

# Check status of all volumes
gluster volume status

# Stop a volume
gluster volume stop dist-stripe-vol

# Delete a volume (must be stopped first)
gluster volume delete dist-stripe-vol

# Set access control for a volume
# Deny a specific client IP
gluster volume set dist-replica-vol auth.deny 192.168.80.100

# Allow a network range; quote the pattern so the shell does not expand the *
gluster volume set dist-replica-vol auth.allow '192.168.80.*'
# Allows all clients with addresses in 192.168.80.0/24 to access dist-replica-vol
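
The wildcard in the allow pattern behaves like a shell glob. The following is a minimal sketch of that matching. ip_matches is a hypothetical helper that only illustrates the wildcard semantics; GlusterFS performs its own matching on the server side.

```shell
#!/bin/bash
# ip_matches IP PATTERN
# Succeeds when IP matches the glob PATTERN.
ip_matches() {
    local ip="$1" pattern="$2"
    # An unquoted $pattern in a case label is treated as a glob pattern
    case "$ip" in
        $pattern) return 0 ;;
        *)        return 1 ;;
    esac
}

ip_matches 192.168.80.25 '192.168.80.*' && echo "192.168.80.25: matched"
ip_matches 192.168.232.10 '192.168.80.*' || echo "192.168.232.10: not matched"
```

Note that the pattern is passed in single quotes here too, for the same reason as in the gluster command: an unquoted * may be expanded by the shell before the command sees it.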

Posted on Wed, 03 Jun 2026 17:45:18 +0000 by Qense