Initial Environment Setup
System Environment: CentOS 7.3 (Kernel 3.10.0-514.el7.x86_64)
Hardware Requirements:
- Two server nodes
- Minimum two disks per node - one for OS, one for storage
- Nodes named: server01, server02
IP Configuration:
- 192.168.238.129 - server01
- 192.168.238.130 - server02
- Disk Configuration
After OS installation, partition the secondary disk on both nodes:
[root@server01 ~]# fdisk /dev/sdb
# Create partition using fdisk interactive commands
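When provisioning several nodes, the interactive dialogue can be scripted. This is a sketch, assuming a single primary partition spanning the empty data disk; the answer sequence is n (new), p (primary), 1 (partition number), two empty lines (default start/end sectors), and w (write).

```shell
# Emit the fdisk answer sequence for one primary partition covering the disk.
fdisk_answers() {
    printf 'n\np\n1\n\n\nw\n'
}
# Destructive -- pipe into fdisk only against the empty data disk:
# fdisk_answers | fdisk /dev/sdb
```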
1.1 Format and Mount the Storage Disk
Execute the following on both nodes. This example assumes the brick will reside on /dev/sdb1:
# Format with XFS filesystem
[root@server01 ~]# mkfs.xfs -i size=512 /dev/sdb1
# Create mount point and mount the partition
[root@server01 ~]# mkdir -p /data/brick1 && mount /dev/sdb1 /data/brick1
# Add to fstab for persistent mounting
[root@server01 ~]# echo "/dev/sdb1 /data/brick1 xfs defaults 0 0" >> /etc/fstab
# Verify mounting
[root@server01 ~]# mount -a && mount
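If the setup above is wrapped in a script, a blind `echo >> /etc/fstab` will duplicate the entry on re-runs. A small helper (hypothetical, not part of GlusterFS) makes the append idempotent:

```shell
# Append an fstab entry only when its device is not already listed.
add_fstab_entry() {
    local entry="$1" fstab="${2:-/etc/fstab}"
    local dev="${entry%% *}"                      # first field = device
    grep -q "^${dev}[[:space:]]" "$fstab" || echo "$entry" >> "$fstab"
}
# add_fstab_entry "/dev/sdb1 /data/brick1 xfs defaults 0 0"
```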
- Install GlusterFS
# Install the GlusterFS repository
[root@server01 ~]# yum install centos-release-gluster
# Install GlusterFS server packages
[root@server01 ~]# yum install glusterfs-server
- Configure Firewall
For testing environments, it is recommended to disable the firewall:
# Temporarily stop firewall
[root@server01 ~]# systemctl stop firewalld
# Permanently disable firewall
[root@server01 ~]# systemctl disable firewalld
Note: For production environments, configure appropriate firewall rules instead of disabling the firewall.
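For production, a hedged sketch of the firewalld rules instead of disabling it; the port numbers are assumptions based on common GlusterFS defaults of this era (24007-24008 for the management daemon, one TCP port per brick starting at 49152):

```shell
# Emit the firewall-cmd invocations for review before running them as root.
gluster_firewall_cmds() {
    cat <<'EOF'
firewall-cmd --permanent --add-port=24007-24008/tcp
firewall-cmd --permanent --add-port=49152-49251/tcp
firewall-cmd --reload
EOF
}
# Review the generated commands, then:  gluster_firewall_cmds | sh
```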
- Start GlusterFS Service
# Start GlusterFS daemon
[root@server01 ~]# systemctl start glusterd
# Enable auto-start on boot
[root@server01 ~]# systemctl enable glusterd
- Configure Trusted Storage Pool
First, ensure host resolution is configured in /etc/hosts or DNS. This example uses /etc/hosts:
[root@server01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.238.129 server01
192.168.238.130 server02
192.168.238.133 server03
192.168.238.132 server04
192.168.238.134 server05
Add the second node to the trusted pool from the first node:
[root@server01 ~]# gluster peer probe server02
peer probe: success.
Verify peer status:
[root@server01 ~]# gluster peer status
Number of Peers: 1
Hostname: server02
Uuid: 61fe987a-99ff-419d-8018-90603ea16fe7
State: Peer in Cluster (Connected)
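When scripting cluster setup, it helps to confirm that every probed peer is actually connected. A sketch that counts connected peers in `gluster peer status` output (format as shown above):

```shell
# Count lines reporting a connected peer on stdin.
connected_peers() {
    grep -c 'State: Peer in Cluster (Connected)'
}
# Usage: gluster peer status | connected_peers
```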
To remove a node from the cluster:
[root@server01 ~]# gluster peer detach server02
- Create a GlusterFS Volume
Create the brick directories on both nodes:
[root@server01 ~]# mkdir -p /data/brick1
[root@server02 ~]# mkdir -p /data/brick1
Create a replicated volume:
[root@server01 ~]# gluster volume create myvolume replica 2 transport tcp server01:/data/brick1 server02:/data/brick1
volume create: myvolume: success: please start the volume to access data
Start the volume:
[root@server01 ~]# gluster volume start myvolume
volume start: myvolume: success
View volume information:
[root@server01 ~]# gluster volume info
Volume Name: myvolume
Type: Replicate
Volume ID: bc637d83-0273-4373-9d00-d794a3a3d2e7
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server01:/data/brick1
Brick2: server02:/data/brick1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
Check volume status:
[root@server02 ~]# gluster volume status
Status of volume: myvolume
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick server01:/data/brick1                  49152     N/A        Y       10774
Brick server02:/data/brick1                  N/A       N/A        N/A     N/A
Self-heal Daemon on localhost                N/A       N/A        Y       998
Self-heal Daemon on server01                 N/A       N/A        Y       10794
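For monitoring, the status output can be scanned for bricks that are not online. A sketch assuming the column layout shown above (brick, TCP port, RDMA port, Online, Pid):

```shell
# Print the path of every brick whose Online column is not "Y".
offline_bricks() {
    awk '$1 == "Brick" && $5 != "Y" {print $2}'
}
# Usage: gluster volume status | offline_bricks
```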
- Volume Expansion
To expand a volume (add new bricks), follow these steps:
Prerequisites: Ensure new nodes have the same setup as existing nodes, including /etc/hosts entries.
7.1 Probe New Nodes
[root@server01 ~]# gluster peer probe server03
peer probe: success.
[root@server01 ~]# gluster peer probe server04
peer probe: success.
7.2 Add New Bricks
[root@server01 ~]# gluster volume add-brick myvolume server03:/data/brick1 server04:/data/brick1
volume add-brick: success
7.3 Verify Volume Information
[root@server01 ~]# gluster volume info
Volume Name: myvolume
Type: Distributed-Replicate
Volume ID: 09363405-1c7c-4eb1-b815-b97822c1f274
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: server01:/data/brick1
Brick2: server02:/data/brick1
Brick3: server03:/data/brick1
Brick4: server04:/data/brick1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
7.4 Rebalance the Volume
After expanding, run rebalance to distribute new data to new bricks:
[root@server01 ~]# gluster volume rebalance myvolume start
volume rebalance: myvolume: success: Rebalance on myvolume has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: c9d052e8-2b6c-40b0-8d77-52290bcdb61
- Volume Shrinking
To remove bricks from a volume, use remove-brick. Note that `force` removes the bricks immediately without migrating their data, which can cause data loss; for an online shrink that preserves data, use `start`, monitor with `status`, and finish with `commit`.
[root@server01 ~]# gluster volume remove-brick myvolume server03:/data/brick1 server04:/data/brick1 force
Removing brick(s) can result in data loss. Do you want to Continue? (y/n)
y
volume remove-brick commit force: success
Monitor a migrating remove-brick operation (one started with `start` rather than `force`):
[root@server01 ~]# gluster volume remove-brick myvolume server03:/data/brick1 server04:/data/brick1 status
- Rebalancing Operations
9.1 Fix Layout Only
This redistributes new file writes to new bricks without migrating existing data:
[root@server01 ~]# gluster volume rebalance myvolume fix-layout start
volume rebalance: myvolume: success: Rebalance on myvolume has been started successfully.
[root@server01 ~]# gluster volume rebalance myvolume status
Node       status     run time in h:m:s
---------  ---------  -----------------
localhost  completed  0:0:0
server02   completed  0:0:0
server03   completed  0:0:0
server04   completed  0:0:0
volume rebalance: myvolume: success
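Scripts that need to wait for a rebalance to finish can test the status output for any node still running or failed. A sketch, assuming the status text format shown above:

```shell
# Succeed (exit 0) only if no node reports "in progress" or "failed" on stdin.
rebalance_done() {
    ! grep -Eq 'in progress|failed'
}
# Usage: gluster volume rebalance myvolume status | rebalance_done && echo done
```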
9.2 Fix Layout and Migrate Data
This redistributes the layout and migrates existing data to the new bricks:
[root@server01 ~]# gluster volume rebalance myvolume start
volume rebalance: myvolume: success: Rebalance on myvolume has been started successfully.
Adding force migrates files even when the destination brick has less free space than the source:
[root@server01 ~]# gluster volume rebalance myvolume start force
volume rebalance: myvolume: success
9.3 Stop Rebalance
[root@server01 ~]# gluster volume rebalance myvolume stop
- Disk Failure Handling
Method 1: Using Local Spare Disk
Assume server02's /dev/sdb1 has failed:
Step 1: Prepare new disk on failed node
[root@server02 ~]# mkfs.xfs -i size=512 /dev/sdd1
[root@server02 ~]# mkdir -p /data/sdd1
[root@server02 ~]# mount /dev/sdd1 /data/sdd1
[root@server02 ~]# echo "/dev/sdd1 /data/sdd1 xfs defaults 0 0" >> /etc/fstab
Step 2: Get extended attributes from healthy node
[root@server01 ~]# getfattr -d -m . -e hex /data/brick1/
Step 3: Mount volume and trigger self-heal
[root@server02 ~]# mount -t glusterfs server02:/myvolume /mnt
[root@server02 ~]# mkdir /mnt/tempdir
[root@server02 ~]# rmdir /mnt/tempdir
[root@server02 ~]# setfattr -n trusted.non-existent-key -v abc /mnt
[root@server02 ~]# setfattr -x trusted.non-existent-key /mnt
Step 4: Check current status
[root@server01 ~]# gluster volume heal myvolume info
Step 5: Replace failed brick with new disk
[root@server02 ~]# gluster volume replace-brick myvolume server02:/data/brick1 server02:/data/sdd1/brick commit force
volume replace-brick: success: replace-brick commit force operation successful
Method 2: Cross-Host Migration
When a disk fails and you want to use a different host's disk:
Step 1: Add new node to trusted pool
[root@server01 ~]# gluster peer probe server05
peer probe: success.
Step 2: Prepare disk on new node
[root@server05 ~]# mkdir -p /data/brick1 && mount /dev/sdb1 /data/brick1
[root@server05 ~]# echo "/dev/sdb1 /data/brick1 xfs defaults 0 0" >> /etc/fstab
[root@server05 ~]# mount -a && mount
Step 3: Replace brick across hosts
[root@server01 ~]# gluster volume replace-brick myvolume server02:/data/brick1 server05:/data/brick1 commit force
volume replace-brick: success: replace-brick commit force operation successful
Step 4: Later, when original disk is replaced, migrate data back
[root@server01 ~]# gluster volume replace-brick myvolume server05:/data/brick1 server02:/data/brick1 commit force
volume replace-brick: success: replace-brick commit force operation successful
- Volume Operations
11.1 Stop Volume
[root@server01 ~]# gluster volume stop myvolume
Stopping volume will make its data inaccessible. Do you want to continue? (y/n)
y
Stopping volume myvolume has been successful
11.2 Delete Volume
[root@server01 ~]# gluster volume delete myvolume
Deleting volume will erase all information about the volume. Do you want to continue? (y/n)
y
Deleting volume myvolume has been successful
- Self-Heal Operations
GlusterFS's self-heal daemon runs automatically every 10 minutes to synchronize replica bricks.
12.1 Trigger Self-Heal on Files Requiring Healing
# gluster volume heal myvolume
Heal operation on volume myvolume has been successful
12.2 Trigger Full Self-Heal
# gluster volume heal myvolume full
Heal operation on volume myvolume has been successful
12.3 View Files Needing Healing
# gluster volume heal myvolume info
Brick server01:/data/brick1
Number of entries: 0
Brick server02:/data/brick1
Number of entries: 101
/file1.txt
/file2.txt
...
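To get a single number for alerting, the pending entries across all bricks can be summed from the heal info output shown above:

```shell
# Sum every "Number of entries" line from `gluster volume heal <vol> info`.
pending_heals() {
    awk -F': ' '/^Number of entries:/ {sum += $2} END {print sum + 0}'
}
# Usage: gluster volume heal myvolume info | pending_heals
```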
12.4 View Self-Healed Files
# gluster volume heal myvolume info healed
12.5 View Failed Healing Files
# gluster volume heal myvolume info failed
12.6 View Split-Brain Files
# gluster volume heal myvolume info split-brain
- Mounting GlusterFS Volumes
13.1 Native GlusterFS Mount
[root@server01 ~]# mkdir -p /mnt/glusterfs
[root@server01 ~]# mount -t glusterfs server01:/myvolume /mnt/glusterfs
13.2 NFS Mount
GlusterFS's built-in NFS server supports NFSv3 only, and the volume option nfs.disable must be off (the volume info above shows it set to on):
[root@server01 ~]# mount -t nfs -o vers=3,mountproto=tcp server01:/myvolume /mnt/nfs
- GlusterFS Volume Types
14.1 Distributed Volume
Files are distributed across bricks using a hash algorithm; there is no redundancy:
# gluster volume create gv1 server01:/data/brick1 server02:/data/brick1 force
volume create: gv1: success: please start the volume to access data
14.2 Replicated Volume
Similar to RAID 1, provides high availability:
# gluster volume create gv2 replica 2 server01:/data/brick1 server02:/data/brick1 force
volume create: gv2: success: please start the volume to access data
14.3 Striped Volume
Similar to RAID 0, provides better performance for large files:
# gluster volume create gv3 stripe 2 server01:/data/brick1 server02:/data/brick1 force
volume create: gv3: success: please start the volume to access data
14.4 Distributed-Replicated Volume
Most commonly used in production environments:
# gluster volume create gv4 replica 2 server01:/data/brick1 server02:/data/brick1 server03:/data/brick1 server04:/data/brick1 force
volume create: gv4: success: please start the volume to access data
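With `replica N`, consecutive bricks in the create command form one replica set, so adjacent bricks must sit on different hosts or both copies land on the same machine. A hypothetical helper that prints the resulting pairing for a brick list:

```shell
# Group a brick list into replica sets the way `volume create replica N` does.
show_replica_sets() {
    local replica="$1"; shift
    local i=0
    for brick in "$@"; do
        echo "replica set $((i / replica + 1)): $brick"
        i=$((i + 1))
    done
}
# show_replica_sets 2 server01:/data/brick1 server02:/data/brick1 \
#                     server03:/data/brick1 server04:/data/brick1
```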
- Troubleshooting
Issue 1: Peer Shows "Disconnected" State
[root@server01 ~]# gluster peer status
Number of Peers: 1
Hostname: server02
Uuid: 61fe987a-99ff-419d-8018-90603ea16fe7
State: Peer in Cluster (Disconnected)
Solution: Check firewall status and /etc/hosts configuration. Ensure firewall is disabled or appropriate rules are configured.
Issue 2: Volume Start Fails with Extended Attribute Error
[root@server01 ~]# gluster volume start myvolume
volume start: myvolume: failed: Failed to get extended attribute trusted.glusterfs.volume-id for brick dir /data/brick1. Reason : No data available
Solution:
[root@server01 ~]# gluster volume delete myvolume
# Clean up extended attributes
[root@server01 ~]# setfattr -x trusted.glusterfs.volume-id /data/brick1
[root@server01 ~]# setfattr -x trusted.gfid /data/brick1
[root@server01 ~]# rm -rf /data/brick1/.glusterfs
# Recreate volume
[root@server01 ~]# gluster volume create myvolume replica 2 server01:/data/brick1 server02:/data/brick1
volume create: myvolume: success: please start the volume to access data
[root@server01 ~]# gluster volume start myvolume
volume start: myvolume: success
Issue 3: "Path Already Part of a Volume" Error
This occurs when trying to reuse a brick that was previously part of a volume:
[root@server01 ~]# gluster volume add-brick myvolume server03:/data/brick1
volume add-brick: failed: Pre Validation failed on server03. /data/brick1 is already part of a volume
Solution:
# On the affected node
[root@server03 ~]# setfattr -x trusted.glusterfs.volume-id /data/brick1
[root@server03 ~]# setfattr -x trusted.gfid /data/brick1
[root@server03 ~]# rm -rf /data/brick1/.glusterfs
# Restart glusterd
[root@server03 ~]# systemctl restart glusterd
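The cleanup above recurs in Issues 2 and 3, so it can be consolidated into one reusable function (destructive: it wipes GlusterFS metadata from the given brick path):

```shell
# Remove GlusterFS volume metadata so the path can be reused as a brick.
clean_brick() {
    local brick="$1"
    setfattr -x trusted.glusterfs.volume-id "$brick" 2>/dev/null || true
    setfattr -x trusted.gfid "$brick" 2>/dev/null || true
    rm -rf "${brick:?}/.glusterfs"   # :? guards against an empty argument
}
# clean_brick /data/brick1 && systemctl restart glusterd
```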
- Important Notes
- File System: XFS is recommended. GlusterFS works best with files of at least 16KB (optimal around 128KB). Structured data (SQL databases) is not supported.
- Hardware: GlusterFS is designed for commodity hardware. A basic cluster can run on two servers with 2 CPUs, 4GB RAM each, and 1Gbps network.
- Mixing Hardware: Nodes do not need identical hardware configurations.
- DNS: Proper DNS (forward and reverse) and NTP are essential for production deployments.
- Data Balance: After every volume expansion, run a rebalance operation to distribute data to the new bricks.