Introduction to LVM
What is LVM
LVM (Logical Volume Manager) originated as IBM's storage management solution for AIX. It introduces a logical abstraction layer between physical disks and partitions, allowing multiple partitions or entire disks to be combined into a single logical volume (functioning as a virtual hard drive). This approach significantly improves flexibility in disk partition management. Linux LVM support was initially added by Heinz Mauelshagen in the Linux 2.4 kernel, with LVM2 becoming the standard in Linux 2.6. Current stable releases are available through distribution repositories.
LVM was historically deployed primarily on servers, often paired with hardware RAID for high availability and flexible storage configuration. As disk capacities have grown and prices have dropped, desktop computers now commonly feature terabyte-scale storage, making LVM an attractive solution for managing large storage environments on personal computers. However, without hardware redundancy typically found in server environments, desktop LVM deployments face higher risk when disasters occur. Fortunately, LVM includes disaster recovery capabilities that can help minimize data loss.
To check the installed LVM version on your system:
rpm -qa | grep lvm
lvm2-2.02.56-8.el5_5.4
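On non-RPM systems, `lvm version` reports the same information. If a script needs the version programmatically, a minimal sketch (assuming the package string format shown above; `parse_lvm_pkg_version` is a hypothetical helper, not an LVM tool):

```shell
#!/bin/sh
# Extract "major.minor.patch" from an rpm package string such as
# "lvm2-2.02.56-8.el5_5.4" (the format printed by `rpm -qa`).
parse_lvm_pkg_version() {
    echo "$1" | sed -n 's/^lvm2-\([0-9]*\.[0-9]*\.[0-9]*\).*/\1/p'
}

parse_lvm_pkg_version "lvm2-2.02.56-8.el5_5.4"   # prints 2.02.56
```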
Core LVM Concepts: PV, LV, VG, and VGDA
Physical Volumes (PV)
Physical volumes form the foundation of LVM. Every logical volume and volume group must be built upon at least one physical volume. A PV can be an entire disk or a single partition, identified by a unique name.
Logical Volumes (LV)
Logical volumes reside on top of volume groups. A volume group can contain multiple logical volumes, and LV sizes can be dynamically adjusted within available VG space. Data on an LV may be scattered across multiple PVs in either contiguous or fragmented patterns.
Volume Groups (VG)
Volume groups aggregate one or more physical volumes into a storage pool from which logical volumes are allocated. On AIX, every physical volume used by LVM belongs to a volume group, with the system disks forming the root volume group (rootvg); on Linux, volume groups can be created and named freely.
Volume Group Descriptor Area (VGDA)
The VGDA stores metadata describing how physical volumes, volume groups, and logical volumes are configured. Similar to how traditional partitioning stores metadata in the partition table at the beginning of a disk, LVM stores its metadata in the VGDA located at the start of each physical volume.
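The four concepts above fit together bottom-up: disks become PVs, PVs pool into a VG, and LVs are carved out of the VG. A sketch of that sequence (device, group, and volume names are illustrative; the DRY_RUN guard, on by default, only prints the commands — set DRY_RUN=0 and run as root to execute for real):

```shell
#!/bin/sh
# Sketch: building the PV -> VG -> LV stack bottom-up. All names are
# illustrative. DRY_RUN=1 (the default here) only prints each command.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

run pvcreate /dev/sdb /dev/sdc        # label each disk as a physical volume
run vgcreate data /dev/sdb /dev/sdc   # pool both PVs into volume group "data"
run lvcreate -n storage -L 3.9G data  # allocate a logical volume from the pool
run mkfs -t ext3 /dev/data/storage    # the LV now behaves like any block device
run mount /dev/data/storage /mnt/data
```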
Disaster Recovery Strategy
Types of Disasters
Storage disasters fall into two categories: human-caused and natural. Human-caused disasters include accidental deletion, partition corruption, and damaged partition tables. Natural disasters involve hardware failures such as bad sectors, disk damage, or controller failures.
LVM faces the same natural disaster risks as conventional filesystems, but its virtualized storage model introduces additional human-caused failure modes specific to LVM. Since LVM consolidates all disks under unified management, corruption of volume groups, logical volumes, or physical volumes can occur.
Recovery Approach
Enterprise environments typically employ strict operational policies, comprehensive backup strategies, and hardware redundancy to mitigate disasters. Desktop users generally lack these safeguards. When disasters strike, the goal shifts to minimizing impact and recovering as much data as possible.
For human-caused LVM failures, LVM provides built-in tools for backing up data and configuration metadata. These backups enable recovery when PVs, LVs, or VGs become corrupted.
For natural disasters affecting disk hardware, LVM's recovery tools focus on restoring LVM functionality and salvaging data from unaffected disks. The same tools address inconsistencies caused by human errors that render logical volumes or entire volume groups inaccessible.
LVM Disaster Recovery Procedures
The following scenarios demonstrate recovery techniques in a test environment on an x86 PC: the system disk is /dev/sda, and additional data disks (/dev/sdb through /dev/sdd) are used across the scenarios, with all other hardware functioning normally.
Logical Volume Corruption
Logical volume corruption can result from accidental operations, power failures, malware, or direct disk manipulation. Many such failures are recoverable.
First, examine the current disk and filesystem state:
$ pvs
PV VG Fmt Attr PSize PFree
/dev/sdb data lvm2 a- 2.00G 92.00M
/dev/sdc data lvm2 a- 2.00G 0
$ vgs
VG #PV #LV #SN Attr VSize VFree
data 2 1 0 wz--n- 4.00G 92.00M
$ lvs -o +devices
LV VG Attr LSize Origin Snap% Move Log Copy% Devices
storage data -wi-ao 3.90G /dev/sdc(0)
storage data -wi-ao 3.90G /dev/sdb(0)
$ mount | grep '/dev/mapper'
/dev/mapper/data-storage on /mnt/data type reiserfs (rw,acl,user_xattr)
Scenario 1: LVM Metadata Loss Without Physical Disk Failure
Before proceeding, ensure your root partition is not managed by LVM. If LVM fails and root resides on LVM, the system cannot boot normally. While rescue mode from an installation disc can assist recovery, it introduces unnecessary complications. If root is on LVM, maintain regular backups of files in /etc/lvm/backup/ for system restoration.
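Keeping those backups current can be automated. A minimal sketch, assuming illustrative destination media (`backup_lvm_meta` is a hypothetical helper, not an LVM tool):

```shell
#!/bin/sh
# Keep dated copies of the LVM metadata directory on storage that
# survives a root-disk failure. Both default paths are illustrative.
backup_lvm_meta() {
    src=${1:-/etc/lvm/backup}
    dest=${2:-/mnt/usb/lvm-meta}
    mkdir -p "$dest" || return 1
    # Normal LVM startup rewrites files under $src, so archive with a
    # timestamp instead of overwriting a single mirror copy.
    tar -czf "$dest/lvm-meta-$(date +%Y%m%d%H%M%S).tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
}
```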
Attempting to reinitialize a PV containing an active logical volume will fail:
$ pvcreate -ff /dev/sdb
Really INITIALIZE physical volume "/dev/sdb" of volume group "data" [y/n]? y
Can't open /dev/sdb exclusively. Mounted filesystem?
The pvcreate command refuses to reinitialize a PV whose device is still held open by an active (here, mounted) logical volume; the associated LVs must be deactivated or removed first.
Now simulate LVM label corruption by erasing the LVM2 metadata while preserving the partition table using dd:
$ dd if=/dev/zero of=/dev/sdb bs=512 count=1 seek=1
1+0 records in
1+0 records out
512 bytes copied, 0.000098 s, 5.2 MB/s
$ pvs --partial
Partial mode. Incomplete volume groups will be activated read-only.
Couldn't find device with uuid '7m0QBR-3kNP-J0fq-GwAq-iv6u-LQkF-yhJh5R'.
PV VG Fmt Attr PSize PFree
/dev/sdc data lvm2 a- 2.00G 0
unknown device data lvm2 a- 2.00G 92.00M
$ vgs --partial
VG #PV #LV #SN Attr VSize VFree
data 2 1 0 rz-pn- 4.00G 92.00M
$ lvs --partial
LV VG Attr LSize Origin Snap% Move Log Copy%
storage data -wi-ao 3.90G
The corrupted disk becomes an unknown device, and the volume group metadata switches to read-only mode. However, the logical volume remains intact, and since the partition table is undamaged, the filesystem can still be mounted and accessed.
Remount the filesystem read-only and back up data immediately:
$ mount -o remount -o ro /mnt/data
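Copy the data somewhere safe while the filesystem is still readable. A minimal sketch (`salvage_copy` is a hypothetical helper; both paths are illustrative):

```shell
#!/bin/sh
# Copy everything off a read-only-remounted filesystem before any
# metadata repair is attempted. Both default paths are illustrative.
salvage_copy() {
    src=${1:-/mnt/data}
    dest=${2:-/backup/data-salvage}
    mkdir -p "$dest" || return 1
    # -a preserves ownership, permissions, and timestamps.
    cp -a "$src/." "$dest/"
}
```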
Begin the repair process:
# If "Can't open /dev/sdb exclusively" appears, deactivate the VG first
$ vgchange -an data
$ pvcreate -ff --uuid 7m0QBR-3kNP-J0fq-GwAq-iv6u-LQkF-yhJh5R \
--restorefile /etc/lvm/backup/data /dev/sdb
Physical volume "/dev/sdb" successfully created
$ pvs
PV VG Fmt Attr PSize PFree
/dev/sdb data lvm2 a- 2.00G 92.00M
/dev/sdc data lvm2 a- 2.00G 0
$ vgcfgrestore -f /etc/lvm/backup/data data
Restored volume group data
$ vgchange -ay data
1 logical volume(s) in volume group "data" now active
The volume group is restored. Run filesystem checks to verify integrity.
If the root partition had been on LVM, the system would fail to boot. Boot from an installation disc into rescue mode. If you need the original system running to retrieve backup files, use vgreduce --removemissing data to remove disks with corrupted labels, then boot normally. Use the backed-up metadata to perform restoration. Remember that normal LVM startup modifies files in /etc/lvm/backup/, so maintain external backups of this directory.
Without backups and with root on LVM, recovery becomes extremely difficult even if the physical disk is intact.
Scenario 2: Physical Volume Damage and Replacement
Check the current system state:
$ pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 main lvm2 a- 7.93G 0
/dev/sdb archive lvm2 a- 2.00G 0
/dev/sdc archive lvm2 a- 2.00G 0
/dev/sdd lvm2 -- 2.00G 2.00G
$ vgs
VG #PV #LV #SN Attr VSize VFree
main 1 2 0 wz--n- 7.93G 0
archive 2 1 0 wz--n- 3.99G 0
$ lvs
LV VG Attr LSize Origin Snap% Move Log Copy%
root main -wi-ao 7.38G
swap main -wi-ao 560.00M
files archive -wi-ao 3.99G
$ mount /dev/archive/files /archive
$ ls /archive
important.txt
$ umount /archive
A file exists in the archive volume.
Back up the LVM metadata:
$ cp /etc/lvm/backup/* /backup/lvm/
Simulate failure by zeroing the first 400 sectors of /dev/sdc:
$ dd if=/dev/zero of=/dev/sdc bs=512 count=400
400+0 records in
204800 bytes (205 kB) copied
$ pvs --partial
Partial mode. Incomplete volume groups will be activated read-only.
Couldn't find device with uuid 'noOzjf-8bk2-HJhC-92fl-2f6Q-DzNZ-XBqnBO'.
PV VG Fmt Attr PSize PFree
/dev/sda2 main lvm2 a- 7.93G 0
/dev/sdb archive lvm2 a- 2.00G 0
/dev/sdd lvm2 -- 2.00G 2.00G
unknown device archive lvm2 a- 2.00G 0
Attempt to mount the volume:
$ mount /dev/archive/files /archive
mount: you must specify the filesystem type
The filesystem is corrupted. Replace the damaged PV with the spare disk /dev/sdd:
$ pvcreate --restorefile /etc/lvm/backup/archive \
--uuid noOzjf-8bk2-HJhC-92fl-2f6Q-DzNZ-XBqnBO /dev/sdd
Physical volume "/dev/sdd" successfully created.
$ pvs -v
PV VG Fmt Attr PSize PFree DevSize PV UUID
/dev/sda2 main lvm2 a- 7.93G 0 7.93G HsErSz-vuAD-rEe1-wE5m-Mnvr-T0Re-EwO7rD
/dev/sdb archive lvm2 a- 2.00G 0 2.00G OBb3qb-TW97-hKZ7-vmsr-RZ2v-fzuP-qK3wmN
/dev/sdd archive lvm2 a- 2.00G 0 2.00G noOzjf-8bk2-HJhC-92fl-2f6Q-DzNZ-XBqnBO
The replacement is recognized. Synchronize metadata:
$ vgchange -an archive
Volume group archive metadata is inconsistent
Volume group for uuid not found:
tcyuXvjcxCN7912qAJlEYzphncdWabTJ21OTTrG6DrMS8dPF5Wlh04GOHyO5ClbY
1 logical volume(s) in volume group "archive" now active
$ reboot
$ vgchange -an archive
0 logical volume(s) in volume group "archive" now active
$ vgchange -ay archive
1 logical volume(s) in volume group "archive" now active
The metadata inconsistency is resolved and backups have been refreshed. Attempt to mount:
$ mount /dev/archive/files /archive
mount: you must specify the filesystem type
Repair the filesystem:
$ reiserfsck /dev/archive/files --check
$ reiserfsck /dev/archive/files --rebuild-sb
$ reiserfsck /dev/archive/files --check
$ reiserfsck /dev/archive/files --rebuild-tree
$ reiserfsck /dev/archive/files --check
The specific sequence depends on the severity of corruption. Verify the result:
$ mount /dev/archive/files /archive
$ ls /archive
important.txt lost+found
The file is recovered.
Disk Bad Sectors
Extended use, improper operation, or power-management faults can cause bad sectors. A typical symptom is the entire disk freezing whenever the affected area is accessed; if the operating system resides on that disk, crashes become inevitable. Bad sectors tend to spread, and every further access compounds the damage, which can eventually render the entire drive unrecoverable. Early detection and intervention are critical.
LVM implements three levels of bad block handling:
- Internal data relocation: Hardware-level relocation within the disk itself. Occurs transparently without user notification.
- LVM hardware relocation: LVM copies data from faulty address A to healthy address B. Reads from address A are transparently redirected by the disk firmware.
- Software relocation: LVM maintains a bad block table. Before reading from any address, LVM checks the table and redirects if necessary.
These mechanisms operate transparently unless disabled with lvcreate -r n, which prevents LVM from creating bad block relocation areas (BBRA). The boot, root, and primary swap logical volumes must be created with this parameter.
When LVM issues are suspected, the first action should be backing up all volume group data. If bad sectors already exist, exercise extreme caution with fsck. For damage beyond what relocation can recover, consider exporting the disk immediately (see Disk Position Changes) to keep the bad sectors from spreading. If the export itself runs into problems, treat the situation as a full disk failure.
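Early warning signs often show up in the kernel log before a full freeze. A small sketch that tallies I/O-error messages per disk from log text (`count_io_errors` is a hypothetical helper; exact kernel message wording varies by version):

```shell
#!/bin/sh
# Count kernel I/O-error messages per sd* device from log text on
# stdin, e.g. `dmesg | count_io_errors`. The patterns below cover the
# common message forms; wording differs across kernel versions.
count_io_errors() {
    grep -E 'I/O error|Medium Error' \
        | grep -oE 'sd[a-z]+' \
        | sort | uniq -c | sort -rn
}
```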
Disk Position Changes
Motherboard port failures, motherboard replacement, or adding new devices can change disk positions. Because the Master Boot Record (MBR) resides in the first sector of the boot disk, moving that disk out of the boot position prevents the system from starting. The fix is simple: make sure the disk containing the MBR remains the disk the BIOS boots from (it need not occupy the first physical port).
Disk Movement Within a System
For single volume group configurations, LVM automatically recognizes disk position changes. Normal system startup will work correctly. For systems with separate volume groups or multiple volume groups, deactivate volume groups before moving disks:
# Deactivate volume group
$ vgchange -an /dev/main
0 logical volume(s) in volume group "main" now active
$ lvscan
inactive '/dev/main/data'[1.90 GB] inherit
# After physical disk relocation, activate volume group
$ vgchange -ay /dev/main
1 logical volume(s) in volume group "main" now active
$ lvscan
ACTIVE '/dev/main/data'[1.90 GB] inherit
Disk Movement Between Systems
Moving disks between systems requires exporting and importing the volume group:
# Deactivate and export
$ vgchange -an /dev/main
0 logical volume(s) in volume group "main" now active
$ vgexport /dev/main
Volume group "main" successfully exported
The disk can now be physically moved to another system.
# Import and activate on target system
$ pvscan
PV /dev/sdc is in exported VG main [2.00 GB / 96.00 MB free]
Total: 1 [2.00 GB] / in use: 1 [2.00 GB] / in no VG: 0 [0]
$ vgimport /dev/main
Volume group "main" successfully imported
$ vgchange -ay /dev/main
1 logical volume(s) in volume group "main" now active
For volume groups spanning multiple disks where only some disks are moved, use pvmove to migrate data to specified disks first, then perform the physical relocation.
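That pvmove step can be sketched as follows (device and volume group names are illustrative; the DRY_RUN guard, on by default, only prints the commands — set DRY_RUN=0 and run as root to execute):

```shell
#!/bin/sh
# Evacuate one PV so part of a multi-disk VG can be moved separately.
# Device and VG names are illustrative. DRY_RUN=1 (the default here)
# only prints the commands.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

run pvmove /dev/sdb /dev/sdc   # migrate every extent off /dev/sdb onto /dev/sdc
run vgreduce main /dev/sdb     # drop the now-empty PV from volume group "main"
run pvremove /dev/sdb          # clear its LVM label; the disk can now move alone
```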
Disk Failure
Desktop computers typically use multiple disks for storage expansion rather than RAID protection. When a disk fails, users face not only data loss on that disk but also potential inaccessibility of the remaining disks. LVM's developers anticipated this scenario.
When the root partition is not on LVM, recovery follows standard disk replacement procedures: replace the failed disk, reinstall the system, and remount the original LVM partitions. This section focuses on scenarios where root resides on LVM.
Non-Root Disk Failure
Consider a system with Disk-A and Disk-B in a volume group named primary. Root and swap are on Disk-A, while multiple logical volumes for user data span both disks. After a reboot, the system fails to boot, and inspection reveals Disk-B has failed. The goal is to boot the system and recover data from Disk-A.
After removing the failed Disk-B, the system cannot boot. Output shows:
...
Reading all physical volumes. This may take a while...
Couldn't find device with uuid 'wWnmiu-IdIw-K1P6-u10C-A4j1-LQSZ-c08RDy'.
Couldn't find all physical volumes for volume group primary.
Volume group "primary" not found
...not found -- exiting to /bin/sh
The root partition on LVM cannot be read when a member disk is missing, causing the LVM inconsistency that blocks boot. Restoring LVM consistency will enable reading the root partition from Disk-A.
Boot from an installation disc into emergency recovery mode:
Rescue Login: root
Rescue:~#
Check the current state:
Rescue:~# lvscan
Reading all physical volumes. This may take a while...
Couldn't find device with uuid 'wWnmiu-IdIw-K1P6-u10C-A4j1-LQSZ-c08RDy'.
Volume group "primary" not found
Rescue:~# pvscan
PV /dev/sda2 VG primary lvm2 [7.93 GB / 0 free]
PV unknown VG primary lvm2 [2.00 GB / 2.00 GB free]
Total: 2 [9.92 GB] / in use: 2 [9.92 GB] / in no VG: 0 [0]
The system detects the missing disk but can still recognize the physical disk and identify the missing physical volume. Standard filesystem mounting in rescue mode won't work because all logical volumes depend on the inaccessible volume group. The solution is to remove the missing physical volume from the volume group.
Use vgreduce --removemissing to remove all missing physical volumes from the group:
Rescue:~# vgreduce --removemissing primary
Wrote out consistent volume group primary
Rescue:~# vgscan
Found volume group "primary" using metadata type lvm2
Rescue:~# pvscan
PV /dev/sda2 VG primary lvm2 [7.93 GB / 0 free]
Total: 1 [7.93 GB] / in use: 1 [7.93 GB] / in no VG: 0 [0]
The volume group is now accessible. Logical volumes remain inactive and require manual activation:
Rescue:~# lvscan
inactive '/dev/primary/root'[7.38 GB] inherit
inactive '/dev/primary/swap'[560.00 MB] inherit
Rescue:~# lvchange -ay /dev/primary
ACTIVE '/dev/primary/root'[7.38 GB] inherit
ACTIVE '/dev/primary/swap'[560.00 MB] inherit
Reboot the system. It will start normally, with only the failed disk permanently lost.
Root Disk Failure
When the disk containing the root partition fails, the approach is to install the surviving disks in another machine. Running fdisk -l shows the disk lacks a valid partition table. LVM diagnostics show the same errors as non-root failures. Following the non-root disk failure recovery procedure will restore the volume group, after which the restored volume group can be mounted for normal read/write operations.