LVM Disaster Recovery on Desktop Computers

Introduction to LVM

What is LVM

LVM (Logical Volume Manager) originated as IBM's storage management solution for AIX. It introduces a logical abstraction layer between physical disks and partitions, allowing multiple partitions or entire disks to be combined into a single logical volume (functioning as a virtual hard drive). This approach significantly improves flexibility in disk partition management. Linux LVM support was initially added by Heinz Mauelshagen in the Linux 2.4 kernel, with LVM2 becoming the standard in Linux 2.6. Current stable releases are available through distribution repositories.

LVM was historically deployed primarily on servers, often paired with hardware RAID for high availability and flexible storage configuration. As disk capacities have grown and prices have dropped, desktop computers now commonly feature terabyte-scale storage, making LVM an attractive solution for managing large storage environments on personal computers. However, without hardware redundancy typically found in server environments, desktop LVM deployments face higher risk when disasters occur. Fortunately, LVM includes disaster recovery capabilities that can help minimize data loss.

To check the installed LVM version on your system:

$ rpm -qa | grep lvm
lvm2-2.02.56-8.el5_5.4

Core LVM Concepts: PV, LV, VG, and VGDA

Physical Volumes (PV)

Physical volumes form the foundation of LVM. Every logical volume and volume group must be built upon at least one physical volume. A PV can be an entire disk or a single partition, identified by a unique name.

Logical Volumes (LV)

Logical volumes reside on top of volume groups. A volume group can contain multiple logical volumes, and LV sizes can be dynamically adjusted within available VG space. Data on an LV may be scattered across multiple PVs in either contiguous or fragmented patterns.

Volume Groups (VG)

Volume groups aggregate one or more physical volumes. On many systems every physical volume belongs to a single volume group (traditionally named rootvg on AIX), but a Linux system may carry any number of independently named volume groups, as the examples later in this article show.
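
As a concrete illustration, the following sketch builds this stack from scratch with the standard LVM2 tools. The device names, volume group name, and filesystem mirror the examples used later in this article and should be adapted to your own hardware:

# Initialize two disks as physical volumes
$ pvcreate /dev/sdb /dev/sdc
# Aggregate them into a volume group named "data"
$ vgcreate data /dev/sdb /dev/sdc
# Create one logical volume spanning all free space in the group
$ lvcreate -l 100%FREE -n storage data
# Put a filesystem on the new logical volume and mount it
$ mkreiserfs /dev/data/storage
$ mount /dev/data/storage /mnt/data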

Volume Group Descriptor Area (VGDA)

The VGDA stores metadata describing how physical volumes, volume groups, and logical volumes are configured. Similar to how traditional partitioning stores metadata in the partition table at the beginning of a disk, LVM stores its metadata in the VGDA located at the start of each physical volume.
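
Because LVM2 stores this metadata as plain text near the start of each physical volume, the VGDA can be inspected directly. A rough sketch follows; the device name is illustrative, and the 2 MB read is simply a generous window over the label and metadata area:

# The LVM2 label ("LABELONE") sits in the second 512-byte sector of the PV;
# the volume group configuration follows as readable text in the metadata area
$ dd if=/dev/sdb bs=512 count=4096 2>/dev/null | strings | less

# Check the label and metadata areas for consistency
$ pvck /dev/sdb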

Disaster Recovery Strategy

Types of Disasters

Storage disasters fall into two categories: human-caused and natural. Human-caused disasters include accidental deletion, partition corruption, and damaged partition tables. Natural disasters involve hardware failures such as bad sectors, disk damage, or controller failures.

LVM faces the same natural disaster risks as conventional filesystems, but its virtualized storage model introduces additional human-caused failure modes specific to LVM. Since LVM consolidates all disks under unified management, corruption of volume groups, logical volumes, or physical volumes can occur.

Recovery Approach

Enterprise environments typically employ strict operational policies, comprehensive backup strategies, and hardware redundancy to mitigate disasters. Desktop users generally lack these safeguards. When disasters strike, the goal shifts to minimizing impact and recovering as much data as possible.

For human-caused LVM failures, LVM provides built-in tools for backing up data and configuration metadata. These backups enable recovery when PVs, LVs, or VGs become corrupted.
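
A minimal sketch of those built-in tools, assuming a volume group named data: vgcfgbackup saves the current metadata, and vgcfgrestore --list shows what can later be restored:

# Save the current metadata of volume group "data" (default target: /etc/lvm/backup/data)
$ vgcfgbackup data

# List the archived metadata versions available for vgcfgrestore
$ vgcfgrestore --list data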

For natural disasters affecting disk hardware, LVM's recovery tools focus on restoring LVM functionality and salvaging data from unaffected disks. The same tools address inconsistencies caused by human errors that render logical volumes or entire volume groups inaccessible.

LVM Disaster Recovery Procedures

The following scenarios demonstrate recovery techniques in a test environment on an x86 PC with several disks (/dev/sda through /dev/sdd); all other hardware is functioning normally.

Logical Volume Corruption

Logical volume corruption can result from accidental operations, power failures, malware, or direct disk manipulation. Many such failures are recoverable.

First, examine the current disk and filesystem state:

$ pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/sdb   data   lvm2 a-   2.00G 92.00M
  /dev/sdc   data   lvm2 a-   2.00G     0

$ vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  data     2   1   0 wz--n- 4.00G 92.00M

$ lvs -o +devices
  LV      VG     Attr   LSize   Origin Snap%  Move Log Copy% Devices
  storage  data   -wi-ao   3.90G                      /dev/sdc(0)
  storage  data   -wi-ao   3.90G                      /dev/sdb(0)

$ mount | grep '/dev/mapper'
/dev/mapper/data-storage on /mnt/data type reiserfs (rw,acl,user_xattr)

Scenario 1: LVM Metadata Loss Without Physical Disk Failure

Before proceeding, ensure your root partition is not managed by LVM. If LVM fails and root resides on LVM, the system cannot boot normally. While rescue mode from an installation disc can assist recovery, it introduces unnecessary complications. If root is on LVM, maintain regular backups of files in /etc/lvm/backup/ for system restoration.
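
One simple approach, assuming a USB drive at /dev/sde1 and a mount point of /mnt/usb (both illustrative), is to copy the backup and archive metadata directories to external media:

# Copy the LVM metadata backups and archives to external media
$ mount /dev/sde1 /mnt/usb
$ mkdir -p /mnt/usb/lvm-meta
$ cp -a /etc/lvm/backup /etc/lvm/archive /mnt/usb/lvm-meta/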

Attempting to reinitialize a PV containing an active logical volume will fail:

$ pvcreate -ff /dev/sdb
Really INITIALIZE physical volume "/dev/sdb" of volume group "data" [y/n]? y
Can't open /dev/sdb exclusively.  Mounted filesystem?

The pvcreate command refuses to reinitialize PVs with active logical volumes. You must remove associated LVs first.

Now simulate LVM label corruption by erasing the LVM2 metadata while preserving the partition table using dd:

$ dd if=/dev/zero of=/dev/sdb bs=512 count=1 seek=1
1+0 records in
1+0 records out
512 bytes copied, 0.000098 s, 5.2 MB/s

$ pvs --partial
Partial mode. Incomplete volume groups will be activated read-only.
Couldn't find device with uuid '7m0QBR-3kNP-J0fq-GwAq-iv6u-LQkF-yhJh5R'.
PV             VG     Fmt  Attr PSize PFree
/dev/sdc       data   lvm2 a-   2.00G     0
unknown device data   lvm2 a-   2.00G 92.00M

$ vgs --partial
VG     #PV #LV #SN Attr   VSize  VFree
data     2   1   0 rz-pn- 4.00G 92.00M

$ lvs --partial
LV      VG     Attr   LSize   Origin Snap%  Move Log Copy%
storage  data   -wi-ao   3.90G

The corrupted disk becomes an unknown device, and the volume group metadata switches to read-only mode. However, the logical volume remains intact, and since the partition table is undamaged, the filesystem can still be mounted and accessed.

Remount the filesystem read-only and back up data immediately:

$ mount -o remount -o ro /mnt/data
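
How you back the data up is up to you; one minimal sketch, assuming a separate disk is mounted at /backup, is a compressed tar archive of the read-only volume:

# Archive the still-readable data to another disk before attempting any repair
$ tar -C /mnt/data -czf /backup/data-before-repair.tar.gz .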

Begin the repair process:

# If "Can't open /dev/sdb exclusively" appears, deactivate the VG first
$ vgchange -an data

$ pvcreate -ff --uuid 7m0QBR-3kNP-J0fq-GwAq-iv6u-LQkF-yhJh5R \
  --restorefile /etc/lvm/backup/data /dev/sdb
Physical volume "/dev/sdb" successfully created

$ pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/sdb   data   lvm2 a-   2.00G 92.00M
  /dev/sdc   data   lvm2 a-   2.00G     0

$ vgcfgrestore -f /etc/lvm/backup/data data
Restored volume group data

$ vgchange -ay data
1 logical volume(s) in volume group "data" now active

The volume group is restored. Run filesystem checks to verify integrity.
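
For the reiserfs volume used in this example, a non-destructive check might look like the following; the device path corresponds to the data/storage volume shown earlier:

# Unmount, run a read-only consistency check, then remount
$ umount /mnt/data
$ reiserfsck /dev/data/storage --check
$ mount /dev/data/storage /mnt/data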

If the root partition had been on LVM, the system would fail to boot. In that case, boot from an installation disc into rescue mode. If the original system must be booted first to retrieve backup files, run vgreduce --removemissing data to drop the disks whose labels were corrupted, boot normally, and then perform the restoration from the backed-up metadata. Keep in mind that a normal LVM startup rewrites the files in /etc/lvm/backup/, so maintain copies of this directory on external media.

Without backups and with root on LVM, recovery becomes extremely difficult even if the physical disk is intact.

Scenario 2: Physical Volume Damage and Replacement

Check the current system state:

$ pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/sda2  main   lvm2 a-   7.93G    0
  /dev/sdb   archive lvm2 a-   2.00G    0
  /dev/sdc   archive lvm2 a-   2.00G    0
  /dev/sdd          lvm2 --   2.00G 2.00G

$ vgs
  VG      #PV #LV #SN Attr   VSize VFree
  main      1   2   0 wz--n- 7.93G    0
  archive   2   1   0 wz--n- 3.99G    0

$ lvs
  LV      VG      Attr   LSize   Origin Snap%  Move Log Copy%
  root    main    -wi-ao   7.38G
  swap    main    -wi-ao 560.00M
  files   archive -wi-ao   3.99G

$ mount /dev/archive/files /archive
$ ls /archive
important.txt
$ umount /archive

A file exists in the archive volume.

Back up the LVM metadata:

$ cp /etc/lvm/backup/* /backup/lvm/

Simulate failure by zeroing the first 400 sectors of /dev/sdc:

$ dd if=/dev/zero of=/dev/sdc bs=512 count=400
400+0 records in
204800 bytes (205 kB) copied

$ pvs --partial
Partial mode. Incomplete volume groups will be activated read-only.
Couldn't find device with uuid 'noOzjf-8bk2-HJhC-92fl-2f6Q-DzNZ-XBqnBO'.
PV             VG      Fmt  Attr PSize PFree
/dev/sda2      main    lvm2 a-   7.93G    0
/dev/sdb       archive lvm2 a-   2.00G    0
/dev/sdd               lvm2 --   2.00G 2.00G
unknown device archive lvm2 a-   2.00G    0

Attempt to mount the volume:

$ mount /dev/archive/files /archive
mount: you must specify the filesystem type

The filesystem is corrupted. Replace the damaged PV with the spare disk /dev/sdd:

$ pvcreate --restorefile /etc/lvm/backup/archive \
  --uuid noOzjf-8bk2-HJhC-92fl-2f6Q-DzNZ-XBqnBO /dev/sdd
Physical volume "/dev/sdd" successfully created.

$ pvs -v
  PV         VG      Fmt  Attr PSize PFree DevSize PV UUID
  /dev/sda2  main    lvm2 a-   7.93G    0    7.93G HsErSz-vuAD-rEe1-wE5m-Mnvr-T0Re-EwO7rD
  /dev/sdb   archive lvm2 a-   2.00G    0    2.00G OBb3qb-TW97-hKZ7-vmsr-RZ2v-fzuP-qK3wmN
  /dev/sdd   archive lvm2 a-   2.00G    0    2.00G noOzjf-8bk2-HJhC-92fl-2f6Q-DzNZ-XBqnBO

The replacement is recognized. Synchronize metadata:

$ vgchange -an archive
Volume group archive metadata is inconsistent
Volume group for uuid not found:
   tcyuXvjcxCN7912qAJlEYzphncdWabTJ21OTTrG6DrMS8dPF5Wlh04GOHyO5ClbY
1 logical volume(s) in volume group "archive" now active

$ reboot

$ vgchange -an archive
0 logical volume(s) in volume group "archive" now active
$ vgchange -ay archive
1 logical volume(s) in volume group "archive" now active

The metadata inconsistency is resolved and backups have been refreshed. Attempt to mount:

$ mount /dev/archive/files /archive
mount: you must specify the filesystem type

Repair the filesystem:

$ reiserfsck /dev/archive/files --check
$ reiserfsck /dev/archive/files --rebuild-sb
$ reiserfsck /dev/archive/files --check
$ reiserfsck /dev/archive/files --rebuild-tree
$ reiserfsck /dev/archive/files --check

The specific sequence depends on the severity of corruption. Verify the result:

$ mount /dev/archive/files /archive
$ ls /archive
important.txt  lost+found

The file is recovered.

Disk Bad Sectors

Extended use, improper operation, or power management problems can cause bad sectors. A typical symptom is the entire disk stalling whenever an affected area is accessed; if the operating system resides on that disk, crashes become unavoidable. Bad sectors also tend to spread, and every additional access to a damaged area compounds the problem, eventually rendering the entire drive unrecoverable. Early detection and intervention are critical.

LVM implements three levels of bad block handling:

  1. Internal data relocation: Hardware-level relocation within the disk itself. Occurs transparently without user notification.
  2. LVM hardware relocation: LVM copies data from faulty address A to healthy address B. Reads from address A are transparently redirected by the disk firmware.
  3. Software relocation: LVM maintains a bad block table. Before reading from any address, LVM checks the table and redirects if necessary.

These mechanisms operate transparently unless disabled with lvcreate -r n. That parameter prevents LVM from creating bad block relocation areas (BBRA); the boot, root, and primary swap logical volumes must be created with it.

When LVM problems are suspected, the first step is to back up all data in the volume group. If bad sectors are already present, use fsck with extreme caution. For damage beyond what relocation can recover, consider exporting the disk immediately (see Disk Position Changes) to keep bad sectors from spreading; if the export itself runs into problems, treat the situation as a disk failure (see Disk Failure).

Disk Position Changes

Motherboard port failures, motherboard replacement, or adding new devices can change disk positions. Since the Master Boot Record (MBR) resides in the first sector of the boot disk, moving that disk out of the boot position prevents the system from starting. Ensure the disk containing the MBR remains the one the BIOS boots from (not necessarily the first port); the other disks can be repositioned freely.

Disk Movement Within a System

For single volume group configurations, LVM automatically recognizes disk position changes and a normal system startup works correctly. When multiple volume groups are involved, deactivate the affected volume group before moving its disks:

# Deactivate volume group
$ vgchange -an /dev/main
0 logical volume(s) in volume group "main" now active
$ lvscan
inactive '/dev/main/data'[1.90 GB] inherit

# After physical disk relocation, activate volume group
$ vgchange -ay /dev/main
1 logical volume(s) in volume group "main" now active
$ lvscan
ACTIVE '/dev/main/data'[1.90 GB] inherit

Disk Movement Between Systems

Moving disks between systems requires exporting and importing the volume group:

# Deactivate and export
$ vgchange -an /dev/main
0 logical volume(s) in volume group "main" now active
$ vgexport /dev/main
Volume group "main" successfully exported

The disk can now be physically moved to another system.

# Import and activate on target system
$ pvscan
PV /dev/sdc is in exported VG main [2.00 GB / 96.00 MB free]
Total: 1 [2.00 GB] / in use: 1 [2.00 GB] / in no VG: 0 [0]

$ vgimport /dev/main
Volume group "main" successfully imported

$ vgchange -ay /dev/main
1 logical volume(s) in volume group "main" now active

For volume groups spanning multiple disks where only some disks are moved, use pvmove to migrate data to specified disks first, then perform the physical relocation.
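
A rough sketch of that sequence, assuming the extents on /dev/sdb must be evacuated to /dev/sdc within the volume group main before /dev/sdb is physically moved or retired:

# Move all extents off /dev/sdb onto /dev/sdc
$ pvmove /dev/sdb /dev/sdc

# Shrink the volume group so it no longer references /dev/sdb
$ vgreduce main /dev/sdb
$ pvremove /dev/sdb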

Disk Failure

Desktop computers typically use multiple disks for storage expansion rather than RAID protection. When a disk fails, users face not only data loss on that disk but also potential inaccessibility of the remaining disks. LVM's developers anticipated this scenario.

When the root partition is not on LVM, recovery follows standard disk replacement procedures: replace the failed disk, reinstall the system, and remount the original LVM partitions. This section focuses on scenarios where root resides on LVM.

Non-Root Disk Failure

Consider a system with Disk-A and Disk-B in a volume group named primary. Root and swap are on Disk-A, while multiple logical volumes for user data span both disks. After a reboot, the system fails to boot, and inspection reveals Disk-B has failed. The goal is to boot the system and recover data from Disk-A.

After removing the failed Disk-B, the system cannot boot. Output shows:

...
Reading all physical volumes. This may take a while...
Couldn't find device with uuid 'wWnmiu-IdIw-K1P6-u10C-A4j1-LQSZ-c08RDy'.
Couldn't find all physical volumes for volume group primary.
Volume group "primary" not found
...not found -- exiting to /bin/sh

With a member disk missing, the volume group is inconsistent and cannot be activated, so the root logical volume on Disk-A cannot be read and the boot stops. Restoring the volume group to a consistent state will make the root partition on Disk-A readable again.

Boot from an installation disc into emergency recovery mode:

Rescue Login: root
Rescue:~#

Check the current state:

Rescue:~# lvscan
Reading all physical volumes. This may take a while...
Couldn't find device with uuid 'wWnmiu-IdIw-K1P6-u10C-A4j1-LQSZ-c08RDy'.
Volume group "primary" not found

Rescue:~# pvscan
PV /dev/sda2 VG primary lvm2 [7.93 GB / 0 free]
PV unknown VG primary lvm2 [2.00 GB / 2.00 GB free]
Total: 2 [9.92 GB] / in use: 2 [9.92 GB] / in no VG: 0 [0]

The system detects the missing disk but can still recognize the physical disk and identify the missing physical volume. Standard filesystem mounting in rescue mode won't work because all logical volumes depend on the inaccessible volume group. The solution is to remove the missing physical volume from the volume group.

Use vgreduce --removemissing to remove all missing physical volumes from the group:

Rescue:~# vgreduce --removemissing primary
Wrote out consistent volume group primary

Rescue:~# vgscan
Found volume group "primary" using metadata type lvm2

Rescue:~# pvscan
PV /dev/sda2 VG primary lvm2 [7.93 GB / 0 free]
Total: 1 [7.93 GB] / in use: 1 [7.93 GB] / in no VG: 0 [0]

The volume group is now accessible. Logical volumes remain inactive and require manual activation:

Rescue:~# lvscan
inactive '/dev/primary/root'[7.38 GB] inherit
inactive '/dev/primary/swap'[560.00 MB] inherit

Rescue:~# lvchange -ay /dev/primary
ACTIVE '/dev/primary/root'[7.38 GB] inherit
ACTIVE '/dev/primary/swap'[560.00 MB] inherit

Reboot the system. It will start normally, with only the failed disk permanently lost.

Root Disk Failure

When the disk containing the root partition fails, the approach is to install the surviving disks in another machine. On that machine, fdisk -l reports no valid partition table on the surviving disk (expected for a whole-disk physical volume), and LVM diagnostics show the same errors as in the non-root failure case. Following the non-root disk failure recovery procedure restores the volume group, after which the surviving logical volumes can be mounted for normal read/write access.
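
In outline, the commands on the receiving machine are the same as in the non-root case. The logical volume name files and the mount point /mnt/recovered below are illustrative, and only logical volumes residing entirely on the surviving disks remain available after the missing PV is removed:

# On the machine that received the surviving disk(s)
$ vgreduce --removemissing primary
$ vgchange -ay primary
$ mkdir -p /mnt/recovered
$ mount /dev/primary/files /mnt/recovered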

Tags: LVM Disaster Recovery Linux storage logical volume Data Recovery

Posted on Wed, 13 May 2026 11:22:03 +0000 by bengaltgrs