Tag: mdadm

So, how do I handle repairing a RAID-5 in a server I can’t touch?

Two drives failed in a 5-disk RAID-5 array at a client who had dropped our services. Fortunately, I’d put in a backup system, so when they brought us back on, I had a full backup from midnight to restore from. Unfortunately, only one replacement drive was on order for various reasons, and they needed the server back up as soon as possible. So here’s the million-dollar question:

How do you reconstruct a RAID-5 array that used to span 5 drives when only 3 good drives remain? Consider the following:

  • The original array was 5x 500GB drives, of which only 3 remain
  • The backup data is about 500GB (so a plain RAID-1 across the remaining data partitions, which would only give a single drive’s worth of space, isn’t an option)
  • The Linux boot/root filesystem sits on a RAID-1 across the first partitions of all five hard drives (so it keeps working as long as at least one drive survives)
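
Before touching anything, it pays to confirm exactly what is still healthy. A minimal check, assuming the three survivors really are sda, sdb and sdc on this box (adjust the device names for your own system), would be something like:

cat /proc/mdstat                # overall state of every md array
mdadm --detail /dev/md0         # member-by-member view of the boot/root RAID-1
mdadm --examine /dev/sd[abc]2   # superblocks on the old RAID-5 member partitions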

Since the boot/root RAID-1 is still redundant with three good members, that part isn’t a concern. The client doesn’t have time to wait for a spare drive or two to arrive, but we do want a spare on order anyway, partly for possible future capacity needs and partly so we can pull off the next step. So what was the best compromise? Simple!

mdadm --create /dev/md1 -l 5 -n 4 /dev/sd[abc]2 missing

This creates a new 4-drive RAID-5 on the data partitions of the three surviving drives, intentionally degraded with one slot left “missing” so the fourth disk can be dropped in when it arrives. From there, I wrote a new XFS filesystem on it, mounted it, and restored the data.
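
In rough strokes, those last steps amounted to something like the following; the mount point and the rsync restore command are placeholders rather than the exact ones used on this job.

mkfs.xfs /dev/md1                         # new filesystem on the (degraded) array
mount /dev/md1 /srv/data                  # /srv/data stands in for the real mount point
rsync -aHAX /path/to/backup/ /srv/data/   # restore from wherever last night's backup lives

Here is what /proc/mdstat looks like now on that system (note that no reboot was needed for any of these repairs):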

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sdc2[2] sdb2[1] sda2[0]
      1406519040 blocks super 1.2 level 5, 128k chunk, algorithm 2 [4/3] [UUU_]

md0 : active raid1 sda1[0] sdb1[1] sdc1[4] sdd1[5](F)
      19542976 blocks [5/3] [UU__U]

unused devices: <none>
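
Once the replacement drive arrives, the arrays can be healed in place, again without a reboot. Assuming the new disk shows up as /dev/sdd, gets the same partition layout as the others, and that these are classic MBR partition tables (use sgdisk instead for GPT), the finishing moves would look roughly like this:

sfdisk -d /dev/sda | sfdisk /dev/sdd    # clone the partition table onto the new disk
mdadm /dev/md0 --remove failed          # drop the stale failed member from the boot/root mirror
mdadm /dev/md0 --add /dev/sdd1          # resync the RAID-1 onto the new first partition
mdadm /dev/md1 --add /dev/sdd2          # fill the "missing" slot so the RAID-5 rebuilds to full strength
cat /proc/mdstat                        # watch the recovery progress

The RAID-5 will grind through parity reconstruction for a while, but the filesystem stays mounted and usable the entire time.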