01 Sep 2013, 17:03

Replace a failed HDD in an SW RAID

Share

This post describes how to replace an failed HDD from an Linux Software-RAID while fixing GRUB. This is mostly for my own reference but posted here in the hope that someone may find it useful.

If one of your HDDs in a Software-RAID failed and your system is still running you should first mark all partitions of this device as failed. Let’s assume that /dev/sdb failed.

cat /proc/mdstat # check RAID status# mark the devices as failed, neccessary to remove them
mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm --manage /dev/md1 --fail /dev/sdb2
mdadm --manage /dev/md2 --fail /dev/sdb3
cat /proc/mdstat # check RAID status
# remove the failed devices
mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm --manage /dev/md1 --remove /dev/sdb2
mdadm --manage /dev/md2 --remove /dev/sdb3

Now you should replace the HDD. Usually this means shutting the system down - unless you have a very rare setup which allows for hot-plug of non-HW-RAID disks.

If the failed disk was part of the OS paritions you’ll proably need to boot into some kind of rescue system first to perform the follwing steps.

After the system is back up you have to copy the partition table from the old disk to the new one. This used to be done w/ sfdisk, however since HDDs are getting to big for MBR to handle many system are switching over to using GPT which isn’t handled by classic UNIX/Linux HDD tools. Thus we’ll be using GParted.

sgdisk -R=/dev/sdb /dev/sdb # copy partition table, be sure to get the devices right
sgdisk -G /dev/sdb # randomize GUID to avoid conflicts w/ first disk

Now you can re-assemable the RAID devices:

mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --manage /dev/md1 --add /dev/sdb2
mdadm --manage /dev/md2 --add /dev/sdb3

You should let the system run until the RAID arrays have re-synced. You can check the status in /proc/mdstat.

Perhaps your bootloader got corrupted as well, just in case this are the steps necessary to re-install GRUB.

mount /dev/md1 /mnt # mount your /
mount /dev/md0 /mnt/boot # mount your /boot
mount -o bind /dev /mnt/dev # mount /dev, needed by grub-install
mount -o bind /sys /mnt/sys # mount /sys, needed by grub-install
mount -t proc /proc /mnt/proc
cp /proc/mounts /mnt/etc/mtab
chroot /mnt /bin/bash
grub-install --recheck /dev/sda
grub-install --recheck /dev/sdb
update-grub
exit # leave chroot

Now you should wait until all MD Arrays are back in shape (check /proc/mdstat), then reboot.

reboot # reboot into your fixed system