[BBLISA] Notes on RAID recovery Re: Whatever happened to Seagate?

Rich Braun richb at pioneer.ci.net
Wed May 6 16:27:05 EDT 2015


Ned Harvey noted about the SSD market:
> And they're relatively cheap - $200 for 512G or $300 for
> 1T... As far as I'm concerned, HDD's are now dead for all purposes

I wouldn't want to be an early adopter of a low-price SSD cluster, so I'll
wait, but hopefully the batch of 4TB/6TB drives I just bought will be my last.
In 3 to 5 years, hopefully SSD prices will come down enough that I'll come to
Ned's conclusion.

As I mop up the pieces, I thought I'd share a couple of software-RAID Linux
features here that I hadn't been using until now:

# echo check > /sys/block/mdN/md/sync_action
# echo want_replacement > /sys/block/mdN/md/dev-sdXN/state

Google is terrible at providing advice on these matters so it took some
digging. Here's what these do and why you want to use them (along with a
Nagios script to query SMART counters from each drive's firmware; I couldn't
find a good SMART script so I customized a mediocre one for my needs):

* Once a week, you should force the system to read (or "scrub") every block of
every drive. The "echo check" command above should be run from a script under
cron.  That way the drives' firmware will detect bit rot at the earliest
opportunity, and you'll face less risk of multiple-drive failures.

* Whenever you're swapping out a drive, *don't* use the "mdadm --fail" command
to force resilvering to a replacement drive, especially if the software-RAID
volume is already in sync. Use "want_replacement" to resilver to the new drive
(after you've added it with "mdadm --add"); the old drive then stays in the
array until the copy finishes, so bit rot on an as-yet-undetected bad sector
won't render your array unreadable.  (Newer versions of mdadm have a --replace
option to do this for you.)
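
The two steps above could be wrapped in a small script along these lines. This
is only a sketch under stated assumptions: the names MD_SYSFS, scrub_all and
request_replacement are made up for illustration, and the script assumes the
arrays appear as /sys/block/md*. It has to run as root on a real system.

```shell
#!/bin/sh
# Illustrative sketch; MD_SYSFS, scrub_all and request_replacement are
# hypothetical names, not part of mdadm or the kernel.
MD_SYSFS="${MD_SYSFS:-/sys/block}"

# Start a "check" (scrub) pass on every md array that is currently idle.
# Arrays that are already resyncing or recovering are left alone.
scrub_all() {
    for action in "$MD_SYSFS"/md*/md/sync_action; do
        [ -e "$action" ] || continue        # glob matched nothing: no arrays
        if [ "$(cat "$action")" = "idle" ]; then
            echo check > "$action"
        fi
    done
}

# Ask md to rebuild onto an available spare while the old member stays
# in service.  $1 = array name (e.g. md0), $2 = member (e.g. sdb1).
request_replacement() {
    state="$MD_SYSFS/$1/md/dev-$2/state"
    if [ ! -e "$state" ]; then
        echo "no such member: $1/$2" >&2
        return 1
    fi
    echo want_replacement > "$state"
}
```

For the weekly scrub, one option is to append a "scrub_all" call to the script
and run it from a crontab entry such as "0 3 * * 0" (Sundays at 03:00; the
schedule is arbitrary). For a swap, "request_replacement md0 sdb1" would be
run after the new drive has been added as a spare; md0 and sdb1 are
placeholder names.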

I also haven't started using ZFS. A lot of people are adopting it to
automatically take care of most of these issues.

-rich







