[BBLISA] Notes on RAID recovery Re: Whatever happened to Seagate?

Edward Ned Harvey (bblisa4) bblisa4 at nedharvey.com
Thu May 7 07:06:48 EDT 2015


> From: Rich Braun [mailto:richb at pioneer.ci.net]
> 
> I wouldn't want to be an early-adopter of a low-price SSD cluster so I'll
> wait, 
> 
> * Whenever you're swapping out a drive, *don't* use the "mdadm --fail"
> command
> 
> I also haven't started using ZFS. A lot of people are adopting it to
> automatically take care of most of these issues.

It's interesting that drive reliability is on your radar, yet you perceive SSDs as less reliable (even though this conversation started with your HDDs dying in only a year), you're still using md devices, and you haven't moved to zfs yet... On all of these points I'm the opposite.

The SSD endurance test has concluded: http://ssdendurancetest.com/ All of the drives greatly exceeded their rated endurance.

In btrfs and zfs, a scrub reads every used block on every disk and verifies its checksum. If any block on any device fails the checksum, good data from the redundant disks is used to repair the bad block. This detects disk failures even when the hardware *passes* the read without reporting any error. With mdadm's "echo check", it's not clear exactly what the behavior is - when I google it now, I see "It compares the corresponding blocks of each disk in the array", which is not as good as having separately stored checksums on each disk the way zfs does: if two disks are supposed to hold the same data but they actually differ, how do you know which one is the good one? All md does is increment the mismatch counter.
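For concreteness, here is roughly how the two kinds of scrub are kicked off - a sketch only; the pool name "tank" and the device "md0" are placeholders, not anything from this thread:

    # zfs: scrub re-reads every used block, verifies its checksum, and
    # repairs bad blocks from redundant copies automatically
    zpool scrub tank
    zpool status tank      # shows scrub progress and per-device CKSUM error counts

    # mdadm: "check" only compares corresponding blocks across the member disks
    echo check > /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt   # sectors that disagreed; md cannot
                                         # tell you which copy was the good one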

I think you're trying to be conservative by sticking with HDDs and old-style software RAID, believing you're gaining reliability, but you're actually reducing both your reliability and your performance as a result.
