[BBLISA] SunFire 4500: Linux + ZFS/FUSE ?

Edward Ned Harvey bblisa4 at nedharvey.com
Sat Jul 7 13:34:16 EDT 2012


> From: Theo Van Dinter [mailto:felicity at kluge.net]
> Sent: Saturday, July 07, 2012 10:04 AM
> 
> Also, ignoring the resilver speed issue, if you have one big raidz2 device,
> you're basically guaranteed to lose data eventually.  Lose a controller or
> just 3 disks (most of my disk failures happened while recovering from
> another failure for example) and you are probably toast.

IMHO, this statement is equally applicable (or equally avoidable) for both
mirror and raidz configurations.  You can protect yourself against loss of
a controller by arranging your redundancy across controllers.  (Suppose you
have 8 buses in the system: you make a vdev from 1 disk on each bus, then
another vdev from 1 disk on each bus, and so on...)  If any one controller
goes down, all your vdevs get degraded, but none is lost.  The same argument
applies to mirrors.
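
As a rough illustration of that layout, here is a minimal Python sketch.
The helper names (build_vdevs, disks_lost_per_vdev) and the figure of 6
disks per bus are assumptions made up for the example, not anything taken
from ZFS or from a particular machine.  It only models which disks land in
which vdev, and counts how many disks each vdev loses when one bus fails.

```python
def build_vdevs(num_buses, disks_per_bus):
    """Return a list of vdevs; vdev number `slot` takes disk `slot` from every bus.

    Each disk is represented as a (bus, slot) tuple. Hypothetical model only.
    """
    return [[(bus, slot) for bus in range(num_buses)]
            for slot in range(disks_per_bus)]


def disks_lost_per_vdev(vdevs, failed_bus):
    """Count, for each vdev, how many of its disks sit on the failed bus."""
    return [sum(1 for bus, _slot in vdev if bus == failed_bus)
            for vdev in vdevs]


# Assumed example sizes: 8 buses, 6 disks on each bus (48 disks total).
vdevs = build_vdevs(num_buses=8, disks_per_bus=6)
losses = disks_lost_per_vdev(vdevs, failed_bus=3)
print(losses)  # every vdev loses exactly one disk: [1, 1, 1, 1, 1, 1]
```

With one disk per bus in every vdev, a dead controller costs each vdev a
single disk, which any raidz1 or better (or a mirror split across buses)
can tolerate.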

Comparing the probability of failure on a bunch of mirrors versus a larger
raidzN is actually pretty difficult to do...  In the mirror configuration,
you could lose as many disks as you have mirrors (one from each) if you were
lucky, but you could be toast by losing just 2 disks (both halves of the same
mirror) if you were unlucky, and the probability of concurrent failures is
affected by resilver times...  Versus the raidzN, in which you can never
safely lose more than N disks, but you're guaranteed safe as long as you
lose no more than N, and you have a longer resilver time...
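
The "lucky versus unlucky" part of that comparison can be sketched by brute
force.  The Python below is only a counting exercise under assumptions I am
choosing for illustration: 10 disks, arranged either as 5 two-way mirrors
or as one 10-disk raidz2, with every set of k simultaneously failed disks
treated as equally likely.  It ignores resilver time and drive failure
rates entirely, so it is not a reliability model, and the function names
are hypothetical.

```python
from itertools import combinations


def mirror_survives(failed, pairs):
    """A pool of 2-way mirrors survives if no mirror has lost both disks."""
    return all(not (a in failed and b in failed) for a, b in pairs)


def raidz_survives(failed, parity):
    """A single raidzN vdev survives if at most `parity` disks have failed."""
    return len(failed) <= parity


def survival_fraction(num_disks, k, survives):
    """Fraction of all k-disk failure sets that the pool survives."""
    total = 0
    ok = 0
    for combo in combinations(range(num_disks), k):
        total += 1
        if survives(set(combo)):
            ok += 1
    return ok / total


NUM_DISKS = 10  # assumed pool size for the example
pairs = [(i, i + 1) for i in range(0, NUM_DISKS, 2)]  # 5 two-way mirrors

for k in range(1, 6):
    m = survival_fraction(NUM_DISKS, k, lambda f: mirror_survives(f, pairs))
    z = survival_fraction(NUM_DISKS, k, lambda f: raidz_survives(f, 2))
    print(f"{k} failed disks: mirrors survive {m:.3f}, raidz2 survives {z:.3f}")
```

Under these assumptions the mirrors survive 40 of the 45 possible 2-disk
failure sets and 80 of the 120 possible 3-disk sets, while the raidz2
survives every 2-disk set and no 3-disk set.  Which one comes out ahead
overall then depends on how likely each number of concurrent failures is,
which is where resilver time comes in.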

I actually went to the effort of modelling and calculating those odds once.
It was painstaking, and I had to make assumptions about the failure
characteristics of drives (I know the failure rate over time is
non-linear)...  In the end, the probability of concurrent failure was
something like 10^-6 for mirrors and 10^-5 for 10-disk raidz2.  In other
words, close enough that all my assumptions and modeling became distinctly
relevant.  It is easy to see that either configuration could be the more
reliable one under some circumstances.  There is a correct answer for every
specific situation, but that answer is sometimes "mirrors are safer," and
sometimes "the raidzN is safer."  In both cases, the probability of
concurrent failure is insignificant compared to the probability of
administrator error or other preventable causes.

Instead, I make my decision based on performance and cost constraints.  Both
the raidz and mirror configurations are safe enough.


