[BBLISA] Moving from RAID 0 to LVM RAID?

Daniel Hagerty hag at linnaean.org
Sat Mar 8 14:04:27 EST 2008


"Edward Ned Harvey" <bblisa2 at nedharvey.com> writes:

> Have you ever seen a chassis with hotswappable drives, that wasn't
> preconfigured with a hardware raid card?  In theory, there's no reason
> hotswappable drives & illuminated "disk failure" lights would require HW
> raid, but I know I've never seen it without the HW raid card.

    Yes.  FWIW, depending on your actual definition of hotswap, even
things like original SCSI can be considered such (provided you're
foolhardy; I've probably done it a few hundred times and been lucky
enough to never fry anything).  SCA SCSI was expressly designed for
hotswap, and I was buying machines with it simply for the easier cold
swap.

> with SW raid.  Unless you were insane and found some way to enable write
> caching in the kernel.

    As I thought I said yesterday, the kernel buffer cache is
write-back.  It will tell applications a write is complete when the
data is in the *cache*, not on the platter.  You have to take explicit
steps (e.g. calling fsync(), or opening the file with O_SYNC) to get
the latter behavior.  Obviously, you lose anything that hasn't been
flushed when the power fails.
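A minimal sketch of those explicit steps from user space, assuming a POSIX system and Python's standard `os` module.  A plain write() returns once the data is in the buffer cache; os.fsync() blocks until the kernel has pushed the file's dirty data to the device.  The helper name `durable_write` is hypothetical, and whether fsync also forces the drive's own cache to the platter depends on the OS, filesystem and drive.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: write data, then wait for it to leave the buffer cache.

    os.write() alone completes once the data sits in the kernel's
    write-back cache; os.fsync() blocks until the kernel has flushed
    this file's dirty data to the storage device.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Do not return until the kernel has handed the data to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
```

For a newly created file, a careful application would also fsync() the containing directory so that the directory entry itself is durable.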

    Also, the average disk has a write-back cache of its own, where,
once again, the disk will claim write completion before the block hits
the platter.  As before, you lose anything that hasn't been flushed
when the power fails.
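To make the failure mode concrete, here is a toy model (hypothetical class, not any real driver or drive interface) of a disk with a volatile write-back cache: writes are acknowledged as soon as they are cached, and only reach the "platter" when flushed.

```python
from __future__ import annotations


class WriteBackDisk:
    """Toy model of a disk with a volatile write-back cache (hypothetical)."""

    def __init__(self) -> None:
        self.cache: dict[int, bytes] = {}    # volatile, lost on power failure
        self.platter: dict[int, bytes] = {}  # persistent storage

    def write(self, block: int, data: bytes) -> str:
        self.cache[block] = data
        # Completion is reported before the block reaches the platter.
        return "complete"

    def flush(self) -> None:
        # Move everything cached onto the platter.
        self.platter.update(self.cache)
        self.cache.clear()

    def power_failure(self) -> None:
        # Anything not yet flushed is simply gone.
        self.cache.clear()

    def read(self, block: int) -> bytes | None:
        # Cached data wins while the power is on; otherwise use the platter.
        return self.cache.get(block, self.platter.get(block))
```

For example, block 0 is written and flushed, block 1 is written and acknowledged but never flushed; after `power_failure()` only block 0 survives, even though both writes were reported as complete.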


    Related to, but orthogonal to, the use of caches is write ordering.
Disks have varying degrees of freedom to reorder I/O operations from
the host to account for performance factors the host doesn't know
about, such as the current head position.  Despite this, they can't
reorder operations arbitrarily.  For some pairs of writes, the second
one reaching the disk without the first has very bad results, while
the first one reaching the disk without the second can be recovered
from.
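A small sketch of that asymmetry, assuming a hypothetical two-step update of the kind a journaling filesystem does: first write a data block, then a commit record that points at it.  It enumerates the on-disk state after a crash at every point in each possible write order; the names `HOST_ORDER` and `unsafe_states` are made up for illustration.

```python
from itertools import permutations

# Hypothetical update, in the order the host issues it:
#   1. write the new data block
#   2. write a commit record that points at that block
HOST_ORDER = ("data", "commit")


def crash_states(order):
    """Sets of writes on the platter if power fails after each prefix of order."""
    for i in range(len(order) + 1):
        yield frozenset(order[:i])


def is_recoverable(on_disk):
    # A commit record without its data points at stale or garbage blocks.
    # Data without its commit record is just a lost update.
    return not ("commit" in on_disk and "data" not in on_disk)


def unsafe_states(order):
    return [set(s) for s in crash_states(order) if not is_recoverable(s)]


for order in permutations(HOST_ORDER):
    print(order, "unsafe crash states:", unsafe_states(order))
# ('data', 'commit') unsafe crash states: []
# ('commit', 'data') unsafe crash states: [{'commit'}]
```

Only the reordered sequence can leave the disk in the unrecoverable state, which is why the device must not be free to swap these two writes.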

    Technologies like PATA can reorder because of the cache, but there
is no write barrier operation to enforce ordering other than disabling
the cache.  SATA can reorder either because of the cache or because
of NCQ.  It has support for a write barrier as part of NCQ, but it's
pretty new, and support for it in mainstream Linux is unclear to me.
SCSI has had tagged command queuing forever (SATA NCQ is basically
tagged command queuing lite), along with a write barrier operation as
part of it.
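The effect of a write barrier can be modelled the same way.  This is a toy sketch of the ordering semantics only, not of any real SCSI, SATA NCQ or kernel interface: it assumes the device may reorder queued writes freely within a barrier-delimited group but never across a barrier.  The `BARRIER` marker and the function names are hypothetical.

```python
from itertools import permutations, product

BARRIER = "|"  # hypothetical queue marker standing in for a barrier command


def possible_orders(queue):
    """All on-platter orders a device could choose for queue, if it may
    reorder freely within a barrier-delimited group but not across one."""
    groups, current = [], []
    for op in queue:
        if op == BARRIER:
            groups.append(current)
            current = []
        else:
            current.append(op)
    groups.append(current)
    for choice in product(*(permutations(g) for g in groups)):
        yield tuple(op for group in choice for op in group)


def unsafe(order):
    # Crash after each prefix; "commit" on disk without "data" is the bad case.
    return any("commit" in order[:i] and "data" not in order[:i]
               for i in range(len(order) + 1))


no_barrier = ["data", "commit"]
with_barrier = ["data", BARRIER, "commit"]
print(any(unsafe(o) for o in possible_orders(no_barrier)))    # True
print(any(unsafe(o) for o in possible_orders(with_barrier)))  # False
```

Without the barrier, the device is allowed to put the commit record down first. With it, the only legal order is data then commit. The alternative the PATA case falls back on, disabling the cache, removes reordering entirely at a performance cost.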


    If you're doing any kind of RAID system, do your homework.
There are a lot of places where the tradeoffs between speed and
reliability can screw you badly, and all of the parts have to play
together nicely to get the desired results.  A hardware card with a
battery backup is *not* a panacea.  I came across a nice website,
several years out of date now, that tested several battery-backed
RAID cards and was able to produce corruption with all of them.

http://sr5tech.com/write_back_cache_experiments.htm



