[BBLISA] Fwd: Re: Fileserver opinion

Dewey Sasser dewey at sasser.com
Tue Aug 31 08:48:25 EDT 2010


Summary:  yes, Linux RAID is viable.

I have run Linux RAID for years in a few small environments (~a few TB,
a few users) and one larger one (2 years ago, around 6TB as virtual disk
storage for ~150 virtual machines).

We moved to Sun 7000 series storage for the larger environment.  The
immediate problem we were trying to solve was performance, but the
reason it wasn't solvable on Linux was a set of bugs in Linux LVM that
caused volumes to hang until reboot and then interfered with the reboot
itself.  I will note that we were running Red Hat EL 5.4 there; the
other places I've run significant Linux servers have been Debian or
Ubuntu, and on those I've had no such problems.

Linux RAID is quite easy to monitor -- out of the box, "mdadm" will
send email whenever a drive changes state.  Additionally, I wrote some
simple Xymon scripts to monitor the drives' SMART info (I'm a bit of a
drive-temperature nut, for the sake of drive longevity).  A minimal
example of both follows.
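
As a rough sketch (the email address and device names are placeholders,
and the Xymon wiring is omitted):

    # /etc/mdadm.conf (path varies by distro): where mdadm --monitor
    # sends its alerts
    MAILADDR root@example.com

    # Run the monitor as a daemon, watching every array it can find
    mdadm --monitor --scan --daemonise

    # The sort of check my Xymon scripts wrap: attribute 194 is the
    # drive temperature on most drives
    smartctl -A /dev/sda | grep -i temperature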

I have never seen any issue rebuilding a RAID 1, 10, or 5 after a
single drive failure, and I've had one instance of a two-drive failure
on a RAID 6 that rebuilt with no issues.  Rebuilding has only required
downtime on older systems without drive hot-swap capability; the
replacement procedure is sketched below.
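
For the record, swapping out a failed member is just a few commands
(array and partition names here are hypothetical):

    # Mark the failed drive out of the array and remove it
    mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

    # Physically swap the drive (hot-swap, if the chassis allows),
    # partition it to match, then add it back; the rebuild starts
    # automatically
    mdadm /dev/md0 --add /dev/sdb1

    # Watch the rebuild progress
    cat /proc/mdstat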

Daniel's point about read failure during rebuild is a good one -- with
a multi-TB RAID 5 I do indeed have that concern.  RAID is showing its
design age in this respect.  I combat this with periodic SMART drive
self-tests (exactly what they exercise varies from drive to drive) and
with background full-volume reads, sketched below; either one surfaces
latent bad sectors while the array is still redundant, rather than
during a rebuild when it no longer is.  I'd be happy to have a more
detailed discussion on why this helps if anyone is interested.
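
A sketch of both techniques (device names hypothetical; the md "check"
action is the kernel's own scrub, which reads every sector and verifies
parity as it goes):

    # Kick off a long SMART self-test in the background on each drive
    smartctl -t long /dev/sda

    # Ask md to read and verify the whole array; mismatches are
    # reported in /sys/block/md0/md/mismatch_cnt
    echo check > /sys/block/md0/md/sync_action

    # Or the blunt version: a plain sequential read of the volume
    dd if=/dev/md0 of=/dev/null bs=1M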

Some serious drawbacks of Linux vs an "enterprise" storage system (my
direct experience is Sun 7000) include:

    * thick provisioning only (no thin provisioning)
    * snapshots affect run-time performance on the primary volume
    * lack of integrated monitoring tools (e.g. NFS IOPS by client)
    * no efficient box-to-box transfer tool (i.e. no equivalent to
      "zfs send"); the usual workaround is sketched below

Some advantages:

    * Cheap, of course
    * OSS community support is, frankly, better than the paid enterprise
      support I've experienced, both in response time and in the useful
      information available.
    * LVM's on-line migration features let you manipulate storage
      (reshape arrays, upgrade drives) on the fly without much
      difficulty.
    * Direct access to everything.  No "you can't get there from here".

Let me emphasize the utility of LVM's ability to live-migrate data
among volumes -- it's a capability I seriously miss when using the Sun
boxes; a sketch follows.  I understand NetApp can do things like this,
but I have never used NetApp.
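
For concreteness, moving everything off an old drive onto a new one,
with the filesystems mounted and in use the whole time, looks roughly
like this (volume group and device names are hypothetical):

    # Bring the new drive into the volume group
    pvcreate /dev/sdc1
    vgextend vg0 /dev/sdc1

    # Migrate all extents off the old drive, live
    pvmove /dev/sdb1

    # Retire the old drive from the group
    vgreduce vg0 /dev/sdb1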

Overall I believe we've experienced more downtime on the Sun 7000
series.  We've had 3 issues that were a bug/problem in the Sun box
(plus a drive failure where resilvering had a significant performance
impact).  We've had at least 3 other issues where we asked more from
the box than it could give, and it entered a performance domain several
orders of magnitude worse than "normal" operations.

On the other hand, we might have been able to squeeze 1500 IOPS out of
our 20-drive Linux system, and we're usually getting around 5000 IOPS
out of our 40-drive Sun system -- roughly 75 IOPS per spindle versus
125.  If we had a larger read load I suspect we could get even higher
performance.

While we've been very happy with the Sun box's performance in "normal"
operations (it is very cost-effective on both per-TB and per-IOPS
metrics, and I've yet to lose any data), I have become very wary of the
edges of its performance envelope, and I no longer consider it suitable
for mission-critical applications unless I have performance-based (not
connectivity-based) fail-over capability.  Linux also exhibited
edge-of-envelope performance degradation, but the slope was neither as
steep nor as high.

--
Dewey
