[BBLISA] Fileserver opinion

David Miller david3d at gmail.com
Fri Aug 13 18:04:49 EDT 2010


ZFS RAIDz rebuilds are similar to traditional hardware RAID5/6 rebuilds in
what they have to do.  However, there are two important differences between
rebuilding a hardware RAID5/6 volume and rebuilding a ZFS RAIDz1/2 zpool.
ZFS, being both the volume manager and the file system, can rebuild at the
data level, whereas hardware controllers work at the block level and don't
know whether a block is in use, so all blocks must be rebuilt.  This works
in favor of ZFS when the volume is not full.  On the other hand, traditional
RAID controllers can read the data from the other drives in the volume,
compute the checksum/new data, and write it to the new drive without passing
any data back to the host computer, whereas ZFS must pass all data from the
controller, across the bus, to memory and the CPU to compute the
checksum/new data, and then pass the new data back across the bus to the
controller and on to the new drive.  This is why having a large number of
drives in a RAID set can slow you down in this situation: it means moving a
lot more data around, which tends to cause other bottlenecks.

For example, a 16 drive zpool consisting of a single RAIDz set, assuming a
100% full zpool and that you actually get 1.5TB of usable space per drive,
will require the system to read 22.5TB of data and to compute and write
1.5TB of new data.  A 16 drive zpool consisting of two 8-drive RAIDz sets
will only require 10.5TB of reads, plus computing and writing 1.5TB of new
data.  Keep in mind that drives can read data faster than they can write
it.  So to maximize rebuild speed with ZFS you really just need to minimize
the amount of data the system has to push around to do the job.  Even in a
zpool of mirrored drive sets, the write to the new drive will be the
bottleneck.  So you have to balance this knowledge against your data
redundancy and capacity goals while staying within your budget.
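The arithmetic above can be sketched out in a few lines (a back-of-the-envelope
model only; the function name is mine, and like the example it assumes a 100%
full pool with 1.5TB usable per drive, whereas a real resilver only touches
allocated data):

```python
def rebuild_io(drives_per_set, usable_tb=1.5):
    """Rough (read_tb, write_tb) to resilver one failed drive in one RAIDz set.

    Assumes a completely full pool: every surviving member of the affected
    set must be read in full, and one drive's worth of data is rewritten.
    """
    reads = (drives_per_set - 1) * usable_tb  # read all surviving drives in the set
    writes = usable_tb                        # write the reconstructed drive
    return reads, writes

print(rebuild_io(16))  # single 16-drive RAIDz set  -> (22.5, 1.5)
print(rebuild_io(8))   # one of two 8-drive sets    -> (10.5, 1.5)
```

Note that only the set containing the failed drive participates in the
rebuild, which is why splitting the pool into smaller RAIDz sets cuts the
read volume so sharply.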

--
David.



On Fri, Aug 13, 2010 at 3:36 PM, Toby Burress <kurin at delete.org> wrote:

> On Fri, Aug 13, 2010 at 03:11:59PM -0400, Jeff Wasilko wrote:
> > On Fri, Aug 13, 2010 at 02:56:49PM -0400, Toby Burress wrote:
> > > On Fri, Aug 13, 2010 at 02:42:12PM -0400, David Miller wrote:
> > > > What does your zpool look like?  Ideally if you're using RAIDz or
> RAIDz2
> > > > then you should be using multiple RAIDz sets in the pool.  This way
> IO is
> > > > striped across the RAIDz sets and any degradation, and recovery,
> should
> > > > only involve the smaller RAIDz set.  Which should be relatively quick
> > > > depending on the size and type of drives involved.
> > >
> > > The server that is taking a billion years to resilver does in fact have
> > > 15 disks in one big raidz2 pool.  The other server has a single pool of
> > > three raidz2 arrays of 8 disks each, so hopefully that will yield
> better
> > > recoveries.  Although if the bottleneck is reads, then wouldn't it be
> > > faster to read from 14 disks than 7?  And if the bottleneck is just
> > > writes, then wow, I need to buy some different disks next time.
> > >
> > > Since the load on the machine is 3, and it's doing nothing but
> > > resilvering, I suspect the bottleneck is actually the CPU.  I don't
> > > know a ton about the implementation of ZFS, but I do know it checksums
> > > every block.  It would be insane for it not to verify those checksums
> > > while resilvering, and perhaps it even recomputes them as it writes
> > > them to the new disk.
> >
> > Did you lose 1 or 2 disks in the raidz2 pool?
>
> One disk.  I actually have replaced two disks at a time and it was about
> the same.
>

