[BBLISA] Fileserver opinion

Rob Taylor rgt at wi.mit.edu
Thu Aug 12 14:34:00 EDT 2010



On 08/11/2010 05:45 PM, Ian Stokes-Rees wrote:
>> I don't have any advice, but I have a few questions:
>>
>> 0) What is the drive interface card? Several cards? Do they do the RAID
>> support or is that in software?
> 
> Adaptec RAID 52445 28-Port (24 Int/4 Ext) (SAS) (SFF-8087)
> 
> Hardware RAID.  One card.

My advice on this is to buy a second one of these RAID cards and put it
on the shelf, if you can afford it (unless all Adaptec cards can read
all other Adaptec cards' RAID metadata). I've seen people build a box
and then, a few years later, have the RAID controller fail with no way
to read the array, because it's some old card nobody can find anymore.

Make sure you have good backups too.

Also, make sure that whatever software alerts you to drive failures
actually works. I don't mean the beeping the controller will do; I mean
the e-mail alerts, SMS alerts, SNMP traps, or whatever it has. I've also
seen people not realize a drive had failed because the system sits in a
noisy machine room and nobody heard the beeps or saw the amber light
for days or weeks.
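
One way to test the alert path end to end, if you happen to run
smartmontools alongside the controller's own tools (the address below
is just an example, and drives behind a hardware RAID card may need
extra -d options):

# /etc/smartd.conf: send a one-shot test mail when smartd starts
DEVICESCAN -H -m admin@example.org -M test

Whatever mechanism you use, trigger it once on purpose and make sure
the mail actually lands in a mailbox somebody reads.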

> 
>> 1) Is it obvious that one needs 12 cores to fill a single GB ethernet
>> link? Or are there more ethernet links? If more, how do you balance the
>> load (not a rhetorical question - we have systems with multiple
>> ethernets and don't have any idea how to use them effectively).

One fast core should be able to fill a GbE pipe no problem, if it's
just straight bit pushing.

I've routinely done that with dd to some of our NFS servers, using

dd if=/dev/zero of=/mnt/nfsshare/testfile bs=1024k

(you can tweak the block size to see if there is any difference)

on a Linux box for testing. It can easily max out a GbE connection (or
come pretty close) if the storage can keep up.
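
Something like this (GNU dd; the sizes and the mount point are just
examples) sweeps a few block sizes in one go -- conv=fsync keeps
client-side caching from inflating the numbers:

for bs in 64k 256k 1024k 4M; do
    echo "bs=$bs"
    dd if=/dev/zero of=/mnt/nfsshare/testfile bs=$bs count=1024 conv=fsync
done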

Use ttcp, or pipe dd through netcat to a netcat on another node that
dumps to /dev/null, to get the raw network performance.
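
Roughly like this (hostname and port are made up, and the listen flags
vary a bit between netcat flavors):

on the receiver:  nc -l -p 5001 > /dev/null
on the sender:    dd if=/dev/zero bs=1024k count=1024 | nc receiver 5001

dd's summary on the sender then tells you what the wire sustained with
no disks in the picture.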

That 12-core node shouldn't have any trouble maxing out the pipe.

If you have multiple links, people tend to do link aggregation (LAG),
using either EtherChannel (Cisco) or LACP (the standard).
Both, as far as I know (correct me here, bblisa'ers), statistically
balance the traffic, usually hashing on some combination of:

MAC address of src and dst
IP address of src and dst
IP address and port of src and dst
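
On Linux the bonding driver exposes that choice as xmit_hash_policy
(layer2, layer2+3, layer3+4). A rough sketch of the old modprobe-style
setup -- treat the values as examples and check your distro's docs:

# /etc/modprobe.conf (or a file under /etc/modprobe.d/)
alias bond0 bonding
options bonding mode=802.3ad miimon=100 xmit_hash_policy=layer3+4

Then enslave the physical NICs with ifenslave (or your distro's network
scripts) and configure matching LACP on the switch ports.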

Obviously, if there is a bottleneck elsewhere, a LAG won't fix that
(like a 1-gig uplink to the default gateway with all destinations on
the other side of it), but it will provide link redundancy on the host
with fast failover.

I would rather go 10-gig if possible. Cabling is simpler, and
troubleshooting is easier (troubleshooting a flaky link in a LAG can
be a pain).

> 
> Another colleague has already given the very good advice of considering
> a Dell 6248 switch that has 2 10GigE ports so we'd then get high b/w
> network access to the file server with the addition of a 10GigE card on
> the server.

We do something similar with Force10 switches. Storage on 10gig, bunch
of servers going 1gig into the switch. Works well for us.

> 
>> 2) With data spanned over 5 drives, is there still a performance
>> difference between 7,200 rpm and 15,000 rpm drives? Have you the ability
>> to experiment before putting out the cash?
> 
> I don't know.  No, we can't test this before purchasing, so any
> experience on this would be appreciated.  Our best guess is that faster
> is faster, and we need faster, but it is a "best guess", hence the
> petition to bblisa.
> 
>> 3) How long do you think it will take to rebuild a volume after a drive
>> failure? Do you need such large volumes. If the volumes are smaller the
>> rebuild times are lower.  I have no experience with 15,000 rpm SAS
>> drives, but is the system really usable during a rebuild? Now if the
>> drives have 500,000 hours mean time to failure, and you have 10, that is
>> still 5 years mean time to a rebuild, but somehow I don't believe 15,000
>> rpm drives are really that durable in real life. Will the system be down
>> a day a year for RAID rebuilding?
> 
> Good question.  With RAID10 I'd hope that re-building can be done in
> the background with the data "at risk" during that period.  We're not in
> an operational environment where uptime demands are extremely high --
> i.e. we're only aiming for >97% uptime (down several days over the
> course of a year is acceptable).

Rebuild priority can typically be set on the controller to control the
impact on the users. A few years ago one of our servers had a failed
drive with the rebuild set to a high priority, and the server was
annoying to the point of being almost unusable until the drive finished
rebuilding, since most of the I/Os went to the rebuild. It did rebuild
very fast, though. So you can have a painful, short rebuild, or a
longer, less noticeable one. Keep in mind that, best case, you can't
rebuild any faster than the spare drive(s) can write. Also, some
arrays/controllers will do a copyback once the failed drive is
replaced, to keep the "spare" as a spare, so you may pay the
performance penalty twice: once for the rebuild, and once (potentially
less, since it's just a copy) for the copyback.
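
For what it's worth, Linux software RAID exposes the same tradeoff as a
pair of sysctls, which is a handy comparison even though the Adaptec
card will have its own management tool for this (values are KB/s per
device, and the number below is just an example):

cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
echo 10000 > /proc/sys/dev/raid/speed_limit_min

Raising speed_limit_min pushes you toward the "short but painful"
rebuild; leaving it low favors the users over the rebuild.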

rgt


> 
> Ian
> 


