[BBLISA] Recommendation for NAS appliance?

John Stoffel john at stoffel.org
Fri Mar 5 14:18:50 EST 2010


>> Sounds like you had both crappy tape drives, and just poor performance
>> over the SCSI bus.  From another email, I see you had a StoreVault
>> thingy, which is NOT enterprise class Netapp hardware.

Edward> The tape drive is LTO-3 and the bus is Ultra160 SCSI.  The
Edward> reason the backup to tape took so long is that the data is a
Edward> million small files, and the filer didn't have any efficient
Edward> way of generating anything like a contiguous data stream.  It
Edward> must have been simply reading the filesystem and walking the
Edward> tree.

Sure, this all makes sense.  We have the same problem here: we have
volumes with 10 million files.  But the Netapp is pretty beefy and
good about handling metadata at that scale.  Not perfect, mind you,
but decent.
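
Just to illustrate what "walking the tree" costs, here's a toy sketch
in Python (not what either of our backup jobs actually runs, and the
paths are made up): with a million small files the run time is
dominated by per-file metadata work, so the drive never gets a
contiguous stream to write.

    import os, tarfile

    def walk_backup(src_root, dest_tar):
        # Archive the tree one file at a time -- effectively what a
        # filesystem-walking dump of a million small files boils down to.
        with tarfile.open(dest_tar, "w") as tar:
            for dirpath, _dirnames, filenames in os.walk(src_root):
                for name in filenames:
                    # One stat + one open + one close per file; with a
                    # million files that's a million round trips before
                    # any real data moves to tape.
                    tar.add(os.path.join(dirpath, name))

    # walk_backup("/vol/projects", "/backup/projects.tar")  # hypothetical paths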


>> Nice setup.  How many snapshots can you store on the Dell or the Sun
>> and how often have you had to restore from Tape?

Edward> We've never had to restore from tape.  Just do it once in a
Edward> while to be sure we can.  We're currently retaining a month of
Edward> daily snaps on the filer itself, and another month on the
Edward> secondary server.  But the whole system is about 4 months old.

Nice.
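
For what it's worth, the retention side of that is simple enough to
sketch.  This is purely illustrative Python (the path is made up, and
on a real filer the snapshot scheduler does this for you):

    import os, time

    SNAP_DIR = "/secondary/snapshots"   # hypothetical snapshot target
    KEEP_DAYS = 30                      # a month of dailies

    def prune_old_snaps(snap_dir=SNAP_DIR, keep_days=KEEP_DAYS):
        # Drop anything older than the retention window; dry-run only.
        cutoff = time.time() - keep_days * 86400
        for name in sorted(os.listdir(snap_dir)):
            path = os.path.join(snap_dir, name)
            if os.path.getmtime(path) < cutoff:
                print("would remove", path)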


>> Sure, restores from snapshots are trivial.  Never argued they
>> weren't.  And I personally *like* snapshot restores.  But when an
>> engineer creates a 500+gb file during a simulation run, it will simply
>> *kill* your snapshot reserve, and reduce the usefulness of snapshots
>> remarkably.

Edward> Snapshot reserve?  I never really got the point of having a
Edward> snapshot reserve.  I just have one big data pool, I don't care
Edward> how much space the snaps take.  Unless my whole disk starts to
Edward> get full.  Then we've got to rm some files, and bump off some
Edward> of the oldest snaps.

Exactly.  There's the rub: having to bump off the oldest snaps can
cause problems, especially if a user generates a single 500GB file
that fills a volume.  Now you have to bump off all your snaps just to
get rid of that single, unwanted chunk of data.  Been there, done
that.
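
Here's a toy model of why rm alone doesn't help (made-up block names,
but it's exactly the 500GB-sim-output situation): a block stays
allocated as long as any snapshot still references it, so the space
only comes back once every snapshot taken while the file existed is
gone.

    def space_used(live_blocks, snapshots):
        # Allocated blocks = live data plus anything a snapshot still pins.
        referenced = set(live_blocks)
        for snap in snapshots:
            referenced |= snap
        return len(referenced)

    active = {"src_a", "src_b", "big_sim"}     # big_sim is the huge output file
    snap_mon = set(active)                     # snapshot taken while it existed

    active.discard("big_sim")                  # the file gets rm'd
    print(space_used(active, [snap_mon]))      # 3 -- the snapshot still pins it
    print(space_used(active, []))              # 2 -- only after the snap is gone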

In your environment, it might not be a problem.  It's just something
to think about.  

It's also the big problem with hosting SAN volumes on Netapps.  You
have to have a 100% reserve (or volumes 50%+ bigger than needed) to
hold the snapshots, due to the way Netapp does things.

It's easier if they're on NFS as plain image files, since then the
snapshots only have to hold the changed blocks, but you lose some
performance and management advantages.
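
Back-of-the-envelope numbers (mine, and only illustrative) for why
the LUN case hurts more than the NFS-image case:

    def lun_volume_needed(lun_gb, reserve=1.0):
        # Block LUN with a 100% reserve: the volume has to hold the LUN
        # plus an equal amount of space set aside for overwrites.
        return lun_gb * (1 + reserve)

    def nfs_image_volume_needed(data_gb, daily_change_gb, snaps_kept):
        # Image file on NFS: live data plus only the changed blocks
        # that the retained snapshots pin.
        return data_gb + daily_change_gb * snaps_kept

    print(lun_volume_needed(500))                # 1000 GB for a 500GB LUN
    print(nfs_image_volume_needed(500, 10, 30))  # 800 GB at ~10GB/day of churn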

Edward> We have a rule: Since there are tons of compute servers, and
Edward> only one central file server, no matter how many aggregate
Edward> links you may have to that filer, you don't want heavy IO sims
Edward> running on it.  Every compute server has a /scratch area which
Edward> is local disk.  So simulations generate re-generatable output
Edward> on local disk, and all machines are able to work in parallel,
Edward> at local disk speeds.  All the source files stay on the filer,
Edward> where they are universally available and backed up.

It's a good rule.  It doesn't work as well in my environment, though,
so we keep all data on the central server.  In our case, we're not
I/O bound so much as compute bound.  So while large files are written
at times, most of the effort is in loading the tool, loading the
data, and then crunching on it until you generate some output.
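
If we ever did adopt the scratch rule here, I'd picture the job
wrapper looking roughly like this (the paths and the sim command are
hypothetical):

    import os, shutil, subprocess, tempfile

    def run_sim(project_dir, results_dir="/net/filer/results"):
        # Stage sources from the filer to local scratch, run the sim at
        # local-disk speed, then copy only the keepers back over NFS.
        scratch = tempfile.mkdtemp(dir="/scratch")
        try:
            work = os.path.join(scratch, "work")
            shutil.copytree(project_dir, work)
            subprocess.run(["./run_sim.sh"], cwd=work, check=True)
            shutil.copytree(os.path.join(work, "out"),
                            os.path.join(results_dir,
                                         os.path.basename(project_dir)))
        finally:
            shutil.rmtree(scratch)    # scratch output is re-generatable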

Log files are a problem from time to time.

John


