[BBLISA] Fwd: Moving 100 GB and 1.3 million files

John Orthoefer jco at direwolf.com
Thu Jul 22 18:37:24 EDT 2010


As others have said, it's most likely the file creates.  Remember, people use creates for locking because a create is atomic.
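
To illustrate why an atomic create works as a lock, here is a minimal Python sketch.  The helper names try_lock and unlock are made up for this example; the only real mechanism used is open() with O_CREAT | O_EXCL, which fails if the file already exists.  The filesystem has to settle "who created it first" on every create, which is part of why creates are expensive.

```python
import os
import tempfile


def try_lock(path):
    """Return True if we created the lock file, False if it already existed."""
    try:
        # O_CREAT | O_EXCL makes "create only if absent" one atomic step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def unlock(path):
    """Release the lock by removing the lock file."""
    os.unlink(path)


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "demo.lock")
    print(try_lock(lock))  # first caller creates the file
    print(try_lock(lock))  # second caller finds it already there
    unlock(lock)
```

Whichever process gets True holds the lock; everyone else sees False until the file is removed.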

In Netnews, up until about 1997, each post was in a file by itself.  Then several people (we were working with Highwind software) built systems that looked more like databases, with big files and all the posts jammed into them.  It boils down to this: a general-purpose filesystem has its limits, and when you know more about your data you can do better.

I've not seen how you are moving the files.  But something like rsync or a tar pipe over ssh (2 threads, producer/consumer) will be better than, say, cp over nfs (a single thread that reads, then writes).  You might also try getting 3-4 (or even 20+) sessions running against different bits of what you are moving (more threads).  I suspect that if you look at the processes moving the files, you'll see they are pretty much idle because they are sleeping on I/O (network or filesystem).  Don't fear large load numbers: a process in I/O wait is treated as "Runnable", so it counts toward the load average.
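
As a rough sketch of the "several sessions against different bits" idea, the Python below deals a list of source directories out to a handful of rsync processes and runs them side by side.  The function names and the default of 4 sessions are assumptions for illustration only; the destination is whatever you would normally hand to rsync (for example host:/path/), and -a is the usual archive flag.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(items, n):
    """Deal items into at most n lists so each session gets a similar share."""
    groups = [[] for _ in range(n)]
    for i, item in enumerate(sorted(items)):
        groups[i % n].append(item)
    return [g for g in groups if g]  # drop empty groups


def rsync_command(sources, dest):
    """Build the argument list for one rsync session."""
    return ["rsync", "-a", *sources, dest]


def copy_in_parallel(sources, dest, sessions=4):
    """Run one rsync per group at the same time; return their exit codes."""
    groups = split_round_robin(sources, sessions)
    if not groups:
        return []
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        # Each worker thread simply waits on its own rsync process.
        codes = pool.map(
            lambda g: subprocess.run(rsync_command(g, dest)).returncode,
            groups,
        )
        return list(codes)
```

Splitting by top-level directory is the simplest choice; if a few directories hold most of the files, the sessions will finish unevenly and a finer split may work better.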

There are some file system options you can tinker with, but I would be really careful with them and your data.  I can't find it, but there used to be a flag that even made file creation non-atomic/async.

There are also options that deal with file block allocation.  Your average file is a bit under 100k (assuming I did the math right), so you want to make sure you give tunefs hints on the file size and the number of entries per directory.  That way it won't have to extend as much as it might otherwise.
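
For anyone who wants to check the average file size, here is the arithmetic as a small Python sketch.  It takes the 100 GB and 1.3 million files from the subject line; whether "GB" means decimal or binary gigabytes is an assumption, so both are shown.

```python
def average_file_size(total_bytes, file_count):
    """Mean file size in bytes."""
    return total_bytes / file_count


FILES = 1_300_000

# Decimal reading: 100 GB = 100 * 10^9 bytes.
decimal_kb = average_file_size(100 * 10**9, FILES) / 1000

# Binary reading: 100 GiB = 100 * 2^30 bytes.
binary_kib = average_file_size(100 * 2**30, FILES) / 1024

print(round(decimal_kb), "KB")    # about 77
print(round(binary_kib), "KiB")   # about 81
```

Either way the average lands around 80k, which is the sort of number to feed into the average-file-size hint.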


johno

