[BBLISA] Fwd: Moving 100 GB and 1.3 million files

Ian Stokes-Rees ijstokes at crystal.harvard.edu
Thu Jul 29 13:59:45 EDT 2010


> I think consensus was that things will be slow as long as you're creating a
> million small files.  Because the small sync writes will destroy
> performance.  But you can minimize that impact by enabling the writeback
> cache.

I have not done exhaustive testing of the various suggestions. However,
dividing the 250k files into sub-directories, rather than having them
all in a single directory, improved the tar creation rate by a factor of
about 200 (from ~1200s to 7s, for 250k files, 570 MB of data, taking
up 1.2 GB of disk space).
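
For anyone wanting to try the same split, here is a rough sketch of one
way to do it. This is not the script I actually ran: the function name
shard_dir, the numbered bucket names, and the default bucket size are
all made up for illustration, and it assumes a find that supports
-maxdepth and file names without embedded newlines.

```shell
#!/bin/sh
# shard_dir SRC [N]: move the regular files directly under SRC into
# numbered subdirectories SRC/0000, SRC/0001, ... with at most N files each.
# Hypothetical helper; the bucket naming is illustrative only.
shard_dir() {
    src=$1
    per_bucket=${2:-10000}
    count=0
    list=$(mktemp) || return 1
    # Snapshot the file list first so the moves below cannot disturb the scan.
    find "$src" -maxdepth 1 -type f > "$list"
    while IFS= read -r f; do
        # Start a new bucket every per_bucket files.
        if [ $((count % per_bucket)) -eq 0 ]; then
            bucket_dir=$(printf '%s/%04d' "$src" $((count / per_bucket)))
            mkdir -p "$bucket_dir" || return 1
        fi
        mv "$f" "$bucket_dir/" || return 1
        count=$((count + 1))
    done < "$list"
    rm -f "$list"
}
```

Something like "shard_dir output 10000" would then leave no more than
10k files in any one directory.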

Interestingly, on the RAID-5 system tar file creation takes 4.5 minutes.
Anyone want to hazard a guess why this would be 40x slower?

Untar still takes a long time: almost 40 minutes.  One suggestion was to
put the tarball on a separate disk to avoid reading and writing to the
same volumes (I'm not sure how valid this is when the destination is a
RAID-5 array with 7 disks), but this has had no effect.

To minimize CPU-intensive operations I had been working with
uncompressed tar files, but I also experimented with the compression
options:

tar  cf:   7s, 576 MB
tar zcf:  12s,  33 MB
tar jcf: 209s,  17 MB

gzip gives the best speed/size trade-off for my needs.
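
A comparison like the one above could be scripted roughly as follows.
This is only a sketch: compare_tar and its arguments are hypothetical
names, timing uses "date +%s" so it is only good to whole seconds, and
the bzip2 case assumes bzip2 is installed.

```shell
#!/bin/sh
# compare_tar DIR OUTDIR [FLAG...]: archive DIR once per compression flag and
# print elapsed seconds and archive size. Hypothetical helper for illustration.
compare_tar() {
    dir=$1
    outdir=$2
    shift 2
    # Default to the three variants from the post: none, gzip (z), bzip2 (j).
    [ $# -gt 0 ] || set -- "" z j
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    for flag in "$@"; do
        case $flag in
            z) ext=tar.gz ;;
            j) ext=tar.bz2 ;;
            *) ext=tar ;;
        esac
        archive="$outdir/$name.$ext"
        start=$(date +%s)
        tar "${flag}cf" "$archive" -C "$parent" "$name" || return 1
        end=$(date +%s)
        # Arithmetic expansion strips any padding some wc versions add.
        size=$(( $(wc -c < "$archive") ))
        printf 'tar %scf: %ds, %d bytes\n' "$flag" $((end - start)) "$size"
    done
}
```

For example, "compare_tar output /tmp" would write output.tar,
output.tar.gz and output.tar.bz2 under /tmp and print one line for each.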

My goal is to get a block of files from one file system to another.  In
fact, once I do this, I don't need easy/direct access to 95% of the
smaller files, so I can compress the two directories which contain most
of these, and use "tar --remove-files" to get rid of the files.  I was
worried this would be slow, but I was wrong:

tar zcf output.tar.gz --remove-files output && \
find output -depth -type d -empty | xargs rm -Rf

took 23 seconds for 250k files.
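
One caveat with the find | xargs step: a directory that contains only
empty subdirectories may not count as empty at the moment find tests it,
so some parents can be left behind. A possible variant, which I have not
timed, lets find delete bottom-up itself. It assumes GNU tar (for
--remove-files) and a find that supports -delete; the function name
archive_and_prune is made up for this sketch.

```shell
#!/bin/sh
# archive_and_prune DIR: gzip-tar DIR next to itself, removing files as they
# are archived, then delete any empty directories left behind.
# Assumes GNU tar (--remove-files) and a find that supports -delete.
archive_and_prune() {
    parent=$(dirname "$1")
    name=$(basename "$1")
    (
        cd "$parent" || exit 1
        tar zcf "$name.tar.gz" --remove-files "$name" || exit 1
        # Some tar versions may already have removed the directories too.
        if [ -d "$name" ]; then
            # -depth visits children first, so nested empty directories
            # are removed bottom-up and their parents then qualify as well.
            find "$name" -depth -type d -empty -delete
        fi
    )
}
```

Calling "archive_and_prune output" should leave output.tar.gz in place
of the output directory.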

My current solution to the overall situation is:

1. Create intermediate subdirectories to keep single-directory contents
to < 10k files.

2. Create tar.gz files of the two main file-holding directories,
removing the files and directories once this is done.

3. Move everything that is left (about 100 files, and 100 MB) by scp
or just mv.

> What did you find when you checked the writeback settings of your xraid?

It is on, and there is no battery installed.  Sounds dangerous.

Ian
-- 
Ian Stokes-Rees, PhD                       W: http://hkl.hms.harvard.edu
ijstokes at hkl.hms.harvard.edu               T: +1 617 432-5608 x75
NEBioGrid, Harvard Medical School          C: +1 617 331-5993
