[BBLISA] Fwd: Moving 100 GB and 1.3 million files

Ian Stokes-Rees ijstokes at crystal.harvard.edu
Wed Jul 28 15:55:49 EDT 2010



On 7/22/10 10:25 PM, Dean Anderson wrote:
> I suggest transferring the files by doing something similar to:
> 
>    tar czBf - . | ssh someuser@macintell4 '(cd somewhere; tar xzBpf -)'

There is a nice summary of options listed here:

http://www.lamolabs.org/blog/1766/pushing-pulling-files-around-using-tar-ssh-scp-rsync/
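The tar-over-ssh pipeline quoted at the top can be wrapped in a small shell function, which makes it easy to time the same copy locally without ssh in the way. This is a sketch under assumptions: the name `tar_pipe_copy` and its argument order are hypothetical, it assumes a POSIX shell with tar and ssh on the PATH, and it assumes the destination path contains no single quotes, because that path is passed through the remote shell.

```shell
#!/bin/sh
# Hypothetical helper, not from the original thread:
#   tar_pipe_copy SRC DEST [USER@HOST]
# Streams the contents of SRC as a tar archive into DEST. With a third
# argument the extracting tar runs on that host over ssh (gzip-compressed,
# like the quoted command); without it, both ends run on this machine.
# Assumes DEST contains no single quotes (it is quoted for the remote shell).
tar_pipe_copy() {
    src=$1
    dest=$2
    remote=${3:-}
    if [ -n "$remote" ]; then
        (cd "$src" && tar czf - .) |
            ssh "$remote" "mkdir -p '$dest' && cd '$dest' && tar xzpf -"
    else
        mkdir -p "$dest" || return 1
        # Local variant: same tar-to-tar stream, but no network or encryption.
        (cd "$src" && tar cf - .) | (cd "$dest" && tar xpf -)
    fi
}
```

For example, `tar_pipe_copy src_dir /tmp/copy` exercises only the local disks, while `tar_pipe_copy src_dir somewhere someuser@macintell4` has the same shape as the quoted command.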

Even creating a tar of the files locally was slow.  It took over 40
minutes for one sub-tree with 250k files and 577 MB of data, and about
18 minutes for a sub-tree with 100k files and 432 MB of data.
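Dividing those counts by the elapsed times gives rough rates (taking "over 40 minutes" as exactly 40 minutes, so the first sub-tree's figures are upper bounds):

```latex
\frac{250\,000\ \text{files}}{40 \times 60\ \text{s}} \approx 104\ \text{files/s},
\qquad
\frac{100\,000\ \text{files}}{18 \times 60\ \text{s}} \approx 93\ \text{files/s}

\frac{577\ \text{MB}}{2400\ \text{s}} \approx 0.24\ \text{MB/s},
\qquad
\frac{432\ \text{MB}}{1080\ \text{s}} = 0.40\ \text{MB/s}
```

Both sub-trees come out near 100 files/s while their byte rates differ more, which is consistent with per-file overhead, rather than data volume, dominating the cost.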

I was hopeful that the netcat approach documented at that link would be
fast (I don't need ssh encryption, since the machines are on the same
private subnet), but no luck, at least not with "tar xv".  The following
has taken 70 minutes for the 100k files and 432 MB of data:

dest_host   $ cd dest_dir
dest_host   $ nc -l 20000   | tar xvf -
source_host $ tar cf - src_dir | nc dest_host 20000

It looks like the problem is the file reading and writing, not the
network.  Moving the data in bulk, as a single tar file, is fast:

src_dir.tar              100%  431MB  30.8MB/s   00:14
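For comparison with the 70-minute netcat run for the same sub-tree, the arithmetic shows roughly a 300-fold gap between moving one file and moving 100k files:

```latex
\frac{431\ \text{MB}}{14\ \text{s}} \approx 30.8\ \text{MB/s}
\qquad\text{vs.}\qquad
\frac{432\ \text{MB}}{70 \times 60\ \text{s}} \approx 0.10\ \text{MB/s}
\quad \left(\frac{100\,000\ \text{files}}{4200\ \text{s}} \approx 24\ \text{files/s}\right)
```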

But untarring that local file (with no decryption involved) is awfully
slow too: it has been running for an hour and still has not finished,
and I'm tired of waiting to finish this post.
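Since both packing and unpacking look slow, one way to see which half dominates is to time them as two separate steps on one machine, with the archive in a scratch file. A minimal sketch, assuming a tar that accepts `-C` in both modes (GNU tar and bsdtar do) and a `date` that supports `+%s`; the helper name `split_tar_timing` is hypothetical and the timer only resolves whole seconds.

```shell
#!/bin/sh
# Hypothetical helper: split_tar_timing SRC DEST
# Packs SRC into a temporary tar file, then unpacks it into DEST,
# timing each half separately (whole seconds, via date +%s).
split_tar_timing() {
    src=$1
    dest=$2
    archive=$(mktemp) || return 1
    status=0
    start=$(date +%s)
    tar cf "$archive" -C "$src" . || status=1      # read side: walk and pack the tree
    mid=$(date +%s)
    if [ "$status" -eq 0 ]; then
        # write side: create every file and directory under DEST
        mkdir -p "$dest" && tar xf "$archive" -C "$dest" || status=1
    fi
    end=$(date +%s)
    rm -f "$archive"                               # always remove the scratch archive
    [ "$status" -eq 0 ] || return 1
    echo "pack: $((mid - start)) s, unpack: $((end - mid)) s"
}
```

Running it once per sub-tree, for example `split_tar_timing src_dir /tmp/untar_test`, would show whether the extract step alone accounts for most of the hour.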

I'll do some more tests, such as splitting the files into sub-directories
(currently one directory holds almost all of them), but FUSE with an
archive file system looks like a good fit for my needs: I can archive
and compress the files and then mount the archive with FUSE whenever
read access is required.
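The sub-directory test mentioned above could start from something like the following, which spreads one flat directory across 256 buckets chosen from a checksum of each file name. This is a sketch, not a tested migration tool: the function name `bucket_flat_dir` and the `b00` to `bff` bucket names are hypothetical, hidden files are left in place, a file whose name clashes with a bucket name makes it stop, and because it runs a few processes per file it will be slow on a million files.

```shell
#!/bin/sh
# Hypothetical helper: bucket_flat_dir DIR
# Moves every regular, non-hidden file directly inside DIR into one of
# 256 subdirectories (DIR/b00 ... DIR/bff), chosen from the POSIX cksum
# CRC of the file name so the files spread roughly evenly.
bucket_flat_dir() {
    dir=$1
    for path in "$dir"/*; do
        [ -f "$path" ] || continue          # skip subdirectories and an unmatched glob
        name=${path##*/}
        crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
        bucket=$(printf 'b%02x' $((crc % 256)))
        mkdir -p "$dir/$bucket" || return 1
        mv "$path" "$dir/$bucket/$name" || return 1
    done
}
```

After bucketing, the same tar or netcat copy can be rerun to see whether smaller directories change the timings.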

Ian