[BBLISA] Fwd: Moving 100 GB and 1.3 million files

Rudie, Tony Tony.Rudie at fmr.com
Thu Jul 22 14:34:00 EDT 2010


As a couple of people have said, it's not the gear that's the problem, it's all those little files.  But following in your footsteps, doing the same calculation based on 1.3 million files, we get:

50% done = 650K files, in 20 * 3600 seconds = 9 files per second.  That seems low as well.  I just unpacked a tar file with 1000 files in it in 3 seconds.
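
To put a number on the per-file cost, the same figures can be rerun as a rough sketch. It assumes "less than half done" means exactly 50%, and the variable names are only illustrative:

```python
# Figures from the thread; "about half done" is assumed to be exactly 50%.
total_files = 1_300_000
fraction_done = 0.5
elapsed_s = 20 * 3600

files_done = total_files * fraction_done
files_per_s = files_done / elapsed_s      # observed transfer rate
ms_per_file = 1000 / files_per_s          # implied wall-clock cost of one file

# Local comparison from this message: 1000 files untarred in 3 seconds.
local_files_per_s = 1000 / 3

print(f"{files_per_s:.1f} files/s, {ms_per_file:.0f} ms per file")
print(f"local untar is roughly {local_files_per_s / files_per_s:.0f}x faster")
```

Under those assumptions each file costs on the order of 100 ms, which looks more like per-file round trips or seeks than raw bandwidth.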

 - Tony Rudié 


-----Original Message-----
From: bblisa-bounces at bblisa.org [mailto:bblisa-bounces at bblisa.org] On Behalf Of David Allan
Sent: Thursday, July 22, 2010 2:22 PM
To: Rob Taylor
Cc: bblisa at bblisa.org; Ian Stokes-Rees
Subject: Re: [BBLISA] Fwd: Moving 100 GB and 1.3 million files

Is my math right?  I'm calculating the OP is getting 650 kB/s throughput.
That seems wrong for any local file transfer on modern gear.  I don't 
believe my own calculation, though.

94GB, 50% complete = 47GB = 47000MB
47000MB / 20 hr. = 2350MB/hr. = .652MB/s
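
As a cross-check, the same arithmetic in a short sketch, using decimal units as above. The 125 MB/s figure is the nominal gigabit line rate with protocol overhead ignored, which is an assumption rather than a measurement from this setup:

```python
# Figures from the thread, decimal units (1 GB = 1000 MB).
gb_total = 94
fraction_done = 0.5            # assumed: "half done" taken as exactly 50%
hours = 20
total_files = 1_300_000

mb_done = gb_total * fraction_done * 1000
mb_per_s = mb_done / (hours * 3600)
mbit_per_s = mb_per_s * 8

gige_mb_per_s = 1000 / 8       # nominal gigabit line rate, overhead ignored
link_share = mb_per_s / gige_mb_per_s

avg_file_kb = gb_total * 1_000_000 / total_files   # average file size in kB

print(f"{mb_per_s:.3f} MB/s = {mbit_per_s:.1f} Mbit/s")
print(f"{link_share:.1%} of a gigabit link, average file ~{avg_file_kb:.0f} kB")
```

So the figure holds up: the link is almost idle, and the files average only about 72 kB each.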

Dave


On Thu, 22 Jul 2010, Rob Taylor wrote:

> Hi Ian. This is not really that surprising. Unfortunately, moving large
> numbers of small files always seems to have this problem.
> As I see it, the problems stem from the time needed for per-file
> transactional overhead at multiple layers, including the filesystem,
> protocol (NFS, CIFS), and network (connection setups and teardowns plus
> slow start). Straight writing of bits at each level is just plain easier.
> A streaming tar using netcat could save some on network connections and
> protocol operations, but not on the filesystem. There may be some
> filesystem options that you can tweak as well.
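
The streaming-tar idea can be sketched in Python as a rough stand-in for a tar-over-netcat pipeline. The function names (send_tree, receive_tree, roundtrip_demo) are hypothetical, and the local socket pair only imitates the real network connection; this is an illustration under those assumptions, not a tuned transfer tool:

```python
import os
import socket
import tarfile
import tempfile
import threading


def send_tree(src_dir, sock):
    """Write src_dir as one uncompressed tar stream to a connected socket."""
    with sock.makefile("wb") as out:
        # Mode "w|" produces a non-seekable stream, so nothing is staged on disk.
        with tarfile.open(fileobj=out, mode="w|") as tar:
            tar.add(src_dir, arcname=".")
    sock.shutdown(socket.SHUT_WR)  # tell the receiver the stream is finished


def receive_tree(sock, dest_dir):
    """Unpack a tar stream arriving on a connected socket into dest_dir."""
    # The "data" extraction filter exists only on newer Python releases.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with sock.makefile("rb") as src:
        with tarfile.open(fileobj=src, mode="r|") as tar:
            tar.extractall(dest_dir, **extra)


def roundtrip_demo(n_files=100):
    """Hypothetical self-test: copy n_files small files over a local socket pair."""
    src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
    for i in range(n_files):
        with open(os.path.join(src, f"f{i}.txt"), "w") as fh:
            fh.write(str(i))
    a, b = socket.socketpair()  # stands in for the netcat connection
    sender = threading.Thread(target=send_tree, args=(src, a))
    sender.start()
    receive_tree(b, dst)
    sender.join()
    a.close()
    b.close()
    return sorted(os.listdir(dst))
```

All files travel over a single connection as one byte stream, so the per-file protocol and connection costs disappear; the receiver still pays the filesystem cost of creating every file.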
>
> I have seen many pieces of software that claim to accelerate small file
> transfers, but haven't really found anything that is that great at it.
>
> rgt
>
>
>
> On 07/22/2010 01:25 PM, Ian Stokes-Rees wrote:
>>
>>
>>  I have a question regarding expectations for file movement between
>> disks on adjacent servers.
>>
>> Due to a sub-optimal file system layout, I regularly have to move lots
>> of files between file systems.  The servers are in the same rack, or at
>> least in racks next to each other, and I am fairly certain they are all
>> connected to the same gigabit switch.
>>
>> Moving blocks of ~300k files totaling about 5-10 GB takes hours to
>> complete.  Yesterday afternoon I started a move of 1.3 million files
>> totaling about 94 GB.  20 hours later the transfer seems to be less than
>> half done.
>>
>> Does this surprise anyone?  Any hints as to what might be wrong or what
>> might speed it up?  I'm at a loss to know where to start looking.
>>
>> Regards,
>>
>> Ian
>>
>>
>>
>> More details, for those who are interested:
>>
>> The files at the origin are on RAID1 SATA disks (1 TB, ext3, Seagate
>> Barracuda 7200 RPM), and I have a ganglia snapshot of the 24 hour status
>> (you can see the start of the transfer about 20 hours ago):
>>
>> http://dl.dropbox.com/u/1561496/shared/abitibi-origin.pdf
>>
>> The destination is an Apple X-RAID array (4TB) connected to an Apple
>> XServe.  The corresponding ganglia snapshot is here:
>>
>> http://dl.dropbox.com/u/1561496/shared/macintel-destination.pdf
>>
>>
>>
>>
>> _______________________________________________
>> bblisa mailing list
>> bblisa at bblisa.org
>> http://www.bblisa.org/mailman/listinfo/bblisa
>



