[BBLISA] Backing up sparse files ... VM's and TrueCrypt ... etc

Tom Metro tmetro+bblisa at vl.com
Mon Feb 22 20:56:31 EST 2010


John P. Rouillard wrote:
> I have seen a significant reduction in network load. However for large
> files the time to do the block comparisons seems to grow non-linearly...
...
> Now the tradeoff is, as Dean says, high read I/O load on the server
> side with some significant wait times.

In theory, one could reduce server-side load by caching checksums for 
large chunks of a large file. For example, although comparisons may 
normally be done on, say, a 1 KB block, you could take any file larger 
than some threshold and cache a checksum for each 25% chunk, which the 
client could then use to narrow the search to the chunk(s) containing 
differences.
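
To make that concrete, here's a rough Python sketch of the idea (the 
function names and the SHA-1/four-chunk choices are mine, not anything 
rsync actually does): cache one digest per quarter of the file, and only 
descend to fine-grained block comparison in the quarters whose digests 
disagree.

import hashlib
import os

NUM_CHUNKS = 4  # one cached checksum per 25% of the file (my choice)

def chunk_digests(path, num_chunks=NUM_CHUNKS):
    """Return one SHA-1 digest per fixed fraction of the file."""
    size = os.path.getsize(path)
    # Chunk boundaries; the last chunk absorbs any remainder.
    bounds = [size * i // num_chunks for i in range(num_chunks + 1)]
    digests = []
    with open(path, 'rb') as f:
        for start, end in zip(bounds, bounds[1:]):
            h = hashlib.sha1()
            remaining = end - start
            while remaining:
                data = f.read(min(65536, remaining))
                if not data:
                    break
                h.update(data)
                remaining -= len(data)
            digests.append(h.hexdigest())
    return digests

def chunks_to_rescan(cached, current):
    """Chunk indexes whose digests differ; only these need the
    fine-grained (e.g. 1 KB) rsync-style block comparison."""
    return [i for i, (a, b) in enumerate(zip(cached, current)) if a != b]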

Of course, getting the file system to guide you to the right spot would 
be ideal. Conceptually, one could create a ZFS- or Btrfs-aware version 
of rsync that does exactly that.
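
Btrfs actually exposes enough to get partway there today: "btrfs 
subvolume find-new" lists extents modified since a given transaction 
generation. Here's a rough, untested Python sketch of how a backup tool 
might use it (the output parsing is simplified and illustrative, not 
production-ready):

import subprocess

def files_changed_since(subvolume, last_generation):
    """Paths Btrfs reports as modified since last_generation."""
    out = subprocess.run(
        ['btrfs', 'subvolume', 'find-new', subvolume, str(last_generation)],
        check=True, capture_output=True, text=True).stdout
    changed = set()
    for line in out.splitlines():
        fields = line.split()
        # Extent lines start with "inode" and end with the file's path;
        # the trailing "transid marker" line does not name a file.
        # (This simplified parse mishandles paths containing spaces.)
        if fields and fields[0] == 'inode':
            changed.add(fields[-1])
    return changed

# Usage (requires root and a Btrfs subvolume; generation 0 means "everything"):
#   files_changed_since('/mnt/data', 12345)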


> One thing to remember is that compression really screws up things. I
> had to make some of the files I backed up uncompressed or else a small
> change in the pre-compressed file would require backing up almost the
> entire compressed file.

See the --rsyncable option on Debian versions of gzip, or apply the 
equivalent patch.
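
If you want to see the ripple effect for yourself, here's a small 
self-contained Python demonstration (standard zlib, the same DEFLATE 
algorithm gzip uses): flip one byte near the start of the input and the 
two compressed streams diverge from roughly that point onward. That 
divergence is exactly what --rsyncable mitigates by periodically 
resetting the compressor state.

import os
import zlib

original = os.urandom(4096) + b'A' * (1 << 20)   # ~1 MB file image
modified = bytearray(original)
modified[100] ^= 0xFF                            # flip one byte near the start

c1 = zlib.compress(original, 6)
c2 = zlib.compress(bytes(modified), 6)

# Find where the two compressed streams stop agreeing.
prefix = next((i for i, (a, b) in enumerate(zip(c1, c2)) if a != b),
              min(len(c1), len(c2)))
print(f'compressed sizes: {len(c1)} and {len(c2)} bytes')
print(f'streams agree only for the first {prefix} bytes')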


> I mention that because compression and encryption can often overlap in
> their effects. Just because you are writing 1k of data inside a TrueCrypt
> file doesn't mean that it maps to a single block outside. Also,
> depending on the algorithm, it may chain multiple blocks together,
> similar to what gzip does, with a similar effect.

True. Not relevant to this thread, but there are tools that combine 
rsync and encryption using techniques similar to the --rsyncable option 
for gzip. (See rsyncrypto: http://sourceforge.net/projects/rsyncrypto/)
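
For the curious, here's a short Python sketch of the chaining effect 
John describes (it uses the third-party "cryptography" package, and 
AES-CBC stands in for any chaining cipher mode; this is an illustration, 
not how TrueCrypt itself works): change one plaintext byte and every 
ciphertext block from that point on changes. rsyncrypto avoids this by 
periodically restarting the chain, much like gzip --rsyncable resets the 
compressor.

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)   # throwaway AES-256 key, for demonstration only
iv = os.urandom(16)

def encrypt_cbc(plaintext):
    # CBC chains blocks: each ciphertext block feeds into the next,
    # so a plaintext change propagates to every later block.
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return enc.update(plaintext) + enc.finalize()

plaintext = os.urandom(16 * 1024)          # 1024 AES blocks
tweaked = bytearray(plaintext)
tweaked[50] ^= 0x01                        # change one byte in block 3

c1 = encrypt_cbc(plaintext)
c2 = encrypt_cbc(bytes(tweaked))
diff = sum(1 for i in range(0, len(c1), 16) if c1[i:i+16] != c2[i:i+16])
print(f'{diff} of {len(c1) // 16} ciphertext blocks differ')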

  -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/


