[BBLISA] Backing up sparse files ... VMs and TrueCrypt ... etc

Rob Taylor rgt at wi.mit.edu
Mon Feb 22 18:42:48 EST 2010


Has anyone looked at EMC's Avamar product?  It seems like it might do
what you're after, and it specifically mentions virtual machines as well.

rgt

On 02/22/2010 10:59 AM, Dean Anderson wrote:
> 
>> -  50G sparse-file VMware virtual disk containing a Windows XP
>> installation, 22G used.
>>
>> -  Back it up once.  22G go across the network.  It takes 30 mins.
>>
>> -  Boot into XP, change a 1K file, shut down.  Including random
>> registry changes, system event logs, and other incidental writes,
>> imagine that a total of twenty 1K blocks have changed.
>>
>> -  Now do an incremental backup.  Sure, you may need to scan the file
>> looking for which blocks changed, but you can do that as fast as you can
>> read the whole file once, assuming you kept some sort of checksums from
>> the previous run.  Then just send 20K across the net.  This should
>> complete at least 5x faster than before ... which means at most 6 mins.
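
That checksum-scan scheme is essentially what rsync-style tools do at
the block level.  A rough sketch in Python, assuming 1K blocks and a
digest list saved from the previous run (the function names are mine,
just for illustration):

    import hashlib

    BLOCK = 1024  # 1K blocks, as in the scenario above

    def block_digests(path):
        """Read the file once, returning one SHA-1 digest per block."""
        digests = []
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(BLOCK)
                if not chunk:
                    break
                digests.append(hashlib.sha1(chunk).digest())
        return digests

    def changed_blocks(path, old_digests):
        """Compare against the previous run's digests, yielding
        (offset, data) only for blocks that differ -- the ~20K that
        actually needs to cross the net.  (Blocks past the old end
        of file are ignored in this sketch.)"""
        with open(path, 'rb') as f:
            for i, old in enumerate(old_digests):
                chunk = f.read(BLOCK)
                if hashlib.sha1(chunk).digest() != old:
                    yield i * BLOCK, chunk
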
> 
> But there is no /backup/ technology that I know of to do that now.  A
> checksum on the whole file won't tell you which /block/ changed; one
> would need a checksum on /each/ block, and I don't know of any backup
> system that keeps those.  The checksum table would be a significant
> fraction of the filesystem, or if sparse, of the data actually used.
> Let's say you have 1K blocks and you want to monitor them for changes,
> using a 160-bit (20-byte) MAC per block so that a changed block can't
> collide with the old sum.  See the problem?  Not to mention the cost
> of computing the checksums during the backup and looking them up in a
> database, which has its own overhead.  The backup system could become
> the major load on the server.
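
To put rough numbers on Dean's overhead point (my arithmetic, assuming
the 160-bit/20-byte digest above):

    # Back-of-envelope size of a per-block checksum table:
    BLOCK = 1024          # 1K blocks
    DIGEST = 20           # 160-bit MAC, e.g. SHA-1
    data = 22 * 2**30     # the 22G actually used in the example
    table = (data // BLOCK) * DIGEST
    print(table // 2**20, 'MB')   # 440 MB -- about 2% of the data

And that whole table has to be read, recomputed, and rewritten on every
backup pass.
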
> 
> Of course, a versioning filesystem doesn't do it that way.  It just
> keeps pointers to a copy-on-write set of the blocks that have changed,
> rather like virtual memory does, starting from the root inode (actually
> root inodes, plural).  One only needs to compare the two root inodes to
> find which blocks have changed between them; at the risk of gross
> oversimplification, it's "lather, rinse, repeat" for the rest of the
> inodes in the filesystem.  You get the point.
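
A toy illustration of that copy-on-write comparison (not how any real
filesystem lays its trees out): if each snapshot is a map from logical
block number to the physical block it points at, finding what changed
is just a pointer comparison, with no need to read or checksum the data:

    def cow_diff(old_snap, new_snap):
        """Each snapshot maps logical block -> physical block id.
        Under copy-on-write a rewrite allocates a fresh physical
        block, so a changed pointer implies changed data."""
        for blk, phys in new_snap.items():
            if old_snap.get(blk) != phys:
                yield blk

    old = {0: 'p1', 1: 'p2', 2: 'p3'}
    new = {0: 'p1', 1: 'p9', 2: 'p3'}   # block 1 was rewritten
    print(list(cow_diff(old, new)))     # -> [1]
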
> 
> It would indeed be _nice_ if only the 20K that changed were sent, but
> there aren't many filesystems that /can/ indicate anything more than
> "this file changed since the last backup".  Hence the backup program
> has to read and send the entire file.  "Incremental backup" refers to
> the whole filesystem, not to the blocks within files.
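
That file-granularity test is all tar/dump-style incrementals have to
go on -- typically just a timestamp comparison, something like this
sketch (the docstring states the consequence; the path and timestamp
would come from the backup catalog):

    import os

    def needs_backup(path, last_backup_time):
        """All an ordinary filesystem can report is that the file
        changed; *which* blocks changed is unknowable from metadata,
        so a hit here means reading and sending the whole file."""
        return os.stat(path).st_mtime > last_backup_time
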
> 
>> -  If you do this with tar or dump ... even with compression ... 22G
>> still goes across the net.  Another 30-minute backup.
>>
>> Is it clear now?
> 
> Indeed.  To do what you want (send only the 20K that changed), one
> needs a versioning filesystem, like AFS or NetApp's WAFL.  What you
> want to do is intimately tied to the filesystem's ability to track
> which blocks have changed.  AFS, for example, keeps one version back
> as the 'backup fileset', and an AFS incremental backup takes only the
> blocks that differ from that fileset.  NetApp keeps 10 versions,
> though I don't remember how the NetApp backup works.  There are
> efforts in AFS to allow more versions.  I don't know of any other
> filesystems that keep version information; ordinary filesystems (FFS,
> ext2/ext3, NTFS) don't track which blocks have changed since the last
> backup.
> 
> But your point should be well taken by FS implementors: we need
> versioning filesystems.
> 
> 
> 		--Dean
> 


