[BBLISA] Backing up sparse files ... VM's and TrueCrypt ... etc

Edward Ned Harvey bblisa3 at nedharvey.com
Tue Feb 23 07:44:31 EST 2010


> But there is no /backup/ technology to do that now, that I know of. A

I don't know why I have to keep repeating this.  CrashPlan does back up this
way, but you can't restore a sparse file sparsely from a CrashPlan backup.
I have a ticket open with their support.  They say they'll address it, but I
don't count on such things until I see a result.

Is CrashPlan the only backup system in the world that calculates byte
differentials?  I doubt it.  But you seem to think so.


> checksum on the whole file won't tell you what /block/ changed.  One
> would need checksums on /each/ block.  I don't know of any backup
> system that does that.

That's what I'm asking for.  Does anyone know of a tool that can do something
like this?


> The backup log would be a significant fraction of the
> filesystem, or if sparse, a significant fraction of the data. Lets say
> you have a 1k block and you want to monitor for changes, and use a
> 160byte mac to ensure no collisions on changes having the same sum. See
> the problem?  

No.  I don't see the problem.  You say 160 bytes, but I was thinking more
like this: store a checksum for every 1 MB block, plus one more for the file
as a whole.  This way, you run the risk of checksum collisions on the 1 MB
blocks (a very small risk), and the overall checksum is a cross-check that
lets you detect when such a collision occurs.  If a collision does occur,
there's no choice about it: you have to resend the whole image.  But that
would happen very rarely.
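
To make the idea concrete, here is a minimal sketch in Python.  It is only an
illustration under assumptions of mine: the names build_manifest,
changed_blocks and plan_transfer are hypothetical, and SHA-1 per block with
SHA-256 for the whole file is just one possible choice of checksums.  It is
not taken from any existing backup tool.

```python
import hashlib
import io

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block (assumed block size)


def build_manifest(stream, block_size=BLOCK_SIZE):
    """Return (list of per-block digests, whole-file digest) for a binary stream."""
    whole = hashlib.sha256()          # cross-check over the entire file
    blocks = []
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        whole.update(chunk)
        blocks.append(hashlib.sha1(chunk).digest())  # one digest per block
    return blocks, whole.digest()


def changed_blocks(old_blocks, new_blocks):
    """Indexes of blocks to resend: those that differ plus any appended ones."""
    changed = [i for i, (a, b) in enumerate(zip(old_blocks, new_blocks)) if a != b]
    changed.extend(range(len(old_blocks), len(new_blocks)))
    return changed


def plan_transfer(old, new):
    """Decide between sending changed blocks and resending the whole image.

    If every block digest matches but the whole-file digests differ, a block
    checksum collision has hidden a change, so the whole image is resent.
    A file that simply got shorter is not handled here; the caller would
    have to truncate the backup copy.
    """
    old_blocks, old_whole = old
    new_blocks, new_whole = new
    changed = changed_blocks(old_blocks, new_blocks)
    if not changed and len(old_blocks) == len(new_blocks) and old_whole != new_whole:
        return "full", list(range(len(new_blocks)))
    return "blocks", changed


# Tiny demonstration with 10-byte blocks instead of 1 MiB ones.
before = build_manifest(io.BytesIO(b"a" * 10 + b"b" * 10 + b"c" * 5), block_size=10)
after = build_manifest(io.BytesIO(b"a" * 10 + b"X" * 10 + b"c" * 5), block_size=10)
print(plan_transfer(before, after))
```

On a real image you would open the file in binary mode and pass the file
object to build_manifest, keeping the previous manifest alongside the backup.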

And the checksums are less than 0.1% of the total backup size.  Even at 160
bytes per 1 MB block, they come to about 0.015%.
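
The arithmetic behind that figure can be checked with a few lines of Python.
This is only a back-of-the-envelope sketch; checksum_overhead is a
hypothetical helper, the 1 MiB block size is an assumption, and the SHA-1 and
SHA-256 digest sizes are shown for comparison with the 160 bytes quoted above.

```python
BLOCK_SIZE = 1024 * 1024  # 1 MiB per block (assumed block size)


def checksum_overhead(digest_bytes, block_size=BLOCK_SIZE):
    """Per-block checksum storage as a fraction of the data it covers."""
    return digest_bytes / block_size


for label, size in [("SHA-1, 20 bytes", 20),
                    ("SHA-256, 32 bytes", 32),
                    ("160 bytes, as quoted", 160)]:
    # The single whole-file checksum adds a constant few bytes and is ignored.
    print(f"{label}: {checksum_overhead(size):.4%} of the data")
```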


> Not to mention the issue of computing the checksums during
> backup, looking them up in a database, which has its own overhead.  The
> backup system could become the major load on the server.

More than backing up the whole file?  I think not.  This "overhead" is a net
saving of time and effort, not a loss.


> Of course, the versioning filesystem doesn't do it that way.  It just
> keeps pointers to a copy-on-write set of blocks that have changed,
> rather just like virtual memory, starting with the root inode (actually
> plural).  One only needs to compare the two root inodes to find what
> blocks have changed between them. At some gross over-simplification,
> just 'Lather rinse repeat' for the rest of the inodes in the
> filesystem.
> You get the point.

If ZFS or anything else that does copy-on-write is available on Windows and
can send byte-differential backups, please let me know.


