[BBLISA] Live Sync / Backup / Sync without crawling

John Orthoefer jco at direwolf.com
Mon Nov 3 21:34:33 EST 2008


On Nov 3, 2008, at 5:36 PM, Tom Metro wrote:

> John Orthoefer wrote:
>> Tony Rudie wrote:
>>> Rsync should be fine.  And searching for a specific entry in a  
>>> directory should be way faster than looking at every entry to see  
>>> if it needs copying.  Right?
>> If your filesystem stores directory entries in something other than  
>> an unsorted list, yes it should be faster.
>> ...
>> But if your filesystem still keeps your directories in something  
>> that is unhashed, then you might as well just let rsync do its  
>> job, you are only saving a stat call at that point...
>
> I'm not sure I see the relevance of hashed directories with respect  
> to the OP's question.
>

The relevance is that if you pass rsync a list of files (we don't  
know whether the OP's plan is to sync each file as inotify fires, or to  
sync every n minutes from a list generated by inotify), then as far as  
I know rsync takes the first file in the list and grabs that file.  
That means it has to scan each directory on the path down to the  
file, rinse and repeat for the next file.   If some of those  
directories are huge and the lookup is linear, it can take a long time  
to scan each directory.    But if rsync does its own walk, it handles  
each file as it encounters it, so it doesn't rescan directories for  
every file.

Really, I can't tell if it's relevant to the OP's question, because he  
didn't give any of those details.   Furthermore, I was responding not  
to the OP's question but to the statement Tony made, that searching  
for a file is faster than checking each file.    I was trying to point  
out that you need to profile what is going on, and know where the  
bottleneck is and what problem you are actually trying to solve.  (Not  
the problem you want to solve; that is called research.  And no, they  
aren't always the same thing.)

> If you let rsync operate in its usual fashion, then it needs to scan  
> the directory hierarchy, and look at the file system metadata for  
> each file, comparing it with the remote file, and if a difference is  
> found, perform a more detailed block comparison.

It depends on what other flags you give rsync; you can tell it to skip  
the detailed block comparison and just send files whose metadata  
differs, --whole-file (as I recall.)

> The OP was seeking to replace that scan with an event driven model  
> using inotify or an equivalent service hooked into the OS's kernel  
> that would fire events when a change occurred in the area of interest.
>
>
>> When I first saw this message, my answer was use rsync with
>> --from-file...
>
> So where does the list of files come from that you put into the file  
> pointed at by --files-from?
>
> Sure, you can use something like:
>
> inotifywait -q -r -m ... /path | perl -pe ... | rsync --files- 
> from=- ...
>
> but it requires more than just rsync.
>

Yes, I didn't provide a whole script; some glue is required.   It  
depends on how fast you expect files to change, how fast you can sync  
them, and how many are changing per unit time (the right answer for  
his problem might be something even more complex: inotify feeding a  
program that forks off 20 or so rsyncs and queues files to them to be  
synced.)   So it requires more knowledge about what is going on to  
find the best solution.   With that said, based on what he asked, I  
would look at using rsync --files-from=.  Also you'll have to do  
something like a full rsync to resync after a restart, because you  
can't guarantee that nothing changed while your notify wasn't in place.

Then again, you might not be able to do what he needs with inotify,  
and you might have to bring out the really heavy guns, like a pair of  
NetApps with SnapSync/SnapMirror (I think those are the products that  
keep two NetApps in sync.)

Johno
