[owncloud-devel] GSoC Proposal for Large File Sync

Roeland Douma rullzer at owncloud.com
Tue Mar 22 14:33:17 GMT 2016


This is a bit of an oversimplification.
Just cutting up a file does not work. You need a rolling checksum. Otherwise
you will not be able to detect data that moved within the file.


Assume a file.

'aaaaabbbbbccccc'

Say your chunks are 5 bytes, and you modify the file locally to:

'adaaaabbbbbccccc'


With 'static' chunks, the inserted byte shifts every chunk boundary, so you are
forced to re-upload all chunks. Edits like this are not uncommon.

But this is indeed exactly what zsync is.


Basically you need to store the zsync file as metadata, because calculating the checksums
on the server is not really a scalable solution.

Long story short: change detection is not only about changed chunks. It is
also about moved data (that did not change).
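
To make the point about moved data concrete, here is a minimal sketch of a rolling
checksum in the rsync/zsync style. It is not zsync's actual code: the function names,
the modulus M and the use of SHA-256 as the strong checksum are my own assumptions,
picked only for illustration. The weak checksum slides over the new file one byte at a
time, so a block of the old file is found again even when it sits at a shifted offset.

```python
import hashlib

M = 1 << 16  # modulus for the weak checksum (illustrative choice)


def weak_sum(block):
    """Weak checksum (a, b) of a whole block, computed from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide the window one byte: drop out_byte, take in in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_known_blocks(old, new, size):
    """Map old block number -> first offset in `new` where that block occurs."""
    table = {}
    for start in range(0, len(old) - size + 1, size):
        block = old[start:start + size]
        strong = hashlib.sha256(block).digest()
        table.setdefault(weak_sum(block), []).append((start // size, strong))

    matches = {}
    if len(new) < size:
        return matches
    a, b = weak_sum(new[:size])
    pos = 0
    while True:
        # A weak match is only a hint, so confirm it with the strong checksum.
        for block_no, strong in table.get((a, b), []):
            if hashlib.sha256(new[pos:pos + size]).digest() == strong:
                matches.setdefault(block_no, pos)
        if pos + size >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + size], size)
        pos += 1
    return matches


# Blocks 1 ('bbbbb') and 2 ('ccccc') are found again, shifted by one byte.
print(find_known_blocks(b"aaaaabbbbbccccc", b"adaaaabbbbbccccc", 5))
# {1: 6, 2: 11}
```

With the example file above, only the data around the inserted 'd' would need to be
transferred. The two unchanged blocks are recognised at their new offsets.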

I agree that the blog series is a good place to start on how to design the API.

Cheers,
--Roeland


 From:   Klaas Freitag <freitag at owncloud.com> 
 To:   <devel at owncloud.org> 
 Sent:   22-3-2016 15:16 
 Subject:   Re: [owncloud-devel] GSoC Proposal for Large File Sync 

On 21.03.2016 23:58, Tomaz Canabrava wrote: 
> 
 
Hi, 
 
> 
> I can work on a proof of concept for large text files and virtual 
> machine images (which would already be a win-situation for some users)
> and then focus on *some* of the hard to sync files (like powerpoint 
> presentations) and see what I could get. 
> 
 
I do not think you should consider the file type at all. Just try to
implement the zsync-based approach, I'd say, and only for the chunked
upload mode.
 
Raw steps: 
1. On the client, chop the file into chunks and create a list:

   Chunk number     Start byte     End byte     Checksum?

2. Send this list to the server to get the server's checksums.
3. While waiting for the server's list of checksums, calculate the client checksums.
4. Compare the lists once both are ready and decide which chunks need upload.
5. Upload the chunks that changed.
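
The steps above could be sketched roughly as follows. This is only an illustration
with fixed-size chunks: the function names are hypothetical, the server's list is
simulated locally instead of being fetched over the network, SHA-256 is an arbitrary
choice of checksum, and the end byte is treated as exclusive.

```python
import hashlib


def chunk_table(data, chunk_size):
    """Steps 1 and 3: rows of (chunk number, start byte, end byte, checksum)."""
    table = []
    for number, start in enumerate(range(0, len(data), chunk_size)):
        end = min(start + chunk_size, len(data))  # exclusive end (assumption)
        checksum = hashlib.sha256(data[start:end]).hexdigest()
        table.append((number, start, end, checksum))
    return table


def chunks_to_upload(client_table, server_table):
    """Step 4: chunk numbers whose checksum differs or is missing on the server."""
    server_sums = {number: checksum for number, _, _, checksum in server_table}
    return [number for number, _, _, checksum in client_table
            if server_sums.get(number) != checksum]


# Stand-in for the list the server would return in step 2.
server = chunk_table(b"aaaaabbbbbccccc", 5)

# One byte overwritten in place: only chunk 1 needs upload.
print(chunks_to_upload(chunk_table(b"aaaaabbbXbccccc", 5), server))   # [1]

# One byte inserted near the start: every chunk differs.
print(chunks_to_upload(chunk_table(b"adaaaabbbbbccccc", 5), server))  # [0, 1, 2, 3]
```

The second call shows why comparing fixed chunks by position is not enough on its
own: a single inserted byte makes every chunk look changed.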
 
The trick is in the cutting of the chunks. The number of chunks that do
not change can be increased by picking clever boundaries.
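
One possible reading of "clever boundaries" is content-defined chunking, where a cut
is placed wherever a checksum over the last few bytes hits a chosen value, so the
boundaries follow the content instead of the byte offset. The sketch below is a toy
under that assumption and not necessarily what is meant here: the function name, the
2-byte window and the "window sum is odd" rule are made up to keep the example small.
A real implementation would more likely use a larger window and a stronger rolling hash.

```python
def content_defined_chunks(data, window=2, divisor=2):
    """Cut after byte i when the sum of the last `window` bytes
    is congruent to divisor - 1 modulo divisor (toy boundary rule)."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte that left the window
        if i + 1 >= window and rolling % divisor == divisor - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


old = content_defined_chunks(b"aaaaabbbbbccccc")
new = content_defined_chunks(b"adaaaabbbbbccccc")
print(old)  # [b'aaaaab', b'bbbbc', b'cccc']
print(new)  # [b'ad', b'a', b'aaab', b'bbbbc', b'cccc']

# Chunks the server already has, matched by content instead of position.
print(sorted(set(old) & set(new)))  # [b'bbbbc', b'cccc']
```

After the inserted byte the boundaries line up with the content again, so the last
two chunks are identical in both versions and would not need to be uploaded.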
 
This project requires both client and server work. 
 
Please do the server work based on what is described in the blog series
about the new chunking API. There is a branch with a basic implementation
of that here:
https://github.com/owncloud/core/pull/20118
 
Makes sense? 
 
regards, 
 
Klaas 
 
 