[owncloud-devel] GSoC Proposal for Large File Sync

Tomaz Canabrava tcanabrava at kde.org
Tue Mar 22 14:35:04 GMT 2016


On Tue, Mar 22, 2016 at 11:33 AM, Roeland Douma <rullzer at owncloud.com>
wrote:

> This is a bit of an oversimplification.
> Just cutting up a file does not work. You need a rolling checksum.
> Otherwise you will be unable to detect data that moved within a file.
>
>
> Assume a file.
>
> 'aaaaabbbbbccccc'
>
> Now your chunks are 5 bytes. And you decide to modify the file locally to:
>
> 'adaaaabbbbbccccc'
>
>
> Now if you had 'static' chunks, this would force you to re-upload all
> chunks, because every chunk boundary after the inserted byte shifts by one.
> Stuff like this is not uncommon.
>
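
To make the example above concrete, here is a minimal sketch of static
chunking applied to the two strings. The helper name fixed_chunks is
hypothetical and not part of any ownCloud code; the 5-byte chunk size is the
one used in the example.

```python
def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Split data into consecutive fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

old = fixed_chunks(b"aaaaabbbbbccccc", 5)
new = fixed_chunks(b"adaaaabbbbbccccc", 5)

# Compare chunk by chunk at the same position, as a static scheme would.
changed = [i for i, chunk in enumerate(new) if i >= len(old) or chunk != old[i]]

print(old)      # [b'aaaaa', b'bbbbb', b'ccccc']
print(new)      # [b'adaaa', b'abbbb', b'bcccc', b'c']
print(changed)  # [0, 1, 2, 3]
```

A single inserted byte shifts every later boundary, so every chunk compares as
different even though most of the data is unchanged.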

> But this is indeed exactly what zsync is.
>
>
> Basically you need to store the zsync file as metadata, because
> calculating the checksum
> on the server is not really a scalable solution.
>

What if the file resulting from the per-chunk checksum calculation is also
uploaded to the server? That way we don't need to use the server to
calculate it at all.
I don't know whether the resulting file will be large, as I haven't started
working on that yet.
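
A rough sketch of what such a client-side checksum file could contain, and how
big the checksums alone would be. Everything here is an assumption for
illustration: the names build_manifest and digest_bytes, the use of SHA-1, and
the 1 MiB chunk size. zsync's real control file format is different.

```python
import hashlib

def build_manifest(data: bytes, chunk_size: int) -> list[tuple[int, int, int, str]]:
    """Return (chunk number, start byte, end byte, hex checksum) per chunk."""
    manifest = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        end = start + len(chunk) - 1  # inclusive end offset
        manifest.append((index, start, end, hashlib.sha1(chunk).hexdigest()))
    return manifest

def digest_bytes(file_size: int, chunk_size: int, digest_size: int = 20) -> int:
    """Bytes of raw checksum data for a file (SHA-1 digests are 20 bytes)."""
    chunks = -(-file_size // chunk_size)  # ceiling division
    return chunks * digest_size

for row in build_manifest(b"aaaaabbbbbccccc", 5):
    print(row)

# 1 GiB file, 1 MiB chunks: 1024 chunks * 20 bytes = 20480 bytes of digests.
print(digest_bytes(1 << 30, 1 << 20))
```

Under these assumptions the checksum data stays small compared to the file
itself, but the real size depends on the chunk size and checksum chosen.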



> Long story short. Change detection is not only about changed chunks. It is
> also about moved data (that did not change).
>
> I agree that the blog series is a good place to start on how to design the
> API.
>
> Cheers,
> --Roeland
>
> * From: * Klaas Freitag <freitag at owncloud.com>
> * To: * <devel at owncloud.org>
> * Sent: * 22-3-2016 15:16
> * Subject: * Re: [owncloud-devel] GSoC Proposal for Large File Sync
>
> On 21.03.2016 23:58, Tomaz Canabrava wrote:
> >
>
> Hi,
>
> >
> > I can work on a proof of concept for large text files and virtual
> > machine images (which would already be a win for some users)
> > and then focus on *some* of the hard-to-sync files (like PowerPoint
> > presentations) and see what I could get.
> >
>
> I do not think you should consider the file type at all. Just try to
> implement the zsync based approach I'd say, and just for the chunked
> upload mode.
>
> Raw steps:
> 1. on the client, chop the file into chunks and create a list:
>
>   Number of chunk     start-at-byte    end-of-byte   Checksum?
>
> 2. send this list to the server to get the server's checksums
> 3. while waiting for the server's list of checksums, calculate the client
> checksums
> 4. compare the lists once both are ready and decide which chunks need upload
> 5. upload the chunks that changed.
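
Step 4 of the list above could look roughly like this. It is only a sketch:
ChunkEntry and chunks_to_upload are hypothetical names, the checksum strings
are placeholders, and it assumes both sides use the same chunk numbering.

```python
from typing import NamedTuple

class ChunkEntry(NamedTuple):
    number: int     # number of chunk
    start: int      # start-at-byte
    end: int        # end-of-byte (inclusive)
    checksum: str

def chunks_to_upload(client: list[ChunkEntry], server: list[ChunkEntry]) -> list[int]:
    """Return the numbers of client chunks the server lacks or has with a different checksum."""
    server_sums = {entry.number: entry.checksum for entry in server}
    return [e.number for e in client if server_sums.get(e.number) != e.checksum]

server_list = [ChunkEntry(0, 0, 4, "a1"), ChunkEntry(1, 5, 9, "b2"), ChunkEntry(2, 10, 14, "c3")]
client_list = [ChunkEntry(0, 0, 4, "a1"), ChunkEntry(1, 5, 9, "xx"),
               ChunkEntry(2, 10, 14, "c3"), ChunkEntry(3, 15, 15, "d4")]

print(chunks_to_upload(client_list, server_list))  # [1, 3]
```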
>
> The trick is in the cutting of the chunks. The number of chunks that do
> not change can be increased by picking clever boundaries.
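
One way to pick such boundaries is to let the content decide: cut wherever a
rolling sum over the last few bytes hits a fixed pattern. The sketch below is
an assumption-laden illustration, not zsync's algorithm (zsync keeps fixed
blocks and slides a rolling checksum over the file instead). WINDOW, DIVISOR
and content_defined_chunks are made-up names and values.

```python
import random

WINDOW = 16    # bytes covered by the rolling sum (illustrative value)
DIVISOR = 64   # on random data, cut roughly every 64 bytes (illustrative value)

def content_defined_chunks(data: bytes) -> list[bytes]:
    """Cut after every byte where the rolling sum of the last WINDOW bytes matches."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        if rolling % DIVISOR == DIVISOR - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk without a natural boundary
    return chunks

rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(4096))
new = old[:1] + b"d" + old[1:]  # insert one byte after the first, as in the example

old_chunks = content_defined_chunks(old)
new_chunks = content_defined_chunks(new)
known = set(old_chunks)
to_upload = [c for c in new_chunks if c not in known]
print(len(new_chunks), "chunks,", len(to_upload), "need upload")
```

Because a cut depends only on the bytes inside the window, boundaries further
than WINDOW bytes past the insertion fall on the same content as before, so
only the chunks around the inserted byte are new.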
>
> This project requires both client and server work.
>
> Please do the server work based on what is described in the blog series
> about the new chunking API. There is a branch with a basic implementation
> of that here:
> https://github.com/owncloud/core/pull/20118
>
> Makes sense?
>
> regards,
>
> Klaas
> >
> >
>
>
> _______________________________________________
> Devel mailing list
> Devel at owncloud.org
> http://mailman.owncloud.org/mailman/listinfo/devel
>