Network Working Group                                             Y. Cui
Internet-Draft                                                    Z. Lai
Intended status: Informational                                    L. Sun
Expires: April 2, 2016                               Tsinghua University
                                                      September 30, 2015


                Internet Storage Sync: Problem Statement
                        draft-cui-iss-problem-02

Abstract

   Internet storage services have become increasingly popular.  They
   attract a huge number of users and produce a significant share of
   Internet traffic.  Most existing Internet storage services use
   proprietary sync protocols with different capabilities to
   synchronize data.  However, a single Internet storage service using
   its proprietary sync protocol has intrinsic limitations on service
   usability and network performance.  This document outlines the
   problems caused by proprietary sync protocols and missing key
   capabilities.  It also motivates the design of a standard sync
   protocol to achieve better usability and sync performance.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 2, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents



Cui, et al.               Expires April 2, 2016                 [Page 1]


Internet-Draft                iss Problems                September 2015


   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology and Concepts
   3.  Architecture of Internet Storage Service
   4.  Problems
     4.1.  Complicated Support for APIs
     4.2.  Unavailable Cross-service Sync
     4.3.  Multiple Similar Clients
     4.4.  Protocol Capability Configurations and Implementations
       4.4.1.  Chunking and Deduplication
       4.4.2.  Chunking and Delta-encoding
       4.4.3.  Bundling
     4.5.  Sync Protocols in Mobile and Wireless Environments
     4.6.  Unsatisfactory Concurrent Work Ability
   5.  Advantages of Standard Sync Protocol
   6.  Understanding of Sync Protocol
   7.  Related Work in IETF
   8.  Security Considerations (TBD)
   9.  Acknowledgements
   10. Informative References
   Authors' Addresses

1.  Introduction

   Internet storage services provide a convenient way for users to
   synchronize local files or folders with remote servers.  In recent
   years, Internet storage services have gained tremendous popularity
   and account for a large amount of Internet traffic.  This strong
   public interest has also drawn various providers into the Internet
   storage market.  Services like Dropbox, Google Drive, OneDrive and
   Box are becoming pervasive in people's daily routines.  Dropbox,
   typically considered one of the leading providers, announced in June
   2015 that it had more than 400 million registered users [users], and
   this number is expected to keep growing.  Internet storage services
   enable users to access, operate on and share their data from
   anywhere, on any device, at any time and with any connectivity.
   Internet storage services also provide powerful APIs that allow
   third-party applications to offload the burden of data storage and
   management to the server.  By aggregating users' files or application
   data on their servers, Internet storage services are becoming the
   "data entrance" for personal users.

   The sync protocol is the key design consideration of Internet storage
   services.  A sync protocol can be equipped with several capabilities
   to optimize storage usage and speed up data transmission.  Existing
   Internet storage services (ISS) employ their own proprietary sync
   protocols to store and retrieve user data to and from the remote
   servers.  However, using proprietary sync protocols with different
   capabilities in different ISS has intrinsic limitations on service
   usability and network performance.

   Multi-service usability: Users prefer to use multiple Internet
   storage services for several reasons.  First, the upload and download
   performance of a single ISS varies drastically in both spatial and
   temporal dimensions.  Second, a service may experience temporary or
   location-specific outages.  Third, because an Internet storage
   service has full access to user data, user data is at risk when the
   service is attacked or when authorities require the provider to
   expose the data.  However, the use of different proprietary protocols
   with different capability configurations and implementations makes
   it difficult for developers and users to use multiple services.
   Developers must learn different APIs to integrate their applications
   with each ISS.  A user of one Internet storage service also cannot
   synchronize data with the users of another service.  Moreover, to use
   multiple services a user has to install a series of client
   applications with similar functionality, which wastes local resources
   and degrades the user experience.

   Insufficient or unjustified capabilities: Previous work shows that
   existing Internet storage services have different capability
   configurations and implementations.  These capabilities are closely
   related to each other and help to synchronize user data efficiently
   and improve reliability.  However, most storage services are found to
   lack key capabilities or to configure them unreasonably, which may
   result in unexpected sync failures and sync inefficiency.  For
   example, when a mobile user uploads a photo from a phone and the
   wireless connection is interrupted, some Internet storage services
   fail to complete the sync yet still consume a lot of traffic and
   battery power.  How to reasonably design and implement capabilities
   in the sync protocol has become a critical problem for providers.

   To address the problems mentioned above, an open and standard storage
   sync protocol is required.  In addition, this standard sync protocol
   is expected to support all the useful service capabilities to avoid
   unexpected sync failures and improve network performance.

   This document outlines the problems arising in existing Internet
   storage services that use various proprietary sync protocols.
   Section 2 lists the terminology and related concepts of Internet
   storage services.  Section 3 introduces the architecture of existing
   Internet storage services.  Section 4 describes the main problems and
   issues that need to be considered.  Section 5 explains the advantages
   of using an open and standard sync protocol.  Section 6 gives a
   high-level understanding of the sync protocol.  Section 7 identifies
   the differences between ISS and related work in the IETF (e.g.,
   WebDAV).

2.  Terminology and Concepts

   Data synchronization (sync): The most important operation of Internet
   storage services, which is more than remote file transfer.  It allows
   the client to automatically propagate changes to the files stored on
   the remote servers.  The client is promptly notified of changes made
   to a local file.

   Client: An application installed on the user side (e.g., on multiple
   terminals) that enables users to access and use the Internet storage
   service.

   Control server: The entity that takes the responsibility of
   authenticating users, managing metadata information and also
   notifying changes to the client.  It stores authentication and
   metadata information of users.

   Data storage server: The entity that stores the synchronized files of
   users.

   Control data: The control information exchanged with the control
   server to carry out the data sync process.  Typical control data
   includes metadata (e.g., hashes of chunks), authentication
   information, etc.

   Content data: The original data of a user's local file, often in the
   form of small chunks.

   Sync protocol: A communication protocol between the client and the
   remote servers to achieve data sync.  It consists of a control flow
   and a data flow.  Existing sync protocols are typically built on
   HTTP/HTTPS.

   o  Control flow: This flow is for client and control server to
      exchange control data.

   o  Data flow: This flow is for transmitting content data between
      client and data storage servers.

   Sync efficiency: A performance metric that indicates how quickly
   changes can be synchronized to the Internet and with how little
   traffic overhead.

   Useful capabilities to improve sync efficiency:

   o  Chunking: Split a large file into small chunks.

   o  Bundling: Transmit multiple small chunks as a single big chunk.

   o  Deduplication: Avoid retransmitting content that already exists
      on the servers.

   o  Delta-encoding: Synchronize only the modified data.

   o  Compression: Compress data before transmission.

3.  Architecture of Internet Storage Service

   The architecture of most Internet storage services is generally
   composed of three major components: client, control server and data
   storage server.  The overall architecture is shown in Figure 1.


                           * * * * * * * *
              * * * * * * *               * * * * * * *
            *                                 INTERNET  *
            *  +------------+        +------------+     *
         ------|   Control  |        | +------------+    *
        |  *   |   server   |        | |Data storage|========
        |   *  +------------+        + |   servers  |   *    |
        |   *                          +------------+   *    |
        |     * * * * * * *                * * * * * * *     |
   Control Flow            * * * * * * * *               Data Flow
        |                                                    |
        |                                                    |
        |                     +--------+                     |
         ---------------------| Client |=====================
                              +--------+

                               Figure 1


   The three components communicate with one another via the sync
   protocol.  The control server stores all control data, including
   authentication information and metadata.  Whenever changes are made
   to synchronized files, the control server notifies the clients.
   However, the other
   type of data, content data, is stored in the form of chunks on the
   data storage servers, which have no knowledge of the sources, the
   users, or the relationships between chunks.  For example, one user
   file may be split into chunks that are stored on several different
   data storage servers.  These two kinds of servers are separate
   logical entities and are usually deployed in different locations.
   Every time the client synchronizes a local file to the Internet, it
   needs to exchange both control data with the control server and
   content data with the data storage servers.
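
   To make the split between control flow and data flow concrete, the
   following Python sketch models one sync of a local file under
   simplifying assumptions.  The class and function names
   (ControlServer, DataStorageServer, sync_file), the 4-byte chunk size
   and the use of SHA-256 as the chunk identifier are hypothetical and
   do not describe any particular service's protocol.  The control
   server only ever sees chunk hashes, while the data storage server
   only sees opaque chunks.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Fixed-size chunking of a local file's content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ControlServer:
    """Hypothetical control server: holds metadata (chunk hashes) only."""

    def __init__(self):
        self.files = {}             # path -> ordered list of chunk hashes
        self.stored_hashes = set()  # hashes known to be on storage servers

    def missing(self, hashes):
        """Control flow: report which chunks the storage servers lack."""
        return {h for h in hashes if h not in self.stored_hashes}

    def commit(self, path, hashes):
        """Control flow: record the file's chunk list after upload."""
        self.files[path] = list(hashes)
        self.stored_hashes.update(hashes)


class DataStorageServer:
    """Hypothetical data storage server: opaque chunks keyed by hash,
    with no knowledge of files, users or chunk relationships."""

    def __init__(self):
        self.chunks = {}

    def put(self, digest, chunk):
        self.chunks[digest] = chunk


def sync_file(path, data, control, storage):
    """Sync one local file; return the number of chunks uploaded."""
    chunks = split_chunks(data)
    hashes = [hashlib.sha256(c).hexdigest() for c in chunks]
    to_send = control.missing(hashes)        # control flow
    sent = 0
    for digest, chunk in zip(hashes, chunks):
        if digest in to_send:
            storage.put(digest, chunk)       # data flow
            to_send.discard(digest)          # upload each chunk once
            sent += 1
    control.commit(path, hashes)             # control flow
    return sent
```

   Because the control server answers with the set of chunks it has not
   yet seen, the sketch also performs deduplication: syncing b"aaaabbbb
   aaaa"-style content uploads the repeated chunk only once, and a second
   file that shares a chunk with the first uploads only its new chunk.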

4.  Problems

   Existing popular Internet storage services, including Dropbox,
   OneDrive and Google Drive, use their own proprietary sync protocols
   to synchronize data.  The use of different proprietary protocols is
   generally considered detrimental to the development of Internet
   services.  This section describes the current problems that Internet
   storage services face because of their sync protocols.  We summarize
   six specific problems from three different aspects: service
   usability, protocol capabilities and concurrent work ability.  As
   discussed in Section 1, users prefer to use multiple storage services
   for reasons of performance, reliability and security.  Usability
   across multiple services is still lacking to some extent due to the
   proprietary nature of sync protocols.  Section 4.1, Section 4.2 and
   Section 4.3 describe the problems concerning usability.  Moreover,
   previous work and measurements have revealed that most sync protocols
   lack key service capabilities or configure them poorly, which
   significantly degrades network performance, especially in mobile and
   wireless environments.  Section 4.4 and Section 4.5 illustrate the
   problems of current protocol capabilities.  In addition, Section 4.6
   describes the unsatisfactory support for concurrent work.  All of the
   above problems deserve to be addressed by the IETF community.

4.1.  Complicated Support for APIs

   Popular Internet storage services provide APIs that extend access to
   the content management features of their client software to
   third-party applications.  In practice, these APIs take care of
   synchronizing data with Internet storage servers in a familiar,
   file-system-like way.  Behind the scenes, the APIs synchronize
   changes to the server and automatically notify the client when
   changes are made on other devices.  These APIs may also offer more
   advanced features, e.g., file revisions or restoration, to make the
   client work better.  Different providers offer developers different
   APIs, and these APIs have different styles and
   features in order to support different platforms (e.g., Windows and
   Android).

   Third-party application developers often prefer to integrate
   multiple Internet storage services into their applications to achieve
   better performance, reliability and security.  However, developers
   who want to use multiple storage services need to learn the APIs of
   every service provider in order to design and implement their own
   clients.  Although some successful third-party clients already
   support multiple services (e.g., ExpanDrive [ExpanDrive] and IFTTT
   [IFTTT]), it is not easy for developers to learn and apply so many
   different APIs to develop and maintain their third-party clients.

4.2.  Unavailable Cross-service Sync

   Synchronization is one of the most important functions provided by
   Internet storage services.  It allows files stored on the Internet to
   be easily shared and manipulated by different people and groups.
   Anyone who is permitted to read and download a file is able to modify
   it and upload new versions to the Internet.

   However, this function only works well within a single service.
   Users of the same Internet storage service can easily share (i.e.,
   download) and jointly operate on their files, but sync across
   different Internet storage services is not available.  For example,
   if a Dropbox user currently wants to work on a shared file with a
   Google Drive user, the only option is to send the Google Drive user
   an open HTTP link to the file.  By following that link, the Google
   Drive user can read and download the file over HTTP, but cannot
   modify or update it.  The shared file is stored on Dropbox servers,
   and a Google Drive client cannot download or upload the file through
   Dropbox's sync protocol because it does not implement that protocol.
   The use of different proprietary sync protocols by different
   services is thus what makes cross-service sync unavailable.

4.3.  Multiple Similar Clients

   The emergence of more and more Internet storage services provides
   users with a wide range of choices for storing their local files
   remotely.  As with other Internet applications, users are not
   restricted to using only one of these services.  In fact, they tend
   to have
   multiple accounts with different Internet storage services and use
   them simultaneously.  One important reason is that users seek better
   functionality.  For example, Dropbox is better at file processing,
   OneDrive is better at interoperability and compatibility with
   Microsoft Office, while Google Drive performs better with mail
   attachments.  To obtain all the desired functions and features, a
   simple way is to register with and use all the desired Internet
   storage services.  Furthermore, people may simply need multiple
   Internet storage services for more storage space and higher
   reliability.

   However, using different Internet storage services forces users to
   install multiple similar client applications.  Since almost all
   commercial Internet storage services have their own proprietary sync
   protocols and corresponding client applications, installing and
   running multiple similar clients degrades the user experience and
   increases the complexity of synchronizing files with different
   providers' servers.  For instance, users often have to repeat the
   same operations to upload the same file to each of their service
   accounts.

4.4.  Protocol Capability Configurations and Implementations

   Data sync is not a simple remote file transfer process; it can
   employ several capabilities to optimize data storage usage and speed
   up data transmission.  There are five well-known capabilities that
   Internet storage services can employ to improve sync efficiency and
   reliability: chunking, bundling, deduplication, delta-encoding and
   compression.  All of these capabilities aim to synchronize user data
   efficiently over the Internet.

   However, the investigation in [Benchmarking] shows that different
   Internet storage services have different capability configurations
   and implementations, and most existing Internet storage services do
   not implement all five capabilities in their sync protocols.  The
   lack of such capabilities can degrade sync efficiency.  Table 1, from
   [QuickSync], shows the capability implementations of four popular
   Internet storage services (i.e., Dropbox, Google Drive, OneDrive and
   Seafile) on Windows.

 +----------------+-------------+-------------+-------------+-------------+
 |  Capabilities  |   Dropbox   | GoogleDrive |   OneDrive  |   Seafile   |
 |                |             |             |             |             |
 +----------------+-------------+-------------+-------------+-------------+
 |    Chunking    |     4MB     |     8MB     |   Variable  |   Variable  |
 +----------------+-------------+-------------+-------------+-------------+
 |    Bundling    |     Yes     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 |  Deduplication |     Yes     |      No     |      No     |     Yes     |
 +----------------+-------------+-------------+-------------+-------------+
 | Delta-encoding |     Yes     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 |   Compression  |     Yes     |     Yes     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
                                   Table 1


   Measurements in [QuickSync] also reveal that these key capabilities
   significantly affect sync performance.  Most of them should be
   implemented and well configured to achieve efficient data sync.  The
   remainder of this subsection lists the problems caused by
   insufficient or poorly configured capabilities.

4.4.1.  Chunking and Deduplication

   Chunking is the most widely implemented capability; it simplifies
   transmission recovery when the sync of a large file is interrupted.
   Different implementations of chunking use different chunking schemes
   (i.e., dynamic or static chunking) and chunk sizes.  Chunking is
   closely related to deduplication, since deduplication is performed at
   chunk granularity.  Typically, a smaller chunk size and a dynamic
   chunking scheme (e.g., Content Defined Chunking) are better at
   detecting and eliminating redundancy.  However, detecting more
   redundancy does not always yield better sync efficiency, since it
   introduces more computation overhead (i.e., finding more redundancy
   takes more CPU time).  An aggressive dynamic chunking scheme (e.g.,
   Content Defined Chunking) performs better in high-delay (i.e.,
   high-RTT) environments, while a fixed-size scheme performs well under
   good network conditions.  The trade-off between computation time and
   transmission time needs to be considered to achieve effective
   chunking.  A better chunking strategy may be network-aware, i.e., the
   sync should be able to select an appropriate chunking strategy
   according to the current network conditions.
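
   The difference between static and dynamic chunking can be
   illustrated with a small Python sketch.  It substitutes a simplified
   gear rolling hash, in the spirit of FastCDC, for whatever Content
   Defined Chunking algorithm a given service actually uses; the gear
   table, mask and minimum/maximum chunk sizes are assumptions chosen
   for illustration.  After a few bytes are inserted at the head of a
   file, all fixed-size chunks shift and no longer deduplicate, whereas
   content-defined boundaries typically re-synchronize within the first
   few chunks.

```python
import random

# Assumed parameters for illustration only (not taken from any service).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # random per-byte values
MASK = (1 << 6) - 1        # cut when the low 6 bits are zero (~1 in 64)
MIN_SIZE, MAX_SIZE = 16, 256


def cdc_chunks(data: bytes) -> list:
    """Content Defined Chunking with a simplified gear rolling hash."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        # Boundary depends on content (hash bits), bounded by min/max size.
        if (length >= MIN_SIZE and (h & MASK) == 0) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Static (fixed-size) chunking."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_chunks(old: bytes, new: bytes, chunker) -> int:
    """Number of distinct chunks of `new` already stored for `old`."""
    return len(set(chunker(new)) & set(chunker(old)))


if __name__ == "__main__":
    rng = random.Random(1)
    data = bytes(rng.getrandbits(8) for _ in range(4096))
    edited = b"INSERTED" + data          # insertion at the head of the file
    print("fixed-size chunks reused:", shared_chunks(data, edited, fixed_chunks))
    print("CDC chunks reused:", shared_chunks(data, edited, cdc_chunks),
          "of", len(set(cdc_chunks(data))))
```

   Scanning every byte with a rolling hash is what costs the extra CPU
   time mentioned above; the fixed-size chunker does no per-byte work.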

4.4.2.  Chunking and Delta-encoding

   Delta-encoding is an algorithm that finds the differing portions of
   two files and thereby enables incremental sync.  However, not all
   Internet storage services implement delta-encoding.  One possible
   reason is that most delta-encoding algorithms work at file
   granularity, whereas Internet storage services often split files into
   chunks for management in order to save storage space and thus reduce
   cost.  Naively piecing together all chunks to reconstruct the whole
   file for incremental sync would waste massive intra-cluster
   bandwidth.  Therefore, some Internet storage services, e.g., Dropbox,
   implement delta-encoding at chunk granularity: delta-encoding is
   performed between the two chunks of the original and modified
   versions that have the same offset from the beginning of the file.
   If a service uses fixed-size chunking, some types of modifications,
   e.g., inserting new data at the head of a file, may leave the two
   chunks used for delta-encoding with very little similarity.  In this
   case, delta-encoding cannot reveal the delta between the original and
   modified file, and incremental sync fails.  To solve this problem, an
   improved delta-encoding algorithm, combined with appropriate
   chunking, needs to be designed so that incremental sync remains
   available in various scenarios.
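
   The failure mode described above can be reproduced with a short
   Python sketch.  Here naive_delta is a deliberately simple,
   hypothetical delta encoder (it keeps the common prefix and suffix and
   sends the rest), not the algorithm used by Dropbox or any other
   service; offset_delta_bytes pairs chunks purely by their offset, as
   described in the text, with an assumed 64-byte fixed chunk size.

```python
import random


def fixed_chunks(data: bytes, size: int = 64) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]


def naive_delta(old: bytes, new: bytes) -> bytes:
    """Hypothetical delta: keep common prefix/suffix, return the middle."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    while s < limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return new[p:len(new) - s]   # bytes that would have to be uploaded


def offset_delta_bytes(old: bytes, new: bytes, size: int = 64) -> int:
    """Delta-encode chunk i of `new` against chunk i of `old` (same
    offset) and return how many bytes would still have to be uploaded."""
    old_chunks, new_chunks = fixed_chunks(old, size), fixed_chunks(new, size)
    total = 0
    for i, chunk in enumerate(new_chunks):
        base = old_chunks[i] if i < len(old_chunks) else b""
        total += len(naive_delta(base, chunk))
    return total


if __name__ == "__main__":
    rng = random.Random(7)
    old = bytes(rng.getrandbits(8) for _ in range(4096))
    flipped = bytearray(old)
    flipped[100] ^= 0xFF                 # in-place edit of one byte
    inserted = b"X" + old                # one byte inserted at the head
    print("in-place edit, chunk-offset delta:",
          offset_delta_bytes(old, bytes(flipped)))
    print("head insertion, whole-file delta:", len(naive_delta(old, inserted)))
    print("head insertion, chunk-offset delta:",
          offset_delta_bytes(old, inserted))
```

   With this setup, flipping one byte leaves a 1-byte delta, and
   delta-encoding the two whole files after a 1-byte head insertion also
   yields a 1-byte delta.  With offset-paired fixed-size chunks,
   however, the same insertion forces essentially every chunk to be
   re-sent.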

4.4.3.  Bundling

   Small files are more likely to be modified and synchronized
   frequently.  For example, people usually collaborate on a number of
   small files (e.g., a project's source code usually consists of many
   small files).  In a high-delay environment, synchronizing a large
   number of small files is inefficient.  One reason is that most
   existing Internet storage services employ a sequential
   acknowledgement mechanism, under which the next chunk is not
   transmitted until the acknowledgement for the previous chunk has been
   received.  This mechanism wastes the limited bandwidth, since the TCP
   connection stays idle for long periods.  Bundling small files
   together and employing a delayed acknowledgement mechanism can make
   full use of the limited bandwidth, so that the total sync time and
   traffic overhead can be significantly reduced.
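
   A back-of-the-envelope model shows why sequential acknowledgement
   hurts in high-RTT environments.  The Python sketch below is an
   idealized model assumed for illustration: it ignores TCP slow start,
   HTTP headers and per-file metadata exchanges, and charges one RTT per
   acknowledged unit (a file under stop-and-wait, a bundle under
   bundling).

```python
import math


def sequential_sync_time(n_files, file_bytes, rtt_s, bytes_per_s):
    """Stop-and-wait: every file waits one full RTT for its ack."""
    return n_files * (rtt_s + file_bytes / bytes_per_s)


def bundled_sync_time(n_files, file_bytes, rtt_s, bytes_per_s, per_bundle):
    """Files are sent `per_bundle` at a time; one RTT per bundle."""
    bundles = math.ceil(n_files / per_bundle)
    return bundles * rtt_s + n_files * file_bytes / bytes_per_s


if __name__ == "__main__":
    # Hypothetical workload: 100 files of 10 KB, 200 ms RTT, 1 MB/s link.
    args = (100, 10_000, 0.2, 1_000_000)
    print("sequential:", sequential_sync_time(*args))
    print("one bundle:", bundled_sync_time(*args, per_bundle=100))
    print("4 bundles: ", bundled_sync_time(*args, per_bundle=25))
```

   For this hypothetical workload (decimal units), the model gives about
   21 s for sequential acknowledgement versus 1.2 s when all files are
   sent as one bundle.  The transmission time (1 s) is the same in both
   cases; nearly all of the saving comes from eliminating idle round
   trips.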

4.5.  Sync Protocols in Mobile and Wireless Environments

   The increasing number of mobile terminals introduces the requirement
   of synchronizing data on any device, via any connectivity, at any
   time and anywhere.  A change made to data on a desktop must be
   automatically propagated to the user's mobile phone or other mobile
   devices.  Based on the measurements from
   [Look_at_Mobile_Cloud], the problem of missing capabilities is more
   severe for mobile Internet storage services.  The root causes are
   twofold:

   First, mobile devices have limited storage and computation
   capability, so it is hard to implement all five useful capabilities
   discussed previously on a mobile client, since implementing them
   brings extra overhead (Table 2 shows the capability implementations
   on Android).  The measurement results from [Look_at_Mobile_Cloud]
   show that no existing mobile Internet storage service implements all
   five key capabilities, and only a few of them can be found on a
   mobile Internet storage client.  This explains why most Internet
   storage services waste limited bandwidth, produce large amounts of
   useless traffic and suffer long sync times in mobile environments.
   How to implement all the desired capabilities with lower storage and
   computation requirements is a critical problem that needs to be
   addressed.


 +----------------+-------------+-------------+-------------+-------------+
 |  Capabilities  |   Dropbox   | GoogleDrive |   OneDrive  |   Seafile   |
 |                |             |             |             |             |
 +----------------+-------------+-------------+-------------+-------------+
 |    Chunking    |     4MB     |    260KB    |     1MB     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 |    Bundling    |      No     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 |  Deduplication |     Yes     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 | Delta-encoding |      No     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
 |   Compression  |      No     |      No     |      No     |      No     |
 +----------------+-------------+-------------+-------------+-------------+
                                   Table 2


   Second, existing sync protocols do not handle network disruptions
   caused by unstable connections well.  For example, some services fail
   to resume sync if the data transmission is interrupted, or incur too
   much additional recovery overhead when an exception occurs.  A
   well-designed sync protocol that guarantees reliability and
   efficiency in mobile and wireless networks is needed.
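
   One way for a sync protocol to avoid restarting an interrupted
   transfer is to resume from the first unacknowledged chunk.  The
   following Python sketch assumes a hypothetical storage endpoint
   (FlakyStorage) that drops the connection on chosen requests, and a
   client that remembers which chunk hashes have been acknowledged.  It
   illustrates the resumption idea only and does not describe the
   recovery mechanism of any existing service.

```python
import hashlib


class FlakyStorage:
    """Hypothetical endpoint that fails on the listed request numbers."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)
        self.calls = 0
        self.stored = {}

    def put(self, digest, chunk):
        self.calls += 1
        if self.calls in self.fail_on:
            raise ConnectionError("wireless link interrupted")
        self.stored[digest] = chunk      # success acts as the acknowledgement


def resumable_upload(chunks, storage, max_attempts=5):
    """Upload chunks, resuming after disruptions; return bytes sent
    (including the chunks that were in flight when a failure hit)."""
    hashes = [hashlib.sha256(c).hexdigest() for c in chunks]
    acked = set()
    bytes_sent = 0
    for _ in range(max_attempts):
        try:
            for digest, chunk in zip(hashes, chunks):
                if digest in acked:
                    continue             # never resend acknowledged chunks
                bytes_sent += len(chunk)
                storage.put(digest, chunk)
                acked.add(digest)
            return bytes_sent
        except ConnectionError:
            continue                     # resume from first unacked chunk
    raise RuntimeError("sync did not complete")
```

   For example, uploading ten 100-byte chunks over a link that fails on
   the 4th and 8th request sends 1200 bytes in total: each disruption
   costs only the chunk that was in flight, rather than everything sent
   so far.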

4.6.  Unsatisfactory Concurrent Work Ability

   With the popularity of Internet storage services, collaborative work
   is becoming an important feature of such services.  This feature is
   especially important and provides convenience for a team or an
   organization, since participants can easily retrieve and edit the
   target file on the Internet.  Currently, however, support for
   collaborative work is still unsatisfactory: some common and frequent
   operations may lead to redundant file versions.  More specifically,
   parallel updates from different end users may result in a version
   conflict.  If two or more users are editing the same file
   concurrently, it is hard to update the file correctly.  To ensure
   that every participant's modification is considered, the typical
   approach is to lock the file and let the other participants create
   separate versions of the same file.  To obtain a final version,
   participants have to negotiate their modifications (versions) with
   each other and merge the final version manually.  This clearly
   reduces work efficiency, since people have to spend a lot of time and
   effort managing redundant versions and merging a final version.
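
   How parallel updates turn into redundant versions can be sketched
   with a revision check.  In this hypothetical Python model (the
   FileServer class and its fields are assumptions), each upload names
   the revision it was based on; an upload based on a stale revision is
   kept as a separate conflicted copy instead of overwriting the newer
   version, leaving the participants to reconcile the copies manually.

```python
class FileServer:
    """Hypothetical server tracking one shared file by revision number."""

    def __init__(self, content: str):
        self.rev = 0
        self.content = content
        self.conflicted_copies = []   # (user, base_rev, content) tuples

    def download(self):
        return self.rev, self.content

    def upload(self, user: str, base_rev: int, content: str):
        """Accept the update only if it is based on the latest revision."""
        if base_rev == self.rev:
            self.rev += 1
            self.content = content
            return self.rev
        # Someone else updated the file first: keep this edit as a
        # separate version rather than silently overwriting the newer one.
        self.conflicted_copies.append((user, base_rev, content))
        return None


if __name__ == "__main__":
    server = FileServer("draft v0")
    alice_rev, _ = server.download()
    bob_rev, _ = server.download()    # both start from revision 0
    server.upload("alice", alice_rev, "alice's edit")
    server.upload("bob", bob_rev, "bob's edit")
    print(server.content, server.conflicted_copies)
```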

   A desired concurrent work ability is the following: when different
   people are working on the same file, each client automatically
   creates a separate local version for its user, and after the users
   finish and upload their versions to the server, the server
   automatically merges the versions into a final version without any
   human involvement.  An even better solution is real-time editing, as
   provided by [GoogleDocs]: multiple people can edit the same file and
   see each other's cursors and operations in real time.  Such an
   ability does help to improve collaborative work, but it is
   challenging to design a protocol that provides it.
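
   In the simplest case, automatic merging can be done with a line-wise
   three-way merge against the common base version.  The Python sketch
   below is an illustration under strong assumptions rather than a
   complete algorithm: it only handles edits that replace lines in place
   (all three versions have the same number of lines), and it merely
   reports lines that both users changed differently.  Merging inserted
   and deleted lines would additionally require aligning the versions,
   e.g., with a diff algorithm.

```python
def merge3(base, ours, theirs):
    """Line-wise three-way merge; assumes in-place line replacements only.
    Returns (merged_lines, indices_of_conflicting_lines)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("sketch only handles in-place line replacements")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            merged.append(o)        # same change, or only "ours" changed
        elif o == b:
            merged.append(t)        # only "theirs" changed
        else:
            merged.append(o)        # both changed differently: flag it
            conflicts.append(i)
    return merged, conflicts


if __name__ == "__main__":
    base = ["title", "intro", "body"]
    print(merge3(base, ["Title", "intro", "body"], ["title", "intro", "Body"]))
    print(merge3(base, ["T1", "intro", "body"], ["T2", "intro", "body"]))
```

   Non-overlapping edits merge without human involvement; only lines
   edited differently by both users still need a decision.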

5.  Advantages of Standard Sync Protocol

   An open and standard sync protocol between client and server can
   effectively address the problems mentioned above.  The sync protocol
   consists of two types of flows: a control flow and a data flow.  The
   control flow runs between the client and the control server; it
   carries user authentication, metadata management and active
   notification of data changes.  The data flow runs between the client
   and the data storage servers and carries only the actual file data
   (in the form of numerous chunks).  Together, the two flows
   accomplish the whole data sync.  Based on the problem analysis
   above, the key capabilities should be supported as options in the
   sync protocol, and the protocol should preferably be network-aware.
   The rest of this section lists the advantages of employing an open
   and standard sync protocol.

   First, with a standard sync protocol in place, a third-party client
   that supports multiple Internet storage services becomes easy to
   implement, since provider-specific APIs would be unnecessary or at
   least simplified.  This would encourage more
   people and organizations to develop their own clients (it may even
   be possible for users to implement their own).  As a result, users
   would no longer need a separate client for each service, which
   improves their experience.  Furthermore, competition in the
   third-party client market would increase, which benefits users: they
   could choose clients flexibly, and frequent client updates would
   bring them better features and functions.

   Another advantage of a standard sync protocol is that sync across
   different services becomes possible.  If two services both employ
   the standard sync protocol, their users could synchronize files with
   each other using that protocol rather than plain HTTP downloads.
   That is, a user could access, retrieve, modify or upload files
   belonging to users of a different service.

   A standard sync protocol also makes it easier to improve Internet
   storage services.  Unlike existing proprietary protocols, a standard
   sync protocol is fully open and designed by many contributors, and
   anyone is welcome to revise and improve it.  We believe that both
   users and providers would benefit substantially from such a standard
   sync protocol.

6.  Understanding of Sync Protocol

    Client                 Control Server           Data Storage Server
       |                          |                          |
       |---meta data, auth info-->|                          |
       |<-------start sync--------|                          |
       |     sync preparation     |                          |
       |                          |                          |
       |--------------------store/retrieve------------------>|
       |<--------------------ok/content----------------------|
       |                         ...                         |
       |--------------------store/retrieve------------------>|
       |<--------------------ok/content----------------------|
       |                   data transmission                 |
       |                          |                          |
       |---meta data, ver info--->|                          |
       |<-----conclude sync-------|                          |
       |        sync finish       |                          |
       |                          |                          |

                               Figure 2

   Figure 2 shows a preliminary, high-level view of the sync protocol.
   The whole sync process can be divided into three stages: sync
   preparation, data transmission and sync finish.  In the first stage,
   the client exchanges its metadata and authentication information
   with the control server to initiate a sync process; capabilities
   such as network-aware chunking and deduplication should be performed
   during this stage.  In the second stage, data transmission, the
   client sends chunks to or retrieves chunks from the data storage
   servers; to make the sync efficient, capabilities such as bundling
   and delta encoding should be employed.  In the final stage, sync
   finish, the client sends its metadata again so that the control
   server can check and conclude the sync process, and version
   information is exchanged for version control.  This view shows that
   the control flow and the data flow are closely related and cannot
   work without each other.
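   The three stages can be made concrete with a minimal in-memory
   Python sketch.  The sketch rests on several assumptions that this
   document does not make:

   o  Files are split into fixed-size chunks identified by SHA-256
      digests.

   o  A bearer token serves as the authentication information.

   o  Each file has an integer version number for version control.

   o  Direct function calls stand in for network messages.

   All names ("ControlServer", "DataStorageServer", "sync_file") are
   hypothetical.  For brevity, the control server reads the storage
   server's chunk index directly.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # hypothetical; real services use much larger chunks


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Fixed-size chunking, assumed here for simplicity."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


@dataclass
class DataStorageServer:
    """Data flow endpoint: stores chunks keyed by their SHA-256 digest."""
    chunks: dict[str, bytes] = field(default_factory=dict)
    uploads: int = 0  # number of chunks actually transmitted

    def store(self, chunk: bytes) -> str:
        self.chunks[digest(chunk)] = chunk
        self.uploads += 1
        return "ok"


@dataclass
class ControlServer:
    """Control flow endpoint: authentication, metadata, version control."""
    storage: DataStorageServer
    tokens: set[str]
    # path -> (version number, ordered list of chunk digests)
    files: dict[str, tuple[int, list[str]]] = field(default_factory=dict)

    def _authenticate(self, token: str) -> None:
        if token not in self.tokens:
            raise PermissionError("authentication failed")

    def start_sync(self, token: str, hashes: list[str]) -> list[str]:
        """Sync preparation: authenticate, then deduplicate by naming
        only the distinct chunks the storage side does not yet hold."""
        self._authenticate(token)
        return [h for h in dict.fromkeys(hashes) if h not in self.storage.chunks]

    def conclude_sync(self, token: str, path: str, hashes: list[str],
                      base_version: int) -> int:
        """Sync finish: verify all chunks arrived, record a new version."""
        self._authenticate(token)
        if any(h not in self.storage.chunks for h in hashes):
            raise ValueError("sync incomplete: chunks missing")
        current, _ = self.files.get(path, (0, []))
        if base_version != current:
            raise ValueError("version conflict: file changed since base")
        self.files[path] = (current + 1, hashes)
        return current + 1


def sync_file(control: ControlServer, storage: DataStorageServer,
              token: str, path: str, data: bytes, base_version: int) -> int:
    """Client side of one sync; returns the new version number."""
    chunks = split_chunks(data)
    hashes = [digest(c) for c in chunks]
    missing = set(control.start_sync(token, hashes))   # sync preparation
    for chunk, h in zip(chunks, hashes):               # data transmission
        if h in missing:
            storage.store(chunk)
            missing.discard(h)  # upload each distinct chunk only once
    return control.conclude_sync(token, path, hashes, base_version)  # finish


if __name__ == "__main__":
    storage = DataStorageServer()
    control = ControlServer(storage, tokens={"alice-token"})
    v1 = sync_file(control, storage, "alice-token", "/doc.txt", b"aaaabbbbaaaa", 0)
    v2 = sync_file(control, storage, "alice-token", "/doc.txt", b"aaaabbbbcccc", v1)
    print(v1, v2, storage.uploads)  # 1 2 3
```

   In this sketch, sync preparation doubles as deduplication: only
   chunks that the storage side lacks are uploaded.  Sync finish
   rejects a commit whose base version is stale, which is where the
   version conflicts discussed earlier would surface.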

7.  Related Work in IETF

   WebDAV ([RFC4918]) provides an alternative way to synchronize local
   data with remote web servers.  It can be regarded as earlier IETF
   work on file collections, authoring and versioning over HTTP, and it
   mainly focuses on the authoring and versioning of distributed web
   content.  WebDAV extends HTTP to enable users to collaboratively
   edit and manage files on remote servers.  However, it does not
   address transmission efficiency.  For example, if a user modifies a
   small portion of a large file on a local device, WebDAV needs to
   upload the full file to the server.

   To improve transmission efficiency, rsync ([rsync]) can be combined
   with WebDAV.  However, this is still inadequate.  In an Internet
   storage service, files are split into chunks, whereas the rsync
   algorithm was originally designed to find the delta between two
   whole files.  Rsync works well at file granularity, but an approach
   is needed that works at chunk granularity and can find the delta
   between two file versions even though the files are split into
   chunks.
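   Content-defined chunking is one way to obtain chunk-granularity
   deltas.  The Python sketch below illustrates the technique; it is
   not a mechanism specified by this document.  Chunk boundaries are
   placed wherever a rolling hash over the last few bytes satisfies a
   condition.  An edit therefore only disturbs nearby chunks, and two
   versions can be compared chunk by chunk via their hashes.  The
   rolling sum used here is deliberately simple; production systems
   typically use stronger rolling hashes such as Rabin fingerprints.
   The parameters "window" and "modulus" and the function names are
   hypothetical.  The minimum and maximum chunk sizes that practical
   chunkers usually enforce are omitted.

```python
import hashlib
import random


def cdc_chunks(data: bytes, window: int = 16, modulus: int = 64) -> list[bytes]:
    """Split data at content-defined boundaries (hypothetical parameters).

    A boundary is placed after byte i when the sum of the last `window`
    bytes (data[i-window+1 .. i]) is divisible by `modulus`, so each
    boundary depends only on local content and survives edits elsewhere.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward one byte
        if i >= window - 1 and rolling % modulus == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing chunk without a boundary
    return chunks


def chunk_delta(old: bytes, new: bytes) -> list[bytes]:
    """Return the chunks of `new` whose content does not occur in `old`."""
    known = {hashlib.sha256(c).digest() for c in cdc_chunks(old)}
    return [c for c in cdc_chunks(new) if hashlib.sha256(c).digest() not in known]


if __name__ == "__main__":
    rng = random.Random(7)
    old = bytes(rng.randrange(256) for _ in range(20000))
    new = b"HELLO" + old                 # insert 5 bytes at the front
    delta = chunk_delta(old, new)
    print(len(cdc_chunks(new)), "chunks in new version;",
          len(delta), "must be uploaded")
```

   With fixed-size chunking, the same 5-byte insertion would shift
   every chunk boundary and change almost every chunk.  Deriving
   boundaries from content avoids that.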

   Besides, other capabilities (e.g., deduplication, bundling and
   compression) can speed up transmission for Internet storage services
   and improve their network and storage performance.
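   Bundling and compression can be illustrated together.  Many small
   chunks are packed into one payload so that they travel in a single
   request, and the payload is compressed before transmission.  The
   Python sketch below assumes a hypothetical bundle format: a 4-byte
   big-endian length precedes each chunk, and the whole bundle is
   compressed with zlib.  It is not a wire format defined by this
   document.

```python
import struct
import zlib


def bundle(chunks: list[bytes]) -> bytes:
    """Pack chunks as <4-byte big-endian length><chunk bytes>..., then compress."""
    raw = b"".join(struct.pack(">I", len(c)) + c for c in chunks)
    return zlib.compress(raw)


def unbundle(payload: bytes) -> list[bytes]:
    """Reverse of bundle(): decompress, then split on the length prefixes."""
    raw = zlib.decompress(payload)
    chunks, offset = [], 0
    while offset < len(raw):
        (length,) = struct.unpack_from(">I", raw, offset)
        offset += 4
        chunks.append(raw[offset:offset + length])
        offset += length
    return chunks


if __name__ == "__main__":
    small_files = [b"note %d: buy milk\n" % i for i in range(100)]
    payload = bundle(small_files)
    assert unbundle(payload) == small_files
    print(sum(len(f) for f in small_files), "bytes in 100 chunks ->",
          len(payload), "bytes in one bundle")
```

   Sending the 100 chunks as one bundle avoids paying per-request
   overhead 100 times.  Compression helps most when the chunk contents
   are redundant, as they are in this example.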

8.  Security Considerations (TBD)

   TBD

9.  Acknowledgements

   The authors would like to thank Barry Leiba, Mark Nottingham, Julian
   Reschke, Marc Blanchet, Mike Bishop, Haibing Song, Phillip
   Hallam-Baker, Michiel de Jong and Ted Lemon for their valuable
   comments and contributions to this work.

10.  Informative References

   [Batched]  Li, Z., Wilson, C., Jiang, Z., Liu, Y., Zhao, B., Jin, C.,
              Zhang, Z., and Y. Dai, "Efficient Batched Synchronization
              in Dropbox-Like Cloud Storage Services", Middleware, 2013.

   [Benchmarking]
              Drago, I., Bocchi, E., Mellia, M., Slatman, H., and A.
              Pras, "Benchmarking Personal Cloud Storage", IMC, 2013.

   [ExpanDrive]
              "ExpanDrive", <http://www.expandrive.com/>.

   [GoogleDocs]
              "Google Docs",
              <http://www.google.com/intl/en/docs/about/>.

   [IFTTT]    "IFTTT", <https://ifttt.com/>.

   [Inside_Dropbox]
              Drago, I., Mellia, M., Munafo, M., Sperotto, A., Sadre,
              R., and A. Pras, "Inside Dropbox: Understanding Personal
              Cloud Storage Services", IMC, 2012.

   [Look_at_Mobile_Cloud]
              Cui, Y., Lai, Z., and N. Dai, "A First Look at Mobile
              Cloud Storage Services: Architecture, Experimentation and
              Challenge", IEEE Network, 2015.

   [QuickSync]
              Cui, Y., Lai, Z., Wang, X., Dai, N., and C. Miao,
              "QuickSync: Improving Synchronization Efficiency for
              Mobile Cloud Storage Services", MOBICOM, 2015.

   [RFC4918]  Dusseault, L., Ed., "HTTP Extensions for Web Distributed
              Authoring and Versioning (WebDAV)", RFC 4918,
              DOI 10.17487/RFC4918, June 2007,
              <http://www.rfc-editor.org/info/rfc4918>.

   [rsync]    "rsync", <https://rsync.samba.org/>.

   [Towards]  Li, Z., Jin, C., Xu, T., Wilson, C., Liu, Y., Cheng, L.,
              Liu, Y., Dai, Y., and Z. Zhang, "Towards Network-level
              Efficiency for Cloud Storage Services", IMC, 2014.

   [users]    "400 million strong", <https://blogs.dropbox.com/
              dropbox/2015/06/400-million-users/>.

Authors' Addresses

   Yong Cui
   Tsinghua University
   Beijing  100084
   P.R.China

   Phone: +86-10-6260-3059
   Email: yong@csnet1.cs.tsinghua.edu.cn


   Zeqi Lai
   Tsinghua University
   Beijing  100084
   P.R.China

   Phone: +86-10-6278-5822
   Email: uestclzq@gmail.com


   Linhui Sun
   Tsinghua University
   Beijing  100084
   P.R.China

   Phone: +86-10-6278-5822
   Email: lh.sunlinh@gmail.com