A string-wise CRDT algorithm for smart and large-scale collaborative editing systems


Advanced Engineering Informatics xxx (2016) xxx–xxx Contents lists available at ScienceDirect Advanced Engineering Informatics journal homepage: www...



Advanced Engineering Informatics journal homepage: www.elsevier.com/locate/aei

Full length article

Xiao Lv a,c, Fazhi He a,b,⇑, Weiwei Cai a, Yuan Cheng a

a State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China
b School of Computer Science and Technology, Wuhan University, Wuhan 430072, China
c Department of Computer Engineering, Naval University of Engineering, Wuhan 430033, China

Article info

Article history: Received 29 June 2016; received in revised form 9 October 2016; accepted 27 October 2016; available online xxxx.

Keywords: Smart and large-scale collaborative editing; String-wise operation; Operational transformation; Commutative replicated data type

Abstract

With the development of big data and cloud computing, real-time collaborative editing systems face new challenges. How to support string-wise operations for smart and large-scale collaboration is one of the key issues in the next generation of collaborative editing systems; it is both a core topic of collaborative computing and fundamental research for many collaborative systems in science and engineering. However, string-wise operations have troubled the existing collaborative editing algorithms, including Operational Transformation (OT) and Commutative Replicated Data Type (CRDT), for many years. This paper proposes a novel and efficient CRDT algorithm that integrates string-wise operations for smart and massive-scale collaboration. Firstly, the proposed algorithm ensures convergence and maintains the operation intentions of collaborative users under an integrated string-wise framework. Secondly, formal proofs are provided for both the correctness of the proposed algorithm and the intention preservation of string-wise operations. Thirdly, the time complexity of the proposed algorithm is analyzed in theory and shown to be lower than that of the state-of-the-art OT and CRDT algorithms. Fourthly, experimental evaluations show that the proposed algorithm outperforms the state-of-the-art OT and CRDT algorithms. © 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Collaborative editing systems (CESs) allow multiple geographically dispersed users to view and edit a shared document over computer networks, and have been a core topic of continuous research in Computer Supported Cooperative Work (CSCW). Over the past 25 years, an increasing number of collaborative editing algorithms have been researched, developed and applied in collaborative systems in science and engineering, e.g. Google Wave/Docs,1 2D spreadsheets [1], 2D images [2,3], 3D digital media design systems [4–6], 2D/3D Computer-Aided Design [7–10] and so on. More recently, with the development of big data and cloud computing [11–17], CESs have increasingly moved towards smart and large-scale collaboration, which raises new technical challenges. Smart and large-scale CESs support tens or hundreds of collaborators in sharing and exchanging the intention, idea, knowledge and


⇑ Corresponding author at: State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China. E-mail address: [email protected] (F. He).
1 http://docs.google.com.

wisdom of people in a large-scale collaborative scenario [18–20]. As stated in [21], CESs should "make a more intelligent and semantically meaningful usage of the network resources". Knowledge-based collaboration can be effectively supported only if the "atomic operation of collaborative editing" is advanced from the "character-wise operation" to the "string-wise operation". In other words, in smart and massive-scale collaboration the editing operations are always "knowledge-grained", working on units such as blocks or paragraphs. Therefore, string-wise CESs become the cornerstone of smart collaboration.

In addition, in a large-scale collaboration [22], where a large number of users edit the shared document simultaneously, the shared document is updated frequently. This situation generally degrades collaborative computing performance [23,24]. How to improve computing performance is another challenge for smart and massive-scale collaborative editing systems. The "string-wise operation" has a potential advantage over the "character-wise operation" for highly efficient collaboration in large-scale collaborative applications. In short, string-wise CESs have become the research focus for smart and large-scale collaborative applications in the era of big data and cloud computing.

http://dx.doi.org/10.1016/j.aei.2016.10.005
1474-0346/© 2016 Elsevier Ltd. All rights reserved.

Please cite this article in press as: X. Lv et al., A string-wise CRDT algorithm for smart and large-scale collaborative editing systems, Adv. Eng. Informat. (2016), http://dx.doi.org/10.1016/j.aei.2016.10.005


As a substantial extension of our CSCWD 2016 paper [25], this paper proposes a string-wise CRDT algorithm for smart and large-scale collaborative editing systems to address these new challenges. The major contributions of this paper are listed below.

(1) The proposed algorithm preserves the operation intentions of collaborative users and maintains the consistency of the shared document.
(2) The correctness of the proposed algorithm and the intention preservation of string-wise operations are formally proved, provided that the algorithm satisfies two conditions: operation commutativity (OC) and precedence transitivity (PT).
(3) The time complexity of the proposed algorithm is analyzed in theory and shown to be lower than that of the state-of-the-art OT and CRDT algorithms.
(4) The experimental evaluations show that the proposed algorithm has better computing performance than the state-of-the-art OT and CRDT algorithms.

2. Background and related work

A fully replicated architecture is adopted in CESs in order to achieve high responsiveness, which makes consistency maintenance a great challenge [26–29]. OT algorithms are particularly suitable for consistency maintenance and have been studied for nearly three decades [1–5,7,8,26,27,30–39]. A plethora of OT algorithms have been developed for collaborative applications in science and engineering, such as Jupiter [30], Nice [31], IBM OpenCoWeb,2 and CoWord.3 The main idea of OT algorithms is that local editing operations are executed as soon as they are issued and then propagated to remote sites. Remote operations need to be transformed against concurrent operations before their execution in order to repair divergence. The major advantage of OT algorithms is the high responsiveness of local operations: multiple users may freely and simultaneously generate and execute local operations. Despite this good local responsiveness, supporting string-wise operations has troubled the existing OT algorithms.

Since the first OT algorithm developed by Ellis and Gibbs [26], most published OT algorithms support only character-based operations because of the inherent sophistication of OT, which makes them unsuitable for smart and large-scale collaboration. GOT [27] was the first work to describe support for string-based operations, but no published work shows how to achieve them [36]. ABTS supports string-based primitive operations and handles the overlapping and splitting of operations, but its time complexity is O(|H|^2) [36]. Based on ABTS, ABTSO improves the time complexity to O(|H|) by keeping history operations ordered according to the operation effects relation [35]. To the best of our knowledge, ABTSO has the best computing performance in a representative class of published OT algorithms. Despite this good computing performance, there is room for improvement.
In addition, ABTSO cannot support string-wise deletions.

In recent years, another class of collaborative editing algorithms called CRDT has been proposed and has gradually become an active research topic in collaborative computing and distributed computing [23,40–46]. The main idea of CRDT algorithms is to design concurrent operations that commute. Hence, transformations are no longer required and concurrent operations can be executed in any order. By assigning unique identifiers to all objects of operations, CRDT algorithms can place all objects into an abstract data structure in a total order. Therefore, CRDT algorithms can preserve the operation intentions of collaborative users and guarantee eventual consistency. CRDT algorithms have been shown to outperform traditional algorithms by a factor between 25 and 1000 [44,45]. However, CRDT algorithms are still young, and how to support string-wise operations remains a challenging issue.

Apart from [23,45], most existing CRDT algorithms support only character-based operations. The work in [45] is based on WOOT [40] and uses a WOOT-like way to sort concurrent strings at the same position; the time complexity of integrating remote insertions is O(k^2), where k is the number of concurrent insertions. As the number of concurrent insertions increases, the computing time grows considerably. The work in [23] is based on LOGOOT [41,42]; strings are assigned unique compressed identifiers built from letters of the alphabet, which reduces memory consumption. However, [23] supports only unbreakable line-based operations and cannot handle the splitting of lines. In addition, as with LOGOOT, how causality is ensured is not specified.

2 https://github.com/opencoweb/coweb#readme.
3 http://www.codoxware.com.

3. Proposed algorithm

3.1. The integrated string-wise framework

A real-time collaborative system consists of a large number of collaborative sites. Every site maintains a two-layer data structure consisting of View and Model. Model is composed of a hash table called HT and a doubly linked list named L_model. L_model links all visible and invisible nodes in a total order. Every node represents a string, such as a paragraph or a block. HT stores all original and split nodes. View is composed of a doubly linked list named L_view, which provides the interaction interface for collaborative users. L_view links all the visible nodes of L_model. The whole framework is shown in Fig. 1.

The control procedure is as follows. A user at each site can concurrently generate local operations and receive remote operations from other sites. The integration of both local and remote operations includes two steps. Firstly, the target node is found in HT using its unique identifier. Secondly, the correct operation position is found in L_model before the operation is executed. The synchronization procedure between View and Model then makes the effects of the integrated updates appear in View.

3.2. Basic operations and splitting functions

The basic primitive operations are string-wise insertions and string-wise deletions. A user may run the following operations.
(1) LocalInsert(ID tar_key, int pos, string str, ID key).
(2) RemoteInsert(int pos, ID tar_key, string str, ID key).
(3) LocalDelete(ID tar_key, int pos, int del_len, ID key).
(4) RemoteDelete(int pos, int del_len, ID key_list, ID key).

The parameter tar_key is used for finding the target node in HT. The parameter pos is an integer index, which is used for finding the operation position in L_model. The parameter str is the inserted string and is used only by insertions. The parameter key is the identifier of the inserted (deleted) string. The parameter del_len is the length of the deleted string and is used only by deletions. The parameter key_list holds the IDs of the deleted nodes.

The target node may be split by the current operation. The splitting cases are as follows. (1) The target node is split into two sub-nodes by insertions or deletions, which is shown in


Fig. 1. The integrated string-wise framework.

Figs. 2–4. (2) The target node is split into three sub-nodes by deletions, which is shown in Fig. 5.

Fig. 2 presents how to insert the string "abcde" within the target node ("lmnopqr"). After the insertion, the target node is split into two sub-nodes FNode and LNode. Figs. 3 and 4 present how to delete the prior part and the rear part of the target node. After the deletion, the target node is split into two sub-nodes FNode and LNode. Fig. 5 illustrates how to delete the middle part of the target node. After the deletion, the target node is split into three sub-nodes FNode, MNode and LNode. Algorithm 1 specifies how to split the target node (Tar_Node) into two sub-nodes. Algorithm 2 presents how to split the target node (Tar_Node) into three sub-nodes, following a procedure similar to that of Algorithm 1.

Fig. 2. Insert the string within the target node

Algorithm 1. SplitTwoNode(Tar_Node, pos)
1: INPUT: Tar_Node, pos
2: FNode = Tar_Node.Clone(), FNode.key = Tar_Node.key.Clone()
3: FNode.key.offset = Tar_Node.key.offset, FNode.key.len = pos
4: FNode.content = Tar_Node.content.Substring(0, pos)
5: LNode = Tar_Node.Clone(), LNode.key = Tar_Node.key.Clone()
6: LNode.key.offset = Tar_Node.key.offset + pos
7: LNode.key.len = Tar_Node.key.len - pos
8: LNode.content = Tar_Node.content.Substring(pos, Tar_Node.key.len - FNode.key.len)
9: Tar_Node.flag = 1, Tar_Node.list = {FNode, LNode}
10: OUTPUT: {FNode, LNode}

Fig. 3. Delete the prior part of the target node.

Algorithm 2. SplitThreeNode(Tar_Node, pos, del_len)
1: INPUT: Tar_Node, pos, del_len
2: FNode = Tar_Node.Clone(), FNode.key = Tar_Node.key.Clone()
3: FNode.key.offset = Tar_Node.key.offset, FNode.key.len = pos - 1
4: FNode.content = Tar_Node.content.Substring(0, FNode.key.len)
5: MNode = Tar_Node.Clone(), MNode.key = Tar_Node.key.Clone()
6: MNode.key.offset = Tar_Node.key.offset + pos - 1
7: MNode.key.len = del_len, MNode.content = Tar_Node.content.Substring(FNode.key.len, MNode.key.len)
8: LNode = Tar_Node.Clone(), LNode.key = Tar_Node.key.Clone()
9: LNode.key.offset = MNode.key.offset + del_len
10: LNode.key.len = Tar_Node.key.len - FNode.key.len - MNode.key.len
11: LNode.content = Tar_Node.content.Substring(FNode.key.len + MNode.key.len, LNode.key.len)
12: Tar_Node.flag = 1, Tar_Node.list = {FNode, MNode, LNode}
13: OUTPUT: {FNode, MNode, LNode}
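The two splitting routines can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Key` and `Node` classes and the helper `_fragment` are hypothetical, `pos` in `split_two` counts the characters kept in FNode (as in Algorithm 1), and `pos` in `split_three` is 1-based (as in Algorithm 2).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Key:
    ident: Tuple[int, ...]  # hypothetical identifier of the original string
    offset: int             # start of this fragment inside the original string
    length: int             # number of characters in this fragment


@dataclass
class Node:
    key: Key
    content: str
    flag: int = 0                                   # 1 once the node has been split
    children: List["Node"] = field(default_factory=list)


def _fragment(node, start, length):
    """Build a sub-node covering node.content[start:start + length]."""
    key = Key(node.key.ident, node.key.offset + start, length)
    return Node(key, node.content[start:start + length])


def split_two(node, pos):
    """Algorithm 1: FNode keeps the first pos characters, LNode the rest."""
    first = _fragment(node, 0, pos)
    last = _fragment(node, pos, node.key.length - pos)
    node.flag, node.children = 1, [first, last]
    return first, last


def split_three(node, pos, del_len):
    """Algorithm 2: MNode covers del_len characters starting at 1-based pos."""
    first = _fragment(node, 0, pos - 1)
    middle = _fragment(node, pos - 1, del_len)
    rest = node.key.length - (pos - 1) - del_len
    last = _fragment(node, pos - 1 + del_len, rest)
    node.flag, node.children = 1, [first, middle, last]
    return first, middle, last
```

For the target string "lmnopqr" used in Fig. 2, `split_two(node, 3)` yields the fragments "lmn" and "opqr", and `split_three(node, 3, 2)` on a fresh copy yields "lm", "no" and "pqr". The split parent keeps references to its fragments, which is what later lookups descend through.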

Fig. 4. Delete the rear part of the target node.

Fig. 5. Delete the middle part of the target node.

3.3. Integrating string-wise insertions

Algorithm 3 presents how to integrate a local string-wise insertion. In line 2, the target node (Tar_Node) is found in L_model using the function hash(tar_key) and a new node (New_Node) is created for the inserted string. There are two cases for the position of Tar_Node. Firstly,


the position is the head of L_model: New_Node is inserted after head and links for New_Node are created (lines 3–4). Secondly, Tar_Node is not head: according to the operation position (pos), New_Node is inserted after Tar_Node or within Tar_Node (lines 6–13). In lines 6–7, New_Node is inserted after Tar_Node. In lines 9–11, New_Node is inserted within Tar_Node and Tar_Node is split into two sub-nodes FNode and LNode. Then links for FNode, LNode and New_Node are created, and FNode and LNode are placed into HT. In line 14, New_Node is placed into HT. Finally, a remote insertion is generated and broadcast to remote sites (line 15).

Algorithm 3. LocalInsert(tar_key, pos, str, key)
1: INPUT: tar_key, pos, str, key
2: Tar_Node = hash(tar_key), New_Node = new node(key, str)
3: if Tar_Node is head then
4:   double link New_Node next to head
5: else
6:   if pos == Tar_Node.key.len then
7:     double link New_Node next to Tar_Node
8:   else
9:     {FNode, LNode} = SplitTwoNode(Tar_Node, pos)
10:    create links for FNode, New_Node and LNode
11:    place FNode and LNode into HT
12:  end if
13: end if
14: place New_Node into HT
15: broadcast RemoteInsert(pos, tar_key, str, key) to remote sites
16: OUTPUT: L_model

Algorithm 4 shows how to integrate a remote string-wise insertion.

Algorithm 4. RemoteInsert(pos, tar_key, str, key)
1: INPUT: pos, tar_key, str, key
2: Tar_Node = FindNode(tar_key, pos), New_Node = new node(key, str)
3: if Tar_Node is not head then
4:   if pos == Tar_Node.key.len then
5:     while (Tar_Node != null and key < Tar_Node.key) do
6:       Tar_Node = Tar_Node.next
7:     end while
8:     double link New_Node after Tar_Node
9:   else
10:    {FNode, LNode} = SplitTwoNode(Tar_Node, pos)
11:    create links for FNode, New_Node and LNode
12:    place FNode, LNode into HT
13:  end if
14: else
15:  initialize pre = head, cur = head.next
16:  while (cur != null) do
17:    if key < cur.key then
18:      pre = cur, cur = cur.next
19:    else break end if
20:  end while
21:  double link New_Node after pre
22: end if
23: place New_Node into HT
24: OUTPUT: L_model
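The head case of Algorithm 4 (lines 15–21) can be illustrated with a short Python sketch. It is only an illustration under assumptions: the nodes linked after head are modeled as a plain Python list, identifiers are compared as tuples in lexicographic order (an assumption about how IDs are ordered), and the function name `insert_at_head` is hypothetical. The identifier values are those of O1, O2 and O3 in the example of Section 4.

```python
from itertools import permutations


def insert_at_head(nodes, key, text):
    """Insert (key, text) among the nodes linked right after head.

    The scan skips every node whose key is larger than the new key,
    so the list stays in descending key order whatever the arrival order.
    """
    i = 0
    while i < len(nodes) and key < nodes[i][0]:
        i += 1
    nodes.insert(i, (key, text))
    return nodes


# Three concurrent insertions at head, with the IDs given in Section 4.
ops = [
    ((1, 1, 1, 0, 3), "aaa"),
    ((1, 1, 2, 0, 3), "bbb"),
    ((1, 1, 3, 0, 3), "ccc"),
]

# Replay the three insertions in every possible delivery order.
results = set()
for order in permutations(ops):
    nodes = []
    for key, text in order:
        insert_at_head(nodes, key, text)
    results.add(" - ".join(text for _, text in nodes))

print(results)  # {'ccc - bbb - aaa'}
```

All six delivery orders give the same sequence, which is the commutativity property the algorithm relies on for concurrent insertions at the same position.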

As specified in Algorithm 4, in line 2, Tar_Node is found using the function FindNode(tar_key, pos) and New_Node is created for the inserted string. According to the position of Tar_Node, there are two cases. Firstly, Tar_Node is not the head of L_model (lines 3–13). When New_Node is inserted after Tar_Node, some concurrent insertions may have already inserted their new nodes next to Tar_Node, so we need to compare the keys of these nodes with New_Node.key until a node whose key < New_Node.key is first encountered (lines 5–7). When New_Node is inserted within Tar_Node, Tar_Node is split into two sub-nodes FNode and LNode (lines 10–12). Secondly, Tar_Node is the head of L_model (lines 15–21). Some other concurrent insertions may have already inserted their new nodes next to head. Therefore, it is necessary to scan the nodes next to head until a node whose key < New_Node.key is first encountered. In line 23, New_Node is placed into HT. The function FindNode(tar_key, pos) is specified in Algorithm 5.

Algorithm 5. FindNode(tar_key, pos)
1: INPUT: tar_key, pos
2: Tar_Node = hash(tar_key)
3: while (Tar_Node.flag == 1) do
4:   if pos <= Tar_Node.list[0].key.len then
5:     Tar_Node = Tar_Node.list[0]
6:   else if pos <= Tar_Node.list[1].key.offset + Tar_Node.list[1].key.len then
7:     Tar_Node = Tar_Node.list[1]
8:   else
9:     Tar_Node = Tar_Node.list[2]
10:  end if
11: end while
12: OUTPUT: Tar_Node
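The descent performed by FindNode can be sketched in Python as below. This is a simplified illustration, not the authors' code: the `Node` class, its `split` method and `find_node` are hypothetical names, positions are taken to be absolute offsets within the original string, and every child is tested with `offset + length` (Algorithm 5 tests the first child with its length only).

```python
class Node:
    def __init__(self, offset, text):
        self.offset = offset   # position of this fragment in the original string
        self.text = text
        self.children = []     # non-empty once the node has been split

    def split(self, pos):
        """Split this leaf at absolute position pos of the original string."""
        cut = pos - self.offset
        self.children = [Node(self.offset, self.text[:cut]),
                         Node(pos, self.text[cut:])]
        return self.children


def find_node(node, pos):
    """Follow split links until the unsplit fragment that contains pos."""
    while node.children:
        for child in node.children:
            if pos <= child.offset + len(child.text):
                node = child
                break
        else:
            # pos lies beyond every fragment: fall back to the last one
            node = node.children[-1]
    return node


# "bbb" is first split at position 2 (as O4 does in Section 4) ...
root = Node(0, "bbb")
root.split(2)
# ... then a remote insertion with pos = 1 must land inside "bb".
target = find_node(root, 1)
print(target.text)  # bb
```

After `target.split(1)`, a second lookup with `pos = 1` returns the first "b" fragment, mirroring how "bb" is split again into "b" and "b" in the example.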

As specified in line 2, Tar_Node is found using the function hash(tar_key). Tar_Node may have been split into sub-nodes by concurrent operations; therefore, we need to find the target sub-node (lines 3–11). In lines 4–5, the target sub-node is the first split sub-node. In lines 6–7, the target sub-node is the second split sub-node. In line 9, the target sub-node is the third split sub-node.

3.4. Integrating string-wise deletions

Algorithm 6 presents how to integrate a local string-wise deletion.

Algorithm 6. LocalDelete(tar_key, pos, del_len, key)
1: INPUT: tar_key, pos, del_len, key
2: Tar_Node = hash(tar_key), l = Tar_Node.key.len
3: if pos == 1 and del_len == l then
4:   key_list.Add(DeleteWholeNode(Tar_Node))
5: end if
6: if pos == 1 and del_len < l then
7:   key_list.Add(DeletePriorNode(Tar_Node, del_len))
8: end if
9: if pos > 1 and pos + del_len - 1 == l then
10:  key_list.Add(DeleteLastNode(Tar_Node, pos - 1))
11: end if
12: if pos > 1 and pos + del_len - 1 < l then
13:  key_list.Add(DeleteMiddleNode(Tar_Node, pos, del_len))
14: end if
15: if pos > 1 and pos + del_len - 1 > l then
16:  key_list.Add(DeleteMutipleNode(Tar_Node, pos, del_len))
17: end if
18: broadcast(pos, del_len, key_list, key) to other sites
19: OUTPUT: L_model
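The case analysis of Algorithm 6 can be written as a small Python function. This is an illustrative sketch: `classify_deletion` and the returned labels are hypothetical names, `pos` is 1-based as in the algorithm, and the combination `pos == 1` with `del_len` larger than the node length, which Algorithm 6 does not list, is folded into the multi-node case as an assumption.

```python
def classify_deletion(pos, del_len, node_len):
    """Return which branch of Algorithm 6 a deletion falls into.

    pos is the 1-based start position inside the target node,
    del_len the number of deleted characters, node_len the node length.
    """
    end = pos + del_len - 1          # 1-based index of the last deleted character
    if pos == 1 and del_len == node_len:
        return "whole"               # lines 3-5: DeleteWholeNode
    if pos == 1 and del_len < node_len:
        return "prior"               # lines 6-8: DeletePriorNode
    if pos > 1 and end == node_len:
        return "rear"                # lines 9-11: DeleteLastNode
    if pos > 1 and end < node_len:
        return "middle"              # lines 12-14: DeleteMiddleNode
    return "multiple"                # lines 15-17: deletion runs past this node


for args in [(1, 7, 7), (1, 3, 7), (5, 3, 7), (3, 2, 7), (5, 6, 7)]:
    print(args, classify_deletion(*args))
```

For a node of length 7 the five calls print "whole", "prior", "rear", "middle" and "multiple" in that order. The first four conditions are mutually exclusive, so the chain of independent `if` statements in Algorithm 6 and the early returns used here select the same branch.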

As specified in Algorithm 6, five cases need to be considered. In lines 3–5, the whole Tar_Node is deleted and kept as a tombstone; the identifier of Tar_Node is placed into key_list by the function DeleteWholeNode(Tar_Node). In lines 6–8, the prior part of Tar_Node is deleted and kept as a tombstone; the identifiers of the two sub-nodes are placed into key_list by the function DeletePriorNode(Tar_Node, del_len). In lines 9–11, the rear part of Tar_Node is deleted and kept as a tombstone; the identifiers of the two sub-nodes are placed into key_list by the function DeleteLastNode(Tar_Node, pos - 1). In lines 12–14, the middle part of Tar_Node is deleted and kept as a tombstone; the identifiers of the three sub-nodes are placed into key_list by the function DeleteMiddleNode(Tar_Node, pos, del_len). In lines 15–17, multiple target nodes are deleted; the function DeleteMutipleNode(Tar_Node, pos, del_len) handles each of them according to the above four cases. In line 18, a remote string-wise deletion is generated and broadcast to remote sites.

Algorithm 7 presents how to integrate a remote string-wise deletion. Two cases need to be considered. Firstly, the deleted string involves only one node (lines 3–4). Secondly, the deleted string involves multiple nodes (lines 6–15). In lines 6–7, the first node is deleted. In lines 8–13, the middle nodes are deleted. In lines 14–15, the last node is deleted. The function Del is used for finding and deleting split target nodes and is presented in Algorithm 8.

Algorithm 7. RemoteDelete(pos, del_len, key_list, key)
1: INPUT: pos, del_len, key_list, key
2: count = key_list.count
3: if count == 1 then
4:   Tar_Node = hash(key_list[0]), Del(pos, del_len, Tar_Node)
5: else
6:   Tar_Node = hash(key_list[0])
7:   Del(pos, Tar_Node.key.len - pos + 1, Tar_Node)
8:   sum_len = Tar_Node.key.len - pos + 1, p = 1
9:   for (int i = 1; i < count - 1; i++) do
10:    tempnode = hash(key_list[i])
11:    Del(p, key_list[i].len, tempnode)
12:    sum_len += key_list[i].len
13:  end for
14:  lastlen = del_len - sum_len, lastnode = hash(key_list[count - 1])
15:  Del(p, lastlen, lastnode)
16: end if
17: OUTPUT: L_model

Algorithm 8. Del(pos, del_len, Tar_Node)
1: INPUT: pos, del_len, Tar_Node
2: sub1 = Tar_Node.list[0], sub2 = Tar_Node.list[1], sub3 = Tar_Node.list[2]
3: l1 = sub1.key.len, l2 = sub2.key.len, l = Tar_Node.key.len
4: if Tar_Node.flag == 0 then
5:   if pos == 1 and del_len == l then
6:     DeleteWholeNode(Tar_Node)
7:   else if pos == 1 and del_len < l then
8:     DeletePriorNode(Tar_Node, del_len)
9:   else if pos > 1 and pos + del_len - 1 == l then
10:    DeleteLastNode(Tar_Node, pos)
11:  else
12:    DeleteMiddleNode(Tar_Node, pos, del_len)
13:  end if
14: else
15:  if pos <= l1 and pos + del_len - 1 <= l1 then
16:    Del(pos, del_len, sub1)
17:  else if pos <= l1 and del_len - (l1 - pos + 1) <= l2 then
18:    Del(pos, l1 - pos + 1, sub1), Del(1, del_len - (l1 - pos + 1), sub2)
19:  else if pos <= l1 and del_len - (l1 - pos + 1) >= l2 then
20:    p = l1 - pos + 1, Del(pos, p, sub1), Del(1, l2, sub2), Del(1, del_len - p - l2, sub3)
21:  else if pos > l1 and pos - l1 <= l2 and pos - l1 + del_len - 1 <= l2 then
22:    p = pos - l1, Del(p, del_len, sub2)
23:  else if pos > l1 and pos - l1 <= l2 and pos - l1 + del_len - 1 > l2 then
24:    p = pos - l1, Del(p, l2 - p + 1, sub2), Del(1, del_len - (l2 - p + 1), sub3)
25:  else
26:    q = pos - l1 - l2, Del(q, del_len, sub3)
27:  end if
28: end if
29: OUTPUT: L_model
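The recursive idea behind Del can be illustrated with a compact Python sketch. It deliberately simplifies Algorithm 8 and should be read as an assumption-laden illustration: the `Frag` class is hypothetical, the deleted range is given as an absolute, 0-based, half-open interval `[start, end)` over the original string instead of the 1-based relative positions used in the paper, and the range is clipped at each level rather than dispatched through explicit cases.

```python
class Frag:
    def __init__(self, offset, text, visible=True):
        self.offset = offset      # start of the fragment in the original string
        self.text = text
        self.visible = visible    # False marks a tombstone
        self.children = []        # non-empty once the fragment has been split

    def delete(self, start, end):
        """Tombstone the characters in [start, end) below this fragment."""
        stop = self.offset + len(self.text)
        lo, hi = max(start, self.offset), min(end, stop)
        if lo >= hi:
            return                            # range does not touch this fragment
        if self.children:                     # already split: recurse into sub-nodes
            for child in self.children:
                child.delete(lo, hi)
            return
        if lo == self.offset and hi == stop:  # whole fragment deleted
            self.visible = False
            return
        # Partial deletion: split into up to three pieces, middle one is a tombstone.
        cuts = [self.offset, lo, hi, stop]
        for a, b in zip(cuts, cuts[1:]):
            if a < b:
                deleted = lo <= a and b <= hi
                piece = Frag(a, self.text[a - self.offset:b - self.offset],
                             visible=self.visible and not deleted)
                self.children.append(piece)

    def leaves(self):
        """Return the unsplit fragments in document order."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


root = Frag(0, "lmnopqr")
root.delete(2, 4)   # delete "no": the node splits into three fragments
root.delete(1, 5)   # overlapping deletion spanning all three fragments
print("".join(f.text for f in root.leaves() if f.visible))  # lqr
```

The second call shows the situation handled by lines 19–20 of Algorithm 8, where the deleted string touches all three sub-nodes of an already split target.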

As specified in Algorithm 8, two cases need to be considered. In the first case, Tar_Node has not been split (lines 4–13). Based on the values of pos and del_len, either the corresponding part of Tar_Node or the whole Tar_Node is deleted. In the second case, Tar_Node has been split into sub-nodes sub1, sub2 and sub3, and the function Del is called recursively to find and delete the split sub-nodes (lines 15–28). In lines 15–16, the deleted string involves sub1. In lines 17–18, the deleted string involves sub1 and sub2. In lines 19–20, the deleted string involves all three sub-nodes. In lines 21–22, the deleted string involves sub2. In lines 23–24, the deleted string involves sub2 and sub3. In lines 25–26, the deleted string involves sub3.

3.5. Synchronization between View and Model

Algorithm 9 presents the synchronization procedure between View and Model. In line 3, we get the first node of L_model. Then we scan the nodes of L_model one by one. When a node is not a tombstone, we place it into L_view (lines 4–9).

Algorithm 9. Synchronizing(L_model)
1: INPUT: L_model
2: L_view = new List<node>()
3: node = head.next
4: while (node != null) do
5:   if node.IsVisible() then
6:     L_view.Add(node)
7:   end if
8:   node = node.next
9: end while
10: OUTPUT: L_view
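Algorithm 9 translates almost line for line into Python. The sketch below is only illustrative: the `Node` class and the helper `link` are hypothetical, a singly linked list stands in for the doubly linked L_model, and the sample content is the final state of the example in Section 4, with the deleted "cc" kept as a tombstone.

```python
class Node:
    def __init__(self, text, visible=True):
        self.text = text
        self.visible = visible   # False for tombstones
        self.next = None


def link(head, nodes):
    """Chain nodes after head, in order, and return head."""
    prev = head
    for node in nodes:
        prev.next = node
        prev = node
    return head


def synchronize(head):
    """Collect the visible nodes of the model list, in order (Algorithm 9)."""
    view = []
    node = head.next
    while node is not None:
        if node.visible:
            view.append(node)
        node = node.next
    return view


head = link(Node(""), [
    Node("cc", visible=False), Node("c"), Node("b"), Node("eee"),
    Node("b"), Node("ddd"), Node("b"), Node("aaa"),
])
view_text = "".join(node.text for node in synchronize(head))
print(view_text)  # cbeeebdddbaaa
```

The scan is linear in the number of nodes of L_model, tombstones included, which is why keeping whole strings rather than single characters in each node reduces the work per character.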


4. An integrated example

Fig. 6 shows a specific scenario that involves three collaborative sites. Assume that the three sites start from the same initial state "", that the session number is 1, and that the IDs of the three sites satisfy ID[site1] < ID[site2] < ID[site3]. S_i^j denotes the j-th state of site i, and a bracketed string such as [b] denotes the tombstone of b. In order to describe split nodes and the total order of all nodes, we use "-" to link all nodes.

Site1 generates O1 = insert(null, 0, "aaa", ID_nodeO1) and O4 = insert(ID_nodeO2, 2, "ddd", ID_nodeO4). Site2 generates O2 = insert(null, 0, "bbb", ID_nodeO2) and O5 = insert(ID_nodeO2, 1, "eee", ID_nodeO5). Site3 generates O3 = insert(null, 0, "ccc", ID_nodeO3) and O6 = delete(ID_nodeO3, 1, 2, ID_nodeO6).

The state vectors of all operations are: SV(O1) = (1, 0, 0); SV(O2) = (0, 1, 0); SV(O3) = (0, 0, 1); SV(O4) = (2, 1, 0); SV(O5) = (0, 2, 0); SV(O6) = (0, 0, 2).

The IDs of all operation nodes are: ID_nodeO1 = (1, 1, 1, 0, 3); ID_nodeO2 = (1, 1, 2, 0, 3); ID_nodeO3 = (1, 1, 3, 0, 3); ID_nodeO4 = (1, 3, 1, 0, 3); ID_nodeO5 = (1, 2, 2, 0, 3); ID_nodeO6 = (1, 2, 3, 0, 2).

All operations are integrated as in Fig. 6, which is explained in three stages.

4.1. Stage one

At site1, O1 is generated. After executing O1, S_1^1 = "aaa". The remote operation of O1 is denoted as O1remote = (0, null, "aaa", ID_nodeO1). Then O1remote is broadcast to other sites.

At site2, O2 is generated. After executing O2, S_2^1 = "bbb". The remote operation of O2 is denoted as O2remote = (0, null, "bbb", ID_nodeO2). Then O2remote is broadcast to other sites.

At site3, O3 is generated. After executing O3, S_3^1 = "ccc". The remote operation of O3 is denoted as O3remote = (0, null, "ccc", ID_nodeO3). Then O3remote is broadcast to other sites.

4.2. Stage two

At site1, when O2remote is received, ID_nodeO1 < ID_nodeO2 holds; therefore, node_O2 is placed before node_O1. After executing O2remote, S_1^2 = "bbb - aaa". Then O4 is generated. After executing O4, node_O2 = "bbb" has been split into two sub-nodes "bb" and "b", and S_1^3 = "bb - ddd - b - aaa". The remote operation of O4 is denoted as O4remote = (2, ID_nodeO2, "ddd", ID_nodeO4). Then O4remote is broadcast to other sites.

At site2, O5 is generated. After executing O5, node_O2 = "bbb" has been split into two sub-nodes "b" and "bb", and S_2^2 = "b - eee - bb". The remote operation of O5 is denoted as O5remote = (1, ID_nodeO2, "eee", ID_nodeO5). Then O5remote is broadcast to other sites.

At site3, O6 is generated. After executing O6, node_O3 = "ccc" has been split into two sub-nodes "[cc]" and "c", and S_3^2 = "[cc] - c". The remote operation of O6 is denoted as O6remote = (1, 2, ID_nodeO3, ID_nodeO6). Then O6remote is broadcast to other sites.

4.3. Stage three

At site1, when O5remote is received, the target node of O5remote is node_O2, which has been split into two sub-nodes "bb" and "b" at site1. According to pos of O5remote, the effective target sub-node "bb" is found and "eee" is inserted within "bb". Then "bb" is split into two sub-nodes "b" and "b". After executing O5remote, S_1^4 = "b - eee - b - ddd - b - aaa". When O3remote is received, the target node of O3remote is head. Some other concurrent insertions have already inserted their new nodes next to head, so it is necessary to compare ID_nodeO3 with the keys of the other concurrently inserted nodes. Because ID_nodeb < ID_nodeO3, node_O3 needs to be inserted before "b", and S_1^5 = "ccc - b - eee - b - ddd - b - aaa". When O6remote is received, the target node node_O3 has not been split. After executing O6remote, S_1^6 = "[cc] - c - b - eee - b - ddd - b - aaa".

At site2, when O3remote is received, the target node of O3remote is head. Some other concurrent insertions have already inserted their new nodes next to head, so it is necessary to compare ID_nodeO3 with the keys of the other concurrently inserted nodes. Because ID_nodeb < ID_nodeO3, node_O3 needs to be inserted before "b", and S_2^3 = "ccc - b - eee - bb". When O1remote is received, the target node of O1remote is also head. In the same way, ID_nodeO1 needs to be compared with the keys of the other concurrently inserted nodes. Here ID_nodeO1 < ID_nodeO3, ID_nodeO1 < ID_nodeb, ID_nodeO1 < ID_nodeO5 and ID_nodeO1 < ID_nodebb. Therefore, node_O1 needs to be inserted after "bb", and S_2^4 = "ccc - b - eee - bb - aaa". When O6remote is received, the target node of O6remote is node_O3, which has not been split. After executing O6remote, S_2^5 = "[cc] - c - b - eee - bb - aaa". When O4remote is received, the target node of O4remote is node_O2, which has been split into sub-nodes "b" and "bb". According to pos of O4remote, the target sub-node "bb" is found and "ddd" is inserted within it. Then "bb" is split into two sub-nodes "b" and "b". After executing O4remote, S_2^6 = "[cc] - c - b - eee - b - ddd - b - aaa".

At site3, when O2remote is received, the target node of O2remote is head. Some other concurrent insertions have already inserted their new nodes next to head, so ID_nodeO2 needs to be compared with the keys of the other concurrently inserted nodes. Here ID_nodeO2 < ID of "[cc]" and ID_nodeO2 < ID_nodec; therefore, node_O2 needs to be inserted after "c", and S_3^3 = "[cc] - c - bbb". When O1remote is received, the target node of O1remote is head. In the same way, ID_nodeO1 needs to be compared with the keys of the other concurrently inserted nodes. Here ID_nodeO1 < ID of "[cc]", ID_nodeO1 < ID_nodec and ID_nodeO1 < ID_nodeO2. Therefore, node_O1 needs to be inserted after "bbb", and S_3^4 = "[cc] - c - bbb - aaa". When O4remote is received, the target node of O4remote is node_O2, which has not been split. After executing O4remote, S_3^5 = "[cc] - c - bb - ddd - b - aaa". When O5remote is received, the target node of O5remote is node_O2. Because node_O2 has been split into sub-nodes "bb" and "b", the target sub-node "bb" is found according to pos of O5remote. Then "eee" is inserted within the sub-node "bb". After executing O5remote, S_3^6 = "[cc] - c - b - eee - b - ddd - b - aaa".

All three sites therefore converge to the same final state "[cc] - c - b - eee - b - ddd - b - aaa". The final L_model at all three sites is shown in Fig. 7.

Fig. 6. A scenario of three sites starting from the initial state.

5. Time complexity analysis and experimental evaluation

5.1. Time complexity analysis

As specified in Algorithms 3 and 6, the proposed algorithm integrates local insertions and local deletions in $O(1)$ time, because the target node is looked up directly in HT by its unique ID. Assume that the average number of characters in a node is $d$. Then the time complexity of a local character insertion (deletion) is $O(1/d)$. Assume further that the target node has been split $m$ times on average and that the average number of concurrently inserted nodes is $n$. As specified in Algorithm 4, the best-case time complexity of a remote character insertion is $O(1/d)$: the target node has not been split by other operations, so it is found directly via HT. The worst-case time complexity of a remote character insertion is $O((m+n)/d)$, which occurs when the target node has been split $m$ times and the number of concurrently inserted nodes is $n$. As shown in Algorithm 7, the computing time of a remote character deletion depends on the recursive function Del presented in Algorithm 8. The best-case time complexity of a remote character deletion is $O(1/d)$, which occurs when the target node has not been split and is found directly via HT. The worst-case time complexity of a remote character deletion is $O(c \cdot 3^m/d)$, where $c$ is a constant. In this case, the target node has been split $m$ times; at each split, the target node is further split into three sub-nodes and the deleted string involves three sub-nodes. Table 1 shows the average time complexity of a character operation for RGA, ABTSO and our approach, where $N$ is the average number of nodes including tombstones. Our approach has lower complexity than RGA and ABTSO, especially for local operations and best-case remote operations. More significantly, in our approach the computing time of a character operation decreases as the number of characters increases.

5.2. Experimental evaluation

In this section, we perform experiments to verify whether our approach behaves as the theoretical analysis predicts. We compare our approach with RGA and ABTSO, because they have the best average performance in a representative class of CRDT algorithms and OT algorithms, respectively. All three algorithms are implemented in C# and compiled with Visual Studio 2010 on Windows 7 with an Intel Core i7 3.30 GHz CPU. Because no public data sets on collaborative editing are available so far, performance evaluations are generally built on operation logs generated by collaboration. We design a real-time collaboration to obtain operation logs and then replay all operations on the three algorithms. The collaborative process is as follows. Two sites concurrently generate M and N string-wise operations, respectively, at random positions. The number of characters in each operation is between 1 and 1000. After every site has executed all local and remote operations, the same procedure continues. We run the three algorithms 25 times on each operation log and record the average values.

5.2.1. Computing time of our approach and RGA

(1) Assume that there are no splitting nodes in our approach. When each string-wise operation contains a fixed number of characters, Fig. 8 shows the performance of RGA and our approach as the number of operations increases. As shown in Fig. 8, when the number of operations increases, the execution time of local character operations of RGA increases

Fig. 7. The final data structure of Lmodel in all three sites.

Table 1. The average time complexity of character operations for RGA, ABTSO and our approach.

| Algorithms | Local insertion | Local deletion | Remote insertion | Remote deletion |
|---|---|---|---|---|
| RGA | $O(N)$ | $O(N)$ | Best: $O(1)$; Worst: $O(n)$ | $O(1)$ |
| ABTSO | $O(\lvert H_d \rvert / d)$ | $O(\lvert H_d \rvert / d)$ | $O((\lvert H_i \rvert + \lvert H_d \rvert)/d)$ | $O((\lvert H_i \rvert + \lvert H_d \rvert)/d)$ |
| Our approach | $O(1/d)$ | $O(1/d)$ | Best: $O(1/d)$; Worst: $O((m+n)/d)$ | Best: $O(1/d)$; Worst: $O(c \cdot 3^m/d)$ |
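As a rough illustration of how the bounds for our approach in Table 1 arise, the following worked derivation is a sketch under stated assumptions. It assumes that the cost of integrating one string-wise operation is charged evenly to the $d$ characters of a node, and that Del recurses into at most three sub-nodes per split level with constant work $c$ per visited sub-node. The recurrence $T(m)$ is introduced here for illustration only and does not appear in the paper.

```latex
\begin{align*}
\text{local operation:}\quad
  & \frac{O(1)}{d} = O(1/d) \\
\text{remote insertion (worst case):}\quad
  & \frac{O(m) + O(n)}{d} = O\big((m+n)/d\big) \\
\text{remote deletion (worst case):}\quad
  & T(m) = 3\,T(m-1) + c,\quad T(0) = c \\
  & \Rightarrow\; T(m) = \frac{c\,(3^{m+1}-1)}{2} = O(c \cdot 3^{m}), \\
  & \text{so the per-character cost is } O\big(c \cdot 3^{m}/d\big).
\end{align*}
```

For example, $T(1) = 4c$ and $T(2) = 13c$, which agree with the closed form.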


Fig. 8. The computing time of character operations when the delete ratio is set to 20% and operations increase with step 500.

greatly. In particular, when the number of operations is 3000, the execution times of local and remote character operations for RGA are 1708.9 μs and 1.033 μs, respectively, whereas those for our approach are 0.017 μs and 0.05 μs. As shown in Fig. 8, the execution time of local and remote character operations for our approach decreases when the number of operations increases. Our approach thus has a lower execution time than RGA. When the number of operations is fixed, Fig. 9 shows the performance of RGA and our approach as the number of characters increases. As shown in Fig. 9, the average execution time of local and remote character operations for our approach is much lower than that of RGA. In particular, as the number of characters increases, the average execution time of local character operations for RGA increases greatly, whereas the execution time of both local and remote character operations for our approach decreases gradually.

(2) Assume that there are some splitting nodes in our approach. Fig. 10 presents the execution time of remote character operations for RGA and our approach when the deletion ratio is 20%. Our approach clearly outperforms RGA. Even when all nodes are split into sub-nodes in our approach (SP = 100%), its execution time is much lower than that of RGA. Table 2 presents the integration time of remote character operations for different types of operations in our approach. We want to see which kind of operation takes the most or the least time under the same number of splitting nodes. For deletions,

Fig. 9. The computing time of character operations when the delete ratio is set to 20% and the number of characters increases with step 50.


Fig. 10. The computing time of remote character operations when the delete ratio is set to 20%.

Table 2. A comparison of the integration time for different types of operations under the same number of splitting nodes. N = 10,000, the number of characters is 100, and the number of splitting nodes varies from 200 to 2000 with step 300.

| Our approach (number of splitting nodes) | Insert(2) | Delete(2) | Delete(3) | Delete(n) |
|---|---|---|---|---|
| 200 | 0.0102 | 0.0110 | 0.0112 | 0.0140 |
| 500 | 0.0110 | 0.0112 | 0.0114 | 0.0144 |
| 800 | 0.0132 | 0.0138 | 0.0160 | 0.0170 |
| 1100 | 0.0140 | 0.0160 | 0.0170 | 0.0190 |
| 1400 | 0.0146 | 0.0164 | 0.0172 | 0.0208 |
| 1700 | 0.0150 | 0.0168 | 0.0176 | 0.0220 |
| 2000 | 0.0160 | 0.0174 | 0.0180 | 0.0230 |

we consider three splitting cases. Firstly, all nodes are split into two sub-nodes by deletions, denoted as Delete(2). Secondly, all nodes are split into three sub-nodes by deletions, denoted as Delete(3). Thirdly, the deleted string involves multiple nodes, denoted as Delete(n). For insertions, we consider the case in which the target node is split into two sub-nodes, denoted as Insert(2). As shown in Table 2, when different kinds of operations have the same number of splitting nodes, Insert(2) takes the least time and Delete(n) takes the most time. Fig. 11 presents the execution time of integrating the same number of splitting nodes for different types of operations. As shown in Fig. 11, Delete(n) takes the most time and Insert(2) takes the least time.

5.2.2. Computing time of our approach and ABTSO

In this section, we compare the computing time of our approach with that of ABTSO. Firstly, we study how long our approach and ABTSO take to integrate remote string-wise operations into the operation history H. Secondly, we study the average computing time of a remote character insertion/deletion for ABTSO and our approach. Fig. 12 shows the computing time of remote string-wise operations for ABTSO and our approach. As shown in Fig. 12, as the length of the operation history and the number of integrated remote operations increase, ABTSO takes more time than our approach. In particular, when the length of the operation history is 200, ABTSO takes 68 ms to integrate 100 remote operations, whereas our approach takes only 4 ms. Fig. 13 shows the average computing time of a character-based operation for ABTSO and our approach when the deletion ratio is

Fig. 11. The computing time of integrating the same number of splitting nodes.

set to 20% and the number of remote operations increases with step 300. As shown in Fig. 13, ABTSO takes more time than our approach to integrate a character deletion. The main reason is that each string-based deletion of ABTSO is processed as a sequence of character deletions. For example, when the number of remote operations is 3000 and the delete ratio is 20%, ABTSO needs to integrate 600 × 100 character deletions, whereas our approach only needs to integrate 600 deletions.
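The arithmetic behind this example can be checked with a short sketch. It is written in Python for illustration only; the variable names are hypothetical, and the value of 100 characters per deleted string is the one implied by the example above.

```python
remote_operations = 3000
delete_ratio = 0.20
chars_per_deletion = 100  # assumed length of each deleted string, as in the example

# Number of string-wise deletions among the remote operations.
string_deletions = round(remote_operations * delete_ratio)

# ABTSO processes each string deletion as a sequence of character deletions.
abtso_units = string_deletions * chars_per_deletion

# The string-wise approach integrates each deletion as a single operation.
stringwise_units = string_deletions

print(string_deletions, abtso_units, stringwise_units)
```

Under these assumptions, ABTSO integrates 100 times as many units as the string-wise approach, which matches the ratio stated in the text.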


Fig. 12. Time to integrate N remote string-based operations of ABTSO and our approach.

Fig. 13. Time to integrate a remote character-based operation of ABTSO and our approach.

6. Conclusions and future work

This paper proposes a string-wise CRDT algorithm to support smart and large-scale collaborations in the era of big data and cloud computing. The proposed algorithm preserves the intentions of operations and achieves eventual consistency. Related definitions and formal proofs of correctness and intention preservation are presented in Appendices A, B and C. Theoretical analysis and experimental evaluation show that the proposed algorithm outperforms the state-of-the-art OT and CRDT algorithms. Therefore, compared with existing algorithms, the proposed algorithm is better suited to smart and massive-scale collaborative applications.

In future research, we plan to explore at least four directions. Firstly, we will continue to enhance the proposed algorithm. Secondly, we will try to apply our method in real-life collaborative applications. For example, cloud-based collaborative services such as Google Drive, Codoxware, SubEthaEdit and Novell Vibe have become popular in recent years. These collaborative applications need higher efficiency and quality in co-authoring shared documents by exchanging intentions, ideas, knowledge and wisdom among collaborators. The proposed algorithm matches these requirements well and will be applied in these areas in the future. Thirdly, we will extend the idea to other collaborative applications in CAD/Graphics/Images/Optimization [47–53]. Fourthly, we will explore how to accelerate massive-scale collaborative applications with multi-core CPUs/many-core GPUs [54–56].


Acknowledgment

This research has been funded by the National Science Foundation of China (Grant No. 61472289 and 61502353) and the National Key Research and Development Project of China (Grant No. 2016YFC0106305).

Appendix A. Related definitions

Definition 1. A node is an eight-tuple ⟨key, flag, visible, content, next, prior, link, list⟩ where (1) key is the identifier of the node. (2) flag is an integer variable whose value is 0 or 1. (3) visible is an integer variable whose value is 0 or 1. (4) content is a string variable, which stores the inserted string or the deleted string. (5) next is a pointer to the next node in $L_{model}$. (6) prior is a pointer to the prior node in $L_{model}$. (7) link is a pointer used for the double chaining in HT. (8) list is a set of nodes used to keep the sub-nodes split by the same operation.

Definition 2 (node order $\prec$). Given any two nodes $node_i$ and $node_j$ in $L_{model}$, the position of $node_i$ is denoted as pos($node_i$) and the position of $node_j$ is denoted as pos($node_j$). $node_i \prec node_j$ iff pos($node_i$) < pos($node_j$), i.e. $node_i$ is before $node_j$ in $L_{model}$.

Definition 3. The identifier (ID) of each node is a five-tuple ⟨s, ssv, site, offset, len⟩ where (1) s is the identifier of the session, a globally increasing number. (2) ssv is the sum of the state vector of an operation. (3) site is the unique identifier of the site. (4) offset is the length from the leftmost position of the current node to the leftmost position of the original node. (5) len is the length of the string contained in the current node.

Definition 4. Given two nodes $node_i$ and $node_j$ whose IDs are $ID_{node_i}$ and $ID_{node_j}$, respectively, $ID_{node_i} \prec ID_{node_j}$ iff: (1) $ID_{node_i}[s] < ID_{node_j}[s]$, or (2) $ID_{node_i}[s] = ID_{node_j}[s]$ and $ID_{node_i}[ssv] < ID_{node_j}[ssv]$, or (3) $ID_{node_i}[s] = ID_{node_j}[s]$, $ID_{node_i}[ssv] = ID_{node_j}[ssv]$ and $ID_{node_i}[site] < ID_{node_j}[site]$, or (4) $ID_{node_i}[s] = ID_{node_j}[s]$, $ID_{node_i}[ssv] = ID_{node_j}[ssv]$, $ID_{node_i}[site] = ID_{node_j}[site]$ and $ID_{node_i}[offset] > ID_{node_j}[offset]$.

Definition 5. Given two nodes $node_i$ and $node_j$ between which the node order $\prec$ is not yet established, let $ID_{node_i}$ be the identifier of $node_i$ and $ID_{node_j}$ the identifier of $node_j$. Then $node_i \prec node_j$ iff $ID_{node_j} \prec ID_{node_i}$.

Definition 6. Given an operation $o$ whose object is $node$, the intention of $o$ is the relative position relation of $node$ in $L_{model}$.

Definition 7. Given any three nodes $node_i$, $node_j$ and $node_k$, the node order $\prec$ is a total order iff: (1) $node_i \prec node_j$ or $node_j \prec node_i$, and (2) if $node_i \prec node_j$ and $node_j \prec node_k$, then $node_i \prec node_k$.

Definition 8. Given $R = \{node_1, node_2, \ldots, node_m\}$, let $m$ be the number of nodes in $L_{model}$. Let $L = \{L_1, L_2, \ldots, L_n\}$ and, for any $L_i, L_j \in L$, $i \neq j$, $i, j \in \{1, 2, \ldots, n\}$, let $L_i$ and $L_j$ doubly link the $m$ nodes of $R$. $L_1, L_2, \ldots, L_n$ are maintained by the $n$ sites, respectively. Consider any two nodes $node_k$ and $node_l$ in $L_i$ and any two nodes $node_i$ and $node_j$ in $L_j$, with $k \neq l$, $i \neq j$ and $k, l, i, j \in \{1, 2, \ldots, m\}$. When all issued operations have been executed at the $n$ sites, consistency of the intention of operations is achieved iff $node_k \prec^{L_i} node_l = node_i \prec^{L_j} node_j$.

Appendix B. Proof of correctness

We formalize two correctness criteria, PT (Precedence Transitivity) and OC (Operation Commutativity), for correctness verification. These two conditions were first proposed for RGA, which only supports character-based operations or unbreakable block operations. Here we extend the conditions to support string-wise operations. Our algorithm is correct if the following two formal conditions always hold.

(1) PT (Precedence Transitivity): Given two operations $O_1$ and $O_2$, $O_2$ takes precedence over $O_1$ iff $O_1$ happened before $O_2$, or $O_1 \parallel O_2$ and $O_2$ has higher priority than $O_1$ in preserving their intentions. Precedence transitivity holds iff, for any three operations $O_1$, $O_2$ and $O_3$, if $O_1$ takes precedence over $O_2$ and $O_2$ takes precedence over $O_3$, then $O_1$ takes precedence over $O_3$.

(2) OC (Operation Commutativity): Given two operations $O_1$ and $O_2$ from the same state $S$, $O_1$ and $O_2$ are commutative, denoted as $O_1 \leftrightarrow O_2$, iff $S \overset{O_1 \to O_2}{\Longrightarrow} S_1$, $S \overset{O_2 \to O_1}{\Longrightarrow} S_2$, and $S_1$ is equal to $S_2$ ($S_1 = S_2$).
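The OC condition can be illustrated with a toy check of two concurrent insertions that target the same node. This is a minimal Python sketch and not the authors' algorithm: it ignores node splitting, offsets and deletions, inserts whole nodes directly after an anchor node, and uses (s, ssv, site) triples compared lexicographically as a simplified stand-in for the ID order of Definition 4. The names `HEAD`, `integrate` and `commutative` are hypothetical.

```python
HEAD = (0, 0, 0)  # hypothetical ID of the head node


def integrate(state, op):
    """Toy remote insertion of a whole node directly after its anchor node.

    state is a list of (node_id, anchor_id, text) and op is
    (anchor_id, node_id, text). Concurrent siblings of the same anchor that
    have a larger ID stay in front of the new node.
    """
    anchor, new_id, text = op
    result = list(state)
    index = next(i for i, (nid, _, _) in enumerate(result) if nid == anchor) + 1
    while (index < len(result)
           and result[index][1] == anchor
           and result[index][0] > new_id):
        index += 1
    result.insert(index, (new_id, anchor, text))
    return result


def commutative(state, op1, op2):
    """OC check: both execution orders must reach the same state."""
    return (integrate(integrate(state, op1), op2)
            == integrate(integrate(state, op2), op1))


S = [(HEAD, None, "")]
I1 = (HEAD, (1, 1, 1), "aaa")  # issued at site 1
I2 = (HEAD, (1, 1, 2), "bbb")  # issued concurrently at site 2
final = integrate(integrate(S, I1), I2)
print(commutative(S, I1, I2), [text for _, _, text in final])
```

In this toy setting both orders yield “bbb” before “aaa”, because the node with the larger ID is placed first.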

Theorem 1. In our approach, all operations satisfy PT: for any three operations $O_a$, $O_b$ and $O_c$, if $O_a$ takes precedence over $O_b$ and $O_b$ takes precedence over $O_c$, then $O_a$ takes precedence over $O_c$.

Proof. In our algorithm, two cases need to be considered to satisfy PT. Firstly, when $O_1$ happened before $O_2$, SV decides the precedence relation. Secondly, when two operations are concurrent, unique identifiers decide the precedence relation and yield a total order of concurrent objects. When an operation is issued, a unique ID is assigned to it. Based on Definitions 7 and 8, IDs of different sessions are ordered by the monotonically increasing $ID_{node}[s]$. Within the same session, IDs of a site are totally ordered because $ID_{node}[ssv]$ grows monotonically. If the $ID_{node}[ssv]$ values are equal across different sites, the IDs are ordered by the unique $ID_{node}[site]$. If the above three components are equal, the IDs are ordered by $ID_{node}[offset]$. Since all IDs are totally ordered by these four conditions, the order $\prec$ on $ID_{node}$ is unique and transitive. Therefore, PT holds in our approach. □

Theorem 2. For any two remote insertions $I_1$ and $I_2$, if $I_1 \parallel I_2$, then $I_1 \leftrightarrow I_2$.

Proof. Assume that $I_1$ and $I_2$ start from the same initial state $S = [node_1, node_2, \ldots, node_i, \ldots, node_n]$. The object of $I_1$ is $node_{I_1}$, the object of $I_2$ is $node_{I_2}$, and $ID_{node_{I_1}} \prec ID_{node_{I_2}}$. We need to consider three cases.

(1) The target node of $I_1$ is the same as the target node of $I_2$, denoted as $node_i$, and pos of $I_1$ is also the same as pos of $I_2$, i.e. pos($I_1$) = pos($I_2$).

① At the initial state $S$, $I_1$ is executed first and then $I_2$. When $I_1$ is executed, $node_{I_1}$ is inserted within $node_i$, and $node_i$ is split into two sub-nodes $FNode_i$ and $LNode_i$. After executing $I_1$, $S'_1 = [node_1, node_2, \ldots, FNode_i, node_{I_1}, LNode_i, \ldots, node_n]$. Then $I_2$ is executed at the state $S'_1$ and $S'_2$ is obtained. Because $ID_{node_{I_1}} \prec ID_{node_{I_2}}$, $S'_2 = [node_1, node_2, \ldots, FNode_i, node_{I_2}, node_{I_1}, LNode_i, \ldots, node_n]$.


② At the initial state $S$, $I_2$ is executed first and then $I_1$. When $I_2$ is executed, $node_{I_2}$ is inserted within $node_i$, and $node_i$ is split into two sub-nodes $FNode_i$ and $LNode_i$. After executing $I_2$, $S''_1 = [node_1, node_2, \ldots, FNode_i, node_{I_2}, LNode_i, \ldots, node_n]$. Then $I_1$ is executed at the state $S''_1$ and $S''_2$ is obtained. Because $ID_{node_{I_1}} \prec ID_{node_{I_2}}$, $S''_2 = [node_1, node_2, \ldots, FNode_i, node_{I_2}, node_{I_1}, LNode_i, \ldots, node_n]$.

From ① and ②, $S'_2$ is equal to $S''_2$, so $I_1 \leftrightarrow I_2$.

(2) The target node of $I_1$ is the same as the target node of $I_2$, denoted as $node_i$, but pos of $I_1$ is different from pos of $I_2$, i.e. pos($I_1$) ≠ pos($I_2$). Assume that $node_i$ is composed of many characters, $node_i = [i_1, i_2, \ldots, i_j, i_m, \ldots, i_k, i_p, \ldots, i_n]$, that pos of $I_1$ is after the character $i_j$, and that pos of $I_2$ is after the character $i_k$.

① At the initial state $S$, $I_1$ is executed first and then $I_2$. After executing $I_1$, $S'_1 = [node_1, node_2, \ldots, [i_1, i_2, \ldots, i_j], node_{I_1}, [i_m, \ldots, i_k, i_p, \ldots, i_n], \ldots, node_n]$. Then $I_2$ is executed at the state $S'_1$. After executing $I_2$, $S'_2 = [node_1, node_2, \ldots, [i_1, i_2, \ldots, i_j], node_{I_1}, [i_m, \ldots, i_k], node_{I_2}, [i_p, \ldots, i_n], \ldots, node_n]$.

② At the initial state $S$, $I_2$ is executed first and then $I_1$. By a procedure similar to ①, $S''_2 = [node_1, node_2, \ldots, [i_1, i_2, \ldots, i_j], node_{I_1}, [i_m, \ldots, i_k], node_{I_2}, [i_p, \ldots, i_n], \ldots, node_n]$.

From ① and ②, $S'_2 = S''_2$, so $I_1 \leftrightarrow I_2$.

(3) The target node of $I_1$ is different from the target node of $I_2$. Assume that $node_i$ is the target node of $I_1$ and $node_j$ is the target node of $I_2$. In this case, because the target nodes differ, there is no conflict between the two concurrent insertions. Therefore, $I_1 \leftrightarrow I_2$. □

Theorem 3. For a remote insertion $I_1$ and a remote deletion $D_1$, if $I_1 \parallel D_1$, then $I_1 \leftrightarrow D_1$.

Proof. The target node of $I_1$ is $node_{I_1}$ and the target node of $D_1$ is $node_{D_1}$. (1) $node_{I_1}$ and $node_{D_1}$ are different nodes. Whether pos of $I_1$ is within $node_{I_1}$ or after $node_{I_1}$, and whether the deleted string involves only one node or multiple nodes, there is no conflict: a deletion only marks the target node as a tombstone, so an insertion is not affected. Therefore, $I_1 \leftrightarrow D_1$. (2) $node_{I_1}$ and $node_{D_1}$ are the same node. The same reasoning as in (1) applies. □

Theorem 4. For any two remote deletions $D_1$ and $D_2$, if $D_1 \parallel D_2$, then $D_1 \leftrightarrow D_2$.

Proof. If the target nodes of $D_1$ and $D_2$ are different, obviously $D_1 \leftrightarrow D_2$. If the target nodes of $D_1$ and $D_2$ are the same and the deleted strings of $D_1$ and $D_2$ do not overlap, obviously $D_1 \leftrightarrow D_2$. Even if the deleted strings overlap, there is no conflict, because a deletion only marks the target node as a tombstone. Therefore, $D_1 \leftrightarrow D_2$. □

Appendix C. Proof of consistency of operation intentions

Theorem 5. Given that $m$ operations are generated in a session, the objects of the $m$ operations are denoted as $node_1, node_2, \ldots, node_m$. The $L_{model}$ of the $n$ sites are denoted as $L_1, L_2, \ldots, L_n$, respectively. Consider any $L_i, L_j \in \{L_1, L_2, \ldots, L_n\}$, any two nodes $node_k$ and $node_l$ of $L_i$, and any two nodes $node_i$ and $node_j$ of $L_j$, with $k \neq l$, $i \neq j$ and $k, l, i, j \in \{1, 2, \ldots, m\}$. When all issued operations have been executed at the $n$ sites, our algorithm can achieve $node_k \prec^{L_i} node_l = node_i \prec^{L_j} node_j$.

Proof. Based on the above proof of correctness, in our approach concurrent operations can be executed commutatively and eventual consistency is guaranteed. Therefore, for $L_i$ and $L_j$ maintained by any two sites, when the $m$ operations have been executed at both sites, $node_1, node_2, \ldots, node_m$ of $L_i$ and $node_1, node_2, \ldots, node_m$ of $L_j$ have the same total order. Therefore, our approach can achieve $node_k \prec^{L_i} node_l = node_i \prec^{L_j} node_j$. As a result, our approach can achieve consistency of operation intentions. □
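The tombstone argument used in the proofs of Theorems 3 and 4 can be illustrated with a small sketch. It is a simplified Python model, assuming one visibility flag per character of a single original node, whereas the paper marks nodes and sub-nodes through the visible field of Definition 1. The function name `delete` and the sample ranges are hypothetical.

```python
def delete(visible, start, end):
    """Mark the characters in [start, end) as tombstones; nothing is removed,
    so positions inside the original node stay stable."""
    return [flag and not (start <= i < end) for i, flag in enumerate(visible)]


text = "abcdefg"
initial = [True] * len(text)

d1 = (1, 4)  # concurrent deletion of "bcd"
d2 = (2, 6)  # concurrent, overlapping deletion of "cdef"

order_a = delete(delete(initial, *d1), *d2)
order_b = delete(delete(initial, *d2), *d1)
shown = "".join(ch for ch, flag in zip(text, order_a) if flag)
print(order_a == order_b, shown)
```

Because each deletion only clears flags at fixed positions, the two execution orders end in the same state even though the deleted ranges overlap.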

References

[1] C. Sun, H. Wen, H. Fan, Operational transformation for orthogonal conflict resolution in real-time collaborative 2d editing systems, in: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, ACM, 2012, pp. 1391–1400, http://dx.doi.org/10.1145/2145204.2145411.
[2] X. Wang, J. Bu, C. Chen, Achieving undo in bitmap-based collaborative graphics editing systems, in: Proceedings of the 2002 ACM Conference on Computer Supported Cooperative Work, ACM, 2002, pp. 68–76, http://dx.doi.org/10.1145/587078.587089.
[3] L. Gao, F. Yu, Q. Chen, N. Xiong, Consistency maintenance of do and undo/redo operations in real-time collaborative bitmap editing systems, Cluster Comput. 19 (1) (2016) 255–267, http://dx.doi.org/10.1007/s10586-015-0499-8.
[4] C. Sun, Dependency-conflict detection in real-time collaborative 3d design systems, in: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, ACM, 2013, pp. 715–728, http://dx.doi.org/10.1145/2441776.2441856.
[5] F. Liu, S. Xia, H. Shen, C. Sun, et al., Comaya: incorporating advanced collaboration capabilities into 3d digital media design tools, in: Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, ACM, 2008, pp. 5–8, http://dx.doi.org/10.1145/1460563.1460566.
[6] S. Zhao, D. Li, H. Gu, B. Shao, N. Gu, An approach to sharing legacy tv/arcade games for real-time collaboration, in: 29th IEEE International Conference on Distributed Computing Systems, 2009, ICDCS’09, IEEE, 2009, pp. 165–172, http://dx.doi.org/10.1109/ICDCS.2009.26.
[7] Y. Cheng, F. He, Y. Wu, D. Zhang, Meta-operation conflict resolution for human–human interaction in collaborative feature-based cad systems, Cluster Comput. 19 (1) (2016) 237–253, http://dx.doi.org/10.1007/s10586-016-0538-0.
[8] L. Gao, B. Shao, T. Lu, N. Gu, Maintaining semantic intention of step-wise operations in replicated cad environments, in: 12th International Conference on Computer Supported Cooperative Work in Design, 2008, CSCWD 2008, IEEE, 2008, pp. 154–159, http://dx.doi.org/10.1109/CSCWD.2008.4536972.
[9] S. Jing, F. He, S. Han, X. Cai, H. Liu, A method for topological entity correspondence in a replicated collaborative cad system, Comput. Ind. 60 (7) (2009) 467–475, http://dx.doi.org/10.1016/j.compind.2009.02.005.
[10] H. Gu, Q. Zhang, B. Shao, Making autocad collaborative: implementation and application of coautocad, in: 2nd International Conference on Pervasive Computing and Applications, 2007, ICPCA 2007, IEEE, 2007, pp. 168–173, http://dx.doi.org/10.1109/ICPCA.2007.4365433.
[11] A.J. Trappey, W. Shen, J.J. Cha, Special issue editorial on advances in collaborative systems engineering for product design, production and service network, J. Syst. Sci. Syst. Eng. (2016) 1–3, http://dx.doi.org/10.1007/s11518016-5313-5.
[12] W. Shen, J. Barthès, J. Luo, Computer supported collaborative design: technologies, systems, and applications, Contemp. Issues Syst. Sci. Eng. (2015) 537–573, http://dx.doi.org/10.1002/9781119036821.ch14.
[13] J. Fischer, M. Porcheron, A. Lucero, A. Quigley, S. Scott, L. Ciolfi, J. Rooksby, N. Memarovic, Collocated interaction: new challenges in ‘same time, same place’ research, in: Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion, ACM, 2016, pp. 465–472, http://dx.doi.org/10.1145/2818052.2855522.
[14] A. James, J. Chung, Business and industry specific cloud: challenges and opportunities, Future Gener. Comput. Syst. 48 (2015) 39–45, http://dx.doi.org/10.1016/j.future.2014.12.006.
[15] Y. Wu, F. He, D. Zhang, X. Li, Service-oriented feature-based data exchange for cloud-based design and manufacturing, in: IEEE Transactions on Services Computing, online, http://dx.doi.org/10.1109/TSC.2015.2501981.
[16] D. Zhang, F. He, S. Han, X. Li, Quantitative optimization of interoperability during feature-based data exchange, Integr. Comput.-Aided Eng. 23 (1) (2016) 31–50, http://dx.doi.org/10.3233/ICA-150499.
[17] X. Li, F. He, X. Cai, D. Zhang, Y. Chen, A method for topological entity matching in the integration of heterogeneous cad systems, Integr. Comput.-Aided Eng. 20 (1) (2013) 15–30, http://dx.doi.org/10.3233/ICA-120416.
[18] A.J. Trappey, C.V. Trappey, T. Chiang, Y.-H. Huang, Ontology-based neural network for patent knowledge management in design collaboration, Int. J. Prod. Res. 51 (7) (2013) 1992–2005, http://dx.doi.org/10.1080/00207543.2012.701775.
[19] A.J. Trappey, P. Wognum, Advanced knowledge engineering related to innovation, intellectual property and patent analysis, Adv. Eng. Infor. 3 (27) (2013) 315–316.
[20] M.S. Ackerman, J. Dachtera, V. Pipek, V. Wulf, Sharing knowledge and expertise: the CSCW view of knowledge management, Comput. Supported


Cooperative Work (CSCW) 22 (4-6) (2013) 531–573, http://dx.doi.org/10.1007/s10606-013-9192-8.
[21] A.P. Negrao, J. Costa, P. Ferreira, L. Veiga, Interest aware consistency for cooperative editing in heterogeneous environments, Int. J. Cooperative Inform. Syst. 23 (01) (2014) 1440002, http://dx.doi.org/10.1142/S0218843014400024.
[22] N. Gu, Q. Zhang, J. Yang, W. Ye, Dcv: a causality detection approach for large-scale dynamic collaboration environments, in: Proceedings of the 2007 International ACM Conference on Supporting Group Work, ACM, 2007, pp. 157–166, http://dx.doi.org/10.1145/1316624.1316647.
[23] L. André, S. Martin, G. Oster, C. Ignat, Supporting adaptable granularity of changes for massive-scale collaborative editing, in: 2013 9th International Conference on Collaborative Computing: Networking, Applications and Worksharing (Collaboratecom), IEEE, 2013, pp. 50–59, http://dx.doi.org/10.4108/icst.collaboratecom.2013.2541.
[24] C. Ignat, G. Oster, O. Fox, V.L. Shalin, F. Charoy, How do user groups cope with delay in real-time collaborative note taking, in: ECSCW 2015: Proceedings of the 14th European Conference on Computer Supported Cooperative Work, 19–23 September 2015, Springer, Oslo, Norway, 2015, pp. 223–242, http://dx.doi.org/10.1007/978-3-319-20499-4_12.
[25] X. Lv, F. He, W. Cai, An efficient collaborative editing algorithm supporting string-based operations, in: Proceedings of the 2016 IEEE 20th International Conference on Computer Supported Cooperative Work in Design, IEEE, 2016, pp. 45–50.
[26] C.A. Ellis, S.J. Gibbs, Concurrency control in groupware systems, Acm Sigmod Record, vol. 18, ACM, 1989, pp. 399–407, http://dx.doi.org/10.1145/66926.66963.
[27] C. Sun, Y. Zhang, X. Jia, Y. Yang, A generic operation transformation scheme for consistency maintenance in real-time cooperative editing systems, in: Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work: The Integration Challenge, ACM, 1997, pp. 425–434, http://dx.doi.org/10.1145/266838.267366.
[28] Y. Saito, M. Shapiro, Optimistic replication, ACM Comput. Surv. (CSUR) 37 (1) (2005) 42–81, http://dx.doi.org/10.1145/1057977.1057980.
[29] N. Gu, J. Yang, Q. Zhang, Consistency maintenance based on the mark & retrace technique in groupware systems, in: Proceedings of the 2005 International ACM SIGGROUP Conference on Supporting Group Work, ACM, 2005, pp. 264–273, http://dx.doi.org/10.1145/1099203.1099250.
[30] D.A. Nichols, P. Curtis, M. Dixon, J. Lamping, High-latency, low-bandwidth windowing in the jupiter collaboration system, in: Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology, ACM, 1995, pp. 111–120, http://dx.doi.org/10.1145/215585.215706.
[31] H. Shen, C. Sun, Flexible notification for collaborative systems, in: Proceedings of the 2002 ACM Conference on Computer Supported Cooperative Work, ACM, 2002, pp. 77–86, http://dx.doi.org/10.1145/587078.587090.
[32] D. Sun, C. Sun, Context-based operational transformation in distributed collaborative editing systems, IEEE Trans. Parallel Distrib. Syst. 20 (10) (2009) 1454–1470, http://dx.doi.org/10.1109/TPDS.2008.240.
[33] Y. Xu, C. Sun, M. Li, Achieving convergence in operational transformation: conditions, mechanisms and systems, in: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, ACM, 2014, pp. 505–518, http://dx.doi.org/10.1145/2531602.2531629.
[34] Y. Xu, C. Sun, Conditions and patterns for achieving convergence in ot-based co-editors, IEEE Trans. Parallel Distrib. Syst. 27 (3) (2016) 695–709, http://dx.doi.org/10.1109/TPDS.2015.2412938.
[35] B. Shao, D. Li, N. Gu, An optimized string transformation algorithm for real-time group editors, in: 2009 15th International Conference on Parallel and Distributed Systems (ICPADS), IEEE, 2009, pp. 376–383, http://dx.doi.org/10.1109/ICPADS.2009.72.
[36] B. Shao, D. Li, N. Gu, Abts: a transformation-based consistency control algorithm for wide-area collaborative applications, in: 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing, 2009, CollaborateCom 2009, IEEE, 2009, pp. 1–10, http://dx.doi.org/10.4108/ICST.COLLABORATECOM2009.8271.
[37] W. Cai, F. He, X. Lv, An efficient preserving intention operational transformation for real-time collaborative editing, Chinese J. Comput. 38 (51) (2015) 2041–2053, http://dx.doi.org/10.11897/SP.1016.2015.02041.


[38] C. Sun, Y. Xu, A. Agustina, Exhaustive search of puzzles in operational transformation, in: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, ACM, 2014, pp. 519–529, http://dx.doi.org/10.1145/2531602.2531630.
[39] A. Randolph, H. Boucheneb, A. Imine, A. Quintero, On synthesizing a consistent operational transformation approach, IEEE Trans. Comput. 64 (4) (2015) 1074–1089, http://dx.doi.org/10.1109/TC.2014.2308203.
[40] G. Oster, P. Urso, P. Molli, A. Imine, Data consistency for p2p collaborative editing, in: Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, ACM, 2006, pp. 259–268, http://dx.doi.org/10.1145/1180875.1180916.
[41] S. Weiss, P. Urso, P. Molli, Logoot: a scalable optimistic replication algorithm for collaborative editing on p2p networks, in: 29th IEEE International Conference on Distributed Computing Systems, 2009, ICDCS’09, IEEE, 2009, pp. 404–412, http://dx.doi.org/10.1109/ICDCS.2009.75.
[42] S. Weiss, P. Urso, P. Molli, Logoot-undo: distributed collaborative editing system on p2p networks, IEEE Trans. Parallel Distrib. Syst. 21 (8) (2010) 1162–1174, http://dx.doi.org/10.1109/TPDS.2009.173.
[43] H. Roh, M. Jeon, J. Kim, J. Lee, Replicated abstract data types: building blocks for collaborative applications, J. Parallel Distrib. Comput. 71 (3) (2011) 354–368, http://dx.doi.org/10.1109/ICDE.2010.5447883.
[44] M. AhmedNacer, C. Ignat, G. Oster, H. Roh, P. Urso, Evaluating crdts for real-time document editing, in: Proceedings of the 11th ACM Symposium on Document Engineering, ACM, 2011, pp. 103–112, http://dx.doi.org/10.1145/2034691.2034717.
[45] W. Yu, Supporting string-wise operations and selective undo for peer-to-peer group editing, in: Proceedings of the 18th International Conference on Supporting Group Work, ACM, 2014, pp. 226–237, http://dx.doi.org/10.1145/2660398.2660401.
[46] A.L. Yu, Weihai, C. Ignat, A CRDT supporting selective undo for collaborative text editing, in: IFIP International Conference on Distributed Applications and Interoperable Systems, Springer, 2015, pp. 193–206, http://dx.doi.org/10.1007/978-3-319-19129-4_16.
[47] Z. Huang, F. He, X. Cai, Z. Zou, J. Liu, M. Liang, X. Chen, Efficient random saliency map detection, Sci. China Inform. Sci. 54 (6) (2011) 1207–1217, http://dx.doi.org/10.1007/s11432-011-4263-2.
[48] K. Li, F. He, X. Chen, Real-time object tracking via compressive feature selection, Frontiers Comput. Sci. (2016) 1–13, http://dx.doi.org/10.1007/s11704-016-5106-5.
[49] J. Sun, F. He, Y. Chen, X. Chen, A multiple template approach for robust tracking of fast motion target, Appl. Math. – A J. Chinese Universities 31 (2) (2016) 177–197, http://dx.doi.org/10.1007/s11766-016-3378-z.
[50] B. Ni, F. He, Z. Yuan, Segmentation of uterine fibroid ultrasound images using a dynamic statistical shape model in HIFU therapy, Comput. Med. Imaging Graphics 46 (3) (2015) 302–314, http://dx.doi.org/10.1016/j.compmedimag.2015.07.004.
[51] B. Ni, F. He, Y. Pan, Z. Yuan, Using shapes correlation for active contour segmentation of uterine fibroid ultrasound images in computer-aided therapy, Appl. Math. – A J. Chinese Universities 31 (1) (2016) 37–52, http://dx.doi.org/10.1007/s11766-016-3340-0.
[52] X. Yan, F. He, Y. Chen, Z. Yuan, An efficient improved particle swarm optimization based on prey behavior of fish schooling, J. Adv. Mech. Des. Syst. Manuf. 9 (4) (2015), http://dx.doi.org/10.1299/jamdsm.2015jamdsm0048.
[53] H. Yu, F. He, Y. Pan, X. Chen, An efficient similarity-based level set model for medical image segmentation, J. Adv. Mech. Des. Syst. Manuf. 10 (8) (2016), http://dx.doi.org/10.1299/jamdsm.2016jamdsm0100.
[54] T. Van Luong, N. Melab, E. Talbi, Gpu computing for parallel local search metaheuristic algorithms, IEEE Trans. Comput. 62 (1) (2013) 173–185, http://dx.doi.org/10.1109/TC.2011.206.
[55] Y. Zhou, F. He, Y. Qiu, Optimization of parallel iterated local search algorithms on graphics processing unit, J. Supercomput. 72 (6) (2016) 2394–2416, http://dx.doi.org/10.1007/s11227-016-1738-3.
[56] Y. Zhou, F. He, Y. Qiu, Dynamic Strategy based Parallel Ant Colony Optimization on GPUs for TSPs, Sci. China Inform. Sci. (2016), http://dx.doi.org/10.1007/s11432-015-0594-2.

Please cite this article in press as: X. Lv et al., A string-wise CRDT algorithm for smart and large-scale collaborative editing systems, Adv. Eng. Informat. (2016), http://dx.doi.org/10.1016/j.aei.2016.10.005