

Information Sciences journal homepage: www.elsevier.com/locate/ins

Privacy-preserving disjunctive normal form operations on distributed sets

Ji Young Chun a, Dowon Hong b, Ik Rae Jeong a,⇑, Dong Hoon Lee a

a Graduate School of Information Security, CIST, Korea University 1, 5-Ga, Anam-dong Sungbuk-ku, Seoul 136-701, Republic of Korea
b Electronics and Telecommunications Research Institute (ETRI), 161 Gajeong-dong, Yuseong-Gu, Daejeon 305-700, Republic of Korea

Article info

Article history: Available online 14 July 2011

Keywords: Set operation; DNF; Set union; Threshold set intersection

Abstract

Privacy-preserving set operations such as set union and set intersection on distributed sets are widely used in data mining, where the preservation of privacy is of the utmost concern. In this paper, we extend privacy-preserving set operations and consider privacy-preserving disjunctive normal form (DNF) operations on distributed sets. A privacy-preserving DNF operation on distributed sets can be used to find a set $S_F$ satisfying $S_F = (S_{1,1} \cap \dots \cap S_{1,t_2}) \cup \dots \cup (S_{t_1,1} \cap \dots \cap S_{t_1,t_2})$ without revealing any information other than what can be inferred from the result of the DNF operation, where $S_{i,j} \in \{A_1, \dots, A_n, \overline{A_1}, \dots, \overline{A_n}\}$ and set $A_k$ is known only to party $P_k$. A complement set $\overline{A_k}$ is defined as $\overline{A_k} = (A_1 \cup \dots \cup A_n) - A_k$. Using privacy-preserving DNF operations on distributed sets, it is possible to find set union, (threshold) set intersection, and a set of $k$-repeated elements.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

Privacy-preserving set operations in distributed environments are widely used in privacy-preserving data mining [1,16,25,24,14,18]. When multiple parties want to discover some information from their private data while preserving their privacy, privacy-preserving set operations can be used. For example, suppose multiple hospitals want to discover the relationship between a specific disease and genetic information from the medical data of their patients. Since there are many privacy and security restrictions involved in medical data, hospitals should not reveal the medical data of their patients to the other hospitals. In this situation, hospitals can extract useful genetic information using privacy-preserving set operations without revealing their patients' data. The extracted genetic information could be used to determine the likelihood that a person has a specific disease.

Assume there are three sets $A_1$, $A_2$, and $A_3$, and write $\overline{A_k} = (A_1 \cup A_2 \cup A_3) - A_k$ for the complement of $A_k$. Many useful relationships between sets can be represented as disjunctive normal forms. Some of them are as follows (see Fig. 1):

– A set union is $S_U = A_1 \cup A_2 \cup A_3$.
– A set intersection is $S_I = A_1 \cap A_2 \cap A_3$.
– A 2-over-threshold set intersection is $S_{TO} = (A_1 \cap A_2) \cup (A_2 \cap A_3) \cup (A_3 \cap A_1)$.
– A 2-under-threshold set intersection $S_{TU}$ is likewise a union of three pairwise intersections over the index pairs (1, 2), (2, 3), and (3, 1), with some of the sets complemented (see Fig. 1(ii)).
– A set of 1-repeated elements is $S_{R_1} = (\overline{A_1} \cap \overline{A_2} \cap A_3) \cup (\overline{A_1} \cap A_2 \cap \overline{A_3}) \cup (A_1 \cap \overline{A_2} \cap \overline{A_3})$.
– A set of 2-repeated elements is $S_{R_2} = (\overline{A_1} \cap A_2 \cap A_3) \cup (A_1 \cap \overline{A_2} \cap A_3) \cup (A_1 \cap A_2 \cap \overline{A_3})$.
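The relationships listed above can be checked on plain (non-private) sets. The following is a minimal sketch, not part of the protocol: it uses the three example sets that appear later in Section 4, and the helper name `comp` is an illustrative assumption. The under-threshold form is left out on purpose.

```python
# Plain-set illustration of the DNF relationships; no privacy is provided here.
A1, A2, A3 = {1, 2, 3}, {1, 2, 4}, {2, 4, 5}
U = A1 | A2 | A3


def comp(A):
    """Complement of A relative to the union of all sets (hypothetical helper)."""
    return U - A


S_U = A1 | A2 | A3
S_I = A1 & A2 & A3
S_TO = (A1 & A2) | (A2 & A3) | (A3 & A1)
S_R1 = ((comp(A1) & comp(A2) & A3)
        | (comp(A1) & A2 & comp(A3))
        | (A1 & comp(A2) & comp(A3)))
S_R2 = ((comp(A1) & A2 & A3)
        | (A1 & comp(A2) & A3)
        | (A1 & A2 & comp(A3)))

print(S_U, S_I, S_TO, S_R1, S_R2)
```

Each result is a union of intersections of the sets or their complements, which is exactly the DNF shape that the rest of the paper targets.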

⇑ Corresponding author.
E-mail addresses: [email protected] (J.Y. Chun), [email protected] (D. Hong), [email protected] (I.R. Jeong), [email protected] (D.H. Lee).
0020-0255/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2011.07.003


J.Y. Chun et al. / Information Sciences 231 (2013) 113–122

Fig. 1. (i) $S_{TO}$ (ii) $S_{TU}$ (iii) $S_{R_1}$ (iv) $S_{R_2}$.

Fig. 2. SF ¼ ðA1 \ A2 \ A3 Þ [ ðA1 \ A3 Þ.

More generally, a DNF operation on distributed sets can be used to find a set $S_F$ satisfying $S_F = (S_{1,1} \cap \dots \cap S_{1,t_2}) \cup \dots \cup (S_{t_1,1} \cap \dots \cap S_{t_1,t_2})$, where $S_{i,j} \in \{A_1, \dots, A_n, \overline{A_1}, \dots, \overline{A_n}\}$, set $A_k$ is owned by party $P_k$ ($1 \le k \le n$), $t_1 \in \mathbb{N}$, and $t_2 \in \{1, \dots, n\}$. A complement set $\overline{A_k}$ is defined as $\overline{A_k} = (A_1 \cup \dots \cup A_n) - A_k$.

Privacy-preserving set operations on distributed sets are useful in privacy-preserving data mining and secure multi-party computations. Recently, a number of privacy-preserving set operations have been proposed, such as privacy-preserving set union protocols [15,6,2], privacy-preserving set intersection protocols [7,15,17,20,13,21,26,3,5,22], and privacy-preserving subset protocols [15,22]. Another protocol, a privacy-preserving over-threshold set union protocol, was proposed in [15,22,23].

Unfortunately, the known privacy-preserving set operations are not enough to extract the set elements defined by a DNF in a privacy-preserving manner. For instance, suppose we want to find the elements that exactly $k$ parties have. That is, we want to find $S_{R_k}$. We might try to use the privacy-preserving over-threshold set union protocol to find $S_{R_k}$. A privacy-preserving over-threshold set union protocol $O_k$ is used to find the elements which at least $k$ parties have. Using two over-threshold set unions $O_k$ and $O_{k+1}$, we can get $S_{R_k} = O_k - O_{k+1}$. However, this approach reveals some information besides just $S_{R_k}$: the elements in $O_{k+1}$ are additionally revealed. A good privacy-preserving protocol should not reveal any extra information such as $O_{k+1}$. We can find $S_{R_k}$ using our privacy-preserving protocol for DNF operations without revealing any extra information. Our protocol can find any set whose elements can be described by a DNF in a privacy-preserving manner (see Fig. 2).
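The leakage argument above can be made concrete on plain sets. This is a minimal sketch under stated assumptions: $O_k$ is read as the set of elements held by at least $k$ parties (the reading under which $S_{R_k} = O_k - O_{k+1}$ holds), and the function name `over_threshold_union` and the example sets are illustrative choices, not taken from any cited protocol.

```python
from collections import Counter


def over_threshold_union(sets, k):
    """Elements held by at least k of the given sets (plain, non-private)."""
    counts = Counter(e for s in sets for e in s)
    return {e for e, c in counts.items() if c >= k}


sets = [{1, 2, 3}, {1, 2, 4}, {2, 4, 5}]
k = 1

O_k = over_threshold_union(sets, k)
O_k1 = over_threshold_union(sets, k + 1)

S_Rk = O_k - O_k1   # the set that was actually wanted
leaked = O_k1       # also learned along the way, although it is not part of S_Rk

print(S_Rk, leaked)
```

Composing the two outputs gives the right answer, but every party also learns `leaked`, which is the extra information a direct DNF protocol is meant to avoid.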
In this paper, we propose a privacy-preserving protocol for DNF operations with distributed sets which does not reveal any information other than what can be inferred from the result of the DNF operations. Our protocol makes it possible to compute many useful relationships between sets, such as set union and (threshold) set intersection, as well as a set of $k$-repeated elements, while preserving the privacy of all the parties involved. Our privacy-preserving protocol is the first construction for DNF operations on distributed sets.

The rest of the paper is organized as follows: In Section 2, we define security notions and review primitives. In Section 3, we describe the sub-protocols which are used in our main protocol. We propose our main protocol, the privacy-preserving protocol for DNF operations with distributed sets, in Section 4. Finally, we conclude the paper in Section 5.

2. Preliminaries

In this section, we define security in the presence of honest-but-curious adversaries and describe the cryptographic tools which are used in this paper.

2.1. Security in the presence of honest-but-curious adversaries

There are two types of standard adversaries, honest-but-curious adversaries and malicious adversaries. We assume that these adversaries can corrupt a proper subset of parties and thus can control the corrupted parties. Informally, an


honest-but-curious adversary correctly follows the prescribed protocol on behalf of the corrupted parties, but attempts to extract additional information from the transcript of all the messages received during the protocol execution. On the other hand, a malicious adversary behaves arbitrarily to extract additional information. In this paper, we only consider honest-but-curious adversaries and design the protocol to be secure against them. A protocol secure against honest-but-curious adversaries can be converted into a protocol secure against malicious adversaries using zero-knowledge proofs as in [15,6,22].

In this section, we present the standard definition for secure multi-party protocols in the presence of honest-but-curious adversaries. We follow the definitions in [12]. If a protocol is a secure multi-party protocol privately computing an agreed-upon function $f$, whatever information an honest-but-curious adversary can obtain after executing the protocol for computing $f$ could be obtained from the inputs and outputs of the corrupted parties without executing the protocol. To prove that a multi-party protocol is secure against honest-but-curious adversaries, we need to build a simulator. A simulator is given the inputs and outputs of the corrupted parties and generates a simulated transcript. A simulated transcript and a real transcript from a real execution of the protocol should be indistinguishable. Indistinguishability is defined as follows:

Definition 1. Ensembles of random variables indexed by strings, $X \stackrel{\text{def}}{=} \{X_w\}_{w \in S}$ and $Y \stackrel{\text{def}}{=} \{Y_w\}_{w \in S}$, are computationally indistinguishable if, for every polynomial-size circuit family $\{C_n\}_{n \in \mathbb{N}}$, every positive polynomial $p(\cdot)$, every sufficiently large $n$, and every $w \in S \cap \{0,1\}^n$, the following holds:

$$\left| \Pr[C_n(X_w) = 1] - \Pr[C_n(Y_w) = 1] \right| < \frac{1}{p(n)}.$$

In such a case, we denote this by $X \stackrel{c}{\equiv} Y$.

To formally define secure multi-party protocols, we assume that there exist $n$ parties, $P_1, \dots, P_n$, where each party $P_i$ holds an input $x_i$. The parties want to compute an $n$-ary functionality $f(\bar{x}) = (f_1(\bar{x}), \dots, f_n(\bar{x}))$, where $\bar{x} = (x_1, \dots, x_n)$, defined as follows:

Definition 2. An $n$-ary functionality $f : (\{0,1\}^*)^n \to (\{0,1\}^*)^n$ is a random process that maps sequences of inputs $\bar{x} = (x_1, \dots, x_n)$ to corresponding sequences of random variables $f(\bar{x}) = (f_1(\bar{x}), \dots, f_n(\bar{x}))$, where $n$ is the number of parties. That is, for every $i$, the $i$-th party, who holds an input $x_i$, wants to obtain the $i$-th element $f_i(\bar{x})$ of $f(\bar{x})$.

Let $f$ be an $n$-ary functionality and $\Pi$ be an $n$-ary protocol for computing $f$. For $I = \{i_1, \dots, i_t\} \subseteq [n]$, we let $f_I(\bar{x})$ be the subsequence $(f_{i_1}(\bar{x}), \dots, f_{i_t}(\bar{x}))$, where $[n] = \{1, \dots, n\}$ and $\bar{x} = (x_1, \dots, x_n)$. The view of the $i$-th party during an execution of $\Pi$ is $\mathrm{view}_i^{\Pi}(\bar{x}) = (x_i, r_i, m_i^1, \dots, m_i^k)$, where $r_i$ represents the $i$-th party's internal coin tosses, and $m_i^j$ is the $j$-th message received during the protocol execution. For $I = \{i_1, \dots, i_t\}$, let $\mathrm{view}_I^{\Pi}(\bar{x}) \stackrel{\text{def}}{=} (I, \mathrm{view}_{i_1}^{\Pi}(\bar{x}), \dots, \mathrm{view}_{i_t}^{\Pi}(\bar{x}))$. The output of the $i$-th party after an execution of $\Pi$ on $\bar{x}$ is $\mathrm{output}_i^{\Pi}(\bar{x})$. Let $\mathrm{output}^{\Pi}(\bar{x}) = (\mathrm{output}_1^{\Pi}(\bar{x}), \dots, \mathrm{output}_n^{\Pi}(\bar{x}))$.

Finally, we give the formal definition of secure multi-party protocols.

Definition 3. We say that protocol $\Pi$ securely computes $f$ in the presence of honest-but-curious adversaries if there exists a probabilistic polynomial-time simulator $S$ such that

$$\left\{ \left( S(I, (x_{i_1}, \dots, x_{i_t}), f_I(\bar{x})),\ f(\bar{x}) \right) \right\}_{\bar{x} \in (\{0,1\}^*)^n} \stackrel{c}{\equiv} \left\{ \left( \mathrm{view}_I^{\Pi}(\bar{x}),\ \mathrm{output}^{\Pi}(\bar{x}) \right) \right\}_{\bar{x} \in (\{0,1\}^*)^n},$$

for every $I \subseteq [n]$.

2.2. Homomorphic encryption

In this paper, we use a fully homomorphic encryption scheme as a building block [9,11,10]. Let $\mathsf{HPE} = (\mathsf{HPE.key}, \mathsf{HPE.enc}, \mathsf{HPE.dec}, \mathsf{HPE.mult}, \mathsf{HPE.add})$ be a fully homomorphic cryptosystem. When $h$ is a security parameter, $\mathsf{HPE.key}(1^h)$ generates a pair of public/private keys $(pk, sk)$. $\mathsf{HPE.enc}_{pk}(m)$ denotes a homomorphic encryption of a message $m$ with a public key $pk$, and $\mathsf{HPE.dec}_{sk}(c)$ denotes a homomorphic decryption of a ciphertext $c$ with a private key $sk$. When $c = \mathsf{HPE.enc}_{pk}(m)$, $\mathsf{HPE.dec}_{sk}(c)$ extracts the message $m$ from the ciphertext $c$ with the private key $sk$. A fully homomorphic cryptosystem provides the following properties:

(1) Let $c_1 = \mathsf{HPE.enc}_{pk}(m_1), \dots, c_n = \mathsf{HPE.enc}_{pk}(m_n)$. We can use the algorithm $\mathsf{HPE.add}$ to make a new ciphertext $c' = \mathsf{HPE.add}(c_1, \dots, c_n)$. Then the following equation holds: $m_1 + \dots + m_n = \mathsf{HPE.dec}_{sk}(c')$.
(2) Let $c_1 = \mathsf{HPE.enc}_{pk}(m_1), \dots, c_n = \mathsf{HPE.enc}_{pk}(m_n)$. We can use the algorithm $\mathsf{HPE.mult}$ to make a new ciphertext $c'' = \mathsf{HPE.mult}(c_1, \dots, c_n)$. Then the following equation holds: $m_1 \cdots m_n = \mathsf{HPE.dec}_{sk}(c'')$.

In our suggested protocol, we will use an $(n, n)$-threshold version of a fully homomorphic cryptosystem $\mathsf{HPE} = (\mathsf{HPE.key}, \mathsf{HPE.enc}, \mathsf{HPE.dec}, \mathsf{HPE.mult}, \mathsf{HPE.add})$. In an $(n, n)$-threshold version of a fully homomorphic cryptosystem,


any proper subset of the $n$ parties cannot decrypt any ciphertext. That is, all $n$ parties should agree and participate in the decryption process to jointly decrypt any ciphertext.

2.3. Polynomial representation of sets

We can use polynomials to represent sets [15]. To represent a set $A = \{e_1, \dots, e_n\}$, we use the polynomial representation $f_A(x) = (x - e_1) \cdots (x - e_n)$. If we represent a homomorphic encryption of the polynomial $f(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$ as $\mathsf{HPE.enc}_{pk}(f(x)) = (\mathsf{HPE.enc}_{pk}(a_n), \dots, \mathsf{HPE.enc}_{pk}(a_0))$, which is the ordered list of homomorphic encryptions of the coefficients, then we can compute the following operations on encrypted polynomials using the homomorphisms of $\mathsf{HPE}$. Assume $c$ is a constant value and $g(x) = b_n x^n + b_{n-1} x^{n-1} + \dots + b_1 x + b_0$.

(1) Polynomial addition: given $\mathsf{HPE.enc}_{pk}(f(x))$ and $\mathsf{HPE.enc}_{pk}(g(x))$, we can compute $\mathsf{HPE.enc}_{pk}(f(x) + g(x))$.
(2) Polynomial multiplication: given $\mathsf{HPE.enc}_{pk}(f(x))$ and $\mathsf{HPE.enc}_{pk}(g(x))$, we can compute $\mathsf{HPE.enc}_{pk}(f(x) \cdot g(x))$.
(3) Polynomial evaluation: given $\mathsf{HPE.enc}_{pk}(f(x))$ and $\mathsf{HPE.enc}_{pk}(c)$, we can compute $\mathsf{HPE.enc}_{pk}(f(c))$.
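The three polynomial operations can be illustrated on unencrypted coefficient lists. The sketch below is only an illustration under stated assumptions: coefficients are stored lowest degree first, arithmetic is done modulo a prime `P` chosen here for convenience, and all function names are hypothetical. In the protocol the same additions and multiplications would be applied to ciphertexts through the homomorphisms of the encryption scheme.

```python
P = 2**61 - 1  # prime modulus; an illustrative choice, not taken from the paper


def poly_from_set(elements):
    """Coefficients (lowest degree first) of f_A(x) = prod_{e in A} (x - e)."""
    coeffs = [1]
    for e in elements:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - e * c) % P      # contribution of the -e term
            nxt[i + 1] = (nxt[i + 1] + c) % P  # contribution of the x term
        coeffs = nxt
    return coeffs


def poly_add(f, g):
    """Coefficient-wise sum, padding the shorter polynomial with zeros."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(a + b) % P for a, b in zip(f, g)]


def poly_mul(f, g):
    """Schoolbook product of two coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out


def poly_eval(f, c):
    """Horner evaluation of f at the point c."""
    acc = 0
    for a in reversed(f):
        acc = (acc * c + a) % P
    return acc


f_A = poly_from_set({1, 2, 3})
f_B = poly_from_set({2, 5})
print(poly_eval(f_A, 2), poly_eval(f_A, 4))
```

An element is a root of `f_A` exactly when it belongs to the set, which is the property the sub-protocols in Section 3 rely on.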

3. Privacy-preserving sub-protocol

In this section, we describe a privacy-preserving set union protocol EncUnion and a privacy-preserving membership test protocol TestMem, both of which are used in our main protocol. These protocols are secure against honest-but-curious adversaries. Table 1 shows the complexities of EncUnion and TestMem.

3.1. Privacy-preserving set union protocol

We assume that the private set of each party $P_k$ ($1 \le k \le n$) is $A_k$, where $n$ is the number of parties. We also assume that the private set $A_k$ has $\ell_k$ elements, $A_k = \{e_k^1, \dots, e_k^{\ell_k}\}$, and $\ell' = \ell_1 + \dots + \ell_n$. Let $U = A_1 \cup \dots \cup A_n = \{u_1, \dots, u_m\}$, where $m$ is the number of elements in $U$. The privacy-preserving encrypted set union protocol EncUnion is jointly executed by all parties to get a list of encrypted and shuffled set union tuples $C = ((\mathsf{HPE.enc}_{pk}(x_1), \mathsf{HPE.enc}_{pk}(y_1)), \dots, (\mathsf{HPE.enc}_{pk}(x_m), \mathsf{HPE.enc}_{pk}(y_m)))$ as the result of EncUnion$(A_1, \dots, A_n)$, where $x_i / y_i \in U$ ($1 \le i \le m$). EncUnion is a slightly modified version of the privacy-preserving set union protocol of [6], adapted to use a fully homomorphic cryptosystem. The description of EncUnion$(A_1, \dots, A_n)$ is as follows:

(1) Each party $P_k$ ($1 \le k \le n$) except $P_1$ performs the following:
  (a) $P_k$ calculates the polynomial $f_{A_k}(x) = (x - e_k^1) \cdots (x - e_k^{\ell_k})$ and randomly selects $r_k^{\ell}$ ($1 \le \ell \le \ell'$).
  (b) $P_k$ sends $\mathsf{HPE.enc}_{pk}(f_{A_k}(x))$, $\mathsf{HPE.enc}_{pk}(r_k^{\ell})$ ($1 \le \ell \le \ell'$), and the encrypted private set $\{\mathsf{HPE.enc}_{pk}(e_k^1), \dots, \mathsf{HPE.enc}_{pk}(e_k^{\ell_k})\}$ to $P_1$.
(2) Party $P_1$ performs the following:
  (a) $P_1$ calculates tuples $(\mathsf{HPE.enc}_{pk}(e_1^i), \mathsf{HPE.enc}_{pk}(1))$ ($1 \le i \le \ell_1$).
  (b) $P_1$ calculates tuples $\left(\mathsf{HPE.enc}_{pk}\left(e_k^j \prod_{\ell=1}^{k-1} f_{A_\ell}(e_k^j)\right), \mathsf{HPE.enc}_{pk}\left(\prod_{\ell=1}^{k-1} f_{A_\ell}(e_k^j)\right)\right)$ ($1 \le j \le \ell_k$) using the homomorphisms of $\mathsf{HPE}$ for each $k = 2$ to $n$.
  (c) $P_1$ randomly selects $r_1^{\ell}$ ($1 \le \ell \le \ell'$) and calculates $\left(\mathsf{HPE.enc}_{pk}\left(a_\ell \prod_{k=1}^{n} r_k^{\ell}\right), \mathsf{HPE.enc}_{pk}\left(b_\ell \prod_{k=1}^{n} r_k^{\ell}\right)\right)$ ($1 \le \ell \le \ell'$) using the homomorphisms of $\mathsf{HPE}$, where each tuple $(\mathsf{HPE.enc}_{pk}(a_\ell), \mathsf{HPE.enc}_{pk}(b_\ell))$ is from the results of steps (a) and (b). Note that $a_\ell b_\ell^{-1}$ is one of the set union elements if $b_\ell \ne 0$.
  (d) $P_1$ broadcasts $(X_\ell, Y_\ell) = (\mathsf{HPE.enc}_{pk}(x_\ell), \mathsf{HPE.enc}_{pk}(y_\ell)) = \left(\mathsf{HPE.enc}_{pk}\left(a_\ell \prod_{k=1}^{n} r_k^{\ell}\right), \mathsf{HPE.enc}_{pk}\left(b_\ell \prod_{k=1}^{n} r_k^{\ell}\right)\right)$ ($1 \le \ell \le \ell'$).
(3) All parties perform the following:
  (a) All parties perform the shuffle protocol [4,8,19] for the broadcasted tuples.
  (b) All parties jointly decrypt $Y_\ell = \mathsf{HPE.enc}_{pk}(y_\ell)$ for each tuple $(X_\ell, Y_\ell)$ ($1 \le \ell \le \ell'$) to test whether $y_\ell = 0$. If $y_\ell \ne 0$, append $(X_\ell, Y_\ell)$ to the output list $C$.

SECURITY ANALYSIS. After executing EncUnion$(A_1, \dots, A_n)$, all parties learn a list of encrypted and shuffled set union tuples $(X_i, Y_i)$ ($1 \le i \le m$) and the size $\ell_k$ of each set for $1 \le k \le n$, but cannot get any other information.

Table 1. Complexity analysis of EncUnion and TestMem.

| Protocol | Computation cost | Communication cost |
|---|---|---|
| EncUnion | $O(n^2 \ell)$ HPE.enc + $O(n \ell)$ HPE.dec + $O(n \ell^2 + n^2 \ell)$ HPE.mult + $O(n \ell)$ HPE.add | $O(n^2 \ell \, \lvert \mathsf{HPE.enc} \rvert)$ |
| TestMem | $O(n)$ HPE.enc + $O(1)$ HPE.dec + $O(\ell)$ HPE.mult + $O(\ell)$ HPE.add | $O(n \, \lvert \mathsf{HPE.enc} \rvert)$ |

Let $n$ be the number of parties and $\ell = \max_{i=1}^{n} \ell_i$, where $\ell_i$ is the number of set elements of party $P_i$. Let $\mathsf{HPE} = (\mathsf{HPE.key}, \mathsf{HPE.enc}, \mathsf{HPE.dec}, \mathsf{HPE.mult}, \mathsf{HPE.add})$ be a fully homomorphic cryptosystem. $\lvert \mathsf{HPE.enc} \rvert$ denotes the length of a ciphertext.


Theorem 1. EncUnion is a privacy-preserving encrypted set union protocol in the honest-but-curious adversary model.

Proof of Theorem 1. Let $\Pi$ be EncUnion. Suppose that the parties in $I = \{P_{i_1}, \dots, P_{i_t}\}$ have been corrupted. We can make a simulator $S$ as follows. $S$ is given the real private sets $(A_{i_1}^R, \dots, A_{i_t}^R)$ as inputs of the corrupted parties $(P_{i_1}, \dots, P_{i_t})$ and the size $\ell_k^R$ ($1 \le k \le n$) of each private set for all parties. $S$ is also given $S_F^R = ((\mathsf{HPE.enc}_{pk}(x_1), \mathsf{HPE.enc}_{pk}(y_1)), \dots, (\mathsf{HPE.enc}_{pk}(x_m), \mathsf{HPE.enc}_{pk}(y_m)))$, which is an output of a real protocol execution. With these inputs, $S$ should generate a simulated transcript which is indistinguishable from $\mathrm{view}_I^{\Pi}(\bar{x})$ in a real protocol execution. Since the view of all parties except $P_1$ is the same, if $P_1 \in \{P_{i_1}, \dots, P_{i_t}\}$, $S$ should generate a simulated transcript which is indistinguishable from $\mathrm{view}_{P_1}^{\Pi}(\bar{x})$ and $\mathrm{view}_{I'}^{\Pi}(\bar{x})$, where $I' = \{P_{i_1}, \dots, P_{i_t}\} - \{P_1\}$.

(1) $S$ generates a simulated transcript for $\mathrm{view}_{I'}^{\Pi}(\bar{x})$ as follows:
  (a) $S$ computes $\ell' - (\ell_{i_1} + \dots + \ell_{i_t})$ tuples of the form $(\mathsf{HPE.enc}_{pk}(0), \mathsf{HPE.enc}_{pk}(0))$, where $\ell' = \ell_1 + \dots + \ell_n$ and $\ell_i = |A_i|$.
  (b) $S$ selects $r_1, \dots, r_m$ randomly, and computes $((\mathsf{HPE.enc}_{pk}(r_1 x_1), \mathsf{HPE.enc}_{pk}(r_1 y_1)), \dots, (\mathsf{HPE.enc}_{pk}(r_m x_m), \mathsf{HPE.enc}_{pk}(r_m y_m)))$.
  (c) $S$ shuffles the tuples from the above two steps.
(2) $S$ generates a simulated transcript for $\mathrm{view}_{P_1}^{\Pi}(\bar{x})$ as follows:
  (a) $S$ arbitrarily makes $A_i^S$ ($i \in \{1, \dots, n\} - \{i_1, \dots, i_t\}$) as the input sets of uncorrupted parties such that $\ell_i^S = \ell_i^R$ ($i \in \{1, \dots, n\} - \{i_1, \dots, i_t\}$).
  (b) For $2 \le k \le n$, $S$ generates $\mathsf{HPE.enc}_{pk}(f_{A_k^S}(x))$, $\mathsf{HPE.enc}_{pk}({r'}_k^{\ell})$ ($1 \le \ell \le \ell'$), where $\ell' = \ell_1 + \dots + \ell_n$ and $\ell_i = |A_i|$, and the encrypted private set $\{\mathsf{HPE.enc}_{pk}({e'}_k^{1}), \dots, \mathsf{HPE.enc}_{pk}({e'}_k^{\ell_k})\}$ using randomly selected values ${r'}_k^{\ell}, {e'}_k^{1}, \dots, {e'}_k^{\ell_k}$.

If $P_1 \notin \{P_{i_1}, \dots, P_{i_t}\}$, $S$ can generate a simulated transcript for $\mathrm{view}_I^{\Pi}(\bar{x})$ by only performing Step (1). A simulated transcript is indistinguishable from a transcript of a real protocol execution due to the semantic security of the fully homomorphic encryption scheme. □
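The arithmetic behind EncUnion in Section 3.1 can be traced without any encryption. The following is a minimal sketch under loud assumptions: all values are plaintext integers modulo a prime `P` chosen for illustration, the joint shuffle is replaced by a local shuffle, and the names `f_eval`, `enc_union_plain`, and `recover` are hypothetical. It shows why a tuple survives the zero test only for the first party that holds an element, and why $x / y$ recovers that element.

```python
import random

P = 2**61 - 1  # prime modulus; an illustrative assumption


def f_eval(A, x):
    """Evaluate f_A(x) = prod_{e in A} (x - e) modulo P."""
    out = 1
    for e in A:
        out = out * (x - e) % P
    return out


def enc_union_plain(sets, rng):
    """Plaintext stand-in for EncUnion: returns blinded (x, y) pairs with y != 0."""
    tuples = [(e % P, 1) for e in sets[0]]           # step 2(a): P_1's own elements
    for k in range(1, len(sets)):                    # step 2(b): parties 2..n
        for e in sets[k]:
            b = 1
            for prev in sets[:k]:
                b = b * f_eval(prev, e) % P          # zero iff e is in an earlier set
            tuples.append((e * b % P, b))
    blinded = []
    for a, b in tuples:                              # step 2(c): one random per party
        r = 1
        for _ in sets:
            r = r * rng.randrange(1, P) % P
        blinded.append((a * r % P, b * r % P))
    rng.shuffle(blinded)                             # step 3(a), done locally here
    return [(x, y) for x, y in blinded if y != 0]    # step 3(b): drop duplicates


def recover(x, y):
    """x * y^{-1} mod P, the union element hidden in a surviving tuple."""
    return x * pow(y, -1, P) % P


sets = [{1, 2, 3}, {1, 2, 4}, {2, 4, 5}]
C = enc_union_plain(sets, random.Random(0))
union = {recover(x, y) for x, y in C}
print(len(C), union)
```

In the real protocol nobody computes `recover` at this stage; the pairs stay encrypted and only the zero test on $y$ is jointly decrypted.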

3.2. Privacy-preserving membership test protocol

We assume that the private set of each party $P_k$ ($1 \le k \le n$) is $A_k$, where $n$ is the number of parties, and that the private set $A_k$ has $\ell_k$ elements, $A_k = \{e_k^1, \dots, e_k^{\ell_k}\}$. We assume that each party $P_k$ ($1 \le k \le n$) holds $C_k = E(A_i) \,\|\, (X_j, Y_j)$, where $E(A_i) = \{\mathsf{HPE.enc}_{pk}(e_i^1), \dots, \mathsf{HPE.enc}_{pk}(e_i^{\ell_i})\}$ and $(X_j, Y_j) = (\mathsf{HPE.enc}_{pk}(x_j), \mathsf{HPE.enc}_{pk}(y_j))$. Suppose that $P_i$ wants to know whether or not $x_j / y_j$ is a member of $A_i$. All parties jointly execute a privacy-preserving membership test protocol TestMem$(C_1, \dots, C_n)$. If $x_j / y_j \in A_i$, $P_i$ gets 1 as the result of TestMem$(C_1, \dots, C_n)$. Otherwise, $P_i$ gets 0. All other parties $P_k$ except $P_i$ get nothing as a result of TestMem$(C_1, \dots, C_n)$. The detailed description of TestMem$(C_1, \dots, C_n)$ is as follows:

(1) Each party $P_k$ ($1 \le k \le n$) except $P_i$ randomly selects $r_k$ and sends $\mathsf{HPE.enc}_{pk}(r_k)$ to $P_i$.
(2) Party $P_i$ performs the following:
  (a) $P_i$ randomly selects $r_i$ and $R$.
  (b) $P_i$ calculates $B = \mathsf{HPE.enc}_{pk}\left(\prod_{\ell=1}^{\ell_i} (y_j e_i^{\ell} - x_j) \prod_{k=1}^{n} r_k + R\right)$ using the homomorphisms of $\mathsf{HPE}$ and broadcasts $B$.
(3) All parties jointly decrypt $B$ to get $b = \prod_{\ell=1}^{\ell_i} (y_j e_i^{\ell} - x_j) \prod_{k=1}^{n} r_k + R$. If $b = R$, $P_i$ outputs 1. Otherwise, $P_i$ outputs 0. The other parties except $P_i$ output nothing.

Note that $b = R$ means $y_j e_i^{\ell} - x_j = 0$ for some $\ell$ ($1 \le \ell \le \ell_i$). The probability that $\prod_{\ell=1}^{\ell_i} (y_j e_i^{\ell} - x_j) \prod_{k=1}^{n} r_k = 0$ is negligible if $x_j / y_j \ne e_i^{\ell}$ for all $e_i^{\ell} \in A_i$.

SECURITY ANALYSIS. After executing TestMem$(C_1, \dots, C_n)$, party $P_i$ learns whether or not $x_j / y_j \in A_i$, but all other parties $P_k$ ($P_k \ne P_i$, $1 \le k \le n$) get nothing. No party, including $P_i$, can get any other information about $x_j$ and $y_j$. More formally, we state the following theorem.

Theorem 2. TestMem is a privacy-preserving membership test protocol in the honest-but-curious adversary model.

Proof of Theorem 2. Let $\Pi$ be TestMem. Suppose that the parties in $I = \{P_{i_1}, \dots, P_{i_t}\}$ have been corrupted. We can make a simulator $S$ as follows. If $P_i \in \{P_{i_1}, \dots, P_{i_t}\}$, $S$ is given $E(A_i^R) \,\|\, (X_j^R, Y_j^R)$ as the inputs of $\{P_{i_1}, \dots, P_{i_t}\}$ and $d^R \in \{0, 1\}$ as the output of the protocol execution. If $P_i \notin \{P_{i_1}, \dots, P_{i_t}\}$, $S$ is given only $E(A_i^R) \,\|\, (X_j^R, Y_j^R)$ as the inputs of $\{P_{i_1}, \dots, P_{i_t}\}$. With these inputs, $S$ should generate a simulated transcript which is indistinguishable from $\mathrm{view}_I^{\Pi}(\bar{x})$ in the real protocol execution. Since the view of all parties except $P_i$ is the same, if $P_i \in \{P_{i_1}, \dots, P_{i_t}\}$, $S$ should generate a simulated transcript which is indistinguishable from $\mathrm{view}_{P_i}^{\Pi}(\bar{x})$ and $\mathrm{view}_{I'}^{\Pi}(\bar{x})$, where $I' = \{P_{i_1}, \dots, P_{i_t}\} - \{P_i\}$.


(1) $S$ generates a simulated transcript for $\mathrm{view}_{I'}^{\Pi}(\bar{x})$ as follows:
  (a) $S$ selects $r'_k$ for $P_k$ ($1 \le k \le n$) and $R'$ for $P_i$ as in the protocol TestMem.
  (b) $S$ computes $B^S = \mathsf{HPE.enc}_{pk}\left(\prod_{\ell=1}^{\ell_i} (y_j e_i^{\ell} - x_j) \prod_{k=1}^{n} r'_k + R'\right)$ using the given $E(A_i^R) \,\|\, (X_j^R, Y_j^R)$.
(2) $S$ generates a simulated transcript for $\mathrm{view}_{P_i}^{\Pi}(\bar{x})$ as $\mathsf{HPE.enc}_{pk}(r'_k)$ for $P_k$ ($k \in \{1, \dots, n\} - \{i\}$).

If $P_i \notin \{P_{i_1}, \dots, P_{i_t}\}$, $S$ can generate a simulated transcript for $\mathrm{view}_I^{\Pi}(\bar{x})$ by only performing Step (1). A simulated transcript is indistinguishable from a transcript of a real protocol execution due to the semantic security of the fully homomorphic encryption scheme. It is obvious that the output obtained by decrypting $B^S$ is the same as $d^R$. □
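The membership test of Section 3.2 reduces to one blinded product. A minimal plaintext sketch follows, assuming arithmetic modulo an illustrative prime `P` and a hypothetical function name `membership_plain`; in the protocol the product is computed under encryption and only the final value $b$ is jointly decrypted.

```python
import random

P = 2**61 - 1  # prime modulus; an illustrative assumption


def membership_plain(A_i, x, y, n_parties, rng):
    """Return 1 if x / y is in A_i, else 0, following the TestMem arithmetic."""
    r_prod = 1
    for _ in range(n_parties):                  # one random r_k per party
        r_prod = r_prod * rng.randrange(1, P) % P
    R = rng.randrange(P)                        # mask known only to P_i
    prod = 1
    for e in A_i:
        prod = prod * ((y * e - x) % P) % P     # a factor is zero iff x / y == e
    b = (prod * r_prod + R) % P                 # the value obtained by decryption
    return 1 if b == R else 0


rng = random.Random(1)
A1 = {1, 2, 3}
y = rng.randrange(1, P)                         # blinding factor of a union tuple
in_set = membership_plain(A1, 3 * y % P, y, 3, rng)
not_in_set = membership_plain(A1, 5 * y % P, y, 3, rng)
print(in_set, not_in_set)
```

Because the product is multiplied by fresh random values and offset by $R$, the decrypted value carries no usable information for parties that do not know $R$.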

4. Privacy-preserving protocol for DNF operations with distributed sets

In this section, we construct our main protocol, the privacy-preserving protocol PPDNF, for DNF operations with distributed sets. A DNF operation on distributed sets can be used to find a set $S_F$ satisfying $S_F = (S_{1,1} \cap \dots \cap S_{1,t_2}) \cup \dots \cup (S_{t_1,1} \cap \dots \cap S_{t_1,t_2})$, where $S_{i,j} \in \{A_1, \dots, A_n, \overline{A_1}, \dots, \overline{A_n}\}$ and set $A_k$ is owned by party $P_k$. A complement set $\overline{A_k}$ is defined as $\overline{A_k} = (A_1 \cup \dots \cup A_n) - A_k$. If a protocol is a privacy-preserving protocol for DNF operations, each party $P_k$ gets $S_F$ through the protocol but cannot extract more information about $A_j$ ($1 \le j \le n$) than the information that can be extracted from $A_k$ and $S_F$.

We now construct a privacy-preserving protocol for DNF operations. Suppose that we want to find $S_F = (S_{1,1} \cap \dots \cap S_{1,t_2}) \cup \dots \cup (S_{t_1,1} \cap \dots \cap S_{t_1,t_2})$, where $S_{i,j} \in \{A_1, \dots, A_n, \overline{A_1}, \dots, \overline{A_n}\}$ and set $A_k$ is owned by party $P_k$. Note that each party $P_k$ knows only $A_k$, but does not know $\overline{A_k}$. All parties jointly execute a privacy-preserving protocol for DNF operations PPDNF$(A_1, \dots, A_n)$ with distributed sets.

We assume that the private set of each party $P_k$ ($1 \le k \le n$) is $A_k$, where $n$ is the number of parties. We also assume that the private set $A_k$ has $\ell_k$ elements, $A_k = \{e_k^1, \dots, e_k^{\ell_k}\}$.

We first describe the basic idea of PPDNF$(A_1, \dots, A_n)$ to make $S_F$. Suppose that there are three parties $P_1$, $P_2$, and $P_3$ who have their private sets $A_1 = \{1, 2, 3\}$, $A_2 = \{1, 2, 4\}$, and $A_3 = \{2, 4, 5\}$, respectively (see Fig. 3). They want to find a set of 1-repeated elements

$$S_{R_1} = (S_{1,1} \cap S_{1,2} \cap S_{1,3}) \cup (S_{2,1} \cap S_{2,2} \cap S_{2,3}) \cup (S_{3,1} \cap S_{3,2} \cap S_{3,3}) = (\overline{A_1} \cap \overline{A_2} \cap A_3) \cup (\overline{A_1} \cap A_2 \cap \overline{A_3}) \cup (A_1 \cap \overline{A_2} \cap \overline{A_3}) = \{3, 5\}.$$

(1) $P_1$, $P_2$, and $P_3$ make encrypted and shuffled set union tuples $(X_j, Y_j) = (\mathsf{HPE.enc}_{pk}(x_j), \mathsf{HPE.enc}_{pk}(y_j))$ ($1 \le j \le 5$), where $x_j / y_j \in U$. The parties cannot know $(x_j, y_j)$ for $1 \le j \le 5$, which would be as follows:

| $j$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $x_j$ | $3R_1$ | $1R_2$ | $4R_3$ | $2R_4$ | $5R_5$ |
| $y_j$ | $R_1$ | $R_2$ | $R_3$ | $R_4$ | $R_5$ |

Note that $R_j$ ($1 \le j \le 5$) is a random number.

(2) Each party $P_k$ ($1 \le k \le 3$) executes membership tests for $A_k = \{e_k^1, \dots, e_k^{\ell_k}\}$ without revealing $A_k$. If $P_k$ has an element which is equal to $x_j / y_j$, $b_k^j$ is set to 1 as follows:

| $x_j / y_j$ | 3 | 1 | 4 | 2 | 5 |
|---|---|---|---|---|---|
| $b_1^j$ | 1 | 1 | 0 | 1 | 0 |
| $b_2^j$ | 0 | 1 | 1 | 1 | 0 |
| $b_3^j$ | 0 | 0 | 1 | 1 | 1 |

Fig. 3. $S_{R_1}$.


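(3) Each party $P_k$ ($1 \le k \le 3$) calculates $T_k^{i,j} = \mathsf{HPE.enc}_{pk}(b_k^{i,j})$ ($1 \le i \le 3$, $1 \le j \le 5$). For $1 \le i \le 3$, if $A_k \in \{S_{i,1}, S_{i,2}, S_{i,3}\}$, $P_k$ makes $T_k^{i,j} = \mathsf{HPE.enc}_{pk}(b_k^j)$ ($1 \le j \le 5$). If $\overline{A_k} \in \{S_{i,1}, S_{i,2}, S_{i,3}\}$, $P_k$ makes $T_k^{i,j} = \mathsf{HPE.enc}_{pk}(\tilde{b}_k^j)$ ($1 \le j \le 5$), where $\tilde{b}_k^j$ is the ones' complement of $b_k^j$. That is, $\tilde{b}_k^j = 1 \oplus b_k^j$, where $\oplus$ is a bitwise exclusive OR. Otherwise, $P_k$ makes $T_k^{i,j} = \mathsf{HPE.enc}_{pk}(0)$ ($1 \le j \le 5$). The values $b_k^{i,j}$ ($1 \le k \le 3$, $1 \le i \le 3$, $1 \le j \le 5$) are as follows:

| | $i$ | $j=1$ | $j=2$ | $j=3$ | $j=4$ | $j=5$ |
|---|---|---|---|---|---|---|
| $b_1^{i,j}$ | 1 | 0 | 0 | 1 | 0 | 1 |
| | 2 | 0 | 0 | 1 | 0 | 1 |
| | 3 | 1 | 1 | 0 | 1 | 0 |
| $b_2^{i,j}$ | 1 | 1 | 0 | 0 | 0 | 1 |
| | 2 | 0 | 1 | 1 | 1 | 0 |
| | 3 | 1 | 0 | 0 | 0 | 1 |
| $b_3^{i,j}$ | 1 | 0 | 0 | 1 | 1 | 1 |
| | 2 | 1 | 1 | 0 | 0 | 0 |
| | 3 | 1 | 1 | 0 | 0 | 0 |

(4) Each party $P_k$ ($1 \le k \le 3$) randomly selects $r_k^j$ ($1 \le j \le 5$) and broadcasts $T_k^{i,j}$ and $\mathsf{HPE.enc}_{pk}(r_k^j)$ ($1 \le i \le 3$, $1 \le j \le 5$). After receiving the broadcasted messages, each party calculates $T^{i,j} = \mathsf{HPE.enc}_{pk}(b^{i,j}) = \mathsf{HPE.mult}(T_1^{i,j}, T_2^{i,j}, T_3^{i,j}) = \mathsf{HPE.mult}(\mathsf{HPE.enc}_{pk}(b_1^{i,j}), \mathsf{HPE.enc}_{pk}(b_2^{i,j}), \mathsf{HPE.enc}_{pk}(b_3^{i,j})) = \mathsf{HPE.enc}_{pk}(b_1^{i,j} b_2^{i,j} b_3^{i,j})$. The values $b^{i,j} = b_1^{i,j} b_2^{i,j} b_3^{i,j}$ ($1 \le i \le 3$, $1 \le j \le 5$) are as follows:

| $b^{i,j}$ | $j=1$ | $j=2$ | $j=3$ | $j=4$ | $j=5$ |
|---|---|---|---|---|---|
| $i=1$ | 0 | 0 | 0 | 0 | 1 |
| $i=2$ | 0 | 0 | 0 | 0 | 0 |
| $i=3$ | 1 | 0 | 0 | 0 | 0 |

(5) Each party $P_k$ ($1 \le k \le 3$) calculates $T^j = \mathsf{HPE.enc}_{pk}(b^j) = \mathsf{HPE.add}(T^{1,j}, T^{2,j}, T^{3,j}) = \mathsf{HPE.add}(\mathsf{HPE.enc}_{pk}(b^{1,j}), \mathsf{HPE.enc}_{pk}(b^{2,j}), \mathsf{HPE.enc}_{pk}(b^{3,j})) = \mathsf{HPE.enc}_{pk}(b^{1,j} + b^{2,j} + b^{3,j})$ ($1 \le j \le 5$). The values $b^j = b^{1,j} + b^{2,j} + b^{3,j}$ for $1 \le j \le 5$ are as follows:

| $j$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $b^j$ | 1 | 0 | 0 | 0 | 1 |

(6) Each party $P_k$ ($1 \le k \le 3$) calculates $\mathsf{HPE.enc}_{pk}\left(\prod_{k=1}^{3} r_k^j\right) = \mathsf{HPE.mult}(\mathsf{HPE.enc}_{pk}(r_1^j), \mathsf{HPE.enc}_{pk}(r_2^j), \mathsf{HPE.enc}_{pk}(r_3^j))$, and then calculates $\hat{T}^j = \mathsf{HPE.enc}_{pk}(\hat{b}^j) = \mathsf{HPE.mult}\left(T^j, \mathsf{HPE.enc}_{pk}\left(\prod_{k=1}^{3} r_k^j\right)\right) = \mathsf{HPE.enc}_{pk}\left(b^j \prod_{k=1}^{3} r_k^j\right) = \mathsf{HPE.enc}_{pk}(b^j \hat{R}_j)$ ($1 \le j \le 5$), where $\hat{R}_j = \prod_{k=1}^{3} r_k^j$. The values $\hat{b}^j = b^j \hat{R}_j$ for $1 \le j \le 5$ are as follows:

| $j$ | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $\hat{b}^j$ | $1 \cdot \hat{R}_1$ | $0 \cdot \hat{R}_2$ | $0 \cdot \hat{R}_3$ | $0 \cdot \hat{R}_4$ | $1 \cdot \hat{R}_5$ |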

(7) All parties jointly decrypt $\hat{T}^j$ ($1 \le j \le 5$) to get $\hat{b}^j$, where $\hat{T}^j = \mathsf{HPE.enc}_{pk}(\hat{b}^j)$. If $\hat{b}^j \ne 0$, all parties jointly decrypt $(X_j, Y_j)$ to calculate $x_j / y_j$ and append $x_j / y_j$ to the output set $S_{R_1}$. If $\hat{b}^j = 0$, the parties do not decrypt $(X_j, Y_j)$. As a result, all parties get $S_{R_1} = \{3, 5\}$.

Now we give the complete description of our privacy-preserving protocol for DNF operations, PPDNF$(A_1, \dots, A_n)$, with distributed sets:

(1) All parties collaborate to get a list of encrypted and shuffled set union tuples $(X_j, Y_j) = (\mathsf{HPE.enc}_{pk}(x_j), \mathsf{HPE.enc}_{pk}(y_j))$ ($1 \le j \le m$) using EncUnion. Let $S_F = \emptyset$.
(2) Each party $P_k$ ($1 \le k \le n$) performs the following:
  (a) $P_k$ calculates $(b_k^1, \dots, b_k^m)$, a list of bits indicating membership, using TestMem. That is, $b_k^j = \text{TestMem}(E(A_k), (X_j, Y_j))$ for $1 \le j \le m$, where $E(A_k) = \{\mathsf{HPE.enc}_{pk}(e_k^1), \dots, \mathsf{HPE.enc}_{pk}(e_k^{\ell_k})\}$. Note that if $x_j / y_j \in A_k$, $b_k^j = 1$. Otherwise, $b_k^j = 0$.
  (b) $P_k$ calculates the following for $1 \le i \le t_1$:
    (i) If $A_k \in \{S_{i,1}, \dots, S_{i,t_2}\}$, $P_k$ makes $T_k^{i,j} = \mathsf{HPE.enc}_{pk}(b_k^j)$ ($1 \le j \le m$).
    (ii) If $\overline{A_k} \in \{S_{i,1}, \dots, S_{i,t_2}\}$, $P_k$ makes $T_k^{i,j} = \mathsf{HPE.enc}_{pk}(\tilde{b}_k^j)$ ($1 \le j \le m$), where $\tilde{b}_k^j$ is the ones' complement of $b_k^j$. That is, $\tilde{b}_k^j = 1 \oplus b_k^j$, where $\oplus$ is a bitwise exclusive OR.
    (iii) Otherwise, $P_k$ makes $T_k^{i,j} = \mathsf{HPE.enc}_{pk}(0)$ ($1 \le j \le m$).
  (c) $P_k$ randomly selects $r_k^j$ ($1 \le j \le m$).
  (d) $P_k$ broadcasts $T_k^{i,j}$ and $\mathsf{HPE.enc}_{pk}(r_k^j)$ ($1 \le i \le t_1$, $1 \le j \le m$).
  (e) $P_k$ calculates $T^{i,j} = \mathsf{HPE.mult}(T_1^{i,j}, \dots, T_n^{i,j})$ ($1 \le i \le t_1$, $1 \le j \le m$) after receiving the broadcasted messages.
  (f) $P_k$ calculates $T^j = \mathsf{HPE.add}(T^{1,j}, \dots, T^{t_1,j})$ ($1 \le j \le m$).
  (g) $P_k$ calculates $\mathsf{HPE.enc}_{pk}\left(\prod_{k=1}^{n} r_k^j\right) = \mathsf{HPE.mult}(\mathsf{HPE.enc}_{pk}(r_1^j), \dots, \mathsf{HPE.enc}_{pk}(r_n^j))$, and then calculates $\hat{T}^j = \mathsf{HPE.mult}\left(T^j, \mathsf{HPE.enc}_{pk}\left(\prod_{k=1}^{n} r_k^j\right)\right)$ ($1 \le j \le m$).
(3) All parties collaborate as follows for $1 \le j \le m$:
  (a) All parties jointly decrypt $\hat{T}^j$ and get $\hat{b}^j$, where $\hat{T}^j = \mathsf{HPE.enc}_{pk}(\hat{b}^j)$.
  (b) If $\hat{b}^j \ne 0$, all parties jointly decrypt $(X_j, Y_j)$ and calculate $x_j / y_j$. Let $S_F = S_F \cup \{x_j / y_j\}$.


  (c) If $\hat{b}^j = 0$, the parties do not decrypt $(X_j, Y_j)$.
(4) Each party $P_k$ ($1 \le k \le n$) outputs $S_F$.

Fig. 4 describes Step (1), Step (2.a), and Step (2.b) in the PPDNF protocol.

COMPLEXITY ANALYSIS. Table 2 shows the complexities of PPDNF. We could use any fully homomorphic encryption scheme [9,11,10] as our building block without modifying the structure of PPDNF. That is, the structure of PPDNF does not depend upon the structure of the underlying homomorphic encryption scheme. However, the efficiency of our scheme depends upon

Fig. 4. Description of Step (1), Step 2. (a), and Step 2. (b) in PPDNF Protocol.

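Table 2. Complexity analysis of PPDNF.

| Protocol | Computation cost | Communication cost |
|---|---|---|
| PPDNF | $O(n^2 \ell + n^2 m + n m t_1)$ HPE.enc + $O(n \ell + n m)$ HPE.dec + $O(n \ell^2 + n^2 \ell + n m \ell + n m t_1)$ HPE.mult + $O(n m \ell)$ HPE.add | $O((n^2 \ell + n^2 m + n m t_1) \, \lvert \mathsf{HPE.enc} \rvert)$ |

Let $n$ be the number of parties and $A_i$ be the private set of party $P_i$. $m$ is the number of elements in $U$, where $U = A_1 \cup \dots \cup A_n = \{u_1, \dots, u_m\}$. $\ell = \max_{i=1}^{n} \ell_i$, where $\ell_i = |A_i|$. $t_1$ is the number of conjunctions in the target set $S_F = (S_{1,1} \cap \dots \cap S_{1,t_2}) \cup \dots \cup (S_{t_1,1} \cap \dots \cap S_{t_1,t_2})$. Let $\mathsf{HPE} = (\mathsf{HPE.key}, \mathsf{HPE.enc}, \mathsf{HPE.dec}, \mathsf{HPE.mult}, \mathsf{HPE.add})$ be a fully homomorphic cryptosystem. $\lvert \mathsf{HPE.enc} \rvert$ denotes the length of a ciphertext.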


the efficiency of the underlying homomorphic encryption scheme. So, if we use a more efficient homomorphic encryption scheme, the cost of each HPE operation counted in Table 2 decreases accordingly.

SECURITY ANALYSIS. PPDNF outputs $S_F = (S_{1,1} \cap \dots \cap S_{1,t_2}) \cup \dots \cup (S_{t_1,1} \cap \dots \cap S_{t_1,t_2})$. PPDNF does not reveal any other information except the size $\ell_k = |A_k|$ of each set for $1 \le k \le n$. The set sizes are revealed because PPDNF uses EncUnion, which reveals the size $\ell_k$ of each set ($1 \le k \le n$). However, EncUnion does not reveal the set union $U$. So, apart from the set sizes, PPDNF does not reveal any extra information except the elements in $S_F$. More formally, we state the following theorem.

Theorem 3. PPDNF is a privacy-preserving protocol for DNF operations with distributed sets in the honest-but-curious adversary model. PPDNF does not reveal any other information except the size of each private set.

Proof of Theorem 3. Let $\Pi$ be PPDNF. Suppose that the parties in $I = \{P_{i_1}, \dots, P_{i_t}\}$ have been corrupted. We can make a simulator $S$ as follows. $S$ is given the real private sets $(A_{i_1}^R, \dots, A_{i_t}^R)$ as inputs of the corrupted parties $\{P_{i_1}, \dots, P_{i_t}\}$ and the size $\ell_k^R$ ($1 \le k \le n$) of each private set for all parties. $S$ is also given $S_F^R$, which is an output of a real protocol execution. With these inputs, $S$ should generate a simulated transcript which is indistinguishable from $\mathrm{view}_I^{\Pi}(\bar{x})$ in a real protocol execution. $S$ sets $A_i^S = A_i^R$ for $i \in \{i_1, \dots, i_t\}$. Note that $S$ cannot know the real private input sets $A_i^R$ ($i \in \{1, \dots, n\} - \{i_1, \dots, i_t\}$) of uncorrupted parties. Thus, the simulator arbitrarily makes $A_i^S$ ($i \in \{1, \dots, n\} - \{i_1, \dots, i_t\}$) as input sets of uncorrupted parties such that $\ell_k^S = \ell_k^R$ ($1 \le k \le n$) and $S_F^S = S_F^R$. Then, $S$ simply follows PPDNF with inputs $(A_1^S, \dots, A_n^S)$ to generate the protocol messages. A simulated transcript is indistinguishable from a transcript of a real protocol execution due to the semantic security of the fully homomorphic encryption scheme. □
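The bit arithmetic of PPDNF (Steps (2) and (3) of Section 4) can be followed on the three-party example without encryption. This is a minimal sketch under stated assumptions: values are plaintext, the union is computed directly instead of through EncUnion, each conjunction is written as one sign per party (`"+"` for $A_k$, `"-"` for its complement) so that the "otherwise" branch of Step (2.b) is not exercised, and the name `ppdnf_plain` and the modulus `P` are illustrative.

```python
import random

P = 2**61 - 1  # prime modulus for the blinding factor; an illustrative assumption


def ppdnf_plain(sets, dnf, rng):
    """Plaintext stand-in for PPDNF. dnf is a list of conjunctions, each a
    tuple with one sign per party: '+' for A_k, '-' for its complement."""
    result = set()
    for u in sorted(set().union(*sets)):                # stands in for x_j / y_j
        bits = [1 if u in A else 0 for A in sets]       # step 2(a): b_k^j
        b_j = 0
        for conj in dnf:
            prod = 1
            for bit, sign in zip(bits, conj):
                prod *= bit if sign == "+" else 1 - bit  # steps 2(b), 2(e)
            b_j += prod                                  # step 2(f)
        blind = 1
        for _ in sets:
            blind = blind * rng.randrange(1, P) % P      # step 2(g): prod of r_k^j
        b_hat = b_j * blind % P
        if b_hat != 0:                                   # step 3: reveal only hits
            result.add(u)
    return result


sets = [{1, 2, 3}, {1, 2, 4}, {2, 4, 5}]
dnf_R1 = [("-", "-", "+"), ("-", "+", "-"), ("+", "-", "-")]
S_R1 = ppdnf_plain(sets, dnf_R1, random.Random(0))
print(S_R1)
```

The blinding step matters because the sum $b^j$ counts how many conjunctions an element satisfies; multiplying by a random nonzero value leaves only the zero or nonzero distinction visible after decryption.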

5. Conclusion

We have constructed a privacy-preserving protocol for DNF operations, PPDNF, with distributed sets to find a set $S_F$ satisfying $S_F = (S_{1,1} \cap \dots \cap S_{1,t_2}) \cup \dots \cup (S_{t_1,1} \cap \dots \cap S_{t_1,t_2})$ in a privacy-preserving manner, where $S_{i,j} \in \{A_1, \dots, A_n, \overline{A_1}, \dots, \overline{A_n}\}$ and a complement set $\overline{A_k}$ is defined as $\overline{A_k} = (A_1 \cup \dots \cup A_n) - A_k$. PPDNF does not reveal any information other than what can be inferred from the output set $S_F$ and the size of each private set. PPDNF reveals the size of each private set, since PPDNF is based on EncUnion, which reveals the size of each private set. It would be interesting to construct a privacy-preserving protocol for DNF operations with distributed sets which does not even reveal the size of each private set.

Acknowledgement

This work was partly supported by the IT R&D program of MKE/KEIT [KI002113, Development of Security Technology for Car-Healthcare], the IT R&D program of MKE, Korea [Development of Privacy Enhancing Cryptography on Ubiquitous Computing Environment], and the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2010-0024219).

References

[1] R. Agrawal, R. Srikant, Privacy-Preserving Data Mining, in: Proceedings 19th ACM SIGMOD Conference on Management of Data, May 2000, pp. 439–450.
[2] S. Böttcher, S. Obermeier, Secure set union and bag union computation for guaranteeing anonymity of distrustful participants, Journal of Software 3 (1) (2008) 9–17.
[3] J. Camenisch, G.M. Zaverucha, Private Intersection of Certified Sets, in: Proceedings Financial Cryptography and Data Security (FC ’09), February 2009, pp. 108–127.
[4] Y. Desmedt, K. Kurosawa, How to Break a Practical MIX and Design a New One, in: Advances in Cryptology – Proceedings EUROCRYPT 2000, May 2000, pp. 557–572.
[5] D. Dachman-Soled, T. Malkin, M. Raykova, M. Yung, Efficient robust private set intersection, in: Proceedings 7th International Conference on Applied Cryptography and Network Security (ACNS ’09), June 2009, pp. 125–142.
[6] K.B. Frikken, Privacy preserving set union, in: Proceedings 5th International Conference on Applied Cryptography and Network Security (ACNS ’07), July 2007, pp. 237–252.
[7] M.J. Freedman, K. Nissim, B. Pinkas, Efficient private matching and set intersection, in: Proceedings EUROCRYPT Advances in Cryptology 2004, May 2004, pp. 1–19.
[8] J. Furukawa, K. Sako, An Efficient Scheme for Proving a Shuffle, in: Proceedings CRYPTO Advances in Cryptology 2001, August 2001, pp. 368–387.
[9] C. Gentry, Fully homomorphic encryption using ideal lattices, in: Proceedings 41st ACM Symposium on Theory of Computing (STOC ’09), May 2009, pp. 169–178.
[10] C. Gentry, S. Halevi, Implementing Gentry’s fully-homomorphic encryption scheme, in press in the proceeding of EUROCRYPT 2011. (


J.Y. Chun et al. / Information Sciences 231 (2013) 113–122

[14] S.W. Kim, S. Park, J.I. Won, S.W. Kim, Privacy preserving data mining of sequential patterns for network traffic data, Information Sciences 178 (3) (2008).
[15] L. Kissner, D. Song, Privacy-preserving set operations, in: Advances in Cryptology – Proceedings CRYPTO 2005, August 2005, pp. 241–257.
[16] Y. Lindell, B. Pinkas, Privacy preserving data mining, in: Advances in Cryptology – Proceedings CRYPTO 2000, August 2000, pp. 36–54.
[17] R. Li, C. Wu, An unconditionally secure protocol for multi-party set intersection, in: Proceedings 5th International Conference on Applied Cryptography and Network Security (ACNS '07), July 2007, pp. 226–236.
[18] N. Matatov, L. Rokach, O. Maimon, Privacy-preserving data mining: a feature set partitioning approach, Information Sciences 180 (14) (2010) 2696–2720.
[19] C.A. Neff, A verifiable secret shuffle and its application to e-voting, in: Proceedings 8th ACM Conference on Computer and Communications Security (CCS '01), November 2001, pp. 116–125.
[20] Y. Sang, H. Shen, Privacy preserving set intersection protocol secure against malicious behaviors, in: Proceedings 8th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT '07), December 2007, pp. 461–468.
[21] Y. Sang, H. Shen, Privacy preserving set intersection based on bilinear groups, in: Proceedings 31st Australasian Conference on Computer Science (ACSC '08), December 2007, pp. 47–54.
[22] Y. Sang, H. Shen, Efficient and secure protocols for privacy-preserving set operations, ACM Transactions on Information and System Security (TISSEC) 13 (1) (2009).
[23] J.H. Seo, H.J. Yoon, S.G. Lim, J.H. Cheon, D.W. Hong, Analysis of privacy-preserving element reduction of a multiset, Journal of the Korean Mathematical Society 46 (1) (2009) 59–69.
[24] D. Shah, S. Zhong, Two methods for privacy preserving data mining with malicious participants, Information Sciences 177 (23) (2007) 5468–5483.
[25] J. Vaidya, C. Clifton, Secure set intersection cardinality with application to association rule mining, Journal of Computer Security 13 (4) (2005) 593–622.
[26] Q. Ye, H. Wang, J. Pieprzyk, Distributed private matching and set operations, in: Proceedings 4th Information Security Practice and Experience Conference (ISPEC '08), April 2008, pp. 347–360.
