Privacy-preserving disjunctive normal form operations on distributed sets

Information Sciences 231 (2013) 113–122 Contents lists available at ScienceDirect Information Sciences journal homepage: www.elsevier.com/locate/ins...

Ji Young Chun a, Dowon Hong b, Ik Rae Jeong a,⇑, Dong Hoon Lee a

a Graduate School of Information Security, CIST, Korea University, 1, 5-Ga, Anam-dong Sungbuk-ku, Seoul 136-701, Republic of Korea
b Electronics and Telecommunications Research Institute (ETRI), 161 Gajeong-dong, Yuseong-Gu, Daejeon 305-700, Republic of Korea

Article info

Article history: Available online 14 July 2011

Keywords: Set operation; DNF; Set union; Threshold set intersection

Abstract

Privacy-preserving set operations such as set union and set intersection on distributed sets are widely used in data mining, where the preservation of privacy is of the utmost concern. In this paper, we extend privacy-preserving set operations and consider privacy-preserving disjunctive normal form (DNF) operations on distributed sets. A privacy-preserving DNF operation on distributed sets can be used to find a set $S_F$ satisfying $S_F = (S_{1,1} \cap \cdots \cap S_{1,t_2}) \cup \cdots \cup (S_{t_1,1} \cap \cdots \cap S_{t_1,t_2})$ without revealing any information beyond what can be inferred from the DNF operation itself, where $S_{i,j} \in \{A_1, \ldots, A_n, \bar{A}_1, \ldots, \bar{A}_n\}$ and set $A_k$ is known only to party $P_k$. The complement set $\bar{A}_k$ is defined as $\bar{A}_k = (A_1 \cup \cdots \cup A_n) - A_k$. Using privacy-preserving DNF operations on distributed sets, it is possible to find a set union, a (threshold) set intersection, and a set of k-repeated elements.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

Privacy-preserving set operations in distributed environments are widely used in privacy-preserving data mining [1,16,25,24,14,18]. When multiple parties want to discover some information from their private data while preserving their privacy, privacy-preserving set operations can be used. For example, suppose multiple hospitals want to discover the relationship between a specific disease and genetic information from the medical data of their patients. Since medical data is subject to many privacy and security restrictions, hospitals should not reveal the medical data of their patients to other hospitals. In this situation, hospitals can extract useful genetic information using privacy-preserving set operations without revealing their patients' data. The extracted genetic information could be used to determine the likelihood that a person has a specific disease.

Assume there are three sets $A_1$, $A_2$, and $A_3$. Many useful relationships between sets can be represented as disjunctive normal forms. Some of them are as follows (see Fig. 1):

– A set union is $S_U = A_1 \cup A_2 \cup A_3$.
– A set intersection is $S_I = A_1 \cap A_2 \cap A_3$.
– A 2-over-threshold set intersection is $S_{TO} = (A_1 \cap A_2) \cup (A_2 \cap A_3) \cup (A_3 \cap A_1)$.
– A 2-under-threshold set intersection is $S_{TU} = (\bar{A}_1 \cap \bar{A}_2) \cup (\bar{A}_2 \cap \bar{A}_3) \cup (\bar{A}_3 \cap \bar{A}_1)$.
– A set of 1-repeated elements is $S_{R_1} = (A_1 \cap \bar{A}_2 \cap \bar{A}_3) \cup (\bar{A}_1 \cap A_2 \cap \bar{A}_3) \cup (\bar{A}_1 \cap \bar{A}_2 \cap A_3)$.
– A set of 2-repeated elements is $S_{R_2} = (A_1 \cap A_2 \cap \bar{A}_3) \cup (A_1 \cap \bar{A}_2 \cap A_3) \cup (\bar{A}_1 \cap A_2 \cap A_3)$.
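As an illustration only, the relationships listed above can be evaluated in the clear, with no privacy protection at all. The sketch below is not part of the paper's protocol: the helper names `over_threshold` and `k_repeated` are hypothetical, and the three sets are borrowed from the worked example in Section 4.

```python
from itertools import combinations

def over_threshold(sets, k):
    """Elements held by at least k sets: the union of all k-wise intersections."""
    result = set()
    for chosen in combinations(sets, k):
        result |= set.intersection(*chosen)
    return result

def k_repeated(sets, k):
    """Elements held by exactly k sets, built as a union of conjunctions in
    which each set appears either as itself or as its complement."""
    universe = set().union(*sets)
    result = set()
    for chosen in combinations(range(len(sets)), k):
        term = set(universe)
        for i, s in enumerate(sets):
            # A_i for the chosen indices, complement of A_i otherwise.
            term &= s if i in chosen else (universe - s)
        result |= term
    return result

# Plaintext sets taken from the example in Section 4.
A1, A2, A3 = {1, 2, 3}, {1, 2, 4}, {2, 4, 5}
sets = [A1, A2, A3]

S_U = A1 | A2 | A3               # set union
S_I = A1 & A2 & A3               # set intersection
S_TO = over_threshold(sets, 2)   # 2-over-threshold set intersection
S_R1 = k_repeated(sets, 1)       # 1-repeated elements
S_R2 = k_repeated(sets, 2)       # 2-repeated elements
```

Each conjunction in `k_repeated` corresponds to one parenthesised term of the DNF, which is why the number of terms grows with the number of ways to choose k parties.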

⇑ Corresponding author. E-mail addresses: [email protected] (J.Y. Chun), [email protected] (D. Hong), [email protected] (I.R. Jeong), [email protected] (D.H. Lee). 0020-0255/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2011.07.003

Fig. 1. (i) $S_{TO}$ (ii) $S_{TU}$ (iii) $S_{R_1}$ (iv) $S_{R_2}$.

Fig. 2. SF ¼ ðA1 \ A2 \ A3 Þ [ ðA1 \ A3 Þ.

More generally, a DNF operation on distributed sets can be used to find a set $S_F$ satisfying $S_F = (S_{1,1} \cap \cdots \cap S_{1,t_2}) \cup \cdots \cup (S_{t_1,1} \cap \cdots \cap S_{t_1,t_2})$, where $S_{i,j} \in \{A_1, \ldots, A_n, \bar{A}_1, \ldots, \bar{A}_n\}$ and set $A_k$ is owned by party $P_k$ ($1 \le k \le n$). The complement set $\bar{A}_k$ is defined as $\bar{A}_k = (A_1 \cup \cdots \cup A_n) - A_k$. Here $t_1 \in \mathbb{N}$ and $t_2 \in \{1, \ldots, n\}$.

Privacy-preserving set operations on distributed sets are useful in privacy-preserving data mining and secure multi-party computation. Recently, a number of privacy-preserving set operations have been proposed, such as privacy-preserving set union protocols [15,6,2], privacy-preserving set intersection protocols [7,15,17,20,13,21,26,3,5,22], and privacy-preserving subset protocols [15,22]. Another protocol, a privacy-preserving over-threshold set union protocol, was proposed in [15,22,23].

Unfortunately, the known privacy-preserving set operations are not enough to extract the set elements defined by a DNF in a privacy-preserving manner. For instance, suppose we want to find the elements that exactly k parties have, that is, $S_{R_k}$. We might try to use the privacy-preserving over-threshold set union protocol. A privacy-preserving over-threshold set union protocol $O_k$ finds the elements which at least k parties have. Using two over-threshold set unions $O_k$ and $O_{k+1}$, we can get $S_{R_k} = O_k - O_{k+1}$. However, this approach reveals information beyond $S_{R_k}$: the elements in $O_{k+1}$ are additionally revealed. A good privacy-preserving protocol should not reveal any extra information such as $O_{k+1}$. We can find $S_{R_k}$ using our privacy-preserving protocol for DNF operations without revealing any extra information. Our protocol can find, in a privacy-preserving manner, any set that can be represented as a DNF (see Fig. 2).
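The leakage described above can be made concrete with a small plaintext sketch. It assumes that $O_k$ means "elements held by at least k parties", which is the reading under which $S_{R_k} = O_k - O_{k+1}$ holds; the function name `over_threshold_union` and the sample sets (the ones from the Section 4 example) are chosen for illustration only.

```python
from collections import Counter

def over_threshold_union(sets, k):
    """Plaintext stand-in for O_k: elements that at least k of the sets contain."""
    counts = Counter(e for s in sets for e in s)
    return {e for e, c in counts.items() if c >= k}

sets = [{1, 2, 3}, {1, 2, 4}, {2, 4, 5}]
k = 1

O_k = over_threshold_union(sets, k)
O_k_plus_1 = over_threshold_union(sets, k + 1)

# The intended output: elements held by exactly k parties.
S_Rk = O_k - O_k_plus_1

# Everything in O_{k+1} has also been disclosed, although none of it
# belongs to S_Rk.
leaked = O_k_plus_1 - S_Rk
```

In this instance the parties wanted only `S_Rk`, yet running the two over-threshold protocols also exposes every element shared by two or more parties.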
In this paper, we propose a privacy-preserving protocol for DNF operations with distributed sets which reveals no information other than what can be inferred from the DNF operations. Our protocol makes it possible to construct many useful relationships between sets, such as set union and (threshold) set intersection, as well as a set of k-repeated elements, while preserving the privacy of all the parties involved. Our privacy-preserving protocol is the first construction for DNF operations on distributed sets.

The rest of the paper is organized as follows. In Section 2, we define security notions and review primitives. In Section 3, we present the sub-protocols used in our main protocol. We propose our main protocol, the privacy-preserving protocol for DNF operations with distributed sets, in Section 4. Finally, we conclude the paper in Section 5.

2. Preliminaries

In this section, we define security in the presence of honest-but-curious adversaries and describe the cryptographic tools used in this paper.

2.1. Security in the presence of honest-but-curious adversaries

There are two standard types of adversaries: honest-but-curious adversaries and malicious adversaries. We assume that these adversaries can corrupt a proper subset of parties and thus can control the corrupted parties. Informally, an honest-but-curious adversary correctly follows the prescribed protocol on behalf of the corrupted parties, but attempts to extract additional information from the transcript of all messages received during the protocol execution. A malicious adversary, on the other hand, behaves arbitrarily to extract additional information. In this paper, we only consider honest-but-curious adversaries and design the protocol to be secure against them. A protocol secure against honest-but-curious adversaries can be converted into a protocol secure against malicious adversaries using zero-knowledge proofs, as in [15,6,22].

In this section, we present the standard definition of secure multi-party protocols in the presence of honest-but-curious adversaries. We follow the definitions in [12]. If a protocol is a secure multi-party protocol privately computing an agreed-upon function f, then whatever information an honest-but-curious adversary can obtain after executing the protocol for computing f could also be obtained from the inputs and outputs of the corrupted parties without executing the protocol. To prove that a multi-party protocol is secure against honest-but-curious adversaries, we need to build a "simulator." A simulator is given the inputs and outputs of the corrupted parties and generates a simulated transcript. A simulated transcript and a real transcript from a real execution of the protocol should be indistinguishable. Indistinguishability is defined as follows:

Definition 1. Ensembles of random variables indexed by strings, $X \stackrel{\mathrm{def}}{=} \{X_w\}_{w \in S}$ and $Y \stackrel{\mathrm{def}}{=} \{Y_w\}_{w \in S}$, are computationally indistinguishable if, for every polynomial-size circuit family $\{C_n\}_{n \in \mathbb{N}}$, every positive polynomial $p(\cdot)$, every sufficiently large $n$, and every $w \in S \cap \{0,1\}^n$, the following holds:

$$\left| \Pr[C_n(X_w) = 1] - \Pr[C_n(Y_w) = 1] \right| < \frac{1}{p(n)}.$$

In such a case, we write $X \stackrel{c}{\equiv} Y$.

To formally define secure multi-party protocols, we assume that there are n parties, $P_1, \ldots, P_n$, where each party $P_i$ holds an input $x_i$. The parties want to compute an n-ary functionality $f(\bar{x}) = (f_1(\bar{x}), \ldots, f_n(\bar{x}))$, where $\bar{x} = (x_1, \ldots, x_n)$, defined as follows:

Definition 2. An n-ary functionality $f : (\{0,1\}^*)^n \to (\{0,1\}^*)^n$ is a random process that maps sequences of inputs $\bar{x} = (x_1, \ldots, x_n)$ to corresponding sequences of random variables $f(\bar{x}) = (f_1(\bar{x}), \ldots, f_n(\bar{x}))$, where n is the number of parties. That is, for every i, the i-th party, who holds an input $x_i$, wants to obtain the i-th element $f_i(\bar{x})$ of $f(\bar{x})$.

Let f be an n-ary functionality and $\Pi$ be an n-ary protocol for computing f. For $I = \{i_1, \ldots, i_t\} \subseteq [n]$, we let $f_I(\bar{x})$ be the subsequence $(f_{i_1}(\bar{x}), \ldots, f_{i_t}(\bar{x}))$, where $[n] = \{1, \ldots, n\}$ and $\bar{x} = (x_1, \ldots, x_n)$. The view of the i-th party during an execution of $\Pi$ is $\mathrm{view}_i^{\Pi}(\bar{x}) = (x_i, r_i, m_i^1, \ldots, m_i^k)$, where $r_i$ represents the i-th party's internal coin tosses and $m_i^j$ is the j-th message received during the protocol execution. For $I = \{i_1, \ldots, i_t\}$, let $\mathrm{view}_I^{\Pi}(\bar{x}) \stackrel{\mathrm{def}}{=} (I, \mathrm{view}_{i_1}^{\Pi}(\bar{x}), \ldots, \mathrm{view}_{i_t}^{\Pi}(\bar{x}))$. The output of the i-th party after an execution of $\Pi$ on $\bar{x}$ is $\mathrm{output}_i^{\Pi}(\bar{x})$. Let $\mathrm{output}^{\Pi}(\bar{x}) = (\mathrm{output}_1^{\Pi}(\bar{x}), \ldots, \mathrm{output}_n^{\Pi}(\bar{x}))$.

Finally, we give the formal definition of secure multi-party protocols.

Definition 3. We say that protocol $\Pi$ securely computes f in the presence of honest-but-curious adversaries if there exists a probabilistic polynomial-time simulator S such that

$$\left\{ \left( S(I, (x_{i_1}, \ldots, x_{i_t}), f_I(\bar{x})),\ f(\bar{x}) \right) \right\}_{\bar{x} \in (\{0,1\}^*)^n} \stackrel{c}{\equiv} \left\{ \left( \mathrm{view}_I^{\Pi}(\bar{x}),\ \mathrm{output}^{\Pi}(\bar{x}) \right) \right\}_{\bar{x} \in (\{0,1\}^*)^n}$$

for every $I \subseteq [n]$.

2.2. Homomorphic encryption

In this paper, we use a fully homomorphic encryption scheme as a building block [9,11,10]. Let HPE = (HPE-key, HPE-enc, HPE-dec, HPE-mult, HPE-add) be a fully homomorphic cryptosystem. When h is a security parameter, $\text{HPE-key}(1^h)$ generates a pair of public/private keys (pk, sk). $\text{HPE-enc}_{pk}(m)$ denotes a homomorphic encryption of a message m with a public key pk, and $\text{HPE-dec}_{sk}(c)$ denotes a homomorphic decryption of a ciphertext c with a private key sk. When $c = \text{HPE-enc}_{pk}(m)$, $\text{HPE-dec}_{sk}(c)$ recovers the message m from the ciphertext c. A fully homomorphic cryptosystem provides the following properties:

(1) Let $c_1 = \text{HPE-enc}_{pk}(m_1), \ldots, c_n = \text{HPE-enc}_{pk}(m_n)$. We can use the algorithm HPE-add to make a new ciphertext $c' = \text{HPE-add}(c_1, \ldots, c_n)$. Then the following equation holds: $m_1 + \cdots + m_n = \text{HPE-dec}_{sk}(c')$.
(2) Let $c_1 = \text{HPE-enc}_{pk}(m_1), \ldots, c_n = \text{HPE-enc}_{pk}(m_n)$. We can use the algorithm HPE-mult to make a new ciphertext $c'' = \text{HPE-mult}(c_1, \ldots, c_n)$. Then the following equation holds: $m_1 \cdots m_n = \text{HPE-dec}_{sk}(c'')$.

In our protocol, we use an (n, n)-threshold version of a fully homomorphic cryptosystem HPE = (HPE-key, HPE-enc, HPE-dec, HPE-mult, HPE-add). In an (n, n)-threshold version of a fully homomorphic cryptosystem, no proper subset of the n parties can decrypt any ciphertext. That is, all n parties must agree and participate in the decryption process to jointly decrypt a ciphertext.

2.3. Polynomial representation of sets

We can use polynomials to represent sets [15]. To represent a set $A = \{e_1, \ldots, e_n\}$, we use the polynomial $f_A(x) = (x - e_1) \cdots (x - e_n)$. If we represent a homomorphic encryption of the polynomial $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ as $\text{HPE-enc}_{pk}(f(x)) = (\text{HPE-enc}_{pk}(a_n), \ldots, \text{HPE-enc}_{pk}(a_0))$, the ordered list of homomorphic encryptions of its coefficients, then we can compute the following operations on encrypted polynomials using the homomorphisms of HPE. Assume c is a constant and $g(x) = b_n x^n + b_{n-1} x^{n-1} + \cdots + b_1 x + b_0$.

(1) Polynomial addition: given $\text{HPE-enc}_{pk}(f(x))$ and $\text{HPE-enc}_{pk}(g(x))$, we can compute $\text{HPE-enc}_{pk}(f(x) + g(x))$.
(2) Polynomial multiplication: given $\text{HPE-enc}_{pk}(f(x))$ and $\text{HPE-enc}_{pk}(g(x))$, we can compute $\text{HPE-enc}_{pk}(f(x) \cdot g(x))$.
(3) Polynomial evaluation: given $\text{HPE-enc}_{pk}(f(x))$ and $\text{HPE-enc}_{pk}(c)$, we can compute $\text{HPE-enc}_{pk}(f(c))$.
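To make the polynomial representation concrete, the following sketch performs the three operations on plaintext coefficient lists. It is only an analogue of what the protocol does under encryption: the prime modulus `P`, the lowest-degree-first coefficient order, and all function names are assumptions of this sketch, and no encryption is involved.

```python
# Assumed plaintext space: integers modulo a Mersenne prime.
P = 2**61 - 1

def poly_from_set(elems):
    """Coefficients (lowest degree first) of f_A(x) = prod over e of (x - e) mod P."""
    coeffs = [1]
    for e in elems:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - e * c) % P      # contribution of the -e factor
            nxt[i + 1] = (nxt[i + 1] + c) % P  # contribution of the x factor
        coeffs = nxt
    return coeffs

def poly_add(f, g):
    """Coefficient-wise addition; uses only additions of coefficients."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(a + b) % P for a, b in zip(f, g)]

def poly_mul(f, g):
    """Schoolbook product; uses only additions and multiplications of coefficients."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def poly_eval(f, c):
    """Horner evaluation of f at c."""
    acc = 0
    for a in reversed(f):
        acc = (acc * c + a) % P
    return acc

f_A = poly_from_set([1, 2, 3])
f_B = poly_from_set([1, 2, 4])
```

Because every step is a sum or product of coefficients, the same computations can be carried out on ciphertexts with HPE-add and HPE-mult. An element e belongs to A exactly when `poly_eval(f_A, e)` is zero.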

3. Privacy-preserving sub-protocol

In this section, we describe a privacy-preserving set union protocol EncUnion and a privacy-preserving membership test protocol TestMem, both of which are used in our main protocol. These protocols are secure against honest-but-curious adversaries. Table 1 shows the complexities of EncUnion and TestMem.

3.1. Privacy-preserving set union protocol

We assume that the private set of each party $P_k$ ($1 \le k \le n$) is $A_k$, where n is the number of parties. We also assume that the private set $A_k$ has $\ell_k$ elements, $A_k = \{e_k^1, \ldots, e_k^{\ell_k}\}$, and $\ell' = \ell_1 + \cdots + \ell_n$. Let $U = A_1 \cup \cdots \cup A_n = \{u_1, \ldots, u_m\}$, where m is the number of elements in U. The privacy-preserving encrypted set union protocol EncUnion is jointly executed by all parties to get a list of encrypted and shuffled set union tuples $C = ((\text{HPE-enc}_{pk}(x_1), \text{HPE-enc}_{pk}(y_1)), \ldots, (\text{HPE-enc}_{pk}(x_m), \text{HPE-enc}_{pk}(y_m)))$ as the result of $\text{EncUnion}(A_1, \ldots, A_n)$, where $\frac{x_i}{y_i} \in U$ ($1 \le i \le m$). EncUnion is a slightly modified version of the privacy-preserving set union protocol of [6], using a fully homomorphic cryptosystem. The description of $\text{EncUnion}(A_1, \ldots, A_n)$ is as follows:

(1) Each party $P_k$ ($1 \le k \le n$) except $P_1$ performs the following:
(a) $P_k$ calculates the polynomial $f_{A_k}(x) = (x - e_k^1) \cdots (x - e_k^{\ell_k})$ and randomly selects $r_k^{\ell}$ ($1 \le \ell \le \ell'$).
(b) $P_k$ sends $\text{HPE-enc}_{pk}(f_{A_k}(x))$, $\text{HPE-enc}_{pk}(r_k^{\ell})$ ($1 \le \ell \le \ell'$), and the encrypted private set $\{\text{HPE-enc}_{pk}(e_k^1), \ldots, \text{HPE-enc}_{pk}(e_k^{\ell_k})\}$ to $P_1$.

(2) Party $P_1$ performs the following:
(a) $P_1$ calculates the tuples $(\text{HPE-enc}_{pk}(e_1^i), \text{HPE-enc}_{pk}(1))$ ($1 \le i \le \ell_1$).
(b) $P_1$ calculates the tuples $\left(\text{HPE-enc}_{pk}\left(e_k^j \cdot \prod_{\ell=1}^{k-1} f_{A_\ell}(e_k^j)\right), \text{HPE-enc}_{pk}\left(\prod_{\ell=1}^{k-1} f_{A_\ell}(e_k^j)\right)\right)$ ($1 \le j \le \ell_k$) using the homomorphisms of HPE for each k = 2 to n.
(c) $P_1$ randomly selects $r_1^{\ell}$ ($1 \le \ell \le \ell'$) and calculates $\left(\text{HPE-enc}_{pk}\left(a_\ell \cdot \prod_{k=1}^{n} r_k^{\ell}\right), \text{HPE-enc}_{pk}\left(b_\ell \cdot \prod_{k=1}^{n} r_k^{\ell}\right)\right)$ ($1 \le \ell \le \ell'$) using the homomorphisms of HPE, where each tuple $(\text{HPE-enc}_{pk}(a_\ell), \text{HPE-enc}_{pk}(b_\ell))$ is one of the results of steps (a) and (b). Note that $a_\ell \cdot b_\ell^{-1}$ is one of the set union elements if $b_\ell \ne 0$.
(d) $P_1$ broadcasts $(X_\ell, Y_\ell) = (\text{HPE-enc}_{pk}(x_\ell), \text{HPE-enc}_{pk}(y_\ell)) = \left(\text{HPE-enc}_{pk}\left(a_\ell \cdot \prod_{k=1}^{n} r_k^{\ell}\right), \text{HPE-enc}_{pk}\left(b_\ell \cdot \prod_{k=1}^{n} r_k^{\ell}\right)\right)$ ($1 \le \ell \le \ell'$).

(3) All parties perform the following:
(a) All parties perform the shuffle protocol [4,8,19] on the broadcast tuples.
(b) All parties jointly decrypt $Y_\ell = \text{HPE-enc}_{pk}(y_\ell)$ for each tuple $(X_\ell, Y_\ell)$ ($1 \le \ell \le \ell'$) to test whether $y_\ell = 0$. If $y_\ell \ne 0$, they append $(X_\ell, Y_\ell)$ to the output list C.

SECURITY ANALYSIS. After executing $\text{EncUnion}(A_1, \ldots, A_n)$, all parties learn a list of encrypted and shuffled set union tuples $(X_i, Y_i)$ ($1 \le i \le m$) and the size $\ell_k$ of each set for $1 \le k \le n$, but cannot obtain any other information.
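The arithmetic behind EncUnion can be followed on plaintext values. The sketch below is an assumption-laden illustration rather than the protocol itself: it works modulo an assumed prime `P`, replaces all ciphertexts with plain integers, folds the per-party random factors into a single loop, and uses a local shuffle in place of the shuffle protocol. The names `enc_union_plain` and `recover` are hypothetical.

```python
import random

P = 2**61 - 1  # assumed prime plaintext modulus

def eval_root_poly(elems, x):
    """Evaluate f_A(x) = prod over e in A of (x - e) mod P."""
    acc = 1
    for e in elems:
        acc = acc * (x - e) % P
    return acc

def enc_union_plain(sets, rng):
    """Plaintext analogue of steps (2) and (3): build (a, b) pairs, blind, shuffle, filter."""
    pairs = []
    for k, A_k in enumerate(sets):
        for e in sorted(A_k):
            # b is the product of f_{A_l}(e) over all earlier parties l < k.
            # It is zero exactly when e already appears in an earlier set.
            b = 1
            for A_prev in sets[:k]:
                b = b * eval_root_poly(A_prev, e) % P
            pairs.append((e * b % P, b))
    blinded = []
    for a, b in pairs:
        r = 1
        for _ in sets:  # one nonzero random factor per party
            r = r * rng.randrange(1, P) % P
        blinded.append((a * r % P, b * r % P))
    rng.shuffle(blinded)
    # Tuples with y = 0 are duplicates and are dropped.
    return [(x, y) for x, y in blinded if y != 0]

def recover(x, y):
    """The union element hidden in a tuple is x / y mod P."""
    return x * pow(y, -1, P) % P

rng = random.Random(0)
C = enc_union_plain([{1, 2, 3}, {1, 2, 4}, {2, 4, 5}], rng)
elements = sorted(recover(x, y) for x, y in C)
```

With these inputs the nine candidate tuples shrink to five, one per distinct element, which matches the role of the $y_\ell = 0$ test in step (3)(b).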

Table 1. Complexity analysis of EncUnion and TestMem.

EncUnion
Computation cost: $O(n^2\ell)$ HPE-enc + $O(n\ell)$ HPE-dec + $O(n\ell^2 + n^2\ell)$ HPE-mult + $O(n\ell)$ HPE-add
Communication cost: $O(n^2\ell \cdot |\text{HPE-enc}|)$

TestMem
Computation cost: $O(n)$ HPE-enc + $O(1)$ HPE-dec + $O(\ell)$ HPE-mult + $O(\ell)$ HPE-add
Communication cost: $O(n \cdot |\text{HPE-enc}|)$

Let n be the number of parties. $\ell = \max_{i=1}^{n} \ell_i$, where $\ell_i$ is the number of set elements of party $P_i$. Let HPE = (HPE-key, HPE-enc, HPE-dec, HPE-mult, HPE-add) be a fully homomorphic cryptosystem. $|\text{HPE-enc}|$ denotes the length of a ciphertext.

Theorem 1. EncUnion is a privacy-preserving encrypted set union protocol in the honest-but-curious adversary model.

Proof of Theorem 1. Let $\Pi$ be EncUnion. Suppose that the parties in $I = \{P_{i_1}, \ldots, P_{i_t}\}$ have been corrupted. We can make a simulator S as follows. S is given the real private sets $(A_{i_1}^R, \ldots, A_{i_t}^R)$ as inputs of the corrupted parties $(P_{i_1}, \ldots, P_{i_t})$ and the size $\ell_k^R$ ($1 \le k \le n$) of each private set for all parties. S is also given $S_F^R = ((\text{HPE-enc}_{pk}(x_1), \text{HPE-enc}_{pk}(y_1)), \ldots, (\text{HPE-enc}_{pk}(x_m), \text{HPE-enc}_{pk}(y_m)))$, which is an output of a real protocol execution. With these inputs, S should generate a simulated transcript which is indistinguishable from $\mathrm{view}_I^{\Pi}(\bar{x})$ in a real protocol execution. Since the view of all parties except $P_1$ is the same, if $P_1 \in \{P_{i_1}, \ldots, P_{i_t}\}$, S should generate a simulated transcript which is indistinguishable from $\mathrm{view}_{P_1}^{\Pi}(\bar{x})$ and $\mathrm{view}_{I'}^{\Pi}(\bar{x})$, where $I' = \{P_{i_1}, \ldots, P_{i_t}\} - \{P_1\}$.

(1) S generates a simulated transcript for $\mathrm{view}_{I'}^{\Pi}(\bar{x})$ as follows:
(a) S computes $\ell' - (\ell_{i_1} + \cdots + \ell_{i_t})$ tuples of the form $(\text{HPE-enc}_{pk}(0), \text{HPE-enc}_{pk}(0))$, where $\ell' = \ell_1 + \cdots + \ell_n$ and $\ell_i = |A_i|$.
(b) S selects $r_1, \ldots, r_m$ randomly, and computes $((\text{HPE-enc}_{pk}(r_1 \cdot x_1), \text{HPE-enc}_{pk}(r_1 \cdot y_1)), \ldots, (\text{HPE-enc}_{pk}(r_m \cdot x_m), \text{HPE-enc}_{pk}(r_m \cdot y_m)))$.
(c) S shuffles the tuples from the above two steps.

(2) S generates a simulated transcript for $\mathrm{view}_{P_1}^{\Pi}(\bar{x})$ as follows:
(a) S arbitrarily makes $A_i^S$ ($i \in \{1, \ldots, n\} - \{i_1, \ldots, i_t\}$) as the input sets of the uncorrupted parties such that $\ell_i^S = \ell_i^R$ ($i \in \{1, \ldots, n\} - \{i_1, \ldots, i_t\}$).
(b) For $2 \le k \le n$, S generates $\text{HPE-enc}_{pk}(f_{A_k^S}(x))$, $\text{HPE-enc}_{pk}(r_k'^{\ell})$ ($1 \le \ell \le \ell'$), where $\ell' = \ell_1 + \cdots + \ell_n$ and $\ell_i = |A_i|$, and the encrypted private set $\{\text{HPE-enc}_{pk}(e_k'^{1}), \ldots, \text{HPE-enc}_{pk}(e_k'^{\ell_k})\}$ using randomly selected values $r_k'^{\ell}, e_k'^{1}, \ldots, e_k'^{\ell_k}$.

If $P_1 \notin \{P_{i_1}, \ldots, P_{i_t}\}$, S can generate a simulated transcript for $\mathrm{view}_I^{\Pi}(\bar{x})$ by only performing Step (1). A simulated transcript is indistinguishable from a transcript of a real protocol execution because the fully homomorphic encryption scheme is semantically secure. □

3.2. Privacy-preserving membership test protocol

We assume that the private set of each party $P_k$ ($1 \le k \le n$) is $A_k$, where n is the number of parties, and that the private set $A_k$ has $\ell_k$ elements, $A_k = \{e_k^1, \ldots, e_k^{\ell_k}\}$. We assume that each party $P_k$ ($1 \le k \le n$) holds $C_k = E(A_i) \,\|\, (X_j, Y_j)$, where $E(A_i) = \{\text{HPE-enc}_{pk}(e_i^1), \ldots, \text{HPE-enc}_{pk}(e_i^{\ell_i})\}$ and $(X_j, Y_j) = (\text{HPE-enc}_{pk}(x_j), \text{HPE-enc}_{pk}(y_j))$. Suppose that $P_i$ wants to know whether or not $\frac{x_j}{y_j}$ is a member of $A_i$. All parties jointly execute a privacy-preserving membership test protocol $\text{TestMem}(C_1, \ldots, C_n)$. If $\frac{x_j}{y_j} \in A_i$, $P_i$ gets 1 as the result of $\text{TestMem}(C_1, \ldots, C_n)$. Otherwise, $P_i$ gets 0. All other parties $P_k$ get nothing as a result of $\text{TestMem}(C_1, \ldots, C_n)$. The detailed description of $\text{TestMem}(C_1, \ldots, C_n)$ is as follows:

(1) Each party $P_k$ ($1 \le k \le n$) except $P_i$ randomly selects $r_k$ and sends $\text{HPE-enc}_{pk}(r_k)$ to $P_i$.
(2) Party $P_i$ performs the following:
(a) $P_i$ randomly selects $r_i$ and R.
(b) $P_i$ calculates $B = \text{HPE-enc}_{pk}\left(\prod_{\ell=1}^{\ell_i} (y_j \cdot e_i^{\ell} - x_j) \cdot \prod_{k=1}^{n} r_k + R\right)$ using the homomorphisms of HPE and broadcasts B.
(3) All parties jointly decrypt B to get $b = \prod_{\ell=1}^{\ell_i} (y_j \cdot e_i^{\ell} - x_j) \cdot \prod_{k=1}^{n} r_k + R$. If b = R, $P_i$ outputs 1. Otherwise, $P_i$ outputs 0. The other parties output nothing.

Note that b = R means $y_j \cdot e_i^{\ell} - x_j = 0$ for some $\ell$ ($1 \le \ell \le \ell_i$). The probability that $\prod_{\ell=1}^{\ell_i} (y_j \cdot e_i^{\ell} - x_j) \cdot \prod_{k=1}^{n} r_k = 0$ is negligible if $\frac{x_j}{y_j} \ne e_i^{\ell}$ for all $e_i^{\ell} \in A_i$.

SECURITY ANALYSIS. After executing $\text{TestMem}(C_1, \ldots, C_n)$, party $P_i$ learns whether or not $\frac{x_j}{y_j} \in A_i$, but all other parties $P_k$ ($\ne P_i$, $1 \le k \le n$) get nothing. No party, including $P_i$, can obtain any other information about $x_j$ and $y_j$. More formally, we state the following theorem.

Theorem 2. TestMem is a privacy-preserving membership test protocol in the honest-but-curious adversary model.

Proof of Theorem 2. Let $\Pi$ be TestMem. Suppose that $I = \{P_{i_1}, \ldots, P_{i_t}\}$ have been corrupted. We can make a simulator S as follows. If $P_i \in \{P_{i_1}, \ldots, P_{i_t}\}$, S is given $E(A_i^R) \,\|\, (X_j^R, Y_j^R)$ as the inputs of $\{P_{i_1}, \ldots, P_{i_t}\}$ and $d^R \in \{0, 1\}$ as the output of the protocol execution. If $P_i \notin \{P_{i_1}, \ldots, P_{i_t}\}$, S is given only $E(A_i^R) \,\|\, (X_j^R, Y_j^R)$ as the inputs of $\{P_{i_1}, \ldots, P_{i_t}\}$. With these inputs, S should generate a simulated transcript which is indistinguishable from $\mathrm{view}_I^{\Pi}(\bar{x})$ in the real protocol execution. Since the view of all parties except $P_i$ is the same, if $P_i \in \{P_{i_1}, \ldots, P_{i_t}\}$, S should generate a simulated transcript which is indistinguishable from $\mathrm{view}_{P_i}^{\Pi}(\bar{x})$ and $\mathrm{view}_{I'}^{\Pi}(\bar{x})$, where $I' = \{P_{i_1}, \ldots, P_{i_t}\} - \{P_i\}$.

(1) S generates a simulated transcript for $\mathrm{view}_{I'}^{\Pi}(\bar{x})$ as follows:
(a) S selects $r_k'$ for $P_k$ ($1 \le k \le n$) and $R'$ for $P_i$ as in the protocol TestMem.
(b) S computes $B^S = \text{HPE-enc}_{pk}\left(\prod_{\ell=1}^{\ell_i} (y_j \cdot e_i^{\ell} - x_j) \cdot \prod_{k=1}^{n} r_k' + R'\right)$ using the given $E(A_i^R) \,\|\, (X_j^R, Y_j^R)$.
(2) S generates a simulated transcript for $\mathrm{view}_{P_i}^{\Pi}(\bar{x})$ as $\text{HPE-enc}_{pk}(r_k')$ for $P_k$ ($k \in \{1, \ldots, n\} - \{i\}$).

If $P_i \notin \{P_{i_1}, \ldots, P_{i_t}\}$, S can generate a simulated transcript for $\mathrm{view}_I^{\Pi}(\bar{x})$ by only performing Step (1). A simulated transcript is indistinguishable from a transcript of a real protocol execution because the fully homomorphic encryption scheme is semantically secure. It is obvious that the output obtained by decrypting $B^S$ is the same as $d^R$. □

4. Privacy-preserving protocol for DNF operations with distributed sets

In this section, we construct our main protocol, the privacy-preserving protocol PPDNF, for DNF operations with distributed sets. A DNF operation on distributed sets can be used to find a set $S_F$ satisfying $S_F = (S_{1,1} \cap \cdots \cap S_{1,t_2}) \cup \cdots \cup (S_{t_1,1} \cap \cdots \cap S_{t_1,t_2})$, where $S_{i,j} \in \{A_1, \ldots, A_n, \bar{A}_1, \ldots, \bar{A}_n\}$ and set $A_k$ is owned by party $P_k$. The complement set $\bar{A}_k$ is defined as $\bar{A}_k = (A_1 \cup \cdots \cup A_n) - A_k$. If a protocol is a privacy-preserving protocol for DNF operations, each party $P_k$ gets $S_F$ through the protocol but cannot extract more information about $A_j$ ($1 \le j \le n$) than the information that can be extracted from $A_k$ and $S_F$.

We now construct a privacy-preserving protocol for DNF operations. Suppose that we want to find $S_F = (S_{1,1} \cap \cdots \cap S_{1,t_2}) \cup \cdots \cup (S_{t_1,1} \cap \cdots \cap S_{t_1,t_2})$, where $S_{i,j} \in \{A_1, \ldots, A_n, \bar{A}_1, \ldots, \bar{A}_n\}$ and set $A_k$ is owned by party $P_k$. Note that each party $P_k$ knows only $A_k$, but does not know $\bar{A}_k$. All parties jointly execute a privacy-preserving protocol for DNF operations $\text{PPDNF}(A_1, \ldots, A_n)$ with distributed sets. We assume that the private set of each party $P_k$ ($1 \le k \le n$) is $A_k$, where n is the number of parties. We also assume that the private set $A_k$ has $\ell_k$ elements, $A_k = \{e_k^1, \ldots, e_k^{\ell_k}\}$.

We first describe the basic idea of $\text{PPDNF}(A_1, \ldots, A_n)$ to make $S_F$. Suppose that there are three parties $P_1$, $P_2$, and $P_3$ who have their private sets $A_1 = \{1, 2, 3\}$, $A_2 = \{1, 2, 4\}$, and $A_3 = \{2, 4, 5\}$, respectively (see Fig. 3). They want to find the set of 1-repeated elements $S_{R_1} = (S_{1,1} \cap S_{1,2} \cap S_{1,3}) \cup (S_{2,1} \cap S_{2,2} \cap S_{2,3}) \cup (S_{3,1} \cap S_{3,2} \cap S_{3,3}) = (A_1 \cap \bar{A}_2 \cap \bar{A}_3) \cup (\bar{A}_1 \cap A_2 \cap \bar{A}_3) \cup (\bar{A}_1 \cap \bar{A}_2 \cap A_3) = \{3, 5\}$.

(1) $P_1$, $P_2$, and $P_3$ make encrypted and shuffled set union tuples $(X_j, Y_j) = (\text{HPE-enc}_{pk}(x_j), \text{HPE-enc}_{pk}(y_j))$ ($1 \le j \le 5$), where $\frac{x_j}{y_j} \in U$. The parties cannot know $(x_j, y_j)$ for $1 \le j \le 5$, which would be as follows:

j: 1, 2, 3, 4, 5
$x_j$: $3R_1$, $1R_2$, $4R_3$, $2R_4$, $5R_5$
$y_j$: $R_1$, $R_2$, $R_3$, $R_4$, $R_5$

Note that $R_j$ ($1 \le j \le 5$) is a random number.

(2) Each party $P_k$ ($1 \le k \le 3$) executes membership tests for $A_k = \{e_k^1, \ldots, e_k^{\ell_k}\}$ without revealing $A_k$. If $P_k$ has an element which is equal to $\frac{x_j}{y_j}$, $b_k^j$ is set to 1 as follows:

$\frac{x_j}{y_j}$: 3, 1, 4, 2, 5
$b_1^j$: 1, 1, 0, 1, 0
$b_2^j$: 0, 1, 1, 1, 0
$b_3^j$: 0, 0, 1, 1, 1

Fig. 3. $S_{R_1}$.

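(3) Each party $P_k$ ($1 \le k \le 3$) calculates $T_k^{i,j} = \text{HPE-enc}_{pk}(b_k^{i,j})$ ($1 \le i \le 3$, $1 \le j \le 5$). For $1 \le i \le 3$, if $A_k \in \{S_{i,1}, S_{i,2}, S_{i,3}\}$, $P_k$ makes $T_k^{i,j} = \text{HPE-enc}_{pk}(b_k^j)$ ($1 \le j \le 5$). If $\bar{A}_k \in \{S_{i,1}, S_{i,2}, S_{i,3}\}$, $P_k$ makes $T_k^{i,j} = \text{HPE-enc}_{pk}(\tilde{b}_k^j)$ ($1 \le j \le 5$), where $\tilde{b}_k^j$ is the ones' complement of $b_k^j$. That is, $\tilde{b}_k^j = 1 \oplus b_k^j$, where $\oplus$ is a bitwise exclusive OR. Otherwise, $P_k$ makes $T_k^{i,j} = \text{HPE-enc}_{pk}(0)$ ($1 \le j \le 5$). The values $b_k^{i,j}$ ($1 \le k \le 3$, $1 \le i \le 3$, $1 \le j \le 5$) are as follows:

j: 1, 2, 3, 4, 5
$b_1^{1,j}$: 1, 1, 0, 1, 0
$b_1^{2,j}$: 0, 0, 1, 0, 1
$b_1^{3,j}$: 0, 0, 1, 0, 1
$b_2^{1,j}$: 1, 0, 0, 0, 1
$b_2^{2,j}$: 0, 1, 1, 1, 0
$b_2^{3,j}$: 1, 0, 0, 0, 1
$b_3^{1,j}$: 1, 1, 0, 0, 0
$b_3^{2,j}$: 1, 1, 0, 0, 0
$b_3^{3,j}$: 0, 0, 1, 1, 1

(4) Each party $P_k$ ($1 \le k \le 3$) randomly selects $r_k^j$ ($1 \le j \le 5$) and broadcasts $T_k^{i,j}$ and $\text{HPE-enc}_{pk}(r_k^j)$ ($1 \le i \le 3$, $1 \le j \le 5$). After receiving the broadcast messages, each party calculates $T^{i,j} = \text{HPE-enc}_{pk}(b^{i,j}) = \text{HPE-mult}(T_1^{i,j}, T_2^{i,j}, T_3^{i,j}) = \text{HPE-mult}(\text{HPE-enc}_{pk}(b_1^{i,j}), \text{HPE-enc}_{pk}(b_2^{i,j}), \text{HPE-enc}_{pk}(b_3^{i,j})) = \text{HPE-enc}_{pk}(b_1^{i,j} \cdot b_2^{i,j} \cdot b_3^{i,j})$. The values $b^{i,j} = b_1^{i,j} \cdot b_2^{i,j} \cdot b_3^{i,j}$ ($1 \le i \le 3$, $1 \le j \le 5$) are as follows:

j: 1, 2, 3, 4, 5
$b^{1,j}$: 1, 0, 0, 0, 0
$b^{2,j}$: 0, 0, 0, 0, 0
$b^{3,j}$: 0, 0, 0, 0, 1

(5) Each party $P_k$ ($1 \le k \le 3$) calculates $T^j = \text{HPE-enc}_{pk}(b^j) = \text{HPE-add}(T^{1,j}, T^{2,j}, T^{3,j}) = \text{HPE-add}(\text{HPE-enc}_{pk}(b^{1,j}), \text{HPE-enc}_{pk}(b^{2,j}), \text{HPE-enc}_{pk}(b^{3,j})) = \text{HPE-enc}_{pk}(b^{1,j} + b^{2,j} + b^{3,j})$ ($1 \le j \le 5$). The values $b^j = b^{1,j} + b^{2,j} + b^{3,j}$ for $1 \le j \le 5$ are as follows:

j: 1, 2, 3, 4, 5
$b^j$: 1, 0, 0, 0, 1

(6) Each party $P_k$ ($1 \le k \le 3$) calculates $\text{HPE-enc}_{pk}\left(\prod_{k=1}^{3} r_k^j\right) = \text{HPE-mult}(\text{HPE-enc}_{pk}(r_1^j), \text{HPE-enc}_{pk}(r_2^j), \text{HPE-enc}_{pk}(r_3^j))$, and then calculates $\hat{T}^j = \text{HPE-enc}_{pk}(\hat{b}^j) = \text{HPE-mult}\left(T^j, \text{HPE-enc}_{pk}\left(\prod_{k=1}^{3} r_k^j\right)\right) = \text{HPE-mult}\left(\text{HPE-enc}_{pk}(b^j), \text{HPE-enc}_{pk}\left(\prod_{k=1}^{3} r_k^j\right)\right) = \text{HPE-enc}_{pk}\left(b^j \cdot \prod_{k=1}^{3} r_k^j\right) = \text{HPE-enc}_{pk}(b^j \cdot \hat{R}_j)$ ($1 \le j \le 5$). The values $\hat{b}^j = b^j \cdot \hat{R}_j$ for $1 \le j \le 5$ are as follows:

j: 1, 2, 3, 4, 5
$\hat{b}^j$: $1 \cdot \hat{R}_1$, $0 \cdot \hat{R}_2$, $0 \cdot \hat{R}_3$, $0 \cdot \hat{R}_4$, $1 \cdot \hat{R}_5$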

(7) All parties jointly decrypt $\hat{T}^j$ ($1 \le j \le 5$) to get $\hat{b}^j$, where $\hat{T}^j = \text{HPE-enc}_{pk}(\hat{b}^j)$. If $\hat{b}^j \ne 0$, all parties jointly decrypt $(X_j, Y_j)$ to calculate $\frac{x_j}{y_j}$ and append $\frac{x_j}{y_j}$ to the output set $S_{R_1}$. If $\hat{b}^j = 0$, the parties do not decrypt $(X_j, Y_j)$. As a result, all parties get $S_{R_1} = \{3, 5\}$.

We now give the complete description of our privacy-preserving protocol for DNF operations, $\text{PPDNF}(A_1, \ldots, A_n)$, with distributed sets:

(1) All parties collaborate to get a list of encrypted and shuffled set union tuples $(X_j, Y_j) = (\text{HPE-enc}_{pk}(x_j), \text{HPE-enc}_{pk}(y_j))$ ($1 \le j \le m$) using EncUnion. Let $S_F = \emptyset$.

(2) Each party $P_k$ ($1 \le k \le n$) performs the following:
(a) $P_k$ calculates $(b_k^1, \ldots, b_k^m)$, a list of bits indicating membership, using TestMem. That is, $b_k^j = \text{TestMem}(E(A_k), (X_j, Y_j))$ for $1 \le j \le m$, where $E(A_k) = \{\text{HPE-enc}_{pk}(e_k^1), \ldots, \text{HPE-enc}_{pk}(e_k^{\ell_k})\}$. Note that if $\frac{x_j}{y_j} \in A_k$, then $b_k^j = 1$. Otherwise, $b_k^j = 0$.
(b) $P_k$ calculates the following for $1 \le i \le t_1$:
(i) If $A_k \in \{S_{i,1}, \ldots, S_{i,t_2}\}$, $P_k$ makes $T_k^{i,j} = \text{HPE-enc}_{pk}(b_k^j)$ ($1 \le j \le m$).
(ii) If $\bar{A}_k \in \{S_{i,1}, \ldots, S_{i,t_2}\}$, $P_k$ makes $T_k^{i,j} = \text{HPE-enc}_{pk}(\tilde{b}_k^j)$ ($1 \le j \le m$), where $\tilde{b}_k^j$ is the ones' complement of $b_k^j$. That is, $\tilde{b}_k^j = 1 \oplus b_k^j$, where $\oplus$ is a bitwise exclusive OR.
(iii) Otherwise, $P_k$ makes $T_k^{i,j} = \text{HPE-enc}_{pk}(0)$ ($1 \le j \le m$).
(c) $P_k$ randomly selects $r_k^j$ ($1 \le j \le m$).
(d) $P_k$ broadcasts $T_k^{i,j}$ and $\text{HPE-enc}_{pk}(r_k^j)$ ($1 \le i \le t_1$, $1 \le j \le m$).
(e) $P_k$ calculates $T^{i,j} = \text{HPE-mult}(T_1^{i,j}, \ldots, T_n^{i,j})$ ($1 \le i \le t_1$, $1 \le j \le m$) after receiving the broadcast messages.
(f) $P_k$ calculates $T^j = \text{HPE-add}(T^{1,j}, \ldots, T^{t_1,j})$ ($1 \le j \le m$).
(g) $P_k$ calculates $\text{HPE-enc}_{pk}\left(\prod_{k=1}^{n} r_k^j\right) = \text{HPE-mult}(\text{HPE-enc}_{pk}(r_1^j), \ldots, \text{HPE-enc}_{pk}(r_n^j))$, and then calculates $\hat{T}^j = \text{HPE-mult}\left(T^j, \text{HPE-enc}_{pk}\left(\prod_{k=1}^{n} r_k^j\right)\right)$ ($1 \le j \le m$).

(3) All parties collaborate as follows for $1 \le j \le m$:
(a) All parties jointly decrypt $\hat{T}^j$ and get $\hat{b}^j$, where $\hat{T}^j = \text{HPE-enc}_{pk}(\hat{b}^j)$.
(b) If $\hat{b}^j \ne 0$, all parties jointly decrypt $(X_j, Y_j)$ and calculate $\frac{x_j}{y_j}$. Let $S_F = S_F \cup \left\{\frac{x_j}{y_j}\right\}$.
(c) If $\hat{b}^j = 0$, the parties do not decrypt $(X_j, Y_j)$.

(4) Each party $P_k$ ($1 \le k \le n$) outputs $S_F$.

Fig. 4 describes Step (1), Step (2.a), and Step (2.b) in the PPDNF protocol.

COMPLEXITY ANALYSIS. Table 2 shows the complexities of PPDNF. We could use any fully homomorphic encryption scheme [9,11,10] as our building block without modifying the structure of PPDNF. That is, the structure of PPDNF does not depend upon the structure of the underlying homomorphic encryption scheme. However, the efficiency of our scheme depends upon

Fig. 4. Description of Step (1), Step (2.a), and Step (2.b) in the PPDNF protocol.

Table 2. Complexity analysis of PPDNF.

PPDNF
Computation cost: $O(n^2\ell + n^2 m + nmt_1)$ HPE-enc + $O(n\ell + nm)$ HPE-dec + $O(n\ell^2 + n^2\ell + nm\ell + nmt_1)$ HPE-mult + $O(nm\ell)$ HPE-add
Communication cost: $O((n^2\ell + n^2 m + nmt_1) \cdot |\text{HPE-enc}|)$

Let n be the number of parties and $A_i$ be the private set of party $P_i$. m is the number of elements in U, where $U = A_1 \cup \cdots \cup A_n = \{u_1, \ldots, u_m\}$. $\ell = \max_{i=1}^{n} \ell_i$, where $\ell_i = |A_i|$. $t_1$ is the number of disjuncts of the target set $S_F = (S_{1,1} \cap \cdots \cap S_{1,t_2}) \cup \cdots \cup (S_{t_1,1} \cap \cdots \cap S_{t_1,t_2})$. Let HPE = (HPE-key, HPE-enc, HPE-dec, HPE-mult, HPE-add) be a fully homomorphic cryptosystem. $|\text{HPE-enc}|$ denotes the length of a ciphertext.

the efficiency of the underlying homomorphic encryption scheme. So, if we use a more efficient homomorphic encryption scheme, the cost of PPDNF decreases accordingly, in proportion to the operation counts in Table 2.

SECURITY ANALYSIS. PPDNF outputs $S_F = (S_{1,1} \cap \cdots \cap S_{1,t_2}) \cup \cdots \cup (S_{t_1,1} \cap \cdots \cap S_{t_1,t_2})$. PPDNF does not reveal any other information except the size of each set $\ell_k = |A_k|$ for $1 \le k \le n$. The size of each set is revealed because PPDNF uses EncUnion, which reveals $\ell_k$ ($1 \le k \le n$). However, EncUnion does not reveal the set union U. So, apart from the set sizes, PPDNF does not reveal any extra information except the elements in $S_F$. More formally, we state the following theorem.

Theorem 3. PPDNF is a privacy-preserving protocol for DNF operations with distributed sets in the honest-but-curious adversary model. PPDNF does not reveal any other information except the size of each private set.

Proof of Theorem 3. Let $\Pi$ be PPDNF. Suppose that the parties in $I = \{P_{i_1}, \ldots, P_{i_t}\}$ have been corrupted. We can make a simulator S as follows. S is given the real private sets $(A_{i_1}^R, \ldots, A_{i_t}^R)$ as inputs of the corrupted parties $\{P_{i_1}, \ldots, P_{i_t}\}$ and the size $\ell_k^R$ ($1 \le k \le n$) of each private set for all parties. S is also given $S_F^R$, which is an output of a real protocol execution. With these inputs, S should generate a simulated transcript which is indistinguishable from $\mathrm{view}_I^{\Pi}(\bar{x})$ in a real protocol execution. S sets $A_i^S = A_i^R$ for $i \in \{i_1, \ldots, i_t\}$. Note that S cannot know the real private input sets $A_i^R$ ($i \in \{1, \ldots, n\} - \{i_1, \ldots, i_t\}$) of the uncorrupted parties. Thus, the simulator arbitrarily makes $A_i^S$ ($i \in \{1, \ldots, n\} - \{i_1, \ldots, i_t\}$) as input sets of the uncorrupted parties such that $\ell_k^S = \ell_k^R$ ($1 \le k \le n$) and $S_F^S = S_F^R$. Then, S simply follows PPDNF with inputs $(A_1^S, \ldots, A_n^S)$ to generate the protocol messages. A simulated transcript is indistinguishable from a transcript of a real protocol execution because the fully homomorphic encryption scheme is semantically secure. □
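The bit arithmetic of PPDNF (membership bits, complement, product per conjunction, sum over conjunctions) can be checked in the clear against the running example of this section. The sketch below makes several assumptions: it omits encryption and the final random blinding, it represents a conjunction as a hypothetical dictionary mapping a party index to `True` (for $A_k$) or `False` (for $\bar{A}_k$), and it uses the factor 1 for a party that does not occur in a conjunction so that the product is unaffected, which is a choice of this sketch rather than something the text specifies.

```python
def ppdnf_plain(sets, union_order, dnf):
    """Plaintext analogue of steps (2) and (3) of PPDNF.

    sets        : list of private sets A_1..A_n (0-based party index)
    union_order : the union elements in their shuffled order
    dnf         : list of conjunctions, each a dict {party index: True/False}
    """
    # Step (2.a): membership bit b_k^j for every party k and union element j.
    membership = [[1 if u in A_k else 0 for u in union_order] for A_k in sets]
    bits = []
    for j in range(len(union_order)):
        total = 0
        for conj in dnf:
            prod = 1
            for k in range(len(sets)):
                if k not in conj:
                    factor = 1                      # assumption of this sketch
                elif conj[k]:
                    factor = membership[k][j]       # A_k appears in the conjunction
                else:
                    factor = 1 - membership[k][j]   # complement of A_k appears
                prod *= factor                      # analogue of HPE-mult
            total += prod                           # analogue of HPE-add
        bits.append(total)
    # Step (3): keep the elements whose aggregated bit is nonzero.
    s_f = {u for u, bit in zip(union_order, bits) if bit != 0}
    return bits, s_f

A1, A2, A3 = {1, 2, 3}, {1, 2, 4}, {2, 4, 5}
# S_R1: each conjunction keeps one set and complements the other two.
dnf_r1 = [
    {0: True, 1: False, 2: False},
    {0: False, 1: True, 2: False},
    {0: False, 1: False, 2: True},
]
bits, S_F = ppdnf_plain([A1, A2, A3], [3, 1, 4, 2, 5], dnf_r1)
```

The resulting `bits` list corresponds to the row of $b^j$ values in step (5) of the example, and `S_F` to the final output set.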

5. Conclusion

We have constructed a privacy-preserving protocol for DNF operations on distributed sets, PPDNF, which finds a set $S_F$ satisfying $S_F = (S_{1,1} \cap \cdots \cap S_{1,t_2}) \cup \cdots \cup (S_{t_1,1} \cap \cdots \cap S_{t_1,t_2})$ in a privacy-preserving manner, where $S_{i,j} \in \{A_1, \ldots, A_n, \overline{A}_1, \ldots, \overline{A}_n\}$ and a complement set $\overline{A}_k$ is defined as $\overline{A}_k = (A_1 \cup \cdots \cup A_n) \setminus A_k$. PPDNF reveals no information beyond what can be inferred from the output set $S_F$ and the size of each private set. The sizes are revealed because PPDNF is based on EncUnion, which reveals the size of each private set. It would be interesting to design a privacy-preserving protocol for DNF operations on distributed sets that does not reveal even the size of each private set.

Acknowledgement

This work was partly supported by the IT R&D program of MKE/KEIT [KI002113, Development of Security Technology for Car-Healthcare], the IT R&D program of MKE, Korea [Development of Privacy Enhancing Cryptography on Ubiquitous Computing Environment], and the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2010-0024219).

References

[1] R. Agrawal, R. Srikant, Privacy-preserving data mining, in: Proceedings 19th ACM SIGMOD Conference on Management of Data, May 2000, pp. 439–450.
[2] S. Böttcher, S. Obermeier, Secure set union and bag union computation for guaranteeing anonymity of distrustful participants, Journal of Software 3 (1) (2008) 9–17.
[3] J. Camenisch, G.M. Zaverucha, Private intersection of certified sets, in: Proceedings Financial Cryptography and Data Security (FC ’09), February 2009, pp. 108–127.
[4] Y. Desmedt, K. Kurosawa, How to break a practical MIX and design a new one, in: Advances in Cryptology – Proceedings EUROCRYPT 2000, May 2000, pp. 557–572.
[5] D. Dachman-Soled, T. Malkin, M. Raykova, M. Yung, Efficient robust private set intersection, in: Proceedings 7th International Conference on Applied Cryptography and Network Security (ACNS ’09), June 2009, pp. 125–142.
[6] K.B. Frikken, Privacy preserving set union, in: Proceedings 5th International Conference on Applied Cryptography and Network Security (ACNS ’07), July 2007, pp. 237–252.
[7] M.J. Freedman, K. Nissim, B. Pinkas, Efficient private matching and set intersection, in: Advances in Cryptology – Proceedings EUROCRYPT 2004, May 2004, pp. 1–19.
[8] J. Furukawa, K. Sako, An efficient scheme for proving a shuffle, in: Advances in Cryptology – Proceedings CRYPTO 2001, August 2001, pp. 368–387.
[9] C. Gentry, Fully homomorphic encryption using ideal lattices, in: Proceedings 41st ACM Symposium on Theory of Computing (STOC ’09), May 2009, pp. 169–178.
[10] C. Gentry, S. Halevi, Implementing Gentry’s fully-homomorphic encryption scheme, in press in the proceedings of EUROCRYPT 2011.
[11] M. Dijk, C. Gentry, S. Halevi, V. Vaikuntanathan, Fully homomorphic encryption over the integers, in: Advances in Cryptology – Proceedings EUROCRYPT 2010, 2010, pp. 24–43.
[12] O. Goldreich, Foundations of Cryptography: Volume 2, Basic Applications, Cambridge University Press, 2004.
[13] C. Hazay, Y. Lindell, Efficient protocols for set intersection and pattern matching with security against malicious and covert adversaries, in: Proceedings 5th IACR Theory of Cryptography Conference (TCC ’08), March 2008, pp. 155–175.

[14] S.W. Kim, S. Park, J.I. Won, S.W. Kim, Privacy preserving data mining of sequential patterns for network traffic data, Information Sciences 178 (3) (2008).
[15] L. Kissner, D. Song, Privacy-preserving set operations, in: Advances in Cryptology – Proceedings CRYPTO 2005, August 2005, pp. 241–257.
[16] Y. Lindell, B. Pinkas, Privacy preserving data mining, in: Advances in Cryptology – Proceedings CRYPTO 2000, August 2000, pp. 36–54.
[17] R. Li, C. Wu, An unconditionally secure protocol for multi-party set intersection, in: Proceedings 5th International Conference on Applied Cryptography and Network Security (ACNS ’07), July 2007, pp. 226–236.
[18] N. Matatov, L. Rokach, O. Maimon, Privacy-preserving data mining: a feature set partitioning approach, Information Sciences 180 (14) (2010) 2696–2720.
[19] C.A. Neff, A verifiable secret shuffle and its application to e-voting, in: Proceedings 8th ACM Conference on Computer and Communications Security (CCS ’01), November 2001, pp. 116–125.
[20] Y. Sang, H. Shen, Privacy preserving set intersection protocol secure against malicious behaviors, in: Proceedings 8th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT ’07), December 2007, pp. 461–468.
[21] Y. Sang, H. Shen, Privacy preserving set intersection based on bilinear groups, in: Proceedings 31st Australasian Conference on Computer Science (ACSC ’08), December 2007, pp. 47–54.
[22] Y. Sang, H. Shen, Efficient and secure protocols for privacy-preserving set operations, ACM Transactions on Information and System Security (TISSEC) 13 (1) (2009).
[23] J.H. Seo, H.J. Yoon, S.G. Lim, J.H. Cheon, D.W. Hong, Analysis of privacy-preserving element reduction of a multiset, Journal of the Korean Mathematical Society 46 (1) (2009) 59–69.
[24] D. Shah, S. Zhong, Two methods for privacy preserving data mining with malicious participants, Information Sciences 177 (23) (2007) 5468–5483.
[25] J. Vaidya, C. Clifton, Secure set intersection cardinality with application to association rule mining, Journal of Computer Security 13 (4) (2005) 593–622.
[26] Q. Ye, H. Wang, J. Pieprzyk, Distributed private matching and set operations, in: Proceedings 4th Information Security Practice and Experience Conference (ISPEC ’08), April 2008, pp. 347–360.