A Parallel Multi-keyword Top-k Search Scheme over Encrypted Cloud Data

Yang, Maohu; Dai, Hua; Bao, Jingjing; Yi, Xun; Yang, Geng

doi:10.1007/978-3-030-30709-7_14

Maohu Yang¹³,
Hua Dai^13,14,
Jingjing Bao¹³,
Xun Yi¹⁵ &
…
Geng Yang^13,14

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11783))

Included in the following conference series:

IFIP International Conference on Network and Parallel Computing

1440 Accesses
3 Citations

Abstract

With searchable encryptions in the cloud computing, users can outsource their sensitive data in ciphertext to the cloud that provides efficient and privacy-preserving multi-keyword top-k searches. However, most existing top-k search schemes over encrypted cloud data are the centralize schemes which are limited in large scale data environment. To support scalable searches, we propose a parallel multi-keyword top-k search scheme over encrypted cloud data. In this scheme, the fragment-based encrypted inverted index is designed, which is indistinguishable and can be used for parallel searching. On the basis of such indexes, a Map-Reduce-based distributed computing framework is adopted to implement the parallel multi-keyword top-k search algorithms. Security analysis and experiment evaluation show that the proposed scheme is privacy-preserving, efficient and scalable.

Supported by the National Natural Science Foundation of China under the grant Nos. 61872197, 61572263, 61672297 and 61872193; the Postdoctoral Science Foundation of China under the Grand No. 2019M651919; the Natural Research Foundation of NJUPT under the grand No. NY217119; the Natural Science Foundation of Anhui Province under the grant No. 1608085MF127; the University Natural Science Foundation of Anhui Province under the grant No. KJ2017A419.

You have full access to this open access chapter, Download conference paper PDF

Inverted Index Based Ranked Keyword Search in Multi-user Searchable Encryption

Verifiable top-k searchable encryption for cloud data

Article 20 December 2019

A Privacy-Preserving Top-k Query Processing Algorithm in the Cloud Computing

Keywords

1 Introduction

With the rapid development of computer technology and internet application, data in many areas are growing exponentially, thus the demand for large and scalable storage and computation is becoming urgent. More and more enterprises and individuals outsource their storage and computation to the cloud for using data anytime and anywhere and saving costs of hardware and software [1]. However, while enjoying the benefits of cloud computing, users have to face the risk that sensitive outsourced data could be leaked or abused because cloud service providers can access the data without authorization. Therefore, data owners usually encrypt data before outsourcing [2]. Although encryption preserves the security of data, it also affects the data availability. In this scenario, searchable encryptions (SE) [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18] that guarantee the security and availability of data have been proposed.

At present, most solutions are based on the vector space model (VSM) and TF-IDF model which extract keywords of documents into “points” in multi-dimensional space and describe the relevance scores between documents and search keywords. The top-k documents are determined by comparing the relevance scores. However, if a scheme calculates the relevance scores between every document and the search keywords, it will cost a large amount of time and computing resources. To improve search efficiency, researchers have provided a variety of schemes. Song et al. [3] proposed the first SE scheme where users need to traverse entire documents while searching, and the search time is proportional to the amount of data set. Goh et al. [4] proposed a search scheme based on the Bloom Filter. Curtmola et al. [5] proposed an efficient SE scheme based on the inverted index, but using this scheme could expose the privacy of keywords.

Cao et al. [6] proposed a new structure to adapt to multi-keyword search, but it’s search time increases exponentially when document size grows. Xia et al. [8] proposed a secure and dynamic multi-keyword ranked search scheme which can reduce the large inner product calculation by pruning function. Jiang et al. [9] proposed a secure ciphertext search scheme based on the inverted index, which avoids calculating the relevance scores of irrelevant documents. Chen et al. [10] proposed a method based on data mining, which can achieve linear time complexity with the exponential growth of the document set. Our previous work [11] also proposed a hierarchical agglomerative clustering tree index scheme, which can perform an effective and verifiable ranked search.

However, the above existing schemes need to load the complete indexes into memory at one time for performing search. Because the index size is proportional to the number of documents, when the scale of documents grows to a certain level, the memory will be overflow. Moreover, the indexes of those schemes should be kept in integrity and cannot be segmented, which also limits their scalability. To conquer such limitation, we propose a parallel privacy-preserving top-k search (PPTS) scheme, which can meet the requirements of large scale data. First, we propose a fragment-based encrypted inverted index model. In data preprocessing and outsourcing phase, the indistinguishable fragment-based encrypted inverted indexes are constructed and outsourced to the cloud together with the encrypted documents. In the search phase, the Map-Reduce-based distributed computing framework is adopted and the parallel multi-keyword top-k search algorithms are proposed. After that, we analyze the security of PPTS and perform experiments to evaluate its efficiency.

The contributions of this paper are: (1) We present the fragment-based encrypted inverted index model which is indistinguishable through adding random paddings. (2) By adopting the Map-Reduce-based distributed computing framework, the parallel multi-keyword top-k search algorithms are proposed. (3) We analyze the security and evaluate the search performance. The result shows that the proposed scheme can realize parallel search while preserving data privacy.

2 Models and Problem Formulation

2.1 Notations and Preliminaries

D: The document set, $D=\{d_1, d_2,...,d_n\}$. $\widetilde{D}$ is the encrypted form.
n: The number of documents in D.
W: The dictionary, namely, the set of keywords, denoted as $W=\big \{w_1,$$ w_2,..., w_m\big \}$.
m: The number of keywords in W.
Q: The query consisting of a set of the search keywords, $Q = \{w_1, w_2,..., w_q\}$.
$V_{d_i}$: The m-dimensional document vector of $d_i$. $\widetilde{V}_{d_i}$ is the encrypted form.
V: The document vector set, $V = \{V_{d_1}, V_{d_2},..., V_{d_n}\}$. $\widetilde{V}$ is the encrypted form.
$V_q$: The m-dimensional query vector for Q. $\widetilde{V}_q$ is the encrypted form.
TD: The trapdoor for the search request.
RS: The result of the search.
$ P_{i,j}$: The posting corresponding to the document $d_j$ containing keyword $w_i$, $P_{i,j}={<}id(d_j), V_{d_j}{>}$. $\widetilde{P}_{i,j}$ is the encrypted form.
$PL_i$: The posting list of keyword $w_i$, $PL_i=\{P_{i,1}, P_{i,2}, ..., P_{i,\delta }\}$. $\widetilde{PL}_i$ is the encrypted form.
$\delta $: The number of postings in $PL_i$.
$\varepsilon $: The fragmentation parameter.
F: The fragmented documents of D according to $\varepsilon $, $F=\{F_1, F_2,..., F_t\}$.
t: The number of fragments of F.
$\beta $: The number of posting list in $F_i$.

Vector Space Model (VSM) and TF-IDF Model. The VSM and TF-IDF are widely used in multi-keyword privacy-preserving top-k search [6,7,8,9,10,11,12,13]. The term frequency (TF) refers to the number of times a given keyword or term appears in documents, while the inverse document frequency (IDF) is equal to the total number of documents in the set divided by the number of documents containing a given keyword. VSM is used to convert a given document $d_i$ and search keywords Q into vectors $V_{d_i}$ and $V_q$. The calculation of those vectors can be referred to [6,7,8,9,10,11,12,13].

Secure Inner Product Operation. This scheme uses the secure inner product operation to calculate the inner product of two encrypted vectors without knowing the plaintext value. The basic idea of this is as follows. Assuming that p and q are two n-dimensional vectors and M is a random $n\times n$-dimensional invertible matrix. M is treated as the secure key. The encrypted form of p and q are denoted as $\widetilde{p}$ and $\widetilde{q}$ respectively, where $\widetilde{p}=pM^{-1} $ and $\widetilde{q}=qM^T$. Then we have $\widetilde{p}\cdot \widetilde{q} = (pM^{-1})\cdot (qM^T)=pM^{-1}(qM^T)^T = pM^{-1}Mq = p\cdot q$, i.e. $\widetilde{p}\cdot \widetilde{q}=p\cdot q$. Therefore, we have that the inner product of two encrypted vectors equals the inner product of the corresponding two plaintext vectors.

Inverted Index. Inverted index can be used to quickly find those documents containing a given keyword by mapping to improves search efficiency. It consists of dictionary and posting list. The dictionary is a collection of all keywords that appeared in the D. Each index item in inverted index records a keyword and a pointer to the posting list, which is the entry of posting. The posting list records a list of all documents that contain a specified keyword. Each record in the posting list is a posting that describes the information of the document.

2.2 System Model

The system model is shown in Fig. 1, which is the same as [6,7,8,9,10,11, 14,15,16]. It includes three different entities. Data owners (DO) are responsible for constructing fragment-based encrypted inverted indexes ($\widetilde{I}$), and outsourcing the encrypted indexes and documents to the cloud server (CS). CS provide the search service in parallel according to the search request submitted by data users (DU). DU construct a search trapdoor based on its needs and send it to CS, then wait for CS to return the search results.

2.3 Problem Description

We adopt the “Honest-but-Curious” threat model. In this model, CS honestly and correctly executes instructions in the designated protocol. However, CS can analyze stored data and try to snoop on sensitive information.

The search result of PPTS is represented as RS. $V_q$ is the query vector of Q. $V_{d_i}$ and $V_{d_j}$ respectively represent the document vector of $d_i$ and $d_j$. Then, RS meets the requirement:

$$|RS|=k \wedge \forall d_i,d_j(d_i \in RS \wedge d_j \in (D - RS)) \rightarrow V_{d_i} \cdot V_q > V_{d_j} \cdot V_q.$$

The PPTS should satisfy three goals. First, the contents are directly seen by CS only include encrypted documents, indexes, and trapdoors, that is, the confidentiality of documents, indexes and trapdoors cannot be leaked. Second, PPTS can handle the search requirements of large document sets in parallel with Map-Reduce parallel search framework. Third, PPTS should fully guarantee the accuracy of search, that is, to improve the efficiency without reducing the accuracy.

2.4 Search Framework

To clearly describe the scheme proposed in this paper, we define a framework for the PPTS scheme. As shown in Fig. 2, the search model is composed of five modules: GenKey, Setup, BuildIndex, GenTrapdoor, and Search.

Genkey: DO generate the key for encryption, and share it with DU.
Setup: DO preprocess the document set D, generate a document vector for each document, and encrypt the D.
BuildIndex: DO fragment the D and then construct an indistinguishable inverted index to provide the CS to perform the search service.
GenTrapdoor: DU generate a trapdoor based on the search keywords.
Search: CS perform the top-k search service in parallel according to the TD and $\widetilde{I}$, and return the RS that satisfy the condition to the DU.

3 Parallel Privacy-Preserving Top-k Search Scheme

3.1 Fragment-Based Encrypted Inverted Indexes Model

Definition 1

Document Fragmentation. The document set D is divided into equal lengths according to the parameter $\varepsilon $. The generated fragmented documents are denoted as $F=\{F_1, F_2,..., F_t\}$, satisfying Formula 3 and 4.

$$\begin{aligned} |F_1| = |F_2| = ... = |F_{t-1}| = \varepsilon , 1 \le |F_t| \le \varepsilon \end{aligned}$$

(1)

$$\begin{aligned} D = F_1 \cup F_2 \cup ... \cup F_t \end{aligned}$$

(2)

where |X| represent the number of elements contained in the list or set X.

Definition 2

Parameter-($\delta , \beta $). $\delta $ is the maximum number of documents containing a certain keyword $w_j$ in a certain fragment $F_i$. And $\beta $ is the maximum number of keywords contained in a certain fragment of the F.

$$\begin{aligned} {\left\{ \begin{array}{ll} \delta = max \{ |F_{v,j}|\ |\ 1< v< t, 1< j< m \}\\ \beta =max \{ |W_v|\ |\ 1<v<t, 1<j<m \} \end{array}\right. } \end{aligned}$$

(3)

where $F_{v,j} \subset F_v$ is the document set containing the keywords $w_j$ in fragment $F_v$, and $W_v \subset W$ is the keywords set contained in fragment $F_v$.

Definition 3

Fragment-based Encrypted Inverted Indexes. $\widetilde{I}=\{\widetilde{I}_1, \widetilde{I}_2, ..., \widetilde{I}_t\}$. Here, $\widetilde{I}_v \in \widetilde{I} $ is an encrypted inverted index corresponding to fragment $F_v$. Each row in $\widetilde{I}_v$ is ${<} tag_j, \widetilde{PL}_j {>}$, corresponding to a keyword $w_j$. $tag_j$ is a hash-based message authentication code of $w_j$ generated by key c, $tag_j = hash(c, w_j)$. $\widetilde{PL}_j$ is the encrypted posting list of $w_j$. To protect the private information of keyword frequencies, we make the index $\widetilde{I}_v$ corresponding to a fragment $F_v \in F$ has the same number of rows and the posting list in each row has the same number of posting. Thus, the generated indexes $\widetilde{I}$ indistinguishable. The construction of the index $\widetilde{I}_v$ is given as follows:

(1)
For any $d_i\in F_{v,j}$, the corresponding posting $\widetilde{P}_{j,i}=\, <id(d_i),\widetilde{V}_{d_i}>$ is generated to form the $\widetilde{PL}_j$. Here, $id(d_i)$ is the id information of document $d_i$. If $|F_{v,j}| < \delta $, $\delta - |F_{v,j}|$ different artificial padding $\widetilde{P}_{j,s}=\, <id(d_s), \widetilde{V}_{d_s}{'}> $ are constructed to add to the $\widetilde{PL}_j$. $d_s$ represents a randomly selected document that satisfies $d_s \in F_v - F_{v,j}$. $V_{d_s}{'}$ represents a randomly generated m-dimensional vector, and the values of each dimension are as follows:
$$\begin{aligned} V_{d_s}{'}[k] = {\left\{ \begin{array}{ll} 0, &{}k \ne j \\ rand(min\{V_{d_i}[k]\}), &{} k=j \wedge d_i \in F_{v,j} \\ \end{array}\right. } \end{aligned}$$
(4)
(2)
For any $w_j \in W_v$, the corresponding row ${<} tag_j, \widetilde{PL}_j {>}$ is generated according to the above steps to form the $\widetilde{I}_v$. If $|W_v| < \beta $, $\beta - |W_v|$ different artificial rows ${<} tag_s, \widetilde{PL}_s {>}$ are constructed to add to the $\widetilde{I}_v$. $tag_s$ is generated by randomly selected keyword $w_s$, where $w_s \in W - W_v$. $\widetilde{PL}_s$ is composed of $\delta $ artificial padding generated by the above steps.

We take an example to explain the above definitions. We assume $D=\{d_i|i=1, ..., 10\}$ and $W=\{w_1,w_2,w_3\}$. D is divided to $F_1=\{d_1, d_2, d_3, d_4\}$, $F_2=\{d_5, d_6, d_7, d_8\}$, $F_3=\{d_9, d_{10}\}$. Then, $\delta = 3$ and $\beta =3 $ are calculated. Finally, the indexes $\widetilde{I}=\{\widetilde{I_1}, \widetilde{I_2}, \widetilde{I_3}\}$ are generated as shown in Fig. 3.

3.2 Data Preprocessing and Outsourcing

The data preprocessing and outsourcing of PPTS are mainly performed by DO which include three algorithms: GenKey, Setup and BuildIndex.

$\varvec{K \leftarrow GenKey(1^ \lambda )}$: On input a security parameter $\lambda $, the key generation algorithm output the key K. DO randomly generate the key sk, $c\in \{0,1\}^ \lambda $, an m-dimensional vector S and two $m\times m$ invertible matrices $M_1$, $M_2$. Finally, the key $K=(sk, c, S, M_1, M_2)$ is formed. K is shared between DO and DU but is private to CS.

$\varvec{(\widetilde{V}, \widetilde{D}) \leftarrow Setup(D)}$: On input the document set D, this algorithm output the encrypted document vector set $\widetilde{V}$ and encrypted document set $\widetilde{D}$. DU encrypts document $d_i$ into $\widetilde{d_i}$ using sk. Then DU generate document vector $V_{d_i}$ according to VSM and TF-IDF models. The key S is used to split the document vector $V_{d_i}$ into $V_{d_i}^{'}$ and $V_{d_i}^{''}$ according to the following formula, and then the reversible matrices $M_1$ and $M_2$ are used to encrypt $V_{d_i}$ to $\widetilde{V_{d_i}}=(M_1^{T} V_{d_i}^{'}, M_2^{T} V_{d_i}^{''})$. Finally, the generated $\widetilde{d_i}$ and $\widetilde{V_{d_i}}$ are added to $\widetilde{D}$ and $\widetilde{V}$ respectively.

$$\begin{aligned} {\left\{ \begin{array}{ll} V_{d_i}^{'}[j]=V_{d_i}^{''}[j]=V_{d_i}[j], \quad &{}S[j]=0\\ V_{d_i}^{'}[j]+V_{d_i}^{''}[j]=D_i[j], \quad &{}S[j]=1 \end{array}\right. } \end{aligned}$$

(5)

$\varvec{\widetilde{I} \leftarrow BuildIndex(K, D, \widetilde{V}, \varepsilon )}$: This algorithm is run by DU to generate encrypted indexes. Its inputs are the key K, the document set D, the encrypted document vector set $\widetilde{V}$, and the fragmentation parameter $\varepsilon $, and output the $\widetilde{I}$. Procedures of this algorithm is shown in Algorithm 1 where DO first divides D into fragments and then builds an encrypted index for each fragment. Since the operations on each fragment are exactly the same after fragmented, the data preprocessing stage can be executed in parallel. Finally, DO outsource the $\widetilde{D}$ and $\widetilde{I}$ to CS.

3.3 Map-Reduce-Based Top-k Search

The Map-Reduce-based top-k search phase of PPTS is performed by DU and CS. DU generate a search trapdoor TD and submit it to CS. CS perform the Map operation according to TD to obtain the k documents most relevant to each fragment, and then perform Reduce operation to merge and rank the previously acquired documents to generate the final top-k results. This phase mainly contains two polynomial-time algorithms: GenTrapdoor and Search.

$\varvec{TD \leftarrow GenTrapdoor (K, Q, k)}$: This algorithm takes a plaintext query containing the key K, the search keyword set Q, and the number of documents to be returned k, and outputs the encrypted query as a trapdoor TD. Its goal is to protect the keyword information in the query from CS. The construction process of TD as the following steps:

(1)
The query vector $V_q$ is constructed according to Q. If $w_i\in Q$, the IDF of $w_i$ is stored in $V_q[i]$, otherwise, the value of $V_q[i]$ is 0. Then, according to the following formula, $V_q$ is split into two vectors $V_q^{'}$ and $V_q^{''}$. Finally, $V_q^{'}$ and $V_q^{''}$ are encrypted with reversible matrices $M_1$ and $M_2$ to obtain the encrypted query vector $\widetilde{V_q}=(M_1^{-1}V_q{'}, M_2^{-1}V_q{''})$.
$$\begin{aligned} {\left\{ \begin{array}{ll} V_q^{'}[j]+V_q^{''}[j]=V_q[j], \quad S[j]=0\\ V_q^{'}[j]=V_q^{''}[j]=V_q[j], \quad S[j]=1 \end{array}\right. } \end{aligned}$$
(6)
(2)
The hash-based message authentication code $tag_i$ of $w_i$ is calculated and constitutes the set $T = \{tag_i\ |\ tag_i= hash(c,w_i) \wedge w_i\in Q\}$.
(3)
Output $TD=(T, \widetilde{V}_q, k)$.

$\varvec{RS \leftarrow Search (\widetilde{I}, TD)}$: When CS receives the trapdoor TD, it performs the top-k search in parallel on the basis of the indexes $\widetilde{I}$, and then returns the result encrypted documents. The standard Map-Reduce model is adopted to find the top-k relevant documents. In the Map stage, local top-k result is obtained in each fragment. In the Reduce phase, all local top-k results are merged to obtain the global top-k result is calculated. Detailed procedures are shown in Algorithms 2 and 3.

According to the structure of index and the top-k search algorithms, we have that each encrypted inverted index for a fragment is independent and the top-k search follows the Map-Reduce model. Thus, the proposed search scheme is scalable. It means that, when the volume of outsourced data grows, the search efficiency can be preserved by adding servers.

4 Security Analysis

This chapter mainly elaborates PPTS from two aspects of security and efficacy. Security is to analyze the confidentiality of documents, indexes, and trapdoors. Efficacy is to analyze the scalability of PPTS, and prove that it has the capability of parallel search and can store large document sets.

Theorem 1

PPTS satisfies privacy requirements.

Proof

First, the symmetric encryption algorithm is used to encrypt documents in PPTS, which can protect the privacy of documents when the key is not leaked. Second, the security of the $\widetilde{I}$ is guaranteed by random mapping of keywords, filling redundant values and matrix encryption. Because of the characteristics of the hash-based message authentication code, attackers cannot recover keyword information according to the codes. The document vectors and query vectors are encrypted by matrix encryption technology. Therefore, it can be fully proved that the indexes and trapdoors of PPTS are confidential. In addition, the problem of correlation between similar trapdoors can be solved by randomly adding redundant values to the search trapdoors.

Theorem 2

PPTS has parallel execution capability.

Proof

First, the indexes $\widetilde{I}$ is designed according to the parallel computing framework Map-Reduce, that is, both Map and Reduce phases can be executed in parallel. Second, HDFS as a distributed file system can store large data. On the Hadoop cluster, DO do not need to pay attention to the details of data storage and transmission. It only needs to submit the $\widetilde{D}$ and the $\widetilde{I}$ to Hadoop to provide a secure, stable and effective search service for DU, which has very high practicability. Also, the execution capability can be linearly improved by server expansions, providing almost unlimited processing power.

Theorem 3

The accuracy and privacy of search are not affected by artificial paddings and rows.

Proof

We assume that $\widetilde{P}_{j,s} = \, < id(d_s), \widetilde{V}_{d_s}{'}>$ is an artificial padding added to $\widetilde{PL}_j$ corresponding to the keyword $w_j$. $\forall d_i \in F_{v,j}$ satisfies the following equation:

$$\begin{aligned} Score(V_{d_s}{'}, V_q) = V_{d_s} \cdot V_q = rand(min\{V_{d_i}[j]\}) \times V_q[j] < Score(V_{d_i}{'}, V_q) \end{aligned}$$

(7)

Therefore, the relevance score corresponding to the padding must be lower than any document in $F_{v,j}$. When searching, DU only focuses on the documents with the highest relevance score, so the added paddings will not affect the accuracy of the search results. By adding paddings and rows, the posting list corresponding to each keyword is equal in length and to each fragment equal in width. Therefore, it is impossible to judge whether it is artificial padding or row based on the length of the posting list. Because $d_s \notin F_{v,j} \wedge d_s \in F_v$, $d_s$ has uniqueness and indistinguishability in posting list $PL_j$. As a result, the added paddings will not affect the privacy of the search results.

5 Performance Evaluation

To evaluate the performance of PPTS, we implement it on the Hadoop platform and compared time cost with SPTS. Here, SPTS is a sequential privacy-preserving top-k search scheme running on a single server. In other words, SPTS use one server to search each $\widetilde{I}_v \in \widetilde{I}$ sequentially. We extent the New York Times Dataset [19] to generate our experimental dataset which has 3,600,000 documents and 228,623 keywords are extracted. We implement the schemes using Java in Hadoop platform with three servers. Each server has 3.2 GHz, 8-core CPU, 16 G memory and 1 T hard disk. Default parameters are $n=3{,}600{,}000,|Q|=15,$$k=5$, and $\varepsilon =30{,}000$ which are the number of documents, search keywords, search documents and fragmentation parameter respectively.

In the following experiments, we evaluate the time cost of searches where one of the parameters n, k, and |Q| changes and the other parameters adopt the default values. The results are shown in Figs. 4, 5 and 6.

Figures 4, 5 and 6 all show that the proposed PPTS outperforms SPTS in the time cost of ranked searches, and the former saves at least 80% of the time cost compared with the latter. The reason is that both PPTS and SPTS are based on the inverted index. In the inverted index, the number of candidate posting corresponding to search keywords is positively correlated with the number of documents and search keywords, but is not affected by the value of search documents. As the number of documents or search keywords increases, more resources are needed to calculate the relevance score. SPTS only use a single server with limited processing power, which can possibly reach its processing bottleneck and make the search speed slower and slower. However, PPTS use multiple servers to perform the search at the same time, and the task pressure is shared on multiple servers, so the time cost will not increase too much.

6 Conclusion

In this paper, we propose a parallel privacy-preserving top-k search scheme over encrypted cloud data. In this scheme, the fragment-based encrypted inverted index is designed, which is indistinguishable and can be used for parallel searching. On the basis of such indexes, the Map-Reduce-based distributed computing framework is adopted and the parallel multi-keyword top-k search algorithms are proposed. Security analysis and experiment evaluation show that the proposed scheme is privacy-preserving, efficient and scalable.

References

González, L.M.V., Rodero-Merino, L., Caceres, J., Lindner, M.A.: A break in the clouds: towards a cloud definition. Comput. Commun. Rev. 39(1), 50–55 (2008)
Article Google Scholar
Kamara, S., Lauter, K.: Cryptographic cloud storage. In: Sion, R., et al. (eds.) FC 2010. LNCS, vol. 6054, pp. 136–149. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14992-4_13
Chapter Google Scholar
Song, D.X., Wagner, D.A., Perrig, A.: Practical techniques for searches on encrypted data. In: 2000 IEEE Symposium on Security and Privacy, Berkeley, California, USA, pp. 44–55, May 2000
Google Scholar
Goh, E.: Secure indexes. IACR Cryptology ePrint Archive, vol. 2003, p. 216 (2003)
Google Scholar
Curtmola, R., Garay, J.A., Kamara, S., Ostrovsky, R. : Searchable symmetric encryption: improved definitions and efficient constructions. In: Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS 2006, Alexandria, VA, USA, pp. 79–88 (2006)
Google Scholar
Cao, N., Wang, C., Li, M., Ren, K., Lou, W.: Privacy-preserving multi-keyword ranked search over encrypted cloud data. In: 30th IEEE International Conference on Computer Communications, Joint Conference of the IEEE Computer and Communications Societies, INFOCOM 2011, pp. 829–837, April 2011
Google Scholar
Sun, W., Wang, B., Cao, N., Li, M., Lou, W.: Verifiable privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking. IEEE Trans. Parallel Distrib. Syst. 25(11), 3025–3035 (2014)
Article Google Scholar
Xia, Z., Wang, X., Sun, X., Wang, Q.: A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data. IEEE Trans. Parallel Distrib. Syst. 27(2), 340–352 (2016)
Article Google Scholar
Jiang, X., Yu, J., Yan, J., Hao, R.: Enabling efficient and verifiable multi-keyword ranked search over encrypted cloud data. Inf. Sci. 403, 22–41 (2017)
Article Google Scholar
Chen, C., Zhu, X., Shen, P., Hu, J., Guo, S.: An efficient privacy-preserving ranked keyword search method. IEEE Trans. Parallel Distrib. Syst. 27(4), 951–963 (2016)
Article Google Scholar
Zhu, X., Dai, H., Yi, X., Yang, G., Li, X.: MUSE: an efficient and accurate verifiable privacy-preserving multikeyword text search over encrypted cloud data. Secur. Commun. Netw. 2017, 1 923 476:1–1 923 476:17 (2017)
Article Google Scholar
Fu, Z., Wu, X., Guan, C., Sun, X., Ren, K.: Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. IEEE Trans. Inf. Forensics Secur. 11(12), 2706–2716 (2016)
Article Google Scholar
Ge, X., Yu, J., Hu, C., Zhang, H., Hao, R.: Enabling efficient verifiable fuzzy keyword search over encrypted data in cloud computing. IEEE Access 6, 45 725–45 739 (2018)
Article Google Scholar
Guo, C., Zhuang, R., Chang, C., Yuan, Q.: Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data. IEEE Access 7, 35 826–35 837 (2019)
Article Google Scholar
Sun, W., et al.: Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking. In: 8th ACM Symposium on Information, Computer and Communications Security, ASIA CCS 2013, Hangzhou, China, pp. 71–82, May 2013
Google Scholar
Yang, Y., Zhan, Y., Liu, J., Liu, X., Yuan, F., Zhong, S.: Chinese multi-keyword fuzzy rank search over encrypted cloud data based on locality-sensitive hashing. J. Inf. Sci. Eng. 35(1), 137–158 (2019)
Google Scholar
Zhang, R., Xue, R., Yu, T., Liu, L.: Dynamic and efficient private keyword search over inverted index-based encrypted data. ACM Trans. Internet Technol. 16(3), 21:1–21:20 (2016)
Article Google Scholar
Wang, H., Dong, X., Cao, Z.: Secure and efficient encrypted keyword search for multi-user setting in cloud computing. Peer-To-Peer Netw. Appl. 12(1), 32–42 (2019)
Article Google Scholar
B. D: New York times dataset[db/ol] (2018). http://developer.nytimes.com/docs

Download references

Author information

Authors and Affiliations

Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
Maohu Yang, Hua Dai, Jingjing Bao & Geng Yang
Jiangsu Security and Intelligent Processing Lab of Big Data, Nanjing, 210023, China
Hua Dai & Geng Yang
Royal Melbourne Institute of Technology University, Melbourne, 3001, Australia
Xun Yi

Authors

Maohu Yang
View author publications
You can also search for this author in PubMed Google Scholar
Hua Dai
View author publications
You can also search for this author in PubMed Google Scholar
Jingjing Bao
View author publications
You can also search for this author in PubMed Google Scholar
Xun Yi
View author publications
You can also search for this author in PubMed Google Scholar
Geng Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Hua Dai .

Editor information

Editors and Affiliations

Shanghai University of Finance and Economics, Shanghai, China
Xiaoxin Tang
Shanghai Jiao Tong University, Shanghai, China
Quan Chen
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Pradip Bose
Tsinghua University, Beijing, China
Weiming Zheng
University of California, Irvine, CA, USA
Jean-Luc Gaudiot

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Yang, M., Dai, H., Bao, J., Yi, X., Yang, G. (2019). A Parallel Multi-keyword Top-k Search Scheme over Encrypted Cloud Data. In: Tang, X., Chen, Q., Bose, P., Zheng, W., Gaudiot, JL. (eds) Network and Parallel Computing. NPC 2019. Lecture Notes in Computer Science(), vol 11783. Springer, Cham. https://doi.org/10.1007/978-3-030-30709-7_14

Download citation

DOI: https://doi.org/10.1007/978-3-030-30709-7_14
Published: 29 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30708-0
Online ISBN: 978-3-030-30709-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Federation for Information Processing (opens in a new tab)

A Parallel Multi-keyword Top-k Search Scheme over Encrypted Cloud Data

Abstract

Similar content being viewed by others

Inverted Index Based Ranked Keyword Search in Multi-user Searchable Encryption

Verifiable top-k searchable encryption for cloud data

A Privacy-Preserving Top-k Query Processing Algorithm in the Cloud Computing

Keywords

1 Introduction

2 Models and Problem Formulation

2.1 Notations and Preliminaries

2.2 System Model

2.3 Problem Description

2.4 Search Framework

3 Parallel Privacy-Preserving Top-k Search Scheme

3.1 Fragment-Based Encrypted Inverted Indexes Model

Definition 1

Definition 2

Definition 3

3.2 Data Preprocessing and Outsourcing

3.3 Map-Reduce-Based Top-k Search

4 Security Analysis

Theorem 1

Proof

Theorem 2

Proof

Theorem 3

Proof

5 Performance Evaluation

6 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Societies and partnerships

Search

Navigation