Abstract
Database research has produced many types of join operation, each addressing a different problem. However, existing join operations cannot meet the growing demands of real-world applications. In this paper, we define a new join operation, the preference join (p-join), which introduces the concepts of personal preference and a satisfaction operator over various data types. We present a general join algorithm (Nested Loop) for the p-join and propose an advanced algorithm, called MFV. Two enhanced mapping methods are employed to improve the MFV algorithm. We conduct extensive experiments on both real-world and synthetic data sets. The experimental results demonstrate the effectiveness, efficiency, and scalability of our methods, and show that the advanced algorithms have advantages over the general algorithm.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (No. 61373023, No. 61170064).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, C., Wang, C., Wang, H., Chen, J., Ye, X. (2016). Preference Join on Heterogeneous Data. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9932. Springer, Cham. https://doi.org/10.1007/978-3-319-45817-5_25
DOI: https://doi.org/10.1007/978-3-319-45817-5_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45816-8
Online ISBN: 978-3-319-45817-5
eBook Packages: Computer Science, Computer Science (R0)