Preference Join on Heterogeneous Data

Wang, Changping; Wang, Chaokun; Wang, Hao; Chen, Jun; Ye, Xiaojun

doi:10.1007/978-3-319-45817-5_25

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9932)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Database research has produced many types of join operation, each addressing a different problem. However, existing join operations cannot meet the growing demands of real-world applications. In this paper, we define a new join operation, the preference join (p-join), which introduces the concepts of personal preference and the satisfaction operator over various data types. We present a general join algorithm (Nested Loop) for the p-join and also propose an advanced algorithm called MFV. Two enhanced mapping methods are employed to improve the MFV algorithm. Extensive experiments on both real-world and synthetic data sets demonstrate the effectiveness, efficiency, and scalability of our methods and show that the advanced algorithms have advantages over the general algorithm.
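The abstract does not give the formal definitions of personal preference or the satisfaction operator, so the following is only an illustrative sketch of what a nested-loop p-join might look like. It assumes (hypothetically) that each record carries its own preference, expressed as per-attribute predicates over the other side's attributes, and that a pair joins only when each record satisfies the other's preference. The names `satisfies` and `nested_loop_p_join`, the sample attributes, and the mutual-satisfaction rule are all assumptions for illustration, not the authors' method; MFV and its mapping methods are not sketched because the abstract does not describe them.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Assumption: a preference maps an attribute name to a predicate on its value.
Preference = Dict[str, Callable[[Any], bool]]
Entry = Tuple[Record, Preference]


def satisfies(record: Record, preference: Preference) -> bool:
    """Hypothetical satisfaction check: every preferred attribute must be
    present in the record and pass its predicate."""
    return all(
        attr in record and pred(record[attr])
        for attr, pred in preference.items()
    )


def nested_loop_p_join(left: List[Entry], right: List[Entry]) -> List[Tuple[Record, Record]]:
    """Compare every left record with every right record and keep the pairs
    in which each side satisfies the other's preference (assumed semantics)."""
    result = []
    for l_rec, l_pref in left:
        for r_rec, r_pref in right:
            if satisfies(r_rec, l_pref) and satisfies(l_rec, r_pref):
                result.append((l_rec, r_rec))
    return result


# Heterogeneous attributes: a number (age), a string (city), a set (tags).
left = [
    ({"id": "u1", "age": 30, "city": "Beijing"},
     {"age": lambda a: 25 <= a <= 35, "tags": lambda t: "music" in t}),
    ({"id": "u2", "age": 45, "city": "Suzhou"},
     {"city": lambda c: c == "Suzhou"}),
]
right = [
    ({"id": "v1", "age": 28, "city": "Beijing", "tags": {"music", "hiking"}},
     {"city": lambda c: c == "Beijing"}),
    ({"id": "v2", "age": 33, "city": "Suzhou", "tags": {"chess"}},
     {"age": lambda a: a >= 40}),
]

pairs = nested_loop_p_join(left, right)
matched_ids = [(l["id"], r["id"]) for l, r in pairs]
print(matched_ids)
```

This baseline evaluates every pair, so its cost grows with the product of the two input sizes; that quadratic cost is the usual motivation for more advanced join algorithms.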

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (No. 61373023, No. 61170064).

Author information

Authors and Affiliations

School of Software, Tsinghua University, Beijing, 100084, China
Changping Wang, Chaokun Wang, Hao Wang, Jun Chen & Xiaojun Ye

Corresponding author

Correspondence to Chaokun Wang.

Editor information

Editors and Affiliations

School of Computing, University of Utah, Salt Lake City, Utah, USA
Feifei Li
School of Electrical Engineering, Seoul National University, Seoul, Korea (Republic of)
Kyuseok Shim
Soochow University, Suzhou, China
Kai Zheng
Soochow University, Suzhou, China
Guanfeng Liu

About this paper

Cite this paper

Wang, C., Wang, C., Wang, H., Chen, J., Ye, X. (2016). Preference Join on Heterogeneous Data. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9932. Springer, Cham. https://doi.org/10.1007/978-3-319-45817-5_25

DOI: https://doi.org/10.1007/978-3-319-45817-5_25
Published: 18 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45816-8
Online ISBN: 978-3-319-45817-5
eBook Packages: Computer Science, Computer Science (R0)
