
Preference Join on Heterogeneous Data

  • Conference paper
Web Technologies and Applications (APWeb 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9932)

Abstract

Database research has produced many types of join operations, each addressing a different problem. However, existing join operations do not cover every demand that real-world applications now place on them. In this paper, we define a new join operation, the preference join (p-join), which introduces the concepts of personal preference and a satisfaction operator over various data types. We present a general nested-loop join algorithm for the p-join and propose a more advanced algorithm, called MFV. Two enhanced mapping methods are employed to improve MFV further. Extensive experiments on both real-world and synthetic data sets demonstrate the effectiveness, efficiency, and scalability of our methods and show that the advanced algorithms have advantages over the general algorithms.
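To make the nested-loop baseline concrete, the following is a minimal sketch only. The abstract does not give the formal definition of the satisfaction operator, so everything here is an assumption for illustration: each record carries a preference expressed as per-attribute predicates, `satisfies` is a hypothetical stand-in for the satisfaction operator, and two records are joined only when each satisfies the other's preference. It does not reproduce MFV or the mapping methods.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumed data model (not taken from the paper): a record is a dict of
# attribute values, and a preference is a dict of per-attribute predicates.
Record = Dict[str, Any]
Preference = Dict[str, Callable[[Any], bool]]
Profile = Tuple[Record, Preference]


def satisfies(preference: Preference, candidate: Record) -> bool:
    """Hypothetical satisfaction operator: every preferred attribute must be
    present in the candidate and pass its predicate."""
    return all(
        attr in candidate and predicate(candidate[attr])
        for attr, predicate in preference.items()
    )


def nested_loop_p_join(
    left: List[Profile], right: List[Profile]
) -> List[Tuple[Record, Record]]:
    """Compare every left record with every right record and keep the pairs
    in which each side satisfies the other's preference (assumed semantics)."""
    matches = []
    for left_record, left_pref in left:
        for right_record, right_pref in right:
            if satisfies(left_pref, right_record) and satisfies(right_pref, left_record):
                matches.append((left_record, right_record))
    return matches


# Heterogeneous attributes: numbers, strings, and sets (made-up sample data).
left = [
    ({"id": "l1", "age": 28, "tags": {"hiking", "jazz"}},
     {"age": lambda a: 25 <= a <= 35}),
    ({"id": "l2", "age": 41, "tags": {"chess"}},
     {"city": lambda c: c == "Beijing"}),
]
right = [
    ({"id": "r1", "age": 30, "city": "Beijing"},
     {"tags": lambda t: "jazz" in t}),
    ({"id": "r2", "age": 52, "city": "Shanghai"},
     {"age": lambda a: a < 30}),
]

pairs = [(l["id"], r["id"]) for l, r in nested_loop_p_join(left, right)]
print(pairs)  # [('l1', 'r1')]
```

The sketch evaluates the predicate pair for all |left| × |right| combinations, which is the quadratic cost that a more advanced algorithm would aim to reduce.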

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (No. 61373023, No. 61170064).

Author information

Corresponding author

Correspondence to Chaokun Wang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, C., Wang, C., Wang, H., Chen, J., Ye, X. (2016). Preference Join on Heterogeneous Data. In: Li, F., Shim, K., Zheng, K., Liu, G. (eds) Web Technologies and Applications. APWeb 2016. Lecture Notes in Computer Science, vol 9932. Springer, Cham. https://doi.org/10.1007/978-3-319-45817-5_25

  • DOI: https://doi.org/10.1007/978-3-319-45817-5_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45816-8

  • Online ISBN: 978-3-319-45817-5

  • eBook Packages: Computer Science, Computer Science (R0)
