Semantic Link Discovery over Relational Data

  • Chapter
Semantic Search over the Web

Abstract

To make semantic search a reality, we need to be able to efficiently publish large data sets containing rich semantic structure. Tools exist for translating relational and semi-structured data into RDF, but they are not designed to add the kind of semantics needed to achieve the goals of the Semantic Web and semantic search over the Web. In this chapter, we present LinQuer, a tool for creating semantic links within a data source and between data sources. We focus on link discovery over structured (relational) data because many Semantic Web sources are the result of publishing relational data as RDF and because relational engines provide the scalability and flexibility we need for large-scale link discovery. The LinQuer framework is based on the declarative specification of linkage requirements by a user. We present algorithms for translating these requirements into queries that can run over relational data sources, potentially using semantic information (such as a class hierarchy or a more general ontology) to enhance the recall of the link discovery. We show that this framework is flexible enough to permit linking real data, including dirty data (which is commonly found on the Web) and data with a variety of semantic connections.
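
As a rough illustration of the idea of turning a declarative linkage requirement into a query over relational data, the sketch below compiles a small specification into a SQL join and runs it with SQLite. It is not LinQL and not the LinQuer implementation: the specification format, the column names (trial_id, condition_name, uri, label), the sample rows, and the functions spec_to_sql and discover_links are all assumptions made for illustration. Only the table names trial and dbpedia_disease follow the simplified schema described in the Notes. The only matching rule sketched is case-insensitive equality after trimming whitespace, a very simple stand-in for handling dirty data.

```python
import sqlite3

# Hypothetical declarative link specification (an assumption, not LinQL):
# link trial conditions to DBpedia disease labels.
spec = {
    "source": ("trial", "trial_id", "condition_name"),
    "target": ("dbpedia_disease", "uri", "label"),
    "match": "exact_ci",  # case-insensitive equality after trimming
}


def spec_to_sql(spec):
    """Translate the specification into a SQL join (illustrative only).

    Identifiers are interpolated directly, so the specification must come
    from a trusted source.
    """
    s_table, s_key, s_col = spec["source"]
    t_table, t_key, t_col = spec["target"]
    if spec["match"] != "exact_ci":
        raise ValueError("only 'exact_ci' is sketched here")
    return (
        f"SELECT s.{s_key}, t.{t_key} "
        f"FROM {s_table} AS s JOIN {t_table} AS t "
        f"ON LOWER(TRIM(s.{s_col})) = LOWER(TRIM(t.{t_col})) "
        f"ORDER BY s.{s_key}, t.{t_key}"
    )


def discover_links(conn, spec):
    """Run the generated query and return (source key, target key) pairs."""
    return conn.execute(spec_to_sql(spec)).fetchall()


# Made-up sample data with inconsistent case and stray whitespace.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trial (trial_id TEXT, condition_name TEXT);
CREATE TABLE dbpedia_disease (uri TEXT, label TEXT);
INSERT INTO trial VALUES
    ('T1', 'Thalassemia'), ('T2', ' asthma '), ('T3', 'Migraine');
INSERT INTO dbpedia_disease VALUES
    ('dbpedia:Thalassemia', 'thalassemia'), ('dbpedia:Asthma', 'Asthma');
""")

links = discover_links(conn, spec)
print(links)
```

Keeping the specification separate from the generated SQL means the same requirement could, in principle, be compiled to a different matching predicate or a different engine without changing how the user states it.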

Notes

  1. Part of this work has appeared in Proceedings of the 18th ACM Conference on Information and Knowledge Management [17]. ©2009 Association for Computing Machinery, Inc. Reprinted by permission.

  2. To make our example queries simple, we assume that the databases are denormalized and we have a single table for clinical trials (trial), a table storing patient visits (visit), and tables storing DBpedia disease (dbpedia_disease) and drug (dbpedia_drug) data. In reality, the database is normalized and these relations are decomposed into multiple relations.

References

  1. Appelt, D.E.: Introduction to information extraction. AI Commun. 12(3), 161–172 (1999)

  2. Arasu, A., Ganti, V., Kaushik, R.: Efficient exact set-similarity joins. Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 918–929 (2006)

  3. Auer, S., Dietzold, S., Lehmann, J., Hellmann, S., Aumueller, D.: Triplify: light-weight linked data publication from relational databases. International World Wide Web Conference (WWW), pp. 621–630 (2009)

  4. Bayardo, R.J., Ma, Y., Srikant, R.: Scaling up all pairs similarity search. International World Wide Web Conference (WWW), pp. 131–140. Banff, Canada (2007)

  5. Bilke, A., Bleiholder, J., Böhm, C., Draba, K., Naumann, F., Weis, M.: Automatic data fusion with HumMer. Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 1251–1254 (2005)

  6. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia – a crystallization point for the web of data. J. Web Semant. 7(3), 154–165 (2009)

  7. Bizer, C., Seaborne, A.: D2RQ – treating non-RDF databases as virtual RDF graphs. Proceedings of the International Semantic Web Conference (ISWC) (2004)

  8. Cohen, W.W.: Data integration using similarity joins and a word-based information representation language. ACM Trans. Inf. Syst. 18(3), 288–321 (2000)

  9. Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), pp. 73–78. Acapulco, Mexico (2003)

  10. Das, S., Chong, E.I., Eadon, G., Srinivasan, J.: Supporting ontology-based semantic matching in RDBMS. Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 1054–1065 (2004)

  11. Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate record detection: a survey. IEEE Trans. Knowl. Data Eng. 19(1), 1–16 (2007)

  12. Erling, O., Mikhailov, I.: Virtuoso: RDF support in a native RDBMS. Semantic Web Information Management, pp. 501–519. Springer, Berlin, Heidelberg, New York (2009)

  13. Galhardas, H., Florescu, D., Shasha, D., Simon, E., Saita, C.A.: Declarative data cleaning: language, model, and algorithms. Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 371–380 (2001)

  14. Galperin, M.Y., Cochrane, G.: The 2011 nucleic acids research database issue and the online molecular biology database collection. Nucleic Acids Res. 39(Database-Issue), 1–6 (2011)

  15. Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Srivastava, D.: Approximate string joins in a database (almost) for free. Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 491–500 (2001)

  16. Hassanzadeh, O.: Benchmarking declarative approximate selection predicates. Master’s Thesis, University of Toronto, Toronto, Ontario, Canada (2007)

  17. Hassanzadeh, O., Kementsietsidis, A., Lim, L., Miller, R.J., Wang, M.: A framework for semantic link discovery over relational data. Proceedings of the Conference on Information and Knowledge Management (CIKM), pp. 1027–1036 (2009). URL http://dx.doi.org/10.1145/1645953.1646084

  18. Hassanzadeh, O., Kementsietsidis, A., Lim, L., Miller, R.J., Wang, M.: LinkedCT: a linked data space for clinical trials. CoRR abs/0908.0567 (2009)

  19. Hernández, M.A., Stolfo, S.J.: The merge/purge problem for large databases. ACM SIGMOD International Conference on Management of Data, pp. 127–138 (1995)

  20. Indyk, P., Motwani, R., Raghavan, P., Vempala, S.: Locality-preserving hashing in multidimensional spaces. ACM Symposium on Theory of Computing (STOC), pp. 618–625 (1997)

  21. Kementsietsidis, A., Lim, L., Wang, M.: Supporting ontology-based keyword search over medical databases. Proceedings of the AMIA 2008 Symposium, pp. 409–413. American Medical Informatics Association (2008)

  22. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)

  23. Naumann, F., Herschel, M.: An introduction to duplicate detection. Synthesis Lectures on Data Management. Morgan and Claypool Publishers, Seattle, WA USA (2010)

  24. Sioutos, N., de Coronado, S., Haber, M.W., Hartel, F.W., Shaiu, W., Wright, L.W.: NCI thesaurus: a semantic model integrating cancer-related clinical and molecular information. J. Biomed. Inform. 40(1), 30–43 (2007)

  25. Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a large ontology from Wikipedia and WordNet. J. Web Semant. 6(3), 203–217 (2008)

  26. Volz, J., Bizer, C., Gaedke, M., Kobilarov, G.: Discovering and maintaining links on the web of data. Proceedings of the International Semantic Web Conference (ISWC), pp. 650–665 (2009)

  27. Yeganeh, S.H., Hassanzadeh, O., Miller, R.J.: Linking semistructured data on the web. Proceedings of the International Workshop on the Web and Databases (WebDB) (2011)

  28. ClinicalTrials.gov, A Service of the US National Institutes of Health – http://clinicaltrials.gov/ (2011). Accessed 28 July 2011

  29. State of the LOD Cloud. Version 0.2. http://www4.wiwiss.fu-berlin.de/lodcloud/state/ (2011). Accessed 28 July 2011

  30. The LinQuer Project – http://purl.org/linquer. Accessed 28 July 2011

Acknowledgements

This work has been partially supported by the NSERC Business Intelligence Network. Hassanzadeh has been supported by an IBM Graduate Fellowship. We thank Reynold S. Xin for implementing the LinQuer API and Web interface, and for improving the overall design of the system and the LinQL grammar.

Author information

Corresponding author

Correspondence to Oktie Hassanzadeh.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hassanzadeh, O., Miller, R.J., Kementsietsidis, A., Lim, L., Wang, M. (2012). Semantic Link Discovery over Relational Data. In: De Virgilio, R., Guerra, F., Velegrakis, Y. (eds) Semantic Search over the Web. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25008-8_8

  • DOI: https://doi.org/10.1007/978-3-642-25008-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25007-1

  • Online ISBN: 978-3-642-25008-8

  • eBook Packages: Computer Science, Computer Science (R0)
