Semantic Link Discovery over Relational Data

Hassanzadeh, Oktie; Miller, Renée J.; Kementsietsidis, Anastasios; Lim, Lipyeow; Wang, Min

doi:10.1007/978-3-642-25008-8_8

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

To make semantic search a reality, we need to be able to efficiently publish large data sets containing rich semantic structure. We have tools for translating relational and semi-structured data into RDF, but such translation tools do not have the goal of adding or providing the kind of semantics necessary to achieve the goals of the Semantic Web and semantic search over the Web. In this chapter, we present LinQuer, a tool for creating semantic links within a data source and between data sources. We focus on link discovery over structured (relational) data since many Semantic Web sources are the result of publishing relational data as RDF and since relational engines provide the scalability and flexibility we need for large-scale link discovery. The LinQuer framework is based on the declarative specification of linkage requirements by a user. We present algorithms for translating these requirements to queries that can run over relational data sources, potentially using semantic information (such as a class hierarchy or a more general ontology) to enhance the recall of the link discovery. We show that this framework is flexible enough to permit linking real data, including dirty data (which is commonly found on the Web) and data with a variety of semantic connections.
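As a rough illustration of the idea of turning a linkage requirement into a relational query, the following minimal sketch builds a SQL join between a trial table and a DBpedia disease table, optionally widened through a synonym table to raise recall. It is not LinQL and not the chapter's translation algorithm. The table names `trial` and `dbpedia_disease` follow the chapter's notes; the column names, the `synonym` table, the sample rows, and the function `link_query` are hypothetical and exist only for this example.

```python
import sqlite3


def link_query(src, src_key, src_col, tgt, tgt_key, tgt_col, synonyms=None):
    """Build SQL that links source and target rows with matching values.

    Matching is case-insensitive string equality. If `synonyms` names a table
    with columns (term, synonym), values are also matched through it.
    Identifiers are interpolated directly, so they must come from trusted input.
    """
    direct = (
        f"SELECT s.{src_key}, t.{tgt_key} "
        f"FROM {src} s JOIN {tgt} t "
        f"ON lower(s.{src_col}) = lower(t.{tgt_col})"
    )
    if synonyms is None:
        return direct
    via_synonym = (
        f"SELECT s.{src_key}, t.{tgt_key} "
        f"FROM {src} s "
        f"JOIN {synonyms} y ON lower(s.{src_col}) = lower(y.term) "
        f"JOIN {tgt} t ON lower(y.synonym) = lower(t.{tgt_col})"
    )
    # UNION removes duplicate links found by both branches.
    return f"{direct} UNION {via_synonym}"


# Hypothetical sample data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trial (trial_id TEXT, condition_name TEXT);
    CREATE TABLE dbpedia_disease (uri TEXT, label TEXT);
    CREATE TABLE synonym (term TEXT, synonym TEXT);
    INSERT INTO trial VALUES ('T1', 'Thalassemia'), ('T2', 'Heart attack');
    INSERT INTO dbpedia_disease VALUES
        ('dbpedia:Thalassemia', 'thalassemia'),
        ('dbpedia:Myocardial_infarction', 'Myocardial infarction');
    INSERT INTO synonym VALUES ('heart attack', 'myocardial infarction');
    """
)

args = ("trial", "trial_id", "condition_name", "dbpedia_disease", "uri", "label")

# String matching alone finds only the trial whose condition equals a label.
links_exact = sorted(conn.execute(link_query(*args)).fetchall())

# Adding the synonym table also links 'Heart attack' to its DBpedia entry.
links_semantic = sorted(conn.execute(link_query(*args, synonyms="synonym")).fetchall())

print(links_exact)
print(links_semantic)
```

In this toy setup the synonym table stands in for the kind of semantic information the abstract mentions; a class hierarchy or ontology would need a richer encoding than a single two-column table.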

Notes

1. Part of this work has appeared in Proceedings of the 18th ACM Conference on Information and Knowledge Management [17]. ©2009 Association for Computing Machinery, Inc. Reprinted by permission.
2. To make our example queries simple, we assume that the databases are denormalized and we have a single table for clinical trials (trial), a table storing patient visits (visit), and tables storing DBpedia disease (dbpedia_disease) and drug (dbpedia_drug) data. In reality, the database is normalized and these relations are decomposed into multiple relations.

References

1. Appelt, D.E.: Introduction to information extraction. AI Commun. 12(3), 161–172 (1999)
2. Arasu, A., Ganti, V., Kaushik, R.: Efficient exact set-similarity joins. Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 918–929 (2006)
3. Auer, S., Dietzold, S., Lehmann, J., Hellmann, S., Aumueller, D.: Triplify: light-weight linked data publication from relational databases. International World Wide Web Conference (WWW), pp. 621–630 (2009)
4. Bayardo, R.J., Ma, Y., Srikant, R.: Scaling up all pairs similarity search. International World Wide Web Conference (WWW), pp. 131–140. Banff, Canada (2007)
5. Bilke, A., Bleiholder, J., Böhm, C., Draba, K., Naumann, F., Weis, M.: Automatic data fusion with HumMer. Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 1251–1254 (2005)
6. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia – a crystallization point for the web of data. J. Web Semant. 7(3), 154–165 (2009)
7. Bizer, C., Seaborne, A.: D2RQ – treating non-RDF databases as virtual RDF graphs. Proceedings of the International Semantic Web Conference (ISWC) (2004)
8. Cohen, W.W.: Data integration using similarity joins and a word-based information representation language. ACM Trans. Inf. Syst. 18(3), 288–321 (2000)
9. Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), pp. 73–78. Acapulco, Mexico (2003)
10. Das, S., Chong, E.I., Eadon, G., Srinivasan, J.: Supporting ontology-based semantic matching in RDBMS. Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 1054–1065 (2004)
11. Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate record detection: a survey. IEEE Trans. Knowl. Data Eng. 19(1), 1–16 (2007)
12. Erling, O., Mikhailov, I.: Virtuoso: RDF support in a native RDBMS. Semantic Web Information Management, pp. 501–519. Springer, Berlin, Heidelberg, New York (2009)
13. Galhardas, H., Florescu, D., Shasha, D., Simon, E., Saita, C.A.: Declarative data cleaning: language, model, and algorithms. Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 371–380 (2001)
14. Galperin, M.Y., Cochrane, G.: The 2011 nucleic acids research database issue and the online molecular biology database collection. Nucleic Acids Res. 39(Database-Issue), 1–6 (2011)
15. Gravano, L., Ipeirotis, P.G., Jagadish, H.V., Koudas, N., Muthukrishnan, S., Srivastava, D.: Approximate string joins in a database (almost) for free. Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 491–500 (2001)
16. Hassanzadeh, O.: Benchmarking declarative approximate selection predicates. Master’s Thesis, University of Toronto, Toronto, Ontario, Canada (2007)
17. Hassanzadeh, O., Kementsietsidis, A., Lim, L., Miller, R.J., Wang, M.: A framework for semantic link discovery over relational data. Proceedings of the Conference on Information and Knowledge Management (CIKM), pp. 1027–1036 (2009). URL http://dx.doi.org/10.1145/1645953.1646084
18. Hassanzadeh, O., Kementsietsidis, A., Lim, L., Miller, R.J., Wang, M.: LinkedCT: a linked data space for clinical trials. CoRR abs/0908.0567 (2009)
19. Hernández, M.A., Stolfo, S.J.: The merge/purge problem for large databases. ACM SIGMOD International Conference on the Management of Data, pp. 127–138 (1995)
20. Indyk, P., Motwani, R., Raghavan, P., Vempala, S.: Locality-preserving hashing in multidimensional spaces. ACM Symposium on Theory of Computing (STOC), pp. 618–625 (1997)
21. Kementsietsidis, A., Lim, L., Wang, M.: Supporting ontology-based keyword search over medical databases. Proceedings of the AMIA 2008 Symposium, pp. 409–413. American Medical Informatics Association (2008)
22. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
23. Naumann, F., Herschel, M.: An introduction to duplicate detection. Synthesis Lectures on Data Management. Morgan and Claypool Publishers, Seattle, WA, USA (2010)
24. Sioutos, N., de Coronado, S., Haber, M.W., Hartel, F.W., Shaiu, W., Wright, L.W.: NCI thesaurus: a semantic model integrating cancer-related clinical and molecular information. J. Biomed. Inform. 40(1), 30–43 (2007)
25. Suchanek, F.M., Kasneci, G., Weikum, G.: YAGO: a large ontology from Wikipedia and WordNet. J. Web Semant. 6(3), 203–217 (2008)
26. Volz, J., Bizer, C., Gaedke, M., Kobilarov, G.: Discovering and maintaining links on the web of data. Proceedings of the International Semantic Web Conference (ISWC), pp. 650–665 (2009)
27. Yeganeh, S.H., Hassanzadeh, O., Miller, R.J.: Linking semistructured data on the web. Proceedings of the International Workshop on the Web and Databases (WebDB) (2011)
28. ClinicalTrials.gov, A Service of the US National Institutes of Health. http://clinicaltrials.gov/ (2011). Accessed 28 July 2011
29. State of the LOD Cloud. Version 0.2. http://www4.wiwiss.fu-berlin.de/lodcloud/state/ (2011). Accessed 28 July 2011
30. The LinQuer Project. http://purl.org/linquer. Accessed 28 July 2011

Acknowledgements

This work has been partially supported by the NSERC Business Intelligence Network. Hassanzadeh has been supported by an IBM Graduate Fellowship. We thank Reynold S. Xin for implementing the LinQuer API and Web interface, and for improving the overall design of the system and the LinQL grammar.

Author information

Authors and Affiliations

University of Toronto, Toronto, Ontario, M5S 3G4, Canada
Oktie Hassanzadeh & Renée J. Miller
IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY, 10532, USA
Anastasios Kementsietsidis
University of Hawaii at Manoa, 1680 East West Road, POST 303E, Honolulu, HI, 96822, USA
Lipyeow Lim
HP Labs China, Beijing, China
Min Wang

Corresponding author

Correspondence to Oktie Hassanzadeh.

Editor information

Editors and Affiliations

Dipartimento di Informatica, Università degli Studi Roma Tre, Via della Vasca Navale 79, Roma, 00146, Italy
Roberto De Virgilio
Dipartimento di Economia Aziendale, Università degli Studi di Modena e Reggio Emilia, Viale Berengario 51, Modena, 41100, Italy
Francesco Guerra
Università degli Studi di Trento, Via Sommarive 14, Trento, 38123, Italy
Yannis Velegrakis

About this chapter

Cite this chapter

Hassanzadeh, O., Miller, R.J., Kementsietsidis, A., Lim, L., Wang, M. (2012). Semantic Link Discovery over Relational Data. In: De Virgilio, R., Guerra, F., Velegrakis, Y. (eds) Semantic Search over the Web. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25008-8_8

DOI: https://doi.org/10.1007/978-3-642-25008-8_8
Published: 28 January 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25007-1
Online ISBN: 978-3-642-25008-8
eBook Packages: Computer Science; Computer Science (R0)
