Abstract
Data integration remains a perennially difficult task. The need to access, integrate, and make sense of large amounts of data has, in fact, become more acute in recent years. There are now many publicly available data sources that can provide valuable information in various domains. Concrete examples of public data sources include bibliographic repositories (DBLP, Cora, Citeseer), online movie databases (IMDB), knowledge bases (Wikipedia, DBpedia, Freebase), and social media data (Facebook, Twitter, blogs). Additionally, a number of more specialized public data repositories are starting to play an increasingly important role. These repositories include, for example, U.S. federal government data, congressional and census data, as well as financial reports archived by the U.S. Securities and Exchange Commission (SEC).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Alexe, B. et al. (2013). High-Level Rules for Integration and Analysis of Data: New Challenges. In: Tannen, V., Wong, L., Libkin, L., Fan, W., Tan, WC., Fourman, M. (eds) In Search of Elegance in the Theory and Practice of Computation. Lecture Notes in Computer Science, vol 8000. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41660-6_3
DOI: https://doi.org/10.1007/978-3-642-41660-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41659-0
Online ISBN: 978-3-642-41660-6
eBook Packages: Computer Science (R0)