Abstract
Probabilistic databases (PDBs) are proving useful in several application areas. When a single PDB is cleaned or multiple PDBs are integrated, duplicate tuples need to be merged. A basic approach to merging probabilistic tuples is simply to build the union of their sets of possible instances. In a merging process, however, additional domain knowledge or user expertise is often available. For that reason, in this paper we extend the basic approach with aggregation functions, knowledge rules, and instance weights in order to incorporate external knowledge into the merging process.
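To make the basic approach concrete, the following is a minimal sketch, not the paper's own algorithm. It assumes each probabilistic tuple is represented as a mapping from possible instances to probabilities. The function name `merge_tuples`, the `weights` parameter, and the weighted-average rule for combining probabilities are all illustrative assumptions; the abstract only states that the merged tuple is built from the union of possible instances.

```python
from collections import defaultdict


def merge_tuples(tuples, weights=None):
    """Merge probabilistic tuples via the union of their possible instances.

    Each input is a dict mapping a possible instance (a plain Python tuple
    of attribute values) to its probability.

    ASSUMPTION: the probability of an instance in the merged tuple is the
    weighted average of its probabilities in the inputs. The abstract does
    not specify this rule; it is used here only for illustration.
    """
    if weights is None:
        weights = [1.0] * len(tuples)  # equal trust in every source
    total = sum(weights)

    merged = defaultdict(float)
    for instances, weight in zip(tuples, weights):
        for instance, prob in instances.items():
            # An instance missing from a source contributes probability 0.
            merged[instance] += (weight / total) * prob
    return dict(merged)


# Two duplicate tuples describing the same real-world entity (made-up data).
t1 = {("Tim", "pilot"): 0.75, ("Tom", "pilot"): 0.25}
t2 = {("Tim", "pilot"): 0.5, ("Tim", "mechanic"): 0.5}

merged = merge_tuples([t1, t2])
for instance, prob in merged.items():
    print(instance, prob)
```

With equal weights, every instance that is possible in at least one input remains possible in the result, and the probabilities still sum to one. Passing unequal `weights` is one conceivable way to express that one source is trusted more than another.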
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Panse, F., Ritter, N. (2011). Incorporating Domain Knowledge and User Expertise in Probabilistic Tuple Merging. In: Benferhat, S., Grant, J. (eds) Scalable Uncertainty Management. SUM 2011. Lecture Notes in Computer Science(), vol 6929. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23963-2_23
DOI: https://doi.org/10.1007/978-3-642-23963-2_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23962-5
Online ISBN: 978-3-642-23963-2
eBook Packages: Computer Science, Computer Science (R0)