Abstract
We present a data model for uncertain data in which uncertainty is represented by interval probabilities. The theory can be applied to different specific data models because the entire approach has been developed independently of the kind of objects being manipulated, such as XML documents, relational tuples, or other data types. As a consequence, it can be used to extend existing data models with the management of uncertainty. In particular, the data model we obtain by applying it to XML data is the first proposal that combines XML, interval probabilities, and a powerful query algebra with selection, projection, and cross product. The cross product operator does not assume independence between XML trees from different collections. Because they are defined with a possible worlds semantics, our operators are proper extensions of their traditional counterparts and reduce to them when there is no uncertainty. The main practical result of the paper is a set of equivalences that can be used to compare or rewrite algebraic queries on interval probabilistic data, in particular XML and relational data.
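To make the idea of combining interval probabilities without an independence assumption concrete, the following is a minimal sketch. The abstract does not give the paper's operator definitions, so this is not the authors' cross product: it assumes the standard Fréchet bounds for a conjunction under unknown dependence, and the names `Interval`, `conjunction_unknown_dependence`, and `cross_product` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed probability interval [low, high] with 0 <= low <= high <= 1."""
    low: float
    high: float

    def __post_init__(self):
        if not (0.0 <= self.low <= self.high <= 1.0):
            raise ValueError(f"invalid probability interval: [{self.low}, {self.high}]")


def conjunction_unknown_dependence(a: Interval, b: Interval) -> Interval:
    """Frechet bounds on P(A and B) when the dependence between A and B is unknown.

    Assumption: this is a generic textbook combination, not the paper's definition.
    """
    return Interval(max(0.0, a.low + b.low - 1.0), min(a.high, b.high))


def cross_product(left, right):
    """Pair every item of `left` with every item of `right`.

    Each collection is a list of (item, Interval) pairs. The result holds the
    paired items together with an interval for their joint presence.
    """
    return [
        ((x, y), conjunction_unknown_dependence(px, py))
        for x, px in left
        for y, py in right
    ]


if __name__ == "__main__":
    books = [("book1", Interval(0.75, 1.0))]
    authors = [("author1", Interval(0.5, 0.75))]
    print(cross_product(books, authors))
```

When every interval is [1, 1], the combined interval is again [1, 1], so this sketch behaves like an ordinary cross product on certain data, which mirrors the reduction property stated in the abstract.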
Cite this article
Magnani, M., Montesi, D. Management of interval probabilistic data. Acta Informatica 45, 93–130 (2008). https://doi.org/10.1007/s00236-007-0065-9