Abstract
An architecture is proposed that provides robust acquisition of data from input documents containing tabular data. The architecture is based on a data-repairing framework that exploits integrity constraints defined on the input data to detect and repair inconsistencies arising from errors in the acquisition phase. In particular, a specific but expressive form of integrity constraint (steady aggregate constraints) is defined, which enables the computation of a repair to be expressed as a mixed integer linear programming problem.
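To make the idea of constraint-based repair concrete, the following is a minimal sketch and not the method described in the paper. It assumes a hypothetical five-cell table in which one value was misread during acquisition, two simple linear sum constraints standing in for aggregate constraints, and a repair criterion of changing as few cells as possible (the abstract does not state the criterion, so this is an assumption). In place of a mixed integer linear programming solver, it uses plain exhaustive search over a small integer domain, which is only workable for toy inputs. All cell names, values, and function names are illustrative.

```python
from itertools import combinations, product

# Hypothetical acquired values; "total_receipts" is assumed to be misread
# (the document is assumed to say 220).
acquired = {
    "cash_sales": 100,
    "credit_sales": 120,
    "total_receipts": 250,
    "payments": 160,
    "net_balance": 60,
}

# Each constraint is (coefficients, constant) and requires
# sum(coefficient * value) == constant. These are illustrative only.
constraints = [
    # cash_sales + credit_sales = total_receipts
    ({"cash_sales": 1, "credit_sales": 1, "total_receipts": -1}, 0),
    # total_receipts - payments = net_balance
    ({"total_receipts": 1, "payments": -1, "net_balance": -1}, 0),
]


def is_consistent(values):
    """Return True if every linear constraint holds for the given values."""
    return all(
        sum(coeff * values[name] for name, coeff in coeffs.items()) == const
        for coeffs, const in constraints
    )


def minimal_repair(values, domain):
    """Search for a repair that changes as few cells as possible.

    Tries 0 changed cells, then 1, then 2, and so on, and returns the first
    set of updates that satisfies all constraints, or None if there is none.
    Exhaustive search stands in here for a MILP solver.
    """
    names = list(values)
    for k in range(len(names) + 1):
        for changed in combinations(names, k):
            for new_vals in product(domain, repeat=k):
                candidate = dict(values)
                candidate.update(zip(changed, new_vals))
                if is_consistent(candidate):
                    return {
                        n: candidate[n]
                        for n in changed
                        if candidate[n] != values[n]
                    }
    return None


repair = minimal_repair(acquired, range(0, 301))
print(repair)  # {'total_receipts': 220}
```

In this toy input, changing `cash_sales` or `credit_sales` alone can satisfy the first constraint but leaves the second violated, so the only single-cell repair is setting `total_receipts` to 220. A solver-based formulation would replace the nested loops with decision variables for the repaired values and binary indicators for which cells change.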
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fazzinga, B., Flesca, S., Furfaro, F., Parisi, F. (2006). DART: A Data Acquisition and Repairing Tool. In: Grust, T., et al. Current Trends in Database Technology – EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 4254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11896548_25
DOI: https://doi.org/10.1007/11896548_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46788-5
Online ISBN: 978-3-540-46790-8
eBook Packages: Computer Science, Computer Science (R0)