Logic, Languages, and Rules for Web Data Extraction and Reasoning over Data | SpringerLink
Skip to main content

Logic, Languages, and Rules for Web Data Extraction and Reasoning over Data

  • Conference paper
  • First Online:
Language and Automata Theory and Applications (LATA 2017)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10168))

  • 1025 Accesses

Abstract

This paper gives a short overview of specific logical approaches to data extraction, data management, and reasoning about data. In particular, we survey theoretical results and formalisms that have been obtained and used in the context of the Lixto Project at TU Wien, the DIADEM project at the University of Oxford, and the VADA project, which is currently being carried out jointly by the universities of Edinburgh, Manchester, and Oxford. We start with a formal approach to web data extraction rooted in monadic second order logic and monadic Datalog, which gave rise to the Lixto data extraction system. We then present some complexity results for monadic Datalog over trees and for XPath query evaluation. We further argue that for value creation and for ontological reasoning over data, we need existential quantifiers (or Skolem terms) in rule heads, and introduce the Datalog\(^\pm \) family. We give an overview of important members of this family and discuss related complexity issues.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 5719
Price includes VAT (Japan)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
JPY 7149
Price includes VAT (Japan)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    https://www.w3.org/TR/1998/REC-xml-19980210.

  2. 2.

    http://www.w3c.org/TR/xpath/.

  3. 3.

    https://www.w3.org/XML/Query/.

  4. 4.

    http://www.w3.org/TR/xslt.

  5. 5.

    In this simple model, unrestricted sets of tags as well as string and attribute values are assumed to be encoded as lists of character symbols modeled as subtrees in our document tree.

  6. 6.

    Note that our tree structures contain some redundancy (e.g., a leaf is a node x such that \(\lnot (\exists y) \text{ firstchild }(x, y)\)), by which (monadic) Datalog becomes as expressive as its semipositive generalization. Semipositive Datalog allows to use the complements of extensional relations in rule bodies.

  7. 7.

    For a TGD of the form \(b \rightarrow h\), b is called the body, while h is called the head.

  8. 8.

    Here, the data complexity is calculated by fixing the set of TGDs and the query, while in the combined complexity we assume that everything is part of the input.

  9. 9.

    http://www.w3.org/TR/rdf-schema.

  10. 10.

    http://www.w3.org/TR/owl-features/.

References

  1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995)

    MATH  Google Scholar 

  2. Arenas, M., Gottlob, G., Pieris, A.: Expressive languages for querying the semantic web. In: PODS, pp. 14–26 (2014)

    Google Scholar 

  3. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.: The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, Cambridge (2003)

    MATH  Google Scholar 

  4. Baumgartner, R., Flesca, S., Gottlob, G.: Declarative information extraction, web crawling, and recursive wrapping with Lixto. In: Eiter, T., Faber, W., Truszczyński, M. (eds.) LPNMR 2001. LNCS (LNAI), vol. 2173, pp. 21–41. Springer, Heidelberg (2001). doi:10.1007/3-540-45402-0_2

    Chapter  Google Scholar 

  5. Baumgartner, R., Flesca, S., Gottlob, G.: The elog web extraction language. In: Nieuwenhuis, R., Voronkov, A. (eds.) LPAR 2001. LNCS (LNAI), vol. 2250, pp. 548–560. Springer, Heidelberg (2001). doi:10.1007/3-540-45653-8_38

    Chapter  Google Scholar 

  6. Baumgartner, R., Flesca, S., Gottlob, G.: Visual web information extraction with lixto. In: VLDB, pp. 119–128 (2001)

    Google Scholar 

  7. Beeri, C., Vardi, M.Y.: The implication problem for data dependencies. In: Even, S., Kariv, O. (eds.) ICALP 1981. LNCS, vol. 115, pp. 73–85. Springer, Heidelberg (1981). doi:10.1007/3-540-10843-2_7

    Chapter  Google Scholar 

  8. Calì, A., Gottlob, G., Kifer, M.: Taming the infinite chase: query answering under expressive relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013)

    MathSciNet  MATH  Google Scholar 

  9. Calì, A., Gottlob, G., Lukasiewicz, T.: A general datalog-based framework for tractable query answering over ontologies. J. Web Sem. 14, 57–83 (2012)

    Article  Google Scholar 

  10. Cali, A., Gottlob, G., Lukasiewicz, T., Marnette, B., Pieris, A.: Datalog+/-: a family of logical knowledge representation and query languages for new applications. In: LICS, pp. 228–242 (2010)

    Google Scholar 

  11. Calì, A., Gottlob, G., Pieris, A.: Ontological query answering under expressive entity-relationship schemata. Inf. Syst. 37(4), 320–335 (2012)

    Article  Google Scholar 

  12. Calì, A., Gottlob, G., Pieris, A.: Towards more expressive ontology languages: the query answering problem. Artif. Intell. 193, 87–128 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  13. Calì, A., Kifer, M.: Containment of conjunctive object meta-queries. In: VLDB, pp. 942–952 (2006)

    Google Scholar 

  14. Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimization problems for database logic programs (preliminary report). In: STOC, pp. 477–490 (1988)

    Google Scholar 

  15. Courcelle, B.: Graph rewriting: an algebraic and logic approach. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, vol. 2, chap. 5, pp. 193–242. Elsevier Science Publishers B.V. (1990)

    Google Scholar 

  16. Dantsin, E., Eiter, T., Gottlob, G., Voronkov, A.: Complexity and expressive power of logic programming. ACM Comput. Surv. 33(3), 374–425 (2001)

    Article  Google Scholar 

  17. Deutsch, A., Tannen, V.: Reformulation of XML queries and constraints. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds.) ICDT 2003. LNCS, vol. 2572, pp. 225–241. Springer, Heidelberg (2003). doi:10.1007/3-540-36285-1_15

    Chapter  Google Scholar 

  18. Doner, J.: Tree acceptors and some of their applications. J. Comput. Syst. Sci. 4(5), 406–451 (1970)

    Article  MathSciNet  MATH  Google Scholar 

  19. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  20. Flum, J., Frick, M., Grohe, M.: Query evaluation via tree-decompositions. In: Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 22–38. Springer, Heidelberg (2001). doi:10.1007/3-540-44503-X_2

    Chapter  Google Scholar 

  21. Frick, M., Grohe, M., Koch, C.: Query evaluation on compressed trees. In: LICS, pp. 22–25 (2003)

    Google Scholar 

  22. Furche, T., Gottlob, G., Grasso, G., Guo, X., Orsi, G., Schallhart, C., Wang, C.: DIADEM: thousands of websites to a single database. PVLDB 7(14), 1845–1856 (2014)

    Google Scholar 

  23. Furche, T., Gottlob, G., Libkin, L., Orsi, G., Paton, N.W.: Data wrangling for big data: challenges and opportunities. In: EDBT, pp. 473–478 (2016)

    Google Scholar 

  24. Furche, T., Linse, B., Bry, F., Plexousakis, D., Gottlob, G.: RDF querying: language constructs and evaluation methods compared. In: Barahona, P., Bry, F., Franconi, E., Henze, N., Sattler, U. (eds.) Reasoning Web 2006. LNCS, vol. 4126, pp. 1–52. Springer, Heidelberg (2006). doi:10.1007/11837787_1

    Chapter  Google Scholar 

  25. Gottlob, G., Kikot, S., Kontchakov, R., Podolskii, V.V., Schwentick, T., Zakharyaschev, M.: The price of query rewriting in ontology-based data access. Artif. Intell. 213, 42–59 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  26. Gottlob, G., Koch, C.: Monadic queries over tree-structured data. In: LICS, pp. 189–202 (2002)

    Google Scholar 

  27. Gottlob, G., Koch, C.: Monadic datalog and the expressive power of languages for web information extraction. J. ACM 51(1), 74–113 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  28. Gottlob, G., Koch, C.: A formal comparison of visual web wrapper generators. In: Wiedermann, J., Tel, G., Pokorný, J., Bieliková, M., Štuller, J. (eds.) SOFSEM 2006. LNCS, vol. 3831, pp. 30–48. Springer, Heidelberg (2006). doi:10.1007/11611257_3

    Chapter  Google Scholar 

  29. Gottlob, G., Koch, C., Baumgartner, R., Herzog, M., Flesca, S.: The Lixto data extraction project: back and forth between theory and practice. In: PODS, pp. 1–12 (2004)

    Google Scholar 

  30. Gottlob, G., Koch, C., Pichler, R.: Efficient algorithms for processing XPath queries. In: VLDB, pp. 95–106 (2002)

    Google Scholar 

  31. Gottlob, G., Koch, C., Pichler, R.: The complexity of XPath query evaluation. In: PODS, pp. 179–190 (2003)

    Google Scholar 

  32. Gottlob, G., Koch, C., Schulz, K.U.: Conjunctive queries over trees. In: PODS, pp. 189–200 (2004)

    Google Scholar 

  33. Gottlob, G., Manna, M., Pieris, A.: Polynomial rewritings for linear existential rules. In: IJCAI, pp. 2992–2998 (2015)

    Google Scholar 

  34. Gottlob, G., Orsi, G., Pieris, A.: Query rewriting and optimization for ontological databases. ACM Trans. Database Syst. 39(3), 25:1–25:46 (2014)

    Article  MathSciNet  Google Scholar 

  35. Gottlob, G., Orsi, G., Pieris, A.: Consistency checking of re-engineered UML class diagrams via Datalog+/-. In: RuleML, pp. 35–53 (2015)

    Google Scholar 

  36. Gottlob, G., Pieris, A.: Beyond SPARQL under OWL 2 QL entailment regime: rules to the rescue. In: IJCAI, pp. 2999–3007 (2015)

    Google Scholar 

  37. Gottlob, G., Rudolph, S., Simkus, M.: Expressiveness of guarded existential rule languages. In: PODS, pp. 27–38 (2014)

    Google Scholar 

  38. Gottlob, G., Schwentick, T.: Rewriting ontological queries into small nonrecursive datalog programs. In: KR (2012)

    Google Scholar 

  39. Grau, B.C., Horrocks, I., Krötzsch, M., Kupke, C., Magka, D., Motik, B., Wang, Z.: Acyclicity conditions and their application to query answering in description logics. In: KR (2012)

    Google Scholar 

  40. Greenlaw, R., Hoover, H.J., Ruzzo, W.L.: Limits to Parallel Computation: P-Completeness Theory. Oxford University Press, Oxford (1995)

    MATH  Google Scholar 

  41. Johnson, D.S., Klug, A.C.: Testing containment of conjunctive queries under functional and inclusion dependencies. J. Comput. Syst. Sci. 28(1), 167–189 (1984)

    Article  MathSciNet  MATH  Google Scholar 

  42. Kifer, M., Lausen, G., Wu, J.: Logical foundations of object-oriented and frame-based languages. J. ACM 42, 741–843 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  43. Koch, C.: Efficient processing of expressive node-selecting queries on XML data in secondary storage: a tree automata-based approach. In: VLDB, pp. 249–260 (2003)

    Google Scholar 

  44. Laender, A.H.F., Ribeiro-Neto, B.A., da Silva, A.S.: Debye - data extraction by example. Data Knowl. Eng. 40(2), 121–154 (2002)

    Article  MATH  Google Scholar 

  45. Liu, L., Pu, C., Han, W.: XWRAP: An XML-enabled wrapper construction system for web information sources. In: ICDE, pp. 611–621 (2000)

    Google Scholar 

  46. Ludäscher, B., Himmeröder, R., Lausen, G., May, W., Schlepphorst, C.: Managing semistructured data with FLORID: a deductive object-oriented perspective. Inf. Syst. 23(8), 589–613 (1998)

    Article  Google Scholar 

  47. Lukasiewicz, T., Martinez, M.V., Pieris, A., Simari, G.I.: From classical to consistent query answering under existential rules. In: AAAI, pp. 1546–1552 (2015)

    Google Scholar 

  48. Marnette, B.: Generalized schema-mappings: from termination to tractability. In: PODS, pp. 13–22 (2009)

    Google Scholar 

  49. Meuss, H., Schulz, K.U., Bry, F.: Towards aggregated answers for semistructured data. In: Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 346–360. Springer, Heidelberg (2001). doi:10.1007/3-540-44503-X_22

    Chapter  Google Scholar 

  50. Milani, M., Bertossi, L.: Ontology-based multidimensional contexts with applications to quality data specification and extraction. In: Bassiliades, N., Gottlob, G., Sadri, F., Paschke, A., Roman, D. (eds.) RuleML 2015. LNCS, vol. 9202, pp. 277–293. Springer, Heidelberg (2015). doi:10.1007/978-3-319-21542-6_18

    Chapter  Google Scholar 

  51. Miller, R.J., Hernández, M.A., Haas, L.M., Yan, L., Ho, C.T.H., Fagin, R., Popa, L.: The clio project: managing heterogeneity. SIGMOD Rec. 30(1), 78–83 (2001)

    Article  Google Scholar 

  52. Minoux, M.: LTUR: a simplified linear-time unit resolution algorithm for horn formulae and computer implementation. Inf. Process. Lett. 29(1), 1–12 (1988)

    Article  MathSciNet  MATH  Google Scholar 

  53. Neven, F., den Bussche, J.V.: Expressiveness of structured document query languages based on attribute grammars. J. ACM 49(1), 56–100 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  54. Neven, F., Schwentick, T.: Query automata over finite trees. Theor. Comput. Sci. 275(1–2), 633–674 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  55. Papakonstantinou, Y., Gupta, A., Garcia-Molina, H., Ullman, J.: A query translation scheme for rapid implementation of wrappers. In: Ling, T.W., Mendelzon, A.O., Vieille, L. (eds.) DOOD 1995. LNCS, vol. 1013, pp. 161–186. Springer, Heidelberg (1995). doi:10.1007/3-540-60608-4_40

    Chapter  Google Scholar 

  56. Sahuguet, A., Azavant, F.: Building intelligent web applications using lightweight wrappers. Data Knowl. Eng. 36(3), 283–316 (2001)

    Article  MATH  Google Scholar 

  57. Seidl, H., Schwentick, T., Muscholl, A.: Numerical document queries. In: PODS, pp. 155–166 (2003)

    Google Scholar 

  58. Thatcher, J.W., Wright, J.B.: Generalized finite automata theory with an application to a decision problem of second-order logic. Math. Syst. Theory 2(1), 57–81 (1968)

    Article  MathSciNet  MATH  Google Scholar 

  59. Thomas, W.: Languages, automata, and logic. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 3, pp. 389–455. Springer, Heidelberg (1997). Chapter 7

    Chapter  Google Scholar 

Download references

Acknowledgements

This work has been supported by the EPSRC Programme Grant EP/M025268/ “VADA: Value Added Data Systems – Principles and Architecture”.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Andreas Pieris .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Gottlob, G., Koch, C., Pieris, A. (2017). Logic, Languages, and Rules for Web Data Extraction and Reasoning over Data. In: Drewes, F., Martín-Vide, C., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2017. Lecture Notes in Computer Science(), vol 10168. Springer, Cham. https://doi.org/10.1007/978-3-319-53733-7_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-53733-7_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53732-0

  • Online ISBN: 978-3-319-53733-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics