Abstract
The World Wide Web no longer consists of static web pages alone. Instead, more and more pages are generated dynamically from user requests and database content. Conventional search engines do not index these dynamic pages because they cannot simulate user input, and therefore they often return insufficient results.
This paper presents a new approach to the online integration of web databases. Given only one sample HTML result page for a source, result pages for new requests are found by structural recognition. Once structural recognition is established for one source, other web databases of the same universe (e.g. movie databases) can be integrated on the fly by content-based recognition. The user thus receives results from various sources.
No global schema is produced. Instead, the heterogeneity of the individual sources is preserved. The only requirement is that the databases overlap extensionally, i.e. that they describe some of the same real-world entities.
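To make the idea of content-based recognition concrete, the following is a minimal sketch and not the method of the paper. It assumes that a record already known from one source (here a hypothetical movie entry) also appears on a result page of a new source, and that the page has already been split into text fragments. Each known attribute value is matched to the most similar fragment by token-set (Jaccard) similarity. The function names, the threshold of 0.5, and the sample data are illustrative assumptions.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity of two strings (0.0 if either is empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def label_fragments(known_record, fragments, threshold=0.5):
    """Map each attribute of a known record to its best-matching page fragment.

    known_record: attribute name -> value, taken from an already integrated source.
    fragments: text fragments from a result page of a new source (assumed given).
    Attributes whose best match scores below the threshold stay unlabeled.
    """
    labels = {}
    for attr, value in known_record.items():
        best = max(fragments, key=lambda f: jaccard(value, f), default=None)
        if best is not None and jaccard(value, best) >= threshold:
            labels[attr] = best
    return labels


# Hypothetical record present in both databases (the extensional overlap).
known = {"title": "The Third Man", "director": "Carol Reed", "year": "1949"}
# Hypothetical fragments from the new source's result page.
fragments = ["Third Man, The", "Directed by Carol Reed", "1949", "104 min"]

result = label_fragments(known, fragments)
for attr, fragment in result.items():
    print(attr, "->", fragment)
```

In this sketch the overlap does the work that a global schema would otherwise do: the fragment positions that match known values indicate where the new source displays each attribute, while unmatched fragments such as the running time remain unlabeled.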
Part of this work was supported by the Berlin-Brandenburg Graduate School in Distributed Information Systems (DFG grant no. GRK 316).
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Neiling, M., Schaal, M., Schumann, M. (2003). WrapIt: Automated Integration of Web Databases with Extensional Overlaps. In: Chaudhri, A.B., Jeckle, M., Rahm, E., Unland, R. (eds) Web, Web-Services, and Database Systems. NODe 2002. Lecture Notes in Computer Science, vol 2593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36560-5_14
DOI: https://doi.org/10.1007/3-540-36560-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00745-6
Online ISBN: 978-3-540-36560-0
eBook Packages: Springer Book Archive