Abstract
We propose PIES, a new web information extraction system that converts web information into XML documents. PIES uses a user-specified ontology and HTML tag pattern descriptions: the pattern descriptions extract information from web pages, and the ontology validates the extracted information. We also designed a new language for describing HTML tag patterns and extraction rules. We implemented PIES and evaluated it on the US patent web site.
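The abstract describes a pipeline in which tag pattern descriptions pull values out of HTML, the ontology accepts or rejects each value, and the accepted values become an XML document. The following Python sketch illustrates that flow under clearly stated assumptions. The pattern syntax (plain regular expressions), the ontology format (a map from concept name to a value constraint), the sample HTML, and every name used (`TAG_PATTERNS`, `ONTOLOGY`, `extract`, `validate`, `to_xml`, `SAMPLE_HTML`) are hypothetical. They are not the PIES pattern language, its ontology format, or real USPTO markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample page; NOT real USPTO markup.
SAMPLE_HTML = """
<table>
  <tr><th>Patent No.</th><td>6,123,456</td></tr>
  <tr><th>Title</th><td>Method for extracting data</td></tr>
  <tr><th>Filed</th><td>March 3, 1998</td></tr>
  <tr><th>Inventor count</th><td>n/a</td></tr>
</table>
"""

# Hypothetical tag pattern descriptions: one regex per field, whose single
# capture group holds the value. A stand-in, not the PIES pattern language.
TAG_PATTERNS = {
    "patent_number": r"<th>Patent No\.</th><td>([^<]*)</td>",
    "title": r"<th>Title</th><td>([^<]*)</td>",
    "filed": r"<th>Filed</th><td>([^<]*)</td>",
    "inventor_count": r"<th>Inventor count</th><td>([^<]*)</td>",
}

# Hypothetical ontology: each concept carries a constraint that an
# extracted value must fully match to be accepted.
ONTOLOGY = {
    "patent_number": re.compile(r"\d{1,2},\d{3},\d{3}"),
    "title": re.compile(r"\S.*"),
    "filed": re.compile(r"[A-Z][a-z]+ \d{1,2}, \d{4}"),
    "inventor_count": re.compile(r"\d+"),
}


def extract(html: str) -> dict:
    """Apply each tag pattern and collect the raw values that matched."""
    values = {}
    for field, pattern in TAG_PATTERNS.items():
        match = re.search(pattern, html)
        if match:
            values[field] = match.group(1).strip()
    return values


def validate(values: dict) -> dict:
    """Keep only values whose concept exists in the ontology and whose text satisfies its constraint."""
    return {
        field: value
        for field, value in values.items()
        if field in ONTOLOGY and ONTOLOGY[field].fullmatch(value)
    }


def to_xml(values: dict, root_tag: str = "patent") -> str:
    """Serialize the validated values as a flat XML document."""
    root = ET.Element(root_tag)
    for field, value in values.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(validate(extract(SAMPLE_HTML))))
```

Running the script keeps the patent number, title, and filing date, and drops `inventor_count` because its extracted value "n/a" fails the ontology's digits-only constraint. Keeping extraction and validation as separate steps mirrors the division of labor the abstract describes: patterns locate candidate values, and the ontology decides whether they are acceptable.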
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Park, BK., Han, H., Song, IY. (2005). PIES: A Web Information Extraction System Using Ontology and Tag Patterns. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_65
DOI: https://doi.org/10.1007/11563952_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29227-2
Online ISBN: 978-3-540-32087-6
eBook Packages: Computer Science, Computer Science (R0)