Abstract
Web pages containing RDFa markup enable a broad range of new agents that improve the pages' usability for human readers. Unfortunately, only a few web sites feature such annotations so far. In this paper, we demonstrate Atheris, a system that annotates structured web pages by means of our web data extraction tool ViPER. Atheris runs inside a web proxy service, which makes it transparently available. Our approach enables the web browser, the most widely used web agent, to operate intelligently on the displayed page by providing a semantic view over previously 'meaningless' data in order to support human readers.
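As a rough illustration of what annotating extracted records with RDFa can look like, the following Python sketch wraps already-extracted field values in `typeof` and `property` attributes. It does not reproduce the Atheris or ViPER implementation. The function names, the `ex` prefix, the vocabulary URI, and the record fields are all hypothetical, and the extraction step is assumed to have produced plain dictionaries.

```python
from html import escape


def annotate_record(record, type_name, prefix="ex"):
    """Wrap one extracted record (field -> text) in RDFa attributes.

    Hypothetical helper: each field becomes a <span> carrying a
    `property` attribute, and the record itself is typed via `typeof`.
    """
    spans = "".join(
        f'<span property="{prefix}:{escape(field)}">{escape(value)}</span>'
        for field, value in record.items()
    )
    return f'<div typeof="{prefix}:{escape(type_name)}">{spans}</div>'


def annotate_page(records, type_name, vocab_uri, prefix="ex"):
    """Emit an HTML fragment that declares the prefix and holds all records.

    The prefix mapping is declared with xmlns, as in RDFa in XHTML.
    """
    body = "".join(annotate_record(r, type_name, prefix) for r in records)
    return f'<div xmlns:{prefix}="{escape(vocab_uri)}">{body}</div>'


if __name__ == "__main__":
    # Assumed sample data standing in for the output of a data extractor.
    sample = [{"title": "Book <A>", "price": "9.99"}]
    print(annotate_page(sample, "Product", "http://example.org/vocab#"))
```

A proxy-based setup like the one described above would apply such a transformation to the response body before forwarding it to the browser. The sketch leaves that step out and only shows the markup generation.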
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schmedding, F., Schwaibold, M., Simon, K. (2009). Pattern-Based Annotation of HTML-Streams. In: Aroyo, L., et al. The Semantic Web: Research and Applications. ESWC 2009. Lecture Notes in Computer Science, vol 5554. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02121-3_77
DOI: https://doi.org/10.1007/978-3-642-02121-3_77
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02120-6
Online ISBN: 978-3-642-02121-3
eBook Packages: Computer Science, Computer Science (R0)