Abstract
We discuss several conceptions of treebanks found in the literature and show that their requirements may be incompatible. We then describe the choices made in building a Portuguese treebank, in particular regarding human versus automatic intervention. Finally, we list use cases in connection with a Web search tool (Águia), whose philosophy and implementation are presented.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Santos, D. (2003). Timber! Issues in Treebank Building and Use. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science, vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_22
DOI: https://doi.org/10.1007/3-540-45011-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40436-1
Online ISBN: 978-3-540-45011-5
eBook Packages: Springer Book Archive