An evaluation of GO annotation retrieval for BioCreAtIvE and GOA
BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S17.
doi: 10.1186/1471-2105-6-S1-S17. Epub 2005 May 24.

Evelyn B Camon et al. BMC Bioinformatics. 2005.

Abstract

Background: The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality supplementary GO annotation to proteins in the UniProt Knowledgebase. Like many other biological databases, GOA gathers much of its content from the careful manual curation of literature. However, as the volume of literature and the number of proteins requiring characterization both increase, the manual processing capability can become overloaded. Consequently, semi-automated aids are often employed to expedite the curation process. Traditionally, electronic techniques in GOA have depended largely on exploiting the knowledge in existing resources such as InterPro. In recent years, however, text mining has been hailed as a potentially useful tool to aid the curation process. To encourage the development of such tools, the GOA team at EBI agreed to take part in the functional annotation task of the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) challenge. BioCreAtIvE task 2 was an experiment to test whether automatically derived classification using information retrieval and extraction could assist expert biologists in annotating proteins in the UniProt Knowledgebase with terms from the GO vocabulary. GOA provided the training corpus of over 9000 manual GO annotations extracted from the literature. For the test set, we provided a corpus of 200 new Journal of Biological Chemistry articles used to annotate 286 human proteins with GO terms. A team of experts manually evaluated the results of nine participating groups, each of which provided highlighted sentences to support their GO and protein annotation predictions. Here, we give a biological perspective on the evaluation, explain how we annotate GO using literature, and offer some suggestions to improve the precision of future text-retrieval and extraction techniques. Finally, we provide the results of the first inter-annotator agreement study for manual GO curation, as well as an assessment of our current electronic GO annotation strategies.

Results: The GOA database currently extracts GO annotation from the literature with 91 to 100% precision and at least 72% recall. This sets a particularly high threshold for text mining systems, which in the initial results of BioCreAtIvE task 2 (GO annotation extraction and retrieval) precisely predicted GO terms only 10 to 20% of the time.
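
As an illustration of the two measures quoted above, the following is a minimal sketch of how precision and recall could be computed for a set of predicted annotations against a curated reference set. It is not the evaluation procedure used in the study; the function name `precision_recall`, the placeholder accession `EXAMPLE1`, and all of the toy annotation pairs are assumptions made for this example only.

```python
from typing import Set, Tuple

# An annotation is modelled as a (protein accession, GO term ID) pair.
Annotation = Tuple[str, str]


def precision_recall(predicted: Set[Annotation],
                     gold: Set[Annotation]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted annotations against a gold set."""
    true_positives = len(predicted & gold)
    # Precision: fraction of predictions that appear in the gold set.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold annotations that were predicted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for illustration (not taken from the study).
gold = {
    ("Q8IWU2", "GO:0016301"),
    ("Q8IWU2", "GO:0004702"),
    ("EXAMPLE1", "GO:0005515"),
}
predicted = {
    ("Q8IWU2", "GO:0016301"),   # matches a gold annotation
    ("Q8IWU2", "GO:0005524"),   # not in the gold set
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact set matching is the strictest possible criterion; it gives no credit to a prediction that is a broader or narrower term in the same GO lineage as the curated one.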

Conclusion: Improvements in the performance and accuracy of text mining for GO terms should be expected in the next BioCreAtIvE challenge. In the meantime, the manual and electronic GO annotation strategies already employed by GOA will provide high-quality annotations.

Figures

Figure 1
BioCreAtIvE Evaluation Tool (subtask 2.2), showing the GO annotation 'kinase activity' (GO:0016301; right tool bar) made by user 9-1, with supporting text evidence (central panel). The left tool bar shows the UniProt accession number of the annotated protein, in this case Q8IWU2. Q8IWU2 represents a KPI-2 protein, so, based on the evidence text, the user has been evaluated as 'high' for GO term prediction and 'high' for representing the correct gene product. The user also uses this sentence to predict the GO term 'receptor signaling protein serine/threonine kinase activity' (GO:0004702). Although that GO annotation is correct for this protein, the evidence text supplied does not support that level of detail. The same evidence text was therefore evaluated as 'general' for the GO term prediction of GO:0004702 (same lineage as the correct GO term 'kinase activity') and 'high' for representing the correct gene product.
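
The 'high' versus 'general' distinction in the caption can be sketched in code. The snippet below is only an illustration under stated assumptions: the `PARENTS` map is a hand-written toy fragment in which intermediate ontology terms are collapsed (it is not the real GO graph), the function names are hypothetical, and the third label, `unsupported`, is a placeholder for any prediction outside the lineage rather than a label taken from the text above. In the actual evaluation, these judgments were made manually by curators reading the evidence text.

```python
from typing import Dict, Set

# Toy child -> parents map (assumption: intermediate terms collapsed,
# not the real GO hierarchy).
PARENTS: Dict[str, Set[str]] = {
    "GO:0004702": {"GO:0016301"},  # specific kinase term -> 'kinase activity'
    "GO:0016301": {"GO:0003674"},  # 'kinase activity' -> ontology root
    "GO:0005515": {"GO:0003674"},  # a term in a different lineage
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Collect every term reachable by following parent links from `term`."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def grade_prediction(predicted: str, supported: str,
                     parents: Dict[str, Set[str]]) -> str:
    """Grade a predicted term against the term the evidence text supports."""
    if predicted == supported:
        return "high"
    # Same lineage: one term is an ancestor of the other.
    if supported in ancestors(predicted, parents) or \
            predicted in ancestors(supported, parents):
        return "general"
    return "unsupported"  # placeholder label, an assumption of this sketch


# The evidence text in Figure 1 supports 'kinase activity' (GO:0016301).
print(grade_prediction("GO:0016301", "GO:0016301", PARENTS))
print(grade_prediction("GO:0004702", "GO:0016301", PARENTS))
```

Under this toy map, the exact term is graded `high` and the more specific GO:0004702 is graded `general`, mirroring the two outcomes described in the caption.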

