An evaluation of GO annotation retrieval for BioCreAtIvE and GOA
- PMID: 15960829
- PMCID: PMC1869009
- DOI: 10.1186/1471-2105-6-S1-S17
Abstract
Background: The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality supplementary GO annotation to proteins in the UniProt Knowledgebase. Like many other biological databases, GOA gathers much of its content from the careful manual curation of literature. However, as both the volume of literature and the number of proteins requiring characterization increase, manual processing capacity can become overloaded. Consequently, semi-automated aids are often employed to expedite the curation process. Traditionally, electronic techniques in GOA have depended largely on exploiting the knowledge in existing resources such as InterPro. In recent years, however, text mining has been hailed as a potentially useful tool to aid the curation process. To encourage the development of such tools, the GOA team at EBI agreed to take part in the functional annotation task of the BioCreAtIvE (Critical Assessment of Information Extraction systems in Biology) challenge. BioCreAtIvE task 2 was an experiment to test whether automatically derived classification using information retrieval and extraction could assist expert biologists in annotating proteins in the UniProt Knowledgebase with terms from the GO vocabulary. GOA provided the training corpus of over 9000 manual GO annotations extracted from the literature. For the test set, we provided a corpus of 200 new Journal of Biological Chemistry articles used to annotate 286 human proteins with GO terms. A team of experts manually evaluated the results of 9 participating groups, each of which provided highlighted sentences to support their GO and protein annotation predictions. Here, we give a biological perspective on the evaluation, explain how we annotate GO using literature and offer some suggestions to improve the precision of future text-retrieval and extraction techniques. Finally, we provide the results of the first inter-annotator agreement study for manual GO curation, as well as an assessment of our current electronic GO annotation strategies.
Results: The GOA database currently extracts GO annotation from the literature with 91 to 100% precision, and at least 72% recall. This creates a particularly high threshold for text mining systems, which in the initial results of BioCreAtIvE task 2 (GO annotation extraction and retrieval) precisely predicted GO terms only 10 to 20% of the time.
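For readers less familiar with the two measures quoted above, the following is a minimal sketch of how precision and recall can be computed when annotations are treated as sets of (protein, GO term) pairs. It assumes exact matching of pairs, which is a simplification and not necessarily the evaluation procedure used in the study; the function name `precision_recall` and all identifiers in the toy data are hypothetical.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of (protein, GO term) pairs.

    Assumption: a prediction counts as correct only if the exact pair
    also appears in the gold-standard (manually curated) set.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold-standard annotations that were recovered.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data; these identifiers are placeholders, not study data.
gold = {
    ("PROT_A", "GO_TERM_1"),
    ("PROT_A", "GO_TERM_2"),
    ("PROT_B", "GO_TERM_3"),
    ("PROT_C", "GO_TERM_4"),
}
predicted = {
    ("PROT_A", "GO_TERM_1"),  # correct
    ("PROT_B", "GO_TERM_3"),  # correct
    ("PROT_B", "GO_TERM_9"),  # wrong term
    ("PROT_D", "GO_TERM_2"),  # wrong protein
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, two of the four predictions are correct and two of the four curated annotations are recovered, so both measures are 0.50.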
Conclusion: Improvements in the performance and accuracy of text mining for GO terms should be expected in the next BioCreAtIvE challenge. In the meantime, the manual and electronic GO annotation strategies already employed by GOA will continue to provide high-quality annotations.