Abstract
We applied the OpenDMAP [1] and BioNLP-UIMA [2] NLP systems to the task of mining protein-protein interactions (PPIs) from GeneRIFs. Our goal was to assess and improve system performance on GeneRIF text. We identified several classes of errors in the system’s output on a training dataset (most notably difficulty recognizing protein complexes) and modified the system to address them. To improve recognition of protein complex interactions, we implemented a new protein-complex-resolution UIMA component. We also added a custom entity identification engine that uses GeneRIF metadata to annotate proteins that may have been missed by the other engines. These changes improved both recall and precision, raising the F-measure from 0.23 to 0.48. These results confirm that the targeted enhancements described here substantially improve performance on GeneRIF text.
Availability: Annotated data sets and source code for the new UIMA components can be found at http://bcb.cs.tufts.edu/GeneRIFs/
References
Hunter, L., Lu, Z., Firby, J., Baumgartner, W., Johnson, H., Ogren, P., Cohen, K.: OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression. BMC Bioinformatics 9(1), 78 (2008)
BioNLP UIMA Component Repository, http://bionlp-uima.sourceforge.net/
Baumgartner, W.A., Cohen, K.B., Fox, L.M., Acquaah-Mensah, G., Hunter, L.: Manual curation is not sufficient for annotation of genomic databases. Bioinformatics 23(14) (2007)
Krallinger, M., Leitner, F., Rodriguez-Penagos, C., Valencia, A.: Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biol. 9(Suppl. 2), S4: 41–55 (2008)
Winnenburg, R., Wachter, T., Plake, C., Doms, A., Schroeder, M.: Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies? Brief Bioinform. 9(6), 466–478 (2008)
Lu, Z., Cohen, K.B., Hunter, L.E.: GeneRIF quality assurance as summary revision. In: Pac. Symp. Biocomput., pp. 269–280 (2007)
Mitchell, J.A., Aronson, A.R., Mork, J.G., Folk, L.C., Humphrey, S.M., Ward, J.M.: Gene indexing: characterization and analysis of NLM’s GeneRIFs. In: AMIA Annu. Symp. Proc., pp. 460–464 (2003)
Lu, Z., Cohen, K.B., Hunter, L.E.: Finding GeneRIFs via Gene Ontology annotations. In: Pac. Symp. Biocomput., pp. 52–63 (2006)
Ding, J., Berleant, D., Nettleton, D., Wurtele, E.: Mining MEDLINE: Abstracts, Sentences, or Phrases? In: Pac. Symp. Biocomput., vol. 7, pp. 326–337 (2002)
Lu, Z.: Text Mining on GeneRIFs. PhD Thesis, University of Colorado (2007)
Blaschke, C., Andrade, M.A., Ouzounis, C., Valencia, A.: Automatic extraction of biological information from scientific text: protein-protein interactions. In: Proc. Int. Conf. Intell. Syst. Mol. Biol., pp. 60–67 (1999)
Morgan, A., Lu, Z., Wang, X., Cohen, A., Fluck, J., et al.: Overview of BioCreative II gene normalization. Genome Biol. 9(Suppl. 2), S3 (2008)
Apache: Apache UIMA, http://incubator.apache.org/uima/
Alias-i: LingPipe 3.8.2 (2008), http://alias-i.com/lingpipe/
Settles, B.: ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics 21(14), 3191–3192 (2005)
Hirschman, L., Colosimo, M., Morgan, A., Yeh, A.: Overview of BioCreative task 1B: normalized gene lists. BMC Bioinformatics 6(Suppl. 1), S11 (2005)
Alex, B., Grover, C., Haddow, B., Kabadjov, M., Klein, E., Matthews, M., Roebuck, S., Tobin, R., Wang, X.: Assisted Curation: Does Text Mining Really Help? In: Pac. Symp. Biocomput., pp. 556–567 (2008)
Leaman, R., Gonzalez, G.: BANNER: An executable survey of advances in biomedical named entity recognition. In: Pac. Symp. Biocomput., vol. 13, pp. 652–663 (2008)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fox, A.D., Baumgartner, W.A., Johnson, H.L., Hunter, L.E., Slonim, D.K. (2010). Mining Protein-Protein Interactions from GeneRIFs with OpenDMAP. In: Blaschke, C., Shatkay, H. (eds) Linking Literature, Information, and Knowledge for Biology. Lecture Notes in Computer Science, vol 6004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13131-8_6
DOI: https://doi.org/10.1007/978-3-642-13131-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13130-1
Online ISBN: 978-3-642-13131-8
eBook Packages: Computer Science, Computer Science (R0)