Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):507-13.
doi: 10.1136/jamia.2009.001560.

Guergana K Savova et al. J Am Med Inform Assoc. 2010 Sep-Oct.

Abstract

We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture (UIMA) framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of the individual components was as follows: sentence boundary detector accuracy 0.949; tokenizer accuracy 0.949; part-of-speech tagger accuracy 0.936; shallow parser F-score 0.924; named entity recognizer and system-level evaluation F-score 0.715 for exact spans and 0.824 for overlapping spans; and accuracy of the concept mapping, negation, and status attributes of 0.957, 0.943, and 0.859 for exact spans and 0.580, 0.939, and 0.839 for overlapping spans, respectively. Overall performance is discussed in the context of five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.
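The abstract reports named entity F-scores separately for exact and overlapping spans. The sketch below shows one common way to compute such scores; the abstract does not spell out the paper's matching rules, so the half-open character-offset spans, the "any gold span overlaps" criterion, and the function names `spans_match` and `span_f_score` are illustrative assumptions, not the cTAKES evaluation code.

```python
from typing import List, Tuple

# Assumption: a span is (start, end) in character offsets, end exclusive.
Span = Tuple[int, int]


def spans_match(a: Span, b: Span, mode: str) -> bool:
    """Return True if two spans match under the chosen criterion."""
    if mode == "exact":
        # Exact match: identical boundaries.
        return a == b
    if mode == "overlap":
        # Half-open intervals overlap when each starts before the other ends.
        return a[0] < b[1] and b[0] < a[1]
    raise ValueError(f"unknown mode: {mode}")


def span_f_score(gold: List[Span], predicted: List[Span], mode: str) -> float:
    """Harmonic mean of precision and recall over span matches.

    Precision: fraction of predicted spans that match some gold span.
    Recall: fraction of gold spans matched by some predicted span.
    """
    if not gold or not predicted:
        return 0.0
    matched_pred = sum(any(spans_match(p, g, mode) for g in gold) for p in predicted)
    matched_gold = sum(any(spans_match(g, p, mode) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted)
    recall = matched_gold / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [(0, 10), (20, 30)]
predicted = [(0, 10), (22, 30), (40, 45)]
print(span_f_score(gold, predicted, "exact"))    # about 0.4
print(span_f_score(gold, predicted, "overlap"))  # about 0.8
```

In this toy example the prediction (22, 30) misses the gold boundary (20, 30) by two characters, so it counts only under the overlap criterion, which is why overlap scores are typically higher than exact-span scores.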

Figures

Figure 1
Example sentence, ‘family history of obesity but no family history of coronary artery diseases.’, processed through the cTAKES components. Fx, family history.
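To illustrate the pipeline idea behind Figure 1, here is a toy, plain-Python analogy of a UIMA-style pipeline in which each component reads and adds annotations on a shared document. It is not the UIMA or cTAKES API: the classes `Document` and `Annotation`, the annotator functions, the tiny `DICTIONARY`, and the negation settings are all hypothetical. Regular expressions stand in for the trained OpenNLP-based models, and the negation step is a simplified NegEx-style trigger window chosen here for illustration only.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    type: str
    begin: int  # start character offset
    end: int    # end character offset (exclusive)
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def select(self, type_: str) -> List[Annotation]:
        return [a for a in self.annotations if a.type == type_]


def tokens_in(doc: Document, sent: Annotation) -> List[Annotation]:
    return [t for t in doc.select("Token") if t.begin >= sent.begin and t.end <= sent.end]


def sentence_annotator(doc: Document) -> None:
    # Naive rule (hypothetical): a sentence ends at '.', '!' or '?'.
    for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", doc.text):
        doc.annotations.append(Annotation("Sentence", m.start(), m.end()))


def token_annotator(doc: Document) -> None:
    # Words and single punctuation marks, with character offsets.
    for m in re.finditer(r"\w+|[^\w\s]", doc.text):
        doc.annotations.append(Annotation("Token", m.start(), m.end()))


# Hypothetical toy lexicon; a real system would use a terminology such as UMLS.
DICTIONARY = {("obesity",): "Disorder", ("coronary", "artery", "diseases"): "Disorder"}


def dictionary_annotator(doc: Document) -> None:
    for sent in doc.select("Sentence"):
        toks = tokens_in(doc, sent)
        words = [doc.text[t.begin:t.end].lower() for t in toks]
        for phrase, sem_type in DICTIONARY.items():
            n = len(phrase)
            for i in range(len(words) - n + 1):
                if tuple(words[i:i + n]) == phrase:
                    doc.annotations.append(Annotation(
                        "Entity", toks[i].begin, toks[i + n - 1].end,
                        {"semantic_type": sem_type}))


# Hypothetical NegEx-style settings: look back WINDOW tokens for a trigger,
# stopping early at a scope breaker such as "but".
NEG_TRIGGERS = {"no", "denies", "without"}
SCOPE_BREAKERS = {"but", "however"}
WINDOW = 5


def negation_annotator(doc: Document) -> None:
    for sent in doc.select("Sentence"):
        toks = tokens_in(doc, sent)
        words = [doc.text[t.begin:t.end].lower() for t in toks]
        for ent in doc.select("Entity"):
            if not (sent.begin <= ent.begin and ent.end <= sent.end):
                continue
            start = next(i for i, t in enumerate(toks) if t.begin == ent.begin)
            negated = False
            for w in reversed(words[max(0, start - WINDOW):start]):
                if w in SCOPE_BREAKERS:
                    break
                if w in NEG_TRIGGERS:
                    negated = True
                    break
            ent.attrs["negated"] = negated


PIPELINE = [sentence_annotator, token_annotator, dictionary_annotator, negation_annotator]


def run_pipeline(text: str) -> Document:
    doc = Document(text)
    for annotator in PIPELINE:  # each component adds to the shared document
        annotator(doc)
    return doc


doc = run_pipeline("family history of obesity but no family history of coronary artery diseases.")
for ent in doc.select("Entity"):
    print(doc.text[ent.begin:ent.end], ent.attrs["negated"])
```

On the Figure 1 sentence, this sketch marks "obesity" as not negated, because the look-back stops before reaching any trigger, and "coronary artery diseases" as negated, because "no" lies within the window after the scope breaker "but". Ordering the components so that later ones depend only on earlier annotations mirrors the layered design the abstract describes.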
