Abstract
Electronic Health Record (EHR) use in India is generally poor, and structured clinical information is mostly lacking. This work is the first attempt to evaluate unstructured text mining for extracting relevant clinical information from Indian clinical records. We annotated a corpus of 250 discharge summaries from an Intensive Care Unit (ICU) in India, marking up diseases, procedures, and lab parameters with their attributes, as well as key demographic information and administrative variables such as patient outcomes. In the process, we constructed guidelines for an annotation scheme useful to clinicians in the Indian context. We then evaluated the performance of an NLP engine, Cocoa, on a cohort of these Indian clinical records. The annotated corpus contains roughly 90 thousand words and is, to our knowledge, the first tagged clinical corpus from India. Cocoa was evaluated on a test corpus of 50 documents. The overlap F-scores for the major categories, namely disease/symptoms, procedures, laboratory parameters, and outcomes, are 0.856, 0.834, 0.961, and 0.872, respectively. These results are competitive with results from recent shared tasks based on US records. The annotated corpus and the associated results from the Cocoa engine indicate that unstructured text mining is a viable method for cohort analysis in the Indian clinical context, where structured EHR records are largely absent.
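The abstract reports overlap F-scores without restating the matching rule. As a minimal sketch, assuming the relaxed convention common in clinical NLP shared tasks (a predicted span counts as correct if it shares a label and at least one character with a gold span), the metric can be computed as below. The `Span` layout, the function names, and the toy spans are illustrative assumptions, not taken from the paper or from the Cocoa engine.

```python
from typing import List, Tuple

# Hypothetical span layout: (start offset, end offset exclusive, category label).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the spans share a label and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def overlap_f_score(gold: List[Span], predicted: List[Span]) -> Tuple[float, float, float]:
    """Compute (precision, recall, F1) under relaxed overlap matching."""
    # A prediction is correct if it overlaps any gold span of the same label.
    matched_pred = sum(1 for p in predicted if any(overlaps(p, g) for g in gold))
    # A gold span is found if any prediction of the same label overlaps it.
    matched_gold = sum(1 for g in gold if any(overlaps(g, p) for p in predicted))
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration only).
gold = [(0, 9, "disease"), (15, 27, "procedure"), (40, 50, "lab")]
predicted = [(2, 9, "disease"), (15, 20, "procedure"), (60, 70, "lab"), (80, 85, "disease")]
precision, recall, f1 = overlap_f_score(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under strict matching, the partially matching disease and procedure spans in this toy example would both count as errors, which is why overlap scores are typically higher than exact-match scores on the same output.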
Ethics declarations
Funding
This work was funded entirely by internal funds from St. John’s Research Institute and RelAgent Tech Pvt. Ltd.
Competing Interests
P. Senthil Nathan and S. V. Ramanan are founders of RelAgent Tech Pvt. Ltd., a biomedical text mining company. The other authors declare that they have no competing interests.
Ethics Statement
Ethical approval for the study was granted by the Institutional Ethics Committee (IEC) of St. John’s National Academy of Health Sciences. Patient consent for data collection is obtained as part of routine procedure during admission to the ICU.
Additional information
This article is part of the Topical Collection on Patient Facing Systems
Electronic supplementary material
ESM 1
Supplementary material: a document containing the corpus annotation guidelines (DOCX 340 kb)
About this article
Cite this article
Ramanan, S.V., Radhakrishna, K., Waghmare, A. et al. Dense Annotation of Free-Text Critical Care Discharge Summaries from an Indian Hospital and Associated Performance of a Clinical NLP Annotator. J Med Syst 40, 187 (2016). https://doi.org/10.1007/s10916-016-0541-2
DOI: https://doi.org/10.1007/s10916-016-0541-2