Automatic Creation of a Pharmaceutical Corpus Based on Open-Data

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Abstract

A large amount of information related to the pharmaceutical industry is published on the web, but few tools allow it to be analyzed automatically. Internationally, and especially for English, the pharmaceutical domain has many online lexical resources that support both research and software development; for Spanish there are few such resources, and fewer still that address the particularities of a country such as Colombia. This paper presents the generation of a pharmaceutical corpus from the open data on Colombian medicines published monthly by the National Institute of Medicines and Food Surveillance (INVIMA). The model developed combines the concepts of corpus and ontology and is structured as a multi-relational graph. It is implemented in a graph-oriented database because such databases have been shown to manage this type of structure and, being grounded in mathematical graph theory, make it possible to find patterns and relationships that could not otherwise be found. To create the corpus, a crawler was developed to download and track the documents; after text processing, an algorithm stores their content in the graph-oriented database (Neo4j).
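As a rough illustration of the idea in the abstract, the following Python sketch builds a small multi-relational graph from medicine records and renders its edges as Cypher `MERGE` statements of the kind a Neo4j loader might run. It is not the authors' implementation: the record fields (`product`, `ingredient`, `form`), the node labels, the relationship types `CONTAINS` and `HAS_FORM`, and the sample rows are all hypothetical and do not reflect the actual INVIMA open-data schema. The crawler and text-processing stages are left out, and nothing is sent to a database.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative assumptions,
# not the real INVIMA open-data schema.
RECORDS = [
    {"product": "PRODUCT A", "ingredient": "ACETAMINOFEN", "form": "TABLETA"},
    {"product": "PRODUCT B", "ingredient": "ACETAMINOFEN", "form": "TABLETA"},
    {"product": "PRODUCT C", "ingredient": "IBUPROFENO", "form": "CAPSULA"},
]


class MultiRelationalGraph:
    """Directed graph in which every edge carries a relationship type."""

    def __init__(self):
        self.nodes = set()             # (label, name) pairs
        self.edges = defaultdict(set)  # rel_type -> {(source, target)}

    def add_edge(self, source, rel_type, target):
        self.nodes.update((source, target))
        self.edges[rel_type].add((source, target))

    def sources_of(self, rel_type, target):
        """Return the nodes linked to `target` through `rel_type`, sorted."""
        return sorted(s for s, t in self.edges[rel_type] if t == target)


def build_graph(records):
    """Turn each record into a product node with typed edges to its attributes."""
    graph = MultiRelationalGraph()
    for row in records:
        product = ("Product", row["product"])
        graph.add_edge(product, "CONTAINS", ("Ingredient", row["ingredient"]))
        graph.add_edge(product, "HAS_FORM", ("Form", row["form"]))
    return graph


def to_cypher(graph):
    """Render each edge as a Cypher MERGE statement (strings only, not executed).

    Values are interpolated directly for readability; a real loader should
    pass them as query parameters instead.
    """
    statements = []
    for rel_type in sorted(graph.edges):
        for (s_label, s_name), (t_label, t_name) in sorted(graph.edges[rel_type]):
            statements.append(
                f"MERGE (a:{s_label} {{name: '{s_name}'}}) "
                f"MERGE (b:{t_label} {{name: '{t_name}'}}) "
                f"MERGE (a)-[:{rel_type}]->(b)"
            )
    return statements


graph = build_graph(RECORDS)
cypher = to_cypher(graph)
```

Because `MERGE` creates a node or relationship only when it does not already exist, statements of this shape can be replayed for each monthly release without duplicating shared nodes such as an ingredient. A query like `graph.sources_of("CONTAINS", ("Ingredient", "ACETAMINOFEN"))` then shows the kind of relationship lookup a graph structure makes direct.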

Universidad Distrital Francisco José de Caldas.


Notes

  1. Neo Technology, Inc. 2017. “Neo4j”. https://neo4j.com/.


Author information

Corresponding author

Correspondence to Sebastian Otálora.


Copyright information

© 2023 Springer Nature Switzerland AG

Cite this paper

Bravo, C., Otálora, S., Ordoñez-Salinas, S. (2023). Automatic Creation of a Pharmaceutical Corpus Based on Open-Data. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_31

  • DOI: https://doi.org/10.1007/978-3-031-24337-0_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24336-3

  • Online ISBN: 978-3-031-24337-0

  • eBook Packages: Computer Science (R0)
