Abstract
A large amount of information related to the pharmaceutical industry is published on the web, but few tools allow it to be analyzed automatically. Internationally, and especially for English, the pharmaceutical domain has many online lexical resources that support both research and software development; for Spanish there are few such resources, and fewer still that address the particularities of a country such as Colombia. This paper presents the generation of a pharmaceutical corpus from the open data on Colombian medicines published monthly by the National Institute of Medicines and Food Surveillance (INVIMA). A model was developed that combines the concepts of corpus and ontology and is structured as a multi-relational graph. The model is implemented in a graph-oriented database because such databases have been shown to handle this type of structure and, being grounded in a mathematical theory, make it possible to find patterns and relationships that could not otherwise be found. To create the corpus, a crawler was developed to download and manage the documents; through text processing and a suitable algorithm, their contents are then stored in the graph-oriented database (Neo4j).
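As a rough illustration of the multi-relational graph described above, the following Python sketch keeps corpus nodes (documents, terms) and ontology nodes (concepts) in a single structure with typed edges. The class, the node labels, the relation names, and the sample records are all hypothetical; they are not taken from the INVIMA data or from the authors' implementation, and an in-memory dictionary stands in for the graph-oriented database.

```python
from collections import defaultdict


class MultiRelationalGraph:
    """Tiny in-memory stand-in for a graph-oriented database (not Neo4j)."""

    def __init__(self):
        self.nodes = {}                   # node id -> (label, properties)
        self.edges = defaultdict(set)     # (source id, relation) -> target ids

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, source, relation, target):
        # Both endpoints must exist before they can be connected.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges[(source, relation)].add(target)

    def neighbors(self, source, relation):
        # Targets reachable from `source` through one edge of type `relation`.
        return sorted(self.edges.get((source, relation), set()))


def load_record(graph, doc_id, product, ingredient):
    """Link one (hypothetical) open-data record into the graph."""
    graph.add_node(doc_id, "Document")
    graph.add_node(product, "Term")          # corpus side: surface form
    graph.add_node(ingredient, "Concept")    # ontology side: active ingredient
    graph.add_edge(doc_id, "MENTIONS", product)
    graph.add_edge(product, "HAS_INGREDIENT", ingredient)


graph = MultiRelationalGraph()
load_record(graph, "doc-001", "producto A", "acetaminofén")
load_record(graph, "doc-002", "producto B", "acetaminofén")
```

Here `graph.neighbors("producto A", "HAS_INGREDIENT")` returns `["acetaminofén"]`. Because both products point to the same concept node, a traversal over typed edges can relate documents that share an ingredient, which is the kind of pattern a graph-oriented store is meant to expose.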
Universidad Distrital Francisco José de Caldas.
Notes
- 1. Neo Technology, Inc. 2017. “Neo4j”. https://neo4j.com/.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Bravo, C., Otálora, S., Ordoñez-Salinas, S. (2023). Automatic Creation of a Pharmaceutical Corpus Based on Open-Data. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_31
DOI: https://doi.org/10.1007/978-3-031-24337-0_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science, Computer Science (R0)