Evaluating the Extraction of Toxicological Properties with Extractive Question Answering

  • Conference paper
Engineering Applications of Neural Networks (EANN 2023)

Abstract

Preparing a toxicological analysis of chemical substances is a time-consuming process that requires a safety advisor to search text documents from multiple sources for information on several properties and experiments. There has been growing interest in using Machine Learning (ML) approaches, specifically Natural Language Processing (NLP) techniques, to improve human-machine integration in processes across different areas. In this paper, we explore this integration in toxicological analysis. To minimise the effort of preparing such analyses, we explore several available neural network models tuned for Extractive Question Answering (BERT, RoBERTa, BioBERT, ChemBERT) for retrieving toxicological properties from sections of the source documents. This formulation of Information Extraction as a targeted Question Answering task can be considered a more flexible and scalable alternative to manually creating a (limited) set of extraction patterns or even training a model for chemical relation extraction. The proposed approach was tested on a set of eight properties, each containing multiple fields, in a sample of 33 reports for which golden answers were provided by a safety advisor. Compared to the golden answers, the best model tested achieved a BLEU score of 0.55. When responses from different models are combined, BLEU increases to 0.59. Our results suggest that, while this approach cannot yet be fully automated, it can be useful in supporting a safety advisor’s decisions and in reducing time and manual effort.
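
To make the evaluation step more concrete, the following is a minimal sketch of how an extracted answer might be scored against a golden answer with BLEU. It is not the authors' evaluation code: the function names (`ngrams`, `bleu`), the whitespace tokenisation, the absence of smoothing, and the example strings are all assumptions made for illustration, and the abstract does not state which BLEU variant or n-gram order was used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU with uniform n-gram weights.

    Assumptions: lowercased whitespace tokens and no smoothing, so the
    score is 0.0 as soon as one n-gram order has no match.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram counts at most as often
        # as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: lowers the score of answers shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical model answer and golden answer (not taken from the paper).
score = bleu("2000 mg/kg bw", "> 2000 mg/kg bw", max_n=2)
print(f"BLEU-2: {score:.3f}")
```

In this hypothetical case every unigram and bigram of the extracted span appears in the golden answer, so only the brevity penalty lowers the score. Short answers such as single values are common in this setting, which is why the sketch exposes `max_n`: with the default order of 4 and no smoothing, any answer shorter than four tokens would score 0.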

Author information

Corresponding author

Correspondence to Bruno Carlos Luís Ferreira.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ferreira, B.C.L., Oliveira, H.G., Amaro, H., Laranjeiro, Â., Silva, C. (2023). Evaluating the Extraction of Toxicological Properties with Extractive Question Answering. In: Iliadis, L., Maglogiannis, I., Alonso, S., Jayne, C., Pimenidis, E. (eds) Engineering Applications of Neural Networks. EANN 2023. Communications in Computer and Information Science, vol 1826. Springer, Cham. https://doi.org/10.1007/978-3-031-34204-2_48

  • DOI: https://doi.org/10.1007/978-3-031-34204-2_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34203-5

  • Online ISBN: 978-3-031-34204-2

  • eBook Packages: Computer Science (R0)