Preprocessing Requirements Documents for Automatic UML Modelling | SpringerLink

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2022)

Abstract

Current approaches to natural language processing of requirements documents restrict their input to documents that are relevant only to specific types of models, such as domain- or process-focused models. Such input texts do not reflect real-world requirements documents. To address this issue, we propose a pipeline that preprocesses real-world requirements documents at the conceptual level, in preparation for the automatic generation of class, activity, and use case models in the Unified Modelling Language (UML). Our pipeline consists of three steps. First, we implement entity-based extractive summarization of the raw text to highlight the parts of the requirements that are relevant to the modelling goal. Second, we develop a rule-based bucketing method that assigns sentences to a range of ‘buckets’, each corresponding to the UML model into which they will be transformed. Finally, to demonstrate the effectiveness of supervised machine learning models on requirements texts, we apply a sequence labelling model to the text selected for class modelling to distinguish classes and attributes in the running text. To enable this step, we address the lack of available annotated data by labelling the widely used PURE requirements dataset at the word level, tagging classes and attributes within the texts. We validate our findings using this extended dataset.
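The abstract does not specify the summarization scoring or the bucketing rules, so the following is only a minimal Python sketch of how the first two steps could fit together, under stated assumptions: the entities of interest are supplied by the caller rather than recognised automatically, sentences are scored by how many whole-word entity mentions they contain, and each bucket is triggered by a hypothetical list of cue patterns (`BUCKET_CUES`). None of these names, cues, or scoring choices come from the paper.

```python
import re

# Hypothetical cue patterns per UML bucket; illustrative only, not the paper's rules.
BUCKET_CUES = {
    "class": [r"\bhas\b", r"\bhave\b", r"\bconsists? of\b", r"\bcontains?\b", r"\bis an?\b"],
    "activity": [r"\bthen\b", r"\bafter\b", r"\bbefore\b", r"\bonce\b", r"\bfirst\b"],
    "use_case": [r"\bshall be able to\b", r"\bcan\b", r"\ballows?\b"],
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text: str, entities: set[str], ratio: float = 0.5) -> list[str]:
    """Entity-based extractive summary (assumed scoring): keep the sentences that
    mention the most entities of interest, returned in their original order."""
    sentences = split_sentences(text)
    scores = []
    for sentence in sentences:
        lowered = sentence.lower()
        # Count whole-word mentions of each entity term in this sentence.
        scores.append(sum(
            len(re.findall(r"\b" + re.escape(e.lower()) + r"\b", lowered))
            for e in entities
        ))
    k = max(1, round(len(sentences) * ratio))
    # Rank by score (ties broken by position), keep the top k, restore document order.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top) if scores[i] > 0]


def bucket(sentences: list[str]) -> dict[str, list[str]]:
    """Rule-based bucketing: put each sentence into every bucket whose cues it matches."""
    buckets = {name: [] for name in BUCKET_CUES}
    buckets["unassigned"] = []
    for sentence in sentences:
        hits = [name for name, cues in BUCKET_CUES.items()
                if any(re.search(cue, sentence, re.IGNORECASE) for cue in cues)]
        for name in hits or ["unassigned"]:
            buckets[name].append(sentence)
    return buckets
```

For example, `bucket(summarize(text, {"library", "book", "member"}, ratio=0.75))` first drops sentences that mention none of the given entities and then routes the rest by cue. In this sketch a sentence may land in several buckets, and sentences that match no cue are kept in `unassigned` rather than discarded, so they remain available for review.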

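For the third step, the abstract states only that a supervised sequence labelling model tags classes and attributes at the word level; it does not name the model or the tag scheme. The sketch below therefore assumes a BIO tag set (`B-CLASS`, `I-CLASS`, `B-ATTR`, `I-ATTR`, `O`) and, in place of a real sequence labeller, uses a most-frequent-tag baseline trained on two hypothetical annotated sentences (`TRAIN`). Its purpose is to illustrate the word-level data format and how predicted tags become class and attribute candidates, not to reproduce the paper's model.

```python
from collections import Counter, defaultdict

# Hypothetical word-level annotations in an assumed BIO scheme (not the paper's data).
TRAIN = [
    (["The", "library", "stores", "the", "title", "of", "each", "book", "."],
     ["O", "B-CLASS", "O", "O", "B-ATTR", "O", "O", "B-CLASS", "O"]),
    (["A", "member", "has", "a", "membership", "number", "."],
     ["O", "B-CLASS", "O", "O", "B-ATTR", "I-ATTR", "O"]),
]


def train_baseline(data):
    """Most-frequent-tag baseline: map each lowercased word to its commonest tag."""
    counts = defaultdict(Counter)
    for words, tags in data:
        for word, tag in zip(words, tags):
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def predict(model, words):
    """Tag each word independently; words never seen in training get 'O'."""
    return [model.get(word.lower(), "O") for word in words]


def decode(words, tags):
    """Collapse BIO tags into (label, phrase) spans, e.g. ('ATTR', 'membership number')."""
    spans, label, tokens = [], None, []
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and tag[2:] == label:
            tokens.append(word)  # continue the open span
            continue
        if label is not None:
            spans.append((label, " ".join(tokens)))  # close the previous span
        if tag == "O":
            label, tokens = None, []
        else:
            # A B- tag, or an I- tag without a matching open span, starts a new span.
            label, tokens = tag[2:], [word]
    if label is not None:
        spans.append((label, " ".join(tokens)))
    return spans
```

Because this baseline ignores context, a word such as `title` always receives the same tag wherever it appears; a genuine sequence labelling model conditions on neighbouring words, which is what makes it suitable for running requirements text.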
Author information

Corresponding author

Correspondence to Martijn B. J. Schouten.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Schouten, M.B.J., Ramackers, G.J., Verberne, S. (2022). Preprocessing Requirements Documents for Automatic UML Modelling. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_17

  • DOI: https://doi.org/10.1007/978-3-031-08473-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08472-0

  • Online ISBN: 978-3-031-08473-7

  • eBook Packages: Computer Science, Computer Science (R0)
