Abstract
Current approaches to natural language processing of requirements documents restrict their input to documents that are relevant to one specific type of model only, such as domain- or process-focused models. Such restricted input texts do not reflect real-world requirements documents. To address this issue, we propose a pipeline that preprocesses requirements documents at the conceptual level for the subsequent automatic generation of class, activity, and use case models in the Unified Modelling Language (UML). Our pipeline consists of three steps. First, we implement entity-based extractive summarization of the raw text to highlight the parts of the requirements that are of interest to the modelling goal. Second, we develop a rule-based bucketing method that sorts sentences into a range of ‘buckets’, each of which is transformed into its corresponding UML model. Finally, to demonstrate the effectiveness of supervised machine learning models on requirements texts, we apply a sequence labelling model to the text specific to class modelling to distinguish classes and attributes in the running text. To enable this step of our pipeline, we address the lack of available annotated data by labelling the widely used PURE requirements dataset at the word level, tagging classes and attributes within the texts. We validate our findings using this extended dataset.
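As an illustration of the second step, the following is a minimal sketch of what rule-based bucketing of sentences could look like. The abstract does not state the actual rules, so the cue patterns, the bucket names, and the function `bucket_sentences` are all hypothetical and chosen only to show the general idea; they should not be read as the method used in the paper.

```python
import re

# Hypothetical cue patterns per bucket. These are illustrative assumptions,
# not the rules from the paper.
BUCKET_CUES = {
    # Structural statements that hint at classes and attributes.
    "class": re.compile(r"\b(has|have|contains?|consists? of|is an?)\b", re.I),
    # Temporal or conditional markers that hint at process flow.
    "activity": re.compile(r"\b(then|after|before|when|once|if)\b", re.I),
    # Capability phrasing that hints at actor goals.
    "use_case": re.compile(r"\b(shall be able to|must be able to|can|may)\b", re.I),
}


def bucket_sentences(sentences):
    """Assign each sentence to every bucket whose cue pattern it matches."""
    buckets = {name: [] for name in BUCKET_CUES}
    for sentence in sentences:
        for name, pattern in BUCKET_CUES.items():
            if pattern.search(sentence):
                buckets[name].append(sentence)
    return buckets


if __name__ == "__main__":
    example = [
        "A customer has a name and an address.",
        "After the order is paid, the system sends a confirmation.",
        "The user can cancel a booking.",
    ]
    for name, selected in bucket_sentences(example).items():
        print(name, selected)
```

In this sketch a sentence may land in more than one bucket, and a sentence that matches no cue is dropped. Whether the pipeline described here makes the same choices is not stated in the abstract.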
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Schouten, M.B.J., Ramackers, G.J., Verberne, S. (2022). Preprocessing Requirements Documents for Automatic UML Modelling. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_17
DOI: https://doi.org/10.1007/978-3-031-08473-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08472-0
Online ISBN: 978-3-031-08473-7
eBook Packages: Computer Science, Computer Science (R0)