Abstract
Scientific software often involves many input and output variables. Identifying these variables is important for software engineering tasks such as metamorphic testing. To reduce the manual work involved, this paper reports our investigation of machine learning algorithms for classifying variables from the software’s user manuals. We identify thirteen natural-language features and use them to develop a multi-layer solution: the first layer distinguishes variables from non-variables, and the second layer classifies the variables into input and output types. Our experimental results on three scientific software systems show that a random forest best implements the first layer and a feedforward neural network best implements the second.
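The layered design described in the abstract can be illustrated with a minimal sketch. It shows only the cascade structure, in which the second layer is consulted solely for candidates that the first layer accepts. The class name `TwoLayerClassifier`, the two-element feature vectors, and the threshold rules `toy_layer1` and `toy_layer2` are hypothetical stand-ins, not the paper's thirteen features or its trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# One feature vector per candidate term. The paper uses thirteen
# natural-language features; they are not listed in the abstract, so the
# two-element vectors below are placeholders (an assumption).
Features = Sequence[float]


@dataclass
class TwoLayerClassifier:
    """Cascade: layer 1 filters out non-variables, layer 2 assigns the I/O type."""

    is_variable: Callable[[Features], bool]  # layer 1 (a random forest in the paper)
    io_type: Callable[[Features], str]       # layer 2 (a feedforward network in the paper)

    def classify(self, x: Features) -> str:
        # Layer 2 is consulted only for candidates that layer 1 accepts.
        if not self.is_variable(x):
            return "non-variable"
        return self.io_type(x)

    def classify_all(self, rows: List[Features]) -> List[str]:
        return [self.classify(x) for x in rows]


# Hypothetical threshold rules so the sketch runs without trained models.
def toy_layer1(x: Features) -> bool:
    return x[0] > 0.5


def toy_layer2(x: Features) -> str:
    return "input" if x[1] > 0.5 else "output"


model = TwoLayerClassifier(is_variable=toy_layer1, io_type=toy_layer2)
labels = model.classify_all([[0.9, 0.8], [0.9, 0.1], [0.2, 0.7]])
print(labels)  # ['input', 'output', 'non-variable']
```

In practice the two callables would wrap trained models; for instance, a fitted scikit-learn `RandomForestClassifier` and `MLPClassifier` could each be adapted through their `predict` methods, although the abstract does not state which library or configuration was used.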
Acknowledgments
We thank the EPA SWMM team, especially Michelle Simon, for the research collaborations. We also thank the anonymous reviewers for their constructive comments.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Peng, Z., Lin, X., Santhoshkumar, S.N., Niu, N., Kanewala, U. (2022). Learning I/O Variables from Scientific Software’s User Manuals. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13353. Springer, Cham. https://doi.org/10.1007/978-3-031-08760-8_42
DOI: https://doi.org/10.1007/978-3-031-08760-8_42
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08759-2
Online ISBN: 978-3-031-08760-8
eBook Packages: Computer Science, Computer Science (R0)