Learning I/O Variables from Scientific Software’s User Manuals

Peng, Zedong; Lin, Xuanyi; Santhoshkumar, Sreelekhaa Nagamalli; Niu, Nan; Kanewala, Upulee

doi:10.1007/978-3-031-08760-8_42

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13353)

Included in the following conference series:

International Conference on Computational Science

Abstract

Scientific software often involves many input and output variables. Identifying these variables is important for software engineering tasks such as metamorphic testing. To reduce the manual work involved, this paper reports our investigation of machine learning algorithms for classifying variables from the software’s user manuals. We identify thirteen natural-language features and use them to develop a multi-layer solution in which the first layer distinguishes variables from non-variables and the second layer classifies the variables into input and output types. Our experimental results on three scientific software systems show that a random forest best implements the first layer and a feedforward neural network best implements the second.
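The two-layer arrangement described in the abstract can be sketched as a simple cascade. This is only an illustrative outline, not the authors' implementation: the class name `TwoLayerClassifier`, the two toy rule functions, and the two-element feature vectors are all hypothetical stand-ins. In the paper, the layers would be trained models (a random forest and a feedforward neural network) over thirteen natural-language features, which the abstract does not enumerate.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# One candidate term from a user manual, encoded as numeric features.
# The paper uses thirteen features; two made-up ones are used here.
Features = Sequence[float]


@dataclass
class TwoLayerClassifier:
    """Cascade: layer 1 separates variables from non-variables,
    layer 2 labels the surviving variables as input or output."""

    is_variable: Callable[[Features], bool]  # layer 1 (a trained model in practice)
    is_input: Callable[[Features], bool]     # layer 2 (a trained model in practice)

    def classify(self, x: Features) -> str:
        # Layer 2 is consulted only for terms that layer 1 accepts.
        if not self.is_variable(x):
            return "non-variable"
        return "input" if self.is_input(x) else "output"


# Hypothetical placeholder rules so the sketch runs without training data.
def toy_layer1(x: Features) -> bool:
    return x[0] > 0.5


def toy_layer2(x: Features) -> bool:
    return x[1] <= 0.5


model = TwoLayerClassifier(is_variable=toy_layer1, is_input=toy_layer2)
candidates = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
labels = [model.classify(x) for x in candidates]
print(labels)
```

Keeping the layers as separate callables means each can be swapped independently, which matches the abstract's finding that different algorithms suit different layers.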

Acknowledgments

We thank the EPA SWMM team, especially Michelle Simon, for the research collaborations. We also thank the anonymous reviewers for their constructive comments.

Author information

Authors and Affiliations

University of Cincinnati, Cincinnati, OH, 45221, USA
Zedong Peng, Sreelekhaa Nagamalli Santhoshkumar & Nan Niu
Oracle America, Inc., Redwood Shores, CA, 94065, USA
Xuanyi Lin
University of North Florida, Jacksonville, FL, 32224, USA
Upulee Kanewala

Corresponding author

Correspondence to Nan Niu.

Editor information

Editors and Affiliations

Brunel University London, London, UK
Derek Groen
University of Amsterdam, Amsterdam, The Netherlands
Clélia de Mulatier
AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M. A. Sloot

About this paper

Cite this paper

Peng, Z., Lin, X., Santhoshkumar, S.N., Niu, N., Kanewala, U. (2022). Learning I/O Variables from Scientific Software’s User Manuals. In: Groen, D., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M.A. (eds) Computational Science – ICCS 2022. ICCS 2022. Lecture Notes in Computer Science, vol 13353. Springer, Cham. https://doi.org/10.1007/978-3-031-08760-8_42

DOI: https://doi.org/10.1007/978-3-031-08760-8_42
Published: 15 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08759-2
Online ISBN: 978-3-031-08760-8
eBook Packages: Computer Science, Computer Science (R0)
