Abstract
Characterizing data is an open problem that has received growing attention in machine learning research over the last decades. Researchers have defined data complexity measures to identify the characteristics of a dataset and to assess its fitness for purpose. The presence of class overlap in datasets significantly affects classifier performance, and data complexity measures provide quantitative insight into the quality of a dataset and the overlap present in it. Machine learning techniques have also been applied by several researchers to healthcare datasets and to software defect prediction. In this paper, our aim is to evaluate the effectiveness of a new overlap measure, the Near Enemy Ratio, and its effect on complexity measures and on classifier performance. The new ratio is based on the instances nearest to a target instance. The experimental results offer insight into the usefulness of the method and help decide whether this solution should be applied to a particular dataset or not.
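To make the idea concrete, the sketch below illustrates one plausible way such a nearest-neighbour overlap score could be computed. The abstract does not give the exact formula for the Near Enemy Ratio, so this is an assumption rather than the authors' implementation: the score is taken here as the fraction of each instance's k nearest neighbours that carry a different class label ("near enemies"), and the function name near_enemy_ratio and the use of scikit-learn's NearestNeighbors are illustrative choices only.

# Minimal sketch of a nearest-neighbour overlap score, assuming the
# Near Enemy Ratio is the fraction of an instance's k nearest
# neighbours that belong to a different class (not the paper's
# verified definition).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def near_enemy_ratio(X, y, k=5):
    """Per-instance fraction of the k nearest neighbours whose class
    label differs from the instance's own label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Query k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbour_labels = y[idx[:, 1:]]          # drop the point itself
    enemies = neighbour_labels != y[:, None]  # True where labels differ
    return enemies.mean(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X0 = rng.normal(0.0, 1.0, size=(50, 2))
    X1 = rng.normal(1.0, 1.0, size=(50, 2))   # deliberately overlapping classes
    X = np.vstack([X0, X1])
    y = np.array([0] * 50 + [1] * 50)
    scores = near_enemy_ratio(X, y, k=5)
    print("mean per-instance overlap score:", scores.mean())

Instances with a high score sit close to the decision boundary or inside the opposite class region, so averaging the score over a dataset gives one quantitative indication of how much class overlap a classifier will have to contend with.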
Data availability
The collected data will be available on request.
Funding
This declaration is not applicable.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethical Approval
This declaration is not applicable.
Conflict of Interest
The authors do not have any competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances in Machine Vision and Augmented Intelligence” guest edited by Manish Kumar Bajpai, Ranjeet Kumar, Koushlendra Kumar Singh and George Giakos.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gupta, S., Richa, Kumar, R. et al. Combat with Class Overlapping in Software Defect Prediction Using Neighbourhood Metric. SN COMPUT. SCI. 4, 695 (2023). https://doi.org/10.1007/s42979-023-02082-8