Abstract
Credit scoring models are developed to strengthen the decision-making process of financial institutions when assessing the risk associated with a candidate applying for a new credit product. Ensemble learning is a powerful approach for approximating an ideal classifier: it aggregates several models to obtain better outcomes than any individual model. Several studies have shown that heterogeneous ensemble models achieve superior classification performance compared with existing machine learning models, and even a small improvement in predictive performance can translate into substantial revenue savings for a financial institution. Owing to these inherent properties, ensemble learning provides higher stability and accuracy and thereby improves the effectiveness of credit scoring models. Accordingly, this study presents a comprehensive comparative analysis of nine ensemble learning approaches, such as MultiBoost, Cross-Validation Parameter Selection, Random Subspace and MetaCost, combined with five classification approaches: Partial Decision Tree (PART), Radial Basis Function Network (RBFN), Logistic Regression (LR), Naive Bayes Decision Tree (NBT) and Sequential Minimal Optimization (SMO). These classifiers are arranged in single-layer and multi-layer ensemble frameworks with several aggregation strategies, including Majority Voting, Average Probability, Maximum Probability, Unanimous Voting and Weighted Voting. Further, the study examines the impact of various combinations of classification and ensemble approaches on six benchmark credit scoring datasets.
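To make the ensemble setup concrete, the sketch below illustrates a heterogeneous ensemble combined by majority and weighted voting. It is a minimal illustration only, using scikit-learn analogues of the base learners named above (not the Weka implementations evaluated in the study) and a synthetic dataset standing in for a real credit scoring benchmark.

```python
# Illustrative sketch (not the study's exact setup): a heterogeneous ensemble
# aggregated by majority voting and weighted soft voting, with scikit-learn
# analogues of the base classifiers named in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a credit scoring dataset (good vs. bad applicants).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),       # Logistic Regression
    ("nb", GaussianNB()),                            # Naive Bayes (NBT analogue)
    ("dt", DecisionTreeClassifier(max_depth=5)),     # tree/rule learner (PART analogue)
    ("svm", SVC(kernel="rbf", probability=True)),    # SVM trained a la SMO
]

# Majority voting: each base model casts one vote per applicant.
majority = VotingClassifier(estimators=base_learners, voting="hard")
# Weighted voting: average predicted class probabilities with per-model weights
# (weights here are arbitrary, purely for illustration).
weighted = VotingClassifier(estimators=base_learners, voting="soft",
                            weights=[2, 1, 1, 2])

for name, ensemble in [("majority voting", majority), ("weighted voting", weighted)]:
    ensemble.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, ensemble.predict(X_te)))
```

A multi-layer variant would feed the outputs of such a first-layer ensemble into a second aggregation stage; the single-layer version above is only meant to show how the base classifiers and voting rules fit together.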
Author information
Contributions
A. K. Shukla, B. R. Reddy, G. S. Bopche, D. Chandramohan: These authors contributed equally to this work.
About this article
Cite this article
Tripathi, D., Shukla, A.K., Reddy, B.R. et al. Credit Scoring Models Using Ensemble Learning and Classification Approaches: A Comprehensive Survey. Wireless Pers Commun 123, 785–812 (2022). https://doi.org/10.1007/s11277-021-09158-9