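Application of Gray Wolf Optimization Algorithm on Synchronous Processing of Sample Equalization and Feature Selection in Credit Evaluation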

Computer Science, 2022, Vol. 49, Issue (4): 134-139. doi: 10.11896/jsjkx.210300075

• Database & Big Data & Data Science

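CHU An-qi, DING Zhi-jun

  1. Key Laboratory of Embedded System and Service Computing of Ministry of Education (Tongji University), Shanghai 201804, China; Shanghai Electronic Transactions and Information Service Collaborative Innovation Center (Tongji University), Shanghai 201804, China
  • Received: 2021-03-08 Revised: 2021-07-14 Published: 2022-04-01
  • Corresponding author: DING Zhi-jun (dingzj@tongji.edu.cn)
  • About the author: (1933013@tongji.edu.cn)
  • Supported by:
    This work was supported by the Shanghai Science and Technology Innovation Action Plan (19511101300).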

Abstract: With the rapid development of the Internet finance industry, traditional credit risk evaluation faces new challenges from massive data. Class imbalance among samples and high feature redundancy have become the key factors limiting the classification accuracy of current credit evaluation. To address these problems, this paper proposes a method based on the gray wolf optimization (GWO) algorithm that performs sample undersampling and feature selection synchronously. The method uses classifier performance as the heuristic information for GWO, which then searches for the optimal combination of samples and features; a tabu list strategy is introduced into the original GWO to keep the algorithm from falling into local optima. Experimental results show that the proposed method improves considerably on other methods, and its performance on different datasets demonstrates that it can effectively address sample imbalance, reduce the dimensionality of the feature space, and improve classification accuracy. On credit risk evaluation, its accuracy is about 3% higher than that obtained on the original data, confirming the applicability and advantages of the method in the credit evaluation field.
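The abstract does not give the algorithm's equations, classifier, or parameters, so the following is only a minimal pure-Python sketch of the general idea under stated assumptions. It uses the standard GWO update (Mirjalili et al., 2014): with a decreasing linearly from 2 to 0, each leader L in {alpha, beta, delta} proposes X_L - A*|C*X_L - X| with A = 2a*r1 - a and C = 2*r2, and the new position is the average of the three proposals. Assumed, not taken from the paper: a sigmoid transfer that turns positions into bits; a bit vector that concatenates a keep/drop mask over majority-class samples with a feature mask (minority samples are always kept); balanced accuracy of a nearest-centroid classifier as the fitness; and a fixed-length tabu list of recently visited bit vectors, where a wolf that lands on a tabu solution is re-initialized. All names (`gwo_undersample_select`, `balanced_accuracy`, `make_toy_data`) and the clipping range are hypothetical.

```python
import math
import random
from collections import deque


def binarize(position, rng):
    """Sigmoid transfer (assumed): coordinate x becomes bit 1 with probability 1/(1+e^-x)."""
    return tuple(1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0
                 for x in position)


def balanced_accuracy(train, val, feats):
    """Nearest-centroid classifier on the columns in `feats`; returns the mean
    per-class recall on `val`. `train` and `val` are lists of (vector, label)."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(feats))
        for k, f in enumerate(feats):
            acc[k] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
    hits, totals = {}, {}
    for x, y in val:
        pred = min(centroids, key=lambda c: sum(
            (x[f] - centroids[c][k]) ** 2 for k, f in enumerate(feats)))
        totals[y] = totals.get(y, 0) + 1
        hits[y] = hits.get(y, 0) + (pred == y)
    return sum(hits[y] / totals[y] for y in totals) / len(totals)


def gwo_undersample_select(majority, minority, val, n_feats,
                           n_wolves=8, n_iter=30, tabu_size=100, seed=0):
    """Hypothetical binary GWO over the bit vector [majority mask | feature mask]."""
    rng = random.Random(seed)
    n_maj = len(majority)
    dim = n_maj + n_feats
    tabu = deque(maxlen=tabu_size)  # recently visited bit vectors
    cache = {}                      # fitness memo keyed by bit vector

    def fitness(bits):
        if bits not in cache:
            kept = [s for s, b in zip(majority, bits) if b]
            feats = [f for f in range(n_feats) if bits[n_maj + f]]
            cache[bits] = (balanced_accuracy(minority + kept, val, feats)
                           if kept and feats else 0.0)
        return cache[bits]

    def random_wolf():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    wolves = [random_wolf() for _ in range(n_wolves)]
    leaders = []  # best (fitness, position, bits) seen so far: alpha, beta, delta
    for t in range(n_iter):
        scored = []
        for i in range(n_wolves):
            bits = binarize(wolves[i], rng)
            if bits in tabu:
                # Tabu hit (assumed rule): solution visited recently, restart the wolf.
                wolves[i] = random_wolf()
                bits = binarize(wolves[i], rng)
            tabu.append(bits)
            scored.append((fitness(bits), list(wolves[i]), bits))
        leaders = sorted(leaders + scored, key=lambda s: s[0], reverse=True)[:3]
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        for i in range(n_wolves):
            new_pos = []
            for d in range(dim):
                total = 0.0
                for _, lead, _ in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    total += lead[d] - A * abs(C * lead[d] - wolves[i][d])
                # Average of the leader-guided moves, clipped to keep exp() finite.
                new_pos.append(max(-6.0, min(6.0, total / len(leaders))))
            wolves[i] = new_pos
    best_fit, _, best_bits = leaders[0]
    kept = [j for j in range(n_maj) if best_bits[j]]
    feats = [f for f in range(n_feats) if best_bits[n_maj + f]]
    return best_fit, kept, feats


def make_toy_data(seed=1):
    """Hypothetical toy data: columns 0-1 separate the classes, 2-3 are noise."""
    rng = random.Random(seed)

    def sample(center, label):
        return ([rng.gauss(center, 0.7), rng.gauss(center, 0.7),
                 rng.uniform(-3, 3), rng.uniform(-3, 3)], label)

    majority = [sample(0.0, 0) for _ in range(40)]
    minority = [sample(2.0, 1) for _ in range(8)]
    val = [sample(0.0, 0) for _ in range(20)] + [sample(2.0, 1) for _ in range(20)]
    return majority, minority, val


if __name__ == "__main__":
    majority, minority, val = make_toy_data()
    fit, kept, feats = gwo_undersample_select(majority, minority, val, n_feats=4)
    print(f"balanced accuracy {fit:.3f}, kept {len(kept)}/{len(majority)} "
          f"majority samples, features {feats}")
```

Searching only over majority-class bits is what makes the joint search an undersampling step, and encoding samples and features in one vector lets a single fitness evaluation score both choices together. Balanced accuracy is used here instead of plain accuracy because plain accuracy rewards ignoring the minority class; the paper's actual fitness may differ.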

Key words: Credit evaluation, Feature selection, Gray wolf optimization algorithm, Sample imbalance

CLC number: TP3-05