Computer Science (计算机科学), 2021, Vol. 48, Issue (11A): 71-76. doi: 10.11896/jsjkx.210200110
王茂光, 杨行
WANG Mao-guang, YANG Hang
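Abstract: In recent years, numerous risk-control problems have emerged in the online lending sector of Internet finance. To address them, this paper preprocesses risk-control data indicators with several feature selection methods, builds a comprehensive risk-control indicator system for enterprise credit, and uses a stacking ensemble strategy to study a credit risk model based on AP-Entropy. The credit risk model has two layers of learners and adopts the idea of selective ensemble, screening base learners by both type and number. First, from classical machine learning algorithms such as Logistic regression, back-propagation neural networks, and AdaBoost, the AP (affinity propagation) clustering algorithm selects heterogeneous learners suited to enterprise credit risk as base learners. Second, in each learner iteration, entropy is used to choose among the learners, automatically selecting the base learner with the highest F1 score; the entropy-based learner selection algorithm is improved, which raises the efficiency of base learner selection and lowers the model's computational cost. The model uses XGBoost as the second-level learner. Experimental results show that the proposed model achieves better learning performance and stronger generalization ability than the other models compared.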
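The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the entropy criterion or the AP clustering inputs, so the sketch assumes the cluster assignments are already available (hard-coded in the hypothetical `cluster_of` mapping) and simply keeps the highest-F1 learner from each cluster. The learner names, toy labels, and the helpers `f1_score`, `select_base_learners`, and `meta_features` are all illustrative assumptions.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for labels in {0, 1}; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_base_learners(y_true, predictions, clusters):
    """Keep the highest-F1 learner from each cluster of similar learners.

    predictions: learner name -> predictions on a validation set.
    clusters: learner name -> cluster id (assumed to come from an earlier
    clustering step such as AP; hard-coded in the example below).
    """
    best = {}
    for name, y_pred in predictions.items():
        score = f1_score(y_true, y_pred)
        cid = clusters[name]
        if cid not in best or score > best[cid][1]:
            best[cid] = (name, score)
    return sorted(name for name, _ in best.values())


def meta_features(predictions, selected):
    """Stack the selected learners' outputs column-wise as second-layer inputs."""
    columns = [predictions[name] for name in selected]
    return [list(row) for row in zip(*columns)]


# Toy validation data (hypothetical, for illustration only).
y_val = [1, 0, 1, 1, 0, 0, 1, 0]
preds = {
    "logistic": [1, 0, 1, 0, 0, 0, 1, 0],
    "bp_net":   [1, 0, 0, 0, 0, 1, 1, 0],
    "adaboost": [1, 1, 1, 1, 0, 0, 1, 0],
}
# Assumed cluster assignments: the first two learners behave similarly.
cluster_of = {"logistic": 0, "bp_net": 0, "adaboost": 1}

selected = select_base_learners(y_val, preds, cluster_of)
X_meta = meta_features(preds, selected)
```

In a two-layer stacking setup, the rows of `X_meta` would serve as the training inputs for the second-level learner (XGBoost in the paper); that training step is omitted here.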