计算机科学 (Computer Science), 2019, Vol. 46, Issue (10): 7-13. doi: 10.11896/jsjkx.181102216
杨德杰1, 章宁1, 袁戟2, 白璐1
YANG De-jie1, ZHANG Ning1, YUAN Ji2, BAI Lu1
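Abstract: Personal credit has long been the most important factor banks use to measure an individual's risk of failing to meet repayment obligations. In recent years, borrowing demand in China has grown steadily, and traditional personal credit assessment based solely on credit card information can no longer fully meet the needs of the banking industry. To build richer user credit profiles, this paper therefore extracts credit-risk assessment features from bank big data. Financial big data bring the curse of dimensionality and noise. To address both, and taking full account of the correlations among data features, the paper improves the stacked denoising autoencoder neural network by introducing a truncated Karhunen-Loève expansion as the noise input term. It then runs a series of experiments on the big data platform of a commercial bank. The results show that, compared with using credit card information alone, using bank big data raises the K-S value, an index of the separation between positive and negative samples, by about 11%. The improved stacked denoising autoencoder also gives better risk assessment, with accuracy about 3% higher than the original model. These results confirm the effectiveness of credit risk assessment in a bank big data environment.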
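The abstract does not spell out how the truncated Karhunen-Loève (KL) expansion enters the stacked denoising autoencoder. The following is a minimal sketch, assuming the expansion is taken over the covariance (or correlation) matrix of the d input features. It keeps the m largest eigenpairs (lambda_k, phi_k) and draws correlated noise e = sum_{k=1..m} sqrt(lambda_k) * xi_k * phi_k with xi_k ~ N(0, 1) i.i.d. It then feeds x_tilde = x + e to the autoencoder in place of independent masking or Gaussian noise. The discrete covariance matrix, the additive corruption, the truncation order m, the scale factor and the function names (`jacobi_eigh`, `kl_basis`, `kl_noise`, `corrupt`) are illustrative assumptions, not the authors' published implementation. The sketch uses only the standard library; in practice `numpy.linalg.eigh` would replace the hand-written Jacobi solver.

```python
# Sketch under stated assumptions: KL-based correlated corruption for a
# denoising autoencoder. The autoencoder itself is omitted.
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Basis = List[Tuple[float, List[float]]]  # (sqrt(lambda_k), phi_k) pairs


def jacobi_eigh(a: Matrix, max_sweeps: int = 100, tol: float = 1e-12) -> Tuple[List[float], Matrix]:
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, V); column k of V is the unit eigenvector for eigenvalues[k].
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def kl_basis(cov: Matrix, m: int) -> Basis:
    """Keep the m leading KL terms as (sqrt(lambda_k), phi_k), largest lambda first."""
    if m < 1:
        raise ValueError("truncation order m must be >= 1")
    eigvals, eigvecs = jacobi_eigh(cov)
    n = len(cov)
    leading = sorted(range(n), key=lambda k: eigvals[k], reverse=True)[:m]
    # max(..., 0) clips tiny negative eigenvalues caused by round-off.
    return [(math.sqrt(max(eigvals[k], 0.0)), [eigvecs[i][k] for i in range(n)]) for k in leading]


def kl_noise(basis: Basis, rng: random.Random, scale: float = 1.0) -> List[float]:
    """Draw e = scale * sum_k sqrt(lambda_k) * xi_k * phi_k with xi_k ~ N(0, 1) i.i.d."""
    noise = [0.0] * len(basis[0][1])
    for root_lam, phi in basis:
        amplitude = scale * root_lam * rng.gauss(0.0, 1.0)
        for i, phi_i in enumerate(phi):
            noise[i] += amplitude * phi_i
    return noise


def corrupt(x: List[float], basis: Basis, rng: random.Random, scale: float = 1.0) -> List[float]:
    """Hypothetical additive corruption x_tilde = x + e used as the autoencoder input."""
    e = kl_noise(basis, rng, scale)
    return [xi + ei for xi, ei in zip(x, e)]


if __name__ == "__main__":
    corr = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.4], [0.3, 0.4, 1.0]]  # toy feature correlations
    basis = kl_basis(corr, m=2)  # decompose once, sample many times
    rng = random.Random(42)
    print(corrupt([0.5, -1.2, 0.7], basis, rng))
```

Splitting the work into `kl_basis` (one eigendecomposition) and `kl_noise` (one cheap draw per training sample) keeps the per-sample cost at O(m*d). Because the noise follows the leading eigenvectors, strongly correlated features are perturbed together rather than independently.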
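The K-S value cited in the abstract is the Kolmogorov-Smirnov separation measure common in credit scoring. It is the largest gap between the cumulative score distributions of positive (defaulting) and negative samples, equivalently the largest |TPR - FPR| over all score thresholds. The following is a minimal standard-library sketch; the function name `ks_statistic` and the label convention (1 = positive/default, 0 = negative) are assumptions, not taken from the paper. On the two score groups it should agree with the statistic returned by `scipy.stats.ks_2samp`.

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """K-S value: max over thresholds t of |F_bad(t) - F_good(t)|.

    labels: 1 = bad (positive, e.g. default), 0 = good (negative).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_bad = sum(1 for y in labels if y == 1)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one positive and one negative sample")
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all tied scores so a cut-off only falls between distinct values.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


print(ks_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # prints 0.5
```

A value of 1.0 means the model's scores separate the two classes perfectly, and 0.0 means the two score distributions are indistinguishable at every threshold.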