计算机科学, 2020, Vol. 47, Issue (8): 171-177. doi: 10.11896/jsjkx.190600150
刘凌云, 钱辉, 邢红杰, 董春茹, 张峰
LIU Ling-yun, QIAN Hui, XING Hong-jie, DONG Chun-ru, ZHANG Feng
Abstract: In the era of big data, information grows continuously and explosively, supplying machine learning algorithms with large numbers of supervised samples. However, this information typically does not arrive all at once, and the labels it carries are often inaccurate, which challenges traditional classification models; incremental learning is an important way to meet this challenge. In incremental learning, however, the order in which samples are labeled strongly affects classifier performance. Especially when the classifier is still weak, traditional incremental learning methods tend to add noisy data to the training set too early, harming classification accuracy. To solve this problem, this paper proposes an incremental classification model based on the Q-learning algorithm. The model uses the classical Q-learning algorithm from reinforcement learning to select a reasonable sample increment sequence, weakening the negative influence of noisy data and allowing samples to be labeled autonomously during learning. In addition, to address the exponential growth in computational complexity and storage that results from the enlarged state and action spaces of Q-learning when the newly added unlabeled sample set is large, the paper further presents a batch incremental classification model, which effectively reduces computational complexity and saves storage space. The Q-learning-based incremental classification model combines the ideas of incremental learning and reinforcement learning, and offers high classification accuracy and strong real-time performance. Finally, experiments on three UCI datasets verify the effectiveness of the proposed model. The results show that selecting the newly added training set does help improve classifier accuracy, and that classifiers trained with different increment sequences differ considerably in accuracy. The Q-learning-based incremental classification model can perform initial training with the small amount of supervised information already available, construct incremental training sets by labeling samples autonomously, and improve classifier accuracy in a self-supervised manner. It can therefore be applied to problems where supervised information is scarce, and has practical value.
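To make the selection mechanism concrete, the sketch below shows how classical Q-learning, with the standard update Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)], could drive the choice of which self-labeled batch to add next. It is only a minimal illustration under assumed simplifications, not the authors' implementation: the state is taken to be the number of batches already added, the action is the index of the next batch, the reward is the validation-accuracy gain, and GaussianNB from scikit-learn stands in for an arbitrary base classifier; the function name q_learning_increment and all other identifiers are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

def q_learning_increment(X_init, y_init, X_pool, X_val, y_val,
                         n_batches=5, episodes=30,
                         alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn an order for adding self-labeled batches of X_pool to the
    training set, rewarding each addition by its validation-accuracy gain."""
    rng = np.random.default_rng(seed)
    # Split the unlabeled pool into fixed batches (the batch-incremental view).
    batches = np.array_split(rng.permutation(len(X_pool)), n_batches)
    # State = number of batches already added, action = index of the next batch.
    Q = np.zeros((n_batches + 1, n_batches))
    for _ in range(episodes):
        clf = GaussianNB().fit(X_init, y_init)
        X_tr, y_tr = X_init.copy(), y_init.copy()
        remaining, state = set(range(n_batches)), 0
        acc = accuracy_score(y_val, clf.predict(X_val))
        while remaining:
            acts = sorted(remaining)
            # Epsilon-greedy choice among the batches not yet added.
            if rng.random() < epsilon:
                a = int(rng.choice(acts))
            else:
                a = acts[int(np.argmax(Q[state, acts]))]
            # Self-label the chosen batch with the current classifier.
            X_new = X_pool[batches[a]]
            y_new = clf.predict(X_new)
            X_tr = np.vstack([X_tr, X_new])
            y_tr = np.concatenate([y_tr, y_new])
            clf = GaussianNB().fit(X_tr, y_tr)
            new_acc = accuracy_score(y_val, clf.predict(X_val))
            reward = new_acc - acc  # noisy batches yield low or negative reward
            # Classical Q-learning update.
            Q[state, a] += alpha * (reward + gamma * Q[state + 1].max() - Q[state, a])
            acc, state = new_acc, state + 1
            remaining.remove(a)
    return Q  # a greedy policy over Q gives the increment sequence
```

Reading the learned Q table greedily from state 0 yields an increment sequence in which batches whose pseudo-labels hurt validation accuracy (i.e., noisy batches) are deferred, mirroring the batch-incremental idea described in the abstract.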