Computer Science, 2022, Vol. 49, Issue (4): 152-160. doi: 10.11896/jsjkx.210300094
SUN Lin 1,2, HUANG Miao-miao 1,3, XU Jiu-cheng 1,2
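Abstract: In multilabel learning and classification, existing neighborhood rough set-based feature selection algorithms that take the classification margin of samples as the neighborhood radius suffer from several problems: an overly large margin makes the classification meaningless, overly large distances between samples tend to invalidate the heterogeneous and same-class samples, and weak-label data cannot be handled. To address these problems, a weak-label feature selection method based on multilabel neighborhood rough sets and multilabel Relief is proposed. First, the numbers of heterogeneous and same-class samples are introduced to improve the classification margin. On this basis, the neighborhood radius is defined, and a new neighborhood approximation accuracy and a multilabel neighborhood rough set model are constructed, which effectively measure the uncertainty of sets caused by the boundary region. Second, an iteratively updated weight formula is used to fill in most of the missing label information, and the neighborhood approximation accuracy is combined with mutual information to construct a new label correlation that fills in the remaining missing labels. Third, the numbers of heterogeneous and same-class samples are used to construct new formulas for label weights and feature weights, from which a multilabel Relief model is developed and applied to multilabel feature selection. Finally, by combining the multilabel neighborhood rough set model with the multilabel Relief algorithm, a new weak-label feature selection algorithm is designed to handle high-dimensional data with missing labels and to effectively improve multilabel classification performance. Simulation experiments on 11 public multilabel datasets verify the effectiveness of the proposed weak-label feature selection algorithm.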
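To make the notion of neighborhood approximation accuracy concrete, the following minimal sketch computes the classical neighborhood rough set lower and upper approximations of one target set and their ratio. It is not the improved model proposed in the paper: the fixed radius `delta`, the Euclidean distance, the function name `neighborhood_approximation_accuracy`, and the convention of returning 0.0 for an empty upper approximation are all assumptions made for illustration, whereas the paper derives the radius from an improved classification margin.

```python
import numpy as np


def neighborhood_approximation_accuracy(X, target, delta):
    """Classical neighborhood rough set approximations of one target set.

    X      : (n_samples, n_features) array of feature values
    target : boolean array marking the samples that carry a given label
    delta  : fixed neighborhood radius (an assumption of this sketch)
    Returns (lower_mask, upper_mask, accuracy).
    """
    X = np.asarray(X, dtype=float)
    target = np.asarray(target, dtype=bool)

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # neigh[i, j] is True when sample j lies in the neighborhood of sample i.
    neigh = dist <= delta

    # Lower approximation: every neighbor of the sample is in the target set.
    lower = np.all(~neigh | target[None, :], axis=1)
    # Upper approximation: at least one neighbor is in the target set.
    upper = np.any(neigh & target[None, :], axis=1)

    n_upper = int(upper.sum())
    # Assumed convention: accuracy is 0.0 when the upper approximation is empty.
    accuracy = float(lower.sum()) / n_upper if n_upper else 0.0
    return lower, upper, accuracy


if __name__ == "__main__":
    X = [[0.0], [0.1], [0.5], [0.9], [1.0]]
    target = [True, True, False, False, False]
    lower, upper, acc = neighborhood_approximation_accuracy(X, target, delta=0.45)
    print(lower, upper, acc)
```

Samples in the upper approximation but not in the lower one form the boundary region; the larger that region, the lower the accuracy and the more uncertain the target set is under the chosen radius.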