Chinese Text Classification Model Based on Improved TF-IDF and ABLCNN

Computer Science, 2021, Vol. 48, Issue (11A): 170-175. doi: 10.11896/jsjkx.210100232

• Intelligent Computing

JING Li, HE Ting-ting   

  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450000, China
  • Online: 2021-11-10 Published: 2021-11-12
  • About author: JING Li, born in 1971, Ph.D, professor, is a member of China Computer Federation. Her main research interests include artificial intelligence and information security.
    HE Ting-ting, born in 1996, postgraduate. Her main research interests include natural language processing and data mining.
  • Supported by:
    National Natural Science Foundation of China (61806073).

Abstract: Text classification, which is widely used in information retrieval, sentiment analysis and other fields, is an important task in natural language processing and an active research topic. Traditional text classification models suffer from incomplete text feature extraction and weak semantic representation. To address this, a text classification model based on an improved TF-IDF algorithm and an attention-based Bi-LSTM and CNN network (ABLCNN) is proposed. First, the TF-IDF algorithm is improved by using the distribution of feature items within and between classes, together with their position information, to highlight the importance of feature items; the text is then represented by word vectors trained with the word2vec tool combined with the improved TF-IDF weights. Next, ABLCNN extracts the text features. ABLCNN combines the advantages of the attention mechanism, the long short-term memory network and the convolutional neural network, so it extracts the main contextual semantic features of the text while also taking local semantic features into account. Finally, the feature vector is classified by the softmax function. The model is tested on the THUCNews dataset and the online_shopping_10_cats dataset. The experimental results show an accuracy of 97.38% on THUCNews and 91.33% on online_shopping_10_cats, both higher than those of the other text classification models compared.
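
The text-representation step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: it uses the standard TF-IDF formula rather than the paper's improved variant (whose class-distribution and position terms are not given on this page), it replaces word2vec vectors with made-up two-dimensional toy vectors, and the function names tf_idf_weights and scale_sequence, as well as the English tokens standing in for segmented Chinese words, are hypothetical.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per tokenized document (standard TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def scale_sequence(doc, doc_weights, embeddings):
    """Scale each token's vector by its TF-IDF weight, keeping token order."""
    return [[doc_weights[term] * x for x in embeddings[term]] for term in doc]


# Toy corpus; the tokens stand in for segmented Chinese words (assumption).
docs = [
    ["phone", "screen", "clear"],
    ["phone", "battery", "durable"],
    ["hotel", "service", "friendly"],
]
# Toy 2-d vectors in place of trained word2vec embeddings (assumption).
embeddings = {term: [1.0, 0.5] for doc in docs for term in doc}

weights = tf_idf_weights(docs)
scaled = scale_sequence(docs[0], weights[0], embeddings)
```

In this toy corpus "phone" occurs in two of the three documents, so it receives a lower weight than "screen", which occurs in only one. The resulting sequence of weighted vectors is the kind of input that a downstream network such as ABLCNN would consume.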

Key words: Attention, Convolutional neural network, Long short-term memory network, Term frequency-inverse document frequency, Text classification

CLC Number: TP391