Technology of Extracting Topical Keyphrases from Chinese Corpora

Computer Science, 2017, Vol. 44, Issue (Z11): 432-436. doi: 10.11896/j.issn.1002-137X.2017.11A.092

YANG Yue and ZHANG De-sheng   

Online: 2018-12-01; Published: 2018-12-01

Abstract: In the big data era, the volume of information is growing explosively, and text is the most common medium through which people communicate. Countless texts are uploaded to and downloaded from the Internet every day, and keyword extraction is an important way to grasp their content quickly. However, traditional approaches to extracting keywords from text corpora overlook two issues: the length of the keywords and the topics of the corpus. This paper proposes a new algorithm that takes both into account. It combines the LDA topic model with a frequent phrase discovery algorithm to generate frequent candidate phrases of varying length, and it introduces a completeness filter and a rank function to filter and rank the candidates. Finally, the keyphrases are selected according to the ranked list.

Key words: Extracting keywords, LDA topic model, Frequent phrases, Completeness filter, Rank function
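
The pipeline outlined in the abstract (frequent candidate phrases, then a completeness filter, then ranking) can be illustrated with a minimal sketch. The abstract does not define the filter or the rank function, so everything here is an assumption made for illustration: the function names, the `min_support` and `threshold` parameters, the rule that a phrase is dropped when a one-word-longer frequent phrase occurs almost as often, and the count-based ranking. The LDA step is omitted; the input is assumed to be documents already segmented into words and already grouped under a single topic.

```python
from collections import Counter


def frequent_phrases(docs, min_support=2, max_len=4):
    """Count contiguous n-grams (n = 1..max_len); keep those seen >= min_support times."""
    counts = Counter()
    for tokens in docs:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def completeness_filter(freq, threshold=0.8):
    """Hypothetical filter: drop a phrase if a frequent phrase one word longer
    contains it (as prefix or suffix) and occurs at least `threshold` times as often."""
    kept = {}
    for phrase, count in freq.items():
        n = len(phrase)
        absorbed = any(
            len(longer) == n + 1
            and (longer[:n] == phrase or longer[1:] == phrase)
            and c / count >= threshold
            for longer, c in freq.items()
        )
        if not absorbed:
            kept[phrase] = count
    return kept


def rank(freq):
    """Placeholder rank function: higher count first, then longer phrase, then alphabetical."""
    return sorted(freq.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))


# Toy input: pre-segmented documents assumed to belong to one topic.
docs = [
    ["topic", "model", "extracts", "keyphrases"],
    ["topic", "model", "ranks", "keyphrases"],
    ["latent", "topic", "model"],
]
ranked = rank(completeness_filter(frequent_phrases(docs)))
print(ranked)  # [(('topic', 'model'), 3), (('keyphrases',), 2)]
```

In this toy run the single words "topic" and "model" are dropped because the two-word phrase "topic model" occurs just as often, which is the kind of length-aware behaviour the abstract motivates. For a Chinese corpus, the token lists would come from a word segmenter and the per-topic grouping from a fitted LDA model, neither of which is shown here.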

