Web Text Categorization Based on Statistical Merging Algorithm in Big Data Environment | IGI Global Scientific Publishing

Rujuan Wang (College of Humanities & Sciences of Northeast Normal University, Changchun, China) and Gang Wang (Northeast Normal University, Changchun, China)
Copyright: © 2019 | Volume: 10 | Issue: 3 | Pages: 16
ISSN: 1941-6237 | EISSN: 1941-6245 | EISBN13: 9781522565086 | DOI: 10.4018/IJACI.2019070102




Wang, R. & Wang, G. (2019). Web Text Categorization Based on Statistical Merging Algorithm in Big Data Environment. International Journal of Ambient Computing and Intelligence (IJACI), 10(3), 17-32. https://doi.org/10.4018/IJACI.2019070102





In modern information technology, finding the information that users actually need quickly, accurately, and comprehensively has become a focus of research. This article first proposes a feature selection method based on a complex network, designed for the structure and content characteristics of large-scale web text. The preprocessed web text is converted into a complex network in which the nodes correspond to the terms in the text and the edges correspond to the links between those terms; the degree of each node and its aggregation (clustering) coefficient are then used as the basis for feature selection. Second, text classification is studied from the point of view of data sampling, and a classification method based on density statistics is proposed. This method uses not only the density information of the text feature set during classification, but also statistical merging criteria to obtain the difference information of each text feature, which gives a better classification effect on large text collections.
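The network construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how links between terms are defined or how degree and clustering coefficient are combined, so the sliding co-occurrence window (`window`), the weighting parameter (`alpha`), and all function names below are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_term_graph(tokens, window=2):
    """Build an undirected term graph.

    Assumption: two terms are linked when they occur within `window`
    positions of each other; the abstract does not define the links.
    """
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        graph[term]  # register the term even if it ends up with no edges
        for other in tokens[i + 1 : i + 1 + window]:
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return dict(graph)


def clustering_coefficient(graph, node):
    """Local clustering coefficient: share of neighbour pairs that are linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def rank_terms(tokens, window=2, alpha=0.5):
    """Rank terms by a hypothetical score mixing degree and clustering.

    score = alpha * normalised_degree + (1 - alpha) * clustering_coefficient
    This weighting is an illustrative assumption, not taken from the article.
    """
    graph = build_term_graph(tokens, window)
    max_degree = max((len(n) for n in graph.values()), default=0) or 1
    scores = {}
    for term, neighbours in graph.items():
        degree = len(neighbours) / max_degree
        scores[term] = alpha * degree + (1 - alpha) * clustering_coefficient(graph, term)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = ["web", "text", "network", "web", "text", "feature"]
    for term, score in rank_terms(sample, window=1):
        print(f"{term}\t{score:.3f}")
```

Terms with many neighbours that are themselves densely interlinked score highest under this weighting; a real pipeline would first apply the preprocessing the abstract mentions (for example tokenisation and stop-word removal) and would choose the scoring rule according to the full article.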
