Abstract
Web page classification is an important research direction in web mining. This paper presents an SVM method for web page classification that consists of four steps: (1) using an analysis module to extract the core text and structural tags from a web page; (2) applying an improved vector space model (VSM) to generate the initial feature vectors from the core text of the page; (3) adjusting the weights of the selected features according to the page's structural tags to train the base SVM classifier; (4) combining the base classifiers produced over successive iterations through a boosting mechanism to obtain the target SVM classifier. Web page classification experiments show that the presented approach is efficient.
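The four-step pipeline above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The abstract does not specify the improved VSM weighting, the tag weight factors, or the boosting variant, so this sketch assumes plain smoothed TF-IDF for step 2, hypothetical multipliers in `TAG_WEIGHTS` for step 3, a Pegasos-style linear SVM as the base classifier, and discrete AdaBoost for step 4. Step 1 (parsing HTML) is assumed already done: each page arrives as a token list plus a mapping from tag names to the terms found inside those tags. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical multipliers for terms that occur inside structural tags.
# These values are illustrative assumptions, not parameters from the paper.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}


def tfidf_vectors(docs):
    """Smoothed TF-IDF over tokenised core texts (a stand-in for the improved VSM)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def adjust_weights(vec, tagged_terms):
    """Multiply a term's weight by the factor of each tag it appears in, then L2-normalise."""
    out = dict(vec)
    for tag, terms in tagged_terms.items():
        for t in set(terms):
            if t in out:
                out[t] *= TAG_WEIGHTS.get(tag, 1.0)
    norm = math.sqrt(sum(v * v for v in out.values())) or 1.0
    return {k: v / norm for k, v in out.items()}


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(X, y, sample_w, lam=0.01, epochs=50, seed=0):
    """Pegasos-style linear SVM (no bias term); sample_w scales each hinge-loss step."""
    rng = random.Random(seed)
    w, t, n = {}, 0, len(X)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # margin checked with the old weights
            for k in w:  # regularisation: shrink every weight
                w[k] *= 1.0 - eta * lam
            if violated:
                step = eta * sample_w[i] * n  # weights sum to 1, so rescale by n
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + step * y[i] * v
    return w


def boost(X, y, rounds=5):
    """Discrete AdaBoost over weighted linear SVMs (one reading of the boosting step)."""
    n = len(X)
    dist = [1.0 / n] * n
    ensemble = []
    for r in range(rounds):
        w = train_linear_svm(X, y, dist, seed=r)
        pred = [1 if dot(w, x) >= 0 else -1 for x in X]
        err = sum(d for d, p, yi in zip(dist, pred, y) if p != yi)
        if err >= 0.5:
            break  # base classifier no better than chance: stop
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, w))
        if err <= 1e-10:
            break  # training pages already fitted perfectly
        dist = [d * math.exp(-alpha * yi * p) for d, yi, p in zip(dist, y, pred)]
        total = sum(dist)
        dist = [d / total for d in dist]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * (1 if dot(w, x) >= 0 else -1) for alpha, w in ensemble)
    return 1 if score >= 0 else -1


# Toy example: (core-text tokens, terms found in structural tags, label).
pages = [
    (["football", "match", "score", "team"], {"title": ["football"]}, 1),
    (["team", "goal", "league", "score"], {"h1": ["league"]}, 1),
    (["stock", "market", "price", "shares"], {"title": ["stock"]}, -1),
    (["market", "bank", "interest", "price"], {"b": ["bank"]}, -1),
]
base = tfidf_vectors([tokens for tokens, _, _ in pages])
X = [adjust_weights(v, tags) for v, (_, tags, _) in zip(base, pages)]
y = [label for _, _, label in pages]
model = boost(X, y)
print([predict(model, x) for x in X])  # → [1, 1, -1, -1]
```

In this sketch AdaBoost reweights the training pages after each round so that the next SVM concentrates on pages the ensemble still misclassifies; the rescaling by `n` in `train_linear_svm` keeps each single-sample update an unbiased estimate of the weighted hinge-loss gradient.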
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, M., Guo, C., Sun, J., Lu, Y. (2005). A SVM Method for Web Page Categorization Based on Weight Adjustment and Boosting Mechanism. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_100
DOI: https://doi.org/10.1007/11540007_100
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28331-7
Online ISBN: 978-3-540-31828-6
eBook Packages: Computer Science, Computer Science (R0)