Abstract
Text classification has attracted increasing interest over the past decades because of its wide range of applications, driven by the ubiquity of textual information. The high dimensionality of these problems has led to the pervasive use of dimensionality reduction methods, often black-box, non-linear feature extraction techniques.
We show how Non-Negative Matrix Factorization (NMF), an algorithm that learns a parts-based representation of data by imposing non-negativity constraints, can be used to represent and extract knowledge from a text classification problem. The resulting reduced feature set is tested with kernel-based machines on the Reuters-21578 benchmark, showing that the method performs competitively.
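As a rough illustration of the factorization step described above, the following sketch factors a small term-document matrix V into non-negative factors W and H and uses the columns of H as reduced document features. The abstract does not say which NMF algorithm was used, so the multiplicative updates of Lee and Seung for the Frobenius objective are an assumption here, and the function name `nmf_multiplicative`, the toy matrix, and all parameter values are hypothetical.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (terms x documents) as W @ H.

    Assumed algorithm: Lee-Seung multiplicative updates for the
    Frobenius-norm objective ||V - W H||_F. Not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n_terms, rank)) + eps
    H = rng.random((rank, n_docs)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy term-document matrix: 4 terms (rows) x 4 documents (columns).
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 4, 3],
    [0, 1, 3, 4],
], dtype=float)

W, H = nmf_multiplicative(V, rank=2)

# One row of reduced, non-negative features per document.
features = H.T

# Relative reconstruction error of the rank-2 approximation.
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(features.shape, round(float(rel_error), 3))
```

In a pipeline like the one the abstract outlines, the rows of `features` would take the place of the original high-dimensional term vectors as input to a kernel-based classifier. Each column of W can be read as a non-negative "part", that is, a weighted group of terms.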
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Silva, C., Ribeiro, B. (2009). Knowledge Extraction with Non-Negative Matrix Factorization for Text Classification. In: Corchado, E., Yin, H. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2009. IDEAL 2009. Lecture Notes in Computer Science, vol 5788. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04394-9_37
DOI: https://doi.org/10.1007/978-3-642-04394-9_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04393-2
Online ISBN: 978-3-642-04394-9
eBook Packages: Computer Science, Computer Science (R0)