Knowledge Extraction with Non-Negative Matrix Factorization for Text Classification

Silva, Catarina; Ribeiro, Bernardete

doi:10.1007/978-3-642-04394-9_37

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5788)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Text classification has received increasing interest over the past decades because of its wide range of applications, driven by the ubiquity of textual information. The high dimensionality of these applications has led to the pervasive use of dimensionality reduction methods, often black-box, non-linear feature extraction techniques.

We show how Non-Negative Matrix Factorization (NMF), an algorithm that learns a parts-based representation of data by imposing non-negativity constraints, can be used to represent and extract knowledge from a text classification problem. The resulting reduced set of features is tested with kernel-based machines on the Reuters-21578 benchmark, showing that the method performs competitively.
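
The idea of a non-negative, parts-based factorization can be illustrated with a small sketch. The abstract does not state which NMF variant or settings the authors used, so the code below is only an assumption-laden illustration: it applies the multiplicative updates of Lee and Seung (Frobenius objective) to a tiny, made-up term-by-document matrix. The function name `nmf_multiplicative`, the toy vocabulary, and the rank of 2 are all hypothetical choices.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (terms x documents) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    This is an illustrative choice, not necessarily the paper's variant.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    # Non-negative random initialisation keeps every later iterate non-negative.
    W = rng.random((n_terms, rank)) + eps
    H = rng.random((rank, n_docs)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never flip.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


# Hypothetical toy data: rows are terms, columns are documents (raw counts).
terms = ["oil", "barrel", "crude", "wheat", "grain", "harvest"]
V = np.array([
    [3, 2, 4, 0, 0, 0],
    [2, 3, 1, 0, 0, 1],
    [4, 1, 3, 0, 0, 0],
    [0, 0, 0, 3, 2, 4],
    [0, 1, 0, 2, 4, 3],
    [0, 0, 0, 4, 1, 2],
], dtype=float)

W, H, errors = nmf_multiplicative(V, rank=2)

# Columns of W are the extracted "parts": inspect their highest-weighted terms.
for k in range(W.shape[1]):
    top = np.argsort(W[:, k])[::-1][:3]
    print(f"feature {k}:", [terms[i] for i in top])

# Rows of `features` are the reduced document representations (documents x rank).
features = H.T
print(features.shape)
```

Because every entry of W is non-negative, each column can be read directly as a weighted group of terms, which is what makes the reduced features interpretable rather than black-box. In a classification setting, the rows of `features` would then be passed to a classifier such as a kernel-based machine; that step is not shown here.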

Author information

Authors and Affiliations

School of Technology and Management of the Polytechnic Institute of Leiria, Morro do Lena - Alto do Vieiro, Portugal, P-2411-901, Leiria, Portugal
Catarina Silva
Department of Informatics Engineering, Center for Informatics and Systems (CISUC), University of Coimbra, Polo II, P-3030-290, Coimbra, Portugal
Catarina Silva & Bernardete Ribeiro

Editor information

Editors and Affiliations

Escuela Politécnica Superior, Universidad de Burgos, Calle Francisco de Vitoria, S/N, Edificio C, 09006, Burgos, Spain
Emilio Corchado
School of Electrical and Electronic Engineering, University of Manchester, Sackville Street Building, Sackville Street, M60 1QD, Manchester, UK
Hujun Yin

About this paper

Cite this paper

Silva, C., Ribeiro, B. (2009). Knowledge Extraction with Non-Negative Matrix Factorization for Text Classification. In: Corchado, E., Yin, H. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2009. Lecture Notes in Computer Science, vol 5788. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04394-9_37

DOI: https://doi.org/10.1007/978-3-642-04394-9_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04393-2
Online ISBN: 978-3-642-04394-9
eBook Packages: Computer Science, Computer Science (R0)
