Knowledge Extraction with Non-Negative Matrix Factorization for Text Classification

  • Conference paper
Intelligent Data Engineering and Automated Learning - IDEAL 2009 (IDEAL 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5788)

Abstract

Text classification has received increasing interest over the past decades because of its wide range of applications, driven by the ubiquity of textual information. The high dimensionality of these applications has led to the pervasive use of dimensionality reduction methods, often black-box, non-linear feature extraction techniques.

We show how Non-Negative Matrix Factorization (NMF), an algorithm that learns a parts-based representation of data by imposing non-negativity constraints, can be used to represent and extract knowledge from a text classification problem. The resulting reduced feature set is tested with kernel-based machines on the Reuters-21578 benchmark, showing that the method performs competitively.
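
The abstract describes NMF only at a high level. As a minimal sketch, assuming the classic Lee and Seung multiplicative updates for the squared Frobenius error (the abstract does not say which NMF algorithm, rank, or preprocessing the paper uses), a non-negative term-document matrix V is factored as V ≈ WH, and each column of H serves as a document's reduced feature vector. The function names nmf and project, the parameters r, n_iter, eps and seed, and the toy matrix are hypothetical illustrations, not the authors' code or Reuters-21578 data.

```python
import numpy as np


def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (terms x documents) as V ~= W @ H.

    Assumption: Lee-Seung multiplicative updates for the squared Frobenius
    error. W (terms x r) holds non-negative basis vectors ("parts", i.e.
    weighted groups of terms); H (r x documents) holds each document's
    r reduced features.
    """
    V = np.asarray(V, dtype=float)
    if np.any(V < 0):
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random initialisation.
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # never become negative; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def project(V_new, W, n_iter=200, eps=1e-9, seed=0):
    """Reduce unseen documents with a fixed basis W (only H is updated)."""
    V_new = np.asarray(V_new, dtype=float)
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_new.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V_new) / (W.T @ W @ H + eps)
    return H


# Hypothetical toy term-document counts: rows are terms, columns documents.
# Terms 0-2 mostly occur in documents 0-2, terms 3-5 in documents 3-5.
V_train = np.array([
    [3, 2, 4, 0, 0, 0],
    [2, 3, 3, 0, 0, 1],
    [4, 1, 2, 0, 1, 0],
    [0, 0, 0, 3, 2, 4],
    [0, 1, 0, 2, 4, 3],
    [0, 0, 1, 4, 3, 2],
])
W, H_train = nmf(V_train, r=2)          # 6 terms reduced to 2 features
H_test = project(V_train[:, :2], W)     # reuse W for "new" documents
print(H_train.shape)  # (2, 6)
```

Keeping W fixed when projecting test documents places training and test features in the same learned basis; those r-dimensional features could then be passed to a kernel-based classifier such as an SVM, which is not shown here.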

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Silva, C., Ribeiro, B. (2009). Knowledge Extraction with Non-Negative Matrix Factorization for Text Classification. In: Corchado, E., Yin, H. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2009. IDEAL 2009. Lecture Notes in Computer Science, vol 5788. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04394-9_37

  • DOI: https://doi.org/10.1007/978-3-642-04394-9_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04393-2

  • Online ISBN: 978-3-642-04394-9

  • eBook Packages: Computer Science, Computer Science (R0)