Abstract
In this paper, a new summarization method that uses non-negative matrix factorization (NMF) and K-means clustering is introduced to extract meaningful sentences from multiple documents. The proposed method can improve the quality of document summaries because the semantic features calculated by NMF reflect the inherent semantics of the documents well, and the semantic variables derived by NMF allow the sentences most relevant to the given topic to be extracted efficiently. In addition, the method uses K-means clustering to remove noise, which prevents a biased view of the documents' inherent semantics from being reflected in the summaries. We perform detailed experiments on the well-known DUC test dataset. The experimental results demonstrate that the proposed method performs better than other methods based on LSA, K-means, and NMF.
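The abstract does not spell out the full pipeline, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a non-negative term-by-sentence matrix A, factors it as A ≈ WH with the Lee and Seung multiplicative update rules (columns of W as semantic features, columns of H as the semantic variables of each sentence), and scores sentences against a topic vector. The noise rule (drop sentences whose K-means cluster has fewer than `min_cluster` members), the deterministic farthest-first K-means seeding, the cosine-weighted sentence score, and every function and parameter name are hypothetical choices made for illustration.

```python
import math
import random

# All helper names below are hypothetical; plain lists keep the sketch dependency-free.

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def sqdist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def nmf(a, r, iters=200, seed=0, eps=1e-9):
    """Factor A (m x n) into W (m x r) and H (r x n) with Lee-Seung multiplicative
    updates for the Frobenius loss. Columns of W act as semantic features and
    columns of H as the semantic variables of each sentence."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + eps for _ in range(r)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(r)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)] for i in range(m)]
    return w, h

def kmeans(points, k, iters=20):
    # Assumed deterministic farthest-first seeding; the paper's initialization is not given.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(range(len(points)),
                  key=lambda i: min(sqdist(points[i], c) for c in centers))
        centers.append(list(points[far]))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center, then move centers to cluster means.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def summarize(a, topic, r=2, k=3, min_cluster=2, top=2):
    """a: term-by-sentence matrix (rows = terms, columns = sentences).
    topic: term-weight vector for the given topic. Returns sorted sentence indices."""
    sentences = transpose(a)
    labels = kmeans(sentences, k)
    # Assumed noise rule: drop sentences in clusters with fewer than min_cluster members.
    sizes = {c: labels.count(c) for c in set(labels)}
    kept = [j for j, c in enumerate(labels) if sizes[c] >= min_cluster]
    kept = kept or list(range(len(sentences)))
    w, h = nmf(transpose([sentences[j] for j in kept]), r)
    # Relevance of each semantic feature (column of W) to the topic vector.
    feature_sim = [cosine(topic, col) for col in transpose(w)]
    # Assumed sentence score: topic-weighted sum of the sentence's semantic variables.
    scores = {kept[j]: sum(feature_sim[f] * h[f][j] for f in range(r))
              for j in range(len(kept))}
    return sorted(sorted(scores, key=scores.get, reverse=True)[:top])

if __name__ == "__main__":
    # Toy data: terms = nmf, matrix, cluster, kmeans, weather, rain; 5 sentences.
    A = [
        [2, 1, 0, 0, 0],
        [1, 2, 0, 0, 0],
        [0, 0, 2, 1, 0],
        [0, 0, 1, 2, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1],
    ]
    print(summarize(A, topic=[1, 1, 0, 0, 0, 0]))
```

In this toy run, the off-topic sentence 4 forms a singleton cluster and is discarded before factorization, so it cannot pull a semantic feature toward itself; the two sentences about NMF then score highest for the topic "nmf matrix". Filtering before NMF, rather than after, is the design choice that keeps noisy sentences from biasing the semantic features.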
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Park, S., Lee, JH., Kim, DH., Ahn, CM. (2007). Multi-document Summarization Based on Cluster Using Non-negative Matrix Factorization. In: van Leeuwen, J., Italiano, G.F., van der Hoek, W., Meinel, C., Sack, H., Plášil, F. (eds) SOFSEM 2007: Theory and Practice of Computer Science. SOFSEM 2007. Lecture Notes in Computer Science, vol 4362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69507-3_66
DOI: https://doi.org/10.1007/978-3-540-69507-3_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69506-6
Online ISBN: 978-3-540-69507-3
eBook Packages: Computer Science, Computer Science (R0)