Abstract
In this paper we propose an effective method for summarizing document clusters. The method is based on Testor Theory and is applied to a group of newspaper articles in order to summarize the events they describe. It is also applicable to a very large document collection, or to a single very large document, in order to identify its main themes (topics) and summarize them. The experimental results demonstrate the usefulness of the proposed method.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Pons-Porrata, A., Ruiz-Shulcloper, J., Berlanga-Llavori, R. (2003). A Method for the Automatic Summarization of Topic-Based Clusters of Documents. In: Sanfeliu, A., Ruiz-Shulcloper, J. (eds) Progress in Pattern Recognition, Speech and Image Analysis. CIARP 2003. Lecture Notes in Computer Science, vol 2905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24586-5_73
DOI: https://doi.org/10.1007/978-3-540-24586-5_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20590-6
Online ISBN: 978-3-540-24586-5
eBook Packages: Springer Book Archive