Co-clustering Documents and Words by Minimizing the Normalized Cut Objective Function | Journal of Mathematical Modelling and Algorithms in Operations Research Skip to main content
Log in

Co-clustering Documents and Words by Minimizing the Normalized Cut Objective Function

  • Published:
Journal of Mathematical Modelling and Algorithms

Abstract

This paper follows a word-document co-clustering model independently introduced in 2001 by several authors such as I.S. Dhillon, H. Zha and C. Ding. This model consists in creating a bipartite graph based on word frequencies in documents, and whose vertices are both documents and words. The created bipartite graph is then partitioned in a way that minimizes the normalized cut objective function to produce the document clustering. The fusion-fission graph partitioning metaheuristic is applied on several document collections using this word-document co-clustering model. Results demonstrate a real problem in this model partitions found almost always have a normalized cut value lowest than the original document collection clustering. Moreover, measures of the goodness of solutions seem to be relatively independent of the normalized cut values of partitions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Anagnostopoulos, A., Dasgupta, A., Kumar, R.: Approximation algorithms for co-clustering. In: Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 201–210 (2008)

  2. Baker, L.D., McCallum, A.K.: Distributional clustering of words for text classification. In: Proceedings of the 21st ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 96–103 (1998)

  3. Bichot, C.-E.: A metaheuristic based on fusion and fission for partitioning problems. In: Proceedings of NIDISC’08 in Conjunction with the 20th IEEE International Parallel and Distributed Processing Symposium (2006)

  4. Bichot, C.-E.: A new Method, the fusion fission, for the relaxed k-way graph partitioning problem, and comparisons with some multilevel algorithms. J. Math. Model Algorithm 6(3), 319–344 (2007)

    Article  MATH  MathSciNet  Google Scholar 

  5. Bichot, C.-E.: A new meta-method for graph partitioning. In: Proceedings of the 2008 IEEE Congress on Evolutionary Computation, pp. 3498–3505 (2008)

  6. Boley, D., Gini, M., Gross, R., Han, S., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., Moore, J.: Partitioning-based clustering for web document categorization. Decis. Support Syst. 27(3), 329–341 (1999)

    Article  Google Scholar 

  7. Costa, G., Manco, G., Ortale, R.: A hierarchical model-based approach to co-clustering high-dimensional data. In: Proceedings of the ACM Symposium on Applied Computing, pp. 886–890 (2008)

  8. Dhillon, I.S.: Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 269–274 (2001)

  9. Dhillon, I.S., Guan, Y., Kulis, B.: Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell. 29, 1944–1957 (2007)

    Article  Google Scholar 

  10. Dhillon, I.S., Mallela, S., Modha, D.S.: Information-theoric co-clustering. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 89–98 (2003)

  11. Ding, C., He, X., Zha, H., Gu, M., Simon, H.D.: A min-max cut algorithm for graph partitioning and data clustering. In: Proceedings of the 1st IEEE International Conference on Data Mining, pp. 107–114 (2001)

  12. Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. In: Proceedings of 19th ACM/IEEE Design Automation Conference, pp. 175–181 (1982)

  13. Hendrickson, B., Leland, R.W.: A multilevel algorithm for partitioning graphs. In: Proceedings of Supercomputing (1995)

  14. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

    Article  Google Scholar 

  15. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999)

    Article  Google Scholar 

  16. Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48(1), 96–129 (1998)

    Article  MathSciNet  Google Scholar 

  17. Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–307 (1970)

    Google Scholar 

  18. Long, B., Zhang, Z.M., Yu, P.S.: Co-clustering by block value decomposition. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 635–640 (2005)

  19. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinf. 1(1), 24–45 (2004)

    Article  Google Scholar 

  20. Mandhani, B., Joshi, S., Kummamuru, K.: A matrix density based algorithm to hierarchically co-cluster documents and words. In: Proceedings of the 12th International Conference on World Wide Web, pp. 658–665 (2003)

  21. Porter, M.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)

    Google Scholar 

  22. Pothen, A., Simon, H.D., Liou, K.-P.: Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11(3), 430–452 (1990)

    Article  MATH  MathSciNet  Google Scholar 

  23. Rand, W.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971)

    Article  Google Scholar 

  24. Rege, M., Dong, M., Fotouhi, F.: Co-clustering documents and words using bipartite isoperimetric graph partitioning. In: Proceedings of the 6th IEEE International Conference on Data Mining, pp. 532–541 (2006)

  25. Rege, M., Dong, M., Toouhi, F.: Co-clustering image features and semantic concepts. In: Proceedings of IEEE International Conference on Image Processing, pp. 137–140 (2006)

  26. Rokach, L., Maimon, O.: Data mining and knowledge discovery handbook, Chapt. Clustering methods. Springer (2005)

  27. Slonim, N., Tishby, N.: Document clustering using word clusters via the information bottleneck method. In: Proceedings of the 23rd ACM SIGIR International Conference on Research and Development in Informaion Retrieval, pp. 208–215 (2000)

  28. Steinbach, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: Proceedings of ACM SIGKDD Workshop on Text Mining (2000)

  29. Tagarelli, A., Karypis, G.: A segment-based approach To clustering multi-topic documents. In: Text Mining Workshop, SIAM Datamining Conference (2008)

  30. Tjhi, W.-C., Chen, L.: A partitioning based algorithm to fuzzy co-cluster documents and words. Pattern Recogn. Lett. 27(3), 151–159 (2006)

    Article  Google Scholar 

  31. Walshaw, C.: Multilevel refinement for combinatorial optimisation problems. Ann. Oper. Res. 131, 325–372 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  32. Wu, X., Ngo, C.-W., Li, Q.: Co-Clustering of time-evolving news story with transcript and keyframe. In: Proceedings of IEEE International Conference on Multimedia and Expo, pp. 117–120 (2005)

  33. Zha, H., He, X., Ding, C.H.Q., Gu, M., Simon, H.D.: Bipartite graph partitioning and data clustering. In: ACM Conference on Information and Knowledge Management, pp. 25–32 (2001)

  34. Zhao, Y., Karypis, G.: Empirical and theoretical comparisons of selected criterion functions for document clustering. Mach. Learn. 55(3), 311–331 (2004)

    Article  MATH  Google Scholar 

  35. Zhao, Y., Karypis, G., Fayyad, U.M.: Hierarchical clustering algorithms for document datasets. Data Min. Knowl. Discov. 10(2), 141–168 (2005)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Charles-Edmond Bichot.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bichot, CE. Co-clustering Documents and Words by Minimizing the Normalized Cut Objective Function. J Math Model Algor 9, 131–147 (2010). https://doi.org/10.1007/s10852-010-9126-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10852-010-9126-0

Keywords

Navigation