Abstract
This paper follows a word-document co-clustering model independently introduced in 2001 by several authors such as I.S. Dhillon, H. Zha and C. Ding. This model consists in creating a bipartite graph based on word frequencies in documents, and whose vertices are both documents and words. The created bipartite graph is then partitioned in a way that minimizes the normalized cut objective function to produce the document clustering. The fusion-fission graph partitioning metaheuristic is applied on several document collections using this word-document co-clustering model. Results demonstrate a real problem in this model partitions found almost always have a normalized cut value lowest than the original document collection clustering. Moreover, measures of the goodness of solutions seem to be relatively independent of the normalized cut values of partitions.
Similar content being viewed by others
References
Anagnostopoulos, A., Dasgupta, A., Kumar, R.: Approximation algorithms for co-clustering. In: Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 201–210 (2008)
Baker, L.D., McCallum, A.K.: Distributional clustering of words for text classification. In: Proceedings of the 21st ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 96–103 (1998)
Bichot, C.-E.: A metaheuristic based on fusion and fission for partitioning problems. In: Proceedings of NIDISC’08 in Conjunction with the 20th IEEE International Parallel and Distributed Processing Symposium (2006)
Bichot, C.-E.: A new Method, the fusion fission, for the relaxed k-way graph partitioning problem, and comparisons with some multilevel algorithms. J. Math. Model Algorithm 6(3), 319–344 (2007)
Bichot, C.-E.: A new meta-method for graph partitioning. In: Proceedings of the 2008 IEEE Congress on Evolutionary Computation, pp. 3498–3505 (2008)
Boley, D., Gini, M., Gross, R., Han, S., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., Moore, J.: Partitioning-based clustering for web document categorization. Decis. Support Syst. 27(3), 329–341 (1999)
Costa, G., Manco, G., Ortale, R.: A hierarchical model-based approach to co-clustering high-dimensional data. In: Proceedings of the ACM Symposium on Applied Computing, pp. 886–890 (2008)
Dhillon, I.S.: Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 269–274 (2001)
Dhillon, I.S., Guan, Y., Kulis, B.: Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell. 29, 1944–1957 (2007)
Dhillon, I.S., Mallela, S., Modha, D.S.: Information-theoric co-clustering. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 89–98 (2003)
Ding, C., He, X., Zha, H., Gu, M., Simon, H.D.: A min-max cut algorithm for graph partitioning and data clustering. In: Proceedings of the 1st IEEE International Conference on Data Mining, pp. 107–114 (2001)
Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. In: Proceedings of 19th ACM/IEEE Design Automation Conference, pp. 175–181 (1982)
Hendrickson, B., Leland, R.W.: A multilevel algorithm for partitioning graphs. In: Proceedings of Supercomputing (1995)
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264–323 (1999)
Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48(1), 96–129 (1998)
Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–307 (1970)
Long, B., Zhang, Z.M., Yu, P.S.: Co-clustering by block value decomposition. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 635–640 (2005)
Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinf. 1(1), 24–45 (2004)
Mandhani, B., Joshi, S., Kummamuru, K.: A matrix density based algorithm to hierarchically co-cluster documents and words. In: Proceedings of the 12th International Conference on World Wide Web, pp. 658–665 (2003)
Porter, M.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
Pothen, A., Simon, H.D., Liou, K.-P.: Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11(3), 430–452 (1990)
Rand, W.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971)
Rege, M., Dong, M., Fotouhi, F.: Co-clustering documents and words using bipartite isoperimetric graph partitioning. In: Proceedings of the 6th IEEE International Conference on Data Mining, pp. 532–541 (2006)
Rege, M., Dong, M., Toouhi, F.: Co-clustering image features and semantic concepts. In: Proceedings of IEEE International Conference on Image Processing, pp. 137–140 (2006)
Rokach, L., Maimon, O.: Data mining and knowledge discovery handbook, Chapt. Clustering methods. Springer (2005)
Slonim, N., Tishby, N.: Document clustering using word clusters via the information bottleneck method. In: Proceedings of the 23rd ACM SIGIR International Conference on Research and Development in Informaion Retrieval, pp. 208–215 (2000)
Steinbach, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: Proceedings of ACM SIGKDD Workshop on Text Mining (2000)
Tagarelli, A., Karypis, G.: A segment-based approach To clustering multi-topic documents. In: Text Mining Workshop, SIAM Datamining Conference (2008)
Tjhi, W.-C., Chen, L.: A partitioning based algorithm to fuzzy co-cluster documents and words. Pattern Recogn. Lett. 27(3), 151–159 (2006)
Walshaw, C.: Multilevel refinement for combinatorial optimisation problems. Ann. Oper. Res. 131, 325–372 (2004)
Wu, X., Ngo, C.-W., Li, Q.: Co-Clustering of time-evolving news story with transcript and keyframe. In: Proceedings of IEEE International Conference on Multimedia and Expo, pp. 117–120 (2005)
Zha, H., He, X., Ding, C.H.Q., Gu, M., Simon, H.D.: Bipartite graph partitioning and data clustering. In: ACM Conference on Information and Knowledge Management, pp. 25–32 (2001)
Zhao, Y., Karypis, G.: Empirical and theoretical comparisons of selected criterion functions for document clustering. Mach. Learn. 55(3), 311–331 (2004)
Zhao, Y., Karypis, G., Fayyad, U.M.: Hierarchical clustering algorithms for document datasets. Data Min. Knowl. Discov. 10(2), 141–168 (2005)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Bichot, CE. Co-clustering Documents and Words by Minimizing the Normalized Cut Objective Function. J Math Model Algor 9, 131–147 (2010). https://doi.org/10.1007/s10852-010-9126-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10852-010-9126-0