Abstract
Previous studies have shown that hybrid clustering methods that incorporate textual content and bibliometric information can outperform clustering methods that use only one of these components. In this paper we apply a hybrid clustering method based on Fisher’s inverse chi-square method to integrate full text with citations and to provide a mapping of the field of information science. We quantitatively and qualitatively assess the added value of such an integrated analysis. We also investigate whether the clustering outcome is a better representation of the field by comparing it with a text-only clustering and with another hybrid method based on a linear combination of distance matrices. Our data set consists of almost 1000 articles and notes published in the period 2002–2004 in 5 representative journals. The optimal number of clusters for the field is 5, as determined by a combination of distance-based and stability-based methods. Term networks present the cognitive structure of the field and are complemented by the most representative publications. Three large traditional sub-disciplines can be distinguished, namely information retrieval, bibliometrics/scientometrics, and the more social aspects of the field, together with two smaller clusters on patent analysis and webometrics.
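The two hybrid schemes compared above can be illustrated in a few lines. This is a minimal sketch under stated assumptions and is not the authors' implementation. Fisher's method itself is standard: for k independent p-values, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The remaining choices are hypothetical and made only for illustration. These include the conversion of each similarity into a p-value via its empirical rank among all document pairs, the use of the combined p-value directly as a distance, the function names, and the weight `alpha` of the linear combination.

```python
import math

def empirical_p_values(sim):
    """Convert a symmetric similarity matrix into pairwise p-values.

    Assumed conversion (illustrative only): p_ij is the fraction of document
    pairs whose similarity is at least sim[i][j], so very similar pairs get
    small p-values. The diagonal (self-pairs) is left at 0 and never used.
    """
    n = len(sim)
    values = [sim[i][j] for i in range(n) for j in range(i + 1, n)]
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Counting booleans gives the number of pairs at least this similar.
            p[i][j] = p[j][i] = sum(v >= sim[i][j] for v in values) / len(values)
    return p

def fisher_combine(p_values):
    """Fisher's inverse chi-square: X = -2 * sum(ln p_i) ~ chi-square(2k).

    Returns the combined p-value P(chi2_{2k} > X), using the closed-form
    survival function for even degrees of freedom:
    exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # equals X / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))

def fisher_hybrid_distance(sim_text, sim_cite):
    """Hypothetical hybrid distance: combined p-value of text and citation evidence."""
    p_text = empirical_p_values(sim_text)
    p_cite = empirical_p_values(sim_cite)
    n = len(sim_text)
    return [[0.0 if i == j else fisher_combine([p_text[i][j], p_cite[i][j]])
             for j in range(n)] for i in range(n)]

def linear_hybrid_distance(d_text, d_cite, alpha=0.5):
    """Comparison scheme: weighted sum of a text and a citation distance matrix."""
    return [[alpha * a + (1 - alpha) * b for a, b in zip(row_t, row_c)]
            for row_t, row_c in zip(d_text, d_cite)]

if __name__ == "__main__":
    # Toy similarities for three documents (invented values, not from the study).
    sim_text = [[1, 0.9, 0.1], [0.9, 1, 0.2], [0.1, 0.2, 1]]
    sim_cite = [[1, 0.8, 0.0], [0.8, 1, 0.3], [0.0, 0.3, 1]]
    for row in fisher_hybrid_distance(sim_text, sim_cite):
        print([round(x, 3) for x in row])
```

In this sketch, documents 0 and 1 are close in both text and citations, so their combined p-value, and hence their hybrid distance, is the smallest. The Fisher scheme needs no weight to tune, whereas the linear combination requires a choice of `alpha`.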
Cite this article
Janssens, F., Glänzel, W. & De Moor, B. A hybrid mapping of information science. Scientometrics 75, 607–631 (2008). https://doi.org/10.1007/s11192-007-2002-7