Abstract
Hybrid cluster analyses were conducted on a dataset covering Astronomy and Astrophysics. To obtain an optimal solution and to analyse possible issues arising from the bibliometric methodologies used, we systematically studied three models, each with two scenarios. The hybrid clustering combined bibliographic coupling and textual similarities and applied the Louvain method at two resolution levels. The procedure yielded three clearly hierarchical structures with six and thirteen, seven and thirteen, and five and eleven clusters, respectively. These structures are analysed with the help of a concordance table. The statistics indicate a high classification quality. The results of the three models are presented, discussed and compared with one another. Core documents representing the obtained clusters are used to label and interpret them; they also help depict the internal structure of both the complete network and the individual clusters. This work was done as part of the international project ‘Measuring the Diversity of Research’ and in the framework of a special workshop on the comparative analysis of algorithms for the identification of topics in science, organised in Berlin in August 2014.
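The abstract names two computational steps without spelling them out: merging a citation-based and a text-based similarity into one hybrid score, and comparing a coarse and a fine clustering through a concordance table. The following Python sketch illustrates both under stated assumptions. It uses Salton's cosine on sparse vectors for each component and, as an assumption rather than this study's documented setting, combines the two cosines through a weighted mean of their angles with a hypothetical weight `lam = 0.5`. The concordance helpers (`concordance_table` and `nesting_share`, both hypothetical names) cross-tabulate two partitions of the same documents and report how cleanly the finer one nests inside the coarser one. The Louvain step that would produce those partitions is not reproduced here.

```python
import math
from collections import Counter


def cosine(u, v):
    """Salton's cosine between two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_similarity(cos_bc, cos_tx, lam=0.5):
    """Combine bibliographic-coupling and textual cosines via a weighted mean of angles.

    ASSUMPTION: the angle-based combination and lam=0.5 are illustrative only.
    """
    # Clamp to [-1, 1] so floating-point overshoot cannot break math.acos.
    angle_bc = math.acos(max(-1.0, min(1.0, cos_bc)))
    angle_tx = math.acos(max(-1.0, min(1.0, cos_tx)))
    return math.cos(lam * angle_bc + (1 - lam) * angle_tx)


def concordance_table(coarse, fine):
    """Count documents per (coarse cluster, fine cluster) pair.

    coarse, fine: dicts mapping document id -> cluster label (hypothetical format).
    """
    return Counter((coarse[d], fine[d]) for d in coarse if d in fine)


def nesting_share(table):
    """Share of documents lying in the dominant coarse cluster of their fine cluster.

    1.0 means every fine cluster sits entirely inside one coarse cluster.
    """
    best = {}
    for (_, fine_label), n in table.items():
        best[fine_label] = max(best.get(fine_label, 0), n)
    total = sum(table.values())
    return sum(best.values()) / total if total else 0.0


if __name__ == "__main__":
    # Bibliographic coupling: binary reference vectors (toy data).
    refs_a = {"r1": 1, "r2": 1, "r3": 1}
    refs_b = {"r2": 1, "r3": 1, "r4": 1}
    # Textual similarity: term-frequency vectors (toy data).
    terms_a = {"dark": 2, "energy": 2}
    terms_b = {"dark": 1, "matter": 1}
    s = hybrid_similarity(cosine(refs_a, refs_b), cosine(terms_a, terms_b))
    print(f"hybrid similarity: {s:.4f}")

    # Toy coarse and fine partitions of the same four documents.
    coarse = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
    fine = {"d1": "a1", "d2": "b1", "d3": "b1", "d4": "b1"}
    table = concordance_table(coarse, fine)
    print(sorted(table.items()))
    print(f"nesting share: {nesting_share(table):.2f}")
```

In the toy example the nesting share is 0.75, because fine cluster `b1` straddles coarse clusters `A` and `B`; a value of 1.0 would correspond to a strictly hierarchical structure. Independent clustering runs at two resolution settings need not produce strictly nested partitions, which is one reason to inspect a concordance table rather than assume nesting.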
Acknowledgements
This work was done as part of the international project ‘Measuring the Diversity of Research’ and in the framework of a special workshop on the comparative analysis of algorithms for the identification of topics in science, organised in Berlin in August 2014. The project and the workshop series were jointly organised by the Humboldt Universität and the Technische Universität Berlin. We would like to acknowledge their support of our study. We also thank all project members for their comments and discussion. Above all, we would like to thank the internal reviewers Kevin Boyack and Shenghui Wang as well as the anonymous external referees for their valuable comments and suggestions, which substantially improved the manuscript.
Appendix
The first 20 core documents of Cluster #3 (‘General Theory of Cosmology’) in scenario 1 of model 1 (Bibliographic Coupling)
WoS accession code | Degree | Title |
---|---|---|
000264174700027 | 432 | An introduction to the dark energy problem |
000225317900001 | 324 | Sudden future singularities in FLRW cosmologies |
000223638500012 | 309 | Supernova constraints on a holographic dark energy model |
000228261200001 | 299 | Quantum fields and ‘big rip’ expansion singularities |
000245928000021 | 253 | Exploring the properties of dark energy using type-Ia supernovae and other datasets |
000230889600014 | 224 | Parametrization of quintessence and its potential |
000220801900003 | 219 | Constraints on a Cardassian model from Type Ia supernova data, revisited |
000250363000014 | 210 | Measuring the baryon acoustic oscillation scale using the Sloan Digital Sky Survey and 2dF Galaxy Redshift Survey |
000245827600007 | 205 | A modified Chaplygin gas model with interaction |
000244080700016 | 205 | Statefinder parameters for interacting phantom energy with dark matter |
000245405900001 | 199 | Constraints on the generalized Chaplygin gas model from recent supernova data and baryonic acoustic oscillations |
000240874500033 | 194 | High redshift detection of the integrated Sachs–Wolfe effect |
000183786100002 | 187 | The coincidence of Friedmann integrals |
000186983100013 | 186 | Generalized Chaplygin gas with α = 0 and the ΛCDM cosmological model |
000241963800007 | 184 | Gravitational collapse due to dark matter and dark energy in the braneworld scenario |
000234274900033 | 182 | Comparison of the legacy and gold type Ia supernovae dataset constraints on dark energy models |
000229888900007 | 179 | Escaping the big rip? |
000228112400010 | 179 | Cosmology with interaction between phantom dark energy and dark matter and the coincidence problem |
000185229300023 | 178 | k-essence and the coincidence problem |
000244535200025 | 175 | Lemaitre–Tolman–Bondi universes as alternatives to dark energy: Does positive averaged acceleration imply positive cosmic acceleration? |
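The table ranks core documents by their degree. As an illustration only, the following Python sketch assumes that a document's degree is the number of other documents to which it is bibliographically coupled with a Salton cosine at or above a threshold, and that core documents are those whose degree reaches a minimum value. The names `coupling_strength` and `core_documents`, the threshold of 0.25 and the minimum degree are hypothetical choices, not the settings behind the table; the cut-off of 20 simply mirrors the table's length.

```python
import math
from itertools import combinations


def coupling_strength(refs_a, refs_b):
    """Salton's cosine for two sets of cited references (bibliographic coupling)."""
    if not refs_a or not refs_b:
        return 0.0
    return len(refs_a & refs_b) / math.sqrt(len(refs_a) * len(refs_b))


def core_documents(refs, threshold=0.25, min_degree=2, top=20):
    """Return up to `top` (doc_id, degree) pairs, highest degree first.

    refs: dict mapping doc_id -> set of cited reference ids.
    ASSUMPTION: degree = number of links with coupling strength >= threshold;
    threshold and min_degree are hypothetical values.
    """
    degree = {doc: 0 for doc in refs}
    # Pairwise comparison is O(n^2): fine for a sketch, not for a full dataset.
    for a, b in combinations(refs, 2):
        if coupling_strength(refs[a], refs[b]) >= threshold:
            degree[a] += 1
            degree[b] += 1
    core = [(doc, k) for doc, k in degree.items() if k >= min_degree]
    # Sort by degree (descending), breaking ties by id for reproducible output.
    core.sort(key=lambda item: (-item[1], item[0]))
    return core[:top]


if __name__ == "__main__":
    # Toy reference lists for four documents.
    refs = {
        "d1": {"r1", "r2", "r3"},
        "d2": {"r1", "r2", "r4"},
        "d3": {"r3", "r5", "r6"},
        "d4": {"r9"},
    }
    print(core_documents(refs, min_degree=1))
```

With `min_degree=1` the toy example returns `[('d1', 2), ('d2', 1), ('d3', 1)]`: `d1` is coupled to both `d2` and `d3`, while `d4` shares no references and is excluded. With the default `min_degree=2`, only `d1` qualifies.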
Cite this article
Glänzel, W., Thijs, B. Using hybrid methods and ‘core documents’ for the representation of clusters and topics: the astronomy dataset. Scientometrics 111, 1071–1087 (2017). https://doi.org/10.1007/s11192-017-2301-6