Abstract
Agglomerative hierarchical clustering can be implemented with several strategies that differ in the way elements of a collection are grouped together to build a hierarchy of clusters. Here we introduce versatile linkage, a new infinite system of agglomerative hierarchical clustering strategies based on generalized means, which ranges from single linkage to complete linkage, passing through arithmetic average linkage and other clustering methods not yet explored, such as geometric linkage and harmonic linkage. We compare the different clustering strategies in terms of cophenetic correlation and mean absolute error, as well as tree balance and space distortion, two new measures we propose to describe hierarchical trees. We show that, unlike the β-flexible clustering system, the versatile linkage family is space-conserving.
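The abstract describes the family only in words. The following is a minimal Python sketch of one plausible reading of it, assuming that the distance between two clusters is the unweighted generalized (power) mean M_p of all pairwise distances between their members, so that p = -inf, -1, 0, 1 and +inf give single, harmonic, geometric, arithmetic average and complete linkage. The function names (power_mean, versatile_distance, agglomerate, cophenetic_correlation) are hypothetical illustrations, not the authors' implementation. The sketch assumes strictly positive distances (M_p is undefined for zeros when p <= 0), breaks ties by taking the first closest pair found, recomputes all distances naively at each step, and computes cophenetic correlation as the Pearson correlation between original and cophenetic distances.

```python
import math
from itertools import combinations


def power_mean(values, p):
    """Generalized (power) mean M_p of strictly positive values.

    p = -inf gives the minimum, p = -1 the harmonic mean, p = 0 the
    geometric mean (limit case), p = 1 the arithmetic mean, and
    p = +inf the maximum.
    """
    n = len(values)
    if p == math.inf:
        return max(values)
    if p == -math.inf:
        return min(values)
    if p == 0:
        # Geometric mean, the limit of M_p as p -> 0.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def versatile_distance(ci, cj, d, p):
    """Power mean of all pairwise distances between clusters ci and cj
    (tuples of element indices), every pair weighted equally."""
    return power_mean([d[a][b] for a in ci for b in cj], p)


def agglomerate(d, p):
    """Naive agglomerative clustering; returns (cluster_a, cluster_b, height)
    for each merge. Ties go to the first closest pair found."""
    clusters = [(i,) for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        # Recompute every inter-cluster distance from the original matrix.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: versatile_distance(clusters[ij[0]], clusters[ij[1]], d, p),
        )
        height = versatile_distance(clusters[i], clusters[j], d, p)
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def cophenetic_correlation(d, merges):
    """Pearson correlation between original and cophenetic distances."""
    n = len(d)
    coph = [[0.0] * n for _ in range(n)]
    for ca, cb, height in merges:
        # Elements of ca and cb first share a cluster at this height.
        for a in ca:
            for b in cb:
                coph[a][b] = coph[b][a] = height
    pairs = list(combinations(range(n), 2))
    x = [d[a][b] for a, b in pairs]
    y = [coph[a][b] for a, b in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs = [0.0, 1.0, 3.0, 7.0]  # toy 1-D data, made up for illustration
    d = [[abs(a - b) for b in xs] for a in xs]
    for p in (-math.inf, -1, 0, 1, math.inf):
        merges = agglomerate(d, p)
        heights = [round(h, 3) for _, _, h in merges]
        print(p, heights, round(cophenetic_correlation(d, merges), 3))
```

On this toy data every value of p merges the points in the same order, so the power mean inequality makes each merge height non-decreasing in p: the last merge sits at 4.0 for single linkage, about 5.36, 5.52 and 5.67 for harmonic, geometric and arithmetic linkage, and 7.0 for complete linkage.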
About this article
Cite this article
Fernández, A., Gómez, S. Versatile Linkage: a Family of Space-Conserving Strategies for Agglomerative Hierarchical Clustering. J Classif 37, 584–597 (2020). https://doi.org/10.1007/s00357-019-09339-z
DOI: https://doi.org/10.1007/s00357-019-09339-z