{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,13]],"date-time":"2025-04-13T12:06:31Z","timestamp":1744545991214},"reference-count":97,"publisher":"Wiley","issue":"3","license":[{"start":{"date-parts":[[2021,12,23]],"date-time":"2021-12-23T00:00:00Z","timestamp":1640217600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["01IS18036A"],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["BO3139\/7\u20101"],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["wires.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["WIREs Data Min & Knowl"],"published-print":{"date-parts":[[2022,5]]},"abstract":"Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To assess the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to validating and replicating a clustering result using a validation dataset. Such a dataset may be part of the original dataset, which is separated before analysis begins, or it could be an independently collected dataset. We present a systematic, structured review of the existing literature about this topic. For this purpose, we outline a formal framework that covers most existing approaches for validating clustering results on validation data. 
In particular, we review classical validation techniques such as internal and external validation, stability analysis, and visual validation, and show how they can be interpreted in terms of our framework. We define and formalize different types of validation of clustering results on a validation dataset, and give examples of how clustering studies from the applied literature that used a validation dataset can be seen as instances of our framework.\nThis article is categorized under:\nTechnologies > Structure Discovery and Clustering\nAlgorithmic Development > Statistics\nTechnologies > Machine Learning","DOI":"10.1002\/widm.1444","type":"journal-article","created":{"date-parts":[[2021,12,24]],"date-time":"2021-12-24T02:29:43Z","timestamp":1640312983000},"update-policy":"http:\/\/dx.doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":61,"title":["Validation of cluster analysis results on validation data: A systematic framework"],"prefix":"10.1002","volume":"12","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-1215-8561","authenticated-orcid":false,"given":"Theresa","family":"Ullmann","sequence":"first","affiliation":[{"name":"Institute for Medical Information Processing, Biometry and Epidemiology Ludwig\u2010Maximilians\u2010Universit\u00e4t M\u00fcnchen Munich Germany"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-1550-5637","authenticated-orcid":false,"given":"Christian","family":"Hennig","sequence":"additional","affiliation":[{"name":"Dipartimento di Scienze Statistiche \u201cPaolo Fortunati\u201d Universita di Bologna Bologna Italy"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-2729-0947","authenticated-orcid":false,"given":"Anne\u2010Laure","family":"Boulesteix","sequence":"additional","affiliation":[{"name":"Institute for Medical Information Processing, Biometry and Epidemiology 
Ludwig\u2010Maximilians\u2010Universit\u00e4t M\u00fcnchen Munich Germany"}]}],"member":"311","published-online":{"date-parts":[[2021,12,23]]},"reference":[{"key":"e_1_2_11_2_1","doi-asserted-by":"publisher","DOI":"10.1509\/jmkg.65.1.71.18132"},{"key":"e_1_2_11_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11222-020-09958-2"},{"key":"e_1_2_11_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00357-006-0017-z"},{"key":"e_1_2_11_5_1","doi-asserted-by":"publisher","DOI":"10.1177\/117693510600200006"},{"key":"e_1_2_11_6_1","doi-asserted-by":"publisher","DOI":"10.1038\/483531a"},{"key":"e_1_2_11_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/11776420_4"},{"key":"e_1_2_11_8_1","first-page":"6","article-title":"A stability based method for discovering structure in clustered data","volume":"7","author":"Ben\u2010Hur A.","year":"2002","journal-title":"Pacific Symposium on Biocomputing"},{"key":"e_1_2_11_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0005-7967(99)00175-8"},{"key":"e_1_2_11_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2004.10.012"},{"key":"e_1_2_11_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-55723-6_6"},{"key":"e_1_2_11_12_1","doi-asserted-by":"publisher","DOI":"10.4137\/CIN.S408"},{"key":"e_1_2_11_13_1","doi-asserted-by":"publisher","DOI":"10.1207\/s15327906mbr2402_1"},{"key":"e_1_2_11_14_1","doi-asserted-by":"publisher","DOI":"10.1207\/S15327906MBR3502_5"},{"key":"e_1_2_11_15_1","doi-asserted-by":"publisher","DOI":"10.1177\/0093854812456777"},{"key":"e_1_2_11_16_1","doi-asserted-by":"publisher","DOI":"10.1158\/1078-0432.CCR-14-0432"},{"issue":"1","key":"e_1_2_11_17_1","first-page":"1","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Cali\u0144ski T.","year":"1974","journal-title":"Communications in 
Statistics"},{"key":"e_1_2_11_18_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.aaf0918"},{"key":"e_1_2_11_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-71762-3"},{"key":"e_1_2_11_20_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature10983"},{"key":"e_1_2_11_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00357-019-09328-2"},{"key":"e_1_2_11_22_1","doi-asserted-by":"publisher","DOI":"10.1080\/08870449808407432"},{"key":"e_1_2_11_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(00)00051-0"},{"key":"e_1_2_11_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11002-009-9083-4"},{"key":"e_1_2_11_25_1","doi-asserted-by":"publisher","DOI":"10.2174\/138920207780076956"},{"key":"e_1_2_11_26_1","doi-asserted-by":"publisher","DOI":"10.1142\/9789814343138_0001"},{"key":"e_1_2_11_27_1","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2002-3-7-research0036"},{"key":"e_1_2_11_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2011.09.003"},{"key":"e_1_2_11_29_1","unstructured":"F\u00e4rber I. G\u00fcnnemann S. Kriegel H.\u2010P. Kr\u00f6ger P. M\u00fcller E. Schubert E. Seidl T. & Zimek A. (2010). On using class\u2010labels in evaluation of clusterings. 
In MultiClust: 1st international workshop on discovering summarizing and using multiple clusterings held in conjunction with KDD Washington DC."},{"key":"e_1_2_11_30_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1983.10478008"},{"key":"e_1_2_11_31_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-10-234"},{"key":"e_1_2_11_32_1","doi-asserted-by":"publisher","DOI":"10.1080\/10618600.2019.1647846"},{"key":"e_1_2_11_33_1","doi-asserted-by":"publisher","DOI":"10.1158\/2159-8290.CD-18-1177"},{"key":"e_1_2_11_34_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0028467"},{"key":"e_1_2_11_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-4-431-65950-1_2"},{"key":"e_1_2_11_36_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/55.3.582"},{"key":"e_1_2_11_37_1","doi-asserted-by":"publisher","DOI":"10.1002\/smj.865"},{"key":"e_1_2_11_38_1","doi-asserted-by":"publisher","DOI":"10.1198\/jcgs.2010.09139"},{"key":"e_1_2_11_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/565117.565124"},{"key":"e_1_2_11_40_1","first-page":"616","volume-title":"Handbook of cluster analysis","author":"Halkidi 
M.","year":"2015"},{"key":"e_1_2_11_41_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bti517"},{"key":"e_1_2_11_42_1","doi-asserted-by":"publisher","DOI":"10.1027\/1614-2241\/a000173"},{"key":"e_1_2_11_43_1","doi-asserted-by":"publisher","DOI":"10.1111\/rssa.12493"},{"key":"e_1_2_11_44_1","doi-asserted-by":"publisher","DOI":"10.1198\/106186004X12740"},{"key":"e_1_2_11_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2006.11.025"},{"key":"e_1_2_11_46_1","doi-asserted-by":"publisher","DOI":"10.1201\/b19706-40"},{"key":"e_1_2_11_47_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2015.04.009"},{"key":"e_1_2_11_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11222-015-9566-5"},{"key":"e_1_2_11_49_1","doi-asserted-by":"publisher","DOI":"10.1098\/rsos.201925"},{"key":"e_1_2_11_50_1","doi-asserted-by":"publisher","DOI":"10.1509\/jmkg.72.2.133"},{"key":"e_1_2_11_51_1","first-page":"336","volume-title":"Handbook of cluster analysis","author":"Huang H.","year":"2015"},{"key":"e_1_2_11_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01908075"},{"key":"e_1_2_11_53_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.359.6377.725"},{"key":"e_1_2_11_54_1","first-page":"223","article-title":"Nouvelles recherches sur la distribution florale","volume":"44","author":"Jaccard P.","year":"1908","journal-title":"Bulletin de la Societe Vaudoise des Sciences 
Naturelles"},{"key":"e_1_2_11_55_1","doi-asserted-by":"publisher","DOI":"10.1016\/0031-3203(87)90081-1"},{"key":"e_1_2_11_56_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00848262"},{"key":"e_1_2_11_57_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-020-58766-1"},{"key":"e_1_2_11_58_1","doi-asserted-by":"publisher","DOI":"10.1080\/08870440008402003"},{"key":"e_1_2_11_59_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2164-7-231"},{"key":"e_1_2_11_60_1","doi-asserted-by":"publisher","DOI":"10.1093\/biostatistics\/kxj029"},{"key":"e_1_2_11_61_1","volume-title":"Finding groups in data: An introduction to cluster analysis","author":"Kaufman L.","year":"2009"},{"key":"e_1_2_11_62_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976604773717621"},{"key":"e_1_2_11_63_1","doi-asserted-by":"publisher","DOI":"10.1172\/JCI45014"},{"key":"e_1_2_11_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-33037-0_22"},{"key":"e_1_2_11_65_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976601753196030"},{"key":"e_1_2_11_66_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl184"},{"key":"e_1_2_11_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1982.1056489"},{"key":"e_1_2_11_68_1","doi-asserted-by":"publisher","DOI":"10.1207\/s15327906mbr1502_7"},{"key":"e_1_2_11_69_1","first-page":"640","volume-title":"Handbook of cluster analysis","author":"Meila 
M.","year":"2015"},{"key":"e_1_2_11_70_1","doi-asserted-by":"publisher","DOI":"10.1177\/014662168701100401"},{"key":"e_1_2_11_71_1","doi-asserted-by":"publisher","DOI":"10.1201\/9781420034912"},{"key":"e_1_2_11_72_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1023949509487"},{"key":"e_1_2_11_73_1","doi-asserted-by":"publisher","DOI":"10.1207\/s15327906mbr1803_4"},{"key":"e_1_2_11_74_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.foodqual.2013.12.004"},{"key":"e_1_2_11_75_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pbio.3000691"},{"key":"e_1_2_11_76_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.aac4716"},{"key":"e_1_2_11_77_1","doi-asserted-by":"publisher","DOI":"10.1177\/0739986305280692"},{"key":"e_1_2_11_78_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.breast.2015.07.008"},{"key":"e_1_2_11_79_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1971.10482356"},{"key":"e_1_2_11_80_1","doi-asserted-by":"publisher","DOI":"10.1177\/001316447303300404"},{"key":"e_1_2_11_81_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp543"},{"key":"e_1_2_11_82_1","doi-asserted-by":"publisher","DOI":"10.1093\/jnci\/95.1.14"},{"key":"e_1_2_11_83_1","doi-asserted-by":"publisher","DOI":"10.1037\/0021-9010.90.6.1280"},{"key":"e_1_2_11_84_1","doi-asserted-by":"publisher","DOI":"10.1016\/0031-3203(80)90042-4"},{"key":"e_1_2_11_85_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0932692100"},{"key":"e_1_2_11_86_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1732912100"},{"key":"e_1_2_11_87_1","doi-asserted-by":"publisher","DOI":"10.1161\/CIRCRESAHA.118.313911"},{"key":"e_1_2_11_88_1","doi-asserted-by":"publisher","DOI":"10.1214\/ss\/1056397488"},{"key":"e_1_2_11_89_1","doi-asserted-by":"publisher","DOI":"10.1198\/106186005X59243"},{"key":"e_1_2_11_90_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9868.2009.00706.x"},{"key":"e_1_2_11_91_1","unstructured":"Van Mechelen I. Boulesteix A.\u2010L. Dangl R. Dean N. 
Guyon I. Hennig C. Leisch F. & Steinley D. (2018). Benchmarking in cluster analysis: A white paper. arXiv preprint arXiv:1809.10496."},{"key":"e_1_2_11_92_1","volume-title":"Clustering stability: An overview","author":"Von Luxburg U.","year":"2010"},{"key":"e_1_2_11_93_1","first-page":"65","volume-title":"Proceedings of ICML Workshop on Unsupervised and Transfer Learning, volume 27 of Proceedings of Machine Learning Research","author":"Von Luxburg U.","year":"2012"},{"key":"e_1_2_11_94_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/asq061"},{"key":"e_1_2_11_95_1","doi-asserted-by":"publisher","DOI":"10.1198\/tas.2009.0033"},{"key":"e_1_2_11_96_1","doi-asserted-by":"publisher","DOI":"10.1038\/ncomms6355"},{"key":"e_1_2_11_97_1","doi-asserted-by":"publisher","DOI":"10.1186\/s12918-014-0136-9"},{"key":"e_1_2_11_98_1","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1330"}],"container-title":["WIREs Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/widm.1444","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/widm.1444","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/wires.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/widm.1444","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,25]],"date-time":"2023-08-25T22:08:05Z","timestamp":1693001285000},"score":1,"resource":{"primary":{"URL":"https:\/\/wires.onlinelibrary.wiley.com\/doi\/10.1002\/widm.1444"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,23]]},"references-count":97,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,5]]}},"alternative-id":["10.1002\/widm.1444"],"URL":"https:\/\/doi.org\/10.1002\/widm.1444","archive":["Portico"],
"relation":{},"ISSN":["1942-4787","1942-4795"],"issn-type":[{"value":"1942-4787","type":"print"},{"value":"1942-4795","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,23]]},"assertion":[{"value":"2021-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-27","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}