Author:
David Allen Olsen
Affiliation:
University of Minnesota-Twin Cities, United States
Keyword(s):
Intelligent Control Systems, Hierarchical Clustering, Hierarchical Sequence, Complete Linkage, Meaningful Level, Meaningful Cluster Set, Distance Graphs, Noise Attenuation.
Related Ontology Subjects/Areas/Topics:
Informatics in Control, Automation and Robotics; Intelligent Control Systems and Optimization; Intelligent Fault Detection and Identification; Machine Learning in Control Applications
Abstract:
When the assumptions underlying the standard complete linkage method are unwound, the size of a hierarchical sequence reverts from n levels to n(n-1)/2 + 1 levels, and the time complexity to construct a hierarchical sequence of cluster sets becomes O(n^4). Moreover, the post hoc heuristics for cutting dendrograms are not suitable for finding meaningful cluster sets of an (n(n-1)/2 + 1)-level hierarchical sequence. To overcome these problems for small-n, large-m data sets, the project described in this paper went back more than 60 years to solve a problem that could not be solved then. This paper presents a means for finding meaningful levels of an (n(n-1)/2 + 1)-level hierarchical sequence prior to performing a cluster analysis. By finding meaningful levels of such a hierarchical sequence beforehand, it is possible to know which cluster sets to construct and to construct only those cluster sets. This paper also shows how increasing the dimensionality of the data points helps reveal inherent structure in noisy data. The means is theoretically validated. Empirical results from four experiments show that finding meaningful levels of a hierarchical sequence is easy and that meaningful cluster sets can have real-world meaning.
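As a rough illustration of the level count quoted in the abstract, the following Python sketch counts the levels for a small data set. It is not the paper's method. It assumes that each of the n(n-1)/2 pairwise distances contributes one level and that one additional level holds every data point in its own cluster. The function names and the sample points are hypothetical.

```python
from itertools import combinations
import math


def pairwise_distances(points):
    """Return the Euclidean distances between all pairs of points, sorted ascending."""
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def level_count(n):
    """Number of levels assumed here: one per pairwise distance, plus one initial level."""
    return n * (n - 1) // 2 + 1


# Hypothetical sample data: n = 4 points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
distances = pairwise_distances(points)

# One level per ordered distance, plus the level where each point is its own cluster.
levels = len(distances) + 1
print(levels, level_count(len(points)))
```

For comparison, the standard complete linkage method yields a hierarchical sequence of only n levels, so the gap between the two counts grows quadratically with n.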