Abstract
In an information-rich world, the task of data analysis is becoming ever more complex. Even with the processing capability of modern technology, important details are often swamped by, and thus lost amongst, the sheer volume of data. With analysis problems ranging from discovering credit card fraud to tracking terrorist activities, the phrase “a needle in a haystack” has never been more apt. To deal with large data sets, current approaches require that the data be sampled or summarised before true analysis can take place. In this paper we propose a novel pyramidic method, copasetic clustering, which addresses the problem of applying traditional clustering techniques to large-scale data sets while using limited resources. A further benefit of the technique is the transparency it provides into intermediate clustering steps; when applied to spatial data sets, this allows contextual information to be captured. The abilities of the technique are demonstrated using both synthetic and biological data.
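The abstract does not spell out the algorithm, so the following is only an illustrative sketch of what a pyramid-style clustering pass over an image might look like; it should not be read as the authors' method. It assumes grey-level pixel values, a two-cluster split (background versus signal), and a simple merge rule in which each higher level re-clusters the centroids of its child cells. The names `two_means` and `pyramid_cluster`, the `tile` parameter, and the toy image are all hypothetical.

```python
def two_means(values, iters=20):
    """Plain 1-D k-means with k=2, seeded at the smallest and largest value."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else new_lo
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def pyramid_cluster(image, tile=2):
    """Return every pyramid level; each level is a grid of (low, high) centroids.

    Assumption: cells are merged 2x2 by re-clustering their child centroids.
    """
    rows, cols = len(image), len(image[0])
    # Level 0: cluster each tile on its own, so only tile*tile pixels
    # are processed at any one time.
    level = [
        [two_means([image[r][c]
                    for r in range(r0, min(r0 + tile, rows))
                    for c in range(c0, min(c0 + tile, cols))])
         for c0 in range(0, cols, tile)]
        for r0 in range(0, rows, tile)
    ]
    levels = [level]
    # Higher levels: merge each 2x2 group of cells until one cell remains.
    while len(level) > 1 or len(level[0]) > 1:
        level = [
            [two_means([v
                        for r in range(r0, min(r0 + 2, len(level)))
                        for c in range(c0, min(c0 + 2, len(level[0])))
                        for v in level[r][c]])
             for c0 in range(0, len(level[0]), 2)]
            for r0 in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


# Toy 4x4 image: dim background with a bright 2x2 spot in the middle.
image = [
    [10, 12, 11, 10],
    [11, 200, 210, 12],
    [10, 205, 198, 11],
    [12, 10, 11, 10],
]
levels = pyramid_cluster(image, tile=2)
top_low, top_high = levels[-1][0][0]
```

Keeping every entry of `levels`, rather than only the final centroids, is what gives access to the intermediate clustering steps: the per-tile results at level 0 retain local, spatial context that a single global clustering would discard.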
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Fraser, K., O’Neill, P., Wang, Z., Liu, X. (2004). “Copasetic Clustering”: Making Sense of Large-Scale Images. In: Shi, Y., Xu, W., Chen, Z. (eds) Data Mining and Knowledge Management. CASDMKM 2004. Lecture Notes in Computer Science, vol 3327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30537-8_11
DOI: https://doi.org/10.1007/978-3-540-30537-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23987-1
Online ISBN: 978-3-540-30537-8
eBook Packages: Computer Science, Computer Science (R0)