Abstract
The unprecedentedly large size and high dimensionality of existing geographic datasets make the complex patterns that may lurk in the data hard to find. Clustering is one of the most important techniques for geographic knowledge discovery. However, existing clustering methods have two severe drawbacks for this purpose. First, spatial clustering methods focus on the specific characteristics of distributions in 2-D or 3-D space, while general-purpose high-dimensional clustering methods have limited power to recognize spatial patterns that involve neighbors. Second, clustering methods in general are not geared toward the human-computer interaction needed to tease out complex patterns effectively. In this paper, we propose an approach that opens up the “black box” of the clustering process for easy understanding, steering, focusing, and interpretation, and thus supports effective exploration of large and high-dimensional geographic data. The approach builds a hierarchical spatial cluster structure within the high-dimensional feature space and uses this combined space to discover multi-dimensional (combined spatial and non-spatial) patterns with efficient computational clustering methods and highly interactive visualization techniques. More specifically, it integrates: (1) a hierarchical spatial clustering method that generates a 1-D spatial cluster ordering preserving the hierarchical cluster structure, and (2) a density- and grid-based technique that supports the interactive identification of interesting subspaces and the subsequent search for clusters in each subspace. The approach is implemented in a fully open and interactive manner, supported by various visualization techniques.
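The idea behind item (1), a 1-D ordering in which hierarchical spatial clusters stay intact, can be illustrated with a minimal Python sketch. It assumes Euclidean distance between 2-D point locations and uses single-linkage clustering derived from a minimum spanning tree (MST) as a stand-in; the abstract does not say which hierarchical method ICEAGE uses, and the function names `mst_edges`, `cluster_ordering`, and `cut` are hypothetical. Clusters are merged in order of increasing MST edge length, and their ordered member lists are concatenated at each merge, so every cluster at every level of the hierarchy occupies a contiguous run of the final ordering. Recording the merge distance at each join then lets a single threshold recover a flat clustering.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Edge = Tuple[float, int, int]  # (length, endpoint, endpoint)


def mst_edges(points: Sequence[Point]) -> List[Edge]:
    """Euclidean minimum spanning tree via Prim's algorithm, O(n^2).

    Assumption for this sketch: plain Euclidean distance on 2-D locations.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # Add the non-tree vertex with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cluster_ordering(points: Sequence[Point]) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: single-linkage ordering (needs at least one point).

    Returns (order, gaps). Every single-linkage cluster, at every level of the
    hierarchy, is a contiguous run of `order`; gaps[k] is the distance at which
    order[k] and order[k + 1] were first joined into the same cluster.
    """
    n = len(points)
    root = list(range(n))                 # union-find parent pointers
    members = {i: [i] for i in range(n)}  # ordered member list per component
    gaps = {i: [] for i in range(n)}      # join distances between neighbours

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    # Merging along MST edges in increasing length order reproduces the
    # single-linkage dendrogram from the bottom up (Kruskal-style).
    for d, a, b in sorted(mst_edges(points)):
        ra, rb = find(a), find(b)
        # Concatenation keeps both clusters contiguous; the new boundary
        # between them records the merge distance d.
        members[ra] = members[ra] + members.pop(rb)
        gaps[ra] = gaps[ra] + [d] + gaps.pop(rb)
        root[rb] = ra
    r = find(0)
    return members[r], gaps[r]


def cut(order: List[int], gaps: List[float], threshold: float) -> List[List[int]]:
    """Flat single-linkage clusters: split wherever the join distance > threshold."""
    clusters, current = [], [order[0]]
    for idx, g in zip(order[1:], gaps):
        if g > threshold:
            clusters.append(current)
            current = []
        current.append(idx)
    clusters.append(current)
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (10, 0), (5, 20), (0, 1), (10, 1)]
    order, gaps = cluster_ordering(pts)
    print(order)                  # [0, 3, 1, 4, 2]
    print(cut(order, gaps, 5.0))  # [[0, 3], [1, 4], [2]]
```

In the example, the two tight pairs of points become adjacent in the ordering even though their input indices are scattered, and raising the threshold to 15 merges both pairs into one run, `[[0, 3, 1, 4], [2]]`. Prim's O(n^2) construction is used only for brevity; because the Euclidean MST of planar points is a subgraph of the Delaunay triangulation, a Delaunay-based construction would scale better to large datasets.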
Cite this article
Guo, D., Peuquet, D.J. & Gahegan, M. ICEAGE: Interactive Clustering and Exploration of Large and High-Dimensional Geodata. GeoInformatica 7, 229–253 (2003). https://doi.org/10.1023/A:1025101015202
DOI: https://doi.org/10.1023/A:1025101015202