Abstract
Wide spread clustering algorithms use the Euclidean distance to measure spatial proximity. However, obstacles in other GIS data-layers prevent traversing the straight path between two points. AUTOCLUST+ clusters points in the presence of obstacles based on Voronoi modeling and Delaunay Diagrams. The algorithm is free of user-supplied arguments and incorporates global and local variations. Thus, it detects high-quality clusters (clusters of arbitrary shapes, clusters of different densities, sparse clusters adjacent to high-density clusters, multiple bridges between clusters and closely located high-density clusters) without prior knowledge. Consequently, it successfully supports correlation analyses between layers (requiring high-quality clusters) and more general locational optimization problems in the presence of obstacles. All this within O(n log n+[m+R] log n) expected time, where n is the number of data points, m is the number of line-segments that determine the obstacles and R is the number of Delaunay edges intersecting some obstacles. A series of detailed performance evaluations illustrates the power of AUTOCLUST+ and confirms the virtues of our approach.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
J. D. Banfield and A. E. Raftery. Model-based Gaussian and Non-Gaussian Clustering. Biometrics, 49:803–821, 1993.
C. Eldershaw and M. Hegland. Cluster Analysis using Triangulation. In B. J. Noye, M. D. Teubner, and A. W. Gill, editors, Computational Techniques and Applications: CTAC97, pages 201–208. World Scientific, Singapore, 1997.
M. Ester, M. P. Kriegel, J. Sander, and X. Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pages 226–231, 1996.
V. Estivill-Castro. Hybrid genetic algorithms are better for spatial clustering. In Proc. 6th Pacific Rim Intern. Conf. Artificial Intelligence PRICAI 2000, Melbourne, 2000. Springer-Verlag Lecture Notes in AI, to appear.
V. Estivill-Castro and M. E. Houle. Robust Clustering of Large Geo-referenced Data Sets. In Proceedings of the 3rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 327–337, 1999.
V. Estivill-Castro and I. Lee. AMOEBA: Hierarchical Clustering Based on Spatial Proximity Using Delaunay Diagram. In Proceedings of the 9th International Symposium on Spatial Data Handling, 2000. to appear.
V. Estivill-Castro and I. Lee. AUTOCLUST: Automatic Clustering via Boundary Extraction for Massive Point-data Sets. In Proceedings of the 5th International Conference on Geocomputation, 2000. to appear. Extended version is available at http://www.cs.newcastle.edu.au/Dept/techrep.html as a technical report.
V. Estivill-Castro and A.T. Murray. Discovering associations in spatial data-An efficient medoid based approach. In X. Wu, R. Kotagiri, and K.K. Korb, editors, Proceedings of the 2nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), pages 110–121, Melbourne, Australia, 1998. Springer-Verlag Lecture Notes in Artificial Intelligence 1394.
C. Fraley and A. E. Raftery. How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer Journal, 41(8):578–588, 1998.
C. M. Gold. Problems with handling spatial data-The Voronoi approach. CISM Journal ACSGC, 45(1):65–80, 1991.
C. M. Gold. The meaning of Neighbour. In G. Tinhofer and G. Schmidt, editors, Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, pages 220–235, Berlin, 1992. Springer-Verlag Lecture Notes in Computer Science 639.
M. F. Goodchild. Geographical information science. International Journal of Geographical Information Systems, 6(1):31–45, 1992.
S. Guha, R. Rastogi, and K. Shim. CURE: An Efficient Clustering Algorithm for Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 73–84, 1998.
I. Kang, T. Kim, and K. Li. A Spatial Data Mining Method by Delaunay Triangulation. In Proceedings of the 5th International Workshop on Advances in Geographic Information Systems (GIS-97), pages 35–39, 1997.
G. Karypis, E. Han, and V. Kumar. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer: Special Issue on Data Analysis and Mining, 32(8):68–75, 1999.
M. Keil and G Gutwin. The Delauney triangulation closely approximates the complete graph. In F. Denhe, J.-R. Sack, and N. Snatoro, editors, Proceedings of the First Workshop on Algorithms and Data Structures WADS-89, pages 47–56. Springer-Verlag Lecture Notes in Computer Science 382, 1989.
E. M. Knorr, R. T. Ng, and D. L. Shilvock. Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Minings. IEEE Transactions on Knowledge and Data Engineering, 8(6):884–897, 1996.
E. M. Knorr, R. T. Ng, and D. L. Shilvock. Finding Boundary Shape Matching Relationships in Spatial Data. In Proceedings of the 5th International Symposium on Spatial Databases, pages 29–46, Berlin, 1997. Springer-Verlag Lecture Notes in Computer Science 1262.
G. Liotta. Low Degree Algorithm for Computing and Checking Gabriel Graphs. Technical Report 96-28, Department of Computer Science, Brown University, 1996.
K. Mehlhorn and S. Näher. LEDA A platform for combinatorial and geometric computing. Cambridge University Press, Cambridge, 1999.
R. T. Ng and J. Han. Efficient and Effective Clustering Method for Spatial Data Mining. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pages 144–155, 1994.
A. Okabe, B. N. Boots, and K. Sugihara. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley & Sons, West Sussex, 1992.
S. Openshaw. A Mark 1 Geographical Analysis Machine for the automated analysis of point data sets. International Journal of GIS, 1(4):335–358, 1987.
S. Openshaw. Two exploratory space-time-attribute pattern analysers relevant to GIS. In S. Fotheringham and P. Rogerson, editors, Spatial Analysis and GIS, pages 83–104. Taylor and Francis, London, 1994.
J. F. Paper and D. J. Maguire. Design models and functionality in gis. Computers and Geosciences, 18(4):387–394, 1992.
C. Posse. Hierarchical Model-Based Clustering for Large Datasets. Technical Report 363, Department of Statistics, University of Washington, 1999.
P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley, NY, 1987.
E. Schikuta and M. Erhart. The BANG-Clustering System: Grid-Based Data Analysis. In X. Liu, P. Cohen, and M. Berthold, editors, Proceedings of the Second International Symposium IDA-97, pages 513–524, Berlin, 1997. Advances in Intelligent Data Analysis, Springer-Verlag Lecture Notes in Computer Science 1280.
E. Son, I. Kang, and K. Li. A Spatial Data Mining Method by Clustering Analysis. In ACM-GIS 1998, pages 157–158, 1998.
A. K. H. Tung, J. Hou, and J. Han. COE: Clustering with Obstacles Entities, A Preliminary Study. In Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2000.
W. Wang, J. Yang, and R. Muntz. STING+: An Approach to Active Spatial Data Mining. In Proceedings of the International Conference on Data Engineering, pages 116–125, 1999.
C. T. Zahn. Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE Transactions of Computers, 20(1):68–86, 1971.
T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 103–114, 1996.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Estivill-Castro, V., Lee, I. (2001). AUTOCLUST+: Automatic Clustering of Point-Data Sets in the Presence of Obstacles. In: Roddick, J.F., Hornsby, K. (eds) Temporal, Spatial, and Spatio-Temporal Data Mining. TSDM 2000. Lecture Notes in Computer Science(), vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45244-3_11
Download citation
DOI: https://doi.org/10.1007/3-540-45244-3_11
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41773-6
Online ISBN: 978-3-540-45244-7
eBook Packages: Springer Book Archive