AUTOCLUST+: Automatic Clustering of Point-Data Sets in the Presence of Obstacles | SpringerLink
Skip to main content

AUTOCLUST+: Automatic Clustering of Point-Data Sets in the Presence of Obstacles

  • Conference paper
  • First Online:
Temporal, Spatial, and Spatio-Temporal Data Mining (TSDM 2000)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 2007))

Abstract

Wide spread clustering algorithms use the Euclidean distance to measure spatial proximity. However, obstacles in other GIS data-layers prevent traversing the straight path between two points. AUTOCLUST+ clusters points in the presence of obstacles based on Voronoi modeling and Delaunay Diagrams. The algorithm is free of user-supplied arguments and incorporates global and local variations. Thus, it detects high-quality clusters (clusters of arbitrary shapes, clusters of different densities, sparse clusters adjacent to high-density clusters, multiple bridges between clusters and closely located high-density clusters) without prior knowledge. Consequently, it successfully supports correlation analyses between layers (requiring high-quality clusters) and more general locational optimization problems in the presence of obstacles. All this within O(n log n+[m+R] log n) expected time, where n is the number of data points, m is the number of line-segments that determine the obstacles and R is the number of Delaunay edges intersecting some obstacles. A series of detailed performance evaluations illustrates the power of AUTOCLUST+ and confirms the virtues of our approach.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. J. D. Banfield and A. E. Raftery. Model-based Gaussian and Non-Gaussian Clustering. Biometrics, 49:803–821, 1993.

    Article  MATH  MathSciNet  Google Scholar 

  2. C. Eldershaw and M. Hegland. Cluster Analysis using Triangulation. In B. J. Noye, M. D. Teubner, and A. W. Gill, editors, Computational Techniques and Applications: CTAC97, pages 201–208. World Scientific, Singapore, 1997.

    Google Scholar 

  3. M. Ester, M. P. Kriegel, J. Sander, and X. Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pages 226–231, 1996.

    Google Scholar 

  4. V. Estivill-Castro. Hybrid genetic algorithms are better for spatial clustering. In Proc. 6th Pacific Rim Intern. Conf. Artificial Intelligence PRICAI 2000, Melbourne, 2000. Springer-Verlag Lecture Notes in AI, to appear.

    Google Scholar 

  5. V. Estivill-Castro and M. E. Houle. Robust Clustering of Large Geo-referenced Data Sets. In Proceedings of the 3rd Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 327–337, 1999.

    Google Scholar 

  6. V. Estivill-Castro and I. Lee. AMOEBA: Hierarchical Clustering Based on Spatial Proximity Using Delaunay Diagram. In Proceedings of the 9th International Symposium on Spatial Data Handling, 2000. to appear.

    Google Scholar 

  7. V. Estivill-Castro and I. Lee. AUTOCLUST: Automatic Clustering via Boundary Extraction for Massive Point-data Sets. In Proceedings of the 5th International Conference on Geocomputation, 2000. to appear. Extended version is available at http://www.cs.newcastle.edu.au/Dept/techrep.html as a technical report.

  8. V. Estivill-Castro and A.T. Murray. Discovering associations in spatial data-An efficient medoid based approach. In X. Wu, R. Kotagiri, and K.K. Korb, editors, Proceedings of the 2nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), pages 110–121, Melbourne, Australia, 1998. Springer-Verlag Lecture Notes in Artificial Intelligence 1394.

    Google Scholar 

  9. C. Fraley and A. E. Raftery. How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer Journal, 41(8):578–588, 1998.

    Article  MATH  Google Scholar 

  10. C. M. Gold. Problems with handling spatial data-The Voronoi approach. CISM Journal ACSGC, 45(1):65–80, 1991.

    Google Scholar 

  11. C. M. Gold. The meaning of Neighbour. In G. Tinhofer and G. Schmidt, editors, Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, pages 220–235, Berlin, 1992. Springer-Verlag Lecture Notes in Computer Science 639.

    Google Scholar 

  12. M. F. Goodchild. Geographical information science. International Journal of Geographical Information Systems, 6(1):31–45, 1992.

    Article  Google Scholar 

  13. S. Guha, R. Rastogi, and K. Shim. CURE: An Efficient Clustering Algorithm for Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 73–84, 1998.

    Google Scholar 

  14. I. Kang, T. Kim, and K. Li. A Spatial Data Mining Method by Delaunay Triangulation. In Proceedings of the 5th International Workshop on Advances in Geographic Information Systems (GIS-97), pages 35–39, 1997.

    Google Scholar 

  15. G. Karypis, E. Han, and V. Kumar. CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling. IEEE Computer: Special Issue on Data Analysis and Mining, 32(8):68–75, 1999.

    Google Scholar 

  16. M. Keil and G Gutwin. The Delauney triangulation closely approximates the complete graph. In F. Denhe, J.-R. Sack, and N. Snatoro, editors, Proceedings of the First Workshop on Algorithms and Data Structures WADS-89, pages 47–56. Springer-Verlag Lecture Notes in Computer Science 382, 1989.

    Google Scholar 

  17. E. M. Knorr, R. T. Ng, and D. L. Shilvock. Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Minings. IEEE Transactions on Knowledge and Data Engineering, 8(6):884–897, 1996.

    Article  Google Scholar 

  18. E. M. Knorr, R. T. Ng, and D. L. Shilvock. Finding Boundary Shape Matching Relationships in Spatial Data. In Proceedings of the 5th International Symposium on Spatial Databases, pages 29–46, Berlin, 1997. Springer-Verlag Lecture Notes in Computer Science 1262.

    Google Scholar 

  19. G. Liotta. Low Degree Algorithm for Computing and Checking Gabriel Graphs. Technical Report 96-28, Department of Computer Science, Brown University, 1996.

    Google Scholar 

  20. K. Mehlhorn and S. Näher. LEDA A platform for combinatorial and geometric computing. Cambridge University Press, Cambridge, 1999.

    MATH  Google Scholar 

  21. R. T. Ng and J. Han. Efficient and Effective Clustering Method for Spatial Data Mining. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pages 144–155, 1994.

    Google Scholar 

  22. A. Okabe, B. N. Boots, and K. Sugihara. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley & Sons, West Sussex, 1992.

    MATH  Google Scholar 

  23. S. Openshaw. A Mark 1 Geographical Analysis Machine for the automated analysis of point data sets. International Journal of GIS, 1(4):335–358, 1987.

    Google Scholar 

  24. S. Openshaw. Two exploratory space-time-attribute pattern analysers relevant to GIS. In S. Fotheringham and P. Rogerson, editors, Spatial Analysis and GIS, pages 83–104. Taylor and Francis, London, 1994.

    Google Scholar 

  25. J. F. Paper and D. J. Maguire. Design models and functionality in gis. Computers and Geosciences, 18(4):387–394, 1992.

    Article  Google Scholar 

  26. C. Posse. Hierarchical Model-Based Clustering for Large Datasets. Technical Report 363, Department of Statistics, University of Washington, 1999.

    Google Scholar 

  27. P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley, NY, 1987.

    MATH  Google Scholar 

  28. E. Schikuta and M. Erhart. The BANG-Clustering System: Grid-Based Data Analysis. In X. Liu, P. Cohen, and M. Berthold, editors, Proceedings of the Second International Symposium IDA-97, pages 513–524, Berlin, 1997. Advances in Intelligent Data Analysis, Springer-Verlag Lecture Notes in Computer Science 1280.

    Google Scholar 

  29. E. Son, I. Kang, and K. Li. A Spatial Data Mining Method by Clustering Analysis. In ACM-GIS 1998, pages 157–158, 1998.

    Google Scholar 

  30. A. K. H. Tung, J. Hou, and J. Han. COE: Clustering with Obstacles Entities, A Preliminary Study. In Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2000.

    Google Scholar 

  31. W. Wang, J. Yang, and R. Muntz. STING+: An Approach to Active Spatial Data Mining. In Proceedings of the International Conference on Data Engineering, pages 116–125, 1999.

    Google Scholar 

  32. C. T. Zahn. Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE Transactions of Computers, 20(1):68–86, 1971.

    Article  MATH  Google Scholar 

  33. T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 103–114, 1996.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Estivill-Castro, V., Lee, I. (2001). AUTOCLUST+: Automatic Clustering of Point-Data Sets in the Presence of Obstacles. In: Roddick, J.F., Hornsby, K. (eds) Temporal, Spatial, and Spatio-Temporal Data Mining. TSDM 2000. Lecture Notes in Computer Science(), vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45244-3_11

Download citation

  • DOI: https://doi.org/10.1007/3-540-45244-3_11

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41773-6

  • Online ISBN: 978-3-540-45244-7

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics