A Fast Two-Level Approximate Euclidean Minimum Spanning Tree Algorithm for High-Dimensional Data

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10935)

Abstract

Euclidean minimum spanning tree algorithms typically run with quadratic computational complexity, which is impractical for large-scale, high-dimensional datasets. In this paper, we propose a new two-level approximate Euclidean minimum spanning tree algorithm for high-dimensional data. In the first level, we perform outlier detection on a given dataset to identify a small number of boundary points and run the standard Prim's algorithm on the reduced dataset. In the second level, we conduct a k-nearest-neighbor search to complete the construction of an approximate Euclidean minimum spanning tree. Experimental results on sample datasets demonstrate the efficiency of the proposed method while maintaining high approximation precision.
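
The two-level structure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the outlier detector or how the neighbor search completes the tree, so everything below is an assumption made for illustration and not the authors' method: boundary points are scored by the distance to their k-th nearest neighbor, and each remaining point is attached to its nearest point already in the tree. The names `prim_emst`, `knn`, `two_level_approx_emst`, and `n_boundary` are hypothetical. The neighbor search is brute force for brevity, so the sketch shows the structure of the approach, not its speed-up.

```python
import math


def prim_emst(points, idx):
    """Exact EMST over the points indexed by idx, using dense O(m^2) Prim."""
    if not idx:
        return []
    # best[i] = (distance to the tree, tree endpoint) for every point not yet in the tree
    best = {i: (math.inf, None) for i in idx[1:]}
    current = idx[0]
    edges = []
    while best:
        # Relax distances using the point most recently added to the tree.
        for i in best:
            d = math.dist(points[current], points[i])
            if d < best[i][0]:
                best[i] = (d, current)
        nxt = min(best, key=lambda i: best[i][0])
        d, parent = best.pop(nxt)
        edges.append((parent, nxt, d))
        current = nxt
    return edges


def knn(points, i, k, candidates):
    """Brute-force k nearest candidates to point i, as (distance, index) pairs."""
    pairs = ((math.dist(points[i], points[j]), j) for j in candidates if j != i)
    return sorted(pairs)[:k]


def two_level_approx_emst(points, n_boundary, k=3):
    """Spanning tree built in two levels; requires at least two points."""
    n = len(points)
    all_idx = list(range(n))
    n_boundary = max(1, min(n_boundary, n))

    # Level 1 (assumed outlier score): distance to the k-th nearest neighbor.
    score = {i: knn(points, i, k, all_idx)[-1][0] for i in all_idx}
    boundary = sorted(all_idx, key=score.get, reverse=True)[:n_boundary]
    edges = prim_emst(points, boundary)

    # Level 2 (assumed completion rule): attach every remaining point
    # to its nearest neighbor among the points already in the tree.
    in_tree = list(boundary)
    boundary_set = set(boundary)
    for i in all_idx:
        if i in boundary_set:
            continue
        d, j = knn(points, i, 1, in_tree)[0]
        edges.append((j, i, d))
        in_tree.append(i)
    return edges
```

Calling `two_level_approx_emst(points, n_boundary=m)` on a list of coordinate tuples returns `len(points) - 1` edges of the form `(u, v, length)`. The total length can be compared with that of `prim_emst(points, list(range(len(points))))` to gauge how much precision the approximation gives up on a given dataset.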

Acknowledgment

The authors would like to thank the Chinese National Science Foundation for its valuable support of this work under award 61473220 and all the anonymous reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Xia Li Wang.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Wang, X.L., Wang, X., Li, X. (2018). A Fast Two-Level Approximate Euclidean Minimum Spanning Tree Algorithm for High-Dimensional Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science (LNAI), vol 10935. Springer, Cham. https://doi.org/10.1007/978-3-319-96133-0_21

  • DOI: https://doi.org/10.1007/978-3-319-96133-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96132-3

  • Online ISBN: 978-3-319-96133-0

  • eBook Packages: Computer Science, Computer Science (R0)
