Abstract
Euclidean minimum spanning tree (EMST) algorithms typically have quadratic computational complexity, which is impractical for large-scale, high-dimensional datasets. In this paper, we propose a new two-level approximate EMST algorithm for high-dimensional data. In the first level, we perform outlier detection on a given dataset to identify a small number of boundary points, and run the standard Prim’s algorithm on the reduced dataset. In the second level, we conduct a k-nearest-neighbor search to complete the construction of the approximate EMST. Experimental results on sample datasets demonstrate the efficiency of the proposed method while maintaining high approximation precision.
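The abstract does not say which outlier detector is used, how many boundary points are kept, or exactly how the k-nearest-neighbor search completes the tree, so the Python sketch below is only an illustration under stated assumptions, not the authors' method. It contrasts the quadratic exact baseline (Prim's algorithm on the complete Euclidean graph) with one possible two-level approximation. Everything else is hypothetical: the function names `prim_emst`, `knn` and `two_level_emst`, the parameters `k` and `m`, the distance to the k-th nearest neighbor as a stand-in outlier score, the default of about √n boundary points, the extra link from each point to its nearest boundary point, and the final Kruskal pass over the sparse candidate edges.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Edge = Tuple[float, int, int]  # (length, endpoint u, endpoint v)


def prim_emst(points: List[Point], ids: List[int]) -> List[Edge]:
    """Exact Prim's algorithm on the complete Euclidean graph over `ids`; O(len(ids)^2) distances."""
    if len(ids) < 2:
        return []
    best = {i: math.inf for i in ids}  # cheapest known link from each vertex to the tree
    parent = {i: -1 for i in ids}
    remaining = set(ids)
    current = ids[0]
    remaining.discard(current)
    edges: List[Edge] = []
    while remaining:
        # Relax links from the vertex just added, then attach the closest outside vertex.
        for j in remaining:
            d = math.dist(points[current], points[j])
            if d < best[j]:
                best[j], parent[j] = d, current
        current = min(remaining, key=lambda j: best[j])
        remaining.discard(current)
        edges.append((best[current], parent[current], current))
    return edges


def knn(points: List[Point], k: int) -> List[List[Tuple[float, int]]]:
    """Brute-force k nearest neighbors (distance, index) of every point; stands in for an index."""
    return [
        sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        for i, p in enumerate(points)
    ]


def two_level_emst(points: List[Point], k: int = 5, m: int = 0) -> List[Edge]:
    """Hypothetical two-level approximation; returns the n - 1 edges of a spanning tree."""
    n = len(points)
    if n < 2:
        return []
    k = min(k, n - 1)
    m = m if m > 0 else max(1, math.isqrt(n))  # assumed default: about sqrt(n) boundary points
    neighbours = knn(points, k)
    # Level 1: distance to the k-th neighbor as outlier score; the top-m points are "boundary" points.
    score = [nbrs[-1][0] for nbrs in neighbours]
    boundary = sorted(range(n), key=lambda i: score[i], reverse=True)[:m]
    candidates: List[Edge] = prim_emst(points, boundary)
    # Level 2: every point contributes its kNN edges plus one link to its nearest boundary
    # point; those links keep the candidate graph connected.
    for i in range(n):
        candidates.extend((d, i, j) for d, j in neighbours[i])
        nearest = min(boundary, key=lambda b: math.dist(points[i], points[b]))
        if nearest != i:
            candidates.append((math.dist(points[i], points[nearest]), i, nearest))
    # Kruskal with union-find over the sparse candidate edges.
    root = list(range(n))

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    tree: List[Edge] = []
    for d, u, v in sorted(candidates):
        ru, rv = find(u), find(v)
        if ru != rv:
            root[ru] = rv
            tree.append((d, u, v))
            if len(tree) == n - 1:
                break
    return tree


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    # Two synthetic 10-dimensional Gaussian blobs of 100 points each.
    pts = [tuple(rng.gauss(c, 1.0) for _ in range(10)) for c in (0, 8) for _ in range(100)]
    exact = sum(d for d, _, _ in prim_emst(pts, list(range(len(pts)))))
    approx = sum(d for d, _, _ in two_level_emst(pts, k=5))
    print(f"exact {exact:.2f}  approx {approx:.2f}  ratio {approx / exact:.4f}")
```

Because the candidate edges form a subgraph of the complete Euclidean graph, the approximate tree can never be lighter than the exact one, and the ratio printed above measures how much weight the approximation adds. Note that the brute-force `knn` helper is itself quadratic; any real speed-up would depend on replacing it with an indexed k-nearest-neighbor search.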
Acknowledgment
The authors thank the Chinese National Science Foundation for supporting this work under award 61473220, and all the anonymous reviewers for their valuable comments.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wang, X.L., Wang, X., Li, X. (2018). A Fast Two-Level Approximate Euclidean Minimum Spanning Tree Algorithm for High-Dimensional Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol. 10935. Springer, Cham. https://doi.org/10.1007/978-3-319-96133-0_21
DOI: https://doi.org/10.1007/978-3-319-96133-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96132-3
Online ISBN: 978-3-319-96133-0
eBook Packages: Computer Science; Computer Science (R0)