Abstract
Nearest neighbor search has a wide variety of applications. Unfortunately, the majority of search methods do not scale well with dimensionality. Recent efforts have been focused on finding better approximate solutions that improve the locality of data using dimensionality reduction. However, it is possible to preserve the locality of data and find exact nearest neighbors in high dimensions without dimensionality reduction. This paper introduces a novel high-performance technique to find exact k-nearest neighbors in both low and high dimensional spaces. It relies on a new method for data-sensitive space partitioning based on explicit data clustering, which is introduced in the paper for the first time. This organization supports effective reduction of the search space before accessing secondary storage. Costly Euclidean distance calculations are reduced through efficient processing of a lightweight memory-based filter. The algorithm outperforms sequential scan and the VA-File in high-dimensional situations. Moreover, the results with dynamic loading of data show that the technique works well on dynamic datasets as well.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Aggarwal, C.C.: On the effects of dimensionality reduction on high dimensional similarity search. In: Proc. 20th PODS Conf. pp. 256–266 (2001)
Aggarwal, C.C.: Hierarchical subspace sampling: A unified framework for high dimensional data reduction, selectivity estimation and nearest neighbor search. In: Proc. ACM SIGMOD Conf., pp. 452–463 (2002)
Berchtold, S., Ertl, B., Keim, D., Kriegel, H.P., Seidl, T.: Fast nearest neighbor search in high-dimensional space. In: Proc. 14th ICDE Int. Conf. on Data Engineering, pp. 209–218 (1998)
Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is nearest neighbor meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999, vol. 1540, pp. 217–235. Springer, Heidelberg (1999)
Blott, S., Weber, R.: A simple Vector-Approximation file for similarity search in high-dimensional vector spaces. Technical report, Esprit Project Hermes (no. 9141) (1997)
Fagin, R., Kumar, R., Shivakumar, D.: Efficient similarity search and classification via rank aggregation. In: Proc. ACM SIGMOD Conf., pp. 301–312 (2003)
Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimension via hashing. In: Proc. 25th VLDB Conf., pp. 518–529 (1999)
Hinneburg, A., Aggarwal, C.C., Keim, D.A.: What is nearest neighbor in high dimensional spaces? In: Proc. 26th VLDB Conf., pp. 506–515 (2000)
Katayama, N., Satoh, S.: The SR-tree: An index structure for high-dimensional nearest neighbor queries. SIGMOD Record 26(2), 369–380 (1997)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proc. 5th Berkeley Symp. Math. Statist. Prob., vol. 1, pp. 281–297 (1967)
Orlandic, R., Lukaszuk, J., Swietlik, C.: The design of a retrieval technique for high-dimensional data on tertiary storage. SIGMOD Record 31(2), 15–21 (2002)
Orlandic, R., Lukaszuk, J.: Efficient high-dimensional indexing by superimposing space-partitioning schemes. In: Proc. 8th International Database Engineering & Applications Symposium IDEAS 2004, pp. 257–264 (2004)
Orlandic, R., Lai, Y., Yee, W.G.: Clustering high-dimensional data using an efficient and effective data space reduction. In: Proc. ACM Conference on Information and Knowledge Management CIKM 2005, pp. 201–208 (2005)
Robinson, J.T.: The K-D-B-Tree: A search structure for large multidimensional dynamic Indexes. In: Proc. ACM SIGMOD Conf., pp. 10–18 (1981)
Sakurai, Y., Yoshikawa, M., Uemura, S., Kojima, H.: The A-tree: An index structure for high-dimensional spaces using relative approximation. In: Proc. 26th VLDB Conf., pp. 516–526 (2000)
Seidl, T., Kriegel, H.P.: Optimal multi-Step k-nearest neighbor search. In: Proc. ACM SIGMOD Conf., pp. 154–165 (1998)
Weber, R., Schek, H.J., Blott, S.: A quantitative analysis and performance study for similarity search methods in high-dimensional spaces. In: Proc. 24th VLDB Conf., pp. 194–205 (1998)
Weber, R., Zezula, P.: The theory and practice of similarity searches in high dimensional data spaces (extended abstract). In: 4th DELOS Workshop (1997)
Yu, C., Ooi, B.C., Tan, K.L., Jagadish, H.V.: Indexing the distance: An efficient method to KNN processing. In: Proc. 26th VLDB Conf., pp. 421–430 (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kulkarni, S., Orlandic, R. (2006). High-Dimensional Similarity Search Using Data-Sensitive Space Partitioning. In: Bressan, S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2006. Lecture Notes in Computer Science, vol 4080. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827405_72
Download citation
DOI: https://doi.org/10.1007/11827405_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37871-6
Online ISBN: 978-3-540-37872-3
eBook Packages: Computer ScienceComputer Science (R0)