Abstract
It is not desirable in the performance perspective of search algorithms to partition a high dimensional data space by dividing all the dimensions. This is because the number of cells resulted from partitioning explodes as the number of partitioning dimensions increases, thus making any search method based on space partitioning impractical. To address this problem, we propose an algorithm to dynamically select partitioning dimensions based on a data sampling method for efficient similarity join processing. Futhermore, a disk-based plane sweeping method is proposed to minimize the cost of joins between the partitioned cells. The experimental results show that the proposed schemes substantially improve the performance of the partition-based similarity joins in high dimensional data spaces.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
C. Böhm, B. Braunmuller, F. Krebs, and H.-P. Kriegel. Epsilon grid order: An algorithm for the similarity join on massive high-dimensional data. In Proceedings of the 2001 ACM-SIGMOD Conference, 2001.
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R.-tree: An efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM-SIGMOD Conference, pages 322–331, Atlantic City, NJ, May 1990.
S. Berchtold, D. A. Keim, and H.-P. Kriegel. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd VLDB Conference, Bombay, India, September 1996.
Thomas Brinkho., Hans-Peter Kriegel, and Bernhard Seeger. Efficient processing of spatial joins using R-Trees. In Proceedings of the 1993 ACM-SIGMOD Conference, pages 237–246, Washington, DC, May 1993.
Antonin Guttman. R-Trees: A dynamic index structure for spatial searching. In Proceedings of the 1984 ACM-SIGMOD Conference, pages 47–57, Boston, MA, June 1984.
Gisli R. Hjaltason and Hanan Samet. Incremental distance join algorithms for spatial databases. In Proceedings of the 1998 ACM-SIGMOD Conference, pages 237–248, Seattle, WA, June 1998.
Y.W. Huang, N. Jing, and E. Rundensteiner. Spatial joins using R-trees: Breadth-first traversal with global optimizations. In Proceedings of the 23rd VLDB Conference, 1997.
N. Koudas and C. Sevcik. High dimensional similarity joins: Algorithms and performance evaluation. In Proceedings of the 1998 IEEE International Conference on Data Engineering, 1998.
K. Lin, H. V. Jagadish, and C. Faloutsos. The TV-tree: An index structure for high-dimensional data. VLDB Journal, 3:517–542, 1995.
Ming-Ling Lo and Chinya V. Ravishankar. Spatial hash-join. In Proceedings of the 1996 ACM-SIGMOD Conference, pages 247–258, Montreal, Canada, June 1996.
Jignesh M. Patel and David J. DeWitt. Partition based spatial-merge join. In Proceedings of the 1996 ACM-SIGMOD Conference, pages 259–270, Montreal, Canada, June 1996.
K. Shim, R. Srikant, and R. Agrawal. High-dimensional similarity joins. In Proceedings of the 1997 IEEE International Conference on Data Engineering, 1997.
Hyoseop Shin, Bongki Moon, and Sukho Lee. Adaptive multi-stage distance join processing. In Proceedings of the 2000 ACM-SIGMOD Conference, pages 343–354, Dallas, TX, May 2000.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shin, H., Moon, B., Lee, S. (2002). Partition-Based Similarity Join in High Dimensional Data Spaces. In: Hameurlain, A., Cicchetti, R., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2002. Lecture Notes in Computer Science, vol 2453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46146-9_73
Download citation
DOI: https://doi.org/10.1007/3-540-46146-9_73
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44126-7
Online ISBN: 978-3-540-46146-3
eBook Packages: Springer Book Archive