Abstract
Random projection is a technique primarily used for dimension reduction: it maps high dimensional data to a low dimensional space while preserving, in expectation, pairwise distances such as the Euclidean distance, inner product, angular distance, and \(l_p\) distance for even values of p. These estimated pairwise distances between observations in the low dimensional space can be computed rapidly and used for nearest neighbor searches, clustering, or even classification. This paper highlights how random projections and the bivariate normal, two seemingly disparate topics, share a common thread, and expands upon two computational statistical techniques from the recent random projection literature that further improve the accuracy of the estimate of the inner product between vectors under random projection by making use of the properties of the respective dataset. It also discusses the limitations of these methods.
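The basic property described above, that the inner product is preserved in expectation, can be illustrated with a short sketch. This is not the paper's improved estimator. It assumes an ordinary Gaussian projection matrix with i.i.d. N(0, 1) entries scaled by 1/sqrt(k), and the function names (`random_projection_matrix`, `project`, `estimate_inner_product`) and the sizes `d`, `k`, and `n_trials` are hypothetical choices made for illustration only.

```python
import random


def random_projection_matrix(d, k, rng):
    """Return a d x k matrix (list of rows) with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]


def project(x, R):
    """Map a d-dimensional vector x to k dimensions: v = R^T x / sqrt(k)."""
    k = len(R[0])
    scale = 1.0 / k ** 0.5
    return [scale * sum(x[i] * R[i][j] for i in range(len(x))) for j in range(k)]


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def estimate_inner_product(x, y, k, rng):
    """Estimate <x, y> from one shared random projection of both vectors."""
    R = random_projection_matrix(len(x), k, rng)
    return dot(project(x, R), project(y, R))


rng = random.Random(0)  # fixed seed so the run is repeatable
d, k, n_trials = 30, 20, 1000  # illustrative sizes, not taken from the paper

x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
y = [rng.uniform(-1.0, 1.0) for _ in range(d)]

true_ip = dot(x, y)
estimates = [estimate_inner_product(x, y, k, rng) for _ in range(n_trials)]
mean_estimate = sum(estimates) / n_trials

print("true inner product:       ", true_ip)
print("mean of projected estimate:", mean_estimate)
```

A single estimate from one projection can be far from the true value when k is small, but the average over many independent projections settles near it. Reducing the variance of each individual estimate is the kind of accuracy improvement the abstract refers to.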
Acknowledgements
We would like to thank the reviewers for their comments and suggestions for improvement, which have helped to enhance the quality of the paper. We also thank Wong Wei Pin and Sergey Kushnarev for fruitful and productive discussions, and Omar Ortiz for his technical assistance.
Additional information
Responsible editor: Fei Wang.
This work is funded by the SUTD Faculty Fellow Grant RGFECA17003 as well as the Singapore Ministry of Education Academic Research Fund Tier 2 Grant MOE2018-T2-2-013.
Cite this article
Kang, K. Correlations between random projections and the bivariate normal. Data Min Knowl Disc 35, 1622–1653 (2021). https://doi.org/10.1007/s10618-021-00764-6
DOI: https://doi.org/10.1007/s10618-021-00764-6