Abstract
We derive a probability distribution for the possible number of iterations required for a SIMT (single instruction multiple thread) program using rejection sampling to finish creating a sample across all threads. This distribution is found to match a recently proposed distribution from Chakraborty and Gupta (in: Communications in statistics: theory and methods, 2015) that was shown as a good approximation of certain datasets. This work demonstrates an exact application of this distribution. The distribution can be used to evaluate the relative merit of some sampling methods on the GPU without resort to numerical tests. The distribution reduces to the expected geometric distribution in the single thread per warp limit. A simplified formula to approximate the expected number of iterations required to obtain rejection iteration samples is provided. With this new result, a simple, efficient layout for assigning sampling tasks to threads on a GPU is found as a function of the rejection probability without recourse to more complicated rejection sampling methods.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
Masked SIMD instructions can also achieve this, but the programming is more cumbersome and these instructions are not always available
References
Abramowitz, M., Stegun, I.A.: eds. Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. 0009-Revised edition. Dover Publications, New York, NY. 1046 pp. ISBN: 978-0-486-61272-0 (1965)
Adams, N.M. et al.: A Review of Parallel Processing for Statistical Computation. In: Statistics and Computing 6.1 Mar. 1, (1996), pp. 37–49. ISSN: 1573-1375. https://doi.org/10.1007/BF00161572. (visited on 09/09/2020)
Ahmadi-Javid, A., Moeini, A.: An economical acceptance-rejection algorithm for uniform random variate generation over constrained simplexes. In: Statistics and Computing 26.3 May 1, (2016), pp. 703–713. issn: 1573-1375. https://doi.org/10.1007/s1D1222-015-9553-x. (visited on 03/10/2020)
Aila, T., Laine, S.: Understanding the efficiency of ray traversal on GPUs. In: Proceedings of the High Performance Graphics. High Performance Graphics. (2009). https://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus (visited on 09/09/2020)
Basu, S., DasGupta, A.: The mean, median, and mode of unimodal distributions: a characterization. In: Theory of Probability and its Applications, (1997), pp. 210–223. issn: 0040-585X. https://doi.org/10.1137/S0040585X97975447. (visited on 02/22/2020)
Chakraborty, S., Gupta, R.D.: Exponentiated geometric distribution: another generalization of geometric distribution. In: Communications in Statistics: Theory and Methods (2015), pp. 1143–1157. issn: 0361-0926. https://doi.org/10.1080/03610926.2012.763090. (visited on 02/22/2020)
CUDA Toolkit Documentation V10.2.89 (2019). https://docs.nvidia.com/cuda/ (visited on 02/13/2020)
Devroye, L.: Chapter 4 Nonuniform Random Variate Generation. In: S. G. Henderson, B. L. Nelson (ed) Handbooks in Operations Research and Management Science, Vol. 13. Simulation. Elsevier (2006), pp. 83–121. https://doi.org/10.1016/S0927-0507(06)13004-2. http://www.sciencedirect.com/science/article/pii/S0927050706130042 (visited on 02/13/2020)
Flynn, M.J.: Some computer organizations and their effectiveness. In: IEEE Transactions on Computers C-21.9 (1972), pp. 948–960. ISSN: 1557-9956. https://doi.org/10.1109/TC.1972.5009071
George, M., Wan Tsang, W.: The Ziggurat method for generating random variables. In: Journal of Statistical Software 5.1 (2000), pp. 1–7. issn: 1548-7660. https://doi.org/10.18637/jss.v005.i08. https://www.jstatsoft.org/index.php/jss/article/view/v005i08 (visited on 04/11/2020)
Intel\(\textregistered \) Advanced Vector Extensions 512 Overview.: intel.com/avx512 (visited on 09/09/2020)
Kunz, T., Thomaz, A., Christensen, H.: Hierarchical rejection sampling for informed kinodynamic planning in high-dimensional spaces. In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA). 2016 IEEE International Conference on Robotics and Automation (ICRA). (2016), pp. 89–96. https://doi.org/10.1109/ICRA.2016.7487120
Marsaglia, G., Wan Tsang, W.: “A Simple Method for Generating Gamma Variables”. In: ACM Transactions on Mathematical Software (TOMS) (2000), pp. 363–372. issn: 0098- 3500. https://doi.org/10.1145/358407.358414. https://doi.org/10.1145/358407.358414 (visited on 02/13/2020)
Matloff, N.: Programming on Parallel Machines (2011)
Murray, L.: GPU acceleration of runge-kutta integrators. In: IEEE Transactions on Parallel and Distributed Systems 23.1 (2012), pp. 94–101. issn: 1558-2183. https://doi.org/10.1109/TPDS.2011.61
Peng, R.D.: 6.3 Rejection Sampling: Advanced Statistical Computing. https://github.com/rdpeng/advstatcomp (visited on 09/16/2020)
Pfeffer, A.: Sampling with Memoization. In: AAAI (2007)
Romano, P., Walsh, J.: An improved target velocity sampling algorithm for free gas elastic scattering. Ann. Nucl. Energy 114, 318–324 (2018)
Romano, P.K., et al.: OpenMC: A state-of-the-art monte carlo code for research and development. In: Annals of Nuclear Energy. Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo 2013, SNA + MC 2013. Pluri- and Trans-Disciplinarity, Towards New Modeling and Numerical Simulation Paradigms 82 Aug. 1, (2015), pp. 90–97. issn: 0306-4549. https://doi.org/10.1016/j.anucene.2014.07.048. http://www.sciencedirect.com/science/article/pii/S030645491400379X (visited on 03/10/2020)
Rothenstein, W.: Neutron scattering kernels in pronounced resonances for stochastic doppler effect calculations. In: Annals of Nuclear Energy. A Special Issue in Honour of M. M. R. Williams 23.4 Mar. 1, (1996), pp. 441–458. ISSN: 0306-4549. https://doi.org/10.1016/0306-4549(95)00109-3. http://www.sciencedirect.com/science/article/pii/0306454995001093 (visited on 03/10/2020)
Terenin, A., Dong, S., Draper, D.: GPU-Accelerated gibbs sampling: a case study of the horseshoe probit model. In: Statistics and computing 29.2 Mar. 1, (2019), pp. 301–310. ISSN: 1573-1375. https://doi.org/10.1007/s11222-018-9809-3. (visited on 02/13/2020)
Acknowledgements
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Numerical experiment code
Numerical experiment code
Rights and permissions
About this article
Cite this article
Ridley, G., Forget, B. A simple method for rejection sampling efficiency improvement on SIMT architectures. Stat Comput 31, 30 (2021). https://doi.org/10.1007/s11222-021-10003-z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11222-021-10003-z