Abstract
In doing a six-sigma analysis of a process one must first determine the set of possible factors that potentially drive the response of interest. This stage of the the work, known as process mapping, is time consuming. Spending too much time on it wastes investigators and employees time. Yet, spending too little time may result in failing to uncover important factors that drive the process. We model this situation as a general coupon collector’s problem with N distinct coupons, where the exact value of N is not known. Our objective is to devise effective strategies to minimize the total cost incurred due to sampling in addition to the cost of unidentified coupons when the collector stops sampling. We propose a policy based on an asymptotic analysis when N is large and prove that the proposed policy is asymptotically optimal. We also illustrate the effectiveness of this policy with numerical experiments.
Similar content being viewed by others
References
Billingsley, P. (1999). Convergence of probability measures. New York, NY: Wiley.
Boneh, A., & Hofri, M., (1997). The coupon-collector problem revisited—a survey of engineering problems and computational methods. Communications in Statistics—Stochastic Models, 13, 39–66.
Boneh, S., & Papanicolaou, V. G., (1996). General asymptotic estimates for the coupon collector problem. Journal of Computational and Applied Mathematics, 67, 277–289.
Bunge, J., & Fitzpatrick, M., (1993). Estimating the number of species: A review. Journal of the American Statistical Association, 88(421), 364–373.
Finkelstein, M., Tucker, H. G., & Veeh, J. A., (1998). Confidence intervals for the number of unseen types. Statistics and Probability Letters, 37, 423–430.
Gadrich, T., & Ravid, R., (2008). The coupon collector problem in statistical quality control. In S. I. Ao, L. Gelman, D. Hukins, A. Hunter, & A. M.Korsunsky (Eds.), Proceedings of the world congress on engineering (vol. 2, pp. 905–909). London: International Association of Engineers, Newswood Limited.
Holst, L. (1972). Asymptotic normality in a generalized occupancy problem. Probability Theory and Related Fields, 21, 109–120.
Holst, L. (1973). Some limit theorems with applications in sampling theory. The Annals of Statistics, 1, 644–658.
Kan, N. D., & Nevzorov, V. B. (2005). On some limit laws in the coupon collector’s problem. Journal of Mathematical Sciences, 128, 2564–2568.
Luko, S. N. (2009). The “coupon collector’s problem” and quality control. Quality Engineering, 21, 168–181.
Pande, P., & Holpp, L. (2001). What is six sigma?. New York, NY: McGraw-Hill.
Rosen, B. (1969). Asymptotic normality in a coupon collector’s problem. Probability Theory and Related Fields, 13, 256–279.
Rosen, B. (1970). On the coupon collector’s waiting time. The Annals of Mathematical Statistics, 41, 1952–1969.
Sen, P. K. (1979). Invariance principles for the coupon collector’s problem: A martingale approach. The Annals of Statistics, 7, 372–380.
Stadje, W. (1990). The collector’s problem with group drawings. Advances in Applied Probability, 22, 866–882.
Whiting, P., Dupuis, P., & Nuzman, C. (2004). Large deviation asymptotics for occupancy problems. Annals of Probability, 32, 2765–2818.
Yahav, J., & Bickel P.J. (1985). On estimating the number of unseen species—how many executions were there? Technical Report No. 43. UC Berkeley, CA: Department of Statistics.
Zhang, J. X., & Dupuis, P. (2008). Large-deviation approximations for general occupancy models. Combnatorics Probability and Computing, 17(3), 437–470.
Acknowledgements
Tolga Tezcan’s research is supported by National Science Foundation Grants CMMI-0954126 and CMMI-1130266.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Dobson, G., Tezcan, T. Optimal sampling strategies in the coupon collector’s problem with unknown population size. Ann Oper Res 233, 77–99 (2015). https://doi.org/10.1007/s10479-014-1563-0
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10479-014-1563-0