Abstract
In the 1970’s John Gittins discovered that multi-armed bandits, an important class of models for the dynamic allocation of a single key resource among a set of competing projects, have optimal solutions of index form. At each decision epoch such policies allocate the resource to whichever project has the largest Gittins index. Since the 1970’s, Gittins’ index result together with a range of developments and reformulations of it have constituted an influential stream of ideas and results contributing to research into the scheduling of stochastic objects. We give a brief account of many of the most important contributions to this work and proceed to describe how index theory has recently been developed to produce strongly performing heuristic policies for the dynamic allocation of a divisible resource to a collection of stochastic projects (or bandits). A limitation on this work concerns the need for the structural requirement of indexability which is notoriously difficult to establish. We introduce a general framework for the development of index policies for dynamic resource allocation which circumvents this difficulty. We utilise this framework to generate index policies for two model classes of independent interest. Their performance is evaluated in an extensive numerical study.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Ansell, P. S., Glazebrook, K. D., Niño Mora, J., & O’Keeffe, M. (2003). Whittle’s index policy for a multi-class queueing system with convex holding costs. Mathematical Methods of Operations Research, 57, 21–39.
Archibald, T. W., Black, D. P., & Glazebrook, K. D. (2009). Indexability and index heuristics for a simple class of inventory routing problems. Operations Research, 57, 314–326.
Armony, M., & Bambos, N. (2003). Queueing dynamics and maximal throughput scheduling in switched processing systems. QUESTA, 44, 209–252.
Bertsimas, D. P., & Niño Mora, J. (1996). Conservation laws, extended polymatroids and multi-armed bandit problems: A polyhedral approach to indexable systems. Mathematics of Operations Research, 21, 257–306.
Caro, F., & Gallien, J. (2007). Dynamic assortment with demand learning for seasonal consumer goods. Management Science, 53, 276–292.
Dacre, M. J., & Glazebrook, K. D. (2002). The dependence of optimal returns from multi-class queueing systems on their customer base. QUESTA, 40, 93–115.
Dacre, M. J., Glazebrook, K. D., & Niño Mora, J. (1999). The achievable region approach to the optimal control of stochastic systems (with discussion). Journal of the Royal Statistical Society, B61, 747–791.
Dayanik, S., Powell, W., & Yamazaki, K. (2008). Index policies for discounted bandit problems with availability constraints. Advances in Applied Probability, 49(2), 377–400.
Dunn, R. T., & Glazebrook, K. D. (2001). The performance of index-based policies for bandit problems with stochastic machine availability. Advances in Applied Probability, 33, 365–390.
Dunn, R. T., & Glazebrook, K. D. (2004). Discounted multi-armed bandit problems on a collection of machines with varying speeds. Mathematics of Operations Research, 29, 266–279.
Garbe, R., & Glazebrook, K. D. (1998a). Stochastic scheduling with priority classes. Mathematics of Operations Research, 23, 119–144.
Garbe, R., & Glazebrook, K. D. (1998b). Submodular returns and greedy heuristics for queueing scheduling problems. Operations Research, 46, 336–346.
Gittins, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). Journal of the Royal Statistical Society, B41, 148–177.
Gittins, J. C., Glazebrook, K. D., & Weber, R. R. (2011). Multi-armed bandit allocation indices (2nd ed.). London: Wiley-Blackwell.
Gittins, J. C., & Jones, D. M. (1974) . A dynamic allocation index for the sequential design of experiments. In Progress in statistics, pp. 241–266. Amsterdam: North-Holland.
Glazebrook, K. D. (1976). Stochastic scheduling with order constraints. International Journal of Systems Science, 7, 657–666.
Glazebrook, K. D., Hodge, D. J., & Kirkbride, C. (2011). General notions of indexability for queueing control and asset management. Annals of Applied Probability, 23, 876–907.
Glazebrook, K. D., Kirkbride, C., & Ouenniche, J. (2009). Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Operations Research, 57, 975–989.
Glazebrook, K. D., Kirkbride, C., & Ruiz-Hernandez, D. (2006). Spinning plates and squad systems—Policies for bi-directional restless bandits. Advances in Applied Probability, 38, 95–115.
Glazebrook, K. D., Mitchell, H. M., & Ansell, P. S. (2005). Index policies for the maintenance of a collection of machines by a set of repairmen. European Journal of Operational Research, 165, 267–284.
Glazebrook, K. D., & Niño Mora, J. (2001). Parallel scheduling of multiclass \(M/M/m\) queues: Approximate and heavy-traffic optimization of achievable performance. Operations Research, 49, 609–623.
Glazebrook, K. D., & Wilkinson, D. J. (2000). Index-based policies for discounted multi-armed bandits on parallel machines. Annals of Applied Probability, 10, 877–896.
Hodge, D. J., & Glazebrook, K. D. (2011). Dynamic resource allocation in a multi-product make-to-stock production system. QUESTA, 67, 333–364.
Jacko, P., & Sansò, B. (2012). Optimal anticipative congestion control of flows with time-varying input stream. Performance Evaluation, 69, 86–101.
Katehakis, M. N., & Veinott, A. F. (1987). The multi-armed bandit problem—Decomposition and computation. Mathematics of Operations Research, 12, 262–268.
Klimov, G. P. (1974). Time sharing systems I. Theory of Probability and Its Applications, 19, 532–551.
Nash, P. (1973). Optimal allocation of resources between research projects. Ph.D. Thesis, Cambridge University, Cambridge.
Niño Mora, J. (2007). Dynamic priority allocation via restless bandit marginal productivity indices. TOP, 15, 161–198.
Niño-Mora, J. (2001). Restless bandits, partial conservation laws and indexability. Advances in Applied Probability, 33, 76–98.
Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York, NY: Wiley.
Robinson, D. R. (1982). Algorithms for evaluating the dynamic allocation index. Operations Research Letters, 1, 72–74.
Sonin, I. M. (2008). A generalized Gittins index for a Markov chain and its recursive calculation. Statistics & Probability Letters, 78, 1526–1533.
Tsitsiklis, J. N. (1994). A short proof of the Gittins index theorem. Annals of Applied Probability, 4, 194–199.
Tsoucas, P. (1991). The region of achievable performance in a model of Klimov. IBM: Technical report.
Varaiya, P., Walrand, J., & Buyukkoc, C. (1985). Extensions of the multi-armed bandit problem. IEEE Transactions on Automatic Control, AC–30, 426–439.
Weber, R. R. (1992). On the Gittins index for multiarmed bandits. Annals of Applied Probability, 2, 1024–1033.
Weber,R. R., Weiss, G. (1990) . On an index policy for restless bandits. Journal of Applied Probability, 27, 637–648, 1990. (Addendum: Advances in Applied Probability, 23:429–430, 1991).
Weiss, G. (1988). Branching bandit processes. Probability in the Engineering and Informational Sciences, 2, 269–278.
Whittle, P. (1980). Multi-armed bandits and the Gittins index. Journal of the Royal Statistical Society, B42, 142–149.
Whittle, P. (1981). Arm-acquiring bandits. Annals of Probability, 9, 284–292.
Whittle, P. (1988). Restless bandits: Activity allocation in a changing world. In J. Gani (Ed.), A celebration of applied probability, (J. Appl. Prob. Spec. Vol. 25A, pp. 287–298). Sheffield: Applied Probability Trust.
Acknowledgments
An earlier version of this paper was given as a plenary address by the first author at MISTA2011 and his thanks go to the conference organisers. The first two authors acknowledge the support for this work by the Engineering and Physical Sciences Research Council (EPSRC) through grant EP/E049265/1. The third author was supported by an RCUK fellowship and the fourth author by an EPSRC doctoral studentship. All authors are grateful to the two referees whose careful reading of the paper and comments have enabled them to strengthen the paper.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Glazebrook, K.D., Hodge, D.J., Kirkbride, C. et al. Stochastic scheduling: A short history of index policies and new approaches to index generation for dynamic resource allocation. J Sched 17, 407–425 (2014). https://doi.org/10.1007/s10951-013-0325-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10951-013-0325-1