Bandits with many optimal arms

de Heide, Rianne; Cheshire, James; Ménard, Pierre; Carpentier, Alexandra

arXiv:2103.12452 (cs)

[Submitted on 23 Mar 2021 (v1), last revised 5 Nov 2021 (this version, v2)]

Abstract: We consider a stochastic bandit problem with a possibly infinite number of arms. We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates in both the cumulative regret setting and the best-arm identification setting, in terms of the problem parameters $T$ (the budget), $p^*$ and $\Delta$. For the objective of minimizing the cumulative regret, we provide a lower bound of order $\Omega(\log(T)/(p^*\Delta))$ and a UCB-style algorithm with a matching upper bound up to a factor of $\log(1/\Delta)$. Our algorithm needs $p^*$ to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to $p^*$ in this setting is impossible. For best-arm identification, we provide a lower bound of order $\Omega(\exp(-cT\Delta^2 p^*))$ on the probability of outputting a sub-optimal arm, where $c>0$ is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order $\log(T)$ in the exponential, and that does not need $p^*$ or $\Delta$ as a parameter. Our results apply directly to three related problems: competing against the $j$-th best arm, identifying an $\epsilon$-good arm, and finding an arm with mean larger than a quantile of a known order.
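
As a rough illustration of why knowing $p^*$ can help in the cumulative regret setting, the sketch below draws a finite subsample of arms that is large enough to contain an optimal arm with high probability, then runs standard UCB1 on it. This is an assumption made for illustration, not the algorithm from the paper: the abstract does not say how $p^*$ enters the calibration. The names `subsample_size` and `ucb1`, the failure probability `delta`, and the Bernoulli reward model are all hypothetical. The one fact used is that $n$ i.i.d. draws contain no optimal arm with probability $(1-p^*)^n \le \exp(-n p^*)$.

```python
import math
import random


def subsample_size(p_star: float, delta: float) -> int:
    """Number of i.i.d. arm draws so that at least one draw is optimal
    with probability >= 1 - delta (hypothetical helper).

    Uses (1 - p_star)**n <= exp(-n * p_star) <= delta.
    """
    if not (0.0 < p_star <= 1.0 and 0.0 < delta < 1.0):
        raise ValueError("need 0 < p_star <= 1 and 0 < delta < 1")
    return math.ceil(math.log(1.0 / delta) / p_star)


def ucb1(means, horizon, rng):
    """Run standard UCB1 on Bernoulli arms with the given means.

    Returns the number of pulls of each arm after `horizon` rounds.
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull each arm once
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    p_star, gap, horizon = 0.1, 0.4, 5000  # illustrative values only
    n = subsample_size(p_star, delta=0.01)
    # Hypothetical reservoir: each drawn arm is optimal with probability p_star.
    arms = [0.9 if rng.random() < p_star else 0.9 - gap for _ in range(n)]
    pulls = ucb1(arms, horizon, rng)
    print(n, sum(c for c, m in zip(pulls, arms) if m == 0.9))
```

The subsample size grows like $1/p^*$, so a smaller proportion of optimal arms forces more arms to be explored. This is at least consistent in spirit with the $1/p^*$ dependence of the regret lower bound stated above, though it does not reproduce the paper's analysis.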

Comments:	Substantial rewrite and added experiments. Accepted for NeurIPS 2021
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2103.12452 [cs.LG]
	(or arXiv:2103.12452v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2103.12452

Submission history

From: Rianne De Heide
[v1] Tue, 23 Mar 2021 11:02:31 UTC (58 KB)
[v2] Fri, 5 Nov 2021 08:25:11 UTC (215 KB)
