Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Brown, Bradley; Juravsky, Jordan; Ehrlich, Ryan; Clark, Ronald; Le, Quoc V.; Ré, Christopher; Mirhoseini, Azalia

arXiv:2407.21787 (cs)

[Submitted on 31 Jul 2024 (v1), last revised 30 Dec 2024 (this version, v3)]

Abstract: Scaling the amount of compute used to train language models has dramatically improved their capabilities. However, when it comes to inference, we often limit models to making only one attempt at a problem. Here, we explore inference compute as another axis for scaling, using the simple technique of repeatedly sampling candidate solutions from a model. Across multiple tasks and models, we observe that coverage -- the fraction of problems that are solved by any generated sample -- scales with the number of samples over four orders of magnitude. Interestingly, the relationship between coverage and the number of samples is often log-linear and can be modelled with an exponentiated power law, suggesting the existence of inference-time scaling laws. In domains like coding and formal proofs, where answers can be automatically verified, these increases in coverage directly translate into improved performance. When we apply repeated sampling to SWE-bench Lite, the fraction of issues solved with DeepSeek-Coder-V2-Instruct increases from 15.9% with one sample to 56% with 250 samples, outperforming the single-sample state-of-the-art of 43%. In domains without automatic verifiers, we find that common methods for picking from a sample collection (majority voting and reward models) plateau beyond several hundred samples and fail to fully scale with the sample budget.
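
The coverage metric defined in the abstract can be illustrated with a short sketch. The abstract does not say how coverage is estimated, so this example assumes the widely used unbiased pass@k estimator: for each problem, n samples are drawn, c of them are verified correct, and the chance that a random subset of k samples contains at least one correct sample is 1 - C(n-c, k) / C(n, k). The function names `pass_at_k` and `coverage` are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples is correct) for one problem.

    n: samples drawn for the problem, c: how many were verified correct,
    k: sample budget being evaluated (1 <= k <= n).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets made only of incorrect samples;
    # it is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def coverage(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems: the estimated fraction of problems
    solved by at least one of k samples."""
    if not correct_counts:
        raise ValueError("need at least one problem")
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


if __name__ == "__main__":
    # Made-up counts for three problems with n = 10 samples each.
    counts = [0, 1, 10]
    for k in (1, 5, 10):
        print(k, coverage(counts, n=10, k=k))
```

Evaluating `coverage` at increasing k on the same set of samples gives the kind of coverage-versus-samples curve the abstract describes. The counts in the usage example are invented for illustration and are not results from the paper.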

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2407.21787 [cs.LG]
	(or arXiv:2407.21787v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.21787

Submission history

From: Bradley Brown
[v1] Wed, 31 Jul 2024 17:57:25 UTC (592 KB)
[v2] Mon, 16 Sep 2024 17:58:42 UTC (405 KB)
[v3] Mon, 30 Dec 2024 19:03:24 UTC (405 KB)
