Scaling Test-Time Compute for Reasoning (demo paper)
Spend more compute at inference — sample many chains, verify, and vote — to solve multi-step puzzles.
Tier 1 · Understand
Instead of training a bigger model, the paper scales compute at inference time: generate many candidate reasoning chains (best-of-N), score them with a verifier, and aggregate via voting. A single greedy chain often fails on multi-step constraint problems; sampling, verification, and voting together reliably recover the correct solution under a fixed compute budget.
1. Hard puzzles need several reasoning steps that must all line up.
2. One quick attempt usually breaks at least one constraint.
3. So the model makes many attempts (best-of-N) instead of one.
4. A verifier scores each attempt; voting picks the best-supported answer.
5. More attempts cost more compute, so there is a budget trade-off.
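The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the three-session puzzle, its two constraints, the random sampler standing in for a model, and the names `weighted_vote`, `best_of_n`, `verifier`, and `make_sampler` are all hypothetical. The verifier here is binary (1.0 only if every constraint passes), and voting sums verifier scores per distinct answer.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Sequence


def weighted_vote(candidates: Sequence[Hashable],
                  score: Callable[[Hashable], float]) -> Hashable:
    """Sum verifier scores per distinct answer and return the best-supported one."""
    if not candidates:
        raise ValueError("need at least one candidate")
    support = defaultdict(float)
    for answer in candidates:
        support[answer] += score(answer)
    # Ties go to the answer that was sampled first.
    return max(support, key=support.get)


def best_of_n(sample: Callable[[], Hashable],
              score: Callable[[Hashable], float],
              n: int) -> Hashable:
    """Draw n candidates (n is the compute budget), then verify and vote."""
    candidates = [sample() for _ in range(n)]
    return weighted_vote(candidates, score)


# Hypothetical toy puzzle: order sessions A, B, C so that
# A comes before B and C is not first.
CONSTRAINTS = [
    lambda order: order.index("A") < order.index("B"),
    lambda order: order[0] != "C",
]


def verifier(order) -> float:
    """Binary verifier: 1.0 if every constraint passes, else 0.0."""
    return 1.0 if all(check(order) for check in CONSTRAINTS) else 0.0


def make_sampler(seed: int) -> Callable[[], tuple]:
    """Stand-in for a model: returns a random ordering on each call."""
    rng = random.Random(seed)

    def sample() -> tuple:
        order = ["A", "B", "C"]
        rng.shuffle(order)
        return tuple(order)

    return sample


if __name__ == "__main__":
    for n in (1, 4, 16):
        answer = best_of_n(make_sampler(seed=0), verifier, n)
        print(n, answer, verifier(answer))
```

With a binary verifier, invalid attempts contribute no support, so the vote reduces to picking the most frequently sampled valid answer. A graded verifier (for example, the fraction of constraints satisfied) would also fit this interface, though a frequently sampled partial answer could then outvote a rarely sampled correct one.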
Tier 2 & 3 · Play / Prove
template: reasoning · conf 90%
🎯 Goal: Schedule all 6 sessions so that all 8 constraints pass, within the compute budget.
Drag every talk into a seat, then confirm when all 6 guests are happy.
Paper claim: scaling test-time compute (samples + verifier + voting) finds correct multi-step solutions a single sample misses.
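A back-of-the-envelope calculation shows why extra samples can help. It is a sketch under simplifying assumptions, not the paper's own analysis: suppose each sampled chain is correct independently with probability p, and the verifier recognises a correct chain whenever one appears. The value p = 0.2 below is an assumed figure for illustration, not a result from the paper.

```latex
\[
P(\text{at least one of } N \text{ samples is correct}) = 1 - (1 - p)^{N}
\]
\[
p = 0.2:\qquad N = 1 \;\Rightarrow\; 0.2,
\qquad N = 10 \;\Rightarrow\; 1 - 0.8^{10} \approx 0.893
\]
```

Real samples are correlated and real verifiers make mistakes, so this is an optimistic estimate. The gain from each additional sample also shrinks as N grows, which is where the budget trade-off comes in.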