Methodology — CAPTRACKER Bot

CAPTRACKER BOT METHODOLOGY

how the bot selects picks, sizes bets, and reports results

AS OF 2026-05-11

CONTENTS

What the bot is
Selection rule
Sizing rule
Walk-forward results
Drawdown disclosure
Trip-wires
Bankroll growth vs unit ROI
Metric definitions
Reproducibility

1. What the CAPTRACKER bot is

CAPTRACKER is a paper-bankroll handicapping bot that selects sports betting picks by composite-score ranking and stakes a flat percentage of bankroll on each one. Its purpose is public validation: every pick is locked before tip-off, settled by ESPN, and recorded to an immutable ledger.

Selection: top-25 picks per scoring cycle by composite score (definition in §2)
Sizing: 3% of current bankroll per pick, flat across all tiers
Sport scope: every sport the scrapers track — NHL, NBA, MLB, NFL, NCAAB, NCAAF, soccer, MMA, etc. Per-sport model coverage varies (table in §2).
Settlement: automated via ESPN scoreboard. No manual grading.
Bankroll: $1,000 paper, starting 2026-03-25.
Ledger: every settled pick is on the ledger, including early backfilled entries that closed historical gaps. No cohort exclusion — the headline numbers on the homepage are the full record.

2. Selection rule

Each scoring cycle, every candidate pick gets a composite score from 0–100 across four components:

Component	Range	What it measures
`capper_score`	0–40	Source handicapper's track record, ROI, streak. Backed by the leaderboard.
`model_score`	0–25	Sport-specific model's agreement with the pick (see coverage table).
`convergence_score`	0–20	Signal strength when multiple cappers + model align on the same canonical bet.
`edge_score`	0–10	Closing-line value gap between model probability and book implied probability.

Top-25 by composite score are promoted to the active portfolio per cycle. Per-game cap is 2 picks; daily upper bound is 10.
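
The ranking step can be sketched as follows. This is a minimal illustration, not the production selector: the `game_id` field, the skip-and-continue handling of the per-game cap, and the tie-breaking (input order) are assumptions, and the daily upper bound of 10 is not modeled. Note that the component maxima listed in the table sum to 95.

```python
from collections import defaultdict

COMPONENTS = ("capper_score", "model_score", "convergence_score", "edge_score")

def composite(pick):
    """Sum the four component scores; a missing component counts as 0."""
    return sum(pick.get(key, 0) for key in COMPONENTS)

def select_top(picks, k=25, per_game_cap=2):
    """Rank by composite score, keeping at most per_game_cap picks per game."""
    per_game = defaultdict(int)  # game_id (hypothetical field) -> picks kept so far
    chosen = []
    for pick in sorted(picks, key=composite, reverse=True):
        if per_game[pick["game_id"]] >= per_game_cap:
            continue  # assumption: a capped pick is skipped and the next rank moves up
        chosen.append(pick)
        per_game[pick["game_id"]] += 1
        if len(chosen) == k:
            break
    return chosen
```

A pick from a sport without a model simply omits `model_score` (or carries 0) and competes on the remaining three components.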

Per-sport model coverage

Sport	Model	`model_score` active	Composite leans on
MLB	Poisson run-scoring (wOBA / FIP / park factors)	Yes	All four components
NBA	NetRtg / pace differential / B2B penalties	Yes	All four components
NHL	None (deferred)	No	capper + convergence + edge
NFL	None (offseason)	No	capper + convergence + edge
NCAAB / NCAAF / soccer / MMA	None	No	capper + convergence + edge

Picks in a sport without a dedicated model still compete on the composite — they have model_score=0 and rely on the other three components to rank.

Capper score correctness

The _score_capper function had a field-name bug from launch through 2026-04-25: it only checked pick.get("handicapper") for the source name, missing every Reddit pick (which uses author). Walk-forward replay caught it and the fix landed 2026-04-25. Picks scored before that date may have under-weighted Reddit sources; everything from 2026-04-25 forward ships with the fix in place.
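
The shape of the fix can be illustrated with a small helper. This is a sketch only: the helper name `source_name` is hypothetical, picks are assumed to be dictionaries, and the actual patch to `_score_capper` is not reproduced here.

```python
def source_name(pick):
    """Return the pick's source name, checking both field names.

    Hypothetical helper: scraped picks are assumed to carry `handicapper`
    and Reddit picks `author`, as described above.
    """
    return pick.get("handicapper") or pick.get("author")
```

With a lookup like this, a Reddit pick such as `{"author": "u/example"}` resolves to a source name instead of falling through as missing.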

3. Sizing rule

3% of current bankroll per pick, flat across all tiers.

An earlier sizing rule used unit = bankroll / 10 with ELITE = 1.5u (15%) and other tiers = 1.0u (10%). That sizing carried a high risk of ruin relative to the strategy's measured edge. The data:

Sizing experiment results (top-25 cohort, +10.07% flat unit ROI)

1,000 outcome-resampled simulations per sizing %, alternate-season risk:

Sizing	Median final	DD p95	>25% DD risk	>50% DD risk
2%	$1,049	16.0%	0.0%	0.0%
3% (live)	$1,071	23.2%	1.1%	0.0%
5%	$1,107	36.2%	6.4%	0.2%
10%	$1,149	61.1%	19.8%	5.3%
15% (former)	$1,122	77.5%	29.2%	14.6%

3% sizing keeps the median final bankroll within about 7% of the 10% rule ($1,071 vs $1,149) while cutting >25% drawdown risk roughly 27× versus the former 15% rule (29.2% to 1.1%) and 18× versus 10% sizing (19.8% to 1.1%). p95 max drawdown at 3% is 23.2%, just below the 25% public-marketing threshold for "this is normal variance, not system failure."
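
A simulation of this kind can be sketched as below. The exact resampling scheme behind the table is not spelled out on this page, so this assumes a plain bootstrap: each simulated season draws the cohort's (won, odds) outcomes with replacement and stakes a fixed fraction of the current bankroll on each. Function names are illustrative and pushes are ignored.

```python
import random

def american_to_net(odds_us):
    """Net profit per $1 staked on a winning bet at American odds."""
    return odds_us / 100 if odds_us > 0 else 100 / -odds_us

def simulate_season(outcomes, fraction, start=1000.0, rng=random):
    """Resample outcomes with replacement; return (final bankroll, max drawdown)."""
    bankroll = peak = start
    max_dd = 0.0
    for won, odds_us in rng.choices(outcomes, k=len(outcomes)):
        stake = fraction * bankroll  # bankroll-fraction sizing
        bankroll += stake * american_to_net(odds_us) if won else -stake
        peak = max(peak, bankroll)
        max_dd = max(max_dd, (peak - bankroll) / peak)
    return bankroll, max_dd

def drawdown_risk(outcomes, fraction, threshold=0.25, runs=1000, seed=0):
    """Share of simulated seasons whose max drawdown exceeds the threshold."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    dds = [simulate_season(outcomes, fraction, rng=rng)[1] for _ in range(runs)]
    return sum(dd > threshold for dd in dds) / runs
```

Calling `drawdown_risk(outcomes, 0.03)` and `drawdown_risk(outcomes, 0.15)` on the same outcome list gives a comparison of the same kind as the ">25% DD risk" column, though the published figures come from the pinned experiment snapshot and will not be matched exactly by this sketch.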

Why no tier-graded sizing?

Walk-forward replay can't validate that ELITE picks have meaningfully larger edge than HIGH picks. The replay produces NO_PLAY/MEDIUM tiers exclusively because historical model snapshots and historical odds are both leakage points (we don't have them stored at pick-creation time). Until model_predictions_history accumulates ≥6 months of clean data, flat 3% is the defensible move. Tier-graded sizing returns when there's evidence to support it.

4. Walk-forward backtest results

Each pick in the test set is re-scored against only data that existed at pick.created_at. The sample is current as of the timestamp at the top of this page.

Test cohort: 1,163 settled picks, window 2026-04-04 → 2026-04-25.

Top-K performance

Cohort	n	Win %	Wilson 95% CI	Variable-unit ROI	Flat-unit ROI
Top-10	10	50.0%	[23.7%, 76.3%]	-2.7%	-2.7% (n=10 noise)
Top-25	25	56.0%	[35.1%, 72.1%]	+10.12%	+10.07%
Top-50	50	51.0%	[37.5%, 64.4%]	+6.57%	-0.80%
Top-100	100	49.5%	[40.0%, 59.0%]	-7.86%	-7.86%
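
For readers who want to check the interval column, the standard Wilson score interval can be computed as in this sketch. It assumes z = 1.96 for the 95% level and that `n` counts wins plus losses, which is not stated explicitly above.

```python
from math import sqrt

def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```

For 5 wins in 10 decisions, `wilson_interval(5, 10)` returns roughly (0.237, 0.763), which agrees with the Top-10 row.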

Why two ROI columns

Variable-unit ROI = sum(stored_profit_dollars) / sum(picks.units_field) × 100. Uses the bot's recommended per-pick units. Marketing-credible — answers "if you'd staked the bot's recommended units, what's your ROI?"
Flat-unit ROI = sum(profit_at_$X_flat) / (n × $X) × 100, derived from result + odds_us. Treats every pick as the same dollar bet. The right metric for sizing decisions because uniform sizing is what production uses.

The two metrics agree at top-25 (where unit distribution is near-flat) and diverge at top-50 (where ranks 26-50 carry units > 1 on a few winners that boost variable-unit). Both are valid measurements of different things.
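
The two definitions can be sketched side by side. Field names `result`, `odds_us`, `profit_dollars`, and `units` are illustrative, and two details are assumptions rather than documented behavior: pushes count toward n in the flat-unit denominator, and a unit has a fixed dollar value (`unit_value`) so that the variable-unit ratio is dimensionless.

```python
def flat_profit(result, odds_us, stake=100.0):
    """Profit on a flat stake, derived from the result and American odds."""
    if result == "WON":
        return stake * (odds_us / 100 if odds_us > 0 else 100 / -odds_us)
    if result == "LOST":
        return -stake
    return 0.0  # PUSH: stake returned

def flat_unit_roi(picks, stake=100.0):
    """sum(profit at a flat stake) / (n * stake) * 100."""
    total = sum(flat_profit(p["result"], p["odds_us"], stake) for p in picks)
    return total / (len(picks) * stake) * 100

def variable_unit_roi(picks, unit_value=1.0):
    """sum(stored profit) / sum(units staked) * 100, with units valued at unit_value."""
    profit = sum(p["profit_dollars"] for p in picks)
    staked = sum(p["units"] for p in picks) * unit_value
    return profit / staked * 100
```

When every pick carries the same `units` value the two functions return the same figure up to how profit was stored, which is the near-flat top-25 case described above.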

Cohort caveat

The walk-forward test window is the most recent 20% of all settled picks. As new picks settle, the window slides forward. The numbers above are computed against settled picks as of the as-of timestamp at the top of this page, and will move as the database grows. The sizing-experiment snapshot is pinned for reproducibility.

5. Drawdown disclosure

At 3% flat sizing on the top-25 cohort, alternate-season simulations produce:

Median max drawdown: ~12%
95th percentile max drawdown: 23.2%
99th percentile max drawdown: ~28%
Probability of exceeding 25% drawdown: 1.1%
Probability of exceeding 50% drawdown: 0.0%

In plain language: bankroll dipping 15–25% periodically is expected variance, not a sign the system is broken. Drawdowns deeper than 25% will happen occasionally (about 1 in 90 alternate-season simulations); drawdowns deeper than 50% effectively never (under our edge and sizing).

Realized max drawdown is tracked on the equity curve on the homepage and updates as new picks settle.

6. Trip-wires (monitored invariants)

When any of the following fires, the bot freezes new picks and the cause is investigated before fresh selections resume.

Edge degradation: if walk-forward top-25 ROI drops to top-50 levels (currently +6.57% variable-unit). Either the selection rule has stopped picking winners or noise has overwhelmed the signal.
Sizing safety: if realized drawdown exceeds the 95th-percentile simulation bound (currently 23.2%). Either variance is genuinely worse than modeled or sizing assumptions are wrong.
Model calibration: if the expected calibration error (ECE) of model_prob exceeds its threshold (formal threshold pending live data). This indicates the sport models are mis-calibrating probabilities.
Loss streak: if a losing streak within a single week exceeds 6 picks. The longest streak observed across the test cohort was 6; exceeding it is a tail event worth examining.

Each trip event is recorded with date, condition, and resolution in the changelog at the bottom of this page.
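
Two of these checks reduce to simple computations on the ledger, sketched below. This is an illustration under stated assumptions, not the monitoring code: function names are hypothetical, a PUSH is assumed to reset a losing streak, and the edge-degradation and calibration checks are omitted because their inputs and thresholds are not fully specified here.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def longest_losing_streak(results):
    """Longest run of consecutive LOST results; WON or PUSH resets the run (assumption)."""
    longest = current = 0
    for result in results:
        current = current + 1 if result == "LOST" else 0
        longest = max(longest, current)
    return longest

def tripped(equity, week_results, dd_bound=0.232, streak_limit=6):
    """Return the names of the trip-wires that fire for this snapshot."""
    fired = []
    if max_drawdown(equity) > dd_bound:
        fired.append("sizing_safety")
    if longest_losing_streak(week_results) > streak_limit:
        fired.append("loss_streak")
    return fired
```

An empty list from `tripped` means neither condition fired; any non-empty result would correspond to a freeze and an entry in the changelog.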

7. Bankroll growth vs unit ROI

These are different metrics. The site uses both, with explicit labels.

Bankroll growth = (current - starting) / starting × 100. Reflects total compounding. The bankroll card on the homepage displays this and labels it "BANKROLL GROWTH" (formerly the misleading "RETURN ON INVESTMENT").
Unit ROI = total_profit / total_wagered × 100. Reflects per-bet edge. The hero copy and methodology numbers use this metric.

The two diverge under bankroll-fraction sizing because the bankroll turns over multiple times. At 3% sizing on a 100-pick season, total wagered ≈ 300% of starting bankroll (100 × 3%, ignoring compounding), so a bankroll growth of +30% corresponds to roughly +10% unit ROI. Sportsbook industry norm is to quote unit ROI; bankroll growth is the consumer-facing visible metric on a tracker.
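
The relationship between the two metrics can be checked with a short sketch. It assumes each pick's outcome is given as net profit per $1 staked (for example -1.0 for a loss, about 0.909 for a win at -110); the function name and return shape are illustrative.

```python
def season_metrics(net_returns, fraction=0.03, start=1000.0):
    """Stake `fraction` of the current bankroll on each pick.

    Returns (bankroll growth %, unit ROI %, turnover %), where turnover is
    total wagered as a percentage of the starting bankroll.
    """
    bankroll = start
    wagered = profit = 0.0
    for r in net_returns:
        stake = fraction * bankroll
        wagered += stake
        profit += stake * r
        bankroll += stake * r
    growth = (bankroll - start) / start * 100
    unit_roi = profit / wagered * 100
    turnover = wagered / start * 100
    return growth, unit_roi, turnover
```

By construction, bankroll growth equals turnover × unit ROI / 100. Because stakes grow with the bankroll, turnover in a profitable 100-pick season ends up somewhat above the 300% approximation, so growth runs a little more than three times unit ROI.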

8. Definitions of every published metric

Metric	Definition	Where it appears
Settled picks	Count of picks with an ESPN-graded result (WON, LOST, or PUSH)	Hero, telemetry strip
Win rate	`wins / (wins + losses)` — pushes excluded	Hero, equity curve header
Unit ROI	`total_profit / total_wagered × 100`	Hero ("unit ROI" line), methodology
Bankroll growth	`(current - starting) / starting × 100`	Bankroll card
Profit / Loss	Dollar profit summed across settled picks	Bankroll card, hero
Pending picks	Picks placed but not yet settled	Telemetry strip
Composite score	0–100 rank, sum of capper + model + convergence + edge components	Internal selection signal
Convergence tag	Gold (3+ cappers + model + bot) / Silver (cappers only) / Bronze (model only)	Per-pick row badges

"AI Confirmed %" (retired)

An earlier "AI CONFIRMED ✓ NN%" badge appeared in the bet log column. That metric was not derived from model output. Its implementation was a deterministic hash of the pick text plus a base value of 62 if the pick won or 54 if it lost — visual decoration, post-hoc-biased toward winners.

The column was retired. The bet log replaces it with the actual convergence tag (Gold / Silver / Bronze) which is derived from real model + capper agreement. When NHL and NCAA models land, a real model_prob percentage column may return; until then there is no "AI Confirmed %" surface.

9. Reproducibility

The composite-score formulas in §2 and the sizing math in §3 are fully specified in this document. Anyone who pulls our settled-pick ledger via the public dashboard endpoint and re-applies the formulas will reproduce the numbers in §4 within rounding.

The sizing-experiment cohort is frozen in our backtest archive and the published numbers do not retroactively change. As new picks settle they extend the live cohort forward; cohort recalculations refresh published metrics with an "as of" timestamp at the top of this page.

Changelog (this page)

Date	Change
2026-04-26	Initial publication.
2026-05-11	Retired V1 / V2 cohort framing. Single bot, all settled picks on the ledger.

Methodology updates are versioned. Past metric definitions and weight changes are recorded in the changelog above.