CAPTRACKER BOT METHODOLOGY
how the bot selects picks, sizes bets, and reports results
AS OF 2026-05-11
CONTENTS
  1. What the bot is
  2. Selection rule
  3. Sizing rule
  4. Walk-forward results
  5. Drawdown disclosure
  6. Trip-wires
  7. Bankroll growth vs unit ROI
  8. Metric definitions
  9. Reproducibility

1. What the CAPTRACKER bot is

CAPTRACKER is a paper-bankroll handicapping bot that selects sports betting picks by composite-score ranking and stakes a flat percentage of bankroll on each one. Its purpose is public validation: every pick is locked before tip-off, settled by ESPN, and recorded to an immutable ledger.

2. Selection rule

Each scoring cycle, every candidate pick gets a composite score from 0–100 across four components:

Component           Range   What it measures
capper_score        0–40    Source handicapper's track record, ROI, streak. Backed by the leaderboard.
model_score         0–25    Sport-specific model's agreement with the pick (see coverage table).
convergence_score   0–20    Signal strength when multiple cappers + model align on the same canonical bet.
edge_score          0–10    Closing-line value gap between model probability and book implied probability.

Each cycle, the top 25 picks by composite score are promoted to the active portfolio, subject to a cap of 2 picks per game and an upper bound of 10 picks per day.
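A minimal Python sketch of the selection step follows. It assumes each candidate is a dict that carries the four component scores plus a hypothetical `game_id` field. Two behaviors in it are assumptions rather than documented rules. First, picks that hit the per-game cap are not backfilled from ranks beyond 25. Second, the daily bound counts picks already promoted earlier in the day (`already_today`). The four listed maxima sum to 95, and the sketch simply adds the components.

```python
from collections import Counter

# Component ceilings from the composite-score table (field names as listed there).
COMPONENT_MAX = {
    "capper_score": 40.0,
    "model_score": 25.0,
    "convergence_score": 20.0,
    "edge_score": 10.0,
}


def composite(pick: dict) -> float:
    """Sum the four components, clamping each to its documented range."""
    return sum(
        min(max(float(pick.get(name, 0.0)), 0.0), ceiling)
        for name, ceiling in COMPONENT_MAX.items()
    )


def select(candidates: list, already_today: int = 0,
           top_k: int = 25, per_game: int = 2, daily_max: int = 10) -> list:
    """Consider only the top_k candidates by composite, then apply both caps."""
    ranked = sorted(candidates, key=composite, reverse=True)[:top_k]
    taken_per_game = Counter()  # hypothetical game_id -> picks already chosen
    chosen = []
    for pick in ranked:
        if already_today + len(chosen) >= daily_max:
            break  # daily upper bound reached
        if taken_per_game[pick["game_id"]] >= per_game:
            continue  # this game already holds its 2 picks
        taken_per_game[pick["game_id"]] += 1
        chosen.append(pick)
    return chosen
```

With three strong candidates on one game and a weaker one on another, the third pick on the crowded game is skipped and the other game's pick takes its place.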

Per-sport model coverage

Sport                          Model                                             model_score active   Composite leans on
MLB                            Poisson run-scoring (wOBA / FIP / park factors)   Yes                  All four components
NBA                            NetRtg / pace differential / B2B penalties        Yes                  All four components
NHL                            None (deferred)                                   No                   capper + convergence + edge
NFL                            None (offseason)                                  No                   capper + convergence + edge
NCAAB / NCAAF / soccer / MMA   None                                              No                   capper + convergence + edge

Picks in a sport without a dedicated model still compete on the composite — they have model_score=0 and rely on the other three components to rank.
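As a sketch, the model gate can be written as a lookup against the coverage table. The 0–1 `model_agreement` input and the linear mapping onto 0–25 are hypothetical placeholders. Only the zero for uncovered sports comes from this page.

```python
# Sports with a dedicated model, per the coverage table.
MODELED_SPORTS = {"MLB", "NBA"}


def model_component(sport: str, model_agreement: float) -> float:
    """Return the 0-25 model_score; sports without a model always get 0.

    model_agreement is a hypothetical 0-1 signal. The real per-sport mapping
    from model output to model_score is not specified on this page.
    """
    if sport not in MODELED_SPORTS:
        return 0.0
    clamped = min(max(model_agreement, 0.0), 1.0)
    return 25.0 * clamped
```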

Capper score correctness

The _score_capper function had a field-name bug from launch through 2026-04-25. It checked only pick.get("handicapper") for the source name, so it missed every Reddit pick, because Reddit picks store the name under author. Walk-forward replay caught the bug, and the fix landed 2026-04-25. Picks scored before that date may have under-weighted Reddit sources. Everything scored from 2026-04-25 forward includes the fix.
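The fix amounts to falling back from one field to the other. The minimal sketch below assumes a hypothetical precomputed `leaderboard` that maps each source name to a 0–40 score. The real `_score_capper` combines track record, ROI, and streak in a way this page does not specify.

```python
from typing import Optional


def capper_name(pick: dict) -> Optional[str]:
    """Source name for a pick: 'handicapper' for most sources, 'author' for Reddit."""
    # Pre-fix behavior read only pick.get("handicapper"), so Reddit picks resolved to None.
    return pick.get("handicapper") or pick.get("author")


def score_capper(pick: dict, leaderboard: dict) -> float:
    """Look up a hypothetical precomputed 0-40 score for the pick's source."""
    name = capper_name(pick)
    if name is None:
        return 0.0
    return min(max(leaderboard.get(name, 0.0), 0.0), 40.0)
```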

3. Sizing rule

3% of current bankroll per pick, flat across all tiers.
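For comparison, both rules are sketched below as small Python functions. The function names are illustrative. The live stake is recomputed from the current bankroll before every pick, so it shrinks after losses and grows after wins.

```python
def live_stake(bankroll: float, fraction: float = 0.03) -> float:
    """Live rule: 3% of current bankroll, identical for every tier."""
    return bankroll * fraction


def former_stake(bankroll: float, tier: str) -> float:
    """Retired rule: unit = bankroll / 10; ELITE = 1.5u, every other tier = 1.0u."""
    unit = bankroll / 10
    return unit * (1.5 if tier == "ELITE" else 1.0)
```

On a $1,000 bankroll the live rule stakes about $30 per pick. The retired rule staked $150 on an ELITE pick and $100 on any other tier.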

An earlier sizing rule used unit = bankroll / 10, with ELITE = 1.5u (15% of bankroll) and all other tiers = 1.0u (10%). That sizing carried a high probability of ruin given the strategy's measured edge. The data:

Sizing experiment results (top-25 cohort, +10.07% flat unit ROI)

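1,000 outcome-resampled simulations ("alternate seasons") per sizing level. DD = maximum drawdown within a simulated season. The last two columns give the share of simulations whose maximum drawdown exceeded 25% and 50%: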

Sizing         Median final   DD p95   >25% DD risk   >50% DD risk
2%             $1,049         16.0%    0.0%           0.0%
3% (live)      $1,071         23.2%    1.1%           0.0%
5%             $1,107         36.2%    6.4%           0.2%
10%            $1,149         61.1%    19.8%          5.3%
15% (former)   $1,122         77.5%    29.2%          14.6%

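3% sizing gives up a modest amount of median final bankroll ($1,071, vs $1,149 at 10% and $1,122 at the former 15%). It cuts >25% drawdown risk about 18× relative to 10% sizing (1.1% vs 19.8%) and about 27× relative to the former 15% rule (1.1% vs 29.2%). The p95 max drawdown at 3% is 23.2%, just below the 25% threshold we use publicly to separate "this is normal variance" from "this is system failure."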
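The table's method can be sketched as a bootstrap. Each alternate season redraws the cohort's per-pick outcomes with replacement and stakes a fixed fraction of the current bankroll on each one. It then records the final bankroll and the maximum drawdown. This is an illustrative reconstruction under several assumptions: a hypothetical $1,000 start, per-pick returns expressed as profit per unit staked, and a simple index-based 95th percentile. The example cohort (14 wins at -110, 11 losses) is made up and is not expected to reproduce the published numbers.

```python
import random
import statistics


def max_drawdown(equity: list) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def simulate(unit_returns: list, fraction: float, n_sims: int = 1000,
             start: float = 1000.0, seed: int = 0) -> dict:
    """Resample per-pick outcomes with replacement to build alternate seasons."""
    rng = random.Random(seed)
    n = len(unit_returns)
    finals, drawdowns = [], []
    for _ in range(n_sims):
        bankroll = start
        equity = [bankroll]
        for r in rng.choices(unit_returns, k=n):
            bankroll += bankroll * fraction * r  # stake = fraction of *current* bankroll
            equity.append(bankroll)
        finals.append(bankroll)
        drawdowns.append(max_drawdown(equity))
    drawdowns.sort()
    return {
        "median_final": statistics.median(finals),
        "dd_p95": drawdowns[int(0.95 * (n_sims - 1))],  # simple index-based p95
        "p_dd_over_25": sum(d > 0.25 for d in drawdowns) / n_sims,
        "p_dd_over_50": sum(d > 0.50 for d in drawdowns) / n_sims,
    }


# Hypothetical cohort: +10/11 units per win at -110, -1 unit per loss.
cohort = [10 / 11] * 14 + [-1.0] * 11
for f in (0.02, 0.03, 0.05, 0.10, 0.15):
    print(f, simulate(cohort, f))
```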

Why no tier-graded sizing?

Walk-forward replay can't validate that ELITE picks have a meaningfully larger edge than HIGH picks. The replay produces only NO_PLAY/MEDIUM tiers, because historical model snapshots and historical odds were not stored at pick-creation time. Using today's values instead would leak future information, so the replay leaves them out. Until model_predictions_history accumulates ≥6 months of clean data, flat 3% is the defensible choice. Tier-graded sizing returns when there's evidence to support it.

4. Walk-forward backtest results

Each pick in the test set is re-scored using only data that existed at pick.created_at. The sample reflects settled picks as of the timestamp at the top of this page.
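Point-in-time re-scoring means every input is filtered to what was known when the pick was created. The sketch below handles one such input, a capper's win/loss record. It assumes hypothetical `capper`, `settled_at`, and `result` fields on each history row.

```python
from datetime import datetime


def capper_record_as_of(history: list, capper: str, cutoff: datetime) -> tuple:
    """Wins/losses for `capper`, counting only picks settled at or before `cutoff`."""
    wins = losses = 0
    for row in history:
        if row["capper"] != capper or row["settled_at"] > cutoff:
            continue  # other sources, or results that did not exist yet at cutoff
        if row["result"] == "WON":
            wins += 1
        elif row["result"] == "LOST":
            losses += 1
    return wins, losses
```

In a replay, the cutoff would be the pick's own `created_at`. A later result therefore cannot inflate the record that ranked the pick.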

Test cohort: 1,163 settled picks, window 2026-04-04 → 2026-04-25.

Top-K performance

Cohort    n     Win %   Wilson 95% CI    Variable-unit ROI   Flat-unit ROI
Top-10    10    50.0%   [23.7%, 76.3%]   -2.7%               -2.7% (n=10 noise)
Top-25    25    56.0%   [35.1%, 72.1%]   +10.12%             +10.07%
Top-50    50    51.0%   [37.5%, 64.4%]   +6.57%              -0.80%
Top-100   100   49.5%   [40.0%, 59.0%]   -7.86%              -7.86%
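The Wilson score interval in the table can be computed as shown below. Win rate excludes pushes, so `n` here should be wins plus losses. As a check, 5 wins out of 10 gives roughly (0.237, 0.763), which matches the Top-10 row.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if n == 0:
        return (0.0, 1.0)  # no decisive picks: the interval is uninformative
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - half, center + half)
```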

Why two ROI columns

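Flat-unit ROI stakes every pick at 1 unit. Variable-unit ROI weights each pick by its assigned unit size. The two agree at top-25, where the unit distribution is near-flat. They diverge at top-50, where a few winners in ranks 26–50 carry more than 1 unit and lift the variable-unit figure. Both are valid measurements of different things.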
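The sketch below computes both ROI columns. It assumes each pick record carries a hypothetical `units` size and a `unit_return`, which is profit per 1 unit staked (for example, -1.0 for a loss). In the test data, a 2-unit winner next to a 1-unit loser leaves flat-unit ROI at 0%. The same pair pushes variable-unit ROI positive, which is the top-50 effect described above.

```python
def flat_unit_roi(picks: list) -> float:
    """ROI (%) with every settled pick staked at exactly 1 unit."""
    wagered = len(picks)
    profit = sum(p["unit_return"] for p in picks)
    return 100 * profit / wagered


def variable_unit_roi(picks: list) -> float:
    """ROI (%) with each pick staked at its assigned unit size."""
    wagered = sum(p["units"] for p in picks)
    profit = sum(p["units"] * p["unit_return"] for p in picks)
    return 100 * profit / wagered
```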

Cohort caveat

The walk-forward test window is the most recent 20% of all settled picks. As new picks settle, the window slides forward. The numbers above are computed against settled picks as of the as-of timestamp at the top of this page, and will move as the database grows. The sizing-experiment snapshot is pinned for reproducibility.
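The sliding test window is sketched below. It assumes a hypothetical `settled_at` sort key. It also rounds the 20% cut to the nearest whole pick, because the page does not say how the boundary is rounded.

```python
def walk_forward_window(settled: list, fraction: float = 0.20) -> list:
    """Most recent `fraction` of settled picks, ordered by settlement time."""
    ordered = sorted(settled, key=lambda p: p["settled_at"])
    n_test = round(len(ordered) * fraction)  # rounding rule is an assumption
    return ordered[len(ordered) - n_test:]
```

Each time new picks settle, re-running this function moves the window forward, which is why the §4 numbers shift over time.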

5. Drawdown disclosure

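At 3% flat sizing on the top-25 cohort, alternate-season simulations produce:

  1. Median final bankroll: $1,071
  2. 95th-percentile max drawdown: 23.2%
  3. Share of simulations with a >25% drawdown: 1.1%
  4. Share of simulations with a >50% drawdown: 0.0%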

In plain language: bankroll dipping 15–25% periodically is expected variance, not a sign the system is broken. Drawdowns deeper than 25% will happen occasionally (about 1 in 90 alternate-season simulations); drawdowns deeper than 50% effectively never (under our edge and sizing).

Realized max drawdown is tracked on the equity curve on the homepage and updates as new picks settle.

6. Trip-wires (monitored invariants)

When any of the following fires, the bot freezes new picks and the cause is investigated before fresh selections resume.

  1. Edge degradation: fires if walk-forward top-25 ROI drops to top-50 levels (currently +6.57% variable-unit). Either the selection rule has stopped picking winners or noise has overwhelmed the signal.
  2. Sizing safety: fires if realized drawdown exceeds the 95th-percentile simulation bound (currently 23.2%). Either variance is genuinely worse than modeled or the sizing assumptions are wrong.
  3. Model calibration: fires if the expected calibration error (ECE) of model_prob exceeds a threshold (the formal threshold is pending live data). This would indicate that the sport models are mis-calibrating probabilities.
  4. Loss streak: fires if a losing streak within a single week exceeds 6 picks. The longest losing streak observed in the test cohort was 6, so exceeding it is a tail event worth examining.
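The four checks could be wired together roughly as follows. This is a hedged sketch built on several assumptions:

  1. The function names are hypothetical.
  2. "Drops to top-50 levels" is read as top-25 ROI at or below top-50 ROI.
  3. Pushes neither extend nor break a losing streak.
  4. ECE uses ten equal-width probability bins.
  5. The ECE threshold is a parameter, because the page has not fixed one yet.

```python
def longest_losing_streak(results: list) -> int:
    """Longest run of consecutive LOST results; WON resets, PUSH is skipped (assumption)."""
    best = run = 0
    for r in results:
        if r == "LOST":
            run += 1
            best = max(best, run)
        elif r == "WON":
            run = 0
    return best


def expected_calibration_error(probs: list, outcomes: list, n_bins: int = 10) -> float:
    """Equal-width-bin ECE: sum over bins of (share of picks) * |mean prob - hit rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            hit_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / len(probs) * abs(mean_p - hit_rate)
    return ece


def tripped(top25_roi, top50_roi, realized_dd, dd_p95, ece, ece_threshold,
            week_results, max_streak=6) -> list:
    """Return the names of every trip-wire that fires; empty list means keep picking."""
    reasons = []
    if top25_roi <= top50_roi:
        reasons.append("edge degradation")
    if realized_dd > dd_p95:
        reasons.append("sizing safety")
    if ece_threshold is not None and ece > ece_threshold:
        reasons.append("model calibration")
    if longest_losing_streak(week_results) > max_streak:
        reasons.append("loss streak")
    return reasons
```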

Each trip event is recorded with date, condition, and resolution in the changelog at the bottom of this page.

7. Bankroll growth vs unit ROI

These are different metrics. The site uses both, with explicit labels.

The two diverge under bankroll-fraction sizing because the bankroll turns over multiple times. At 3% sizing on a 100-pick season, total wagered ≈ 300% of starting bankroll (3% × 100 picks, ignoring compounding). Bankroll growth of +30% therefore corresponds to roughly +10% unit ROI. The sports-betting industry norm is to quote unit ROI; bankroll growth is the more visible, consumer-facing metric on a tracker.
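A small simulation makes the relationship concrete. Total profit appears in both metrics, so bankroll growth equals unit ROI multiplied by turnover (total wagered divided by starting bankroll). The function below is illustrative and assumes returns expressed as profit per unit staked.

```python
def season(unit_returns: list, fraction: float = 0.03, start: float = 1000.0) -> dict:
    """Stake `fraction` of current bankroll on each pick; compare the two metrics."""
    bankroll, wagered = start, 0.0
    for r in unit_returns:
        stake = bankroll * fraction
        wagered += stake
        bankroll += stake * r  # r = profit per unit staked
    profit = bankroll - start
    return {
        "turnover": wagered / start,  # total wagered as a multiple of starting bankroll
        "unit_roi": profit / wagered,  # profit per unit wagered
        "growth": profit / start,      # bankroll growth
    }
```

With `season([0.1] * 100)`, unit ROI is 10%. Compounding, however, lifts turnover to about 3.5× rather than 3×, so growth lands near +35%. The 300% figure above is the no-compounding approximation.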

8. Definitions of every published metric

Metric            Definition                                                                   Where it appears
Settled picks     Count of picks with an ESPN-graded result (WON, LOST, or PUSH)               Hero, telemetry strip
Win rate          wins / (wins + losses); pushes excluded                                      Hero, equity curve header
Unit ROI          total_profit / total_wagered × 100                                           Hero ("unit ROI" line), methodology
Bankroll growth   (current - starting) / starting × 100                                        Bankroll card
Profit / Loss     Dollar profit summed across settled picks                                    Bankroll card, hero
Pending picks     Picks placed but not yet settled                                             Telemetry strip
Composite score   0–100 rank, sum of capper + model + convergence + edge components            Internal selection signal
Convergence tag   Gold (3+ cappers + model + bot) / Silver (cappers only) / Bronze (model only)   Per-pick row badges
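The ledger-derived metrics in this table could be computed as sketched below. The sketch assumes hypothetical `result`, `stake`, and `profit` fields on each ledger row, with `result` set to None while a pick is pending. The convergence-tag function is also a sketch. It reads "cappers only" as at least one capper without model agreement and treats the bot's own agreement as implied. It returns None for combinations the table does not define.

```python
from typing import Optional


def published_metrics(ledger: list, starting_bankroll: float, current_bankroll: float) -> dict:
    """Settled/pending counts, win rate, unit ROI, bankroll growth, and P/L."""
    settled = [p for p in ledger if p["result"] in ("WON", "LOST", "PUSH")]
    wins = sum(p["result"] == "WON" for p in settled)
    losses = sum(p["result"] == "LOST" for p in settled)
    wagered = sum(p["stake"] for p in settled)
    profit = sum(p["profit"] for p in settled)
    return {
        "settled_picks": len(settled),
        "pending_picks": len(ledger) - len(settled),
        "win_rate": 100 * wins / (wins + losses) if wins + losses else None,  # pushes excluded
        "unit_roi": 100 * profit / wagered if wagered else None,
        "bankroll_growth": 100 * (current_bankroll - starting_bankroll) / starting_bankroll,
        "profit_loss": profit,
    }


def convergence_tag(n_cappers: int, model_agrees: bool) -> Optional[str]:
    """Gold / Silver / Bronze badge under the reading described above."""
    if n_cappers >= 3 and model_agrees:
        return "Gold"
    if n_cappers >= 1 and not model_agrees:
        return "Silver"
    if n_cappers == 0 and model_agrees:
        return "Bronze"
    return None  # e.g. 1-2 cappers plus model: not defined by the table
```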

"AI Confirmed %" (retired)

An earlier "AI CONFIRMED ✓ NN%" badge appeared in the bet log column. That metric was not derived from model output. Its implementation was a deterministic hash of the pick text plus a base value of 62 if the pick won or 54 if it lost — visual decoration, post-hoc-biased toward winners.

The column was retired. The bet log replaces it with the actual convergence tag (Gold / Silver / Bronze) which is derived from real model + capper agreement. When NHL and NCAA models land, a real model_prob percentage column may return; until then there is no "AI Confirmed %" surface.

9. Reproducibility

The composite-score formulas in §2 and the sizing math in §3 are fully specified in this document. Anyone who pulls our settled-pick ledger via the public dashboard endpoint and re-applies the formulas will reproduce the numbers in §4 within rounding.

The sizing-experiment cohort is frozen in our backtest archive and the published numbers do not retroactively change. As new picks settle they extend the live cohort forward; cohort recalculations refresh published metrics with an "as of" timestamp at the top of this page.

Changelog (this page)

Date         Change
2026-04-26   Initial publication.
2026-05-11   Retired V1 / V2 cohort framing. Single bot, all settled picks on the ledger.
Methodology updates are versioned. Past metric definitions and weight changes are recorded in the changelog above.