CAPTRACKER is a paper-bankroll handicapping bot that selects sports betting picks by composite-score ranking and stakes a flat percentage of bankroll on each one. Its purpose is public validation: every pick is locked before tip-off, settled by ESPN, and recorded to an immutable ledger.
Each scoring cycle, every candidate pick gets a composite score from 0–100 across four components:
| Component | Range | What it measures |
|---|---|---|
| capper_score | 0–40 | Source handicapper's track record, ROI, and streak, backed by the leaderboard. |
| model_score | 0–25 | Sport-specific model's agreement with the pick (see coverage table). |
| convergence_score | 0–20 | Signal strength when multiple cappers and the model align on the same canonical bet. |
| edge_score | 0–10 | Closing-line value gap between model probability and book implied probability. |
The top 25 picks by composite score are promoted to the active portfolio each cycle. The per-game cap is 2 picks; the daily upper bound is 10.
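A minimal sketch of the promotion step, assuming candidates already carry a composite score and a game identifier. The Candidate type and select_portfolio function are hypothetical, and the order in which the caps are applied (walk the ranking, skip picks whose game is already at its cap, stop at 25 or at the day's remaining allowance, whichever is smaller) is an assumption rather than the production rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical candidate record; field names are illustrative.
    pick_id: str
    game_id: str
    composite: float  # 0-100 composite score


def select_portfolio(candidates, top_n=25, per_game_cap=2, daily_remaining=10):
    """Promote the highest-scoring candidates, respecting per-game and daily caps.

    Assumed ordering: walk the ranking once, skip picks whose game is already at
    its cap, and stop once min(top_n, daily_remaining) picks are selected.
    """
    limit = min(top_n, daily_remaining)
    per_game = defaultdict(int)
    selected = []
    for cand in sorted(candidates, key=lambda c: c.composite, reverse=True):
        if len(selected) >= limit:
            break
        if per_game[cand.game_id] >= per_game_cap:
            continue  # this game already has its maximum number of picks
        selected.append(cand)
        per_game[cand.game_id] += 1
    return selected
```

Walking the ranking once means a capped-out game never blocks lower-ranked picks from other games.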
| Sport | Model | model_score active | Composite leans on |
|---|---|---|---|
| MLB | Poisson run-scoring (wOBA / FIP / park factors) | Yes | All four components |
| NBA | NetRtg / pace differential / B2B penalties | Yes | All four components |
| NHL | None (deferred) | No | capper + convergence + edge |
| NFL | None (offseason) | No | capper + convergence + edge |
| NCAAB / NCAAF / soccer / MMA | None | No | capper + convergence + edge |
Picks in a sport without a dedicated model still compete on the composite — they have model_score=0 and rely on the other three components to rank.
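The composite is the sum of the four components (capper + model + convergence + edge). A minimal sketch, assuming each component has already been computed upstream; composite_score, COMPONENT_CAPS, and MODELED_SPORTS are hypothetical names, and clamping each component to its listed range is an assumption about how out-of-range inputs are handled.

```python
# Upper bound of each component, from the component table.
COMPONENT_CAPS = {
    "capper_score": 40.0,
    "model_score": 25.0,
    "convergence_score": 20.0,
    "edge_score": 10.0,
}
# Sports with a dedicated model, per the coverage table.
MODELED_SPORTS = {"MLB", "NBA"}


def composite_score(sport, components):
    """Sum the four components, each clamped to its listed range.

    Sports without a dedicated model contribute model_score = 0.
    """
    total = 0.0
    for name, cap in COMPONENT_CAPS.items():
        value = components.get(name, 0.0)
        if name == "model_score" and sport not in MODELED_SPORTS:
            value = 0.0  # no model for this sport: rank on the other three
        total += min(max(value, 0.0), cap)
    return total
```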
The _score_capper function had a field-name bug from launch through 2026-04-25: it checked only pick.get("handicapper") for the source name, so it missed every Reddit pick (Reddit picks store the source name under author). Walk-forward replay caught the bug, and the fix landed on 2026-04-25. Picks scored before that date may have under-weighted Reddit sources; every pick scored from 2026-04-25 onward ships with the fix in place.
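The fix amounts to checking both field names. A minimal sketch, assuming picks are plain dicts; source_name is a hypothetical helper, not the actual body of _score_capper.

```python
def source_name(pick):
    """Return the handicapper name for a pick, whichever field it is stored under.

    Hypothetical helper: the pre-fix lookup read only pick.get("handicapper"),
    which is None for Reddit picks; the fixed lookup falls back to "author".
    """
    return pick.get("handicapper") or pick.get("author")
```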
Each pick is staked at 3% of current bankroll, flat across all tiers.
An earlier sizing rule used unit = bankroll / 10, with ELITE = 1.5u (15%) and all other tiers = 1.0u (10%). That sizing carried a high probability of ruin given the strategy's measured edge. The data:
Results of 1,000 outcome-resampled simulations per sizing level (alternate-season risk):
| Sizing | Median final bankroll | p95 max drawdown | P(max drawdown > 25%) | P(max drawdown > 50%) |
|---|---|---|---|---|
| 2% | $1,049 | 16.0% | 0.0% | 0.0% |
| 3% (live) | $1,071 | 23.2% | 1.1% | 0.0% |
| 5% | $1,107 | 36.2% | 6.4% | 0.2% |
| 10% | $1,149 | 61.1% | 19.8% | 5.3% |
| 15% (former) | $1,122 | 77.5% | 29.2% | 14.6% |
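3% sizing keeps the median final bankroll close to the larger sizings ($1,071 vs $1,149 at 10%) while cutting the probability of a >25% drawdown roughly 18× relative to 10% sizing (1.1% vs 19.8%) and roughly 27× relative to the former 15% rule (1.1% vs 29.2%). The p95 max drawdown at 3% is 23.2%, just below the 25% public-marketing threshold for "this is normal variance, not system failure."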
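The table is described only as "outcome-resampled". The sketch below shows one plausible reading: bootstrap-resample per-pick returns with replacement to build an alternate season, compound at a fixed bankroll fraction, and record the final bankroll and max drawdown. The $1,000 starting bankroll, the simple index-based p95, and the function names are assumptions; this is not the production simulator.

```python
import random
import statistics


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def simulate(returns, fraction, n_sims=1000, start=1000.0, seed=0):
    """Bootstrap-resample per-pick returns and compound at a flat bankroll fraction.

    returns: profit per $1 staked for each settled pick (e.g. -1.0 for a loss).
    """
    rng = random.Random(seed)
    finals, drawdowns = [], []
    for _ in range(n_sims):
        bankroll = start
        equity = [bankroll]
        # One alternate season: same number of picks, outcomes drawn with replacement.
        for r in rng.choices(returns, k=len(returns)):
            bankroll += bankroll * fraction * r
            equity.append(bankroll)
        finals.append(bankroll)
        drawdowns.append(max_drawdown(equity))
    drawdowns.sort()
    return {
        "median_final": statistics.median(finals),
        "dd_p95": drawdowns[int(0.95 * (n_sims - 1))],  # simple index-based p95
        "p_dd_gt_25": sum(d > 0.25 for d in drawdowns) / n_sims,
        "p_dd_gt_50": sum(d > 0.50 for d in drawdowns) / n_sims,
    }
```

Running simulate once per fraction (0.02, 0.03, 0.05, 0.10, 0.15) on the same returns gives one row per sizing level.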
Walk-forward replay cannot validate that ELITE picks have a meaningfully larger edge than HIGH picks. The replay produces only NO_PLAY and MEDIUM tiers, because historical model snapshots and historical odds are both leakage points: neither was stored at pick-creation time. Until model_predictions_history accumulates ≥6 months of clean data, flat 3% is the defensible choice. Tier-graded sizing will return when there is evidence to support it.
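For comparison, the current flat rule and the retired tiered rule differ only in how the stake is derived from bankroll. A minimal sketch; the function names and rounding to cents are assumptions.

```python
def flat_stake(bankroll, fraction=0.03):
    """Current rule: a flat fraction of current bankroll, regardless of tier."""
    return round(bankroll * fraction, 2)


def former_stake(bankroll, tier):
    """Retired rule: unit = bankroll / 10; ELITE = 1.5u, every other tier = 1.0u."""
    unit = bankroll / 10
    multiplier = 1.5 if tier == "ELITE" else 1.0
    return round(unit * multiplier, 2)
```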
Each pick in the test set is re-scored using only data that existed at pick.created_at. The sample reflects the as-of timestamp at the top of this page.
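Re-scoring against only what existed at pick.created_at means every lookup is filtered by timestamp first. A minimal sketch, assuming history rows carry a recorded_at timestamp; the row schema and both function names are hypothetical, and the capper win rate is just one example of a feature that must be computed this way.

```python
from datetime import datetime


def history_as_of(history, created_at: datetime):
    """Keep only rows that existed at or before the pick's creation time."""
    return [row for row in history if row["recorded_at"] <= created_at]


def capper_win_rate_as_of(history, capper, created_at: datetime):
    """Win rate (pushes excluded) for one capper using only pre-pick data."""
    rows = [
        row
        for row in history_as_of(history, created_at)
        if row["capper"] == capper and row["result"] in ("WON", "LOST")
    ]
    if not rows:
        return None  # no track record yet at this point in time
    return sum(row["result"] == "WON" for row in rows) / len(rows)
```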
Test cohort: 1,163 settled picks, window 2026-04-04 → 2026-04-25.
| Cohort | n | Win % | Wilson 95% CI | Variable-unit ROI | Flat-unit ROI |
|---|---|---|---|---|---|
| Top-10 | 10 | 50.0% | [23.7%, 76.3%] | -2.7% | -2.7% (n=10 noise) |
| Top-25 | 25 | 56.0% | [35.1%, 72.1%] | +10.12% | +10.07% |
| Top-50 | 50 | 51.0% | [37.5%, 64.4%] | +6.57% | -0.80% |
| Top-100 | 100 | 49.5% | [40.0%, 59.0%] | -7.86% | -7.86% |
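The Wilson score interval in the table can be computed as below. This is the standard Wilson formula at z = 1.96; treating n as wins plus losses (pushes excluded, matching the win-rate definition) is an assumption. As a check, 5 wins in 10 decided picks gives [23.7%, 76.3%], the Top-10 row.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion wins / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half
```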
Variable-unit ROI: sum(stored_profit_dollars) / sum(picks.units_field) × 100. This uses the bot's recommended per-pick units. It is the marketing-credible figure: it answers "if you'd staked the bot's recommended units, what's your ROI?"

Flat-unit ROI: sum(profit_at_$X_flat) / (n × $X) × 100, derived from result + odds_us. This treats every pick as the same dollar bet. It is the right metric for sizing decisions because uniform sizing is what production uses.

The two metrics agree at top-25 (where the unit distribution is near-flat) and diverge at top-50 (where ranks 26-50 carry units > 1 on a few winners, which boosts variable-unit ROI). Both are valid measurements of different things.
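The flat-unit ROI can be rebuilt from result and odds_us alone. A minimal sketch, assuming standard American-odds payouts (a win at +150 returns $150 profit per $100 staked, a win at -110 returns $100 per $110) and the WON / LOST / PUSH result labels used on the ledger; the function names and the $100 default for $X are hypothetical.

```python
def profit_per_dollar(result, odds_us):
    """Profit per $1 staked on a settled pick at American odds."""
    if result == "PUSH":
        return 0.0
    if result == "LOST":
        return -1.0
    if result != "WON":
        raise ValueError(f"pick is not settled: {result!r}")
    # +150 pays $150 per $100 staked; -110 pays $100 per $110 staked.
    return odds_us / 100 if odds_us > 0 else 100 / -odds_us


def flat_unit_roi(picks, stake=100.0):
    """sum(profit at a flat $stake) / (n * $stake) * 100."""
    if not picks:
        raise ValueError("no settled picks")
    profit = sum(stake * profit_per_dollar(p["result"], p["odds_us"]) for p in picks)
    return profit / (len(picks) * stake) * 100
```

Because every pick carries the same stake, the result does not depend on the value chosen for $X.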
The walk-forward test window is the most recent 20% of all settled picks. As new picks settle, the window slides forward. The numbers above are computed against settled picks as of the as-of timestamp at the top of this page, and will move as the database grows. The sizing-experiment snapshot is pinned for reproducibility.
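The sliding window amounts to sorting settled picks by time and holding out the newest 20%. A minimal sketch; ordering by created_at and rounding the test size up are assumptions, and walk_forward_split is a hypothetical name.

```python
import math


def walk_forward_split(settled, test_fraction=0.20, key="created_at"):
    """Split settled picks into (history, test), with test = most recent fraction."""
    ordered = sorted(settled, key=lambda pick: pick[key])
    n_test = math.ceil(len(ordered) * test_fraction)
    cut = len(ordered) - n_test  # avoids the ordered[-0:] pitfall when n_test == 0
    return ordered[:cut], ordered[cut:]
```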
At 3% flat sizing on the top-25 cohort, alternate-season simulations produce:
Realized max drawdown is tracked on the equity curve on the homepage and updates as new picks settle.
When any of the following fires, the bot freezes new picks and the cause is investigated before fresh selections resume.
The expected calibration error (ECE) of model_prob exceeds its threshold (the formal threshold is pending live data). This indicates that the sport models are mis-calibrating probabilities.

Each trip event is recorded with its date, condition, and resolution in the changelog at the bottom of this page.
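Expected calibration error is typically computed by bucketing predictions by probability and averaging the gap between predicted probability and observed hit rate, weighted by bucket size. A minimal sketch using ten equal-width bins (a common convention, assumed here); because the formal threshold is still pending, it is a required parameter rather than a constant, and both function names are hypothetical.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Size-weighted mean |avg predicted prob - observed hit rate| across bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_p = sum(p for p, _ in b) / len(b)
        hit_rate = sum(y for _, y in b) / len(b)
        ece += len(b) / total * abs(avg_p - hit_rate)
    return ece


def breaker_tripped(probs, outcomes, threshold):
    """True when calibration drift should freeze new picks."""
    return expected_calibration_error(probs, outcomes) > threshold
```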
Bankroll growth and unit ROI are different metrics. The site uses both, with explicit labels.
Bankroll growth: (current - starting) / starting × 100. This reflects total compounding. The bankroll card on the homepage displays it under the label "BANKROLL GROWTH" (formerly the misleading "RETURN ON INVESTMENT").

Unit ROI: total_profit / total_wagered × 100. This reflects per-bet edge. The hero copy and the methodology numbers use this metric.

The two diverge under bankroll-fraction sizing because the bankroll turns over multiple times. At 3% sizing over a 100-pick season, total wagered is roughly 300% of the starting bankroll, so +30% bankroll growth corresponds to roughly +10% unit ROI. The sportsbook-industry norm is to quote unit ROI; bankroll growth is the more visible, consumer-facing metric on a tracker.
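The divergence follows from compounding: each stake is 3% of the bankroll at that moment, so total wagered keeps accumulating while bankroll growth is measured against the fixed starting value. A minimal sketch, assuming per-pick profit is expressed per $1 staked; the function name and inputs are hypothetical.

```python
def growth_and_unit_roi(start, fraction, returns):
    """Return (bankroll growth %, unit ROI %, total wagered as % of start).

    returns: profit per $1 staked for each pick, in settlement order.
    """
    bankroll = start
    wagered = 0.0
    for r in returns:
        stake = bankroll * fraction  # stake tracks the current bankroll
        wagered += stake
        bankroll += stake * r
    if wagered == 0:
        raise ValueError("no picks staked")
    profit = bankroll - start
    return profit / start * 100, profit / wagered * 100, wagered / start * 100
```

With 100 picks at fraction=0.03, the third value lands near 300 whenever the bankroll stays close to its starting level, which is the approximation used above.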
| Metric | Definition | Where it appears |
|---|---|---|
| Settled picks | Count of picks with an ESPN-graded result (WON, LOST, or PUSH) | Hero, telemetry strip |
| Win rate | wins / (wins + losses) — pushes excluded | Hero, equity curve header |
| Unit ROI | total_profit / total_wagered × 100 | Hero ("unit ROI" line), methodology |
| Bankroll growth | (current - starting) / starting × 100 | Bankroll card |
| Profit / Loss | Dollar profit summed across settled picks | Bankroll card, hero |
| Pending picks | Picks placed but not yet settled | Telemetry strip |
| Composite score | 0–100 rank, sum of capper + model + convergence + edge components | Internal selection signal |
| Convergence tag | Gold (3+ cappers + model + bot) / Silver (cappers only) / Bronze (model only) | Per-pick row badges |
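Several of the metrics in this table come straight from result counts. A minimal sketch, assuming each ledger row has a result of WON, LOST, PUSH, or None while pending, plus a profit field in dollars; the schema and ledger_summary are hypothetical.

```python
from collections import Counter


def ledger_summary(picks):
    """Settled/pending counts, win rate (pushes excluded), and dollar P/L."""
    counts = Counter(p["result"] for p in picks)  # None marks a pending pick
    wins, losses, pushes = counts["WON"], counts["LOST"], counts["PUSH"]
    decided = wins + losses
    return {
        "settled": wins + losses + pushes,
        "pending": counts[None],
        "win_rate": wins / decided * 100 if decided else None,
        "profit_loss": sum(p.get("profit", 0.0) for p in picks if p["result"] is not None),
    }
```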
An earlier "AI CONFIRMED ✓ NN%" badge appeared in the bet log column. That metric was not derived from model output: its implementation was a deterministic hash of the pick text plus a base value of 62 if the pick won or 54 if it lost. It was visual decoration, biased post hoc toward winners.
The column was retired. The bet log replaces it with the actual convergence tag (Gold / Silver / Bronze) which is derived from real model + capper agreement. When NHL and NCAA models land, a real model_prob percentage column may return; until then there is no "AI Confirmed %" surface.
The composite-score formulas in §2 and the sizing math in §3 are fully specified in this document. Anyone who pulls our settled-pick ledger via the public dashboard endpoint and re-applies the formulas will reproduce the numbers in §4 within rounding.
The sizing-experiment cohort is frozen in our backtest archive and the published numbers do not retroactively change. As new picks settle they extend the live cohort forward; cohort recalculations refresh published metrics with an "as of" timestamp at the top of this page.
| Date | Change |
|---|---|
| 2026-04-26 | Initial publication. |
| 2026-05-11 | Retired V1 / V2 cohort framing. Single bot, all settled picks on the ledger. |