
How it’s built

The project is a dozen small Python scripts arranged like an assembly line. Each one does a single job and hands its result to the next — get the data, rate the teams, grade every prediction, publish the results this site reads. Nothing ever peeks at the future. This page is the map.

At a glance

Five stages, top to bottom. The detailed map below expands each one.

1 · Get the data → 2 · Rate the teams → 3 · Combine into one table → 4 · Grade & build the feed → 5 · Show it here

Each box is a script. There are three kinds:

Data builder: gets data, saves a file
Engine: does the math, saves nothing
Harness: runs the tests & the live feed

A label such as “writes schedules.parquet” means that script saves a file to disk for the next stage to pick up.

The detailed map

nflverse — free, public NFL data

Schedules & betting lines · player stats · play-by-play

1 · Get the data

Three scripts download free NFL data, turn it into per-game numbers, and save the results, so nothing has to be re-fetched later.

pull_data.py

Every game’s schedule, final score, and betting line.

writes schedules.parquet

qb_value.py

Scores how well each quarterback played in each game, from the box score.

writes qb_value.parquet

epa_features.py

Each team’s recent form on offense & defense — counting only games already played.

writes epa_features.parquet
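The rule “counting only games already played” is the part worth getting exactly right. Below is a minimal sketch of one way to do it. It assumes a plain average over each team's last few games. The 4-game window and the `pregame_form` name are hypothetical, and the real script's window and weighting aren't described here.

```python
from collections import defaultdict, deque
from typing import List, Optional, Tuple


def pregame_form(games: List[Tuple[str, float]], window: int = 4) -> List[Optional[float]]:
    """Hypothetical backward-only rolling form.

    games: (team, epa) pairs in kickoff order, one per team-game.
    Returns a list parallel to `games`: the mean EPA of that team's previous
    `window` games, or None if the team has not played yet.
    """
    history = defaultdict(lambda: deque(maxlen=window))  # team -> its last `window` EPA values
    form = []
    for team, epa in games:
        past = history[team]
        form.append(sum(past) / len(past) if past else None)  # read the past first...
        past.append(epa)                                       # ...then record this game
    return form
```

The order inside the loop is the whole trick. Swap those two lines and each game's own result leaks into its “pre-game” form.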

What comes out — the backbone file schedules.parquet (real 2025 rows):

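game_id | wk | away | home | result | spread | home ML | away ML
2025_01_DAL_PHI | 1 | DAL | PHI | +4 | 8.5 | −425 | +330
2025_01_KC_LAC | 1 | KC | LAC | +6 | −3.0 | +145 | −175
2025_01_TB_ATL | 1 | TB | ATL | −3 | −1.5 | −105 | −115
2025_01_CIN_CLE | 1 | CIN | CLE | −1 | −5.5 | +195 | −238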

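result = home score − away score (so +4 = home won by 4). spread = how many points the home team was favored by (negative = home was the underdog). ML = moneyline odds: −425 means risking $425 to win $100; +330 means a $100 bet wins $330.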
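Later, metrics.py turns these odds into a clean win %. One common way to do that works in two steps. First, convert each moneyline into the bookmaker's implied probability. Then rescale the home/away pair so it sums to 1, which strips out the bookmaker's margin. This method is an assumption about what the script does, and the function names are hypothetical.

```python
def implied_prob(moneyline: float) -> float:
    """American moneyline -> the bookmaker's implied probability (margin still included)."""
    if moneyline < 0:  # favorite: risk |ML| to win 100
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)  # underdog: risk 100 to win ML


def home_win_prob(home_ml: float, away_ml: float) -> float:
    """Remove the margin by rescaling the two implied probabilities to sum to 1."""
    p_home = implied_prob(home_ml)
    p_away = implied_prob(away_ml)
    return p_home / (p_home + p_away)
```

On the rows above, `home_win_prob(-425, 330)` comes out at about 0.777 for DAL @ PHI. `home_win_prob(145, -175)` comes out at about 0.391 for KC @ LAC. Both match the Vegas numbers that appear further down the page.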

qb_value.parquet — how each QB played:

player | team | value
Aaron Rodgers | PIT | 75.7
Matthew Stafford | LA | 44.6
Joe Flacco | CLE | 42.2

epa_features.parquet — recent form (higher = better):

team | off form | def form
BUF | +0.136 | +0.051
BAL | +0.074 | +0.071
ATL | −0.009 | +0.056

2 · Rate the teams (the brains)

Pure logic that other scripts call. Ratings update game-by-game, so a test can’t accidentally see the future.

elo.py

A chess-style power rating: beat a strong team and your number climbs; lose to a weak one and it drops.

qbelo.py ★ main model

Same rating, but it drops when a backup quarterback starts — the moment plain Elo gets fooled.

metrics.py

The scorecard: grades each prediction, and turns the Vegas odds into a clean win % to compare against.
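The rating logic fits in a few lines. Below is a sketch of the textbook Elo update plus one possible shape for QBElo's backup-QB adjustment. The K-factor of 20, the 48-point home bonus, the QB weight and all function names are hypothetical illustration values, not the project's settings.

```python
from typing import Tuple


def expected_home(elo_diff: float, home_bonus: float = 48.0) -> float:
    """Elo's chance that the home team wins; elo_diff = home rating minus away rating.

    home_bonus is an assumed home-field edge in Elo points (hypothetical value).
    """
    return 1.0 / (1.0 + 10 ** (-(elo_diff + home_bonus) / 400))


def elo_update(home: float, away: float, result: float,
               k: float = 20.0, home_bonus: float = 48.0) -> Tuple[float, float]:
    """Return new (home, away) ratings; result = home score minus away score."""
    actual = 1.0 if result > 0 else 0.5 if result == 0 else 0.0  # a tie counts as half a win
    surprise = actual - expected_home(home - away, home_bonus)   # large when an underdog wins
    delta = k * surprise
    return home + delta, away - delta  # what one side gains the other loses; the total is unchanged


def qb_adjustment(value_today: float, value_usual: float, weight: float = 1.0) -> float:
    """Hypothetical QBElo-style tweak: Elo points to add for this game only.

    Negative when today's starter (say, a backup) grades below the team's usual QB
    on a per-game value like the one in qb_value.parquet.
    """
    return weight * (value_today - value_usual)
```

With those assumed numbers, `expected_home(205.3)` is about 0.811. That happens to match Elo's number for DAL @ PHI (elo_diff +205.3) in the tables below. For a QB-aware estimate, adjust the diff before calling `expected_home`: add `qb_adjustment(...)` for the home starter and subtract it for the away starter.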

What comes out — each engine’s estimate of the home team’s win chance (real Week 1 2025):

game (away @ home) | elo.py | qbelo.py ★ | home won?
DAL @ PHI | 0.811 | 0.673 | yes (PHI +4)
KC @ LAC | 0.359 | 0.326 | yes (LAC +6)
TB @ ATL | 0.428 | 0.322 | no (TB +3)
CIN @ CLE | 0.358 | 0.437 | no (CIN +1)

0.811 = “81% chance the home team wins.” QBElo (★) nudges Elo’s number — biggest when a backup QB is starting.

3 · Combine everything into one table

The one step that every test and the live feed share.

ml_model.py → assemble()

Runs the ratings and stitches all three data files into one big table — one row per game, with every number lined up: the ratings, the win probabilities, recent form, days of rest, and the Vegas price.
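Conceptually, that stitching is a join on game_id. Below is a stdlib-only sketch of the idea with hypothetical names and input shapes. The real assemble() works from the data files described above, while this sketch takes plain lists and dicts. Rest days are left out for brevity.

```python
from typing import Dict, List, Tuple


def assemble_rows(schedule: List[dict],
                  ratings: Dict[str, dict],
                  form: Dict[Tuple[str, str], dict]) -> List[dict]:
    """Hypothetical stand-in for assemble(): one flat row per game, keyed by game_id.

    schedule: game rows (game_id, home, away, result, p_home_mkt already converted from the odds)
    ratings:  game_id -> {"elo_diff", "p_home_elo", "p_home_qbelo"} from the engines
    form:     (game_id, team) -> {"off", "def"} pre-game form
    """
    rows = []
    for g in schedule:
        gid = g["game_id"]
        home, away = form[(gid, g["home"])], form[(gid, g["away"])]
        rows.append({
            "game_id": gid,
            **ratings[gid],                              # engine outputs for this game
            "off_epa_diff": home["off"] - away["off"],   # home minus away, like elo_diff
            "def_epa_diff": home["def"] - away["def"],
            "p_home_mkt": g["p_home_mkt"],
            "y": 1 if g["result"] > 0 else 0,            # 1 = home win (ties count as 0 here)
        })
    return rows
```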

What comes out — the single combined row for one game (Week 1, DAL @ PHI). Every test below reads rows shaped like this:

column | what it is | value
elo_diff | PHI’s rating minus DAL’s | +205.3
p_home_elo | Elo’s win chance for PHI | 0.811
p_home_qbelo | QBElo’s win chance for PHI | 0.673
off_epa_diff | offense-form gap | +0.167
def_epa_diff | defense-form gap | −0.124
home_rest / away_rest | days of rest each | 7 / 7
p_home_mkt | Vegas’ win chance for PHI | 0.777
y | what happened (1 = home win) | 1

4 · Grade it, then build the live feed

These are the scripts you actually run. The tests check that the model works (and doesn’t cheat); the producer writes the file this site reads.

The tests — each saves a report (chart or table)

backtest.py

The main exam: predict past seasons it was never tuned on, and compare to Vegas.

ml_model.py

Tries machine learning instead of the rating — and reports honestly that it doesn’t win.

ml_walk_forward.py

Re-runs that ML test fairly, letting it learn each new season. Still can’t beat the rating.

backup_slice.py

Pinpoints where the QB rating earns its keep: games where a backup starts.

sanity_seasons.py

The cheat-detector: if we beat Vegas every season we’d be peeking. We don’t.
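ml_walk_forward.py “lets it learn each new season.” The usual way to set that up is an expanding window: to grade season s, train only on seasons before s. Below is a sketch of just the splitting logic. The function name and the 2015 start season are hypothetical. Using 2019 as the first graded season matches the 2019–24 window quoted below.

```python
from typing import Iterable, Iterator, List, Tuple


def walk_forward(seasons: Iterable[int], first_test: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (train_seasons, test_season) pairs for an expanding-window backtest.

    Each graded season sees a model fit only on the seasons before it, so no
    game from the graded season (or any later one) ever reaches training.
    """
    ordered = sorted(seasons)
    for season in ordered:
        if season >= first_test:
            yield [s for s in ordered if s < season], season
```

`list(walk_forward(range(2015, 2025), 2019))` gives six folds. The first trains on 2015 through 2018 and grades 2019; the last trains on 2015 through 2023 and grades 2024.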

What comes out — the scorecard backtest.py prints (graded on 2019–24 games it never saw):

model | Brier ↓ | log loss ↓ | accuracy ↑
Always pick home | 0.2495 | 0.6937 | 53.1%
Elo | 0.2222 | 0.6376 | 63.2%
QBElo ★ | 0.2205 | 0.6335 | 63.6%
Vegas (the ceiling) | 0.2097 | 0.6087 | 66.6%

Lower Brier / log loss = more accurate percentages. QBElo lands just shy of Vegas: the honest, expected result.
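The three scorecard columns are standard formulas. Below is a sketch of how they can be computed from a list of home-win probabilities and 0/1 outcomes. The function names are hypothetical, and the project's metrics.py may differ in details such as how it clips probabilities or handles an exact 50/50 pick.

```python
import math


def brier(probs, outcomes):
    """Mean squared gap between the predicted home-win chance and what happened (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; clipping keeps log() finite for predictions of exactly 0 or 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def accuracy(probs, outcomes):
    """Share of games where the picked side won (p > 0.5 picks home; exactly 0.5 counts as away)."""
    hits = sum((p > 0.5) == (y == 1) for p, y in zip(probs, outcomes))
    return hits / len(probs)
```

A constant 50% forecast scores a Brier of exactly 0.25 and a log loss of ln 2 ≈ 0.693, roughly where “Always pick home” sits in the table. Take the four Week 1 QBElo numbers (0.673, 0.326, 0.322, 0.437; outcomes 1, 1, 0, 0). On those, `brier` gives about 0.214 and `accuracy` gives 0.75, though four games say nothing about real skill.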

The live producer

season_replay.py

Walks a whole season week by week, scoring every game under every model, and saves the results file the website loads. Change the year to 2026 and a weekly job turns this into the real live tracker.

writes replay_2025.json
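The no-peeking rule matters most here. One way to walk a season is sketched below. It first predicts every game in a week from ratings frozen at the end of the previous week, and only then feeds that week's scores back in. The function names, the game-dict shape and passing models in as functions are all assumptions for illustration.

```python
import json
from typing import Callable, Dict, List

Ratings = Dict[str, float]
Game = dict


def replay_season(games: List[Game], ratings: Ratings,
                  models: Dict[str, Callable[[Ratings, Game], float]],
                  update: Callable[[Ratings, Game], None]) -> List[dict]:
    """Walk a season week by week and return one feed row per game.

    games:   dicts with week, home, away and result (home score minus away score)
    ratings: team -> rating, updated in place as the season goes
    models:  name -> function giving the home-win chance from the current ratings
    update:  function that folds one final score into the ratings
    """
    feed = []
    for week in sorted({g["week"] for g in games}):
        this_week = [g for g in games if g["week"] == week]
        # 1) Lock in every prediction for the week using ratings as of last week.
        for g in this_week:
            row = {"week": week, "matchup": f'{g["away"]} @ {g["home"]}'}
            for name, predict in models.items():
                row[name] = round(predict(ratings, g), 3)
            row["result"] = g["result"]
            feed.append(row)
        # 2) Only then let the week's scores move the ratings.
        for g in this_week:
            update(ratings, g)
    return feed


def write_feed(feed: List[dict], path: str = "replay_2025.json") -> None:
    """Save the feed as JSON for the website to load."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(feed, fh, indent=2)
```

Updating after each game rather than after each week would also avoid peeking, because a game is always predicted before its own score is used. Freezing per week simply keeps all of a week's picks on the same footing.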

What comes out — the live feed (replay_2025.json): every model’s home-win % per game, plus the result. This is exactly what the page below reads.

matchup | Elo | QBElo ★ | ML model | Vegas | result
DAL @ PHI | 0.811 | 0.673 | 0.751 | 0.777 | PHI +4
KC @ LAC | 0.359 | 0.326 | 0.345 | 0.391 | LAC +6
TB @ ATL | 0.428 | 0.322 | 0.222 | 0.489 | TB +3
MIA @ IND | 0.505 | 0.413 | 0.299 | 0.504 | IND +25

Run it yourself, end to end

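Each step saves a file the next one reads, so the order is just the map above, top to bottom. The commands use Windows paths, assume the virtual environment lives in .venv, and are run from the project root. On macOS or Linux the interpreter is .venv/bin/python and the paths use forward slashes.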

# Step 1 — get the data (saves data/*.parquet)
.venv\Scripts\python.exe src\pull_data.py
.venv\Scripts\python.exe src\qb_value.py
.venv\Scripts\python.exe src\epa_features.py

# Step 4 — grade the models (saves reports/*.csv and *.png)
.venv\Scripts\python.exe src\backtest.py
.venv\Scripts\python.exe src\ml_model.py
.venv\Scripts\python.exe src\ml_walk_forward.py
.venv\Scripts\python.exe src\backup_slice.py
.venv\Scripts\python.exe src\sanity_seasons.py

# Step 4 (cont.) — build the live feed the site reads (also copied into web/)
.venv\Scripts\python.exe src\season_replay.py

Steps 2 & 3 (the engines and the combine step) aren’t run on their own — the scripts above call them.
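If you'd rather not paste nine commands, a small driver can run them in the same order. This is a hypothetical helper (say, run_all.py in the project root), not one of the project's scripts. It reuses whichever Python launched it, so starting it with the .venv interpreter keeps everything inside the virtual environment on any OS.

```python
import subprocess
import sys
from pathlib import Path

# Same order as the commands above: Step 1 (data), then Step 4 (tests, then the live feed).
PIPELINE = [
    "pull_data.py", "qb_value.py", "epa_features.py",
    "backtest.py", "ml_model.py", "ml_walk_forward.py",
    "backup_slice.py", "sanity_seasons.py",
    "season_replay.py",
]


def build_commands(src_dir: str = "src") -> list:
    """One command per script, using the interpreter that is running this file."""
    return [[sys.executable, str(Path(src_dir) / name)] for name in PIPELINE]


def run_pipeline(src_dir: str = "src") -> None:
    """Run the scripts in order; check=True raises on the first non-zero exit and stops the run."""
    for cmd in build_commands(src_dir):
        print("running", cmd[1])
        subprocess.run(cmd, check=True)
```

Call `run_pipeline()` from the project root. Because of `check=True`, a failed script raises `subprocess.CalledProcessError` and none of the later steps run.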

The big picture

The whole project in one line:

📥 Grab free NFL data → ⚖️ Rate every team → 🎯 Predict each game → 📊 Grade it vs Vegas → 🌐 Show it here

So… does it work?

How often each one picks the right winner (on 2019–24 games it had never seen):

A coin flip: 50%
Our model (QBElo): 64%
Vegas (the ceiling): 67%

We land within a few percentage points of the sharpest line in the world, and we don’t beat it. That’s the honest result: the goal was calibrated, trustworthy probabilities, not a fantasy edge over Vegas.

✓ Never peeks at the future ✓ Always graded against Vegas ✓ 100% free, public data