How it's built — NFL Outcome Predictor

At a glance

Five stages, top to bottom. The detailed map below expands each one.

1 · Get the data → 2 · Rate the teams → 3 · Combine into one table → 4 · Grade & build the feed → 5 · Show it here

Each box is a script. There are three kinds:

Data builder: gets data, saves a file
Engine: does the math, saves nothing
Harness: runs the tests & the live feed

A green chip such as “writes schedules.parquet” means that script saves a file to disk for the next stage to pick up.

The detailed map

nflverse — free, public NFL data

Schedules & betting lines · player stats · play-by-play

Get the data

Three scripts download free NFL data and save it, so nothing has to be re-fetched later.

pull_data.py

Every game’s schedule, final score, and betting line.

writes schedules.parquet

qb_value.py

Scores how well each quarterback played in each game, from the box score.

writes qb_value.parquet

epa_features.py

Each team’s recent form on offense & defense — counting only games already played.

writes epa_features.parquet
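
The "counting only games already played" rule is what keeps the form feature honest. Here is a minimal sketch of one way to build such a feature, assuming a team's per-game EPA values arrive oldest first. The function name `rolling_form` and the window length are hypothetical, not taken from epa_features.py.

```python
from collections import deque
from statistics import mean


def rolling_form(epa_by_game, window=5):
    """Return, for each game, the mean EPA of up to `window` PRIOR games.

    The value for game i is computed before game i's own EPA is added,
    so a game never contributes to its own feature (no look-ahead).
    `window=5` is an illustrative choice, not the project's setting.
    """
    recent = deque(maxlen=window)
    features = []
    for epa in epa_by_game:
        # Built from history only; None when there is no history yet.
        features.append(mean(recent) if recent else None)
        recent.append(epa)  # from here on the game counts toward later features
    return features


# Hypothetical per-game offensive EPA for one team, oldest first.
form = rolling_form([0.10, -0.05, 0.20, 0.00], window=2)
print(form)
```

The first game has no history, so its feature is empty. How the real script fills that gap (a league average, last season's value, or something else) is not stated in the source.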

What comes out — the backbone file schedules.parquet (real 2025 rows):

game_id	wk	away	home	result	spread	home ML	away ML
2025_01_DAL_PHI	1	DAL	PHI	+4	8.5	−425	+330
2025_01_KC_LAC	1	KC	LAC	+6	−3.0	+145	−175
2025_01_TB_ATL	1	TB	ATL	−3	−1.5	−105	−115
2025_01_CIN_CLE	1	CIN	CLE	−1	−5.5	+195	−238

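result = home score − away score (so +4 = home won by 4). spread = points the home team is favored by (so 8.5 = PHI favored by 8.5; a negative number means the away team is favored). ML = moneyline odds.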
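
Moneyline odds become a win probability in two steps: convert each side to its raw implied probability, then rescale so the two sides sum to 1, which strips out the bookmaker's margin. The sketch below uses proportional rescaling, one common method. That metrics.py uses this exact method is an assumption, and both function names are hypothetical.

```python
def implied_prob(moneyline):
    """Raw implied probability of an American moneyline (still includes the vig)."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def no_vig_home_prob(home_ml, away_ml):
    """Home win probability after rescaling both sides to sum to 1.

    Proportional rescaling is an assumed method; the source only says
    the odds are turned into "a clean win %".
    """
    home = implied_prob(home_ml)
    away = implied_prob(away_ml)
    return home / (home + away)


# DAL @ PHI from the table above: home ML -425, away ML +330.
print(round(no_vig_home_prob(-425, 330), 3))  # 0.777
```

The same call on KC @ LAC (+145 / −175) gives 0.391 and on TB @ ATL (−105 / −115) gives 0.489, in line with the Vegas figures shown later on this page.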

qb_value.parquet — how each QB played:

player	team	value
Aaron Rodgers	PIT	75.7
Matthew Stafford	LA	44.6
Joe Flacco	CLE	42.2

epa_features.parquet — recent form (higher = better):

team	off form	def form
BUF	+0.136	+0.051
BAL	+0.074	+0.071
ATL	−0.009	+0.056

Rate the teams (the brains)

Pure logic that other scripts call. Ratings update game-by-game, so a test can’t accidentally see the future.

elo.py

A chess-style power rating: beat a strong team and your number climbs; lose to a weak one and it drops.

qbelo.py ★ main model

Same rating, but it drops when a backup quarterback starts — the moment plain Elo gets fooled.

metrics.py

The scorecard: grades each prediction, and turns the Vegas odds into a clean win % to compare against.

What comes out — each engine’s estimate of the home team’s win chance (real Week 1 2025):

game (away @ home)	elo.py	qbelo.py ★	home won?
DAL @ PHI	0.811	0.673	yes (PHI +4)
KC @ LAC	0.359	0.326	yes (LAC +6)
TB @ ATL	0.428	0.322	no (TB +3)
CIN @ CLE	0.358	0.437	no (CIN +1)

0.811 = “81% chance the home team wins.” QBElo (★) adjusts Elo’s number, and the adjustment is largest when a backup QB is starting.
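
For readers new to Elo, here is a minimal sketch of the standard logistic win curve and the zero-sum update that follows each game. The 400-point scale is the usual Elo convention. The K-factor of 20 and the 48-point home-field bonus are assumed values for illustration; the source does not say what elo.py uses, and the QB adjustment in qbelo.py is not modeled here.

```python
def elo_win_prob(home_rating, away_rating, home_field=48.0):
    """Home win probability from the standard logistic Elo curve.

    `home_field` is an ASSUMED home-advantage bonus in rating points;
    the source does not state the value elo.py uses.
    """
    diff = home_rating - away_rating + home_field
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def elo_update(home_rating, away_rating, home_won, k=20.0, home_field=48.0):
    """Return both ratings after one game (what one side gains, the other loses).

    `k` controls how fast ratings move; 20 is an illustrative value.
    """
    expected = elo_win_prob(home_rating, away_rating, home_field)
    delta = k * ((1.0 if home_won else 0.0) - expected)
    return home_rating + delta, away_rating - delta


# Two evenly rated teams: the home side starts as a slight favorite,
# so a road win moves more points than a home win would.
p = elo_win_prob(1500, 1500)
home_after, away_after = elo_update(1500, 1500, home_won=False)
print(round(p, 3), round(home_after, 1), round(away_after, 1))
```

Because each update uses only the game just played, replaying a season in date order gives every game a rating built from earlier results alone. With a rating gap of +205.3 and the assumed 48-point bonus, this curve returns about 0.811; that agrees with the DAL @ PHI figure on this page, but it does not confirm the project's actual constants.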

Combine everything into one table

The one step shared by every test and by the live feed below.

ml_model.py → assemble()

Runs the ratings and stitches all three data files into one big table — one row per game, with every number lined up: the ratings, the win probabilities, recent form, days of rest, and the Vegas price.

What comes out — the single combined row for one game (Week 1, DAL @ PHI). Every test below reads rows shaped like this:

column	what it is	value
elo_diff	PHI’s rating minus DAL’s	+205.3
p_home_elo	Elo’s win chance for PHI	0.811
p_home_qbelo	QBElo’s win chance for PHI	0.673
off_epa_diff	offense-form gap	+0.167
def_epa_diff	defense-form gap	−0.124
home_rest / away_rest	days of rest each	7 / 7
p_home_mkt	Vegas’ win chance for PHI	0.777
y	what happened (1 = home win)	1

Grade it, then build the live feed

These actually run. The tests prove the model works (and doesn’t cheat); the producer writes the file this site reads.

The tests — each saves a report (chart or table)

backtest.py

The main exam: predict past seasons it was never tuned on, and compare to Vegas.

ml_model.py

Tries machine learning instead of the rating — and reports honestly that it doesn’t win.

ml_walk_forward.py

Re-runs that ML test fairly, letting it learn each new season. Still can’t beat the rating.

backup_slice.py

Pinpoints where the QB rating earns its keep: games where a backup starts.

sanity_seasons.py

The cheat-detector: if we beat Vegas every season we’d be peeking. We don’t.
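
The walk-forward idea behind ml_walk_forward.py can be sketched as a generator of train/test splits: each season is predicted from strictly earlier seasons, then joins the training pool. This is an illustration only. The function name and the 2016 starting season are hypothetical, and the source states only that grading covers 2019–24.

```python
def walk_forward_splits(seasons, first_test_season):
    """Yield (train_seasons, test_season) pairs in chronological order.

    Each test season is predicted using only strictly earlier seasons,
    then becomes training data for the next step.
    """
    ordered = sorted(set(seasons))
    for test_season in ordered:
        if test_season < first_test_season:
            continue
        train = [s for s in ordered if s < test_season]
        if train:  # skip if there is nothing to learn from yet
            yield train, test_season


# Hypothetical season range, chosen only to make the example concrete.
splits = list(walk_forward_splits(range(2016, 2025), first_test_season=2019))
for train, test in splits:
    print(f"train {train[0]}-{train[-1]} -> test {test}")
```

A single fixed split, by contrast, would freeze the model before the first test season and never let it learn from the seasons it is graded on afterward.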

What comes out — the scorecard backtest.py prints (graded on 2019–24 games it never saw):

model	Brier ↓	log loss ↓	accuracy ↑
Always pick home	0.2495	0.6937	53.1%
Elo	0.2223	0.6378	63.5%
QBElo ★	0.2212	0.6352	63.7%
Vegas (the ceiling)	0.2097	0.6087	66.6%

Lower Brier / log loss = sharper percentages. QBElo lands just shy of Vegas — the honest, expected result.
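
The three scorecard columns have short standard definitions. The sketch below computes them in plain Python on QBElo's four Week 1 games from the table earlier on this page. Four games are far too few to mean anything; they only make the arithmetic easy to follow. The function names are hypothetical and need not match metrics.py.

```python
import math


def brier(probs, outcomes):
    """Mean squared gap between predicted home-win probability and result (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def accuracy(probs, outcomes):
    """Share of games where the side given more than 50% actually won."""
    hits = sum((p > 0.5) == (y == 1) for p, y in zip(probs, outcomes))
    return hits / len(probs)


# QBElo home-win probabilities and results (1 = home win) for
# DAL @ PHI, KC @ LAC, TB @ ATL, CIN @ CLE.
probs = [0.673, 0.326, 0.322, 0.437]
outcomes = [1, 1, 0, 0]
print(round(brier(probs, outcomes), 4),
      round(log_loss(probs, outcomes), 4),
      accuracy(probs, outcomes))
```

For scale, a forecast of exactly 0.5 on every game scores a Brier of 0.25 and a log loss of about 0.693, which is why the "Always pick home" row sits near those values.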

The live producer

season_replay.py

Walks a whole season week by week, scoring every game under every model, and saves the results file the website loads. Change the year to 2026 and a weekly job turns this into the real live tracker.

writes replay_2025.json

What comes out — the live feed (replay_2025.json): every model’s home-win % per game, plus the result. This is exactly what the page below reads.

matchup	Elo	QBElo ★	ML	Vegas	result
DAL @ PHI	0.811	0.673	0.751	0.777	PHI +4
KC @ LAC	0.359	0.326	0.345	0.391	LAC +6
TB @ ATL	0.428	0.322	0.222	0.489	TB +3
MIA @ IND	0.505	0.413	0.299	0.504	IND +25

ML in this table = the machine-learning model from ml_model.py, not the moneyline odds in schedules.parquet.

Show it here

A plain static page — no server, no database.

web/index.html — the live tracker

Reads the results file and draws the tables & charts.

Run it yourself, end to end

Each step saves a file the next one reads, so the order is just the map above — top to bottom:

# Step 1 — get the data (saves data/*.parquet)
.venv\Scripts\python.exe src\pull_data.py
.venv\Scripts\python.exe src\qb_value.py
.venv\Scripts\python.exe src\epa_features.py

# Step 4 — grade the models (saves reports/*.csv and *.png)
.venv\Scripts\python.exe src\backtest.py
.venv\Scripts\python.exe src\ml_model.py
.venv\Scripts\python.exe src\ml_walk_forward.py
.venv\Scripts\python.exe src\backup_slice.py
.venv\Scripts\python.exe src\sanity_seasons.py

# Step 4 (cont.) — build the live feed the site reads (also copied into web/)
.venv\Scripts\python.exe src\season_replay.py

Steps 2 & 3 (the engines and the combine step) aren’t run on their own — the scripts above call them.
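
The commands above use the Windows venv path. On other systems, one option is a small helper that builds the same command list in the same order for whichever interpreter is running it. This is a hypothetical convenience script, not part of the project; the script names and the src folder are copied from the commands above.

```python
import sys
from pathlib import Path

# Script order copied from the commands above.
PIPELINE = [
    "pull_data.py", "qb_value.py", "epa_features.py",    # Step 1
    "backtest.py", "ml_model.py", "ml_walk_forward.py",  # Step 4
    "backup_slice.py", "sanity_seasons.py",
    "season_replay.py",                                   # Step 4 (cont.)
]


def build_commands(src_dir="src", python=sys.executable):
    """Return one argv list per script, in pipeline order.

    sys.executable is the interpreter running this helper, so inside an
    activated virtual environment the list points at that environment.
    """
    return [[python, str(Path(src_dir) / name)] for name in PIPELINE]


commands = build_commands()
for argv in commands:
    print(" ".join(argv))
```

To actually run the pipeline, pass each list to `subprocess.run(argv, check=True)`, which stops at the first script that fails so a later stage never reads a stale file.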

The big picture

The whole project in one line:

Grab free NFL data → Rate every team → Predict each game → Grade it vs Vegas → Show it here

So… does it work?

How often each one picks the right winner (on 2019–24 games it had never seen):

A coin flip: 50%
Our model (QBElo): 64%
Vegas (the ceiling): 67%

We land within a few points of the sharpest line in the world — and we don’t beat it. That’s the honest result: the goal was calibrated, trustworthy probabilities, not a fantasy edge over Vegas.

✓ Never peeks at the future ✓ Always graded against Vegas ✓ 100% free, public data