How to read this page
This page shows how our prediction models performed across the entire 2025 NFL season. Before each game, every model named a favorite and a win probability (e.g. “Eagles 65%”), using only information available before kickoff. We then grade those predictions against what actually happened and against the Vegas odds, the gold standard.
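The models you'll see (as lines in the charts and rows in the tables):
- Our model (QBElo): our main prediction. An Elo power rating, the same kind of rating system used to rank chess players, adjusted when a backup quarterback starts.
- Vegas: the betting odds converted into a clean win probability, with the bookmaker's cut removed. This is the benchmark we measure ourselves against, and it is very rarely beaten.
- Elo and ML: two comparison models we built, a simpler Elo rating without the quarterback adjustment and an alternative machine-learning model.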
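Turning betting odds into a “clean” win probability means removing the bookmaker's margin (the “vig”), because the raw implied probabilities of both sides add up to more than 100%. A minimal sketch, assuming the odds arrive as American moneylines (e.g. -150 / +130) and that the vig is removed by simple proportional normalization; the function names are hypothetical, and this page does not say which odds format or de-vig method it actually uses.

```python
def implied_prob(moneyline: int) -> float:
    """Raw implied probability of an American moneyline (still includes the vig)."""
    if moneyline < 0:                     # favorite, e.g. -150: risk 150 to win 100
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)        # underdog, e.g. +130: risk 100 to win 130


def devig(home_ml: int, away_ml: int) -> tuple[float, float]:
    """Rescale both implied probabilities so they sum to 1 (proportional method)."""
    p_home = implied_prob(home_ml)
    p_away = implied_prob(away_ml)
    total = p_home + p_away               # above 1 because of the bookmaker's margin
    return p_home / total, p_away / total


if __name__ == "__main__":
    home, away = devig(-150, 130)         # hypothetical moneylines
    print(f"home {home:.3f}, away {away:.3f}")
```

Proportional scaling is the simplest de-vig method; other approaches split the margin unevenly between favorite and underdog.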
The grades:
- Accuracy: how often the model picked the winner. Higher is better.
- Brier score / log loss: how good the probabilities were (saying “90%” and being wrong is penalized heavily). Lower is better.
- Calibration (the chart below): when a model says “70%”, do those teams actually win about 70% of the time? Points on the diagonal line mean the probabilities are honest.
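The three grades can be computed in a few lines. A minimal sketch, assuming each game is stored as the model's probability that the home team wins plus a 1/0 result; the function names and the equal-width 10-bin calibration table are illustrative choices, not necessarily what this page uses.

```python
import math


def brier(probs, outcomes):
    """Mean squared error between the forecast and the result (1 = home win, 0 = loss)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; clipping keeps log(0) from blowing up."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def accuracy(probs, outcomes):
    """Share of games where the picked side won (p >= 0.5 means the home team is picked)."""
    hits = sum((p >= 0.5) == (y == 1) for p, y in zip(probs, outcomes))
    return hits / len(probs)


def calibration_table(probs, outcomes, n_bins=10):
    """Bucket forecasts into equal-width bins; compare mean forecast with actual win rate.

    Rows where mean_forecast is close to win_rate sit on the chart's diagonal.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the top bin
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:
            mean_forecast = sum(p for p, _ in b) / len(b)
            win_rate = sum(y for _, y in b) / len(b)
            rows.append((mean_forecast, win_rate, len(b)))
    return rows


if __name__ == "__main__":
    probs = [0.65, 0.80, 0.30, 0.55]   # toy forecasts: P(home team wins)
    outcomes = [1, 1, 0, 0]            # toy results
    print(brier(probs, outcomes), log_loss(probs, outcomes), accuracy(probs, outcomes))
    print(calibration_table(probs, outcomes))
```

The “penalized heavily” point is easy to check: a 90% pick that loses costs 0.81 in Brier score and ln(10) ≈ 2.30 in log loss, versus 0.25 and ln(2) ≈ 0.69 for a 50% guess that loses.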
Bottom line: the homemade models get close to Vegas but don’t beat it, which is the honest, expected result.
How it works
1. The model learns in a loop
Beat a strong team and your rating jumps; lose to a weak one and it drops. Repeat every week, and the ratings sort the league into a power order without any human opinions.
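The loop can be sketched with the textbook Elo update. This is a minimal sketch, not the project's code: the K-factor of 20 and the 400-point scale are conventional assumptions, and the quarterback and home-field adjustments are left out.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo win probability for team A against team B (400-point logistic scale)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 20.0):
    """Return both teams' new ratings after one game.

    A gains k * (result - expected): an upset win over a stronger team moves
    the ratings more than an expected win over a weaker one. Whatever A gains,
    B loses, so the league average does not drift.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * ((1.0 if a_won else 0.0) - exp_a)
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    ratings = {"A": 1500.0, "B": 1600.0, "C": 1400.0}  # hypothetical teams
    for winner, loser in [("A", "B"), ("A", "C")]:     # toy results, in week order
        ratings[winner], ratings[loser] = update(ratings[winner], ratings[loser], a_won=True)
    print(ratings)
```

Running the update once per game, in date order, is what lets the ratings settle into a power order over a season.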
2. Turning two ratings into a win chance
The core trick: take the gap between two teams' ratings and squash it through an S-shaped curve into a win probability between 0% and 100%. The bigger the gap, the closer the stronger team's chance gets to 100%.
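In a textbook Elo system, that curve is the logistic function on a 400-point scale: every 400 points of rating gap multiplies the stronger team's odds of winning by 10. The sketch below assumes those standard constants (this page doesn't state the project's own scale or any home-field bump) and tabulates the win chance for a few rating gaps.

```python
def win_probability(rating_gap: float, scale: float = 400.0) -> float:
    """Map a rating gap (stronger minus weaker) to the stronger team's win chance.

    Assumes the textbook Elo logistic curve: each `scale` points of gap
    multiplies the odds of winning by 10.
    """
    return 1 / (1 + 10 ** (-rating_gap / scale))


if __name__ == "__main__":
    for gap in (0, 50, 100, 200, 400):
        print(f"gap {gap:>3} -> {win_probability(gap):.0%}")
```

It prints 50%, 57%, 64%, 76% and 91% for gaps of 0, 50, 100, 200 and 400 points, so even a large gap never quite reaches certainty.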
3. A real game from 2025
4. Watch a team's power rating over the season
The rating rises after wins (especially big wins over good teams) and falls after losses. 1500 is average; higher is stronger.
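The “big wins count more” effect usually comes from scaling the update by the margin of victory. The sketch below uses a log-of-margin multiplier, one common choice in public NFL Elo models; it is an assumption, not necessarily the formula this project uses, and the teams and scores are toy data. It also shows why 1500 stays the league average: every point one team gains, its opponent loses.

```python
import math


def expected_score(r_a: float, r_b: float) -> float:
    """Elo win probability for team A against team B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def mov_update(r_winner: float, r_loser: float, margin: int, k: float = 20.0):
    """Elo update scaled by margin of victory (hypothetical log-of-margin multiplier).

    A 1-point win scales the update by ln(2) ≈ 0.69 and a 21-point win by
    ln(22) ≈ 3.09, so blowouts move ratings further, with diminishing returns.
    """
    multiplier = math.log(abs(margin) + 1)
    delta = k * multiplier * (1 - expected_score(r_winner, r_loser))
    return r_winner + delta, r_loser - delta


if __name__ == "__main__":
    ratings = {team: 1500.0 for team in "ABCD"}   # hypothetical league at the average
    history = {team: [r] for team, r in ratings.items()}
    toy_results = [("A", "B", 21), ("D", "C", 3), ("A", "D", 7), ("C", "B", 10)]
    for winner, loser, margin in toy_results:
        ratings[winner], ratings[loser] = mov_update(ratings[winner], ratings[loser], margin)
        for team in (winner, loser):
            history[team].append(ratings[team])
    print(history["A"])                           # team A's rating trajectory
    print(sum(ratings.values()) / len(ratings))   # 1500 up to float rounding: zero-sum updates
```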
Season totals
| Model | Brier ↓ | Log loss ↓ | Accuracy ↑ | Games |
|---|---|---|---|---|
Brier score and log loss grade the probabilities (lower is better); accuracy is simply how often the winner was picked. “Market” is the Vegas odds (the ceiling); “baseline” always picks the home team.
Track record through the season
Each line is a model's running (season-to-date) Brier score, which grades its probabilities; lower is better. Vegas (red) stays the best; the homemade models cluster just above it.
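A running grade like this can be computed as a season-to-date average: after each week, average the squared errors of every game played so far. A minimal sketch, assuming “running” means that cumulative average and that each game record carries its week number, the model's home-win probability, and the result; the record layout and the function name are hypothetical.

```python
from collections import defaultdict


def running_brier(games):
    """Season-to-date Brier score after each week.

    `games` is an iterable of (week, prob_home_win, home_won) tuples, where
    home_won is 1 or 0. Returns a list of (week, cumulative_brier) pairs.
    """
    sq_errors_by_week = defaultdict(list)
    for week, p, y in games:
        sq_errors_by_week[week].append((p - y) ** 2)

    curve = []
    total, count = 0.0, 0
    for week in sorted(sq_errors_by_week):
        errors = sq_errors_by_week[week]
        total += sum(errors)          # accumulate every game played so far
        count += len(errors)
        curve.append((week, total / count))
    return curve


if __name__ == "__main__":
    toy = [(1, 0.70, 1), (1, 0.40, 1), (2, 0.60, 0), (2, 0.80, 1)]  # toy games
    for week, score in running_brier(toy):
        print(f"week {week}: {score:.3f}")
```

Because each point averages over more games than the last, the curves are noisy early in the season and flatten out later.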
Weekly breakdown
The weekly breakdown lists every game's prediction, week by week.
How it was built
The point of this project isn't a secret formula but the process: build progressively smarter models, measure each one honestly without ever peeking at the future, and always benchmark against Vegas.
The climb: each model, scored on games it had never seen (2019–2024)
Win-pick accuracy (50% is a coin flip). Each step adds sophistication and gets closer to Vegas, but none beats it.
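Scoring each model on games it had never seen can be done season by season: fit on every season before the test season, then score the test season. A minimal sketch under stated assumptions: `fit` and `predict` are hypothetical callables standing in for any model, and the example plugs in a deliberately trivial home-win-rate model with toy data so the split logic is easy to follow.

```python
def walk_forward(games_by_season, fit, predict, test_seasons):
    """Out-of-sample accuracy per test season.

    games_by_season: dict season -> list of (features, home_won) pairs.
    fit(train_games) -> model; predict(model, features) -> P(home wins).
    Only seasons strictly earlier than the test season are used for fitting,
    so at least one earlier season must exist.
    """
    results = {}
    for season in test_seasons:
        train = [g for s, gs in games_by_season.items() if s < season for g in gs]
        model = fit(train)
        test = games_by_season[season]
        correct = sum((predict(model, x) >= 0.5) == bool(y) for x, y in test)
        results[season] = correct / len(test)
    return results


def fit_home_rate(train):
    """Trivial stand-in model: the historical home-win rate."""
    return sum(y for _, y in train) / len(train)


def predict_home_rate(model, _features):
    return model


if __name__ == "__main__":
    # Toy data only: (features, home_won) pairs per season.
    toy = {
        2018: [(None, 1), (None, 1), (None, 0)],
        2019: [(None, 1), (None, 0)],
        2020: [(None, 0), (None, 0), (None, 1)],
    }
    print(walk_forward(toy, fit_home_rate, predict_home_rate, test_seasons=[2019, 2020]))
```

The strict `s < season` filter is the guard against peeking: no game from the test season or later can influence the fitted model.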
The pipeline
Build decisions that matter
- No peeking at the future. Every prediction uses only games played before it. Leaking future information is the #1 way projects like this accidentally cheat, and the easiest mistake to make.
- Tried machine learning, reported the truth. Gradient-boosted and logistic-regression models trained on EPA (expected points added) and rest-day features didn't beat the simpler QB-aware rating. The results are shown honestly, not buried.
- Benchmarked against the closing Vegas line (the final odds before kickoff), the sharpest and hardest number to beat.
- Built from scratch. The rating system, the quarterback-value formula, and the calibration step are all hand-coded, with no off-the-shelf prediction library.
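The page doesn't say how its calibration step works. One common hand-rolled option is Platt-style scaling: fit a two-parameter logistic curve that maps the model's raw log-odds to corrected probabilities. The sketch below uses that swapped-in technique, fitted by plain gradient descent on log loss; the function names, learning rate, and step count are illustrative assumptions.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))


def fit_platt(raw_probs, outcomes, lr=0.1, steps=5000):
    """Fit a, b so that sigmoid(a * logit(p) + b) minimizes log loss on past games."""
    xs = [logit(min(max(p, 1e-6), 1 - 1e-6)) for p in raw_probs]
    a, b = 1.0, 0.0                       # start at the identity mapping
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, outcomes):
            err = sigmoid(a * x + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(p: float, a: float, b: float) -> float:
    """Apply the fitted correction to a raw probability."""
    return sigmoid(a * logit(min(max(p, 1e-6), 1 - 1e-6)) + b)


if __name__ == "__main__":
    # Toy overconfident model: its 90% calls win only 7 times in 10.
    raw = [0.9] * 10 + [0.1] * 10
    won = [1] * 7 + [0] * 3 + [0] * 7 + [1] * 3
    a, b = fit_platt(raw, won)
    print(round(calibrate(0.9, a, b), 2), round(calibrate(0.1, a, b), 2))
```

On this toy data the fitted curve pulls the overconfident 90% calls down to roughly 70%. To respect the no-peeking rule, `a` and `b` would be fit on earlier seasons only and then applied to the new season.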
The full source code, the commit history (showing the build step by step), and the methodology write-up are in the project repo.