I don’t know much about sports. I played tennis growing up, I used to “watch” football games in college, and I know that Ty Cobb has the highest career batting average in Major League Baseball (I read it somewhere), but that’s about where my knowledge of sports breaks down. I try not to have a snide attitude about it; I just never found sports very accessible, and it was hard for me to make them feel relevant.
A couple of years ago, my coworkers started a March Madness pool at the office and I was feeling a little left out. I realized after hearing people going on about free throw percentages and win/loss ratios that there is a lot of data out there about sports, and figured that might be my in – could I use data to win the office March Madness bracket contest? Could I put all those sports fanatics to shame using the powers of computer science and mathematics?!
Well, no. I didn’t win. But I did alright, and even better the next year. I decided to carry this attitude over to our NFL survivor pool last year. I came in second place (woohoo!) but something more significant came out of it too – I caught myself making a joke about the Cleveland Browns. This part of our culture was accessible to me now! So now every year I get to tinker with my models and participate in the conversation.
With all that said: football season is upon us, and I have been dusting off my model of the NFL. It’s nothing fancy (or novel), but I wanted to share my results and describe my process. In the rest of this post, I’ll describe how it works, and share my (very naïve) predictions for this season.
The model I’ve built is so unoriginal that it already has a name – an Elo rating system. Elo ratings were originally developed for rating chess players, but are now used in many different domains. Elo is one of my favorite techniques for modeling things like this because of its simplicity. Like every model, this is not a perfect representation of the world. It’s simplified around some assumptions, which vary in their robustness. The assumptions baked into an Elo model are:
- We can use past results to predict future results.
- Ratings are transitive.
- Predictions are probabilistic.
- Unexpected outcomes warrant greater adjustment.
Let’s dig into these ideas in a bit more depth.
Using Past Results to Predict Future Results
“What’s Past is Prologue” – Shakespeare, The Tempest
The first key assumption underlying the Elo rating system is a pretty big one: every competitor can be rated with a single number, and that number is an aggregation of their past performance. This is obviously wrong – history isn’t destiny. But like all models, this is not designed to be perfect or capture all the complexities of the world; it’s designed to simplify things to a representation we can reason about, presumably at the expense of accuracy. It doesn’t allow for a complex model, but its simplicity and straightforwardness make it an understandable and approachable model to begin with. And its results can often be better than more complex models, particularly when you consider what the assumption does for your model.
This assumption makes Elo a bad candidate for some kinds of modeling. For example, I know that an Elo model is likely to be very inaccurate on teams with a lot of turnover year-to-year. In that situation, this year’s team is very different from last year’s team, so prior performance is not super useful. For professional NFL teams, turnover is low enough and franchises tend to attract similar talent often enough that the model holds reasonably well. But if you ask me how my model accounts for the fact that Colin Kaepernick no longer plays for the 49ers, my answer is simple: it doesn’t.
The only thing you need to build an Elo model is a list of past match-ups like this:
|Week||Team 1||Team 2||Winning Team|
In the first week’s matchups, the Unicorns beat the Hedgehogs and the Narwhals beat the Ferrets. Must be something about those pointy horns they have.
If you want to, you can use more specific past results in an Elo model. For example, many models incorporate the point margin (difference between the winning and losing team scores). Because I don’t have much prior knowledge about sportsball, I opted for parsimony over detail. I do account for one extra factor: home field advantage. I decided to include this one because it’s easy to crunch the numbers on and quantify. Historically, the home team beats the visiting team about 58% of the time. If a team is playing on their home field, I add a small amount to their rating for that match, making them a bit more likely to win.
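To give a feel for how a 58% home win rate could turn into a rating bump, here is a minimal sketch. It assumes the conventional chess-style Elo curve (base 10, with a scale of 400 points), which this post does not confirm as the curve behind my numbers, and `home_bonus` is a hypothetical helper name used only for illustration.

```python
import math


def home_bonus(home_win_rate: float, scale: float = 400.0) -> float:
    """Rating points that make two otherwise equal teams split wins at home_win_rate.

    Assumes the conventional Elo curve p = 1 / (1 + 10 ** (-diff / scale)).
    The scale of 400 is an assumption, not a value taken from the post.
    """
    return scale * math.log10(home_win_rate / (1.0 - home_win_rate))


bonus = home_bonus(0.58)  # 58% is the historical home win rate quoted above
print(round(bonus, 1))    # roughly 56 points under these assumptions
```

With a different scale the bonus changes in proportion, so treat the number as a ballpark rather than the exact amount added in the model.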
Ratings are Transitive
“Congratulations. According to the transitive property, you just defeated Muammar Qaddafi in arm-wrestling.” – Jack Donaghy, 30 Rock
The second assumption is also super important – a model like this relies on transitivity, a mathy-sounding word that I guarantee everyone can grasp with an example. Look at the table above and answer this question: In week 3, the Unicorns are scheduled to play the Narwhals; who do you think will win that match-up?
Let’s review what we know: The Unicorns beat the Hedgehogs, and the Hedgehogs beat the Narwhals. If we interpret these facts to mean “The Unicorns are probably better than the Hedgehogs, and the Hedgehogs are probably better than the Narwhals,” then we might make the assumption that “The Unicorns are probably better than the Narwhals” even though we haven’t seen those two teams face off. That’s transitivity. If A>B and B>C, then A>C.
So is transitivity always a good assumption in life? No way! There are obvious weaknesses here, because it doesn’t account for different strengths or weaknesses (for example, having a good offense or a bad defense). Just like our use of past results, this is an assumption that simplifies our world (presumably) at the expense of accuracy. We just have to explore the implications and decide if too much is lost.
Predictions are Probabilistic
“Never tell me the odds!” – Han Solo, Star Wars
You may have noticed that I am throwing the word “probably” around a lot. There’s a lot of uncertainty in the world, you know? When we make predictions, it’s best to make probabilistic predictions. The question “will it rain tomorrow” is rarely answered with a binary “yes” or “no”. Instead, we talk about the likelihood that it will happen (though I live in Seattle, so the answer is almost always “probably yes”).
Extremely unlikely events occur all the time. For example, just a few weeks ago a woman in Massachusetts won the second-largest Powerball jackpot ever, and she was the only winner with that ticket! It was extremely unlikely for her to win, but she did. Rather than saying “you will not win the Powerball this week,” we’re being more truthful if we say “you have a very small (.0000003%, to be precise) chance of winning the Powerball.”
Elo rating systems are no exception: given two ratings (one for each opponent), an Elo model tells you the likelihood that one team will win. If my rating is higher than yours, I will probably win. If my rating is lower, I will probably lose. If our ratings are equal, then it’s a toss-up. There are many different ways to build a model like this; Elo is based around a logistic function that yields an S-shaped curve:
In my NFL model, the median team has a rating of 1500. This curve shows the probability that the median team would beat other teams based on their ratings. As you move to the right, the opponent’s rating goes up, so naturally the probability of beating them goes down. The probability of defeating an opponent could fall anywhere on that curve, but I have placed three teams on the curve for reference: the Cleveland Browns (the worst team in my model), the Baltimore Ravens (the most “average” team in my model), and the New England Patriots (the best team in my model). Use the purple slider to adjust the rating to see how the probability curve changes.
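For readers who want to poke at the curve without the chart, here is a small sketch of a logistic win-probability function. It uses the textbook Elo form (base 10, scale of 400); the scale is an assumption on my part for this example, so the numbers it prints may not match the chart or the percentages quoted elsewhere in this post. The function name `win_probability` is just illustrative.

```python
def win_probability(rating: float, opponent: float, scale: float = 400.0) -> float:
    """Probability that a team rated `rating` beats a team rated `opponent`.

    Textbook Elo logistic curve; the scale of 400 is an assumed value.
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / scale))


# Walk along the curve for the median team (rated 1500 in the post).
for opponent in (1300, 1400, 1500, 1600, 1700):
    print(opponent, round(win_probability(1500, opponent), 2))
```

Equal ratings give exactly 0.5, and the two teams' probabilities always add up to 1, which is what makes the curve S-shaped and symmetric around the toss-up point.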
Unexpected Outcomes Warrant Greater Adjustment
“No one is so brave that he is not disturbed by something unexpected.” – Julius Caesar
Let’s imagine your local weatherperson forecast a 1% chance of rain today. Unfortunately, when you leave your home you find that it is a torrential downpour. That’s not what you expected at all! You might think to yourself – “This meteorologist needs to get her act together and improve her model.”
Let’s instead imagine that she predicted a 40% chance of rain, and you left to find a downpour. You probably wouldn’t think that was quite as bad; she did give it a decent chance, after all.
In general, we should respond to unexpected events by adjusting our understanding of the world. In the first case, the meteorologist may look at her model and think “hmm, perhaps on days like today there is a greater chance of rain than I thought.” The Elo model does the same thing: after every match, ratings get adjusted based on the result. The more likely a result is (the more we “expect” it), the less impact there is on ratings, because that suggests that our model is pretty accurate. A very unlikely result (“unexpected”) will result in a greater adjustment in our model. The formula for this is pretty straightforward: you take some constant and multiply it by the difference between the outcome (1 for a win, 0 for a loss) and the expected outcome (the win probability):
change = k(result – probability)
As you can imagine, the constant is pretty darn important! A larger constant means more substantial changes after each match, and a smaller constant means smaller changes. The constant in my model is 14. I arrived at that value by trial and error, checking historical results with different values until I had minimized the error and variance in my results.
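A trial-and-error search like this can be sketched in a few lines. To be clear about what is assumed: the error measure below (mean squared error between the pre-game forecast and the result, often called a Brier score), the 400-point scale, the candidate range, and the handful of made-up games are all stand-ins for illustration, not the exact procedure or data behind the value of 14.

```python
def win_probability(rating, opponent, scale=400.0):
    # Assumed textbook Elo curve; the scale is not given in the post.
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / scale))


def replay_error(games, k, start=1500.0):
    """Replay (team_a, team_b, winner) games in order and return the mean
    squared error between each pre-game forecast and the actual result."""
    ratings = {}
    total = 0.0
    for team_a, team_b, winner in games:
        rating_a = ratings.setdefault(team_a, start)
        rating_b = ratings.setdefault(team_b, start)
        p_a = win_probability(rating_a, rating_b)
        result_a = 1.0 if winner == team_a else 0.0
        total += (result_a - p_a) ** 2
        change = k * (result_a - p_a)
        ratings[team_a] = rating_a + change
        ratings[team_b] = rating_b - change
    return total / len(games)


# Made-up results, purely for illustration.
games = [
    ("Unicorns", "Hedgehogs", "Unicorns"),
    ("Narwhals", "Ferrets", "Narwhals"),
    ("Hedgehogs", "Narwhals", "Hedgehogs"),
    ("Unicorns", "Ferrets", "Unicorns"),
    ("Unicorns", "Narwhals", "Unicorns"),
    ("Hedgehogs", "Ferrets", "Ferrets"),
]

candidates = range(2, 41, 2)
best_k = min(candidates, key=lambda k: replay_error(games, k))
print(best_k, round(replay_error(games, best_k), 4))
```

On a toy schedule this small the "best" constant is not meaningful; the point is only the shape of the loop: pick a constant, replay history, score the forecasts, repeat.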
Summarized at each 10% increment, here are the rating changes that come out of my model:
|Likelihood of Win||Change in rating if win||Change in rating if lose|
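The rows of this table follow directly from the formula and the constant of 14, so they can be regenerated with a short sketch. The function and variable names are mine, chosen for illustration; `result` is 1 for a win and 0 for a loss.

```python
K = 14  # the constant reported in the post


def rating_change(result: int, probability: float, k: float = K) -> float:
    """change = k * (result - probability)."""
    return k * (result - probability)


rows = []
for pct in range(10, 100, 10):
    p = pct / 100
    if_win = round(rating_change(1, p), 2)
    if_lose = round(rating_change(0, p), 2)
    rows.append((pct, if_win, if_lose))

for pct, if_win, if_lose in rows:
    print(f"{pct}% | {if_win:+.2f} | {if_lose:+.2f}")
```

Notice that the two columns always differ by exactly 14: a heavy favorite gains little for winning and loses a lot for an upset, and the reverse holds for an underdog.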
To take an example that ties it together, let’s suppose the Browns are playing the Patriots on equal footing (i.e., both teams are away from home). As of the pre-season, my model rates the Browns at 1390, and the Patriots at 1623. This means the Patriots have a 76% chance of winning that game. The rating changes that result reflect this difference:
|Result||Change in Patriots Rating||Change in Browns Rating|
|Patriots Win (76% chance)||+3.36||-3.36|
|Browns Win (24% chance)||-10.64||+10.64|
This self-correcting property means that, as long as the constant is well calibrated, a team that starts out over- or under-rated will drift toward a more accurate rating over the course of the season.
Pulling Everything Together
Now we have all the pieces we need: historical results (I used data from 1996-2016), a win probability formula, and the rating adjustment formula from the previous section. To build the model, we simply follow this process:
1. Set every team’s rating to the same starting point (I used 1500)
2. For each game, compute the probability of each team’s victory
3. Use the actual result of that game to calculate the rating adjustment amount
4. Apply that adjustment to each team’s rating
5. If there are more games, go back to step 2!
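The steps above can be sketched as a short loop. This is a simplified illustration rather than my actual code: it assumes the textbook Elo curve with a 400-point scale, leaves out home field advantage, and runs on the three imaginary results mentioned earlier in this post instead of real NFL data. The names `build_ratings` and `win_probability` are hypothetical.

```python
from collections import defaultdict

START = 1500.0  # starting rating from the post
K = 14.0        # adjustment constant from the post


def win_probability(rating, opponent, scale=400.0):
    # Assumed textbook Elo curve; the scale is not given in the post.
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / scale))


def build_ratings(games, k=K, start=START):
    """Replay (team_a, team_b, winner) tuples in chronological order."""
    ratings = defaultdict(lambda: start)          # step 1: same starting point
    for team_a, team_b, winner in games:          # step 5: repeat for each game
        p_a = win_probability(ratings[team_a], ratings[team_b])  # step 2
        result_a = 1.0 if winner == team_a else 0.0
        change = k * (result_a - p_a)             # step 3: adjustment amount
        ratings[team_a] += change                 # step 4: winner's gain...
        ratings[team_b] -= change                 # ...is the loser's loss
    return dict(ratings)


games = [
    ("Unicorns", "Hedgehogs", "Unicorns"),
    ("Narwhals", "Ferrets", "Narwhals"),
    ("Hedgehogs", "Narwhals", "Hedgehogs"),
]
ratings = build_ratings(games)
for team, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{team:10s} {rating:7.2f}")
```

Because every adjustment is added to one team and subtracted from the other, the average rating stays at the starting value no matter how many games are replayed.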
Building the model from these 5,535 games produced a total of 11,102 ratings. (You can see a plot of every team’s Elo rating from 1996 to present, though it just looks like a mess of squiggly lines). We use the final ratings from the 2016 season as a starting point for this season and compute the likely result of each game.
Predictions for Week One
Here are my predictions for the first week of football! You can use the dropdown to explore later weeks, but keep in mind that those forecasts will be adjusted as we gain information from the preceding weeks.
I have a few posts in the queue that dig into this in a bit more detail. Here’s a sneak peek:
- How do I know if it works? A piece explaining how I will judge the effectiveness of this model, and what makes a “good” model in general.
- How do I make picks in a survivor-style pool? The mechanics are different here than in most pools in that you can only choose a team once.
- Who’s going to the Super Bowl? A piece describing how I use this model to make predictions (of dubious quality) about what teams might make it to the Super Bowl this year.
A Note of Thanks
A big thank you to Pro Football Reference, a site that graciously offers free access to a huge amount of historical data on NFL games. Access to clean and accurate data makes any analysis effort much easier, and when it’s free it’s all that much sweeter.