Last Sunday brought the largest sporting event in the United States, the pinnacle of the 2017 NFL season, and the odds are high that you watched it. NBC's telecast of the Super Bowl drew 103.4 million viewers, down about 7% from last year and its lowest mark since 2009. Even so, 103.4 million viewers is almost a third of the entire population of the USA, a staggering number in its own right.
The Philadelphia Eagles eked out a hard-fought victory against the New England Patriots, winning by a full eight points, 41-33. That 8-point margin was actually the largest in any Super Bowl the Patriots have played since 2002, the first time Tom Brady appeared in one. It was also the Patriots' largest Super Bowl defeat; their two previous losses, both to the New York Giants, came by only 3 and 4 points.
The Eagles, led by backup quarterback Nick Foles, were billed as the underdogs in this matchup, a mantra they upheld with glee.
Outside of Philly, though, nobody really expected them to have a chance. Expert panel picks at both ESPN and SBNation heavily favored the Patriots, with 72% and 75% of the members, respectively, picking them to repeat as champions. The computers anticipated a somewhat closer game. ESPN's Stats and Information gave the Patriots a 52% win probability before kickoff, and FiveThirtyEight gave them a larger 58% chance of taking it all, but only by an estimated 2.5 points. The OddsShark computer, on the other hand, favored the Patriots by 4.5 points. The members of ESPN's expert panel also expected a close game, within one score.
After watching both teams throughout the season, it seemed to me that although they both finished the regular season with the same 13-3 record, the Eagles slightly outplayed the Patriots, or at least played just as well. According to pro-football-reference, the Eagles finished 4th in team defense to the Patriots' 5th-place finish. Using total points as an indicator of team offense on nfl.com, the Patriots finished second and the Eagles third. This indicated two somewhat evenly matched teams. I decided to dig into the numbers and try to predict just how much of an underdog the Eagles really were.
To win a football game, a team needs to score more points than it allows. The more a team wins by, the better it can be assumed to be: a team that wins by a lot is less likely to be relying on luck to squeak out tight games, and can then be seen as a good team with a sustainable winning culture. By looking at the distributions of points scored and points allowed by the Patriots and Eagles over their 18 games prior to the Super Bowl, we can empirically estimate whether one team is better than the other.
I will approach this from two different, but related, directions. The first method is to simulate the Super Bowl 100,000 times by sampling from the points scored and allowed by both the Patriots and the Eagles, and then determining a game score by averaging one team's sampled points scored with the other's sampled points allowed. This type of sampling is called a Monte Carlo simulation and requires fitting a probability distribution to the data. The second method is to calculate the expected winning percentage for both the Patriots and Eagles, using points scored and points allowed, and then perform a Bayesian hypothesis test to determine the probability that one team's true expected winning percentage is higher than the other's. This can be interpreted as the probability that one team is truly better than the other, and consequently, the probability that one team would win a head-to-head matchup, also known in this case as Super Bowl LII.
I gathered data on week-by-week games from pro-football-reference. The raw data can be found on their website or downloaded from my GitHub. I did some basic munging to determine the points scored by and allowed against every team in every game. Here are the results for the Patriots and the Eagles:
Some quick observations: on average, the Patriots scored more points than the Eagles, and the Eagles allowed fewer points than the Patriots. The medians, however, tell the opposite story: the Eagles scored more points than the Patriots, while the Patriots allowed fewer points than the Eagles! The disagreement arises because the variance in this data is quite high, probably due to the relatively few observations (18 in each set).
Part 1: Super Bowl Monte Carlo Simulation
As stated above, we need to fit a distribution to each of the four histograms plotted above to allow for random sampling. If you have been on this site before, you know that this is not my first simulation. I also simulated the Super Bowl last year, as well as the most recent World Series. The probability distribution that I sampled from in those cases was the Negative Binomial Distribution, which is useful for modeling discrete count data, like points or runs in sports. Additionally, unlike the Poisson distribution, the Negative Binomial allows the variance to exceed the mean, a common scenario in sports scores. Finally, the Negative Binomial has been shown to accurately describe scores in baseball, soccer, rugby, and college football.
Unlike my previous simulations, where I determined the parameters of the negative binomial fit myself, usually resulting in a less than optimal fit, I am going to have R do it for me via Maximum Likelihood, using the fitdistr function from the MASS library. The graphs below show the histograms from above with the optimized negative binomial fit overlaid. The smoothing is less than ideal, but the shape of the fit can still be seen.
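The original fit uses R's MASS::fitdistr; as a rough illustration of the same maximum-likelihood idea, here is a Python sketch. The data below are synthetic placeholders, not the actual 2017 game scores:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Synthetic stand-in for one team's 18 per-game point totals (not the real data)
points = rng.negative_binomial(20, 0.45, size=18)

def fit_nbinom(x):
    """Maximum-likelihood fit of a negative binomial (size r, prob p),
    analogous to MASS::fitdistr(x, "negative binomial") in R."""
    def nll(params):
        r, p = params
        return -stats.nbinom.logpmf(x, r, p).sum()
    m = x.mean()
    # Simple, robust starting values: p = 0.5 implies r = mean
    res = minimize(nll, x0=[max(m, 1.0), 0.5],
                   bounds=[(1e-6, None), (1e-6, 1 - 1e-6)])
    return res.x  # fitted (r, p)

r_hat, p_hat = fit_nbinom(points)
print(f"fitted r = {r_hat:.2f}, p = {p_hat:.3f}, "
      f"implied mean = {r_hat * (1 - p_hat) / p_hat:.1f}")
```

At the maximum-likelihood solution the implied mean, r(1-p)/p, should land very close to the sample mean of the data.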
As we can see, the fit is fairly decent. Now we simply sample from each distribution once, determine a game score, and repeat 100,000 times! Luckily, this only takes 10 seconds or so in R.
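The simulation itself was run in R; a minimal Python sketch of the same procedure looks like this, with hypothetical negative binomial parameters standing in for the fitted values:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# (r, p) parameters for each fitted distribution -- hypothetical placeholders,
# not the values actually fitted to the 2017 data
pats_offense, pats_defense = (20, 0.46), (18, 0.47)
eagles_offense, eagles_defense = (21, 0.46), (19, 0.48)

def draw(params, n):
    r, p = params
    return rng.negative_binomial(r, p, size=n)

# A team's simulated score averages its offense sample with the
# opponent's defense (points allowed) sample
pats_score = (draw(pats_offense, N) + draw(eagles_defense, N)) / 2
eagles_score = (draw(eagles_offense, N) + draw(pats_defense, N)) / 2
diff = eagles_score - pats_score

p_eagles_wins = np.mean(diff > 0)
print(f"Eagles win {p_eagles_wins:.1%} of simulations")
```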
Team 1 is the Patriots, Team 2 is the Eagles, and Diff is the difference in game scores between the Eagles and the Patriots. Looking at the means, we can see that the Eagles outscored the Patriots by over 1.2 points on average. We can also see that the means and medians are quite close. Because each simulated score averages two independent draws, the variance of these samples is lower than in the original data, and by the Central Limit Theorem the distribution of simulated scores should be approximately normal.
Having 100,000 simulated Super Bowls can do more than predict a winner; we can also look at the likelihood of various prop bets! For instance, the probability that the Super Bowl would hit the over (48 or more combined points) comes out to 46.42%, because in 46,000+ of the simulations the combined score totaled 48 points or more. Furthermore, the probability of a Super Bowl with 74 or more combined points, the actual total from this year, was only a paltry 0.81%! That is because this simulation expected the 4th- and 5th-ranked defenses to actually show up to the big game. The probability of the game ending within one score, or 8 points, was 62.8%, which makes sense for two fairly evenly matched teams.
The probability of the game ending within 3 points, or one field goal, is still a relatively high 24.47%. The simulation thinks a close game is likely, which is indeed what we got on Sunday. Finally, the probability that the Eagles would win by one score or less, given that they won at all, was a relatively high 59.87%. In other words, if the Eagles won, they were most likely to win by 8 points or fewer.
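Given arrays of simulated scores, these prop-bet probabilities are one-line calculations. Here is a self-contained sketch, using normal stand-ins with assumed means and spreads rather than the actual fitted samples:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
# Normal stand-ins for the simulated game scores (assumed mean/sd, illustrative only)
eagles = rng.normal(23.9, 6.9, N)
patriots = rng.normal(22.7, 6.9, N)

total = eagles + patriots
diff = eagles - patriots

p_over = np.mean(total >= 48)              # combined score hits 48+
p_one_score = np.mean(np.abs(diff) <= 8)   # game decided by one score
eagles_win = diff > 0
p_close_given_win = np.mean(eagles_win & (diff <= 8)) / np.mean(eagles_win)

print(f"P(48+ points) = {p_over:.2%}, P(within 8) = {p_one_score:.2%}, "
      f"P(win by <= 8 | Eagles win) = {p_close_given_win:.2%}")
```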
At last, I calculated the winning percentages. The Eagles, to the surprise perhaps of those outside of Philly, won 54.5% of the simulations, to the Patriots' 45.5%. This would seem to give the Eagles the edge over New England. Here is the histogram of the difference in scores, with the normal approximation from the CLT overlaid:
By integrating under the normal curve, we can double-check those probabilities. Integrating from 0 to positive infinity gives the probability that the difference in scores is positive, i.e., that the Eagles win. This comes out to 0.549, or 54.9%, leaving the Patriots a 45.1% win probability. These figures align with the win percentages from the simulation.
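That integration is a one-liner with a normal survival function. The mean difference of 1.2 comes from the simulation; the standard deviation of 9.75 is an assumed value, chosen here so the sketch lands on the reported 54.9%:

```python
from scipy.stats import norm

# Mean score difference (Eagles minus Patriots) was about 1.2 in the simulation;
# the standard deviation (9.75) is an assumed value for illustration.
mean_diff, sd_diff = 1.2, 9.75

# P(diff > 0): integral of the normal density from 0 to infinity
p_eagles = norm.sf(0, loc=mean_diff, scale=sd_diff)
print(f"Eagles win probability: {p_eagles:.1%}")  # ~54.9%
```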
Finally, here is a graphic depicting the normal curve approximations from the Central Limit Theorem of the simulated scores achieved by both the Eagles and Patriots. While they do overlap, it is apparent that the Eagles have the slight edge, as reflected in their simulated win probability.
Part 2: Pythagorean Expectation
The expected win percentage of a team can be calculated as a function of its points scored and points allowed. While this formula is well known within the baseball and sabermetric community, its best-known adaptation for football comes from Football Outsiders, where it is called the Pythagorean Projection. Its value as a predictor has been well documented.
The 2011 edition of Football Outsiders Almanac states,
"From 1988 through 2004, 11 of 16 Super Bowls were won by the team that led the NFL in Pythagorean wins, while only seven were won by the team with the most actual victories. Super Bowl champions that led the league in Pythagorean wins but not actual wins include the 2004 Patriots, 2000 Ravens, 1999 Rams and 1997 Broncos."
"the Pythagorean projection is also a valuable predictor of year-to-year improvement. Teams that win a minimum of one full game more than their Pythagorean projection tend to regress the following year; teams that win a minimum of one full game less than their Pythagorean projection tend to improve the following year, particularly if they were at or above .500 despite their underachieving. For example, the 2008 New Orleans Saints went 8-8 despite 9.5 Pythagorean wins, hinting at the improvement that came with the next year's championship season."
The formula is surprisingly simple:
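Football Outsiders' version raises points scored (PF) and points allowed (PA) to the power 2.37: xWP = PF^2.37 / (PF^2.37 + PA^2.37). A quick check in Python, using the two teams' 18-game point totals from pro-football-reference (517 scored and 330 allowed for the Patriots; 510 and 312 for the Eagles):

```python
def pythag_xwp(pf, pa, exponent=2.37):
    """Football Outsiders' Pythagorean projection:
    xWP = PF^2.37 / (PF^2.37 + PA^2.37)."""
    return pf ** exponent / (pf ** exponent + pa ** exponent)

# 18-game totals (regular season + playoffs), per pro-football-reference
print(f"Patriots xWP: {pythag_xwp(517, 330):.3f}")  # ~.743
print(f"Eagles   xWP: {pythag_xwp(510, 312):.3f}")  # ~.762
```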
This gives the Expected Winning Percentage (xWP) for a team. Expected wins can then be calculated by multiplying this number by an arbitrary number of games. For example, both the Patriots and the Eagles went 15-3 over their 18 games on the way to the Super Bowl. However, their xWPs differ. The Patriots had an xWP of .743, good for an expected record of 13.4-4.6 (13-5 if you want to round) over those 18 games. The Eagles had an xWP of .762, good for an expected record of 13.7-4.3 (14-4), almost half a game better than the Patriots.
However, these xWPs are merely point estimates. We can get a more robust analysis by performing a Bayesian hypothesis test. I will do a post going more in-depth on that methodology in the future, so for now I will just cover the basics. If you want to learn more, I highly recommend the series on Bayesian statistical methods by David Robinson of Variance Explained; here is a link to his post on Bayesian hypothesis testing.
The xWPs calculated here for the Eagles and Patriots are based only on the results of one season. If a team could replay this season hundreds of thousands of times, we would expect its xWP to vary from run to run. The one true xWP is unknowable; however, we can approximate it using what we know about xWP across the NFL and what we observed from the Patriots and Eagles this season.
Essentially, we want to update our prior beliefs with data to arrive at a new worldview. Our data are the expected records over 18 trials: 13.4 wins and 4.6 losses for the Patriots, and 13.7-4.3 for the Eagles. We still need a prior belief, though. That prior can come from the league-wide population of xWPs for every team in the NFL in 2017, displayed below with its Beta approximation.
We can see that in 2017, xWPs ranged from about .2 to .76. This distribution can be approximated by a Beta distribution. Yes, the Browns hold the lowest xWP, but based on Pythagorean Expectation they still should have won 20% (about 3) of their games.
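One common way to fit that Beta prior is by the method of moments, matching the sample mean and variance of the league-wide xWPs. A Python sketch follows; the 32 xWP values here are synthetic stand-ins, not the real 2017 numbers:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic stand-in for the 32 league-wide xWP values (illustrative only)
league_xwp = rng.beta(5, 5, size=32)

m = league_xwp.mean()
v = league_xwp.var(ddof=1)

# Method-of-moments Beta fit: choose (alpha, beta) to match mean and variance
common = m * (1 - m) / v - 1
alpha0, beta0 = m * common, (1 - m) * common
print(f"prior ~ Beta({alpha0:.1f}, {beta0:.1f}), mean = {alpha0 / (alpha0 + beta0):.3f}")
```

By construction, the fitted Beta's mean, alpha/(alpha+beta), equals the sample mean exactly.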
The mean of the distribution is .49, which makes sense: on average, we would expect an arbitrary NFL team to have a .500 record, or 8-8. Some teams will do better and some will do worse, but on average this cancels out. Combining this prior belief with the observed 2017 xWPs of the Patriots and Eagles, we can calculate a posterior distribution, or updated approximation, of each team's true xWP. This posterior is also a Beta distribution and can be seen in the next chart:
There is a 95% probability that the true xWP for the Patriots is somewhere between .502 and .792. Likewise, there is a 95% probability that the true xWP for the Eagles is between .512 and .802. The point estimates are .65 and .67 respectively, lower than what we saw in the 2017 NFL season. These point estimates are lower because we only observed 18 games, and we went in with a prior belief that an arbitrary NFL team would have an xWP of around .500. The data we saw shifted our belief; however, the prior restrained us from assuming that a 13-3 season is the norm for these two teams.
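With a Beta prior, the update is conjugate: the posterior is simply Beta(alpha0 + expected wins, beta0 + expected losses). The sketch below uses an assumed prior of roughly Beta(5.2, 5.4), about 10.6 pseudo-games of prior strength, chosen for illustration rather than taken from the fitted league prior:

```python
from scipy.stats import beta

# Assumed prior (illustrative, not the fitted 2017 league prior)
alpha0, beta0 = 5.2, 5.4

# Conjugate Beta-Binomial update with the Pythagorean expected records (18 games)
pats_post = beta(alpha0 + 13.4, beta0 + 4.6)
eagles_post = beta(alpha0 + 13.7, beta0 + 4.3)

for name, post in [("Patriots", pats_post), ("Eagles", eagles_post)]:
    lo, hi = post.interval(0.95)
    print(f"{name}: mean {post.mean():.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

Note how the posterior means land well below the raw .743 and .762 observed in 2017, pulled toward the league-average prior.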
If this season were repeated thousands of times, it is more likely that the Patriots and Eagles would go 10-6 or 11-5. If more data showed the Patriots or Eagles going 13-3 consistently, we could recalculate our posterior distribution to take that into account. But for now, this estimate is all we have.
Knowing these two posterior distributions, how do we know which team really has the higher true xWP? By randomly sampling of course! Sampling 100,000 times from each distribution and comparing the results should allow us to determine the probability that one team's true xWP is higher than the other's. This probability can then be used as the probability of that team winning a head-to-head matchup, or even the Super Bowl.
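The comparison itself takes only a few lines. The posterior parameters below are illustrative (an assumed prior plus the expected records), not the exact fitted values:

```python
import numpy as np

rng = np.random.default_rng(11)
N = 100_000

# Posterior parameters = assumed prior + expected wins/losses (illustrative values)
pats = rng.beta(5.2 + 13.4, 5.4 + 4.6, size=N)
eagles = rng.beta(5.2 + 13.7, 5.4 + 4.3, size=N)

# Fraction of draws in which the Eagles' true xWP exceeds the Patriots'
p_eagles_better = np.mean(eagles > pats)
print(f"P(Eagles' true xWP > Patriots'): {p_eagles_better:.3f}")
```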
The results? The probability that the Eagles have the higher true xWP came out to .547, while the Patriots' probability was .453. These are nearly identical to the probabilities calculated above from our simulation sampling! Spooky.
While both of these analyses depended on points scored and allowed as their inputs, so some relationship could be assumed, I honestly did not expect the probabilities to be so close, especially since they were approximated using different distributions and methodologies. Since they agree, though, it seems the probability that the Eagles were truly better than the Patriots in 2017 was about .55 to .45. While this leaves some leeway, is in no way statistically significant, and is only slightly better than a coin flip, the Eagles did have the edge over the Patriots in the one thing required to win football games: scoring more points than they allow.
The most probable scenario, according to the simulation and analysis, was the Eagles winning the Super Bowl by one score (8 points) or less, which is exactly what happened. Whether the game would hit the over was essentially a coin flip. My point estimate would have been a 24-22 victory for the Eagles, the median result from my simulation, with the Eagles' chances of winning at 55% to the Patriots' 45%. The closest published number I could find was ESPN's pregame 52%-48% in favor of New England.
While most people expected a close game or a coin-flip outcome, most gave the edge to the Patriots. I hope this article shows why, in my opinion, the Eagles deserved a slight advantage. The numbers don't lie, and sometimes the probabilities predict the truth.
What do you think? Were the Eagles snubbed in pre-Super Bowl odds and predictions? Let me know in the comments below! If you're interested in the code or data, all of it can be found on my GitHub.
The SaberSmart Team
P.S. For making it this far, here's one last look at Nick Foles' ridiculous TD grab. What a play!
P.P.S. As a Cowboys fan, I hate both of these teams and am truly the most impartial party to relay this analysis.