Let us go back to a simpler time, when the Astros had yet to win a World Series, the Las Vegas Golden Knights were preparing for the expansion draft, and the hype for The Fate of the Furious was taking over the internet. I am, of course, talking about March 2017.
In my last few posts in this series, I discussed my predictions and conclusions for 2018 MLB end-of-season win totals and playoff odds. However, I decided to step back in this article and examine how this methodology works in greater detail, as well as how it applies to a season where we already know the results, the 2017 season.
Additionally, I finally got access to an API that provides day-by-day standings with only a couple lines of code, instead of the manual downloading I was doing in my last post. In case you were wondering why I only looked at April 24 and July 16, that's why. I highly recommend checking out XMLSTATS for all of your MLB and NBA standings and box score needs!
One way to think about an MLB season is as an observed experiment, a series of coin flips if you will. Take the Dodgers, for instance. The 2017 Dodgers had an inherent, true talent level that can be defined as their true winning percentage. Every game they played, then, can be seen as a coin flip, with the probability of winning each game equal to that true winning percentage. If their true winning percentage was .500, their season can be equated to flipping a coin 162 times and counting how many heads (wins) came up.
As with any random experiment, there is luck, or randomness, involved. If you were to repeat an experiment of flipping a coin 162 times over and over, you would not expect to see exactly 81 heads and 81 tails after every single round, even though the true probability is .500.
The Bayesian way of thinking about the world, including a fair coin, is to judge everything by the data that has been gathered. To take the coin example again, let us assume we know nothing about the true probability of a flip. After one flip, we see heads. A Bayesian would think, "great, 100% of this coin's flips land heads. Better flip it again, though, just to be sure." The second flip comes up heads again, but the third lands tails. The Bayesian would now believe tails show up on one of every three flips. Eventually, after thousands of flips, the Bayesian's estimate would approach the true probability of 50-50.
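You can watch this settling-down happen in a few lines. This is a minimal sketch of the idea, not anything from the post's actual code; the variable names are mine:

```python
import random

random.seed(42)  # arbitrary seed so the run is repeatable

# Simulate a fair coin. After each flip, the running proportion of heads
# is our best frequency estimate so far: noisy early, stable eventually.
flips = [random.random() < 0.5 for _ in range(100_000)]

after_3 = sum(flips[:3]) / 3          # wildly noisy early estimate
after_all = sum(flips) / len(flips)   # settles very close to 0.5
```

After three flips the estimate can only be 0, 1/3, 2/3, or 1; after a hundred thousand it sits within a fraction of a percent of the truth.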
Great, so an MLB season for a single team can be seen as one experiment of flipping a coin 162 times. But how do we turn that into a probability distribution that we can update after each flip of the coin? Enter the Beta and Binomial distributions. All you need to know here is that the Beta distribution takes two parameters, alpha and beta, and is essentially a probability distribution over possible probabilities between 0 and 1. Since the Beta distribution is the conjugate prior of the Binomial, the two behave nicely together, in a mathematical sense.
For example, let our prior belief, our belief before any data is collected or any coins are flipped, be modeled by the distribution Beta(1,1). This is a common prior, as it gives every probability an equal likelihood of occurring. We can update our prior based on the data collected by creating a new distribution, Beta(1 + successes, 1 + failures). This works because the series of coin flips is modeled by the Binomial distribution, which, as said before, pairs neatly with the Beta distribution. I try not to delve too far into the mathematical weeds on this site, so I'll leave the proof as an exercise to the reader (or just click this link).
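The whole update really is just addition. Here's a tiny sketch (the function name is mine, not from the post's code):

```python
# Conjugate Beta-Binomial update: fold observed wins (successes) and
# losses (failures) into the prior's parameters by simple addition.
def update_beta(alpha, beta, wins, losses):
    """Return the posterior Beta parameters after observing wins/losses."""
    return alpha + wins, beta + losses

# Uninformed Beta(1,1) prior, then a 61-29 record observed:
alpha, beta = update_beta(1, 1, 61, 29)   # -> (62, 30)
```

That is the entire machinery behind every posterior in this post; only the starting alpha and beta change.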
So where does this leave us with the 2017 MLB Dodgers season? Well, let us say that on April 1, 2017, we assume that the Dodgers true winning percentage is modelled by Beta(1,1):
The Dodgers won their first game of the season, on April 3, 2017. We can then update our prior based on this information to model their true winning percentage by Beta(1+1, 1+0), or Beta(2,1).
This still doesn't tell us very much, due to how little information we have acquired. We can loosely say that it is more likely that the Dodgers have a true wp% above .500, but almost every possible probability remains plausible.
However, by the All-Star Break, the Dodgers have played 90 games, and won 61 of them. Their true wp% can be modeled by Beta(1+61, 1+29), or Beta(62,30).
Hey, what do you know! That looks like a probability distribution. Unfortunately, since we have only observed 90 trials, the distribution is quite wide. The likely true wp% of the Dodgers is somewhere between 0.550 and 0.800, with a point estimate, in this case the mean, of about 0.674. The Dodgers would actually finish the season on 10/01/2017 with a 0.642 wp% and a record of 104-58. The posterior distribution for this would then be Beta(105,59):
You may notice that while the distribution has narrowed, with the mean at around 0.64, the range of actual true wp% for the 2017 Dodgers could still be anywhere from 0.540 to 0.740. This is because we only ran one experiment, with 162 trials. If the Dodgers were to play this season again, we would expect them to finish with a wp% between .540 and .740, but not necessarily exactly 0.642 again, due to inherent randomness and luck.
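The ranges in this post are read off the plots by eye, but the exact numbers fall straight out of the posterior. A short sketch, assuming SciPy is available:

```python
from scipy.stats import beta

# Posterior on the Dodgers' true wp% after their 104-58 season,
# starting from the uninformed Beta(1, 1) prior:
final = beta(105, 59)

mean = final.mean()                  # point estimate, ~0.640
lo95, hi95 = final.interval(0.95)    # central 95% credible interval
lo99, hi99 = final.interval(0.99)    # roughly the 0.54-0.74 spread above
```

The wider the interval you demand, the closer it gets to that eyeballed 0.540-0.740 range; the 95% interval is noticeably tighter.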
Here is a sweet gif of the Dodgers 2017 season, with an updated posterior distribution after each day, starting with the uninformed prior, Beta(1,1).
While this is all well and good, an uninformed prior has a huge drawback when dealing with small samples, usually less than a few hundred observations, or, more commonly in the era of big data, less than a thousand. As you may have guessed, the drawback is that it takes a large amount of data for any semblance of a narrow distribution to form. With enough data a narrow distribution would eventually emerge, but a single baseball season of 162 trials is not enough to provide a useful estimate when starting from an uninformed prior.
Creating an informed prior is hard work, and in some cases is hardly more than a guess. Luckily for us, there have been thousands of single team seasons across the 100+ year history of the MLB. This means that we can use historical data to create an informed prior.
I looked at 2016 MLB single-team winning percentages. While I could have gone back farther and used more data, I did not want to overcomplicate things, or dilute the impact of the recent shift (lol) in game strategy seen in the modern game. In 2016, the average winning percentage was 0.500, to the shock and surprise of most of you, I am sure, with a standard deviation of 0.065. Using the handy-dandy 68-95-99.7 rule, we can estimate that 95% of teams had a wp% between roughly 0.37 and 0.63. In fact, all 30 teams had a wp% in that range, with the Twins at 0.364 and the Cubs at 0.640 being the only two on the cusp due to rounding.
So how do we turn that 2016 data into a Beta distribution that we can use for our prior? Well, this math gets a little complicated, and I discussed it already in a previous post. Essentially, we find our regression-to-the-mean constant and multiply it by our observed mean wp% to get alpha, and multiply it by 1 - mean wp% to get beta. The full math and proofs behind these couple of oversimplified sentences can be found on Tom’s blog, Phil’s blog, 3-D baseball, and FiveThirtyEight.
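The full derivation lives at those links, but the arithmetic itself is short. This is a sketch of one common (Tango-style) way to get the constant, under the assumption that observed variance splits into true-talent variance plus binomial luck variance; the function name is mine:

```python
# Regression-to-the-mean constant: observed variance in team wp% is
# true-talent variance plus binomial "luck" variance. The constant is
# the number of league-average games whose luck variance matches the
# remaining true-talent spread.
def regression_constant(mean_wp, sd_wp, n_games=162):
    var_obs = sd_wp ** 2
    var_luck = mean_wp * (1 - mean_wp) / n_games   # binomial noise in wp%
    var_true = var_obs - var_luck                  # talent spread alone
    return n_games * var_luck / var_true

c = regression_constant(0.500, 0.065)   # ~93 for the 2016 numbers above
alpha = beta = c * 0.500                # ~46.5 each, as used below
```

Plugging in the 2016 mean (.500) and standard deviation (0.065) reproduces the constant of about 93 that the post uses.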
After crunching the numbers, the 2016 regression-to-the-mean constant is 93, and since our mean wp% is .500, our alpha = beta = 93 * 0.5 = 46.5. While the proofs are stupid complicated, the results are pleasantly straightforward. Here, then, is our new and improved, informed prior, based on 2016 data, Beta(46.5, 46.5):
We then do exactly as we did before. Since the Dodgers won their first game of the season, their Beta wp% estimate on 4/3/2017 would be modeled by Beta(47.5, 46.5):
As you can probably tell, the distribution hardly changed. Our prior expectations dominate in the face of a lack of gathered data. By the All-Star break, their distribution is now:
This is much narrower than at the same point in the season with the uninformed prior. This is useful for teams, because it shows with more confidence that the true 2017 wp% for the Dodgers is between 0.520 and 0.680, probably enough to make the playoffs, in which case they should buy before the trade deadline.
By the end of the season, the resulting distribution for the Dodgers is:
Again, the distribution has narrowed tremendously, creating a smaller confidence interval for the true 2017 wp%. Interestingly, you may have noticed that the mean is centered around 0.590ish, which is lower than their observed wp% of 0.642. This is due to the relatively small sample size collected. If the Dodgers were to play the season again, it would be more likely that they finished with a wp% closer to 0.590. The difference in the observed wp% can be seen as random fluctuation, or if you prefer, the Dodgers were just slightly lucky. This can be seen in their Pythagorean record as well, which suggests that their wp% for 2017 should have been around 0.629 anyway.
This distribution also makes more sense than the posterior based on the uninformed prior in a historical context. With the final distribution from the uninformed prior, the right tail stretched all the way to 0.740, an essentially unheard-of mark. In the past twenty years, only two teams have had a wp% even over 0.700: the 2001 Mariners, who won 116 games for a 0.716 wp%, and the 1998 Yankees, who won 114 games for a 0.704 wp%. Because we used an informed prior, the probability that the true wp% of the 2017 Dodgers was greater than 0.700 is essentially 0 under the above posterior distribution.
Here is the gif from the full season, starting with the informed prior:
However, starting with an informed prior has drawbacks as well when dealing with hyper-small sample sizes, less than 100. Since every team starts with the same prior distribution, differentiating from each other at the beginning of the experiment becomes impossible. Since calculating playoff odds for a given day relies upon sampling from each team’s posterior distribution for that day, when all or most of the distributions are essentially the same, as they likely are in the first half of the season, any conclusions would have to be taken with a heavy dose of salt.
This can be understood intuitively as well. Since the informed prior is an extension of regression toward the mean, it is foolish to think that in 2017 every team would regress toward the 2016 mean of .500. While that is the case for teams like the Rangers and Cardinals, there was no way the Dodgers were going to regress toward .500 as the 2017 season went on, and the same goes for the Padres. To fix this issue, I developed what I am calling a personalized prior.
The personalized prior is simply that: a Beta distribution, tailored individually for each team, to be used as a prior. I kept the same standard deviation seen in the 2016 data, since that should not be expected to change, but for each team's center I used an average of its preseason over/under win totals from four oddsmakers, as well as Fangraphs' preseason record forecast. Averaging these sources together for each team takes advantage of a statistical phenomenon called the wisdom of the crowds.
“A large group's aggregated answers to questions involving quantity estimation... has generally been found to be as good as, but often superior to, the answer given by any of the individuals within the group. An explanation for this phenomenon is that there is noise associated with each individual judgment, and taking the average over a large number of responses will go some way toward canceling the effect of this noise.”
I then used that winning percentage instead of the 2016 average of 0.500. For example, before the season started, the four bookies and Fangraphs estimated the Dodgers to win 92.5, 91.5, 94.5, 94.5, and 96.6 games, for an average of 93.9. This translates to an average wp% of 0.580, quite a bit higher than our 2016 mean of 0.500. Using that value then, the regression to the mean constant for the Dodgers changes to 89.4, and consequently the prior Beta becomes Beta(51.8, 37.6):
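Here is a sketch of that construction for a single team, using the Dodgers projections quoted above. The function name is mine, and the derivation of the constant follows the same observed-variance-minus-luck-variance split described earlier in the post:

```python
# Personalized prior for one team: center the regression-to-the-mean
# machinery on crowd-averaged preseason projections rather than the
# league mean. Projection figures are the five Dodgers numbers quoted
# in the post (four oddsmakers plus Fangraphs).
def personalized_prior(projected_wins, sd_wp=0.065, n_games=162):
    m = sum(projected_wins) / len(projected_wins) / n_games  # projected wp%
    var_luck = m * (1 - m) / n_games
    var_true = sd_wp ** 2 - var_luck
    c = n_games * var_luck / var_true      # team-specific constant, ~89.5
    return c * m, c * (1 - m)              # (alpha, beta)

a, b = personalized_prior([92.5, 91.5, 94.5, 94.5, 96.6])  # ~Beta(51.9, 37.6)
```

Small rounding aside, this recovers the constant of roughly 89 and the Beta(51.8, 37.6)-ish prior given above.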
Naturally, since this personalized prior is more bullish on the Dodgers than the general informed prior, this model expects the Dodgers to do slightly better than the model above. It also means that at the end of the season, the model thinks the Dodgers weren't especially lucky or unlucky: the mean is around 0.630, near their Pythagorean record, whereas for the informed prior it was 0.590.
For comparison, here are the posterior estimates for the three models (uninformed, informed, and personalized) on three dates of the season: April 3, 2017 (one game), July 11, 2017 (the All-Star Break), and October 1, 2017 (the last game of the season). You can see what I mean about how the informed and personalized priors are affected by performance. Since the informed prior originates at .500, it believes the Dodgers are worse than what is observed, due to the small sample size and expected regression back toward .500. The same can be seen in the personalized prior, but since it originates at 0.580, the effect is much less noticeable. Open in a new tab for a larger version.
So what’s the big deal with personalized priors anyway? Well, let’s examine the personalized priors I made for each team vs the generalized informed prior. Since we know the results of the 2017 season, we can look and see how our priors fared against what actually happened.
Since both the informed prior and personalized prior use the same variance from the 2016 historical data, the confidence intervals are going to remain the same size. They are:
80%: ±10.75 games
90%: ±13.75 games
95%: ±16.3 games
99%: ±21.25 games
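Those plus/minus figures can be reconstructed directly from the prior: take a central credible interval on wp% from Beta(46.5, 46.5) and scale it by 162 games. A sketch, assuming SciPy is available:

```python
from scipy.stats import beta

# The informed prior on true wp%, from the 2016 data:
prior = beta(46.5, 46.5)

def interval_in_games(level, n_games=162):
    """Half-width of the central credible interval, expressed in games."""
    lo, hi = prior.interval(level)
    return (hi - lo) / 2 * n_games

w80 = interval_in_games(0.80)   # ~10.7 games, matching the list above
w95 = interval_in_games(0.95)   # ~16.4 games
```

Since both priors share the 2016 standard deviation, the same half-widths apply to every personalized prior as well; only the center moves.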
Obviously, falling within the smaller confidence intervals is best. It won’t help the front office much if you say a team is likely to win 81 games plus/minus 21 games. I could tell you that all teams would probably fall within the 60-102 win confidence interval beforehand without any fancy math.
So how did our priors fare?
The difference is stark. The personalized prior is more accurate across the smaller confidence intervals than the generalized informed prior. How crazy is that? What would you think if I said I could give you a plus-or-minus 10ish game confidence interval for each team's win total, before the season even started, that would be accurate for over 75% of the teams?
Interestingly, the one team that missed the 99% CI was different for each prior. For the informed prior, that team was the LA Dodgers, who won 104 games, just outside the right tail of 102.25. The Dodgers were, however, one of the teams within the 80% CI for my personalized prior, in which the win interval was 83-105 with a point estimate of 94. I wonder why they were my example for much of this post...
For the personalized prior, the one team that missed the 99% CI was, of course, the Giants. Since nobody predicted anywhere close to them only winning 64 games, they fell well outside of the personalized confidence interval.
The teams that missed the 95% CI for each prior show the shortcomings of each strategy. For instance, the teams that the informed prior's 95% CI missed on mostly won more than 100 games. Since only one team, the Cubs, won more than 100 games in 2016, that outcome was seen as unlikely by the informed prior. On the other hand, the teams that missed the personalized prior's 95% CI all underperformed preseason expectations by a significant amount, including the Giants, Tigers, and Mets. Interestingly, the Tigers' actual wp% didn't end up in either 95% CI until September, after they had traded Verlander.
While both priors still aren't perfect, the personalized prior is more accurate during the beginning of the season, and allows one to create early season playoff projections, whose methodology I'll discuss in a future post. While the bookies are sometimes going to be very wrong in their projections, as with the Giants, they should be more accurate over many years than historical wins, which can vary greatly year to year; 2016 had only one team win 100+ games, for instance, while 2017 had three.
So what's the point of this anyway? Well, creating a posterior probability distribution for every team for every day of the season allows us to answer some crazy questions, like: whose odds of winning 100 games were best on May 8? Were the Indians more likely than the Astros to win 100 games in the midst of their record-breaking winning streak? Who had the best chance of truly being the worst team in the league on June 6? Did the Giants on August 1 have a higher chance of winning the top draft pick than the Padres did on July 1? And, of course, playoff odds for any given day of the year. All of these questions can be answered with Monte Carlo simulation, comparing the results across teams and distributions.
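To give a flavor of the approach, here is a minimal Monte Carlo sketch of one such question: given a team's posterior on some date, what is the chance they finish with 100+ wins? The function name and the illustrative numbers are mine, not taken from the post's actual simulations:

```python
import random

random.seed(0)  # arbitrary seed so the run is repeatable

# Draw a true talent from the posterior Beta(alpha, beta), then flip a
# coin for each remaining game; repeat many times and count how often
# the team clears 100 total wins.
def prob_100_wins(alpha, beta, wins_so_far, games_left, sims=20_000):
    hits = 0
    for _ in range(sims):
        p = random.betavariate(alpha, beta)                  # true talent draw
        future = sum(random.random() < p for _ in range(games_left))
        hits += (wins_so_far + future) >= 100
    return hits / sims

# e.g. a team sitting 61-29 at the break, under the informed prior:
odds = prob_100_wins(46.5 + 61, 46.5 + 29, wins_so_far=61, games_left=72)
```

Playoff odds work the same way, just simulating every team's remaining schedule at once and comparing the resulting standings across simulations.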
Luckily for you, I generated 183 images per team per prior, for a total of way too many, and turned them into gifs, for a total of 90 moving pictures. Below is a gallery of every personalized prior, as well as all of the full season gifs to compare, such as the Tigers, Mets, and Giants, as well as the Cubs, Dodgers, and Astros. I think they're kind of hypnotic.
Uninformed Prior Gifs:
Informed Prior Gifs:
Personalized Prior Gifs:
What do you think? I would love to hear your thoughts on Bayesian thinking and its application to baseball. I know, for instance, that this methodology can be applied to other binomial stats, such as OBP and BA, which I will be looking into as the year progresses. As always though, our code, data, and beta parameters can be found on Github. I ported all of this to Python too, you’re welcome.
The SaberSmart Team
P.S. If you enjoyed this article, and need something off Amazon anyway, why not support this site by clicking through the banner at the bottom of the page? As a member of the Amazon Affiliates program, we may receive a commission on any purchases. All revenue goes towards the continued hosting of this site.