Last week, I released my updated end-of-season record forecasts for each of the thirty MLB teams. In case you missed it, I highly recommend checking it out here, as well as the first part in this series released in April, where I used the small early season samples to predict a more accurate “on-pace” record for end-of-season wins.
On Twitter, someone pointed out that my predictions weren’t very revolutionary, as they matched the conclusions that can be seen on Fangraphs’ Playoff Odds website. If you take the team with the highest chance of winning each division, plus the next two teams with the highest wildcard probabilities, Fangraphs had the exact same playoff picture as I did, down to Seattle making their first postseason since 2001 by snagging the second wildcard, the Nats winning the NL East despite being 5.5 games back of first at the time of data collection, and the Brewers/Diamondbacks going 1-2 with the NL wildcards.
My last article focused on end-of-season records: I created a posterior distribution of the most likely winning percentages (wp%) for each team and took the mean as a point estimate, in conjunction with a 90% confidence interval of course. I decided that translating those findings into playoff odds would add more context to my conclusions, and would better highlight which teams were out of the playoff picture and should be sellers at the trade deadline. Since I had already developed a distribution for each team’s end-of-season wp%, I could use my favorite tool in our statistical arsenal, Monte Carlo Simulation.
In the past, I have used Monte Carlo Simulation to estimate pi, predict the 2017 World Series, and give the Eagles the higher probability of winning the 2018 Super Bowl. If you need a refresher, I highly recommend checking out one of those other posts.
In this case, I used Monte Carlo Simulation to determine a team’s odds of winning their division, as well as snagging one of the two wildcard berths. Similar to Fangraphs, I added these two probabilities to determine a team’s overall playoff odds.
To determine a division winner, I sampled from each posterior distribution of each team in the division to get a sample wp%, multiplied by 162 to determine games won, then found the team with the most wins. I simulated 100,000 end-of-season standings for each division, and then turned those division winners into a probability.
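The division step above can be sketched in a few lines of Python. This is only an illustration, not the code from the Github repository: it assumes each team's posterior is a Beta distribution, and the team names, the `DIVISION` parameters, and the `simulate_division` helper are all made up for the example.

```python
import random
from collections import Counter

# Hypothetical Beta(alpha, beta) posteriors for each team's end-of-season wp%.
# Team names and parameters are placeholders, not real estimates.
DIVISION = {
    "Team A": (58, 42),
    "Team B": (52, 48),
    "Team C": (50, 50),
    "Team D": (45, 55),
    "Team E": (40, 60),
}

def simulate_division(posteriors, n_sims=100_000, seed=0):
    """Return each team's share of simulated seasons in which it won the division."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        # Draw one wp% per team and scale it to a 162-game season.
        wins = {
            team: 162 * rng.betavariate(a, b)
            for team, (a, b) in posteriors.items()
        }
        # The team with the most sampled wins takes the division.
        titles[max(wins, key=wins.get)] += 1
    return {team: titles[team] / n_sims for team in posteriors}
```

Calling `simulate_division(DIVISION)` returns a dictionary of division odds that sum to 1. The sampled wins are left unrounded here, which sidesteps ties; rounding to whole games would need a tiebreak rule.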
For the wildcards, instead of sampling at a division level, I sampled at a league-wide level. After getting a sampled league-wide end-of-season standing, I threw out the three division winners and found the next top two teams with the most wins. Again, I did this 100,000 times for each league to determine a team’s wildcard odds.
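Here is a rough sketch of the league-wide step, again assuming Beta posteriors. The 15-team league, its parameters, and the `simulate_league` helper are hypothetical. One simplification to flag: this version reads the division winners and the wildcards off the same league-wide draw, whereas the description above runs the division simulation separately.

```python
import random
from collections import Counter

# Hypothetical league: three divisions of five teams with made-up posteriors.
MEANS = [0.60, 0.55, 0.50, 0.45, 0.40]
DIVISIONS = {d: [f"{d}{i}" for i in range(1, 6)] for d in ("East", "Central", "West")}
POSTERIORS = {
    team: (100 * m, 100 * (1 - m))
    for teams in DIVISIONS.values()
    for team, m in zip(teams, MEANS)
}

def simulate_league(posteriors, divisions, n_sims=100_000, seed=0):
    """Return division, wildcard, and overall playoff odds for every team."""
    rng = random.Random(seed)
    division_titles, wildcards = Counter(), Counter()
    for _ in range(n_sims):
        wins = {t: 162 * rng.betavariate(a, b) for t, (a, b) in posteriors.items()}
        # Best sampled record in each division wins that division.
        winners = {max(teams, key=wins.get) for teams in divisions.values()}
        division_titles.update(winners)
        # Throw out the division winners, then take the top two remaining teams.
        rest = [t for t in wins if t not in winners]
        wildcards.update(sorted(rest, key=wins.get, reverse=True)[:2])
    return {
        t: {
            "division": division_titles[t] / n_sims,
            "wildcard": wildcards[t] / n_sims,
            # Overall playoff odds are the sum of the two, as described above.
            "playoffs": (division_titles[t] + wildcards[t]) / n_sims,
        }
        for t in posteriors
    }
```

Because a team cannot be both a division winner and a wildcard in the same simulated season, the two probabilities can be added safely, and the overall playoff odds across the league sum to five berths.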
Interestingly, this simulation method is similar to the one that Fangraphs uses to create their playoff odds. From their “Playoff Odds Explanation”:
“To generate the playoff odds, we take the current standings, the remaining schedule, the team's projected performance, and we simulate the remaining season 10,000 times. All the outcomes are averaged to find the probability of winning the division or wild card…”
The major difference is that Fangraphs simulates each remaining game in a team’s schedule, which, while it could be more accurate, is far more computationally heavy and time-consuming. Perhaps this is why they only simulate each season 10,000 times. By simply sampling from a team’s posterior distribution, playoff odds can be determined more quickly, with less code and less simulation noise thanks to the larger number of simulations, and perhaps with just as much accuracy.
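To put a number on the "less randomness" point, the sampling noise of a simulated probability p estimated from N independent simulations is roughly sqrt(p(1-p)/N). The short sketch below (the helper name `mc_standard_error` is mine) compares 10,000 and 100,000 simulations in the worst case of p = 0.5.

```python
import math

def mc_standard_error(p, n_sims):
    """Standard error of a probability estimated from n_sims independent simulations."""
    return math.sqrt(p * (1 - p) / n_sims)

for n_sims in (10_000, 100_000):
    # Report the noise in percentage points for a 50% playoff probability.
    print(n_sims, round(100 * mc_standard_error(0.5, n_sims), 2))
```

That works out to about 0.5 percentage points at 10,000 simulations versus about 0.16 at 100,000. This only measures simulation noise; it says nothing about whether the underlying model of team strength is right.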
SaberSmart Playoff Odds - July 16
Here are my calculated playoff odds for each division, sampled from the posterior distribution for each team created with data up to the All-Star Break, July 16. For a selection of the distributions used, check out the gallery in my last post, or graphs recently tweeted out.
That’s a lot of numbers, but they provide context to what I argued in my last post. For example, I said that the AL was essentially decided, at least in terms of which teams are going to the playoffs. The top 5 teams all have a probability of over 60% of making the playoffs, with the next highest, the Athletics, at a comparatively paltry 20.6%.
In comparison, the NL only has two teams with playoff odds of over 60%, the Dodgers and the Cubs. The NL East is a mess, as I said, since three teams all have over a 20% chance of winning the division. The NL wildcards could go anywhere among the Brewers, Diamondbacks, Cardinals, Rockies, and even the Giants. Interestingly, it looks like only the winner of the NL East will make the playoffs, with the runners-up having only around a 10% chance of snagging a wildcard, due to the already close race.
SaberSmart Playoff Odds - April 24 vs July 16
The next step in my analysis involved calculating these playoff odds again, but sampling instead from the posterior distributions I created from data up to April 24th. I then compared the results with the July 16 playoff odds to see which teams increased, or decreased *cough* METS *cough*, their playoff odds the most in the last 3 months. In the graphs below, teams on the dashed line have about the same playoff odds in April and July, while those below the line decreased their playoff odds and those above it increased them.
The cluster of unreadable names near the bottom left corner in the Divisional graph contains COL, SEA, and SFG, while the two teams in the top right of the Playoffs graph are HOU and BOS, naturally.
Unsurprisingly, the collapses from the Mets, Angels, and Blue Jays have hurt their playoff odds the most, mostly the divisional odds for the Mets and the WildCard odds for the Angels and Blue Jays. On the other hand, the Mariners have almost tripled their WildCard odds since April 24!
Likewise, the Cubs have recovered from a relatively slow start to increase their divisional odds the most since April. The six teams with the highest playoff probabilities in April are still at the top of the playoff picture now: LAD, CHC, NYY, CLE, BOS, and HOU.
Arizona has probably had the most disappointing change. Since April, their odds of winning the division have plummeted as the Dodgers’ odds have risen, while their wildcard odds have risen in turn. As any team should know, winning the division is much better than going into the RumbleDome that is the WildCard game, where it looks like the Diamondbacks are headed.
Fangraphs vs SaberSmart Playoff Odds - April 24
Finally, I compared my results to Fangraphs’ playoff odds. I started with my playoff odds after the first ~21 games of the season, on April 24. The link to Fangraphs’ playoff odds for that day can be found here. The raw numbers for my playoff odds from April can be found on my Github.
The graphs below plot Fangraphs’ playoff odds on the x-axis vs mine on the y-axis. Teams below the line are favored more by Fangraphs, while those above are favored more by my model. Interestingly, the mean of the differences in playoff odds is essentially zero. However, the standard deviation is quite high, since for a couple of teams the two models disagree by quite a few percentage points. The two unreadable clusters in the top right of the Playoff Odds graph are CLE, HOU, BOS and NYY, CHC, LAD.
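For anyone who wants to reproduce the comparison, the mean and standard deviation of the differences can be computed as below. The `compare_odds` helper and the three placeholder teams are hypothetical, and the numbers are made up purely to show the calculation; they are not real odds from either model.

```python
from statistics import mean, stdev

def compare_odds(mine, theirs):
    """Mean and sample standard deviation of (my odds - their odds) per team."""
    teams = sorted(set(mine) & set(theirs))
    diffs = [mine[t] - theirs[t] for t in teams]
    return mean(diffs), stdev(diffs)

# Made-up playoff odds for three placeholder teams, NOT real values.
mine = {"AAA": 0.60, "BBB": 0.30, "CCC": 0.10}
theirs = {"AAA": 0.70, "BBB": 0.20, "CCC": 0.10}

mean_diff, sd_diff = compare_odds(mine, theirs)
print(round(mean_diff, 3), round(sd_diff, 3))
```

One thing worth keeping in mind: if both sets of odds cover the same teams and each sums to the same number of playoff berths, the mean difference is zero by construction, so the standard deviation is the more informative of the two numbers.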
There is a noticeable pattern in these graphs. Fangraphs seems to be overconfident in their favorites, while giving the underdogs lower playoff odds. My model understands that in April, there is still a lot of uncertainty and, in my opinion, appropriately distributes the lower playoff odds while not being overconfident on the favorites.
This can be seen in the playoff graphs, where although my model still gives NYY, LAD, CHC, BOS, HOU, and CLE the highest playoff odds, I don’t discount teams with strong starts, like MIL, ARI, LAA, and SEA. I found validation in that both my model and Fangraphs had given the Mets the same >50% odds of making the playoffs on April 24.
Fangraphs vs SaberSmart Playoff Odds - July 16
The link to Fangraphs’ playoff odds for July 16 can be found here.
Naturally, as the season progresses, Fangraphs’ playoff odds and mine are converging. While the mean of the differences is again zero, the standard deviation has decreased by over half when compared to the standard deviation of the differences from April 24.
The cluster in the bottom left of the Division Odds graph contains SFG, MIN, and SEA. The cluster in the WildCard graph contains the NL East, ATL, PHI, and WSN. The cluster in the top right of the Playoffs graph contains, of course, NYY, CLE, HOU, CHC, and BOS.
The only surprise here is how close most of the odds are! In the Divisional Odds graph, Fangraphs and I have essentially the same probabilities for ATL, PHI, NYY, WSN, and BOS. Fangraphs is more confident in LAD and CHC, while I give MIL and ARI a slightly better fighting chance to claim the division than Fangraphs does.
For the WildCard odds, again most of the probabilities are in accordance. Fangraphs is more bullish on OAK than I am, while I give COL slightly better odds than Fangraphs. This is essentially nitpicking though, as we agree fairly closely on the leaders. The Playoff Odds graph also shows how close these probabilities are with each other.
Trade Deadline Opinion:
If you have >15% playoff odds, then BUY. Else SELL. In the AL, obviously BOS, NYY, HOU, and CLE are contenders and should try and buy. But also OAK and SEA should go all-in to get that second WildCard berth. The teams on the fence, LAA, MIN, TBR, should SELL. Someone tell the Rays to stop waffling and trade Chris Archer. Same with the Twins. The likelihood they make the playoffs is a meager 9%. Blow it up and start again next year. Obviously the teams in the basement are sellers.
In the NL, I think that all the teams in contention for a division or wildcard should try and buy. Again, CHC and LAD are the obvious leaders, and LAD just traded for Machado. But, WSN, PHI, ATL, ARI, MIL, COL, STL, and SFG all have some hope for a playoff berth and should just go all in now. I feel like most of those teams will stand pat though and will regret it at the end of the season.
Teams that should blow it up are the Pirates and the Mets. I know the Pirates are on a tear right now, but I think they may trade for someone and then flip them immediately, perhaps in conjunction with another player or two, to make a nice prospect profit. Those teams, combined with the NL basement, should SELL. It looks like a good seller’s market, especially for pitchers, with the current competitiveness of the NL races and the one-upmanship between BOS, NYY, and HOU in the AL.
Calculating playoff odds does not have to be complicated, time-intensive, or heavily developed for them to be useful in providing context to a team’s season so far or end-of-season win projections. While my playoff odds vs Fangraphs’ playoff odds had a significant amount of variance after the first three weeks of the season, they had essentially converged by July.
However, I believe that my playoff odds do a better job than Fangraphs with the small sample sizes available in April. For example, Fangraphs seemed to not believe in MIL’s or ARI’s hot starts, and gave them playoff odds of 0.25 and 0.5 respectively. On the other hand, my odds came out to 0.5 and 0.75. By July, both of our odds for both teams have stabilized at around 50%, and it looks like both have a legit shot to make the postseason.
Further, Fangraphs does not seem to have many teams hovering around the “underdog” area of probability, which I am defining as 20-30% odds. For example, my model gave PHI, SEA, COL, and SFG all underdog chances of making the playoffs back in April, while Fangraphs had them at about half that. While some of those teams may indeed miss the playoffs, I believe my model from April hinted at their current wildcard runs.
Fangraphs also consistently gives the top teams higher playoff odds than my model. I believe this is because Fangraphs is possibly not taking into account the tails of the distribution, where by chance a good team collapses, or a truly poor team goes on an unprecedented run. These winning/losing streaks are integral to the game of baseball, and by having a distribution whose confidence interval depends on the amount of data collected, I can build that randomness into my own model.
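To illustrate how an interval that depends on the amount of data behaves, here is a small sketch. It assumes a Beta posterior built from a uniform prior and a win-loss record, which is a stand-in and not necessarily how the posteriors in my earlier post were constructed; the `interval_90` helper and both records are hypothetical.

```python
import random

def interval_90(wins, losses, n_draws=50_000, seed=0):
    """Approximate central 90% interval for wp% from sampled posterior draws."""
    rng = random.Random(seed)
    # Assumption: uniform Beta(1, 1) prior updated with the observed record.
    draws = sorted(rng.betavariate(1 + wins, 1 + losses) for _ in range(n_draws))
    return draws[int(0.05 * n_draws)], draws[int(0.95 * n_draws)]

# Two made-up records with a similar wp%: one after ~3 weeks, one near mid-season.
early_low, early_high = interval_90(13, 8)
late_low, late_high = interval_90(57, 38)
print(round(early_high - early_low, 3), round(late_high - late_low, 3))
```

With a similar winning percentage, the early-season interval comes out much wider than the mid-season one, which is what keeps the April odds from piling onto the favorites.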
Additionally, being able to simulate division/league standings 100,000 times, 10x more than Fangraphs, provides a more robust simulation to account for these edge cases.
In any case, the playoff odds have essentially converged, at only around half the season. As the season further progresses, I expect Fangraphs’ and my playoff odds to continue to converge. If this indeed occurs, I think that perhaps simulating every game is computationally and temporally inefficient, when one can simply sample from an educated posterior of a team’s wp% to randomly generate possible divisional and league standings.
What do you think? Is there a better way to calculate playoff odds from a small sample size, such as what Fangraphs and I tried to do in April? Which underdog team will make the playoffs? Let me know in the comments below! As always, our code, data, and generated playoff odds from April and July can be found on Github.
The SaberSmart Team