Predicting the AL Wild Card Rumbledome... Because the NL is a Beautiful, Unpredictable Maelstrom of Chaos
The Rumbledome. All or Nothing. The Play-In Game. The MLB Wild Card Game has had many names since its inception in 2012. While some fans may not be on board with it, or with the current structure of the MLB playoffs in general, the one thing the Rumbledome has always provided is drama.
The second Wild Card spot gives teams hope that, even though they were not, by definition, the best, they still have a chance at the glory of a World Series championship. Believe it or not, the 2014 World Series featured two Wild Card teams, and the Giants handed the Royals the best possible postseason winning percentage for a World Series loser on their way to winning it all.
Baseball is a game of trends, regression toward the mean, and the long con; 162 games is quite a lot, after all. For any individual matchup, however, most of that goes out the window. There is so much luck involved (the strike call on the outside edge, the unpredictable hop of a ball on the dirt, a player slipping on wet grass) that large upsets are not out of the ordinary. The best team does not always win, even when it deserves to.
While all four major sports feature game-level randomness, it seems ingrained in the very essence of baseball, creating a sport that, at least to us, is always compelling to watch.
While we have provided predictions for last year’s World Series and the last couple of Super Bowls, this year we have decided to provide predictions for every round of the MLB playoffs, including the single-game Wild Card.
If you follow our Twitter, you may have noticed that we got quite excited about a recently published paper, How Often Does the Best Team Win? A Unified Approach to Understanding Randomness in North American Sport. This paper was written by some very smart people, and we highly recommend following them on Twitter: Dr. Michael Lopez and Dr. Gregory Matthews.
While the entire paper is compelling, its main finding is what stuck out to us:
The median probability of the best team winning a neutral site game is highest in the NBA (67%), followed in order by the NFL (64%), NHL (57%), and MLB (56%).
These numbers are the missing link to applying our Bayesian expected wins across an entire season to a single, winner-take-all Rumbledome.
By adding the probability of being the better team and winning to the probability of being the worse team yet still winning, we can calculate a team’s total probability of winning a neutral-site game (we’ll get to home field advantage in a bit).
For a trivial example, if we are 100% sure that the Red Sox are better than the Orioles this season, then the probability that the Red Sox win a single game is 1*0.56 + 0*0.44 = 0.56. Likewise, in February we determined that the Eagles had a 0.55 probability of being a better team than the Patriots. Using the 64% NFL number, that gives 0.55*0.64 + 0.45*0.36 = 0.352 + 0.162 = 0.514, or a 51.4% chance of the Eagles winning the Super Bowl.
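The calculation above is an application of the law of total probability. Here is a minimal Python sketch of it; the function name `single_game_win_prob` is our own label, and the 0.56 and 0.64 inputs are the paper’s MLB and NFL figures:

```python
def single_game_win_prob(p_better: float, p_best_wins: float) -> float:
    """Probability that a team wins one neutral-site game.

    p_better:     probability that this team is truly the better team
    p_best_wins:  league-wide probability that the better team wins a single
                  game (0.56 for MLB, 0.64 for the NFL)
    """
    # P(win) = P(better) * P(better team wins) + P(worse) * P(worse team wins)
    return p_better * p_best_wins + (1 - p_better) * (1 - p_best_wins)


# Red Sox vs. Orioles, assuming we are certain the Red Sox are better
print(round(single_game_win_prob(1.0, 0.56), 3))   # 0.56
# Eagles vs. Patriots, using the NFL figure
print(round(single_game_win_prob(0.55, 0.64), 3))  # 0.514
```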
Now you might be wondering how accurate this 56% figure is for MLB; we certainly were. We decided to do a quick double check.
FiveThirtyEight provides single-game MLB predictions based on Elo ratings. You can read more about how their model works on their site. Elo ratings also give us, by definition, a sense of which team is “better”: unsurprisingly, it’s the team with the higher Elo rating before a game.
We took 15 years of regular-season games, from 2003 to 2017, and used pre-game Elo ratings and final scores to determine the percentage of games the “better” team won. Wouldn’t you know it, the answer was 55.3%, almost exactly what the paper found.
If you are wondering, the better team’s home and road winning percentages were also both 0.55, supporting the neutral-site figure when taken in aggregate over many games and seasons. FiveThirtyEight actually wrote about this phenomenon last year.
Now that we trust this 0.56 number, we just need the probability that one baseball team is better than another. Because our posterior estimates of each team’s true winning percentage are distributions, we can perform a Bayesian hypothesis test between two teams.
Essentially, this Bayesian hypothesis test uses Monte Carlo sampling to measure how much two distributions overlap. By drawing 100,000 samples from each team’s distribution, we can count the fraction of draws in which Team A’s win percentage is higher than Team B’s.
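The check boils down to counting how often the team with the higher pre-game Elo rating won. A minimal sketch, using a hypothetical `Game` record (the field names are our assumption, not necessarily FiveThirtyEight’s column names) and a few made-up games rather than real results; skipping games with equal pre-game ratings is also our assumption:

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical record: pre-game Elo ratings and final scores for both teams
    elo1_pre: float
    elo2_pre: float
    score1: int
    score2: int


def better_team_win_rate(games: list[Game]) -> float:
    """Share of games won by the team with the higher pre-game Elo rating."""
    decided = 0
    better_wins = 0
    for g in games:
        if g.elo1_pre == g.elo2_pre:
            continue  # no "better" team by this definition; skip the game
        decided += 1
        team1_better = g.elo1_pre > g.elo2_pre
        team1_won = g.score1 > g.score2
        if team1_better == team1_won:
            better_wins += 1
    return better_wins / decided


# Toy data for illustration only, not real results
games = [
    Game(1550, 1500, 5, 3),  # favorite (team 1) wins
    Game(1480, 1520, 4, 2),  # underdog (team 1) wins
    Game(1510, 1490, 1, 6),  # favorite (team 1) loses
    Game(1470, 1530, 2, 7),  # favorite (team 2) wins
]
print(better_team_win_rate(games))  # 0.5
```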
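Here is a minimal sketch of that sampling step. The post does not say which family the posteriors come from, so we assume Beta distributions over each team’s true winning percentage; the shape parameters below are hypothetical placeholders, not our fitted values for any team:

```python
import random


def prob_a_better(a_params, b_params, n_samples=100_000, seed=2018):
    """Monte Carlo estimate of P(true win% of A > true win% of B).

    a_params, b_params: (alpha, beta) shape parameters of each team's
    Beta posterior (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    wins_for_a = 0
    for _ in range(n_samples):
        # Draw one plausible "true" winning percentage for each team
        a = rng.betavariate(*a_params)
        b = rng.betavariate(*b_params)
        if a > b:
            wins_for_a += 1
    return wins_for_a / n_samples


# Hypothetical posteriors: one centred near .620, the other near .600
print(prob_a_better((186, 114), (180, 120)))
```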
For example, here are the posterior distributions for the two AL Wild Card teams, the New York Yankees and the Oakland Athletics.
After sampling 100,000 times from each distribution, we get a probability of 0.864 that the Yankees are truly better than the A’s. Their win probability in a head-to-head matchup then is:
(probability of being the better team) * (probability of the better team winning) + (probability of being the worse team) * (probability of the worse team winning)
(0.864)*(0.56) + (0.136)*(0.44) = 0.5437. So while we are 86.4% sure that the Yankees are truly a better team than the A’s this season, on a neutral field we would expect the Yankees to have only about a 54.4% chance of beating the A’s.
However, this is where home field advantage, and conversely road disadvantage, comes into play. While home field advantage may be negligible in aggregate over many games and seasons, it is definitely a factor in a one-game playoff.
Historically, MLB home field advantage, all other things being equal, has been worth around 0.045 in win probability, a relative increase of 9%. For two equally matched teams, each with a win probability of 0.500, the home team’s chance is boosted to 0.545, while the road team’s drops to 0.455. Since 0.045/0.500 = 0.09, that is a 9% increase.
We confirmed this held for the same 15 seasons mentioned above, using FiveThirtyEight's MLB API.
Using this 9% increase for the home team, we can adjust the Yankees’ chance of beating the A’s: 0.5437 * 1.09 = 0.5926, or about a 59.3% chance of winning, since they are playing the A’s at home.
However, using data from this season, we can determine a team-specific home field advantage for the Yankees and road disadvantage for the A’s.
In 2018, the Yankees had a 0.654 winning percentage at home and a 0.580 winning percentage on the road, against 0.617 overall (100-62). Their home field multiplier is therefore 1 + (home wp% - total wp%)/(total wp%) = 1 + (0.654 - 0.617)/0.617 = 1.060, an increase of only 6.0%. Their chance of winning then becomes 0.5437 * 1.06 = 0.576, or about 57.6%.
The A’s, on the other hand, have only a small road disadvantage. They had a 0.617 winning percentage at home and a 0.580 winning percentage on the road, against 0.599 overall (97-65). Playing on the road is therefore worth a multiplier of 1 + (away wp% - total wp%)/(total wp%) = 0.969, a decrease of only 3.1%. Since the A’s had an unadjusted chance of winning of 0.4563 (1 - 0.5437), we can adjust that to 0.4563 * 0.969 = 0.442, or a 44.2% chance of winning. This equates to a 55.8% chance of the Yankees winning.
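The league-wide adjustment is a single multiplier derived from the 0.045 edge. A short sketch; `HOME_EDGE`, `HOME_BOOST`, and `home_adjusted` are our own names:

```python
# Historical MLB home edge: +0.045 in win probability for two .500 teams
HOME_EDGE = 0.045
HOME_BOOST = 1 + HOME_EDGE / 0.500  # 1.09, i.e. a 9% relative increase


def home_adjusted(neutral_prob: float, boost: float = HOME_BOOST) -> float:
    """Scale a neutral-site win probability by the home-field multiplier."""
    return neutral_prob * boost


print(round(HOME_BOOST, 2))             # 1.09
print(round(home_adjusted(0.5437), 4))  # 0.5926
```

A multiplicative boost like this only makes sense for neutral-site probabilities comfortably below 1/1.09 (about 0.917), which is not a concern for a matchup this close.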
So how did we reconcile these two numbers, the adjusted chance of the Yankees winning at home and the adjusted chance of the Athletics losing on the road? We averaged them!
(0.576 + 0.558) / 2 = 0.567, or a 56.7% chance that the Yankees will win the Wild Card game at home.
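Putting the team-specific adjustments and the final average together, a sketch might look like this. The helper `split_multiplier` is our own name, and rather than the rounded percentages we plug in the underlying 2018 records (Yankees 53-28 at home and 100-62 overall; A’s 47-34 on the road and 97-65 overall), which are consistent with the figures above. Note that 1 + (split wp% - total wp%)/(total wp%) simplifies to split wp% / total wp%.

```python
def split_multiplier(split_wins, split_losses, total_wins, total_losses):
    """1 + (split win% - overall win%) / overall win%, i.e. split win% / overall win%."""
    split_wp = split_wins / (split_wins + split_losses)
    total_wp = total_wins / (total_wins + total_losses)
    return 1 + (split_wp - total_wp) / total_wp


neutral_nyy = 0.5437           # Yankees' neutral-site win probability
neutral_oak = 1 - neutral_nyy  # A's neutral-site win probability (0.4563)

nyy_home = split_multiplier(53, 28, 100, 62)  # Yankees at home: ~1.060
oak_road = split_multiplier(47, 34, 97, 65)   # A's on the road: ~0.969

nyy_view = neutral_nyy * nyy_home      # Yankees' chance, home-adjusted (~0.576)
oak_view = 1 - neutral_oak * oak_road  # Yankees' chance via the A's road penalty (~0.558)
print(round((nyy_view + oak_view) / 2, 3))  # 0.567
```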
Next, we decided to compare this to the results from one of our older, yet trusty, models.
We ran the simulator we built for last year’s World Series, which is based solely on runs scored and allowed at home or on the road, depending on where each team is playing. The Athletics scored an average of 4.5 runs per game at home and 5.5 on the road, and allowed 3.8 and 4.5, respectively. The Yankees scored an average of 5.6 runs at home and 4.9 on the road, while allowing 4.3 and 3.9, respectively.
Runs per game can be modeled with the negative binomial distribution, which has been shown to describe scoring well in a variety of sports, including baseball, soccer, rugby, and college football. The negative binomial takes two inputs: the size, or dispersion, parameter (the shape parameter of the gamma mixing distribution) and the probability of success, where prob = size/(size + mean). We use a size parameter of 4. This lets us model both runs scored and runs allowed for each team. Check out the histograms below to see how closely this method matches the actual runs scored by the Yankees and A's in 2018:
We then simulated 100,000 potential game scores by sampling from each distribution. The Yankees’ score in each simulation was the average of a draw from the Yankees’ runs scored at home and a draw from the Athletics’ runs allowed on the road; the Athletics’ score was the average of a draw from the Yankees’ runs allowed at home and a draw from the Athletics’ runs scored on the road. The A’s won 48.3% of the simulations and the Yankees won 51.7%. When the Yankees won, they won by about 3 runs on average, 6.6 to 3.5, and when Oakland won, it also won 6.6 to 3.5 on average. For those of you wondering, that average combined total of about 10 runs is well above the Vegas over/under of 8.5.
Both of our models’ predictions for the Yankees, 56.7% and 51.7%, are lower than FiveThirtyEight’s Elo prediction of 59.5%. For what it’s worth, Vegas has the line set at NYY -180, which implies a 64.3% chance of winning.
While the Yankees are the favorites, perhaps they should not be favored by as much as FiveThirtyEight and Vegas suggest. We expect a high-scoring, potentially close game as the Athletics and Yankees try to out-mash each other in New York tomorrow night.
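Here is a sketch of that simulation using only Python’s standard library. Because the size parameter is a whole number (4), each negative binomial draw can be generated as the number of failures before the 4th success, with success probability prob = size/(size + mean), which gives the requested mean. The function names, the seed, and the choice to drop tied simulations from the denominator are our assumptions (the post does not say how ties were handled), so the exact percentages will vary slightly.

```python
import random


def rnegbin(mean: float, size: int, rng: random.Random) -> int:
    """One negative binomial draw with the given mean and integer size.

    Counts failures before `size` successes, where each trial succeeds with
    prob = size / (size + mean); the expected count is size * (1 - prob) / prob = mean.
    """
    prob = size / (size + mean)
    successes = failures = 0
    while successes < size:
        if rng.random() < prob:
            successes += 1
        else:
            failures += 1
    return failures


def simulate_wild_card(n_sims=100_000, size=4, seed=2018):
    rng = random.Random(seed)
    nyy_wins = oak_wins = 0
    for _ in range(n_sims):
        # Yankees' score: average of NYY runs scored at home (5.6)
        # and OAK runs allowed on the road (4.5)
        nyy = (rnegbin(5.6, size, rng) + rnegbin(4.5, size, rng)) / 2
        # A's score: average of NYY runs allowed at home (4.3)
        # and OAK runs scored on the road (5.5)
        oak = (rnegbin(4.3, size, rng) + rnegbin(5.5, size, rng)) / 2
        if nyy > oak:
            nyy_wins += 1
        elif oak > nyy:
            oak_wins += 1
        # Tied simulations are dropped (our assumption)
    decided = nyy_wins + oak_wins
    return nyy_wins / decided, oak_wins / decided


p_nyy, p_oak = simulate_wild_card()
print(f"Yankees {p_nyy:.3f}, A's {p_oak:.3f}")
```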
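The implied probability comes from the standard conversion for American moneylines. A small sketch (the function name is ours); keep in mind that this break-even figure still includes the bookmaker’s margin, so it slightly overstates the market’s estimate of the Yankees’ chances:

```python
def implied_prob(moneyline: int) -> float:
    """Break-even win probability implied by an American moneyline.

    Favorites (negative lines): |line| / (|line| + 100)
    Underdogs (positive lines): 100 / (line + 100)
    """
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


print(round(implied_prob(-180), 3))  # 0.643
```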
If you’re wondering whether we did this for the original circuit, we did, but the National League Wild Card was an unpredictable, beautiful mess. Two Game 163s and an incredible Wild Card Game made for an unforgettable start to the 2018 postseason. Unfortunately, our predictions will live on forever on Twitter:
In our defense, the Cubs did have an 82.3% chance of being a better team than the Rockies, all else being equal, based on preseason expectations and the results of the regular season. Factor in home field advantage, and you would have been foolish not to give them the slight edge:
Ah well, hopefully our AL predictions will be a bit better. Look for our full predictions for every game of the MLB postseason on Twitter or on this blog!
Who do you have winning the AL Wild Card? What do you make of our predictions? Let us know in the comments below! We would love to hear your thoughts on our methodology or even the NL Wild Card game last night and anything else you might have to say! As always, our code can be found on GitHub.
The SaberSmart Team
P.S. If you enjoyed this article, and need something off Amazon anyway, why not support this site by clicking through the banner at the bottom of the page? As a member of the Amazon Affiliates program, we may receive a commission on any purchases. All revenue goes towards the continued hosting of this site.