In case you have been distracted by recent NFL drama, NCAA upsets, or the start of the new NBA and NHL seasons, baseball has reached the apex of its 2017 season. The World Series begins tonight! This World Series features the Houston Astros taking on the monolith that is the LA Dodgers. Both teams won over 100 games in the regular season, a World Series matchup that has been 47 years in the making. Both teams feature at least one Cy Young winning pitcher. Both teams went undefeated at home during the postseason. The Dodgers have not won a title in 28 years. The Astros have never won a single World Series game, and currently reside in the same ranks as two other teams, the Nationals and the Mariners. Yet only one team will walk away as World Series champions. This has all of the makings of a historic competition!
While many people are trying to predict a winner using their guts or arbitrary measurements, we chose to go a more mathematical route. Back in February, we developed a simulator to predict the Super Bowl featuring the Patriots and the Falcons. We wanted to show that simulating a sports game can be easy and that, when simulated a few thousand times, it actually provides a relatively reliable winning probability. Other sites predicting or simulating win probabilities do so in a convoluted manner. For instance, FiveThirtyEight uses Elo ratings, a complicated measure of strength based on head-to-head results and quality of opponents, to calculate a team’s chances of winning their next game, which is just as confusing as it sounds.
Our simulation relies solely on two things: runs scored and runs allowed during the regular season. It has been well documented that scores across various sports can be modeled with the negative binomial distribution. For example, the negative binomial has been shown to accurately describe scores in baseball, soccer, rugby, and college football.
The negative binomial requires two inputs: the size, or dispersion, parameter (the shape parameter of the gamma mixing distribution) and the probability of success, where prob = size/(size + mean). We are using a size parameter of 4. With the mean taken from a team's regular season runs, it is possible to model both runs scored and runs allowed for each team. Check out the histograms below to see how closely this method matches the actual runs scored for the Dodgers in 2017:
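As a rough illustration of the parameterization above, here is a minimal Python sketch. It is not the simulator's actual code: the function name `sample_runs` and the mean of 4.75 runs per game are placeholders chosen for the example. It converts a mean and a size of 4 into the success probability and draws simulated run totals with NumPy.

```python
import numpy as np

SIZE = 4  # size (dispersion) parameter quoted in the text


def sample_runs(mean_runs, n_draws, rng, size=SIZE):
    """Draw simulated run totals from a negative binomial with the given mean."""
    prob = size / (size + mean_runs)  # prob = size / (size + mean)
    return rng.negative_binomial(size, prob, n_draws)


rng = np.random.default_rng(2017)  # fixed seed so the sketch is repeatable
draws = sample_runs(4.75, 100_000, rng)  # 4.75 is a placeholder mean, not real data

# The sample mean should land near 4.75. The variance is larger than the mean
# (mean + mean**2 / size), which is the overdispersion the size parameter controls.
print(draws.mean(), draws.var())
```

A smaller size would spread the simulated scores out more around the same mean, while a very large size would approach a Poisson distribution.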
We then take one random value from each distribution. These random values are selected by Monte Carlo sampling. We determine the score for each team by averaging that team's predicted runs scored with the opposing team’s predicted runs allowed.
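The averaging step can be sketched in a few lines of Python. The function name `combine_scores` is hypothetical, and the four inputs stand for the values already sampled from each team's runs scored and runs allowed distributions.

```python
def combine_scores(a_scored, a_allowed, b_scored, b_allowed):
    """Average each team's sampled runs scored with the opponent's sampled runs allowed."""
    a_final = (a_scored + b_allowed) / 2
    b_final = (b_scored + a_allowed) / 2
    return a_final, b_final


# Team A sampled 7 scored / 3 allowed; team B sampled 5 scored / 5 allowed.
print(combine_scores(7, 3, 5, 5))  # (6.0, 4.0)
```

Averaging lets a strong pitching staff pull the opponent's score down even when the opponent's own offensive draw is high.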
For instance, suppose our simulation predicts the Dodgers to score 7 runs and allow 3. Additionally, it predicts the Astros to score 5 and allow 5. The final simulated game score would be Dodgers 6 and Astros 4. Furthermore, ties are thrown out and not counted towards the total iterations.
Now, one neat piece of functionality we have added for this series is the ability to take home field advantage into account. While some debate whether home field advantage even matters in baseball, we felt that it was a good addition for our simulation. If home field advantage is taken into account, the simulated scores are drawn from each team's distribution of runs scored and allowed either at home or away, depending on which game of the World Series we are simulating.
But anyway, enough math. Let's get to some predictions! After 10,000 simulations of the entire World Series, here are the results:
So it's basically 50-50. Well, it looks like we can just flip a coin and be done with it! Wait, let's dig into some conditional situations to see if there is any more to learn.
Given that the World Series ends in a sweep, the probability of either team sweeping is also 50-50. However, there was a sweep in only around 12% of the simulations. If the Astros win the whole thing, they are most likely to win in 5 games. The average number of games before a winner is decided is 6. If the series ends in six games, then the Dodgers and Astros are equally likely to win. However, if the series goes to 7 games, the Dodgers do gain a slight advantage. This shows how important home field advantage can be in the final game.
What is most interesting is what happens after one of the two teams wins tonight. For whichever team wins the first game, the probability that they win the whole thing jumps to 65%! Interestingly, according to baseball-reference, the team that wins Game 1 of the World Series has gone on to win the entire thing 70 out of 112 times, or around 62.5%!
That is almost exactly the same as the probability determined by our simulation. If you haven't already, be sure to check out baseball-reference and their Play Index, as they provided all of the data used in this analysis. The verdict? Our money is on whoever wins tonight in 6 games. Probably the Dodgers. But that guess is as good as a coin flip...
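For readers who want to experiment, the whole procedure described in this post (sample, average, discard ties, repeat over a best-of-seven) can be sketched roughly as follows. This is an illustrative reconstruction, not the simulator's real code. The per-game means in `MEANS` are made-up placeholders rather than the actual 2017 home and away splits, all function names are hypothetical, and the sketch assumes a tied game is simply redrawn.

```python
import numpy as np

SIZE = 4  # size parameter quoted in the text

# Placeholder (scored, allowed) means per game, NOT the real 2017 splits.
MEANS = {
    "Dodgers": {"home": (5.0, 3.5), "away": (4.5, 3.6)},
    "Astros": {"home": (5.2, 4.2), "away": (5.8, 4.4)},
}
# 2-3-2 format with the Dodgers hosting Games 1, 2, 6 and 7.
HOSTS = ["Dodgers", "Dodgers", "Astros", "Astros", "Astros", "Dodgers", "Dodgers"]


def draw(mean, rng):
    """One negative binomial draw with the given mean."""
    return rng.negative_binomial(SIZE, SIZE / (SIZE + mean))


def simulate_game(home, away, rng):
    """Return the winner of one game, redrawing whenever the averaged scores tie."""
    h_scored, h_allowed = MEANS[home]["home"]
    a_scored, a_allowed = MEANS[away]["away"]
    while True:
        home_runs = (draw(h_scored, rng) + draw(a_allowed, rng)) / 2
        away_runs = (draw(a_scored, rng) + draw(h_allowed, rng)) / 2
        if home_runs != away_runs:  # assumption: a tied game is simply redrawn
            return home if home_runs > away_runs else away


def simulate_series(rng):
    """Play games in schedule order until one team has four wins."""
    wins = {"Dodgers": 0, "Astros": 0}
    winners = []
    for home in HOSTS:
        away = "Astros" if home == "Dodgers" else "Dodgers"
        winner = simulate_game(home, away, rng)
        wins[winner] += 1
        winners.append(winner)
        if wins[winner] == 4:
            break
    return winners


rng = np.random.default_rng(0)
all_series = [simulate_series(rng) for _ in range(10_000)]
dodgers_share = sum(s[-1] == "Dodgers" for s in all_series) / len(all_series)
print(f"Dodgers win share: {dodgers_share:.3f}")
```

Because the inputs are placeholders, the printed share will not match the probabilities quoted in this post. Each simulated series is kept as a list of game winners, so the series winner is always the last entry.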
Projected World Series Win Probability from Simulations
UPDATE 10/25:
Now that we know the Dodgers won Game 1, we delved deeper into those results in our simulation. For instance, the Dodgers won Game 1 in around half of our 10,000 simulations, or 4,989 to be precise. After winning Game 1, the Dodgers go on to win the entire thing 3,248 times, or a whopping 65.1%! Interestingly, this is almost exactly the same win probability that the BaseballGauge currently gives the Dodgers (65.7%) and only three points lower than FiveThirtyEight's current prediction (68%).
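A conditional probability like this one is just a filtered count. The sketch below assumes each simulated series is stored as a list of game winners in order, which is an illustrative layout and not necessarily how the simulator stores its results. The function name `conditional_win_prob` and the toy data are hypothetical.

```python
def conditional_win_prob(all_series, prefix, team):
    """P(team wins the series | the series opened with the given game results)."""
    matching = [s for s in all_series if s[:len(prefix)] == prefix]
    if not matching:
        return None  # no simulated series started this way
    # The team that wins the last game played is the series winner.
    return sum(s[-1] == team for s in matching) / len(matching)


# Toy data: "D" = Dodgers win, "A" = Astros win.
toy = [
    ["D", "D", "D", "D"],
    ["D", "A", "A", "A", "A"],
    ["A", "A", "A", "A"],
    ["D", "A", "D", "D", "D"],
]
print(conditional_win_prob(toy, ["D"], "D"))  # 2 of the 3 series that open with "D"

# The same arithmetic with the counts quoted in the text.
print(round(100 * 3248 / 4989, 1))  # 65.1
```

The same helper works for longer prefixes, for example the first two or three game results.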
We then broke it down by the number of games it took to win the World Series. The average number of games it takes the Dodgers to win the whole thing, after winning Game 1, is 5.58. As you may guess, the probabilities of the series ending in 5 or 6 games are almost exactly the same, 29% and 28.5% respectively. The probability of a sweep has increased from 12% to 18.5%. Here is the full distribution:
The Astros are not out of it yet. However, it does get much tougher. In the 34.9% of cases where the Astros still go on to win the World Series after a Dodgers Game 1 victory, the average length of the series jumps to 6.3 games. This is in part because, in an incredible 45% of the cases where the Astros still win, the series goes all the way to seven games. Here is that breakdown:
How important is Game 2 tonight? Well, in our simulations, if the Dodgers win both Game 1 and Game 2, then they go on to win the World Series 81.1% of the time, with the Astros coming back in only 18.9%. That's a win probability swing of 16%! However, if the Astros win tonight, then the win probability swings essentially back to 50-50, with a slight edge to the Astros due to their 3 remaining home games.
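The "swing" quoted here is simply the difference between two series win probabilities, measured in percentage points. As a small sketch (the function name `win_probability_swing` is hypothetical), using the Dodgers' 65.1% after Game 1 and 81.1% if they also win Game 2:

```python
def win_probability_swing(p_before, p_after):
    """Change in series win probability caused by one game's result."""
    return p_after - p_before


swing = win_probability_swing(0.651, 0.811)
print(f"{swing:.1%}")
```

The same subtraction applied to a loss gives the downside of the game, so the two numbers together describe how much is at stake.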
To get more data in terms of specific game/win combinations, we ran the simulation again 100,000 times. None of the probabilities mentioned above changed all that much, but feel free to analyze the output yourself! The csv file can be found here. Let us know in the comments or on Twitter if you discover anything interesting! How often do the Dodgers win in Houston? How often do the Astros go down 0-3 and then win out? Put your curiosity to work!
Update 10/26:
The Astros won Game 2 in a historic game that lasted eleven thrilling innings, featuring not one, not two, but five extra-inning dingers. Catch up on all of the highlights here; trust us, you won't be disappointed. The World Series now goes to Houston with the teams tied at 1 game apiece. We again checked what our simulation had to say. Perhaps unsurprisingly, the predicted results are almost the same as those before the series started, since now it is essentially a best-of-five series. The Astros and Dodgers split the first two games in around half of our 100,000 simulations, or 49,724 to be precise. Going back to Houston all knotted up, the Astros go on to win the entire thing 25,046 times, or 50.4%. This probability is closer to the BaseballGauge's current prediction for the Astros (51.3%) than FiveThirtyEight's current prediction (46%).
We then updated the expected number of games it would take to win the World Series. The average number of games has jumped from 5.9 after Game 1 to 6.12. Obviously, the probability of a sweep is now 0.0%. Our simulation shows that in around 75% of the cases, the World Series will extend to 6 or 7 games.
If the series does end in 5, our simulation gives the Astros the better odds to sweep the next three games, most likely due to them playing at home. Additionally, the Astros have the better odds to win in 6, again due to their higher likelihood to take at least two out of three in Houston.
How important is Game 3? Our simulation shows that whichever team wins Game 3 not only takes a 2-1 series lead, but also increases its win probability to 69%! That's a win probability swing of 19%. Compared with the last two games, which had swings of 15% and 16% in the balance respectively, that is a substantial increase.
Update 10/28:
The Astros won Game 3 in what was forecast to be a pitchers' duel. Instead, Yu Darvish was chased from the game after 1.2 innings, while the Astros starter, Lance McCullers, had no problems keeping the Dodgers in line. The Dodgers got just four hits all night. The Astros? 12. Catch up on the rest of the recap here. The World Series now continues in Houston with the Astros holding a 2-1 series lead.
The Astros took a 2-1 series lead in only around 37.6% of our 100,000 simulations, or 37,640 to be precise. Interestingly, the exact procession of game wins, Dodgers, Astros, Astros, only happened in around 12.6% of the simulations. The probabilities below are for all cases where the Astros took a 2-1 series lead. Analysis showed that these probabilities were almost identical to those for the exact procession of game wins that got them there, but are based on more data. We saw that with this series lead, the Astros go on to win the entire thing 26,036 times, or 69.1%. This probability is again closer to the BaseballGauge's current prediction for the Astros (68.8%) than FiveThirtyEight's current prediction (67%). However, they are starting to converge.
We then updated the expected number of games it would take to win the World Series. The average number of games has dropped slightly from 6.12 to 6.10, due to the higher chance of the series ending in 5 games. Our simulation shows that in around 73.8% of the cases, the World Series will extend to 6 or 7 games. This is a drop of 1.2% from before Game 3.
If the series does end in 5, our simulation obviously gives the Astros a 100% chance of winning. Additionally, the Astros have the better odds to win in 6, again due to their higher likelihood to take at least two out of three in Houston. However, if the Dodgers force the series back to LA, then they have a real chance of winning out in Games 6 and 7.
How important is Game 4? Our simulation shows that if the Astros win Game 4, they not only take a commanding 3-1 series lead, but also increase their win probability to 87%! If the Dodgers win Game 4, then they can return the win probabilities to essentially 50-50, and claim home field advantage for the remaining three games. Game 4 then has a win probability swing of 19%, similar to the importance of Game 3.
Update 10/29:
The Dodgers won Game 4! The story of this game was the pitchers, but then in the top of the ninth inning it was all the Dodgers’ offense as they scored 5 runs. Both starters pitched lights out for the majority of the game. Alex Wood’s no-hitter bid was broken up with two outs in the sixth thanks to a George Springer home run. For the Astros, Charlie Morton went 6 1⁄3 innings, allowing only three hits and striking out seven, with no walks. Catch up on the rest of the recap here. The World Series now continues to a third game in Houston with the series tied at 2 games apiece.
The series was tied 2-2 in around 37.5% of our 100,000 simulations, or 37,569 to be precise. Interestingly, the exact procession of game wins, Dodgers, Astros, Astros, Dodgers, only happened in around 6.2% of the simulations, almost exactly half as many as before! Each game really does seem to be just a coin flip. The probabilities below are for all cases where the World Series was knotted at 2-2. Analysis showed that these probabilities were almost identical to those for the exact procession of game wins that got them there, but are based on more data.
We saw that with the series tied in this way, the Dodgers go on to win the entire thing 18,710 times, or 49.8%. This probability is somewhat closer to the BaseballGauge's current prediction for the Dodgers (51.7%) than FiveThirtyEight's current prediction (56%). This could be due to the fact that in a plurality of our simulations, the Astros win tomorrow and then return to LA with a 3-2 series lead. Hence, we give them a slightly more favorable outlook than these other sites.
We then updated the expected number of games it would take to win the World Series. The average number of games has jumped immensely from 6.10 to 6.5, due to the guaranteed chance that the series ends in 6 or 7 games. Our simulation shows that in 100% of the cases, the World Series will extend to 6 or 7 games. The proof of this is indeed trivial. I crack myself up sometimes. Please someone get this joke.
The only interesting note is that the simulated probabilities are essentially the same for the series ending in 6 or 7 games at this point. Since each game is essentially a coin flip now, this makes sense.
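As a sanity check on the coin-flip reading, the expected series length can be worked out exactly for an idealized series in which every game is an independent 50-50 toss. This is a simplified baseline that ignores home field advantage, not the simulation itself, and the function name `expected_length` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def expected_length(wins_a, wins_b, p_a=0.5):
    """Expected total games in a best-of-seven, given the current series score."""
    if wins_a == 4 or wins_b == 4:
        return wins_a + wins_b  # series is over
    return (p_a * expected_length(wins_a + 1, wins_b, p_a)
            + (1 - p_a) * expected_length(wins_a, wins_b + 1, p_a))


print(expected_length(2, 2))  # 6.5
print(expected_length(1, 1))  # 6.125
```

Both values sit very close to the 6.5 and 6.12 averages reported from the simulation, which is what you would expect if the two teams are nearly evenly matched.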
How important is Game 5? Our simulation shows that whichever team wins Game 5 not only takes an impressive 3-2 series lead, but also increases its win probability to 75%! Game 5 then has a win probability swing of ~25%, much greater than the relative importance of any other game so far.
However, ESPN Stats and Info thinks that Game 5 is not quite as important:
What do you think? Does Game 5 matter more or less than history says? Let us know in the comments below!
Update 10/31:
There are no words to describe Game 5. Clayton Kershaw and Dallas Keuchel, both Cy Young Award winners, entered Sunday’s showdown on normal rest. Naturally, you could reasonably expect a pitcher’s duel, something that looked like Kershaw’s seven shutout innings and Keuchel’s quality start in Game 1. You would be wrong. Instead, baseballs were literally exploding:
Instead there were 7 home runs, 25 runs, 28 hits, and 400+ pitches in the second-longest game in World Series history, and an Astros win after falling behind three times. In a word, insane. We recommend the synopsis by Grant Brisbee of SBNation here and the recap by Ashley Varela of Baseball Prospectus here.
But on to the predictions.
The World Series now returns to LA with the Houston Astros claiming a tenuous 3-2 series lead. The Astros claimed a 3-2 series lead going into Game 6 in around 31.5% of our 100,000 simulations, or 31,579 to be precise. Interestingly, the exact procession of game wins, Dodgers, Astros, Astros, Dodgers, Astros, only happened in around 3.1% of the simulations, exactly half as many as before! This again reinforces the fact that each game really does seem to be just a coin flip. The probabilities below are for all cases where the Astros claimed a 3-2 series lead. Analysis showed that these probabilities were almost identical to those for the exact procession of game wins that got them there, but are based on more data. We saw that with the Astros leading in this way, the Dodgers go on to win the entire thing 7,913 times, or 25.1%. This probability is somewhat closer to the BaseballGauge's current prediction for the Dodgers (28.6%) than FiveThirtyEight's current prediction (31%). Conversely, the Astros win 23,666 times, or 74.9%, which is comparable to the 71.4% and 69% respectively from the above two sites.
We then updated the expected number of games it would take to win the World Series. The average number of games remained at 6.5, due to the nearly equal chance that the series ends in 6 or 7 games. Since each game is essentially a coin flip now, this makes sense. There is a slight lean towards 7 games, because the Dodgers win a slightly higher percentage of Game 6 thanks to their slight home field advantage.
How important is Game 6? Our simulation shows that if the Astros win, then the probability of the Dodgers winning plummets to 0%! Obviously, this is a must-win game for the Dodgers. However, if they force a Game 7, then each team has a win probability of 50%, another net swing of 25%. In fact, we again give the Dodgers a very small edge in winning a Game 7, due to their home field advantage.
Are you watching the game tonight? Let us know! And make sure to continue to check in here after each game for updated probabilities and analysis!
UPDATE 11/01:
The greatest day of sports is here! The Dodgers won Game 6, forcing the World Series to Game 7, because of course they did. In one of the craziest World Series in history, there had to be a Game 7. However, Game 6 was a simple 3-1 game. It was a baseball game that felt familiar. Was it the calm before the storm? We will know in a couple of hours! Catch up on your Game 6 recap here.
While our initial prediction of Dodgers in six has gone out of the window, it is interesting that the Series has gone to 7 games, as 6 or 7 games have been the most likely lengths for the past week or so. Here are our final predictions!
The World Series went into Game 7 in around 31.2% of our 100,000 simulations, or 31,205 to be precise. Interestingly, the exact procession of Dodgers, Astros, Astros, Dodgers, Astros, Dodgers occurred in only 1.58% of the simulations, 1,583 times to be exact. Getting one exact sequence of heads and tails from six coin flips has a probability of 1.56%. This shows that both teams have been evenly matched this entire series, with each game being decided essentially by a coin flip.
Since there is no point in predicting series length at this point, we delved into another, perhaps more fascinating, probability. Do not listen to David Ortiz tonight. Ortiz has picked the wrong team for all 6 games so far in the World Series. The probability that David Ortiz would get all six of his picks wrong? Only 1.56%! The probability that he also misses tonight? Well, 50%, since they are independent events. But the probability of getting all 7 picks wrong is a minuscule 0.78%!
Further, a mysterious World Series gambler, who bet on the winning team in each of the first six games of the series, won $14 million after the Dodgers won last night! Similarly, that has a probability of occurring of 1.56%. Unfortunately, it has been reported that they will not be putting it all on the line again tonight, but instead will walk away with their winnings. Just another great statistical story surrounding odds and the World Series.
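The coin-flip figures quoted for Ortiz and the gambler come from multiplying independent 50% chances together. A short sketch, where the function name `exact_sequence_prob` is hypothetical and each pick is assumed to be an independent fair coin flip:

```python
def exact_sequence_prob(n_games, p=0.5):
    """Probability of one specific sequence of n independent outcomes."""
    return p ** n_games


print(f"{exact_sequence_prob(6):.2%}")  # 1.56%
print(f"{exact_sequence_prob(7):.2%}")  # 0.78%
```

Getting every pick wrong and getting every pick right are equally unlikely under this assumption, which is why both stories share the same 1.56%.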
Unlike our previous predictions, we noticed a slight difference between the win probabilities for any time the series was tied 3-3 and those for the exact procession of games that led to tonight. The Dodgers coming back last night actually lifted their probability of winning tonight by half a percent!
We saw that with the series tied through this exact procession of games, the Dodgers go on to win the entire thing 806 times, or 50.9%. This probability is somewhat closer to the BaseballGauge's current prediction for the Dodgers (53.4%) than FiveThirtyEight's current prediction (60%). Conversely, the Astros win 777 times, or 49.1%, which is comparable to the 46.4% and 40% respectively from the above two sites.
Naturally, the expected length of the World Series is now seven games. Since whichever team wins tonight wins the whole thing, tonight has an average swing in win probability of 50%! That mathematically shows that Game 7 has the most leverage of any game.
In the philosophical words of Joe Buck, "This is a must-win game for both teams!"
Thanks for stopping by, and enjoy the game!
FINAL UPDATE:
THE HOUSTON ASTROS HAVE WON THE WORLD SERIES
This has been a phenomenal series and we are glad to have shared it with you all! Make sure to check out our code on GitHub and mess around with the simulation yourself! The csv file of all 100,000 World Series results can be found here. Let us know in the comments or on Twitter if you discover anything interesting!
The SaberSmart Team
