Fall brings joy to a lot of people, what with the break from the heat of summer, the annual arrival of Pumpkin Spiced Lattes and shoes, apple picking, and of course, the Fall Classic. This year, the World Series features the regular season juggernaut Boston Red Sox and the perennial playoff participant, but never the winner, the Los Angeles Dodgers.
The Red Sox have only lost two games this postseason, cruising past the Yankees in the ALDS and crushing the Astros and their co-co-co aces in the ALCS. Meanwhile, the Dodgers were on auto-pilot in the NLDS where they destroyed the Braves, before winning an epic battle over Milwaukee in seven games in the NLCS. The Dodgers were one win from winning the World Series last year, while the Red Sox last won the championship half a decade ago in 2013.
Interestingly, the Dodgers and Red Sox have only met once before in the World Series, 102 years ago in 1916. In fact, this was so long ago that the Dodgers were known as the Brooklyn Robins, and Babe Ruth was the starting pitcher for the Red Sox!
Unlike what we did for last year’s World Series between the Astros and Dodgers, we are bringing win probabilities not just for the overall series winner, but also for each individual game. We have already provided win probabilities for each postseason game so far, including both wild cards and every game in the LDS and LCS matchups. In case you missed them, check out our Wild Card probabilities here, and our LDS and LCS probabilities here.
For the 28 games that have already been played, we provided probabilities and picks from 4 sources. We have developed two models, including the runs scored and allowed model we developed last year, as well as our new SaberSmart Simulator that we developed for this postseason.
The other two sources are FiveThirtyEight’s Elo win probabilities, adjusted for both home field advantage and starting pitcher, and Vegas’ closing money lines, which can easily be converted into win probabilities using this chart here.
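For readers without the chart handy, the conversion can be sketched in a few lines of Python. This is a minimal sketch assuming American-style money lines; the function names and the example lines of -175 and +155 are illustrative, not the actual closing lines or code used for this article.

```python
def moneyline_to_prob(moneyline: float) -> float:
    """Convert an American money line to an implied win probability."""
    if moneyline < 0:
        # Favorite: you risk |line| to win 100.
        return -moneyline / (-moneyline + 100.0)
    # Underdog: you risk 100 to win `line`.
    return 100.0 / (moneyline + 100.0)


def remove_vig(prob_a: float, prob_b: float) -> tuple[float, float]:
    """Rescale two implied probabilities so that they sum to 1."""
    total = prob_a + prob_b
    return prob_a / total, prob_b / total


# Hypothetical lines, for illustration only.
favorite = moneyline_to_prob(-175)
underdog = moneyline_to_prob(+155)
fair_favorite, fair_underdog = remove_vig(favorite, underdog)
```

The two raw implied probabilities usually sum to slightly more than 1 because of the bookmaker's margin, which is why the sketch includes an optional rescaling step.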
Runs Scored/Allowed (RS/RA) Model:
To quickly recap our runs scored/allowed model, we look at four samples of numbers: the runs scored and allowed by the home team at home, and the runs scored and allowed by the away team on the road, from every game in the 2018 season. For example, at home in 2018, the Red Sox have averaged 5.69 runs scored per game, and allowed 4.05 runs per game. The Dodgers on the road have averaged 5.34 runs scored per game and allowed 3.89 runs per game.
These 4 arrays can be modeled by the negative binomial distribution. It is well documented that scores across various sports can be modeled with the negative binomial distribution. For example, the negative binomial has been shown to accurately describe scores in baseball, soccer, rugby, and college football.
The Negative Binomial requires two inputs, the size or dispersion parameter (the shape parameter of the gamma mixing distribution) and the probability of success where prob = size/(size+mean). We are using a size/dispersion parameter of 4. As such, it is possible to model both runs scored and runs allowed for each team.
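As a quick check of this parameterization, here is a small Python sketch that computes prob from the size and mean and evaluates the resulting probability mass function. It uses only the standard library; the function names are illustrative, and the cutoff of 200 runs is an arbitrary truncation chosen so the tail is negligible.

```python
import math


def nb_prob(size: float, mean: float) -> float:
    """Probability-of-success parameter: prob = size / (size + mean)."""
    return size / (size + mean)


def nb_pmf(k: int, size: float, mean: float) -> float:
    """P(X = k) for a negative binomial with the given size and mean."""
    p = nb_prob(size, mean)
    log_pmf = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


# Red Sox runs scored at home in 2018, with size/dispersion = 4.
SIZE = 4
red_sox_home_rs = 5.69
pmf = [nb_pmf(k, SIZE, red_sox_home_rs) for k in range(200)]
implied_mean = sum(k * p_k for k, p_k in enumerate(pmf))
```

With this parameterization the mean of the distribution equals the team's average runs per game, so `implied_mean` should come back as roughly 5.69.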
Check out the histograms below to see how closely this method matches the actual runs scored for the Dodgers and Red Sox so far in 2018:
We sample from each of those 4 generated Negative Binomial distributions 100,000 times. For each set of samples, we determine a game score: the estimate for the Red Sox runs scored is the average of the Red Sox runs scored at home and the Dodgers runs allowed on the road, and the estimate for the Dodgers runs scored is the average of the Red Sox runs allowed at home and the Dodgers runs scored on the road.
The fraction of simulations in which the Red Sox outscore the Dodgers is their win probability for the game. The home and road roles are reversed to determine win probabilities for games played in Los Angeles.
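The simulation described above can be sketched in standard-library Python as follows. This is an illustration under stated assumptions, not the code from our Github: the negative binomial draws are generated as a gamma-Poisson mixture, tied simulations are simply discarded (the tie rule is an assumption), and the example runs 20,000 simulations rather than 100,000 to keep it quick.

```python
import math
import random


def sample_neg_binomial(rng: random.Random, size: float, mean: float) -> int:
    """Draw one negative binomial value as a gamma-Poisson mixture."""
    lam = rng.gammavariate(size, mean / size)  # shape = size, scale = mean / size
    # Knuth's Poisson sampler; adequate for the small rates seen in baseball.
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def home_win_probability(home_rs, home_ra, away_rs, away_ra,
                         size=4.0, n_sims=100_000, seed=0):
    """Estimate the home team's win probability from the four run averages."""
    rng = random.Random(seed)
    home_wins = away_wins = 0
    for _ in range(n_sims):
        # Home runs: average of home offense and away pitching samples.
        home_score = (sample_neg_binomial(rng, size, home_rs)
                      + sample_neg_binomial(rng, size, away_ra)) / 2
        # Away runs: average of home pitching and away offense samples.
        away_score = (sample_neg_binomial(rng, size, home_ra)
                      + sample_neg_binomial(rng, size, away_rs)) / 2
        if home_score > away_score:
            home_wins += 1
        elif away_score > home_score:
            away_wins += 1
    # Assumption: ties are dropped rather than split between the teams.
    return home_wins / (home_wins + away_wins)


# Red Sox at home vs. Dodgers on the road, using the 2018 averages above.
p_home = home_win_probability(5.69, 4.05, 5.34, 3.89, n_sims=20_000)
```

Because the two estimated scoring averages are so close, the result lands only slightly above a coin flip, and the exact figure depends on the seed and the tie rule.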
SaberSmart Simulator (SSS):
However, the Runs Scored/Runs Allowed model is limited and does not consistently beat out FiveThirtyEight Elo or Vegas. This year, we have actually developed a new, and proven better, model for the 2018 postseason using our Bayesian expected wins, taking into account the inherent randomness of the game and adjusting for home and road advantage or disadvantage.
According to this research study, How Often Does the Best Team Win? A Unified Approach to Understanding Randomness in North American Sport:
The median probability of the best team winning a neutral site game is highest in the NBA (67%), followed in order by the NFL (64%), NHL (57%), and MLB (56%).
We can determine the probability of one team being the “best” by sampling from both posterior distributions that model each team’s winning percentage for the 2018 season. We have used these posterior distributions to accurately predict, almost a month out, when the Red Sox would reach their franchise-record 106th win, as well as to determine accurate playoff odds from early in the season.
Through the end of the LCS, the posterior distributions for both the Red Sox and the Dodgers can be modeled by Beta(155.81, 87.68) and Beta(140.80, 103.74) respectively:
From this graph we can see that it is likely that the Red Sox are the truly better team and that their season was not a lucky fluke. After sampling, we found that the Red Sox have a probability of 92.7% of being better than the Dodgers.
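The sampling step can be reproduced with a short Monte Carlo sketch in Python. It assumes the two posteriors are independent and uses the Beta parameters quoted above; the function name, sample count, and seed are illustrative choices.

```python
import random


def prob_first_is_better(a1, b1, a2, b2, n_samples=100_000, seed=0):
    """Estimate P(X > Y) for X ~ Beta(a1, b1) and Y ~ Beta(a2, b2)."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(a1, b1) > rng.betavariate(a2, b2)
        for _ in range(n_samples)
    )
    return wins / n_samples


# Red Sox posterior vs. Dodgers posterior through the end of the LCS.
p_red_sox_better = prob_first_is_better(155.81, 87.68, 140.80, 103.74)
```

With 100,000 draws the estimate should sit within a fraction of a percentage point of the 92.7% figure, with the remaining wobble coming from sampling noise.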
Using this number, and the 56% constant from above, we can calculate an unadjusted win probability for a neutral site matchup between the Red Sox and Dodgers:
0.927*0.56 + 0.073*0.44 = 55.1% in favor of the Red Sox or 44.9% in favor of the Dodgers.
Obviously though, none of the games in the World Series will be played at a neutral site and so we have to adjust for home field advantage, and road disadvantage. This adjustment can be calculated by looking at a team’s home field win percentage and road win percentage and determining the percent increase (or decrease) in wp% when playing at home (or on the road) over average.
For example, the Red Sox had a 0.704 home win percentage in 2018. They had an overall win percentage of 0.667. Their home field advantage multiplier is therefore:
1 + (0.704 - 0.667)/0.667 = 1.055
The Dodgers actually had a better road record than home record in 2018. This means they get a road field advantage multiplier of:
1 + (0.580 - 0.564)/0.564 = 1.028
Using these numbers, we can adjust the neutral site probability above:
Probability that the Red Sox win at home: 0.551 * 1.055 = 0.581
Probability that the Dodgers lose on the road = 1 - (0.449 * 1.028) = 0.538
Final win probability for the Red Sox at home: (0.581 + 0.538)/2 = 56%
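The arithmetic above, from the neutral site probability through the home and road adjustments, can be collected into a few small Python functions. This is a sketch of the calculation as described in this section; the function names are illustrative, and the inputs are the regular season figures quoted above.

```python
BEST_TEAM_WIN_PROB = 0.56  # MLB constant from the study quoted above


def neutral_site_prob(p_better: float, best=BEST_TEAM_WIN_PROB) -> float:
    """Win probability at a neutral site given P(team is the better team)."""
    return p_better * best + (1 - p_better) * (1 - best)


def advantage_multiplier(split_wp: float, overall_wp: float) -> float:
    """Percent change in win percentage for a home or road split."""
    return 1 + (split_wp - overall_wp) / overall_wp


def home_win_prob(p_home_better: float, home_mult: float, road_mult: float) -> float:
    """Average the home team's view and the road team's view of the game."""
    neutral_home = neutral_site_prob(p_home_better)
    home_view = neutral_home * home_mult            # home team wins at home
    road_view = 1 - (1 - neutral_home) * road_mult  # road team loses on the road
    return (home_view + road_view) / 2


red_sox_home_mult = advantage_multiplier(0.704, 0.667)
dodgers_road_mult = advantage_multiplier(0.580, 0.564)
p_red_sox_home = home_win_prob(0.927, red_sox_home_mult, dodgers_road_mult)
```

Carrying full precision through every step, instead of rounding at each line as in the worked example, still rounds to the same 56%.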
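A change we made for the World Series is to use the Pythagorean expectation for home and road win percentage. This win percentage is derived from runs scored (RS) and runs allowed (RA) by the following formula:
Win% = RS^k / (RS^k + RA^k), where the exponent k is 2 in the classic form of the formula.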
Here the runs scored and runs allowed are taken from the appropriate home/road splits. The advantage multipliers changed to 1.028 and 1.033 respectively. Also unlike our Wild Card, LDS, and LCS probabilities, we included data from the postseason as opposed to solely regular season results.
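As a rough illustration of the Pythagorean expectation applied to a home split, here is a minimal Python sketch. The exponent of 2 is the classic choice and is an assumption here, since the exponent actually used is not stated. The inputs are the regular season home averages quoted earlier, so this example is not expected to reproduce the 1.028 and 1.033 multipliers, which also include postseason data.

```python
def pythagorean_expectation(runs_scored: float, runs_allowed: float,
                            exponent: float = 2.0) -> float:
    """Expected win percentage from runs scored and runs allowed.

    The exponent defaults to the classic value of 2 (an assumption).
    """
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Red Sox at home in the 2018 regular season: 5.69 scored, 4.05 allowed per game.
red_sox_home_pyth = pythagorean_expectation(5.69, 4.05)
```

Per-game averages work just as well as season totals in this formula, because the ratio is unchanged when both inputs are scaled by the same number of games.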
So how do we expect our probabilities to do? Well if the last 28 postseason games are any indicator, pretty darn well.
Using a hard cutoff of >0.5 to predict the winner of each postseason game, our SaberSmart Simulator (SSS) has correctly predicted 18/28 (64.3%), FiveThirtyEight Elo 13/27 (48.1%) with one wash, RS/RA 13/26 (50%) with two washes, and Vegas 14/28 (50%).
One metric for measuring the accuracy of win probabilities is the Brier score, which here is the aggregate squared error. A baseline prediction of 50/50 for a game has an error of (1-.5)^2 = 0.25, so the baseline Brier score for 28 games is 28*.25 = 7.00. If we are 100% confident in a winner, and they do win, then we have a Brier score of (1-1)^2 = 0 for that game. Since Brier scores are aggregated, or summed after each game, the lower your Brier score, the more accurate your predictions.
Over the last 28 postseason games, including the wild cards, here are the Brier Scores:
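The hard-cutoff scoring can be sketched as a short Python function. It assumes that a "wash" means a prediction of exactly 50%, which is left out of the denominator; that reading and the function name are assumptions. The example inputs are the SaberSmart Simulator's Boston probabilities for World Series Games 1 through 5, listed later in this article, along with whether Boston won each game.

```python
def hit_rate(team_probs, team_won):
    """Count correct picks using a hard cutoff of > 0.5.

    Returns (correct, decided). Predictions of exactly 0.5 are treated
    as washes and skipped (an assumption about how washes are defined).
    """
    correct = decided = 0
    for prob, won in zip(team_probs, team_won):
        if prob == 0.5:
            continue  # wash: no pick was made
        decided += 1
        if (prob > 0.5) == won:
            correct += 1
    return correct, decided


# SSS probabilities for Boston in World Series Games 1-5, and the results.
sss_boston_probs = [0.549, 0.553, 0.553, 0.551, 0.554]
boston_won = [True, True, False, True, True]
correct, decided = hit_rate(sss_boston_probs, boston_won)
```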
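The summed Brier score described above takes only a few lines of Python. This sketch follows the aggregate (summed, not averaged) definition used in this article; the function name is illustrative.

```python
def brier_sum(win_probs, outcomes):
    """Sum of squared errors between predicted probabilities and results.

    `outcomes` holds 1 if the predicted team won and 0 otherwise.
    """
    return sum((outcome - prob) ** 2 for prob, outcome in zip(win_probs, outcomes))


# Baseline: a 50/50 prediction for each of 28 games scores 28 * 0.25.
baseline = brier_sum([0.5] * 28, [1] * 28)

# A fully confident, correct prediction contributes nothing.
perfect_game = brier_sum([1.0], [1])
```

Dividing the sum by the number of games gives the more common averaged form of the Brier score, which makes it easier to compare across samples of different sizes.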
538 Elo: 6.845
Vegas has actually been worse than the baseline! Interestingly, both Vegas and FiveThirtyEight went 0/5 in the ALCS, while the RS/RA model went 1/5 and our SaberSmart Simulator went 4/5.
We will most likely delve more into Brier scores and other error metrics in our retrospective article coming after the World Series. Our biggest miss so far has been the NL Wild Card, while the biggest Vegas upset was the ALCS Game 5 (which we called right), and the biggest FiveThirtyEight upset was Game 3 of the second NLDS when Atlanta beat the Dodgers.
With the explanation behind our model out of the way, let’s get into the numbers!
World Series: BOS vs LAD
Game 1: 10/23/2018
SaberSmart Simulator: BOS 54.9%, LAD 45.1%
Runs Scored/Allowed: BOS 51.4%, LAD 48.6%
FiveThirtyEight Elo: BOS 64.9%, LAD 35.1%
Vegas: BOS 63.6%, LAD 36.4%
Final Score: BOS 8 - 4
Overall Winner (10/22/2018):
SaberSmart Simulator: BOS 65.8%, LAD 34.2%
Runs Scored/Allowed: BOS 54.1%, LAD 45.9%
FiveThirtyEight Elo: BOS 60%, LAD 40%
Game 2: 10/24/2018
SaberSmart Simulator: BOS 55.3%, LAD 44.7%
Runs Scored/Allowed: BOS 51.8%, LAD 48.2%
FiveThirtyEight Elo: BOS 62.0%, LAD 38.0%
Vegas: BOS 59.5%, LAD 40.5%
Final Score: BOS 4 - 2
Overall Winner (10/23/2018):
SaberSmart Simulator: BOS 74.9%, LAD 25.1%
Runs Scored/Allowed: BOS 69.6%, LAD 30.4%
FiveThirtyEight Elo: BOS 72%, LAD 28%
Game 3: 10/26/2018
SaberSmart Simulator: BOS 55.3%, LAD 44.7%
Runs Scored/Allowed: BOS 52.6%, LAD 47.4%
FiveThirtyEight Elo: BOS 44.4%, LAD 55.6%
Vegas: BOS 35.7%, LAD 64.3%
Final Score: LAD 3 - 2
Overall Winner (10/25/2018):
SaberSmart Simulator: BOS 87.3%, LAD 12.7%
Runs Scored/Allowed: BOS 84.1%, LAD 15.9%
FiveThirtyEight Elo: BOS 86%, LAD 14%
Game 4: 10/27/2018
SaberSmart Simulator: BOS 55.1%, LAD 44.9%
Runs Scored/Allowed: BOS 52.5%, LAD 47.5%
FiveThirtyEight Elo: BOS 49.2%, LAD 50.8%
Vegas: BOS 51.2%, LAD 48.8%
Final Score: BOS 9 - 6
Overall Winner (10/26/2018):
SaberSmart Simulator: BOS 76.2%, LAD 23.8%
Runs Scored/Allowed: BOS 72.1%, LAD 27.9%
FiveThirtyEight Elo: BOS 75%, LAD 25%
Game 5: 10/28/2018
SaberSmart Simulator: BOS 55.4%, LAD 44.6%
Runs Scored/Allowed: BOS 53.2%, LAD 46.8%
FiveThirtyEight Elo: BOS 51.6%, LAD 48.4%
Vegas: BOS 40%, LAD 60%
Final Score: BOS 5 - 1
Overall Winner (10/27/2018):
SaberSmart Simulator: BOS 91.1%, LAD 8.9%
Runs Scored/Allowed: BOS 89.3%, LAD 10.7%
FiveThirtyEight Elo: BOS 93%, LAD 7%
Unlike last year, when the numbers essentially indicated a tight, evenly matched series, the Dodgers look like clear underdogs to the Red Sox this time. Both our simulator and FiveThirtyEight give the Red Sox at least 60% odds of winning the whole thing, while our runs scored/allowed model is a little more conservative.
The Dodgers’ offense is going to have to match the 5 runs per game they averaged on the road to stand a chance against the bats of the Red Sox. This is baseball, so literally anything can happen, but we’ll still take the Red Sox as medium to heavy favorites.
Check back often; we’ll be updating our World Series win probabilities after every game!
Who do you think will win the World Series, the Red Sox or the Dodgers? Do you have any questions or suggestions about either our runs scored/allowed model or our SaberSmart Simulator? Let us know your thoughts in the comment section below! As always, our code and data can be found on our Github.
The SaberSmart Team