Last fall, we generated win probabilities for all 33 games of the MLB postseason: the 2 wild-card games, the 26 divisional and championship round games, and the 5 World Series games. Now it is finally time to look back at how we did.
For each game, we compared win probabilities from six sources: two “baselines” (50/50 odds for every game, and the home team winning every game), two for-profit sources (Vegas betting lines and FiveThirtyEight’s Elo), and our own two win probability models, Runs Scored/Runs Allowed wp% (RS/RA), created in 2017, and our Bayesian SaberSmart Simulator, created in 2018.
To read more about these methodologies, check out our articles describing our World Series Probabilities, the Divisional/Championship Probabilities, and the Wild Card Probabilities. FiveThirtyEight's Elo predictions can be found here. Historical MLB money lines can be found here, and a chart to convert them to win percentages can be found here. All of our Vegas probabilities were generated by converting those money lines with that chart.
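The money-line conversion can be sketched in a few lines. This is a minimal sketch using the standard American-odds formulas, and the function names are hypothetical. The second function, which rescales both sides so they sum to 1 and strips out the bookmaker's margin (the vig), is an assumption: the conversion chart may or may not remove the vig.

```python
def implied_probability(moneyline: int) -> float:
    """Convert an American money line to the bookmaker's implied win probability.

    Negative lines (favorites) give the amount risked to win 100;
    positive lines (underdogs) give the amount won on a 100 risk.
    """
    if -100 < moneyline < 100:
        raise ValueError("American money lines are <= -100 or >= +100")
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


def no_vig_probabilities(home_line: int, away_line: int) -> tuple[float, float]:
    """Hypothetical helper: rescale both implied probabilities so they sum to 1."""
    home = implied_probability(home_line)
    away = implied_probability(away_line)
    total = home + away  # usually a little above 1 because of the vig
    return home / total, away / total


# Hypothetical lines: home favorite at -150, away underdog at +130.
print(no_vig_probabilities(-150, 130))  # roughly 0.58 home, 0.42 away
```

Without the rescaling step, the two raw implied probabilities in this example add up to about 1.035, which is the bookmaker's built-in edge.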
Our first retrospective metric for evaluating these win probabilities is simply the number of games each source called correctly. If a source gave a team better than a 50% chance of winning and that team won, it counts as a correct call.
Correctly Called Wins:
Interestingly, in a straight binary pick of the winner, Elo, RS/RA, and Vegas only barely beat the strategy of always picking the home team. Only our Bayesian Simulator clearly outperformed that baseline.
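The correct-call rule above can be sketched as a small counting function. It is a minimal sketch that assumes every probability is stated from the home team's perspective. The function name and the three-game example are hypothetical and are not real postseason data.

```python
def correct_calls(home_win_probs: list[float], home_won: list[bool]) -> int:
    """Count games where the team given better than a 50% chance actually won.

    Probabilities are from the home team's perspective. An exact 50% prediction
    favors neither team, so under this rule it never counts as a correct call.
    """
    if len(home_win_probs) != len(home_won):
        raise ValueError("need one result per prediction")
    count = 0
    for p_home, won in zip(home_win_probs, home_won):
        favored_home_won = p_home > 0.5 and won
        favored_away_won = p_home < 0.5 and not won
        if favored_home_won or favored_away_won:
            count += 1
    return count


# Hypothetical three games: a correct home pick, a wrong away pick, and a 50/50 non-pick.
print(correct_calls([0.60, 0.45, 0.50], [True, True, False]))  # 1
```

Under this rule, always picking the home team (a probability of 1.0 every game) gets exactly as many correct calls as there were home wins, and the 50/50 baseline gets none.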
A better way than a binary right/wrong indicator to measure the accuracy of probabilistic predictions is the Brier Score. It applies to tasks in which predictions assign probabilities to a set of mutually exclusive discrete outcomes, such as whether a team wins or loses a game.
The formula for a Brier Score is simply the squared error. For each event, square the difference between the predicted probability and whether or not the event actually occurred (0 or 1). Then sum them all up.
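Written out, with p_i the predicted probability for event i and o_i equal to 1 if the event happened and 0 if it did not, the total over N events is shown below, followed by two one-event examples. Many references divide this sum by N to get a mean Brier Score; when every model is scored on the same 33 games, that only rescales the totals and does not change the ranking.

```latex
\mathrm{BS} = \sum_{i=1}^{N} \left( p_i - o_i \right)^2

% A 70% forecast for an event that happens contributes
(0.7 - 1)^2 = 0.09
% The same 70% forecast for an event that does not happen contributes
(0.7 - 0)^2 = 0.49
```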
My favorite explanation comes from Wikipedia’s Brier Score article, which walks through scoring a local weatherman’s rain forecasts.
Since Brier Scores are a cost function, the lower a model’s total Brier Score, the better its predictions.
We can run through our six models again to see which had the lowest Brier Score. If a model cannot beat the 50/50 baseline, which gives every game even odds regardless of the matchup, then it can be seen as no better than random guessing.
Perhaps unsurprisingly, the Home/Away baseline had the highest Brier Score, because of the “all-in” nature of its predictions: it gives the home team a 100% chance every game, so every home loss adds a full point to its score.
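Both baselines can be scored without knowing anything about the matchups. This minimal sketch assumes 17 home wins, which is what the 51.5% home-win rate reported later in this post works out to over 33 games. The function and variable names are hypothetical.

```python
def total_brier(probs: list[float], outcomes: list[int]) -> float:
    """Sum of squared errors between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("need one outcome per prediction")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes))


N_GAMES = 33
HOME_WINS = 17  # assumption: 51.5% of 33 games

# Outcomes from the home team's perspective: 1 = home win, 0 = home loss.
# The order of wins and losses does not matter for either baseline.
outcomes = [1] * HOME_WINS + [0] * (N_GAMES - HOME_WINS)

coin_flip = total_brier([0.5] * N_GAMES, outcomes)    # 0.25 per game, win or lose
home_always = total_brier([1.0] * N_GAMES, outcomes)  # 0 per home win, 1 per home loss

print(coin_flip, home_always)  # 8.25 16.0
```

The 50/50 baseline's total is fixed at 0.25 per game no matter who wins, which is what makes it a useful floor for "better than guessing."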
All four non-baseline models beat the 50/50 baseline, so they are at least better than random guessing. Unfortunately, they are all still very close together. To better show the size of the differences, we can rescale the Brier Scores onto a scale that is easier to read.
FiveThirtyEight’s NFL Predictions game uses a normalized version of Brier Scores to generate Game Points. Instead of being charged a cost, a prediction is rewarded with Game Points for having a low Brier Score. Game Points range from -75 to 25, depending on the outcome and the probability.
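The formula for calculating Game Points from the home and away win probabilities is:

$$\text{GP} = H\left[25 - 100\,(1 - p_{\text{home}})^2\right] + (1 - H)\left[25 - 100\,(1 - p_{\text{away}})^2\right]$$

where $p_{\text{home}}$ and $p_{\text{away}} = 1 - p_{\text{home}}$ are the model’s win probabilities and $H$ is 1 if the home team wins and 0 if it loses. Depending on whether or not the home team wins, only one half of the equation is used, as the other half goes to 0.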
For a game where a model gives the home team a probability of 1, the model gets 25 points if the home team wins and -75 points if it loses, while a 50/50 prediction always scores 0. This mirrors the squared error of the Brier Score: like Brier Scores, Game Points heavily penalize overconfidence in wrong predictions.
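The per-game scoring rule can be sketched as a one-line function. This minimal sketch assumes the form 25 - 100 × (p - o)², with p the home team's win probability and o equal to 1 for a home win and 0 for a home loss. That form reproduces the -75 to 25 range and the Game Point values quoted for individual games in this post. The function name is hypothetical.

```python
def game_points(p_home: float, home_won: bool) -> float:
    """Game Points for one game: 25 minus 100 times the squared error.

    Ranges from +25 (100% on the winner) to -75 (100% on the loser);
    a 50/50 prediction always scores 0.
    """
    outcome = 1.0 if home_won else 0.0
    return 25 - 100 * (p_home - outcome) ** 2


# A 60% pick on a home team that loses: 25 - 100 * 0.6**2
print(round(game_points(0.60, home_won=False), 2))  # -11.0
```

Because p_away = 1 - p_home, the two halves of the two-probability formula collapse into this single squared-error term.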
Here are our six models again, scored in Game Points instead of Brier Scores.
Rescaling each model’s Brier Score into Game Points shows that Elo and our Bayesian Simulator outperformed Vegas and RS/RA by a much wider margin than the raw Brier Scores suggested.
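Assuming the per-game form 25 - 100(p_i - o_i)², a model's total Game Points are a fixed linear function of its total Brier Score, using the same p_i, o_i, and N as in the Brier Score formula:

```latex
\text{GP}_{\text{total}}
  = \sum_{i=1}^{N} \left[ 25 - 100\,(p_i - o_i)^2 \right]
  = 25N - 100 \sum_{i=1}^{N} (p_i - o_i)^2
  = 25N - 100\,\mathrm{BS}
```

With N = 33 this is 825 - 100 BS, so the 50/50 baseline (BS = 33 × 0.25 = 8.25) lands at exactly 0, and every 0.01 of difference in total Brier Score becomes a full Game Point. The ranking does not change; the rescaling only makes the gaps easier to see.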
Since we know the Game Points for every game for every model, we can also review some of our best calls and worst ones.
Our Bayesian SaberSmart Simulator’s best game, in Game Points, was the AL Wild Card game. Our model gave the Yankees a 56.7% chance of beating the Oakland A’s, our highest win probability of the entire postseason, and the Yankees won. In a neat bit of symmetry, the NL Wild Card game was our worst call: we gave the Cubs a 56.6% chance of winning at home against the Rockies, who won instead.
Elo’s best game was Game 2 of the Divisional Round between the Los Angeles Dodgers and Atlanta Braves, played in LA. FiveThirtyEight gave the Dodgers a 66.5% chance of winning the game, and of course, they did. Unfortunately for FiveThirtyEight, its worst pick came in the very next game, Game 3 of the same series, held in Atlanta. Elo gave the Dodgers a 64.7% chance of winning Game 3, which would have clinched the series. However, the Braves pulled off the upset, handing the Elo model -16.86 Game Points.
Honestly, it was nerve-wracking watching each model accumulate Game Points, and it all came down to the final game of the playoffs, World Series Game 5. After Game 4, Elo had a slim half-point lead over our Bayesian SaberSmart Simulator, and Vegas had a much larger 13-point lead over our RS/RA model.
Game 5 was played in Los Angeles between the Red Sox and the Dodgers, who were fighting for their World Series lives. Vegas was the only source giving the Dodgers the edge, with a 60% chance to win. Elo gave the Dodgers a 48.4% chance, RS/RA a 46.8% chance, and the SaberSmart Simulator only a 44.6% chance. The Dodgers lost, resulting in Game Points of -11, +1.57, +3.10, and +5.11, respectively. That was enough to give the SaberSmart Simulator the overall victory, and to let our RS/RA model leapfrog Vegas.
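The final-game swing can be checked with the numbers quoted above. This minimal sketch assumes the per-game form 25 - 100 × (p - o)² and treats the "half-point" and "13-point" leads after Game 4 as exactly 0.5 and 13, so the final margins are approximate. The function name is hypothetical.

```python
def game_points(p_home: float, home_won: bool) -> float:
    """25 minus 100 times the squared error of the home-win probability."""
    outcome = 1.0 if home_won else 0.0
    return 25 - 100 * (p_home - outcome) ** 2


# Dodgers (home team) win probabilities for World Series Game 5, as quoted above.
game5 = {"Vegas": 0.600, "Elo": 0.484, "RS/RA": 0.468, "SaberSmart": 0.446}
points = {model: game_points(p, home_won=False) for model, p in game5.items()}

# Margins after Game 4 (assumed exact): Elo led by 0.5, Vegas led RS/RA by 13.
sim_minus_elo = -0.5 + points["SaberSmart"] - points["Elo"]
rsra_minus_vegas = -13 + points["RS/RA"] - points["Vegas"]

# Both trailing models finish ahead: about 3.03 and 1.1 Game Points.
print(round(sim_minus_elo, 2), round(rsra_minus_vegas, 2))
```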
Our new Bayesian SaberSmart Simulator beat Elo in both Brier Score and the number of games called correctly. Because the Brier Scores were so close, though, we can’t say definitively that our model is better than Elo, only that it did better over this 33-game sample of the MLB postseason.
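One way to put a number on "too close to call" is a paired bootstrap over games. This is a technique we are adding for illustration, not something this analysis ran. The sketch assumes you have each model's per-game Brier Scores (not listed in this post). It resamples games with replacement, keeping each game's pair of scores together, and reports a 95% interval for the mean per-game difference. The function name and parameters are hypothetical.

```python
import random


def paired_bootstrap_diff(scores_a: list[float], scores_b: list[float],
                          n_resamples: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """95% percentile interval for the mean per-game Brier difference (a - b).

    If the interval contains 0, this sample cannot separate the two models.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need matching, non-empty per-game score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(diffs)
    # Resample whole games (pairs of scores) and record each resample's mean difference.
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lo_idx = round(0.025 * (n_resamples - 1))
    hi_idx = round(0.975 * (n_resamples - 1))
    return means[lo_idx], means[hi_idx]
```

With only 33 games and per-game differences of a few hundredths, an interval like this will usually straddle 0, which is the sense in which the Simulator's win over Elo is not yet conclusive.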
The real question is why Vegas did so poorly. We think the answer lies in how much home-field advantage each model built into its predictions.
On average, Vegas gave the home team a 56.7% chance of winning, Elo 55.6%, RS/RA 48.7%, and our SaberSmart Simulator 50.7%. The home team actually won only 51.5% of the games in the postseason (17 of 33). In baseball, the market and conventional wisdom might be overestimating how much home-field advantage matters.
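A quick way to compare those averages with what happened is a calibration-in-the-large check: the mean predicted home-win probability minus the observed home-win rate. This is a simple check we are adding for illustration, using only the averages quoted above; the variable names are hypothetical. It ignores matchup strength, so it shows overall home bias, not game-by-game accuracy.

```python
# Average predicted home-win probability for each source, as quoted above.
avg_home_prob = {"Vegas": 0.567, "Elo": 0.556, "RS/RA": 0.487, "SaberSmart": 0.507}
observed_home_rate = 17 / 33  # 51.5% of the 33 postseason games

# Positive bias = the source expected more home wins than actually happened.
bias = {model: p - observed_home_rate for model, p in avg_home_prob.items()}

# Print from least to most biased, in percentage points.
for model, b in sorted(bias.items(), key=lambda kv: abs(kv[1])):
    print(f"{model:>10}: {b * 100:+.1f} percentage points")
```

By this measure Vegas leaned about 5 points too far toward the home team and Elo about 4, while our two models ended up within 3 points on the other side.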
What do you think about how poorly Vegas did this past postseason? Do you have any questions or suggestions about our Runs Scored/Runs Allowed model or our SaberSmart Simulator? Let us know your thoughts in the comment section below! As always, our code and data can be found on our GitHub.
The SaberSmart Team