Predicting who will win baseball games is a hard task. Baseball is notorious for its high degree of luck, for regression to the mean, and for making it impossible to judge a player’s “true” talent based on only a few games.
For examples of the small sample size issue, here is some further reading from FanGraphs, Baseball Prospectus (BP), and myself. First, The Meaning of Small Sample Data, by Dave Cameron. Next, Predicting MLB Records from a “Small” Sample Size, written by yours truly. Finally, Baseball Therapy: It’s a Small Sample Size After All, by Russell Carleton of Baseball Prospectus.
Now, on August 21, 2019, the Houston Astros hosted the Detroit Tigers in Houston. Here's the historic gamecast link.
Just for some perspective, at the time the Astros had a record of 81-46, while the Tigers were 37-86. If the two teams had been in the same division, the Tigers would have been 42 games back of the Astros. The Astros were 4-6 in their last 10 games, while the Tigers were skidding at 2-8.
Further, the Astros were an astonishing 45-10 at home at that point in the season, for a home win percentage of 81.8%. The Tigers were 20-43 on the road, for an away win percentage of 31.7%.
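The standings arithmetic above is easy to check. Below is a minimal sketch using the standard games-back formula (the average of the win gap and the loss gap) and plain winning percentage; the function names are illustrative only, and the inputs are the records quoted above.

```python
def win_pct(wins: int, losses: int) -> float:
    """Winning percentage: wins divided by total decisions."""
    return wins / (wins + losses)


def games_back(leader_w: int, leader_l: int, team_w: int, team_l: int) -> float:
    """Standard games-back formula: average of the win gap and the loss gap."""
    return ((leader_w - team_w) + (team_l - leader_l)) / 2


# Records quoted above, at the time of the August 21, 2019 game.
print(games_back(81, 46, 37, 86))   # Tigers behind Astros
print(round(win_pct(45, 10), 3))    # Astros at home
print(round(win_pct(20, 43), 3))    # Tigers on the road
```

The win gap is 44 and the loss gap is 40, so the Tigers would have sat 42 games back.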
Finally, the Astros had Justin Verlander as their starting pitcher: owner of one Cy Young Award, a 15-5 record, and a sub-1.00 WHIP at the time. AKA ELITE. The Tigers sent out, well, not Verlander. Their starter, Daniel Norris, had a 3-10 record, a 4.7 ERA, and a 1.4 WHIP.
Here’s a quick comparison of their pitcher game scores, from FiveThirtyEight. By this metric, Verlander was more than 1.5 times better than Norris.
All of this culminated in one of the most lopsided games in history, according to Vegas. Or it would have been, if the Astros had actually won. For a quick comparison, the closing Astros betting line was roughly equivalent to an NFL team laying 25 points, or as many as 30 according to the Philadelphia Inquirer.
According to OddsPortal.com, which tracks closing MLB moneylines, the Astros closed as monster favorites at -526.
What does this mean, though? Well, we can translate a negative moneyline into an implied win probability by dividing its absolute value by that same value plus 100. So 526/(526+100) = 526/626 ≈ 84%.
This is a STUPIDLY high win probability.
If this particular Astros/Tigers game were played 100 times, Vegas would expect the Astros to win 84 of them.
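Here is a small sketch of that moneyline conversion, covering both favorites (negative lines) and underdogs (positive lines, where the formula flips to 100/(line + 100)). The function name is illustrative. Note that a probability implied by one side's line still includes the sportsbook's margin (the "vig"). Removing it would require the Tigers' line as well, which isn't quoted here.

```python
def implied_probability(moneyline: int) -> float:
    """Convert an American moneyline into the win probability it implies.

    Negative lines (favorites): |line| / (|line| + 100)
    Positive lines (underdogs): 100 / (line + 100)
    The result still includes the sportsbook's margin (the vig).
    """
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)


# Astros closing line from OddsPortal.com: -526
print(round(implied_probability(-526), 3))
```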
In fact, Houston was the heaviest favorite in any game in at least the past 15 seasons, according to the sports betting database BetLabsSports.com, and quite possibly for longer.
One of my favorite websites of all time, fivethirtyeight.com, also publishes win probabilities for individual games. It gave the Astros an 80.6% chance of winning, or roughly 81 times out of 100.
Which, by the way, is HILARIOUS.
I downloaded FiveThirtyEight’s Elo MLB game probabilities for every game from the 2019 regular season so far. That data is available here.
That 8/21 game was the first matchup all season in which Elo gave either team more than an 80% win probability. The next highest were 78% for HOU over DET on 8/19, then 77% for HOU over BAL on 6/7. For the record, HOU won both of those games.
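A scan like this is simple to script. The sketch below is not FiveThirtyEight's actual file format. The column names (`date`, `team1`, `team2`, `prob1`) are assumptions, and the inline sample contains only the three games mentioned above, with team order carrying no home/away meaning. It takes whichever side is favored in each game and keeps the games above a threshold, most lopsided first.

```python
import csv
import io

# Hypothetical stand-in for a game-level probability file. The column names
# are assumptions, and only the three games discussed above are included.
SAMPLE_CSV = """date,team1,team2,prob1
2019-06-07,HOU,BAL,0.77
2019-08-19,HOU,DET,0.78
2019-08-21,HOU,DET,0.806
"""


def favorites_above(rows, threshold):
    """Return (date, favorite, underdog, prob) for every game where the
    favored side's win probability exceeds threshold, most lopsided first."""
    picks = []
    for row in rows:
        p1 = float(row["prob1"])
        if p1 >= 0.5:
            fav, dog, p = row["team1"], row["team2"], p1
        else:
            fav, dog, p = row["team2"], row["team1"], 1 - p1
        if p > threshold:
            picks.append((row["date"], fav, dog, p))
    return sorted(picks, key=lambda pick: pick[3], reverse=True)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(favorites_above(rows, 0.80))  # only the 8/21 game clears 80%
print(favorites_above(rows, 0.75))  # all three games
```

Pointing `rows` at a full season's file (after mapping its real column names) would reproduce the search described above.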
Anyone who has watched baseball knows a single game can be unbelievably random. Research has shown that in a single MLB game at a neutral site, the better team wins only about 56% of the time.
According to this research study, How Often Does the Best Team Win? A Unified Approach to Understanding Randomness in North American Sport:
The median probability of the best team winning a neutral site game is highest in the NBA (67%), followed in order by the NFL (64%), NHL (57%), and MLB (56%).
However, that number ignores home-field advantage and any other game context. I developed my own single-game predictor: a Bayesian probability model that adjusts for Pythagorean win percentage, home-field advantage, and record over the last 10 games, and that accounts for the inherent randomness of the sport.
If you want a more in-depth read about how this model works, I recommend reading my articles on the 2018 World Series, where I put this model to practical use.
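The Bayesian model itself is not reproduced in this post. As a rough illustration of two standard building blocks, the sketch below shows Bill James's Pythagorean expectation (expected winning percentage from runs scored and allowed) and his log5 formula (head-to-head win probability from two teams' winning percentages). This is a swapped-in, much cruder technique, not the model described above. It has no home-field term, no recent-form term, and no shrinkage toward .500.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float,
                    exponent: float = 2.0) -> float:
    """Pythagorean expectation: RS^x / (RS^x + RA^x).
    x = 2 is the classic version; about 1.83 is a common refinement."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def log5(p_a: float, p_b: float) -> float:
    """Log5 estimate of the chance team A beats team B, given each team's
    winning percentage. Assumes a neutral site (no home-field term)."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# Hypothetical team scoring 800 runs and allowing 600 (not real data).
print(round(pythagorean_pct(800, 600), 3))

# Raw season records from above: Astros 81-46, Tigers 37-86.
print(round(log5(81 / 127, 37 / 123), 3))
```

Fed nothing but the raw records, log5 lands at about 80% for Houston, close to the Elo number. Part of the point of adding priors and randomness adjustments is to pull such estimates back toward a coin flip.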
My model, in its current state, gave the Astros only a 68% chance of winning that game against Detroit. That still made the Astros heavy favorites, but because my model was less overconfident than Elo and Vegas, its prediction earned a lower (better) Brier score.
Brier scores are essentially mean squared errors for probabilities, so lower is better. If a team won and you gave them a win probability of 1, your Brier score is (1-1)^2, which is 0. If they lost, the Brier score would be (1-0)^2, which is 1. A coin-flip baseline that assigns 0.5 to every game scores (1-0.5)^2 = (0-0.5)^2 = 0.25, whatever the outcome.
Vegas had a Brier score of (0-.84)^2, which is .706. Elo had a Brier score of (0-.806)^2, which is .650. My model had a Brier score of (0-.68)^2, which is .462. The coin-flip baseline (.25) actually did best here, since all three models had the Astros winning the game, which they of course did not.
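For anyone who wants to score more than one game, here is a short sketch of the Brier score as a mean over forecasts. The function name and the labels in the dictionary are illustrative. The single-game inputs are the Astros win probabilities quoted above, with an outcome of 0 because Houston lost.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1
    outcomes. 0 is a perfect score; lower is better."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Astros win probability from each source; outcome 0 because Houston lost.
astros_prob = {
    "Vegas": 526 / 626,
    "Elo": 0.806,
    "My model": 0.68,
    "Coin flip": 0.5,
}
for source, p in astros_prob.items():
    print(f"{source}: {brier_score([p], [0]):.3f}")
```

Over a full season, you would pass every game's forecast and result at once. A single upset says little on its own, but averaged over hundreds of games the score rewards forecasts that are both well calibrated and sharp.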
Is there a conclusion to this? Maybe. As a non-Astros fan, though, I say kudos to the Tigers on their historic upset Wednesday night. And if you ever see an MLB game win probability over 80%, maybe think twice before believing it. The randomness in the game is hard to overcome, no matter how good your team is.
What do you think about this historic upset? What’s the highest win probability you think should be assigned to a single MLB game? Let us know your thoughts in the comment section below! As always, our code and data can be found on our GitHub.
The SaberSmart Team