So far, the Hot Stove has been, well, rather cold. Derek Jeter’s Miami Marlins are trying to build a winning team while shedding payroll, in the hope of growing the fan base and future revenue from ticket sales, advertising, and merchandise. Currently, the Marlins have the 10th lowest payroll in Major League Baseball (MLB), one of their highest rankings in the last ten years. By trading Giancarlo Stanton, the reigning MVP, as well as other high-priced players like Ozuna and Yelich, the Marlins can decrease their payroll. But can they still field a competitive team?
In their 24-year existence, the Marlins have finished above .500 in only six seasons, most recently in 2009. Over the past ten seasons, they have averaged only 75 wins. To become a legitimate playoff contender, and to actually qualify, history suggests they will likely need to win at least 90 games.
To create wins, the Marlins will have to generate a surplus of Runs Scored (RS) over Runs Allowed (RA) across the 162-game regular season. I will use the Pythagorean Theorem of Baseball, created by Bill James, which estimates a team’s winning percentage from its RS and RA, to identify thresholds for those two quantities. Over the past ten seasons, the Pythagorean Winning Percentage has been remarkably accurate in predicting the Marlins’ record: it matched their record exactly in 2011, and was never more than five games off in any other season over that time frame.
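To make the idea concrete, here is a minimal Python sketch of the Pythagorean Winning Percentage. It is an illustration only, not the code behind this analysis (which was done in R). It assumes Bill James’s original exponent of 2, since the exponent is not specified here, and the function names and the run totals in the usage lines are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimated winning percentage: RS^x / (RS^x + RA^x).

    exponent=2.0 is Bill James's original value (an assumption here).
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Estimated wins over a season of `games` games."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


def required_run_ratio(target_wins, games=162, exponent=2.0):
    """RS/RA ratio needed for the formula to project `target_wins` wins.

    Solving p = r^x / (1 + r^x) for r gives r = (p / (1 - p))^(1/x).
    """
    p = target_wins / games
    return (p / (1.0 - p)) ** (1.0 / exponent)


if __name__ == "__main__":
    # Hypothetical season totals, for illustration only.
    print(f"Projected wins: {expected_wins(750, 700):.1f}")
    print(f"RS/RA ratio needed for 90 wins: {required_run_ratio(90):.3f}")
```

Under these assumptions, a 90-win season requires outscoring opponents by roughly 12 percent, since the required ratio is the square root of 90/72.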
Based on my analysis, and supported by several academic studies, I have identified the strongest predictors of team efficiency: low walks plus hits per inning pitched (WHIP), high on-base percentage (OBP), and high on-base plus slugging (OPS). In the World Series from 1997 to 2009, for example, the team with the lower OBP won only once. Ironically, that was in 2003, when the Marlins defeated the Yankees. Surprisingly to many, the data confirm that batting average has a far weaker correlation with runs scored than either OBP or OPS.
Yet players with high OBP and OPS are systematically underpriced compared to players with high batting averages. With their constrained payroll, the Marlins should be happy to take advantage of this dynamic. Similarly, pitchers with low WHIP are underpriced compared to those with a low earned run average (ERA), even though WHIP correlates more strongly with Runs Allowed.
I want to provide the Marlins with a road map to becoming a more legitimate playoff contender going forward. The ultimate goal is a strategy that puts the Marlins in a position to maximize runs scored while minimizing runs allowed. I believe that high OBP and OPS, together with low WHIP, will be the key ingredients in achieving that end. From there, I can identify the types of players needed to build a stronger, more affordable Marlins lineup.
The data used in my analysis were drawn primarily from Lahman’s Baseball Database (Lahman’s database) and Baseball Reference (baseball-reference). Lahman’s database contains complete batting and pitching statistics from 1871 to 2016, plus fielding statistics, standings, managerial records, and salaries. I found Lahman’s database best suited for my analyses in R. Baseball Reference features statistics complementary to Lahman’s database, and allows for sorting, searching, and analysis via web browser. I used it as a supplement to my analyses in R.
Both Lahman’s database and Baseball Reference made it possible to evaluate the Marlins against every other team in Major League Baseball. This allowed me to determine the RS and RA thresholds the Marlins would need to meet to increase their likelihood of making the playoffs. These data sources also let me test changes to the composition of the roster and decide which players would best meet the Marlins’ needs.
The data relevant to RS that were extracted from Lahman’s database featured just over 2,800 observations of team performance across 48 measurement variables. Because the Marlins were formed in 1993, and the last major MLB strike occurred in 1994, I focused only on data from 1995 to 2016. Examples of variables featured within this dataset are: At Bats (AB), Hits (H), Doubles (X2B), Triples (X3B), and Home Runs (HR).
Using these variables, I created a few additional calculated variables: Batting Average (BA), On-Base Percentage (OBP), Slugging (SLG), On-Base plus Slugging (OPS), and Weighted On-Base Average (wOBA). The data relevant to RA were also extracted from Lahman’s database, and featured just over 650 observations across 11 measurement variables. Examples of variables featured within this dataset are: Hits Allowed (HA), Strikeouts per Nine Innings (Kper9), and Walks plus Hits per Inning Pitched (WHIP).
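For readers who want to see how the calculated variables fall out of the raw counts, here is a minimal Python sketch using the standard definitions of each statistic. It is not the R code used for this analysis. The inputs beyond those named above (walks, hit-by-pitch, sacrifice flies, walks allowed, strikeouts, and outs recorded) are assumed to be available in the team data, and the sample numbers are invented for illustration. wOBA is left out because it depends on season-specific weights that are not given here.

```python
def batting_average(h, ab):
    """BA = hits / at bats."""
    return h / ab


def on_base_pct(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging(h, doubles, triples, hr, ab):
    """SLG = total bases / at bats."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def ops(h, doubles, triples, hr, bb, hbp, ab, sf):
    """OPS = OBP + SLG."""
    return on_base_pct(h, bb, hbp, ab, sf) + slugging(h, doubles, triples, hr, ab)


def whip(walks_allowed, hits_allowed, outs_recorded):
    """WHIP = (walks + hits allowed) / innings pitched, with 3 outs per inning."""
    innings = outs_recorded / 3
    return (walks_allowed + hits_allowed) / innings


def k_per_9(strikeouts, outs_recorded):
    """Strikeouts per nine innings pitched."""
    innings = outs_recorded / 3
    return 9 * strikeouts / innings


if __name__ == "__main__":
    # Invented team-season line, for illustration only.
    bat = dict(h=1450, doubles=280, triples=30, hr=180, bb=500, hbp=50, ab=5500, sf=40)
    print(f"BA  = {batting_average(bat['h'], bat['ab']):.3f}")
    print(f"OPS = {ops(**bat):.3f}")
    print(f"WHIP = {whip(480, 1400, 4350):.3f}")
    print(f"K/9  = {k_per_9(1300, 4350):.2f}")
```

Working from outs recorded rather than a decimal innings figure avoids the usual confusion over partial innings, where "200.1" means 200 and one third.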
Next time, I will go into the methods used and give a first look at my results! In part 3, I will list the free agent players the Marlins should pursue and the players they should trade to create a projected playoff team with minimal payroll. Let us know your thoughts on this analysis in the comments below!
The SaberSmart Team.