We all know that the Marlins are shedding payroll and trading players this off-season. Giancarlo Stanton will most likely be traded to the Giants, Dodgers, or Cardinals. Dee Gordon was sent to the Mariners today, and Yelich and Ozuna will most likely be on the move before long. While most expect the Marlins to be rebuilding for the next few years, it is not impossible to field a competitive team with a low payroll. One example from earlier this year is the Brewers, the team with the lowest payroll in 2017.
By trading for undervalued players known to be available, or by signing undervalued free agents, the Marlins should be able to field a team projected to make the playoffs. In this post, we walk through our methods and the first results of this analysis! For the introduction and the data used, read Part 1.
In analyzing the Miami Marlins, we focused on building a roster that maximized runs scored (RS) and minimized runs allowed (RA). To arrive at target RS and RA figures, we analyzed the following statistics: BA, OBP, SLG, OPS, and wOBA on offense, and ERA, WHIP, and Kper9 on the pitching side. We first determined which of these statistics were most strongly correlated with RS and RA. We then added a second layer of analysis, a constrained optimization model, to keep the Marlins’ payroll at or below $56 million.
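For reference, the three pitching metrics named above are simple rate statistics built from counting stats. The sketch below is illustrative, not the project's own code: the `PitchingLine` class is hypothetical, and its fields are modeled loosely on Lahman's Pitching table, which stores innings as outs recorded (`IPouts`).

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    """One pitcher's season counting stats (hypothetical container).

    Field names loosely follow Lahman's Pitching table; treat the mapping
    as an assumption.
    """
    ipouts: int  # outs recorded; innings pitched = ipouts / 3
    h: int       # hits allowed
    bb: int      # walks allowed
    so: int      # strikeouts
    er: int      # earned runs allowed

    @property
    def ip(self) -> float:
        # Convert outs to (fractional) innings pitched.
        return self.ipouts / 3

    @property
    def era(self) -> float:
        # Earned runs allowed per nine innings.
        return 9 * self.er / self.ip

    @property
    def whip(self) -> float:
        # Walks plus hits allowed per inning pitched.
        return (self.bb + self.h) / self.ip

    @property
    def k_per_9(self) -> float:
        # Strikeouts per nine innings (the post's "Kper9").
        return 9 * self.so / self.ip
```

For example, a pitcher with 540 outs recorded (180 innings), 170 hits, 50 walks, 180 strikeouts, and 70 earned runs has an ERA of 3.50, a WHIP of about 1.22, and a Kper9 of 9.0.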
Our immediate goal is to optimize RS and RA while keeping team payroll at or below $56 million, but the overarching goal is to reach RS and RA thresholds that make the Marlins a playoff contender. Our data lets us analyze the makeup of playoff teams and championship contenders across the league, which tells us what thresholds we need to meet or exceed.
After the data was extracted from Lahman’s database and Baseball-Reference, our analysis proceeded in three steps: data cleaning, exploratory data analysis (EDA), and modeling.
Little data cleaning was needed. Because baseball data is tracked in tabular form, cleaning mostly involved filtering Lahman’s extracts down to the statistics relevant to RS and RA. However, some variables had to be calculated, specifically BA, OBP, SLG, OPS, and wOBA.
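The offensive rate stats listed above follow standard formulas over counting columns that exist in Lahman's Batting table (AB, H, 2B, 3B, HR, BB, HBP, SF). The minimal sketch below is hypothetical, not the project's code, and uses `doubles`/`triples` because `2B`/`3B` are not valid Python identifiers. wOBA is left out because its linear weights change every season.

```python
def batting_rates(ab: int, h: int, doubles: int, triples: int, hr: int,
                  bb: int, hbp: int = 0, sf: int = 0) -> dict:
    """Compute BA, OBP, SLG, and OPS from counting stats.

    HBP and SF default to 0 because older Lahman seasons may lack them.
    """
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    ba = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    return {"BA": ba, "OBP": obp, "SLG": slg, "OPS": obp + slg}
```

Applied to a team season, the same function yields the team-level OPS that is compared against RS in the next step.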
After cleaning, EDA was performed to test which variables were likely to be most relevant for modeling. For RS, OPS was the best predictor among the offensive metrics we analyzed: OPS had an R-squared of 0.89 and a strong positive correlation of 0.94, while batting average had an R-squared of just 0.66 and a correlation of 0.81. For more on this, check out our earlier article, The Rise and Fall of the Batting Average! Below are the outputs from this analysis:
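The two figures quoted for each metric come from a one-variable linear regression: the Pearson correlation r between the metric and RS, and the R-squared of the least-squares fit. A minimal pure-Python sketch of both (function names hypothetical, not the project's code) is below; in a single-predictor fit the two agree, since R-squared equals r squared.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ols_r_squared(xs, ys):
    """Fit y = a + b*x by least squares and return R^2 = 1 - SSE/SST."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst
```

To compare metrics, run both functions once per candidate (team OPS vs. team RS, team BA vs. team RS, and so on) and keep the metric with the highest R-squared.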
For RA, we compared WHIP to Kper9, another widely used metric for analyzing pitcher performance. WHIP proved far better than Kper9 for minimizing RA: its R-squared of 0.83 and correlation of 0.91 showed much greater predictive ability than Kper9’s R-squared of just 0.32 and correlation of 0.57. Again, below are the outputs from our models and analysis:
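Because each of these regressions has a single predictor, R-squared should equal the square of the correlation coefficient, which gives a quick consistency check on the reported pairs:

$$
R^2 = r^2:\qquad
\underbrace{0.94^2 \approx 0.88}_{\text{OPS}},\quad
\underbrace{0.81^2 \approx 0.66}_{\text{BA}},\quad
\underbrace{0.91^2 \approx 0.83}_{\text{WHIP}},\quad
\underbrace{0.57^2 \approx 0.32}_{\text{Kper9}}
$$

The small gap between 0.88 and the reported 0.89 for OPS is consistent with r having been rounded to two decimals (for example, r = 0.943 gives r² ≈ 0.889).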
Using OPS and WHIP, we then treated roster construction as a constrained optimization problem. We initially limited the pool to batters with at least 500 plate appearances in 2016 and pitchers with at least 85 innings pitched in 2016. For batters, the goal was to maximize OPS subject to a limit of 13 batters and a $16 million salary cap. For pitchers, the goal was to minimize WHIP with 12 pitchers and a $25 million salary cap. Together, the results give us a full 25-man roster for $41 million, leaving $15 million to fill out the 40-man roster.
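The post does not show the solver itself, so here is one way to frame each half of the problem: a cardinality-constrained knapsack that picks exactly k players with total salary under the cap. This is a dynamic-programming sketch, not the project's implementation. It assumes salaries are integers in some fixed unit (say $100k) and, as a simplification, scores a group by the sum of individual OPS (or WHIP). With k fixed, maximizing the sum is the same as maximizing the average, though true team OPS would weight players by plate appearances. `Player` and `pick_roster` are hypothetical names.

```python
import math
from typing import NamedTuple


class Player(NamedTuple):
    name: str
    salary: int    # integer salary units, e.g. $100k (assumed granularity)
    value: float   # OPS for batters, WHIP for pitchers


def pick_roster(players, k, cap, minimize=False):
    """Choose exactly k players with total salary <= cap that maximize
    (or, with minimize=True, minimize) the summed value.

    Returns (total_value, names) or None if no feasible roster exists.
    """
    sign = -1.0 if minimize else 1.0
    neg = -math.inf
    # best[j][c] = (score, indices) for exactly j players costing exactly c.
    best = [[(neg, ()) for _ in range(cap + 1)] for _ in range(k + 1)]
    best[0][0] = (0.0, ())
    for i, p in enumerate(players):
        # Walk j downward so each player is used at most once.
        for j in range(min(k, i + 1), 0, -1):
            for c in range(cap, p.salary - 1, -1):
                prev_score, prev_idx = best[j - 1][c - p.salary]
                if prev_score == neg:
                    continue
                cand = prev_score + sign * p.value
                if cand > best[j][c][0]:
                    best[j][c] = (cand, prev_idx + (i,))
    score, idx = max(best[k], key=lambda t: t[0])
    if score == neg:
        return None
    return sign * score, [players[i].name for i in idx]
```

Batters would be solved with `k=13` and a $16 million cap, pitchers with `k=12`, a $25 million cap, and `minimize=True`. For pools of a few hundred players, an integer-programming solver is a common alternative to this table-based approach.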
The initial model included every batter and pitcher who met the thresholds outlined above. It produced a very strong, low-salary team with an overall win percentage of .768 (a 124-win season!). Several of the batters were All-Stars, such as Jose Altuve, Mookie Betts, Charlie Blackmon, and Kris Bryant. However, the very fact that these players combine a high OPS with a low salary is why they probably would not be available: it is unlikely the Marlins could convince any other team to trade them. Below are the tables showing the optimized batters and pitchers from this attempt:
The next step in the modeling process was to reduce the pool to players who are most likely to be available in 2018. This included players who will be free agents in 2018, along with anyone who was considered a trade target in 2017, on the assumption that the latter will most likely still be available via trade this off-season.
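Narrowing the pool amounts to applying the 2016 playing-time thresholds and an availability check before rerunning the optimization. In the minimal sketch below, `Candidate`, `eligible_pool`, and the `available_names` set (standing in for the combined list of 2018 free agents and 2017 trade targets) are hypothetical, since the post does not describe how that list is stored.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    pa: int           # 2016 plate appearances (0 for pitchers)
    ip: float         # 2016 innings pitched as a decimal (0 for batters)
    is_pitcher: bool


def eligible_pool(candidates, available_names, min_pa=500, min_ip=85):
    """Keep players who meet the 2016 playing-time threshold for their role
    and whose name appears in `available_names`."""
    pool = []
    for c in candidates:
        meets_threshold = c.ip >= min_ip if c.is_pitcher else c.pa >= min_pa
        if meets_threshold and c.name in available_names:
            pool.append(c)
    return pool
```

Note that innings written in baseball notation (for example "120.1" for 120 1/3 innings) should be converted to true decimals before comparing against `min_ip`.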
Make sure to join us next week as we finish creating a projected playoff team with minimal payroll from currently available players! The chosen players may surprise you…
What do you think of our methods so far? Let us know in the comments below! As always, our code can be found on GitHub.
The SaberSmart Team