A simple way to describe a statistic in terms of other variables is through a linear equation. As the name suggests, a linear equation is an equation that makes a straight line when it is graphed.
Linear equations are powerful tools because they allow you to see relationships between related variables. If you have the same number of independent equations as unknown variables, you can even solve for them!
Linear equations can be applied to baseball in a plethora of ways because most calculated statistics actually originate from them. I pause here to quickly say that calculated statistics are statistics that involve some sort of arithmetic between multiple variables, such as Batting Average, while counted statistics involve only a single variable summed over a period of time, such as Home Runs.
Now take, for instance, offensive statistics. Slugging Percentage (SLG), Batting Average (BA), and Isolated Power (abbreviated IP here, though ISO is the more common abbreviation) are all calculated statistics derived from linear equations using singles, doubles, triples, and home runs as variables. While most sources of player summary statistics only show these calculated numbers, and occasionally home runs, given enough of them we can determine the underlying distribution of singles, doubles, triples, and home runs for a player!
Caution: Math Ahead!
I decided to calculate this distribution for the 2016 season for one of my favorite baseball players, Adrian Beltre of the Texas Rangers, using only numbers provided on his MLB summary page. Although SLG is not explicitly listed, we can find it by simply subtracting OBP from OPS. Now we have two calculated statistics as well as the counted statistics of Hits (H) and At-Bats (AB) and even Home Runs (HR) as a bonus:
Adrian Beltre’s Stats, 2016: AB = 583, HR = 32, BA = .300, SLG = .521
Here are the equations defining hits, batting average, and slugging percentage:
H = 1B + 2B + 3B + HR
BA = H / AB
SLG = (1B + 2*2B + 3*3B + 4*HR) / AB
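The standard definitions behind these three statistics can be sketched in Python. The function names and the sample stat line are made up for illustration; the numbers do not belong to a real player.

```python
from fractions import Fraction


def hits(singles, doubles, triples, home_runs):
    """Hits (H): each type of hit counts exactly once."""
    return singles + doubles + triples + home_runs


def batting_average(singles, doubles, triples, home_runs, at_bats):
    """Batting Average (BA): hits per at-bat."""
    return Fraction(hits(singles, doubles, triples, home_runs), at_bats)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Slugging Percentage (SLG): total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return Fraction(total_bases, at_bats)


# Hypothetical stat line, invented for illustration (not a real player).
line = dict(singles=100, doubles=30, triples=2, home_runs=20, at_bats=500)

print(float(batting_average(**line)))      # 0.304
print(float(slugging_percentage(**line)))  # 0.492
```

Each statistic is a weighted sum of the same four counts, which is what makes them linear equations in singles, doubles, triples, and home runs.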
Since we have three unknown variables and three equations, we can solve for singles, doubles, and triples!
First, I rearrange our equations to put variables on the left side and constants on the right:
1B + 2B + 3B = H - HR
1B + 2B + 3B = BA * AB - HR
1B + 2*2B + 3*3B = SLG * AB - 4*HR
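Then I substitute in the values, with 1B = x, 2B = y, 3B = z, and H = 175 (the only hit total that rounds to a .300 average over 583 at-bats):
1) x + y + z = 175 - 32 = 143
2) x + y + z = .300 * 583 - 32 = 142.9 ≈ 143
3) x + 2y + 3z = .521 * 583 - 4 * 32 = 175.743 ≈ 176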
Oh no! Our first two equations are exactly the same. This is because of the relationship between hits and batting average. When we multiplied both sides of the second equation by at-bats, we accidentally turned it into the definition of hits.
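One way to check how many independent equations we really have is to compute the rank of the coefficient matrix. The sketch below assumes the rearranged equations take the standard form x + y + z for both hits and BA * AB, and x + 2y + 3z for SLG * AB; the `rank` helper is a hypothetical name written for this example, not part of any library.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        # Find a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear this column in every row below the pivot row.
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# Coefficients of x (singles), y (doubles), z (triples).
coefficients = [
    [1, 1, 1],  # from the definition of hits
    [1, 1, 1],  # from batting average times at-bats
    [1, 2, 3],  # from slugging percentage times at-bats
]

system_rank = rank(coefficients)
print(system_rank)  # 2
```

A rank of 2 with three unknowns means the system has only two independent equations, so it cannot pin down singles, doubles, and triples on its own.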
Since we need another equation, let us turn to Isolated Power (IP), which measures how many extra bases a player averages per at-bat. It is defined as SLG - BA, and can be written using doubles, triples, and home runs as follows:
a) (2B + 2*3B + 3*HR) / AB = IP = SLG - BA
Substituting our values in gives us:
4) y + 2z = (.521 - .300) * 583 - 3 * 32 = .221 * 583 - 96 = 32.843 ≈ 33
However, this equation will not help us either. As stated above, we need the same number of independent equations as unknown variables. Unfortunately, since IP = SLG - BA, this equation is dependent on our other linear equations. You can see this by subtracting equation 2) from equation 3): the result is equation 4).
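The dependence can be checked numerically with the values used above (AB = 583, HR = 32, BA = .300, SLG = .521). This is a minimal sketch: the tuple layout and the `subtract` helper are assumptions made for the example, and exact fractions are used so that rounding does not hide the equality.

```python
from fractions import Fraction

AB, HR = 583, 32
BA, SLG = Fraction("0.300"), Fraction("0.521")

# Each equation is (coefficients of x, y, z), right-hand side.
eq2 = ((1, 1, 1), BA * AB - HR)              # batting average
eq3 = ((1, 2, 3), SLG * AB - 4 * HR)         # slugging percentage
eq4 = ((0, 1, 2), (SLG - BA) * AB - 3 * HR)  # isolated power


def subtract(a, b):
    """Subtract equation b from equation a, term by term."""
    coeffs = tuple(p - q for p, q in zip(a[0], b[0]))
    return coeffs, a[1] - b[1]


difference = subtract(eq3, eq2)
is_dependent = difference == eq4

print(difference[0])         # (0, 1, 2)
print(float(difference[1]))  # 32.843
print(is_dependent)          # True
```

Because equation 3) minus equation 2) reproduces equation 4) exactly, Isolated Power adds no new information to the system.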
So this leaves us in a bit of a quandary. We still need one more equation independent from both slugging percentage and batting average! Shoot us a message either here or on Twitter (@sabersmartblog) if you can think of another statistic to use. Remember, hits and isolated power are dependent equations, and we want an equation that adds the fewest new variables.
See you in Part 2!
The SaberSmart Team