SaberSmart

Breakdown: Hit Distributions and Linear Eq. Part 1

1/14/2017

A simple way to describe a statistic in terms of other variables is through a linear equation. As the name suggests, a linear equation is an equation that makes a straight line when it is graphed.

Linear equations are powerful tools because they let you see relationships between related variables. If you have the same number of independent equations as unknown variables, you can even solve for them!

These can be applied to baseball in a plethora of ways because most calculated statistics actually originate from linear equations. I pause here to quickly say that calculated statistics are statistics that involve some sort of arithmetic between multiple variables, such as Batting Average, while counted statistics involve only a single variable summed over a period of time, such as Home Runs.
Take offensive statistics, for instance. Slugging Percentage (SLG), Batting Average (BA), and Isolated Power (IP in this post, though more commonly abbreviated ISO) are all calculated statistics derived from linear equations using singles, doubles, triples, and home runs as variables. Most sources of player summary statistics show only these calculated numbers, and occasionally home runs, but given enough of them we can determine the underlying distribution of singles, doubles, triples, and home runs for a player!
Caution: Math Ahead!

I decided to calculate this distribution for the 2016 season for one of my favorite baseball players, Adrian Beltre of the Texas Rangers, using only numbers provided on his MLB summary page. Although SLG is not explicitly listed, we can find it by simply subtracting OBP from OPS. Now we have two calculated statistics (BA and SLG) as well as the counted statistics of Hits (H) and At-Bats (AB), and even Home Runs (HR) as a bonus:

Adrian Beltre’s Stats, 2016:
BA = .300
SLG = .521
AB = 583
H = 175
HR = 32
1B = ?
2B = ?
3B = ?
Here are the equations defining hits, batting average, and slugging percentage.

  1. 1B + 2B + 3B + HR = H
  2. H/AB = (1B + 2B + 3B + HR) / AB = BA
  3. (1*1B + 2*2B + 3*3B + 4*HR )/AB = SLG    
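
As a rough sketch, the three definitions above translate directly into a few lines of Python. The function names (`hits`, `batting_average`, `slugging`) and the sample hit counts are made up for illustration; they are not Beltre's actual line.

```python
def hits(singles, doubles, triples, home_runs):
    """Equation 1: every hit is a single, double, triple, or home run."""
    return singles + doubles + triples + home_runs


def batting_average(singles, doubles, triples, home_runs, at_bats):
    """Equation 2: hits divided by at-bats."""
    return hits(singles, doubles, triples, home_runs) / at_bats


def slugging(singles, doubles, triples, home_runs, at_bats):
    """Equation 3: total bases divided by at-bats."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical hitter: 100 singles, 30 doubles, 5 triples, 15 home runs in 500 at-bats.
print(hits(100, 30, 5, 15))                           # 150
print(round(batting_average(100, 30, 5, 15, 500), 3))  # 0.3
print(round(slugging(100, 30, 5, 15, 500), 3))         # 0.47
```

Going from hit counts to BA and SLG is the easy direction; the rest of this post is about running it in reverse.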

Since we have three unknown variables and three equations, it looks like we can solve for singles, doubles, and triples!

First, I rearrange our equations to put variables on the left side and constants on the right:

  1. 1B + 2B + 3B = H - HR
  2. 1B + 2B + 3B = (BA * AB) - HR
  3. 1*1B + 2*2B + 3*3B  = (SLG * AB) - 4*HR

Then I substitute in the values, writing 1B = x, 2B = y, and 3B = z:

  1. x + y + z = 175 - 32 = 143
  2. x + y + z = (.300 * 583) - 32 ≈ 175 - 32 = 143
  3. x + 2y + 3z = (.521 * 583) - (4 * 32) ≈ 304 - 128 = 176
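
The arithmetic on the right-hand sides can be double-checked with a short Python sketch. The variable names are my own, and rounding to the nearest whole number is an assumption: BA and SLG are published to three decimals, while the underlying hit counts must be whole numbers.

```python
# Beltre's 2016 numbers as listed above.
BA, SLG, AB, H, HR = 0.300, 0.521, 583, 175, 32

rhs1 = H - HR                    # equation 1: H - HR
rhs2 = round(BA * AB) - HR       # equation 2: BA * AB is 174.9, which rounds to 175
rhs3 = round(SLG * AB) - 4 * HR  # equation 3: SLG * AB is 303.743, which rounds to 304

print(rhs1, rhs2, rhs3)  # 143 143 176
```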

Oh no! Our first two equations are exactly the same. This is because of the relationship between hits and batting average. When we multiplied both sides of the second equation by at-bats, we accidentally turned it into the definition of hits.

Since we need another equation, let us turn to Isolated Power (IP), which measures how many extra bases a player averages per at-bat. It is defined as SLG - BA and can be written in terms of doubles, triples, and home runs as such:

      4) (2B + 2*3B + 3*HR) / AB = IP = SLG - BA

Substituting our values in gives us:

      4) y + 2z = (.521 - .300) * 583 - 3 * 32 = .221 * 583 - 96 ≈ 129 - 96 = 33

However, this equation cannot help us either. As stated above, we need the same number of independent equations as unknown variables. Unfortunately, since IP = SLG - BA, this equation is dependent on our other equations. You can see this by subtracting equation 2) from equation 3): the left side becomes y + 2z and the right side becomes 176 - 143 = 33, which is exactly equation 4).
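
One way to check the dependence is to count how many of the four equations are actually independent. The sketch below is my own illustration, not part of the original analysis: it stores each equation as a row of coefficients for x, y, z plus the right-hand side, and uses a small hand-written Gaussian elimination (the helper name `rank` is hypothetical) over exact fractions.

```python
from fractions import Fraction


def rank(rows):
    """Count independent rows using Gauss-Jordan elimination on exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


# Each row is [x coefficient, y coefficient, z coefficient, right-hand side].
eq1 = [1, 1, 1, 143]
eq2 = [1, 1, 1, 143]
eq3 = [1, 2, 3, 176]
eq4 = [0, 1, 2, 33]

difference = [a - b for a, b in zip(eq3, eq2)]
print(difference == eq4)            # True
print(rank([eq1, eq2, eq3, eq4]))   # 2
```

A rank of 2 with three unknowns means the four equations pin down only two of them; one variable is still free.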

So this leaves us in a bit of a quandary. We still need one more equation independent of both slugging percentage and batting average! Shoot us a message either here or on Twitter (@sabersmartblog) if you can think of another statistic to use. Remember, the hits and isolated power equations are dependent on the ones we already have, and we want an equation that adds the fewest new variables.
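
To get a feel for how much is still undetermined, here is a small Python sketch that lists every combination of non-negative whole numbers satisfying the two independent equations, x + y + z = 143 and y + 2z = 33. It assumes the rounded right-hand sides are exact, and the name `candidates` is my own.

```python
candidates = []
for z in range(0, 34):    # triples; the filter below rejects values too large for y + 2z = 33
    y = 33 - 2 * z        # doubles, from y + 2z = 33
    x = 143 - y - z       # singles, from x + y + z = 143
    if x >= 0 and y >= 0:
        candidates.append((x, y, z))

print(len(candidates))                # 17
print(candidates[0], candidates[-1])  # (110, 33, 0) (126, 1, 16)
```

Every one of these (singles, doubles, triples) lines reproduces the same H, BA, and SLG, which is why a further independent equation is needed to single out the real one.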

See you in Part 2!

The SaberSmart Team