SaberSmart

To Explain or To Predict? Statistical Inference vs Predictive Modeling

8/7/2017

With so much terminology thrown around with regard to big data, and especially machine learning, we thought it would be helpful to explore some of the more common terms. In this post, we delve into the differences between two concepts that come up often in predictive analytics: statistical inference and predictive modeling. While they are sometimes used in similar situations, they are distinct concepts.

When analyzing big data, we can build statistical models for inference or for prediction. For instance, imagine that we fit a simple linear regression model Y = b0 + b1X. If we fit this model for the purpose of statistical inference, our primary motivations, and the conclusions we draw from statistical tests, are about the data itself. If we build this model for predictive purposes, statistical inference and tests are still important; however, our motivations are steered primarily by the success of the predicted values. Consequently, the metrics used to evaluate each model have to match its underlying motivation.
With statistical models developed for inference, the primary focus is on the data that one has and on discovering the underlying relationships in that data once statistical fluctuations have been accounted for. There is a large focus on theory and domain knowledge, and the data is used to test the assumptions that follow from them. Predictive models, on the other hand, extrapolate into the unknown using the known relationships in the data. Those known relationships may emerge from a causal or descriptive analysis, or from some other technique such as machine learning.

When using models to explain existing behavior, which is the purpose of statistical inference, the typical procedure is to first form a hypothesis about which fields will be useful and what form the true model takes. This holds true for almost any analysis project. If the coefficients in the model are determined to be wrong, or if the model errors are too large, we have to rebuild the model to get it right, which may mean transforming the input or response variables so that the model conforms to our assumptions. However, when predicting the target variable accurately is paramount, the actual distribution and construction of the model take a back seat. One does not need to explain precisely why individuals behave as they do, as long as one can predict how they will behave.
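As a rough illustration of what "transforming the inputs" can look like, here is a minimal Python sketch. The data is synthetic and chosen by us for the example (Y is constructed to depend on log(X)), and the helper names fit_line and r_squared are our own, not part of any library. It fits the same straight-line model to the raw input and to the log-transformed input and compares how well each conforms to the linear form.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def r_squared(x, y, b0, b1):
    """Share of the variance in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst

# Synthetic data (an assumption for illustration): Y depends on log(X), not X.
x_raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
y = [3.0 + 2.0 * math.log(xi) for xi in x_raw]

# Fit against the raw input: the straight line is the wrong form here.
b0_raw, b1_raw = fit_line(x_raw, y)
r2_raw = r_squared(x_raw, y, b0_raw, b1_raw)

# Fit against the transformed input: the model now matches its linear assumption.
x_log = [math.log(xi) for xi in x_raw]
b0_log, b1_log = fit_line(x_log, y)
r2_log = r_squared(x_log, y, b0_log, b1_log)

print(f"raw X: R^2 = {r2_raw:.3f}")
print(f"log X: R^2 = {r2_log:.3f}, b0 = {b0_log:.3f}, b1 = {b1_log:.3f}")
```

Because the response variable is the same in both fits, the two R^2 values can be compared directly. With real data the right transformation is rarely this obvious and would be guided by domain knowledge and residual plots.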

Take the simple case from the introduction, a fitted simple linear regression model Y = b0 + b1X. If we fit this model for the purpose of statistical inference, then we are typically interested in learning more about the relationship between the independent variable, X, and the dependent variable, Y. This could be as simple as determining how to adjust the input X to produce a predetermined result for Y. In more complex models, this could mean determining which particular independent variables need to be adjusted to produce a particular change in Y.
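To make the inference view concrete, here is a small Python sketch under our own assumptions: the numbers are made up, and fit_simple_ols and x_for_target are hypothetical helper names. It estimates b0 and b1 by ordinary least squares, reports the standard error and t-statistic of the slope (the usual starting point for asking whether X is really related to Y), and then inverts the fitted line to find the X that would be expected to produce a chosen Y.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x, with the slope's standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / (n - 2)
    se_b1 = math.sqrt(sigma2 / sxx)
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1}

def x_for_target(fit, y_target):
    """Invert the fitted line: the X at which the model expects Y = y_target."""
    return (y_target - fit["b0"]) / fit["b1"]

# Made-up illustrative data, not taken from any real analysis.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = fit_simple_ols(x, y)
print(f"b0 = {fit['b0']:.3f}, b1 = {fit['b1']:.3f}")
print(f"slope standard error = {fit['se_b1']:.3f}, t = {fit['t_b1']:.1f}")
print(f"X expected to give Y = 12: {x_for_target(fit, 12.0):.2f}")
```

The t-statistic would normally be compared against a t-distribution with n - 2 degrees of freedom to obtain a p-value. We leave that step out to keep the sketch free of external libraries.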

If we use this model to predict Y, we must understand how we can utilize the input variables to make better decisions, but we do not necessarily need an explanation of how the model actually works. Predictive analytics models may be essentially explicable; however, a real-world explanation of why a model has a particular coefficient is not required. With our simple linear regression model, we want to determine Y-hat, an estimate of the actual value of Y. In this case, we input various X-values and determine how close the resulting Y-hat values are to the observed values of Y. R^2 values and F-tests indicate how well the model fits the data it was built on, and comparing Y-hat against observations the model has not seen shows how “good” it actually is at predicting.
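The prediction view can be sketched the same way. In the Python example below, the train and test numbers are invented for illustration, and fit_line, predict, r_squared and rmse are our own helper names. The line is fitted on one set of observations, and the resulting Y-hat values are then scored against observations that were held back from the fit.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def predict(b0, b1, x):
    """Y-hat for each input value."""
    return [b0 + b1 * xi for xi in x]

def r_squared(y_true, y_pred):
    """1 - SSE/SST; can be negative when scored on data the model never saw."""
    y_bar = sum(y_true) / len(y_true)
    sse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    sst = sum((a - y_bar) ** 2 for a in y_true)
    return 1.0 - sse / sst

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as Y."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))

# Made-up data: the model is fitted on the first set and scored on the second.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [2.2, 3.8, 6.1, 8.0, 9.7, 12.3]
x_test = [7.0, 8.0, 9.0]
y_test = [13.8, 16.1, 18.2]

b0, b1 = fit_line(x_train, y_train)
r2_train = r_squared(y_train, predict(b0, b1, x_train))
y_hat_test = predict(b0, b1, x_test)
r2_test = r_squared(y_test, y_hat_test)
rmse_test = rmse(y_test, y_hat_test)

print(f"in-sample R^2     = {r2_train:.3f}")
print(f"out-of-sample R^2 = {r2_test:.3f}, RMSE = {rmse_test:.3f}")
```

Nothing in this evaluation asks why b1 takes the value it does. The model is judged only on how close Y-hat lands to the values it was not shown.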

Since a predictive model has a clear and specific prediction goal, its performance and value can be measured without explaining causality. While statistical inferences are mostly explainers, it is important to remember that even strong correlations do not necessarily imply causation.

In one sentence, predictive modeling is about estimating what is likely to happen, while statistical inference is about understanding how we can change the expected result.

When do you use statistical inference? With predictive modeling dominating big data analytics headlines in recent years, are data explainers even necessary anymore? Let us know your opinions below in the comments!

The SaberSmart Team