
Evaluating Model Goodness-Of-Fit, Part 1: OLS

9/12/2017

So the lovable kiwi above is a GOOD fruit to help keep you FIT. Can you guess what we are discussing this week? Yup, model goodness-of-fit!
Just because we are able to fit a regression model to a data set does not mean it is the right model to use. It is imperative to assess a regression model’s goodness-of-fit with appropriate metrics and graphical displays. What actions, then, constitute an analysis of goodness-of-fit for a regression model? What can go wrong when interpreting and using a regression model that fits poorly? Today, we tackle what makes a regression model well fitting. One quick caveat: this post deals only with models fitted using OLS regression, not maximum likelihood.

A regression model can be declared well fitting if its predicted values closely match the observed values. The most basic regression model is the “mean model”, in which every predicted value is simply the mean of the observed data. Obviously, a useful regression model should fit better than this. A goodness-of-fit analysis can involve many tests and statistics, but the three most commonly used to evaluate model fit are R-squared, the overall F-test, and the Root Mean Square Error (RMSE).

All three are based on two sums of squares: Sum of Squares Total (SST) and Sum of Squares Error (SSE). SST measures how far the data are from the mean, and SSE measures how far the data are from the model’s predicted values. Different combinations of these two values provide different information about how the regression model compares to the mean model.

The difference between SST and SSE is the improvement in prediction from the regression model, compared to the mean model. Dividing that difference by SST gives R-squared, the coefficient of determination: the proportional improvement in prediction from the regression model over the mean model. The F-test, in turn, evaluates the null hypothesis that all regression coefficients equal zero (equivalently, that R-squared equals zero) against the alternative that at least one coefficient is nonzero.
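To make the arithmetic concrete, here is a minimal sketch in Python; the observed values and predictions below are made up purely for illustration:

```python
import numpy as np

# Hypothetical observed values and model predictions (made up for illustration)
y = np.array([3.1, 4.0, 5.2, 6.1, 6.8, 8.3])
y_hat = np.array([3.0, 4.1, 5.0, 6.0, 7.1, 8.2])

sst = np.sum((y - y.mean()) ** 2)  # how far the data are from the mean model
sse = np.sum((y - y_hat) ** 2)     # how far the data are from the predictions

r_squared = (sst - sse) / sst      # proportional improvement over the mean model
print(f"SST = {sst:.3f}, SSE = {sse:.3f}, R-squared = {r_squared:.3f}")
```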

A significant F-test indicates that the observed R-squared is reliable and is not a spurious result of oddities in the data set. Thus, the F-test determines whether the proposed relationship between the response variable and the set of predictors is statistically reliable. Finally, the RMSE is the square root of the variance of the residuals. It indicates the absolute fit of the model to the data, or how close the observed data points are to the model’s predicted values; lower values of RMSE indicate better fit. If you have a background in statistics, you may recognize these quantities as the components of the ANOVA table.
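To see all three statistics fall out of a single fit, here is a sketch using Python’s statsmodels package on simulated data (the numbers are fabricated for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: a linear relationship plus random noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)

X = sm.add_constant(x)      # include the intercept term
model = sm.OLS(y, X).fit()

print(f"R-squared:   {model.rsquared:.3f}")
print(f"F-statistic: {model.fvalue:.1f} (p = {model.f_pvalue:.2e})")
print(f"RMSE:        {np.sqrt(model.mse_resid):.3f}")  # root of the residual mean square
```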

To understand these statistics conceptually, it is first prudent to understand the residuals, the differences between the observed and predicted values. The residuals thus measure the error, and the error has to be stochastic. Stochastic is basically a fancy word for unpredictable and random. Consequently, the residuals themselves should look random. This idea is easy to grasp with a die-rolling analogy. When you roll a die, you should not be able to predict which number will show on any given toss. However, you can assess a series of tosses to determine whether the displayed numbers follow a random pattern.

For example, if the number one shows up more frequently than randomness dictates (one-sixth of the time), you know something is wrong with your understanding of how the die actually behaves. The die below shows a one on every single roll, so we can assume it is weighted, or rather, that the same roll is simply looped. Thanks to James Neilson for this awesome animation! The full GIF can be found here on giphy.
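A quick simulation makes the point; this is a minimal sketch in Python of what a fair die should look like over many tosses:

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=6000)  # 6,000 tosses of a simulated fair die

# Each face should appear close to one-sixth (about 0.167) of the time.
for face in range(1, 7):
    print(f"face {face}: {np.mean(rolls == face):.3f}")
```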

Like the die, the errors should be unpredictable for any given observation. Since the residuals measure the difference between observed and predicted values, they should not be systematically high or low; rather, they should be centered on zero throughout the range of fitted values. In other words, the model is correct on average for all fitted values. The standard diagnostic for this is a plot of residuals versus fitted values. Additionally, residuals from an OLS regression should be normally distributed, so a QQ-plot is the natural tool for assessing their normality.
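Both diagnostics are quick to produce. Here is a sketch that reuses the fitted model from the earlier statsmodels example (matplotlib assumed for plotting):

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm

# `model` is the fitted OLS result from the earlier sketch.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs. fitted values: points should scatter randomly around zero.
ax1.scatter(model.fittedvalues, model.resid, alpha=0.6)
ax1.axhline(0, color="red", linestyle="--")
ax1.set_xlabel("Fitted values")
ax1.set_ylabel("Residuals")
ax1.set_title("Residuals vs. Fitted")

# QQ-plot: the points should hug the 45-degree line if the residuals are normal.
sm.qqplot(model.resid, line="45", fit=True, ax=ax2)
ax2.set_title("Normal Q-Q")

plt.tight_layout()
plt.show()
```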


In a poorly fitting regression model, you can predict non-zero values for the residuals from the fitted values. A non-random pattern in the residuals indicates that the deterministic portion of the model is failing to capture some relevant explanatory information. Possible explanations include a missing variable, a missing higher-order term needed to explain curvature, or a missing interaction between terms already in the model. Everything your predictors can explain should be captured by the model, so that only random error is left over. If there are non-random patterns in your residuals, your predictors are missing something, and the regression model suffers for it.
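Here is a sketch of one such failure mode, using fabricated data whose true relationship is quadratic: a straight-line fit leaves a systematic U-shaped residual pattern, while adding the missing squared term restores random residuals:

```python
import numpy as np
import statsmodels.api as sm

# Fabricated data: the true relationship includes a quadratic term
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=200)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(scale=0.5, size=200)

# A straight-line fit misses the curvature in the data.
linear = sm.OLS(y, sm.add_constant(x)).fit()

# Adding the missing higher-order term captures it.
quadratic = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()

print(f"linear fit R-squared:    {linear.rsquared:.3f}")
print(f"quadratic fit R-squared: {quadratic.rsquared:.3f}")
```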

In the real world, the best measure of a model’s fit depends on your objectives, and more than one model is often deemed useful. Remember, the metrics and strategies discussed above apply only to regression models estimated with OLS. Many types of regression models, such as mixed models, generalized linear models, and event history models, use maximum likelihood estimation instead, and different metrics and criteria should be used in those cases. Look out for an explanation of those in a future post!

Have you ever created a model only to discover it is less than useful? Let us know in the comments below! We learn and grow as analysts by sharing our experiences. We can’t wait to join the conversation!

The SaberSmart Team