This week, we chose to get a little philosophical and discuss the construct-measurement gap in statistical data gathering, or surveys, and in particular the creation of sample statistics in baseball.
The scholar Alfred Korzybski famously asserted that "the map is not the territory and the name is not the thing named," encapsulating his view that a human abstraction or construct derived from a physical object, or from a reaction to it, is not the same thing as the object itself. The assertion was a response to his observation that many people confuse models of reality with reality itself. Korzybski’s legacy includes generations of social scientists invoking his name when talking about the "gap" between a construct and its measurement.
His idea applies to statistics, and particularly to statistical surveys, because of the mismatch between a statistical construct (the information an analyst is seeking) and its associated measurement. Since measurements are inherently human models of the underlying constructs, there will always be a “gap” between the conceptualized true value and the measurement gathered.
Statisticians treat this gap as a question of “validity” and note that there is always some error between a measured statistic and its underlying true value. In a statistical survey, measurements are taken at the individual level in an attempt to describe an underlying statistic for an entire population. In this case, there is a construct-measurement gap between the measurement and true value for each individual, as well as between the resulting measurement and true value for the population being described.
A basic example of this is the measurement of home runs across the American and National Leagues. Since the American League utilizes the designated hitter (DH), our hypothesis is that more home runs are hit in the American League. Now, we conceptualize home runs simply as the total number hit in a season, as determined by a player’s inherent, yet unknown, ability to hit a home run. Only by watching a particular player during the season do we get a sample of that ability. For example, Kris Bryant of the Chicago Cubs hit 39 home runs last year. But what if, with his innate ability, he should have hit 45? The construct-measurement gap for Kris Bryant is then minus 6. Now, the measurement for every player in a season could have a gap, so the measurement for the sample differs from the true home run hitting ability of the population by the sum of the individual gaps. That is the construct-measurement gap at its simplest.
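The arithmetic in the Kris Bryant example can be written out as a minimal Python sketch. Only Bryant's 39 observed home runs and the "what if" value of 45 come from the text above; the other two players, their numbers, and the function name individual_gaps are hypothetical and included purely to show how individual gaps add up.

```python
# Observed season totals. Only Kris Bryant's 39 comes from the article;
# "Player B" and "Player C" are made-up placeholders.
observed = {"Kris Bryant": 39, "Player B": 30, "Player C": 12}

# Hypothetical "true" totals implied by each player's innate ability.
# In practice these are unknown; 45 is the article's "what if" value.
true_ability = {"Kris Bryant": 45, "Player B": 28, "Player C": 13}


def individual_gaps(observed, true_ability):
    """Return each player's gap: measured value minus true value."""
    return {name: observed[name] - true_ability[name] for name in observed}


gaps = individual_gaps(observed, true_ability)

# The sample-level gap is the sum of the individual gaps.
total_gap = sum(gaps.values())

print(gaps["Kris Bryant"])  # -6
print(total_gap)            # -6 + 2 - 1 = -5
```

Note that individual gaps can partly cancel: a player who outperforms his ability offsets one who underperforms, so the population-level gap may be smaller than any single player's.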
Simply put, American League hitters hit 2,953 home runs in 2016 while National League hitters hit only 2,657. However, due to the construct-measurement gap, we must expect an inherent error in these numbers. Therefore, we cannot unequivocally say that the American League is better at hitting home runs than the National League without simulating the season multiple times to reduce error. We can only conclude, based on this one data point, that we are fairly confident the American League is more likely to hit more home runs than the National League.
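To make the idea of "simulating the season multiple times" concrete, here is a rough Python sketch. It rests on strong assumptions that are ours, not the article's: each league's true home run rate is taken to equal its observed 2016 total, season totals are treated as Poisson counts, and the Poisson is approximated by a normal distribution with the same mean and variance. The function name simulate_share_al_ahead, the number of simulated seasons, and the seed are all illustrative choices.

```python
import random
from math import sqrt

# Observed 2016 totals quoted in the article.
AL_HR_2016 = 2953
NL_HR_2016 = 2657


def simulate_share_al_ahead(n_seasons=10_000, seed=0):
    """Share of simulated seasons in which the AL out-homers the NL.

    Assumption: each league's season total is a Poisson count whose mean
    equals the observed 2016 total, approximated here by a normal
    distribution with standard deviation sqrt(mean).
    """
    rng = random.Random(seed)
    al_ahead = 0
    for _ in range(n_seasons):
        al = rng.gauss(AL_HR_2016, sqrt(AL_HR_2016))
        nl = rng.gauss(NL_HR_2016, sqrt(NL_HR_2016))
        if al > nl:
            al_ahead += 1
    return al_ahead / n_seasons


share = simulate_share_al_ahead()
print(f"AL out-homered NL in {share:.1%} of simulated seasons")
```

Under these assumptions the American League comes out ahead in nearly every simulated season, but that result is only as good as the model: real season-to-season variation is likely larger than a Poisson count implies, and the observed totals are themselves subject to the gap described above.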
Luckily, the construct-measurement gap does not invalidate the entire statistical process. By keeping this error component in mind, baseball analysts and statisticians can develop methodologies that reduce the mismatch and bring the sample statistic as close as possible to the underlying population value. This is where confidence intervals, polling errors, and other reported sampling errors originate.
The SaberSmart Team
Groves, R., Fowler, F., Couper, M., Lepkowski, J., Singer, E., & Tourangeau, R. (2009). Survey methodology. Hoboken, New Jersey: John Wiley & Sons, Inc.