This week, we want to tell a story. A story of wastefulness, pride, and a team resting on its laurels after almost winning the World Series. The setting: North Texas. Globe Life Park in Arlington, Texas is one of the largest ballparks in the country, with a capacity of 49,115 fans. In 2012, the Texas Rangers had just come off a second consecutive World Series appearance. However, they were still not selling out every game. In fact, in 2011, they did not have a single sold-out game. That left room to draw more fans to their games. Unfortunately, either they assumed that back-to-back World Series appearances would be enough to draw fans, or they did not believe in the power of giveaways.
Last week, we talked about the differences between building models for statistical inference and building models for prediction. Predictive modeling has not always been part of the statistics community. One person largely responsible for bridging the gap between the computer science community and the statistics community is Leo Breiman, known colloquially as the Father of CART and Random Forests. In his 2001 article, "Statistical Modeling: The Two Cultures", he articulated his views on the difference between the statistics community of his time and the machine learning community. Here are our thoughts on this groundbreaking paper, and on the topic in general.
With so much terminology thrown around with regard to big data, and especially machine learning, we thought it would be helpful to explore some of the more common terms. In this post, we delve into the idiosyncrasies behind two concepts that come up often in predictive analytics: statistical inference and predictive modeling. While sometimes used in similar situations, they really are distinct concepts.
When analyzing big data, we can build statistical models for inference purposes or for predictive purposes. For instance, imagine that we fit a simple linear regression model Y = b0 + b1X. If we fit this model for the purpose of statistical inference, our primary motivations, and the conclusions we draw from statistical tests, are about the data itself. If we build this model for predictive purposes, statistical inference and tests are still important; however, our motivations are steered primarily by the accuracy of the predicted values. Consequently, the metrics used to evaluate each model have to match the underlying motivation.
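To make the contrast concrete, here is a minimal sketch of the same fitted line Y = b0 + b1X evaluated both ways. Everything in it is our own illustration rather than anything from the post: the data are synthetic, the function names (`fit_simple_ols`, `slope_t_statistic`, `holdout_rmse`) are hypothetical, and the t statistic and held-out RMSE are just one reasonable choice of metric for each purpose.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Fit Y = b0 + b1*X by ordinary least squares and return (b0, b1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b0 = y_mean - b1 * x_mean
    return b0, b1


def slope_t_statistic(x, y, b0, b1):
    """Inference view: t statistic for the null hypothesis b1 = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (b0 + b1 * x)
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sigma2 = np.sum(residuals ** 2) / (len(x) - 2)
    se_b1 = np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))
    return b1 / se_b1


def holdout_rmse(x_test, y_test, b0, b1):
    """Prediction view: root mean squared error on data not used for fitting."""
    x_test = np.asarray(x_test, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    predictions = b0 + b1 * x_test
    return np.sqrt(np.mean((y_test - predictions) ** 2))


# Synthetic data (an assumption for illustration): true line 2.0 + 0.5*X plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=200)

# Hold out the last 50 points so the predictive metric uses unseen data.
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

b0, b1 = fit_simple_ols(x_train, y_train)
t_stat = slope_t_statistic(x_train, y_train, b0, b1)
rmse = holdout_rmse(x_test, y_test, b0, b1)

print(f"slope estimate: {b1:.3f}, t statistic: {t_stat:.1f}")
print(f"held-out RMSE: {rmse:.3f}")
```

The t statistic answers an inference question (is there evidence that X is related to Y in this data?), while the held-out RMSE answers a prediction question (how far off are we on observations the model has not seen?). The model is identical; only the yardstick changes.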
The St. Louis Cardinals are not great this year. They were not great last year either. To the casual baseball fan, then, the fact that the Cardinals stood pat at the most recent trade deadline is surprising. Why not sell off your best players for prospects to help you win in the future? While many gripe that this organization has lacked a clear direction for the last couple of years, we wanted to look back at perhaps their largest trade of the last half decade, one that boosted them from greatness to new heights.