This week, we tackle a common question in today’s realm of Big Data. We all know that there are many systems and methods for collecting data: web and text scraping, click monitoring, and financial data feeds, just to name a few. As a society with a rapidly growing amount of data and analytical problems, how do we know when we have enough data? Could there even be situations in which we have too much data?
First and foremost, an organization, sports or otherwise, needs enough data to compete analytically. That is, they need a large enough sample to build a predictive model that will drive their organizational decision making. From there, if the cost of acquiring data is not too high, they should want as much data as possible and keep striving to acquire it.
Why as much data as possible? If an organization is analytically driven, they are going to constantly look for new and innovative ways to improve their operations. Data drives this innovation, and an organization limited in the sheer volume of data they hold will be limited in what they can discover from it. That makes it extremely difficult to stay ahead of the curve.
But what if an organization is just starting to use analytics to drive decision making? In this case, they must understand that much of the data they acquire will go unused at first. Still, they should come to think of this as a long-term investment. They are essentially paying the costs to acquire the data now in order to innovate later.
Take, for example, MLB. According to R.J. Anderson of Newsweek, Major League Baseball has taken an interesting approach to big data. The league uses Statcast, an innovative technology that tracks every pitch, swing, and movement, and sends this data to every team. As a result, teams are not responsible for the cost of acquiring the large amount of data given to them. This is an interesting take, as MLB’s main goal here was to level the playing field and not force small market teams to make an expensive decision.
The NBA has since adopted this method as well. At first, though, the league let each team decide whether to spend nearly $100,000 on cameras from STATS Inc that would track player movements. When the decision was left up to each organization, only around half the teams actually paid for them. Because the NBA wanted to level the statistical playing field, the league decided to pay for every team to use these cameras.
We here at SaberSmart believe that all of the big sports conglomerates should encourage equal access to their data. Teams can still gain plenty of competitive advantages through the innovations they discover by analyzing this data. For instance, Statcast is providing more insight into pitcher analysis and even catcher framing and fielding techniques. Eventually, this will lead to new strategies for valuing players and targeting free agents, as well as salaries that more accurately reflect the contributions a player can provide for their team.
The only drawback to having so much data is infrastructure costs. It is not cheap to properly store and secure this data, and improper management could lead to breaches. Just ask the Astros, after the Cardinals hacked them, if they wish they had secured their data better! These costs could leave small market teams unable to effectively use all of the data available to them, as they do not have the infrastructure in place to manage the influx. This applies to other companies as well: a small company with slim profit margins might not find it worthwhile, from a cost analysis view, to implement the infrastructure needed for big data analysis.
Ultimately, an organization striving to compete analytically should seek to acquire as much data as possible, keeping in mind infrastructure costs and other consequent expenses. This will pave the way for model building now and innovation in the future, while allowing the company, or sports organization, to make it there in one piece.
What do you think? Are technologies like Statcast getting out of hand? Or should they keep developing new metrics and data to gather to fuel new innovations? We look forward to hearing your thoughts below!
The SaberSmart Team