SaberSmart

How Data Types Affect Big Data Analytics

6/29/2017

In today’s technologically dependent and interconnected world, data comes in many forms. There is structured data, such as the numeric fields of a database. There is also semi-structured and unstructured data, such as text files and records of web interactions. In fact, unstructured data is the most common form, lurking in places where data is not usually thought to exist. Current analytical work requires extensive time spent putting data into a structured form and preparing it for analysis. Consequently, understanding the different data types is vital for analytical success.
Structured and unstructured data are both used extensively in big data analysis. Historically, because of limited processing capability, inadequate memory, and high data-storage costs, working with structured data was the only way to manage data effectively. More recently, the use of unstructured data sources in analytics has skyrocketed, thanks to cheaper and more plentiful storage and the sheer number of complex data sources.

Structured data is the most familiar kind. It covers all data that can be stored in a relational database (the kind queried with SQL), in tables with rows and columns. Records typically have a relational key and can be easily mapped into pre-designed fields. Today, this is the most heavily processed kind of data in development and the simplest way to manage information. Unfortunately, structured data represents only about 5 to 10% of all data.
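To make the idea of rows, columns, and relational keys concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample values are made up for illustration and are not taken from any real dataset.

```python
import sqlite3


def build_customer_db():
    """Create an in-memory relational database with two linked tables."""
    conn = sqlite3.connect(":memory:")
    # Each table has a primary key; orders points back to customers
    # through customer_id, which is the relational key.
    conn.execute(
        "CREATE TABLE customers ("
        "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customers(customer_id), "
        "total REAL NOT NULL)"
    )
    # Hypothetical sample rows: every value fits a pre-designed field.
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "Boston"), (2, "Grace", "Chicago")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 15.5), (12, 2, 40.0)],
    )
    conn.commit()
    return conn


def total_spent_by_customer(conn):
    """Join the two tables on the key and sum order totals per customer."""
    query = (
        "SELECT c.name, SUM(o.total) "
        "FROM customers c JOIN orders o ON o.customer_id = c.customer_id "
        "GROUP BY c.customer_id ORDER BY c.customer_id"
    )
    return conn.execute(query).fetchall()
```

Calling total_spent_by_customer(build_customer_db()) returns one (name, total) pair per customer. Because every value already sits in a typed column, the analysis is a single query with no cleaning step.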

Structured data leaves out immense amounts of material that does not fit neatly into a firm’s organization of information. Until recently, that additional information was kept alongside structured data on paper or microfiche. With improvements in computer processing, lower data-storage costs, and the spread of new data formats, semi-structured and unstructured data are now saturating businesses.

Semi-structured data is information that does not reside in a relational database but does have some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database, but the semi-structured form exists to save space and to make the data clearer and easier to compute on. Some NoSQL databases are optimized to store semi-structured data.
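As a rough sketch of the "some processing" involved, the snippet below flattens nested JSON-style records into rows and columns that a relational table could hold. The field names and sample records are hypothetical, and the dotted column naming is just one possible convention.

```python
import json

# Hypothetical semi-structured records: the fields are tagged,
# but not every record carries the same ones.
RAW_EVENTS = """
[
  {"user": {"id": 1, "city": "Boston"}, "page": "/home"},
  {"user": {"id": 2}, "page": "/cart", "referrer": "email"}
]
"""


def flatten(record, prefix=""):
    """Turn nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def to_rows(records):
    """Build a fixed column list and fill missing fields with None."""
    flat_records = [flatten(r) for r in records]
    columns = sorted({key for r in flat_records for key in r})
    rows = [[r.get(col) for col in columns] for r in flat_records]
    return columns, rows


columns, rows = to_rows(json.loads(RAW_EVENTS))
```

Here columns comes out as page, referrer, user.city, and user.id, and each record becomes one row with None wherever a field was absent. Those empty cells hint at why the semi-structured form is often the more compact one.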

Unstructured data represents around 80% of data! It includes text and multimedia content: e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages, and many other kinds of business documents. While these sorts of files may have an internal structure, they are still considered unstructured because the data they contain does not fit neatly in a database. Consequently, unstructured data is everywhere. In fact, most individuals and organizations conduct their lives around unstructured data.

The fundamental challenge of unstructured data sources is that they are difficult for nontechnical business users and data analysts alike to unbox, understand, and prepare for analytic use. Beyond issues of structure, there is the sheer volume of this type of data. Because of this, current data mining techniques often leave out valuable information and make analyzing unstructured data laborious and expensive.

Most of the data used in corporate business structures is unstructured, or sometimes semi-structured. For an online retailer, for example, click counts and visitor information are extremely important. Each record may carry a multitude of attributes, but it does not have to. This data can often be stored in a graph database, so that, as one basic use case, the retailer can track a customer's journey through its website. A lot of preparation needs to go into cleaning and processing this data to make it usable.
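The customer-journey idea can be sketched without a graph database at all. The example below rebuilds each visitor's path from raw click records and counts page-to-page transitions, which are the weighted edges a graph store would hold. The click tuples, visitor IDs, and page paths are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical click records: (visitor_id, sequence_number, page).
# They arrive out of order, as raw logs often do.
CLICKS = [
    ("v1", 1, "/home"),
    ("v1", 2, "/product/42"),
    ("v1", 3, "/cart"),
    ("v2", 1, "/home"),
    ("v2", 2, "/product/42"),
    ("v1", 4, "/checkout"),
]


def build_journeys(clicks):
    """Group clicks by visitor and order them into a page sequence."""
    journeys = defaultdict(list)
    for visitor, _seq, page in sorted(clicks, key=lambda c: (c[0], c[1])):
        journeys[visitor].append(page)
    return dict(journeys)


def transition_counts(journeys):
    """Count how often visitors moved from one page directly to another."""
    edges = Counter()
    for pages in journeys.values():
        # Pair each page with the one that follows it in the journey.
        edges.update(zip(pages, pages[1:]))
    return edges


journeys = build_journeys(CLICKS)
edges = transition_counts(journeys)
```

Each key in edges is a (from_page, to_page) pair and each value is how many journeys took that step. Even in this toy version, the sorting and grouping is the preparation work; the counting itself is one line.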

One of the most common metrics in retail is a success action, e.g. a customer adding something to their cart. However, a visitor can make multiple visits in a day, or can have an extensive journey with multiple success actions on various products. Truly understanding how a product page is doing therefore requires extensive aggregation and cleaning of this data. Trying to collect the data in a structured way from the start, however, would mean missing out on other valuable information. Sometimes heavy preparation is the price you have to pay to achieve a competitive advantage through predictive analytics!
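To show the kind of aggregation this takes, here is a small sketch that computes an add-to-cart rate per product while counting each visitor at most once per day, so repeat visits and repeated success actions do not inflate the numbers. The event layout, product names, and the choice of "visitor per day" as the unit are all assumptions made for the example, not a standard definition.

```python
from collections import defaultdict

# Hypothetical events: (visitor_id, day, product, action).
EVENTS = [
    ("v1", "2017-06-29", "shoes", "view"),
    ("v1", "2017-06-29", "shoes", "add_to_cart"),
    ("v1", "2017-06-29", "shoes", "add_to_cart"),  # repeated success, same day
    ("v2", "2017-06-29", "shoes", "view"),
    ("v4", "2017-06-29", "shoes", "view"),
    ("v5", "2017-06-29", "shoes", "view"),
    ("v2", "2017-06-29", "hat", "view"),
    ("v2", "2017-06-29", "hat", "add_to_cart"),
    ("v3", "2017-06-29", "hat", "view"),
    ("v3", "2017-06-29", "hat", "view"),  # repeated visit, same day
]


def conversion_by_product(events):
    """Return the share of visitor-days with at least one add-to-cart."""
    viewers = defaultdict(set)
    converters = defaultdict(set)
    for visitor, day, product, action in events:
        key = (visitor, day)  # sets deduplicate repeats automatically
        if action == "view":
            viewers[product].add(key)
        elif action == "add_to_cart":
            converters[product].add(key)
            # A success implies the visitor was on the page, even if
            # the matching view event was lost.
            viewers[product].add(key)
    return {p: len(converters[p]) / len(viewers[p]) for p in viewers}


rates = conversion_by_product(EVENTS)
```

With this sample, rates maps shoes to 0.25 and hat to 0.5. A naive count of raw add-to-cart events over raw views would give different figures, which is the distortion the cleaning step is meant to remove.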

What are your thoughts? How much of your time is devoted to data preparation? Let us know in the comments below!

The SaberSmart Team