SaberSmart

Tackling Big Data with Divergent Database Systems

7/10/2017

The relational model has dominated database systems for decades. Perhaps the most important reasons for its popularity are its programming abstraction and the Structured Query Language (SQL). While most of us are familiar with SQL and relational databases, how did their rise to power begin? And will they remain dominant in this new age of Big Data analytics?

Relational databases, and the software systems used to maintain them (relational database management systems, or RDBMS), are digital databases organized according to the relational model of data, first proposed by IBM researcher Edgar F. Codd in 1970. Most of the innumerable data transactions we routinely make today, such as using bank accounts and credit cards, trading stock, making travel reservations, and participating in online auctions, run on relational databases built on the abstract and sophisticated mathematical theory that Codd published in his article, "A Relational Model of Data for Large Shared Data Banks." Oracle's database software grew out of the ideas in this innovative paper.

To understand the rise of relational databases, it helps to remember that the computing landscape of the early 1970s was a far cry from the gigahertz, terabyte, and petaflop scene we have today. Computer time cost hundreds of dollars a minute, so great human effort went into making programs as efficient as possible before they were run. The giant IBM computers in the movie Hidden Figures give a sense of the era. Databases used either a rigid hierarchical structure, similar to what we now know from XML, or the network model, a complex navigational scheme of nodes and pointers to the physical locations of the data on magnetic tapes.

Each node in the network model can have multiple parents and children so that you can walk in many directions from any node. Teams of programmers were needed to express queries to extract meaningful information. While such databases could be efficient in handling the specific data and queries they were designed for, they were absolutely inflexible. New types of queries required complex reprogramming, and adding new types of data forced a total redesign of the database itself.
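
To get a feel for why navigational queries needed so much hand-written code, here is a minimal Python sketch. It is only an illustration, not how any historical system was actually implemented: the `Record` class, the `link` helper, and the supplier/project/part data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Record:
    """A record in a hypothetical network-model store, linked by explicit pointers."""
    name: str
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)


def link(parent: Record, child: Record) -> None:
    """Wire up pointers in both directions so either record can reach the other."""
    parent.children.append(child)
    child.parents.append(parent)


# Invented data: a part can have several parents (a supplier and a project).
supplier = Record("Supplier: Acme")
project = Record("Project: Bridge")
bolt = Record("Part: Bolt")
nut = Record("Part: Nut")
link(supplier, bolt)
link(supplier, nut)
link(project, bolt)


def parts_used_in_projects(start: Record) -> list:
    """Hand-written traversal: supplier -> parts -> parents that are projects."""
    found = []
    for part in start.children:
        for parent in part.parents:
            if parent.name.startswith("Project:"):
                found.append((part.name, parent.name))
    return found


pairs = parts_used_in_projects(supplier)
print(pairs)  # [('Part: Bolt', 'Project: Bridge')]
```

Every new question ("which suppliers serve this project?") would need its own traversal function like this one, which is the inflexibility described above.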


Codd's breakthrough was to replace the hierarchical or navigational structure with simple tables of rows and columns. Strictly speaking, the name "relational" comes from the mathematical notion of a relation, a set of tuples that a table represents, but the tables also capture relationships among different kinds of data. Relational databases express those relationships using joins that are computed at query time. Changing the appropriate values in the tables is all it takes to update what a join returns.

Because the joins are computed dynamically, the tables that "point" at a given table will always point at the "right rows." This simple data modeling abstraction is a main cause of the ubiquity and popularity of relational databases. The relational model maps easily onto real-world use cases and consequently provides many business advantages. The table and column abstraction is also easier to grasp than other data modeling abstractions such as objects and graphs.
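
A short sketch can show what "dynamically computed" means in practice. The example below uses Python's built-in `sqlite3` module with an in-memory database; the `teams` and `players` tables and every value in them are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (team_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE players (
        player_id INTEGER PRIMARY KEY,
        name TEXT,
        team_id INTEGER REFERENCES teams(team_id)
    );
    INSERT INTO teams VALUES (1, 'Hawks'), (2, 'Owls');
    INSERT INTO players VALUES (10, 'Ana', 1), (11, 'Ben', 2);
""")

# The relationship lives only in the team_id values; the join is
# recomputed from those values every time the query runs.
JOIN_SQL = """
    SELECT players.name, teams.name
    FROM players
    JOIN teams ON players.team_id = teams.team_id
    ORDER BY players.player_id
"""

before = conn.execute(JOIN_SQL).fetchall()
print(before)  # [('Ana', 'Hawks'), ('Ben', 'Owls')]

# "Trade" Ben to the Hawks by changing a single value in one row.
conn.execute("UPDATE players SET team_id = 1 WHERE player_id = 11")

after = conn.execute(JOIN_SQL).fetchall()
print(after)  # [('Ana', 'Hawks'), ('Ben', 'Hawks')]
```

No pointers had to be rewired: the same query returns the new relationship as soon as the value changes.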


Finally, the Structured Query Language (SQL) behind the relational database is a huge reason for its explosive growth. As noted above, querying earlier database structures was difficult and inefficient. The SQL programming model, by contrast, resembles set operations, which makes it easy to learn. In a generic relational database, the query engine parses the SQL and generates a query execution plan. The user does not have to worry about how the data is stored and accessed.
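
To make the comparison with set operations concrete, here is a minimal sketch that runs SQL's `INTERSECT`, `UNION`, and `EXCEPT` through Python's built-in `sqlite3` module and lines them up against Python's own set operators. The roster tables and player names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roster_2016 (player TEXT);
    CREATE TABLE roster_2017 (player TEXT);
    INSERT INTO roster_2016 VALUES ('Ana'), ('Ben'), ('Cy');
    INSERT INTO roster_2017 VALUES ('Ben'), ('Cy'), ('Dee');
""")


def names(sql: str) -> set:
    """Run a one-column query and return its values as a Python set."""
    return {row[0] for row in conn.execute(sql)}


stayed = names("SELECT player FROM roster_2016 INTERSECT SELECT player FROM roster_2017")
everyone = names("SELECT player FROM roster_2016 UNION SELECT player FROM roster_2017")
departed = names("SELECT player FROM roster_2016 EXCEPT SELECT player FROM roster_2017")

# The same three questions asked with plain Python sets.
a = {"Ana", "Ben", "Cy"}
b = {"Ben", "Cy", "Dee"}
print(stayed == (a & b), everyone == (a | b), departed == (a - b))  # True True True
```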

Vendors have spent years optimizing their query engines, and most of them generate efficient execution plans. One consequence is that various SQL dialects exist, so portability must sometimes be sacrificed. Additionally, SQL is a declarative language: you say what you want rather than how to get it.
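
The difference between declaring what you want and spelling out how to get it can be sketched in a few lines. The example below again assumes Python's built-in `sqlite3` module and an invented `games` table. It also asks the engine for its execution plan with `EXPLAIN QUERY PLAN`, which is SQLite's own syntax (a small taste of the dialect problem); the plan text varies between SQLite versions, so it is printed without an expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (game_id INTEGER PRIMARY KEY, team TEXT, runs INTEGER)"
)
conn.executemany(
    "INSERT INTO games (team, runs) VALUES (?, ?)",
    [("Hawks", 5), ("Owls", 2), ("Hawks", 3), ("Owls", 7)],
)

# Declarative: describe the result and let the query engine decide how.
sql = "SELECT team, SUM(runs) FROM games GROUP BY team ORDER BY team"
declarative = conn.execute(sql).fetchall()

# Imperative: fetch the raw rows and do the grouping and summing by hand.
totals = {}
for team, runs in conn.execute("SELECT team, runs FROM games"):
    totals[team] = totals.get(team, 0) + runs
imperative = sorted(totals.items())

print(declarative)  # [('Hawks', 8), ('Owls', 9)]
print(imperative)   # [('Hawks', 8), ('Owls', 9)]

# Ask SQLite how it intends to run the declarative query.
plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
for step in plan:
    print(step)
```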


Due to the dramatic growth of unstructured data within enterprises, driven by the Big Data revolution, other databases, creatively known as NoSQL, came into existence. NoSQL is having an impact on the $46 billion database market. It is still just 3% of the market, but it is growing at a rapid pace even as more traditional relational databases inch up by 5.4%, according to IDC. By that same measure, and by the updated DB-Engines database popularity rankings, relational databases still clearly dominate big data. In addition, analytics tooling for NoSQL is still in its infancy.

As Gartner analyst Lynn Robison points out, NoSQL-friendly analytics tools are not user-friendly, and "it will take years for analytical tools to mature and become accessible to people who are not in data science." Given these trends, we can expect NoSQL and relational databases to share the big data winner's podium for many years to come.


What do you think? Which database system do you prefer? Let us know in the comments below!

The SaberSmart Team