The relational model has been the dominant model for database systems for many years. Perhaps one of the most important reasons for its popularity is its programming abstraction and the Structured Query Language (SQL) built on top of it. While we may all be familiar with SQL and relational databases, how did their rise to power begin? Will they remain dominant in this new age of Big Data analytics?
Essentially, relational databases, and the software systems used to maintain them (relational database management systems, or RDBMSs), are digital databases organized according to the relational model of data, first proposed by IBM researcher Edgar F. Codd in 1970. Most of the data transactions we routinely make today, such as using bank accounts and credit cards, trading stock, making travel reservations and participating in online auctions, run on relational databases based on the abstract mathematical theory that Codd first published in his article, A Relational Model of Data for Large Shared Data Banks. Oracle’s database structure and software originated from this innovative paper.
To understand the rise of relational databases, it is imperative to realize that the computing landscape in the early 1970s was a far cry from the gigahertz, terabyte and petaflop scene we have today. Computer calculations cost hundreds of dollars a minute, so great human effort was spent making programs as efficient as possible before they were run. The giant IBM computers in the movie Hidden Figures, set a decade earlier, give a sense of that world. Databases used either a rigid hierarchical structure, similar to what we now know from XML, or the network model: a complex navigational plan of pointers and nodes leading to the physical locations of the data on magnetic tapes.
Each node in the network model can have multiple parents and children so that you can walk in many directions from any node. Teams of programmers were needed to express queries to extract meaningful information. While such databases could be efficient in handling the specific data and queries they were designed for, they were absolutely inflexible. New types of queries required complex reprogramming, and adding new types of data forced a total redesign of the database itself.
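To make the inflexibility concrete, here is a minimal Python sketch of a navigational store. It is only an illustration under stated assumptions: the `Record` class, the `link` helper, and the customer, order and part data are all hypothetical and do not reproduce any historical system. The point is that each question needs its own hand-written walk through the pointers.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical node in a navigational (network-style) database."""
    kind: str
    name: str
    children: list = field(default_factory=list)


def link(parent, child):
    """Store a pointer from parent to child."""
    parent.children.append(child)


# Hypothetical data: customers point to orders, orders point to parts.
alice = Record("customer", "Alice")
bob = Record("customer", "Bob")
order_a = Record("order", "A-1")
order_b = Record("order", "B-1")
widget = Record("part", "widget")

link(alice, order_a)
link(bob, order_b)
link(order_a, widget)
link(order_b, widget)  # the same part has two parents


def parts_for_customer(customer):
    """Walk customer -> order -> part, the path this store was built for."""
    return sorted({part.name
                   for order in customer.children
                   for part in order.children})


def customers_for_part(customers, part_name):
    """The reverse question needs a separate walk over every customer."""
    return sorted(c.name for c in customers
                  if part_name in parts_for_customer(c))
```

Asking which parts Alice ordered follows the stored pointers directly, while asking who ordered a given part required a second function that scans every customer. A question nobody anticipated would need yet another traversal, or a redesign of the pointers themselves.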
Codd’s breakthrough was to propose replacing the hierarchical or navigational structure with simple tables made of rows and columns. Databases using this structure are called relational because each table is formally a relation in the mathematical sense, a set of rows that share the same columns, and the tables can in turn capture relationships among different kinds of data. Relational databases express those relationships using joins, which are computed from matching values at query time. Because nothing is hard-wired, a relationship is updated simply by changing the appropriate values in the tables.
Because the joins are computed dynamically, rows that "point" at a given table always resolve to the "right rows". This simple data modeling abstraction is a main cause of the ubiquity and popularity of relational databases. The relational model maps easily onto real-world use cases and consequently offers many business advantages. The table and column abstraction is also easier for many people to grasp than other data modeling abstractions such as objects and graphs.
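A small sketch can show what a computed join looks like in practice. It uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their contents are made up for illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    item TEXT
);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 'widget'), (11, 2, 'gadget');
""")

# The relationship is not stored as a pointer; it is computed by
# matching orders.customer_id against customers.id on every query.
JOIN_SQL = """
SELECT customers.name, orders.item
FROM orders
JOIN customers ON orders.customer_id = customers.id
ORDER BY orders.id
"""

before = conn.execute(JOIN_SQL).fetchall()

# Reassign order 11 to Alice by changing a single value.
conn.execute("UPDATE orders SET customer_id = 1 WHERE id = 11")

after = conn.execute(JOIN_SQL).fetchall()
```

Here `before` pairs the gadget order with Bob and `after` pairs it with Alice. No pointers were rewired between the two queries: only one value in the `orders` table changed, and the join picked up the new relationship the next time it ran.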
Finally, the Structured Query Language (SQL) behind the relational database is a huge reason for its explosive growth. As stated above, querying earlier database structures was difficult and labor-intensive. SQL's programming model, by contrast, resembles set operations, which makes it easy to learn. In a generic relational database, the query engine parses the SQL and generates a query execution plan. The user does not have to worry about how the data is stored and accessed.
These query engines have been optimized by their vendors over many years, and most of them generate efficient execution plans. The downside of vendor-specific engines is that there are various SQL dialects, so portability must sometimes be sacrificed. Additionally, SQL is a declarative language: you say what you want, rather than how to get it.
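The split between what you ask for and how the engine answers it can be sketched with SQLite, again through Python's built-in `sqlite3` module. The `orders` table, the index name `idx_orders_item`, and the helper `run_with_plan` are assumptions made for this example. `EXPLAIN QUERY PLAN` is specific to SQLite, and the wording of its output varies between versions, so the sketch collects the plan without relying on its exact text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
INSERT INTO orders VALUES (10, 'widget'), (11, 'gadget'), (12, 'widget');
""")

# The query states WHAT is wanted; it never says how to find the rows.
QUERY = "SELECT id FROM orders WHERE item = 'widget' ORDER BY id"


def run_with_plan(connection):
    """Return the engine's chosen plan and the query result (hypothetical helper)."""
    plan = connection.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    rows = connection.execute(QUERY).fetchall()
    return plan, rows


plan_before, rows_before = run_with_plan(conn)

# Change how the data can be accessed; the query text stays untouched.
conn.execute("CREATE INDEX idx_orders_item ON orders(item)")

plan_after, rows_after = run_with_plan(conn)
```

Printing `plan_before` and `plan_after` shows how the engine intends to fetch the rows in each case, and the engine is free to choose a different strategy once the index exists. The query string and its results are identical in both runs, which is the practical meaning of "declarative".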
Due to the dramatic growth of unstructured data within enterprises, driven by the Big Data revolution, other databases, creatively known as NoSQL, came into existence. NoSQL is having an impact on the $46 billion database market. According to IDC, it is still just 3% of the market but growing at a rapid pace, even as more traditional relational databases inch up by 5.4%. By that same measure, as well as the updated DB-Engines database popularity rankings, relational databases still clearly dominate big data. Also, analytics tooling for NoSQL is still in its infancy.
As Gartner analyst Lynn Robison points out, NoSQL-friendly analytics tools are not user-friendly, and "it will take years for analytical tools to mature and become accessible to people who are not in data science." With these trends in mind, we can expect NoSQL and relational databases to share the big data winner's podium for many years to come.
What do you think? Which database system do you prefer? Let us know in the comments below!
The SaberSmart Team