When is a database not so relational?

Storage in the cloud

Deep dive

In the first of two articles on cloud-based data storage, we shall explore the drawbacks of relational databases.

There are two types of databases you can use when hosting data in the cloud: a relational database or one that is so not so relational. For example, Amazon offers a choice of SimpleDB (not so relational) or MySQL (relational).

Soon it will offer Oracle 11g (relational) as another option. Some cloud providers, such as Microsoft, offer a similar choice, while Google, for example, offers only a not-so relational approach.

Typically, software-as-a-service (SaaS) offerings run on top of relational databases: Salesforce.com is a good instance of this.

Confused? You will be

With all these options, it is important to understand when you might prefer a relational approach or one that is not so relational, or when it might be appropriate to use a hybrid approach.

To begin with, why I have called some databases “not so relational”?

The reason is that there are two classes of these databases which confusingly are known as NoSQL, and NOSQL databases. One stands for “Not SQL” and the other stands for “Not only SQL”.

The basic fact is that these are all non-relational databases in the way that they store data. The distinction between “Not SQL” and “Not only SQL” is that the former can only be addressed via a suitable API and the latter can also be addressed using SQL.

This distinction is not confined to the cloud, neither is it new. For the moment we need to understand the problems that they try to address.

Relationship problems

There are two problems with relational databases in the cloud. The first and most important is to do with scalability. Relational databases scale well, but in most cases this is planned scalability – something you do over time. Dynamic scalability is what you need in the cloud, where workloads can triple overnight and then fall back down to half of where you started.

Classic relational databases are simply not designed for this sort of environment. As an example, applications built on top of Oracle RAC (Real Application Clusters) need to be cluster-aware to get optimal performance. That is, they need to know how many nodes are available to make best use of them.

The second issue is that relational databases are a poor fit for most software development. This nowadays usually involves an object-oriented approach in which you need to do object-to-relational mapping (and vice-versa) at the interface between the database and code.

Caught mapping

This mapping adds complexity and cost and decreases performance. Not-so-relational databases do not suffer from this to the same extent because they store information in a more object-oriented fashion.

But while the new generation of not-so-relational databases avoids this mismatch through their architecture (usually known as key/value stores) – this capability is by no means new and does not require this key/value architecture. For example, Caché from InterSystems similarly avoids this issue but it is by no means a key/value store – and it supports SQL. ®

This way for our second article on cloud database storage: Which kind of cloud database is right for you?

Sponsored: Network DDoS protection