Get tooled up before grappling with Google's Spanner database

There's a learning curve, but you too can be Mr Worldwide

technician holding a toolbox

When companies started to build applications at web scale and needed services to scale to millions of requests across the globe, it quickly became apparent that relational databases just didn't work well enough.

So companies like Amazon and Google built their own that could service these requests but without all the features that made relational databases unreliable at that scale. Hence NoSQL, the collection of database-like alternatives that didn't use SQL and threw away transactions and referential integrity.

While the pendulum has swung back to relational, NoSQL databases haven't died out. That's because they have proved to be useful, both for solving the problems of access at scale, and because some of their data models have solved more problems than they create.

Document databases make the data mismatch between a programming language and data model so much easier to overcome than with previous solutions such as Object Relational Mapping. Graph databases are so much more efficient at storing networks of data than relational databases that they have become a staple of data analysis.

Still, many programmers and data modellers yearned for the warm fuzziness of the relational model. Something had to happen, and that something was the birth of NewSQL.

Charlie Chaplin Modern Times

Google Spanner in the NewSQL works?

READ MORE

Google's Spanner is one such NewSQL database. It promises a relational database model that can be scaled across the globe, making your data instantly available, anywhere.

How the software manages to scale so easily and still manage to be relational has been documented elsewhere, but it's all down to the use of atomic clocks and the fact that Google owns and manages its own network. The clocks keep transactions in the correct order while Google's network all but guarantees there won't be a partition, hence no problem with global consistency.

Sounds good, but ordinary folk wanting a slice of the NewSQL action can't just get hold of the software and set up a network of Spanner machines. Yes, you can set up atomic clocks (if you know what you're doing) and perhaps manage to keep them in sync given pesky little things like relativity getting in the way, but setting up and owning a network with the reliability of Google's is probably beyond the capability of all but the biggest corporations.

If you want to play with Spanner, you need a Google Cloud Platform account.

If you've never used it before but are familiar with Azure or AWS, you'll soon find yourself mostly at home. There are differences of course, but nothing that should get in the way. I found that setting up Google cloud on my laptop was relatively easy, especially once the cloud SDK was installed on the machine.

Setting up a Spanner instance is simple as well. Just create a new instance, choose if it's regional or multi-regional and the number of nodes you need. If it's multi-regional, your database will span the globe. Be careful, though, the cost of nodes can grow quite quickly – especially if you pick a multi-regional configuration. I found that testing Spanner drained my educational account quite quickly.

There is good news when it comes to setting up databases and configuring tables. It's all very easy with either a graphical interface or by entering SQL commands in a text window. Google also provides demo programs in a range of languages (Python, Java, C#, Go, Node, PHP and Ruby) that are easy to understand and will have your application up and running in no time. However, you will find that although this is SQL, it's not quite as rich as you'd expect. For instance, there is no "IF EXISTS" when creating a table and the range of data types is limited. Spanner does, however, have an array type to store a collection of values, although you'll find the usage different to Postgres for example.

Amazon data center

Yes sir, no sir, 3 bags NoSQL sir: It's a whizz-bang benchmark ... but WTF does it signify?

READ MORE

And this is where some of the problems start. If you already have a working application using MySQL or Postgres (other expensive relational engines are available) that you want to transfer over to Spanner, it's not just as simple taking your SQL table definitions and pumping them into Spanner via the interface or an application. The definition language is just different enough to require a major rewrite of the SQL.

The thing that makes relational databases so much more useful than most NoSQL databases is the ability to join data across tables through the use of the JOIN command and Spanner boasts a full collection of JOIN commands along with unions and order by. But beware: joins may give a performance hit if you are using multi-region instances, as data may need to be joined across instances.

SQL programmers from a range of backgrounds will find working with Spanner both familiar and strange. Foreign keys, for instance, can help with referential integrity and are often defined when creating a set of tables. Spanner doesn't have this concept – instead tables need to be defined as interleaved, which defines the tables as having a "strong data locality relationship". This is simple to do for two tables but the complexity rises quickly when more tables are involved and it certainly isn't standard SQL. Other statements for integrity are included such as "on delete", which will define what happens to child rows when the parent is deleted, but it might not operate, as you are used to. Indexes do what you expect, but again you might need to define them as interleaved to get the expected behaviour.

In short, Spanner does do what it sets out to do: provide the programmer with a realistic way to create a relational cluster that spans the globe. There is a learning curve to be mastered, though, and if you have an application already using a SQL database then there will be some pain in transferring over to Spanner.

The best option might just be starting small, working on a green-field app or simply fiddling about until that point you need a global, relational-capable database.

Just don't forget the cost of that Spanner cluster! ®




Biting the hand that feeds IT © 1998–2018