Inside the Hekaton: SQL Server 2014's database engine deconstructed
Nadella's database squares the circle of cheap memory vs speed
It's 1996 and Mission:Impossible has just arrived on the cinema screens. RAM is $10 per megabyte and falling. Against this backdrop, Microsoft has quietly commenced its own seemingly impossible mission, a complete re-write of a little known database which will be released two years into the future, and will be known as SQL Server 7.
Redmond's new database will compete with the entrenched kings of the database world, Oracle and DB2.
Fast-forward to 2014.
MI:4 has come and gone. RAM costs $10 a gigabyte. And now we have the latest incarnation of one of the industry's best-selling databases: SQL Server 2014. Microsoft's freshly minted chief executive Satya Nadella presides over the launch at an overly orchestrated corporate event in San Francisco, California, intended to demonstrate his company is all over "big data."
SQL Server is consistently one of the strongest growers in Microsoft's product portfolio, meaning Microsoft's revenue share of the RDBMS market has grown in nearly two decades. Exact numbers vary according to the analyst you buy into, but what's not in dispute is the fact that Microsoft has clawed out a third-place position behind IBM and Oracle (number one) who have been going at this for longer.
Arguably, the only thing that's prevented SQL Server from doing better is the fact it's only available for Windows.
Market share aside, perhaps the two most significant changes in those 18 years have been on the technology front - the increase in server processing power and the price of RAM, the latter decreasing in cost one thousand fold. This is significant, as it is these two factors that are the major drivers behind SQL Server 2014's key theme of performance.
Memory is the new disk
When designing this release, Microsoft re-evaluated the reasons data is stored on disk rather than in memory. This is reflected in the three key features of this release. Assumptions valid when SQL Server 7 was written no longer hold true, ie: main memory was expensive and limited, and CPUs and operating systems had only 32-bit address spaces.
Main memory is now huge, dense, and most importantly, very cheap. Durable flash memory has and will almost certainly continue to follow the same curve. Whilst 32-bit Windows coped with at most 3GB of usable memory, CPUs are now 64-bit and Windows Server 2012 can address 4TB of RAM. Memory sizes are growing at such a rapid rate, it's estimated that more than 95 per cent of disk-based online transaction processing (OLTP) databases are small enough to be stored in-memory.
Conversely, there are problems with disk. Data retrieval latency is orders of magnitude slower than memory. We're talking milliseconds compared to nanoseconds, a million-fold difference.
Also, as processors have reached commercially viable speed limits, to increase their computational throughput CPU manufacturers have pipelined and scaled out their processors. In 1996, twin core CPU systems were a rare novelty and hugely expensive. In 2014, encountering 16, 24 or 32 cores inside a database server is a given.
These parallel giants consume data at a staggering rate, spending much of their time waiting on the disk subsystem to supply it, whilst disk-oriented mechanisms have evolved within the database engine to ensure data updates behave in the consistent way we expect them to in order to maintain ACID compliance.
It is time to re-think why data is stored on slow, unreliable disk, for anything other than recovery and archive purposes.
In SQL Server 2014, Microsoft has started again. The Hekaton database engine – an addition to rather than a replacement for the core SQL Server database engine – now has the capability to store tables in memory. In contrast to Oracle's TimesTen and IBM's SolidDB, SQL Server's in-memory database engine is fully integrated into the core product, and at no additional cost. This makes adoption attractive both technically and financially.
One table at a time migration is practical, reducing risk without requiring expensive consultancy or new skill sets. All common data types are supported - only the less often used XML, CLR and max() types are excluded. Table schemas only, or schemas and data can be persisted in memory, the former useful in cases where temporary data does not have to be persisted, such as staging tables. Furthermore, in contrast to its competitors, all tables or one table at a time can be migrated.
Full ACID compliance is ensured through optimistic MVCC (multi version concurrency control). Row-versioning and time-stamping is used rather than page locks to ensure ACID compliance, now a feasible technique as each row is stored in memory rather than disk. This results in the elimination of locking and blocking, and the waits and timeouts associated with them, a major bane of any production database administrator's life.
As a result, disk read waits are completely eliminated, as all reads are now performed from memory. Schema changes have to be logged to disk, not a big deal as they are comparatively rare. If the schema plus data table type is used, data changes are also logged to disk to provide durability for ACID compliance and recovery.
In this case, checkpointing and write logging still has to be performed to maintain durability, but this is now implemented much more efficiently, and in a well configured system rarely presents a bottleneck.
Hekaton's in-memory features are supported by traditional backup and restore and SQL Server's high availability options, AlwaysOn and log shipping. Puzzlingly, replication is not supported, though neither is database mirroring, which is understandable as it is soon to be deprecated.
A new migration aid, the Analysis, Migration and Reporting tool, is included as part of SQL Server Management Studio to assist the DBA in converting a schema to in-memory tables. With a few clicks, this generates a report containing recommendations and any necessary changes for successful migration.
Although this is of great assistance, a little new knowledge needs to be acquired by the DBA to transition data from disk to memory resident tables, particularly in regard to the restrictions imposed upon table schemas. Developers are unlikely to persevere with them in the current release, but Microsoft’s recent track record with SQL Server is to ship functionality and reliability first, then add GUI based front end management tools later – the way it should be.
In the final pre-manufacturing release, the most notable issues are lack of support for truncate and merge statements, and a clutch of disallowed query hints that might cause existing code to break. There are other minor restrictions, but Microsoft needs to address the keyword constraints, as Hekaton will probably be first adopted by data warehouse customers, where the use of both statements is likely to be more prevalent. The migration wizard also needs some work in terms of its recommendations, but this will no doubt improve in time.