Inside the Hekaton: SQL Server 2014's database engine deconstructed
Nadella's database squares the circle of cheap memory vs speed
It's 1996 and Mission:Impossible has just arrived on the cinema screens. RAM is $10 per megabyte and falling. Against this backdrop, Microsoft has quietly commenced its own seemingly impossible mission, a complete re-write of a little known database which will be released two years into the future, and will be known as SQL Server 7.
Redmond's new database will compete with the entrenched kings of the database world, Oracle and DB2.
Fast-forward to 2014.
MI:4 has come and gone. RAM costs $10 a gigabyte. And now we have the latest incarnation of one of the industry's best-selling databases: SQL Server 2014. Microsoft's freshly minted chief executive Satya Nadella presides over the launch at an overly orchestrated corporate event in San Francisco, California, intended to demonstrate his company is all over "big data."
SQL Server is consistently one of the strongest growers in Microsoft's product portfolio, meaning Microsoft's revenue share of the RDBMS market has grown in nearly two decades. Exact numbers vary according to the analyst you buy into, but what's not in dispute is the fact that Microsoft has clawed out a third-place position behind IBM and Oracle (number one) who have been going at this for longer.
Arguably, the only thing that's prevented SQL Server from doing better is the fact it's only available for Windows.
Market share aside, perhaps the two most significant changes in those 18 years have been on the technology front - the increase in server processing power and the price of RAM, the latter decreasing in cost one thousand fold. This is significant, as it is these two factors that are the major drivers behind SQL Server 2014's key theme of performance.
Memory is the new disk
When designing this release, Microsoft re-evaluated the reasons data is stored on disk rather than in memory. This is reflected in the three key features of this release. Assumptions valid when SQL Server 7 was written no longer hold true, ie: main memory was expensive and limited, and CPUs and operating systems had only 32-bit address spaces.
Main memory is now huge, dense, and most importantly, very cheap. Durable flash memory has and will almost certainly continue to follow the same curve. Whilst 32-bit Windows coped with at most 3GB of usable memory, CPUs are now 64-bit and Windows Server 2012 can address 4TB of RAM. Memory sizes are growing at such a rapid rate, it's estimated that more than 95 per cent of disk-based online transaction processing (OLTP) databases are small enough to be stored in-memory.
Conversely, there are problems with disk. Data retrieval latency is orders of magnitude slower than memory. We're talking milliseconds compared to nanoseconds, a million-fold difference.
Also, as processors have reached commercially viable speed limits, to increase their computational throughput CPU manufacturers have pipelined and scaled out their processors. In 1996, twin core CPU systems were a rare novelty and hugely expensive. In 2014, encountering 16, 24 or 32 cores inside a database server is a given.
These parallel giants consume data at a staggering rate, spending much of their time waiting on the disk subsystem to supply it, whilst disk-oriented mechanisms have evolved within the database engine to ensure data updates behave in the consistent way we expect them to in order to maintain ACID compliance.
It is time to re-think why data is stored on slow, unreliable disk, for anything other than recovery and archive purposes.
In SQL Server 2014, Microsoft has started again. The Hekaton database engine – an addition to rather than a replacement for the core SQL Server database engine – now has the capability to store tables in memory. In contrast to Oracle's TimesTen and IBM's SolidDB, SQL Server's in-memory database engine is fully integrated into the core product, and at no additional cost. This makes adoption attractive both technically and financially.
One table at a time migration is practical, reducing risk without requiring expensive consultancy or new skill sets. All common data types are supported - only the less often used XML, CLR and max() types are excluded. Table schemas only, or schemas and data can be persisted in memory, the former useful in cases where temporary data does not have to be persisted, such as staging tables. Furthermore, in contrast to its competitors, all tables or one table at a time can be migrated.
Full ACID compliance is ensured through optimistic MVCC (multi version concurrency control). Row-versioning and time-stamping is used rather than page locks to ensure ACID compliance, now a feasible technique as each row is stored in memory rather than disk. This results in the elimination of locking and blocking, and the waits and timeouts associated with them, a major bane of any production database administrator's life.
As a result, disk read waits are completely eliminated, as all reads are now performed from memory. Schema changes have to be logged to disk, not a big deal as they are comparatively rare. If the schema plus data table type is used, data changes are also logged to disk to provide durability for ACID compliance and recovery.
In this case, checkpointing and write logging still has to be performed to maintain durability, but this is now implemented much more efficiently, and in a well configured system rarely presents a bottleneck.
Hekaton's in-memory features are supported by traditional backup and restore and SQL Server's high availability options, AlwaysOn and log shipping. Puzzlingly, replication is not supported, though neither is database mirroring, which is understandable as it is soon to be deprecated.
A new migration aid, the Analysis, Migration and Reporting tool, is included as part of SQL Server Management Studio to assist the DBA in converting a schema to in-memory tables. With a few clicks, this generates a report containing recommendations and any necessary changes for successful migration.
Although this is of great assistance, a little new knowledge needs to be acquired by the DBA to transition data from disk to memory resident tables, particularly in regard to the restrictions imposed upon table schemas. Developers are unlikely to persevere with them in the current release, but Microsoft’s recent track record with SQL Server is to ship functionality and reliability first, then add GUI based front end management tools later – the way it should be.
In the final pre-manufacturing release, the most notable issues are lack of support for truncate and merge statements, and a clutch of disallowed query hints that might cause existing code to break. There are other minor restrictions, but Microsoft needs to address the keyword constraints, as Hekaton will probably be first adopted by data warehouse customers, where the use of both statements is likely to be more prevalent. The migration wizard also needs some work in terms of its recommendations, but this will no doubt improve in time.
Also of note in this release, Microsoft has quietly introduced natively compiled stored procedures. These compile down to DLLs containing ready to run machine code, working with in-memory tables to reduce CPU compilation overhead on high volume transactional systems. Although currently not supporting disk-based tables, cursors, multi row inserts, CTEs or correlated sub queries, they are a useful optional performance tweak that can be easily introduced with a few minor syntax changes wherever in-memory tables are used.
ColumnStore indexes caused a sensation in SQL Server 2012, offering big performance gains and very impressive in-memory and on-disk data compression. Unfortunately, because you could not update them, they were limited mostly to data warehouse scenarios where indexes are regularly dropped and rebuilt, and were not adopted as widely as might be expected.
We were all waiting for something we could use on our OLTP databases.
SQL Server 2014 addresses this shortcoming with updateable columnstore indexes, removing the need to drop and rebuild the index each time data is updated. In addition, the COLUMNSTORE_ARCHIVE option offers further compression for rarely accessed, archive and legacy data, a feature particularly useful in data warehouses in conjunction with partitioning.
It's worth remembering that the prospect of updateable columnstore indexes caused much excitement when they were first discussed at SQL Server conferences immediately after the 2012 launch. Although overshadowed by Hekaton, this is a very significant performance enhancing feature that should be understood and in the arsenal of every DBA and developer.
BPE: Buffer Pool Extensions
Whilst the addition of buffer pool extensions may not be the most dazzling new feature in SQL Server 2014, we should not make light of its benefits. This new feature can be used on existing OLTP systems independent of Hekaton, transparently, without any conversion and with minimal effort, unlike Hekaton and ColoumnStore indexes.
Broadly speaking, when a query requests data from a database, SQL Server will read the data pages from disk into the buffer pool, so that the next time they’re requested they can be read from memory rather than disk. Updated pages are modified in the buffer pool before being written back to disk.
In a typical OLTP system, these intensive, random I/O operations are small reads and writes, which contend for the use of a mechanical disk arm and the rotation of a spinning disk in order to meet the request. Under load, I/O latency increases and transactional throughput decreases. Database server performance drops. Specifically, queries that return data from disk are slower than those that return data already cached in memory – the buffer pool.
The traditional solution for reducing these bottlenecks is to add DRAM to the server’s motherboard to create a larger buffer pool. The larger the buffer pool, the less disk access, which improves performance. This, incidentally, is why SQL Server grabs as much memory as it can at start-up.
But main memory is limited, and expanding it involves unacceptable service outage time on many systems. Microsoft have recognised this problem, and with the rapidly falling price, increased capacity and superior speed of SSD storage over disk, have introduced it as a dedicated, tertiary layer between DRAM and disk. This creates a two level, tiered storage hierarchy for the buffer pool. Level 1 is the primary buffer pool, stored in DRAM and level 2 (L2) is the buffer pool extension, held on the SSD. To maintain data consistency, only clean, or unmodified pages are written to the L2 SSD cache to prevent data loss, the buffer manager being responsible for moving pages between the two.
All of this is transparent and can be configured by the DBA in-flight with a single, simple T-SQL command, extending the buffer pool up to 32 times the size of RAM. The only downsides? As might be expected, removing the BPE causes memory pressure and subsequent performance degradation. We also found that when a SQL Server 2014 instance is shutdown, the BPE is automatically removed at re-start, and has to be manually reconfigured. This is something we’d like Microsoft to reconsider for a future minor release.
SQL Server 2014 floats closer to the cloud
You can now backup and restore databases and logs, both directly and through maintenance plans, to a URL or Windows Azure storage. This can be performed at the database or instance level, and from on-site or SQL Azure servers. Addressing security concerns about offsite data, proper database backup encryption has been introduced at long last, offering AES and Triple DES, effectively protecting offsite and onsite backups from those without the appropriate key or certificate. This feature is also available to normal, non-cloud backups.
High availability is there, too. AlwaysOn replicas can be hosted on a Windows Azure machine. This is useful and makes sound financial sense for those global organisations that want to make their data catalogue available through their websites in local markets around the world, or want to test their presence in a new market without rolling out a new data centre.
A mix of up to eight (previously four) secondary replicas can be created, with secondary replicas now remaining readable even in the event of a cluster loss, a significant high availability advance. The AlwaysOn management toolset has also been improved with a suite of new stored procedures and DMVs to further reduce troubleshooting time for production DBAs – critical in a DR situation.
Finally, for operations such as large data loads that would be re-started in the event of a failure, delayed durability increases log write performance, and can be set at the database or transaction level.
SQL Server 2014 dispenses with the fanfare of recent releases. Instead, it quietly but solidly positions it for the future, by exploiting the ongoing sea-changes and cost reductions in storage and CPU technologies. It also places it as the mainstream product others will have to emulate and follow to catch up. We think it’s the release with which SQL Server has finally come of age. Our opinion? Mission: Accomplished. ®
Jon is an independent SQL Server consultant, having worked as a data warehouse developer, DBA, team lead, trainer, author and manager for 18 years. He also runs the Bristol SQL Server user group, holds a BSc in Computer Science and an MSc in Business Intelligence from the University of Dundee. Rachel has worked as a software and database developer for 17 years across a wide variety of private industries. She is MCP Certified, runs the Bristol SQL Server user group and is involved in the technical community.