Inside Aurora: how disruptive is Amazon’s MySQL clone?

Storage engine made for the cloud – shame it's only for MySQL

Amazon claims 5x read performance for Aurora vs MySQL

At its recent re:Invent conference in Las Vegas, Amazon announced its new database engine Aurora, billing it as a commercial-grade database engine at open-source cost. “It’s at least as available, durable and fault-tolerant as the enterprise editions of the proprietary commercial database engines and high-end SANs,” said senior AWS VP Andy Jassy, “and it’s a tenth of the cost.”

So what makes Aurora tick? The database engine is compatible with the open source MySQL, and most of the smarts are in the storage, according to general manager Anurag Gupta, who described the new service in detail at a re:Invent session. When you use the new service, currently in limited preview, you rent a virtual machine (VM) instance in which the SQL and transaction engine runs. Caching also lives on this instance, though it runs in a separate process so that you can restart the database engine without losing the cache. Logging and storage are handled by an external layer that runs on Amazon’s storage service.

This approach means that failover to a replica, if you maintain multiple instances, is almost immediate, since all replicas use the same logging and storage layers. Similarly, there is negligible lag between updating the primary instance and reading the data back from a read replica. Gupta quoted 7.27ms replica lag at 13.8K updates per second. Arguably it is not really a replica at all, since the storage layer is the same; but the data is already replicated six times across three “availability zones”, each an isolated location within an Amazon region. Data is split into small 10GB segments, so recreating a segment after a failure is a quick operation. You can lose up to two copies without impacting write operations, and up to three copies without impacting read operations.
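The failure tolerances quoted above can be read as a simple quorum rule. The following is a minimal sketch in Python, not Amazon's implementation: the quorum sizes (four copies to write, three to read) are an assumption inferred from the figures Gupta gave, and the function names are hypothetical.

```python
# Toy model of the tolerances quoted above: six copies of each segment.
# The quorum sizes are inferred from those tolerances, not quoted from Amazon.
COPIES = 6
WRITE_QUORUM = 4  # assumption: writes survive the loss of two copies
READ_QUORUM = 3   # assumption: reads survive the loss of three copies


def can_write(failed_copies: int) -> bool:
    """Return True if enough copies remain to accept a write."""
    return COPIES - failed_copies >= WRITE_QUORUM


def can_read(failed_copies: int) -> bool:
    """Return True if enough copies remain to serve a read."""
    return COPIES - failed_copies >= READ_QUORUM


if __name__ == "__main__":
    # Print availability for every possible number of failed copies.
    for failed in range(COPIES + 1):
        print(failed, can_write(failed), can_read(failed))
```

Under these assumed quorums, losing a whole availability zone (two of the six copies, if they are spread evenly) would still leave writes available.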

Aurora uses a technique called Log Structured Storage, which means that the log is integrated into the file system; there may be several versions of any particular piece of data but by consulting the log, the system knows which is current. The storage is SSD-backed for performance. A data insert in MySQL requires six writes, says Gupta, whereas in Aurora it requires only two, because only the log is updated.
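To illustrate the general idea of log-structured storage, here is a toy sketch in Python. It is an illustration of the technique only and makes no claim about how Aurora's storage layer is built; the class and method names are hypothetical. Every update is appended to a log, older versions stay in place, and an index derived from the log says which version is current.

```python
class LogStructuredStore:
    """Toy append-only store: the log is the data, an index finds the newest version."""

    def __init__(self):
        self.log = []    # append-only list of (key, value) records
        self.index = {}  # key -> position in the log of the newest record

    def put(self, key, value):
        # A write is a single append; nothing is updated in place.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def get(self, key):
        # Consult the index to find which logged version is current.
        return self.log[self.index[key]][1]

    def versions(self, key):
        # Older versions remain in the log until something compacts it.
        return [value for k, value in self.log if k == key]


if __name__ == "__main__":
    store = LogStructuredStore()
    store.put("row1", "first")
    store.put("row1", "second")
    print(store.get("row1"), store.versions("row1"))
```

A real system would also need compaction and durable storage, both of which this sketch leaves out.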

The outcome is improved performance versus MySQL, measured by the standard SysBench benchmark tool, and comparing the same operations with Amazon’s MySQL service, according to figures presented by Gupta. The quick summary is 3x write performance and 5x read performance, but this will vary according to factors like the number of concurrent connections, the number of tables, and the complexity of the queries.

Aurora is compatible with MySQL 5.6 using the InnoDB engine, and supports databases up to 64TB. No features have been added except for three statements for failure simulation. There is one for simulating a crash, another for disk failure, and another for network failure, for example:

ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN [DISK index | NODE index] FOR INTERVAL interval

There is no licensing other than Amazon’s usual pay-as-you-go pricing for RDS (Relational Database Service), which for Aurora starts at $0.29 per hour for a VM with 2 virtual CPUs and 15.25GB RAM. In addition, you pay $0.10 per GB/month for storage, and $0.20 per million requests.
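As a rough guide to how those three rates combine, here is a small Python sketch. The rates are the ones quoted above for the smallest instance; the workload figures in the example (720 hours, 100GB, 50 million requests) are hypothetical, and the calculation ignores anything Amazon may bill separately.

```python
# Rates quoted in the article for the smallest Aurora instance.
INSTANCE_PER_HOUR = 0.29      # USD per hour, 2 virtual CPUs / 15.25GB RAM
STORAGE_PER_GB_MONTH = 0.10   # USD per GB per month
REQUESTS_PER_MILLION = 0.20   # USD per million requests


def monthly_cost(hours: float, storage_gb: float, requests_millions: float) -> float:
    """Estimate one month's bill in USD from the three quoted rates."""
    compute = hours * INSTANCE_PER_HOUR
    storage = storage_gb * STORAGE_PER_GB_MONTH
    requests = requests_millions * REQUESTS_PER_MILLION
    return compute + storage + requests


if __name__ == "__main__":
    # Hypothetical workload: 720 hours, 100GB stored, 50 million requests.
    print(f"${monthly_cost(720, 100, 50):.2f}")
```

For a small database the hourly instance charge dominates, which is consistent with the point that the entry price depends mainly on which VM sizes are available.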

RDS MySQL is a little cheaper, especially as it is currently available on lower-spec VMs, but Amazon say that any extra cost for Aurora will be more than compensated by the better performance. Put another way, you would spend more on compute resources for RDS MySQL to get equivalent performance, if that is attainable at all. At the low end, MySQL may always be the more economical choice. One factor is that Aurora relies on enhanced networking, which currently rules out the smaller VMs that work fine with MySQL.

What about Aurora performance versus commercial database engines like Oracle or SQL Server? Amazon will not say, citing restrictions in the licence agreements for these products that forbid the publishing of benchmarks. Vendors justify this on the grounds that benchmarks can be misleading, especially if the set-up is not properly optimized.

Aurora does have a theoretical advantage though, in that it is built specifically for the AWS storage infrastructure, whereas other database engines will treat that storage as a traditional file system.

This is also why there is no possibility of installing Aurora in your own data centre. Developers are expected to use MySQL as an alternative, if they need a local installation for test and development.
