Amazon: Intel Meltdown patch will slow down your AWS EC2 server
Sysadmins notice performance dip amid security fix rollout. Not everyone hit hard. YMMV etc
Posted in Cloud, 4th January 2018 22:37 GMT
Amazon AWS customers have complained of noticeable slowdowns on their cloud server instances – following the deployment of a security patch to counter the Intel processor design flaw dubbed Meltdown.
Punters said that, since AWS shored up its infrastructure and began rolling out its Meltdown-patched Linux in December, they have noticed an increase in CPU utilization by their EC2 virtual machines. The solution is either to optimize application code running on the VMs, or to move to a more powerful and expensive virtual machine to take the extra load.
Amazon has said it will help those suffering slower-than-expected performance.
To be clear, your humble vultures here at El Reg highly recommend you apply the Meltdown patches on your Intel-powered systems: the processor bug allows user processes to read passwords, keys and other sensitive data out of the kernel's protected memory area.
The software fixes – which are available for Linux, Windows, and macOS on Intel CPUs – move the operating system kernel into its own separate virtual memory space, protecting it from Meltdown exploits. The downside is that this introduces extra overhead, potentially slowing down the system.
The performance hit varies wildly depending on the type of applications you're running. Casual desktop users and gamers applying Meltdown mitigations on their computers shouldn't notice any slowdowns. Light installations, such as simple web servers, will be mildly affected. Machines hammering disk storage, slamming the network, or otherwise making lots of system calls, may experience a reduction in performance of up to 30 per cent. Your mileage may vary.
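For the curious, here is a rough way to see why system-call-heavy code is the most exposed. This is a minimal sketch, not a proper benchmark: it times a loop of cheap system calls against a loop of pure user-space arithmetic. The function names are ours, the choice of os.getppid() as a cheap syscall is an assumption, and Python's interpreter overhead will blur the numbers. The idea is to run it before and after patching and compare how each figure moves.

```python
import os
import time


def time_syscalls(n: int) -> float:
    """Return seconds taken to issue n cheap system calls.

    os.getppid() is used on the assumption that it enters the kernel
    on every call, so each iteration crosses the user/kernel boundary.
    """
    start = time.perf_counter()
    for _ in range(n):
        os.getppid()
    return time.perf_counter() - start


def time_user_work(n: int) -> float:
    """Return seconds taken by n iterations of pure user-space arithmetic."""
    start = time.perf_counter()
    total = 0
    for i in range(n):
        total += i * i
    return time.perf_counter() - start


def per_call_ns(elapsed: float, n: int) -> float:
    """Convert a total elapsed time in seconds to nanoseconds per iteration."""
    return elapsed * 1e9 / n


if __name__ == "__main__":
    n = 1_000_000
    print(f"syscall loop: {per_call_ns(time_syscalls(n), n):.0f} ns/iteration")
    print(f"user loop:    {per_call_ns(time_user_work(n), n):.0f} ns/iteration")
```

If the patch is what's hurting you, the syscall loop should be the one that gets noticeably slower, while the user loop should stay roughly where it was.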
AMD processors are not affected by this particular design cockup.
A discussion thread in the AWS support forums details dips in performance that occur after rebooting Linux virtual machines with the Meltdown workaround – dubbed Kernel Page Table Isolation, or KPTI, on Linux – installed.
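If you're not sure whether a given Linux box is running with the workaround, kernels that carry the fix generally report their status in sysfs. The snippet below is a sketch under that assumption: it reads /sys/devices/system/cpu/vulnerabilities/meltdown, which typically holds "Not affected", "Vulnerable", or "Mitigation: PTI". Older or unpatched kernels may not have the file at all, and the helper names are ours.

```python
from pathlib import Path

# Status file exposed by Linux kernels that include the Meltdown reporting code.
MELTDOWN_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def describe_meltdown_status(text: str) -> str:
    """Map the raw sysfs string to a short human-readable verdict."""
    status = text.strip()
    if status.startswith("Mitigation"):
        return "mitigated (" + status + ")"
    if status.startswith("Not affected"):
        return "not affected"
    if status.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown (" + status + ")"


def read_meltdown_status(path: Path = MELTDOWN_SYSFS) -> str:
    """Read the sysfs entry, tolerating kernels that do not provide it."""
    try:
        return describe_meltdown_status(path.read_text())
    except OSError:
        return "unknown (sysfs entry not available)"


if __name__ == "__main__":
    print("Meltdown status:", read_meltdown_status())
```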
"Immediately following the reboot my server running on this instance started to suffer from CPU stress," one admin noted after enabling the patch.
"Looking at CPU stats there was a very clear change in daily CPU usage pattern, despite continuing normal traffic to my server. I performed extensive review of what might have changed on my server configuration but drew a complete blank - configuration of the server did not change."
Another added: "This just happened to us today on a c3.large. The cost to us to move the platform to new hardware and the lost confidence from our customers is huge."
Developer Tim Gostony was also able to record how defending against Chipzilla's design blunder impacted the performance of two of his Intel-powered EC2 Linux instances.
AWS confirmed that the potentially-performance-limiting update the users spoke of was its fix for the kernel memory bug that afflicts the Intel CPUs it uses for the EC2 service. This low-level hardware vulnerability was discovered by researchers who privately alerted Intel in June 2017. Operating system-level workarounds were quietly developed, and rolled out on AWS from December. On Tuesday this week, word of Intel's insecure speculative execution engine, at the heart of the security flaw, emerged.
On Wednesday, a collective of researchers went public with details of Meltdown, plus a related set of processor security holes dubbed Spectre – which also hits Intel chips, plus some AMD and Arm cores.
A drop in CPU performance is particularly troublesome for cloud compute subscribers where providers bill by the hour or second. When workloads take longer to run, customers end up paying more over the long run.
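The arithmetic is worth spelling out, because a given drop in throughput costs more than the same percentage on the bill. A back-of-the-envelope sketch, assuming a fixed amount of work, a flat hourly rate, and a uniform slowdown (none of which will hold exactly in practice): if throughput falls by a fraction d, the job takes 1 / (1 - d) times as long. The function names and the example rate are ours, not AWS's.

```python
def runtime_multiplier(throughput_drop: float) -> float:
    """Return how many times longer a fixed job runs after a throughput drop.

    throughput_drop is a fraction in [0, 1), e.g. 0.30 for a 30 per cent hit.
    """
    if not 0.0 <= throughput_drop < 1.0:
        raise ValueError("throughput_drop must be in [0, 1)")
    return 1.0 / (1.0 - throughput_drop)


def patched_cost(base_hours: float, hourly_rate: float, throughput_drop: float) -> float:
    """Return the bill for a job that took base_hours before the slowdown."""
    return base_hours * runtime_multiplier(throughput_drop) * hourly_rate


if __name__ == "__main__":
    # Hypothetical figures: a 100-hour job at $0.10 per hour, worst-case 30% hit.
    before = patched_cost(100, 0.10, 0.0)
    after = patched_cost(100, 0.10, 0.30)
    print(f"before: ${before:.2f}, after: ${after:.2f}")
```

So the worst-case 30 per cent hit would stretch a fixed job to roughly 1.43 times its original runtime, and the bill with it.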
AWS told El Reg it will be reaching out to customers who notice a slowdown to help get performance back up to pre-patch levels.
"We don’t expect meaningful performance impact for most customer workloads," the cloud giant said. "There may end up being cases that are workload or OS specific that experience more of a performance impact. In those isolated cases, we will work with customers to mitigate any impact."
Meanwhile, Microsoft Azure is deploying Meltdown defenses, and Google's Compute Engine is secured. Check with your cloud provider for the latest on its response to Intel's engineering gaffe. The slowdown problem is not limited to AWS: you may experience a performance hit on other clouds. If so, this is why. ®