How NVMe tamed the cowboy world of the flash card
Discipline comes to PCIe flash device driving
It all started with flash devices upsetting the cosy disk world. Flash was faster than disk at responding to IO requests, if not at streaming data, and the easiest way to put flash drives, solid state drives, into servers and storage arrays was to use disk drive bays and interfaces such as SAS and SATA.
This was practical but not elegant. No self-respecting computer engineer wants to waste resource, and making SSDs pretend to be disk drives did exactly that. Built for disk drives, SAS and SATA interfaces were slow, with their millisecond response times and limited queuing ability.
SSDs were capable of far more than acting as souped-up, faster-responding quasi-disk drives, although their faster-than-disk response time was appreciated in the early days.
But processor, server and server software design did not stand still. Four things caused winds of change to blast through the world of SSDs in disk bays.
First, servers became virtualised. Yes, their operating systems multi-tasked applications, but badly.
VMware ended this by putting applications into virtual machines, each with its own guest operating system, and then scheduling virtual machines to run via a hypervisor.
Servers could run more applications this way and also multiple operating systems at the same time.
Secondly, processors sprouted cores, virtual processors as it were. A four-core processor could run four virtual machines at once, instead of one, and so generally quadrupled a server’s ability to run applications. Let’s say it increased from four to 16, hypothetically.
Thirdly, servers sprouted sockets and so could have two or four or more processors, each active simultaneously. A four-socket server, fitted with four-core processors, again quadrupled the number of applications (in virtual machines) that could run. Let’s say that the 16 that could be run increased to 64.
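The hypothetical figures here are simple multiplication, which a short Python sketch makes explicit. The function name and the baseline of four applications per core are illustrative assumptions drawn from the example, not measurements.

```python
def concurrent_apps(apps_per_core, cores_per_socket, sockets):
    """Rough count of applications a server could run at once.

    Assumes capacity scales linearly with cores and sockets, as the
    hypothetical does; real servers rarely scale this cleanly.
    """
    return apps_per_core * cores_per_socket * sockets


# Baseline from the hypothetical: four applications on a single core.
for cores, sockets in [(1, 1), (4, 1), (4, 4)]:
    total = concurrent_apps(4, cores, sockets)
    print(f"{cores} core(s) x {sockets} socket(s): {total} applications")
```

Every one of those applications issues IO, so the load on the storage interface grows by the same multiple.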
Even when the server was fitted with SSDs the SAS or SATA interfaces could be a bottleneck and not respond to IO requests fast enough, causing the processors to wait and applications to pause.
The effects of this were described in a 2012 presentation by EMC on NVMe. It included a slide saying: “Amazon loses one per cent of sales for every 100ms it takes for the site to load.”
The solution was the fourth development. PCIe flash cards were invented with the realisation that accessing the flash drive over the PCIe bus, the server’s main data artery connecting the processor, DRAM and various IO interface controllers, was faster than accessing it through an interface controller connected to the PCIe bus.
That interface controller added latency to IO requests, and the SAS and SATA interfaces, devised in disk-centric times, were not optimised for accessing flash.
The PCIe bus is inherently faster than SAS or SATA interfaces so putting flash directly on a PCIe card brought it closer to the processor and memory.
SATA 3 runs at 6Gbps, with the earlier SATA 2 operating at 3Gbps, and the first SATA 1 at 1.5Gbps, or 187.5MBps at the raw line rate. That’s painfully slow compared with PCIe, which has multiple lanes carrying data simultaneously. It has had generational increases in speed, as have SATA and SAS:
- PCIe 1.0 ships 250MBps through a single lane
- PCIe 2.0 raised that to 500MBps
- PCIe 3.0 does roughly 1,000MBps, or 1GBps
- An 8-lane gen 2 PCIe interface can deliver more than 3GBps
- A 32-lane PCIe 2.0 connector can ship 16GBps in aggregate
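The per-lane figures above scale with lane count, and the SATA numbers are a straight bits-to-bytes conversion. The Python sketch below reproduces that arithmetic; the table and function names are hypothetical, the per-lane values are the rounded ones quoted in the list, and the SATA conversion deliberately ignores encoding overhead, so usable throughput is lower than it reports.

```python
# Per-lane PCIe throughput in MBps, using the rounded figures quoted above.
PCIE_LANE_MBPS = {"1.0": 250, "2.0": 500, "3.0": 1000}

# SATA line rates in Gbps, as quoted above.
SATA_LINE_RATE_GBPS = {"SATA 1": 1.5, "SATA 2": 3.0, "SATA 3": 6.0}


def pcie_bandwidth_mbps(generation, lanes):
    """Aggregate one-direction bandwidth for a PCIe link of the given width."""
    return PCIE_LANE_MBPS[generation] * lanes


def sata_raw_mbps(name):
    """Convert a SATA line rate from Gbps to MBps (raw, no encoding overhead)."""
    return SATA_LINE_RATE_GBPS[name] * 1000 / 8


for name in SATA_LINE_RATE_GBPS:
    print(f"{name}: {sata_raw_mbps(name)} MBps raw")

for generation, lanes in [("2.0", 8), ("2.0", 32), ("3.0", 8)]:
    total = pcie_bandwidth_mbps(generation, lanes)
    print(f"PCIe {generation} x{lanes}: {total} MBps")
```

On these numbers an 8-lane PCIe 2.0 link has a 4,000MBps ceiling, which is consistent with the "more than 3GBps" delivered figure once protocol overhead is allowed for.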
SATA and SAS-interfaced SSDs have standard device drivers, code to access the SSD and read and write data using the SAS or SATA protocol. An IO to/from these devices comes to them through a standard SAS or SATA driver whose back end talks to the SSD device.
It doesn’t matter if the host operating system is Windows, Unix, Linux, OS X or any other; as long as the operating system has a SAS or SATA driver it can interact with a SAS or SATA SSD respectively through the standard driver.
But… a standard driver didn’t exist for the first PCIe flash cards.
That meant that a PCIe flash card product needed a device driver written for that flash card and the operating system of the server or workstation into which it was going to be installed. Such a driver could be optimised for flash and so be inherently better, producing faster IO, than a driver using disk-optimised code such as SATA or SAS.
The problem was, any such driver was a one-off. PCIe flash card manufacturer A would have to produce specific drivers for every host operating system it wanted to support.
Say there were 10 of them, then 10 device drivers would have to be written and supported. Meanwhile PCIe flash card manufacturer B would have to do the same, meaning 20 drivers, which would not necessarily have the same feature set as the 10 drivers from manufacturer A.
Then along comes PCIe flash card manufacturer C, taking the total to 30 drivers, and D, taking it to 40. With many, many standard device drivers having been written previously for other devices, like SAS and SATA SSDs, and disk drives for that matter, it was obvious that there needed to be a standard PCIe flash card driver to avoid all this wasteful duplicated effort.
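The driver count in this example grows as manufacturers multiplied by operating systems. The Python sketch below contrasts that with a standard driver, under the assumption (not stated in the text) that a standard means one driver per operating system shared by every manufacturer. The function names are hypothetical.

```python
def drivers_without_standard(manufacturers, operating_systems):
    """Each manufacturer writes its own driver for every operating system."""
    return manufacturers * operating_systems


def drivers_with_standard(operating_systems):
    """Assumed model: one shared driver per operating system."""
    return operating_systems


OPERATING_SYSTEMS = 10  # the figure used in the example above

for manufacturers in range(1, 5):  # manufacturers A, B, C and D
    bespoke = drivers_without_standard(manufacturers, OPERATING_SYSTEMS)
    shared = drivers_with_standard(OPERATING_SYSTEMS)
    print(f"{manufacturers} manufacturer(s): {bespoke} bespoke drivers, "
          f"{shared} with a standard driver")
```

The bespoke total keeps climbing with every new manufacturer, while the standard-driver total stays flat at the number of operating systems.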