Original URL: https://www.theregister.com/2014/11/13/flash_connectors/

Welcome to the fast-moving world of flash connectors

A guided tour

By Trevor Pott and Iain Thomson

Posted in On-Prem, 13th November 2014 14:01 GMT

Flash is the new storage medium of choice and this has led to an explosion of interconnect options.

Magnetic disks are slow and not particularly latency sensitive; interconnects designed for them just don't cut it in a world where flash drives can be 10 times faster and are highly latency sensitive.

Let's take a quick look at how the plugs and cables on our computers are evolving.

The standards

Advanced Host Controller Interface (AHCI): When SATA first emerged we all had to scramble around to find drivers for each vendor's implementation. Initially, parallel ATA emulation mode was used as a horrible kludge to get around this, but before long AHCI arrived on the scene.

Suddenly there was a common standard that BIOS, operating system and controller could agree upon. As long as everything spoke AHCI – and by the time Windows Vista came out pretty much everything did – it would all "just work".

AHCI is separate from the various SATA standards, although it exposes new functionality from those standards as it is added.

So long as SATA was simply increasing in speed and adding things like hot swap to the mix, life was good. But along came solid state drives (SSDs) and everything went nuts.

SCSI has always been the server counterpart to consumer storage interconnects. While SAS is the connector interface used by server drives, all the bits are actually communicating with each other using an evolution of the old SCSI standard.

iSCSI is a specific implementation of the SCSI protocol. Not tied to any given interconnect, iSCSI is simply a means of transferring SCSI commands and data across a network.

Drives are connected to a storage unit via SATA or SAS. These are made visible to the operating system, which makes them available across a standard Ethernet network to other servers that want to use them.
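If that layering sounds abstract, the toy Python sketch below may help. It is emphatically not a working iSCSI initiator (the framing is made up, the helper names are mine and the target address is hypothetical), but it shows the basic idea: a bog-standard SCSI command block gets wrapped up and pushed across an ordinary TCP/Ethernet connection. Real iSCSI targets conventionally listen on TCP port 3260 and demand a login handshake first.

```python
# Conceptual sketch only: shows the layering of SCSI commands over plain TCP.
# This is NOT a working iSCSI initiator; real PDUs carry headers, sequence
# numbers and a login/negotiation phase that are all omitted here.
import socket
import struct


def build_read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a 10-byte SCSI READ(10) command descriptor block."""
    # opcode 0x28, flags, 4-byte LBA, group number, 2-byte transfer length, control
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)


def send_scsi_over_tcp(target_ip: str, cdb: bytes) -> None:
    """Wrap the CDB in toy framing (length + CDB) and ship it over Ethernet."""
    payload = struct.pack(">H", len(cdb)) + cdb
    # 3260 is the well-known iSCSI port; the address used below is hypothetical.
    with socket.create_connection((target_ip, 3260)) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    send_scsi_over_tcp("192.0.2.10", build_read10_cdb(lba=0, blocks=8))
```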

NVMe is the industry's response to the proliferation of PCIe (Peripheral Component Interconnect Express) and, more recently, memory channel SSDs. Like SATA before AHCI, these drives each required a different driver from every manufacturer, with support not baked into operating systems.

NVMe (non-volatile memory express) is to PCIe storage what AHCI was to SATA. It provides a common language for all the various bits to speak, and as a result native drivers exist for most modern operating systems (BSD, Linux, ESXi et al).
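To see how mundane that native support has become, here is a quick Python sketch that lists whatever NVMe controllers the in-kernel Linux driver has registered. It assumes a Linux host and the usual sysfs layout under /sys/class/nvme (attribute names can vary a little between kernel versions); nothing in it is vendor-specific, which is rather the point.

```python
# List the NVMe controllers registered by the native Linux driver.
# Assumes a Linux host; /sys/class/nvme is populated by the in-kernel driver.
import os

NVME_SYSFS = "/sys/class/nvme"


def read_attr(ctrl: str, attr: str) -> str:
    """Read a single sysfs attribute for an NVMe controller, if present."""
    try:
        with open(os.path.join(NVME_SYSFS, ctrl, attr)) as f:
            return f.read().strip()
    except OSError:
        return "unknown"


def list_nvme_controllers() -> None:
    if not os.path.isdir(NVME_SYSFS):
        print("No native NVMe driver loaded (or no NVMe devices present).")
        return
    for ctrl in sorted(os.listdir(NVME_SYSFS)):  # e.g. nvme0, nvme1...
        print(f"{ctrl}: {read_attr(ctrl, 'model')} "
              f"(firmware {read_attr(ctrl, 'firmware_rev')})")


if __name__ == "__main__":
    list_nvme_controllers()
```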

Sadly, that native support comes with the crucial caveat that Microsoft has not backported the driver to either Windows 7 or Server 2008 R2.

There exists an open source Open Fabrics Alliance NVMe driver in both 64-bit and 32-bit versions but, to be perfectly blunt, it's crap. It is prone to crashing and not ready for production. I am not sure it ever will be.

No matter how much we beg, Microsoft is unlikely to backport its Windows 8 driver either, because NVMe support may well be Windows 10's only real selling feature.

Individual device manufacturers are shipping NVMe drivers for Windows. In some cases this is to enable the more popular operating systems like Windows 7 to make use of the new disks. In others it is to overcome design choices made by Microsoft that restrict the ability to pass administrative commands to the SSDs, which is important for server administrators and power users.

NVMe allows for a far greater queue depth than AHCI: up to 65,536 commands per queue across as many as 65,536 queues, versus a single queue of 32 commands for AHCI.

The end result is that utilisation of NVMe SSDs skyrockets when compared with AHCI. It also means that new generations of drives can be designed with larger caches, which in turn allow SSDs to be more efficient about how they write blocks. This increases the SSDs' write lifetime and is good for everyone.
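A back-of-the-envelope calculation, using nothing more than the per-spec maximums quoted above, puts those queue numbers in perspective:

```python
# Back-of-the-envelope comparison of outstanding-command limits,
# using the per-spec maximums quoted above.
AHCI_QUEUES, AHCI_QUEUE_DEPTH = 1, 32
NVME_QUEUES, NVME_QUEUE_DEPTH = 65_536, 65_536

ahci_in_flight = AHCI_QUEUES * AHCI_QUEUE_DEPTH   # 32 commands
nvme_in_flight = NVME_QUEUES * NVME_QUEUE_DEPTH   # 4,294,967,296 commands

print(f"AHCI: {ahci_in_flight:,} commands in flight at most")
print(f"NVMe: {nvme_in_flight:,} commands in flight at most")
print(f"Difference: {nvme_in_flight // ahci_in_flight:,}x")
```

No drive will ever have four billion commands outstanding, of course, but that headroom is exactly why NVMe SSDs can be kept so much busier than their AHCI counterparts.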

SOP/PQI is the proposed SCSI protocol extension that would compete directly with NVMe. It will incorporate the technical advancements NVMe has brought to the consumer world into the next revision of the SCSI standard (SCSI Express) and will probably never be talked about as a distinct concept once all the arguing is over and the drivers are written.

The old guard

ATA, ATAPI, USB and cards are the old-school ways of getting things done. Though it might surprise people, SSDs are still made for old parallel ATA connections, as they are quite common in embedded devices.

Card connectors, ranging from SD cards to USB sticks, can be found with ultra-low-end consumer flash as well as ultra-high-end enterprise SLC flash, if you know where to look.

Again, the embedded space makes use of flash extensively and many a VMware ESXi server was built on a USB stick. Though not nearly as fast as the other interconnects – and not designed for server workloads – these old warhorses still see use all over the world.

SATA is probably the most common storage interconnect available today. It is predominantly used in tandem with the AHCI standard and is the consumer storage interconnect that is familiar to anyone who builds their own PCs. It is found on nearly every motherboard out there, from notebooks to multi-socket servers.

Though SATA is significantly faster than its predecessor, parallel ATA, the use of AHCI makes it sub-optimal for flash. In addition, flash performance is growing at a rate that the standards bodies struggle to keep up with, rendering traditional SATA ultimately a dead-end technology.

mSATA (mini-SATA) was an interconnect used for a short time to put SSDs into notebooks, embedded devices and so forth. Wherever there was a requirement to put storage where a 2.5in disk looked big, you would find mSATA. It was almost exclusively an interconnect for SSDs.

Being just a smaller version of a SATA connector, it suffers from all the same problems as SATA, with the added fun of looking exactly like a PCI Express Mini Card connector, but not being electrically compatible with that standard.

mSATA was succeeded by M.2 and is not likely to be included in any future designs, though it will be around for a while in the embedded space where products evolve slowly.

SAS is the next most common storage interconnect available today. It was introduced around the same time as SATA as a means of bringing the technical advantages of high-speed serial interfaces to disks, controllers and drivers using the SCSI standard.

SAS is just flat-out faster than SATA, which is half duplex: information flows in only one direction at a time. SAS is full duplex: you can read and write at the same time.

SAS is also dual port, meaning that for whatever port speed SATA is doing, the equivalent SAS drive can move four times as much data. It has two ports that can both read and write at the same time, versus one port that can only read or write at any given time.
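A quick worked example makes the "four times" claim concrete. The Python sketch below assumes a 6Gbps link rate on both sides purely for illustration; all it does is multiply ports by directions.

```python
# Illustrative aggregate-throughput comparison at an assumed 6Gbps link rate.
# SATA: one port, half duplex (read OR write at any instant).
# SAS:  two ports, full duplex (read AND write on each, simultaneously).
LINK_RATE_GBPS = 6  # assumption for the example only

sata_aggregate = LINK_RATE_GBPS * 1 * 1  # 1 port x 1 direction = 6 Gbps
sas_aggregate = LINK_RATE_GBPS * 2 * 2   # 2 ports x 2 directions = 24 Gbps

print(f"SATA peak aggregate: {sata_aggregate} Gbps")
print(f"SAS peak aggregate: {sas_aggregate} Gbps "
      f"({sas_aggregate // sata_aggregate}x)")
```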

Later generations of SAS (multilane SAS) have evolved to support four ports. By simply lashing together more and more ports SAS is able to meet bandwidth requirements of new devices and is evolving towards a parallel/serial hybrid interconnect. Whether it will still be in use after the new interconnects and standards are adopted is not yet clear.

Fibre Channel is a less common SCSI interconnect, popular with high-end enterprises. Despite the name, fibre-optic cabling is not required. Drives typically use a copper mechanical interconnect similar to SATA or SAS, plug into a hot-swap backplane and are connected to a host bus adapter (HBA).

The HBA connects to a network of some variety – there are a few variations on the network topologies in use – which then allows servers to access the drives as storage. As it is both a networked SCSI protocol and is generally accompanied by its own physical interconnect, it straddles the boundary between "standard" and "attachment".

PCIe inside

As you have probably worked out by now, current interconnect solutions are completely inadequate for fully utilising SSDs. The IT industry, being the pack of lock-in hungry, vicious backstabbing horrors that we all know and love, could not possibly be trusted to get together and hammer out a standard before the need became plain, even to consumers.

And so, everyone hijacked PCIe. It is everywhere, from notebooks to servers, and there are even ARM SoCs with PCIe for tablets and embedded devices. It is as close to the CPU as you will get without using RAM slots, and it is fast.

PCIe was designed to be faster than anything else in your computer. It is the main bus that all the other bits are supposed to back onto. As such, it is way faster than anything we are going to see from SATA or even SAS interconnects.

Using traditional PCIe slots to attach storage had a major drawback: you needed to power down the server to swap out the card. No matter how you chose to extend that PCIe slot out of the inside of the server, that power-down requirement still existed.

Without having to design an entirely new standard from scratch, standards bodies were able to solve this problem in short order. A series of new standards has emerged to bring PCIe to the drive, including port protocols that allow for important features like hot-swapping.

PCIe outside

SATA Express (SATAe) is the result of the move to PCIe. So universal was the push to dump the middle man and connect drives directly up to PCIe that the SATA standards body simply threw in the towel.

SATA revision 3.2, or SATAe, is actually just a transition standard defining a bunch of interconnects that can support traditional SATA drives as well as make PCIe lanes available directly for consumption by drives. The protocol used is NVMe, and SATAe can most accurately be called "NVMe over PCIe".

SCSI Express is the evolution of SCSI to fill the same role. The SOP/PQI extensions to the SCSI protocol will result in "SCSI over PCIe", and we are off to the races once more.

M.2, formerly known as the Next Generation Form Factor (NGFF), is the PCIe-enabled replacement for mSATA. Whereas mSATA simply borrowed the PCI Express Mini Card form factor (resulting in some confusion), M.2 is a completely new connector.

M.2 is to all intents and purposes a SATAe connector. This means it exposes both a traditional SATA interface as well as PCIe lanes. It also exposes a USB 3.0 port, though given that M.2 is pretty much always an internal connector, I still haven't figured out why.

It's complicated

The dual nature of these new connectors makes understanding backwards compatibility a little difficult. Simple questions like "how do you plug in a SATA drive to your new motherboard?" don't have simple answers.

In the new SATAe world, the host plug is the bit on the motherboard (or RAID card) that you plug into. You can plug two traditional SATA data cables into this plug, and it has an extended bit where the PCIe lanes are exposed. Using the SATAe plug for traditional SATA drives disables the PCIe lanes bit.

A full proper SATAe cable has a host cable receptacle on the end, which is the bit that plugs into the motherboard, and it covers the whole host plug. It eats up the PCIe connector and both the traditional SATA connectors. This would typically be connected up to one drive, usually an SSD that wants the PCIe goodness so as to go fast.

In the world of server-style hot-swap trays things get a little stickier. The SAS connector we have used for the past decade or more is known as the SFF-8482 connector. SATA devices can plug into it. SAS devices can plug into it.

There was a SATA-only connector as well, and SAS devices couldn't plug into it. So long as you knew "SATA can plug into SAS but SAS doesn't plug into SATA" you were fine.

Today we have the 12Gbps Dual Port SAS (SFF-8680) connector as well as the 12Gbps MultiLink (Quad Port) SAS (SFF-8630) connector. As you might expect, traditional SATA as well as SAS drives plug into these ports just fine.

In theory, the Quad Port SAS SFF-8630 connector could have PCIe lanes attached, and thus SATAe drives could plug in as well, though it is rare to actually see support for this and everything in the chain has to support it from end to end.

In addition to the above, the SATAe standard defines the SATAe host receptacle. This is a separate plug that is backwards compatible with a traditional SATA device, but not a SAS device.

Only very horrible people filled with rage at humanity will ever employ this in their designs, so expect it to be sprinkled everywhere.

The wonderfully named SFF-8639 can theoretically solve all ills. It will support SATA, SAS, SATAe and most likely SAS Express. This is the interface that will be deployed by champions and people who have love in their hearts for their fellow man.

Of course, to be difficult, SFF-8639 can be deployed without the PCIe lanes, though I don't expect this to occur in the real world. Almost everyone is simply calling SFF-8639 a "hybrid" port, "hybrid SATA/NVMe" or simply "NVMe". As you can see from the above, that is inaccurate but it won't stop all of us from using that terminology anyway.
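For anyone trying to keep all of that straight, here is a rough cheat-sheet, expressed as a small Python table and based purely on the descriptions above. Treat "optional" as shorthand for "only when the backplane and controller actually wire up the PCIe lanes", and remember that real-world support is another matter entirely.

```python
# Rough drive-to-connector compatibility summary, per the descriptions above.
# "optional" means the PCIe lanes can be wired up but frequently are not.
CONNECTOR_COMPATIBILITY = {
    "SFF-8482 (legacy SAS)":           {"SATA": True,  "SAS": True,  "PCIe/SATAe": False},
    "SFF-8680 (12Gbps dual-port SAS)": {"SATA": True,  "SAS": True,  "PCIe/SATAe": False},
    "SFF-8630 (12Gbps MultiLink SAS)": {"SATA": True,  "SAS": True,  "PCIe/SATAe": "optional"},
    "SATAe host receptacle":           {"SATA": True,  "SAS": False, "PCIe/SATAe": True},
    "SFF-8639 (the lot)":              {"SATA": True,  "SAS": True,  "PCIe/SATAe": "optional"},
}

if __name__ == "__main__":
    for connector, fits in CONNECTOR_COMPATIBILITY.items():
        summary = ", ".join(f"{drive}: {ok}" for drive, ok in fits.items())
        print(f"{connector} -> {summary}")
```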

March of progress

The need to use PCIe to keep up with SSDs moves beyond just the connectors in the box. PLX has ExpressFabric and A3Cube has RONIEE. Both seek to extend PCIe outside the server to bring its advantages to high-performance computing and data centre storage in a way that older technologies such as Fibre Channel or iSCSI simply cannot.

As PCIe becomes the connector of choice for the average SSD, memory channel storage (MCS) – SSDs in the RAM slots – is taking up the role once served by PCIe SSDs.

MCS is faster than PCIe with far lower and more consistent latency when under load. It is even closer to the CPU than PCIe storage, and it suffers from the same drawbacks as PCIe storage did back in the day.

MCS requires a server with a compatible BIOS. You need appropriate drivers, though MCS modules that speak NVMe are emerging. To swap a bad module you need to power down the server.

Still, MCS looks to be the next evolution of storage, already explored for high-end server usage just as we are dipping our toes into the commoditisation of PCIe-based standards.

The wheel never stops turning, but one thing's for sure: SSDs are no longer in the future. They are a fundamental design element of the tablets, notebooks and servers of today. ®