What system builders need to know about solid state drives
Get the best from flash technology
If you are building systems using solid state drives (SSDs), you need rock-solid reliability and performance – and you won't get it from consumer-grade flash.
But how do you know if the drives you choose are enterprise-grade? A supplier may say its SSD is enterprise quality but can you be sure this marketing claim is true? You need to understand the qualities of an enterprise-class SSD and check candidate drives so that the systems you build for your customers have a long, reliable life.
There are four main attributes of an SSD that mark it out as truly enterprise-class: speed, endurance, data integrity and system builder friendliness.
Unlike a hard disk drive with millisecond-class response times, an SSD reacts to read and write I/O requests in microseconds, processes the I/O and then moves the data. Speed is measured by a device's latency, the number of random I/Os per second it can support, and the amount of read and write data it can stream.
An enterprise-class SSD has to perform against an enterprise workload, not a consumer, single-user workload. It has to outperform fast enterprise disk drives spinning at 15,000rpm and sustain a 24-hour duty cycle, handling write-intensive and variable workloads day after day and month after month.
Many SSDs have a fast performance profile fresh out of the box, but performance then falls off once all the data cells in the flash have been written and fresh data triggers program-erase cycles of the SSD's blocks.
This cannot be tolerated. It doesn't matter if the SSD uses fast single-level cell flash or the slightly slower 2-bit multi-level cell flash optimised for capacity: the system builder must have consistent long-term performance.
During the SSD’s working life, data in it will be deleted, leaving blocks that hold a mix of good data and deleted cells. The good data can be collected by a background process and moved to fresh blocks, with the now-empty block being erased by the flash controller, also in the background.
Any incoming write can go straight to the erased block and complete faster, and that helps sustain the write performance.
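The background clean-up described above can be sketched as a toy model. This is a minimal illustration and not any vendor's controller logic: the `Block` class, the `collect` function and the four-page block size are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 4  # hypothetical; real blocks hold far more pages


@dataclass
class Block:
    # Each slot holds live data, or None once that data has been deleted.
    pages: list = field(default_factory=list)
    erased: bool = True

    def live(self):
        """Return only the data that is still valid."""
        return [p for p in self.pages if p is not None]


def collect(victim, spare):
    """Move live data from victim to an erased spare, then erase victim.

    Returns the number of pages copied, which are internal writes
    the host never asked for.
    """
    moved = victim.live()
    if not spare.erased or len(moved) > PAGES_PER_BLOCK:
        raise ValueError("spare block must be erased and large enough")
    spare.pages = list(moved)
    spare.erased = False
    victim.pages = []      # the whole block is erased in one operation
    victim.erased = True   # ready to take incoming writes directly
    return len(moved)


# A block in which two of the four pages have been deleted.
victim = Block(pages=["a", None, "c", None], erased=False)
spare = Block()
copied = collect(victim, spare)
print(copied, spare.pages, victim.erased)
```

Because the copy and the erase happen before a new write arrives, the incoming write finds an erased block waiting and does not pay for the clean-up itself.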
You should expect consistent performance with response times of under five milliseconds on average. There should be no excessive ramp-up in response time or slowdown in data streaming as the number of I/Os per second increases.
Some SSDs have wildly skewed read-and-write performance; for example, here is the performance profile of a 64GB multi-level cell SSD from a well-known supplier for sequential I/O:
Sequential read: 355MB/sec
Sequential write: 75MB/sec
Source: Crucial RealSSD C300, manufacturer's specification
With write I/Os running at roughly a fifth of the speed of read I/Os, this is obviously a consumer flash device and quite unsuited to system builders needing enterprise-class componentry.
It would be excellent if SSDs had independent testing data and this is beginning to take shape with the Storage Performance Council's SPC-1C benchmark.
With all the noise in the market, it is nice to see industry leaders such as Seagate step up and submit their products. In the meantime, performance data using standard data block sizes, such as 4K for random I/O testing, and standard, openly available test programs such as Iometer can be used.
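When comparing figures from such tests, it helps to convert between the three measures of speed. The sketch below assumes decimal megabytes and a steady workload. The 20,000 IOPS and queue depth of 32 are made-up illustrative inputs, not results from any drive, and the function names are hypothetical.

```python
def bandwidth_mb_per_sec(iops, block_size_bytes):
    """Data rate implied by an IOPS figure at a fixed block size (decimal MB)."""
    return iops * block_size_bytes / 1_000_000


def mean_latency_ms(iops, queue_depth):
    """Little's law: average response time = outstanding I/Os / throughput."""
    return queue_depth / iops * 1000


# Illustrative inputs only: 4K random I/O at an assumed 20,000 IOPS.
rate = bandwidth_mb_per_sec(20_000, 4096)
latency = mean_latency_ms(20_000, queue_depth=32)
print(f"{rate:.2f} MB/sec, {latency:.2f} ms average response time")
```

A drive that quotes high IOPS only at deep queue depths may therefore still show long average response times, which is why the block size and queue depth behind any figure are worth checking.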
Enterprise-class single-level cell SSDs exhibit sequential read and write I/O bandwidth of 300MBps and 360MBps, with generally equal read and write speeds, and random read and write IOPS above 48,000 and 22,000 respectively. They will be able to do this for five years, which brings us to working life.
Test of endurance
Unlike disk media, flash media wears out. After a certain number of writes to a cell, its response to read requests tends to be more error-prone and its ability to store more writes falls off. This is far worse with multi-level cell flash as there are two bits per cell, which adds to its electrical activity and wears it out more quickly.
NAND cannot be overwritten at the level of an individual cell: it is written in pages and erased in whole blocks. When new data has to go into a block that already holds data, the whole block has to be erased first. The good data in the block to be erased is copied, and then written back with any new data, which roughly speaking means two write operations inside the SSD for a single incoming write operation.
This is called write amplification, and it should be reduced to as near a 1:1 ratio between incoming writes and internal-to-the-flash writes as possible.
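The ratio can be worked out with simple arithmetic. The sketch below is an illustration with made-up page counts, and the function names are hypothetical rather than part of any drive's tooling.

```python
def write_amplification(host_writes, flash_writes):
    """Ratio of writes performed inside the flash to writes the host requested."""
    if host_writes <= 0:
        raise ValueError("host_writes must be positive")
    return flash_writes / host_writes


def flash_writes_for_update(new_pages, live_pages_copied):
    """Internal writes needed when live data is rewritten alongside new data."""
    return new_pages + live_pages_copied


# The rough case from the text: one incoming write costs two internal writes.
simple = write_amplification(1, 2)

# Hypothetical worse case: one new page forces three live pages to be copied.
worse = write_amplification(1, flash_writes_for_update(1, 3))

# Hypothetical ideal case: four new pages fill a block with nothing to copy.
ideal = write_amplification(4, flash_writes_for_update(4, 0))

print(simple, worse, ideal)
```

The closer the result is to 1.0, the fewer of the flash's limited program-erase cycles are spent on housekeeping.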
There are three basic ways of dealing with this.
One is to store writes in a temporary area and batch them up so they can be written to full block areas in a sequential process, instead of being written to random blocks as they come in. The data might also be compressed to reduce the number of cells it occupies and enhance the working life of the flash.
The second method is to over-provision the flash and set aside an area of separate blocks for use when the other blocks are worn out. The flash controller maintains a map of good blocks in use, data blocks that are wearing out, unused blocks and dead blocks. As blocks wear out they are replaced by fresh ones.
A third method is called wear-levelling, which involves the controller ensuring that writes to the SSD are shared out equally across the blocks in it and not concentrated in just a few. This helps an SSD wear out evenly and preserves its capacity.
In the process of distributing data evenly across the device, some data is moved. This involves new writes which can shorten the SSD's useful life. Controller algorithms are needed to optimise wear-levelling and write reduction.
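Wear-levelling with a pool of over-provisioned spares can be sketched in a few lines. This is a deliberately simplified model and not a real controller algorithm: it assumes every write costs one erase of the chosen block, and the `WearLeveller` class, the block counts and the limit of three erases are hypothetical.

```python
class WearLeveller:
    def __init__(self, n_blocks, n_spares, max_erases):
        # Erase count for every block currently in service.
        self.erase_counts = {block: 0 for block in range(n_blocks)}
        # Over-provisioned blocks held back until something wears out.
        self.spares = list(range(n_blocks, n_blocks + n_spares))
        self.max_erases = max_erases
        self.dead = set()

    def write(self):
        """Send the next write to the least-worn block in service."""
        if not self.erase_counts:
            raise RuntimeError("no usable blocks left")
        block = min(self.erase_counts, key=self.erase_counts.get)
        self.erase_counts[block] += 1
        if self.erase_counts[block] >= self.max_erases:
            self._retire(block)
        return block

    def _retire(self, block):
        """Mark a worn-out block dead and bring in a spare if one remains."""
        del self.erase_counts[block]
        self.dead.add(block)
        if self.spares:
            self.erase_counts[self.spares.pop(0)] = 0


leveller = WearLeveller(n_blocks=4, n_spares=1, max_erases=3)
used = [leveller.write() for _ in range(8)]
print(used)
print(leveller.erase_counts)
```

In this model the eight writes are spread evenly over the four blocks. A real controller also has to weigh the cost of moving rarely changed data, which the sketch ignores.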
An enterprise-class SSD should be expected to have, say, five years of life. It should also have a formal amount of data that can be written to it, for example 14.6PB for an 800GB multi-level SSD, which equates to roughly 10 full drive capacity writes a day over that period.
If it has a slower rate of data writing, then the device's working life will be extended. The supplier should warrant working life to show that it is committed to delivering drives that fulfil manufacturer claims.
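The relationship between rated writes, capacity and working life is straightforward arithmetic. The sketch below assumes decimal units (1PB = 1,000TB) and 365-day years. The function names are hypothetical, and the five-writes-a-day case is an illustrative input, not a figure from any supplier.

```python
TB = 10**12
GB = 10**9
DAYS_PER_YEAR = 365  # assumption: leap days ignored


def drive_writes_per_day(total_bytes_written, capacity_bytes, years):
    """Full-capacity writes per day that use up the rating in the given time."""
    return total_bytes_written / capacity_bytes / (years * DAYS_PER_YEAR)


def years_of_life(total_bytes_written, capacity_bytes, writes_per_day):
    """Working life if the drive is written at a given daily rate."""
    return total_bytes_written / capacity_bytes / writes_per_day / DAYS_PER_YEAR


rated = 14_600 * TB   # 14.6PB, the example rating in the text
capacity = 800 * GB

dwpd = drive_writes_per_day(rated, capacity, years=5)
slower = years_of_life(rated, capacity, writes_per_day=5)
print(dwpd, slower)
```

Under these assumptions the example rating works out at about 10 drive writes a day for five years, and halving the write rate doubles the working life.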
Keep your integrity
Getting data on and off the SSD reliably, quickly and at a consistent rate are three excellent qualities – but correct data is equally necessary.
Error checking and correction is vital for SSDs, as it is for hard disk drives. T10 protection information (PI) and I/O error detection codes (IOEDC) are other techniques used to ensure the integrity of data.
T10 PI, which comes from hard disk drive technology, provides end-to-end assurance that data is correct. When data is written to the device, metadata is added that a server can check to assure itself that what is being read is exactly what was written.
IOEDC is internal to the SSD and enables its controller to identify and correct data errors so that they are invisible to applications reading the drive's contents.
When data is first written, cyclic redundancy check (CRC) data is added to it. The CRC value is computed from the source data's value and its logical address. It is recomputed when the data is read and a comparison of the original and newly computed CRC values will reveal if there is a difference.
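The check can be sketched as follows. This is an illustration of the idea only: the article does not say which CRC a drive uses, so CRC-32 from Python's `zlib` module stands in for it, the 8-byte big-endian encoding of the address is an assumption, and `protect` and `verify` are hypothetical names. A CRC on its own detects a mismatch and does not repair it.

```python
import struct
import zlib


def protect(data: bytes, lba: int) -> int:
    """Compute a CRC over the logical block address followed by the data."""
    return zlib.crc32(struct.pack(">Q", lba) + data)


def verify(data: bytes, lba: int, stored_crc: int) -> bool:
    """Recompute the CRC on read and compare it with the stored value."""
    return protect(data, lba) == stored_crc


sector = b"customer record"
stored = protect(sector, lba=42)

clean_read = verify(sector, 42, stored)

# One flipped bit in the data is caught.
corrupted = bytes([sector[0] ^ 0x01]) + sector[1:]
corrupt_read = verify(corrupted, 42, stored)

# Correct data returned from the wrong address is caught too.
misdirected_read = verify(sector, 43, stored)

print(clean_read, corrupt_read, misdirected_read)
```

Folding the logical address into the CRC is what lets the check catch intact data that has come back from the wrong location.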
The understanding supplier
Until now you may well have been building systems using hard disk drives for storage. These are now being complemented by SSDs for applications needing as much performance as you can deliver.
It is useful to have your SSD components qualified and tested by your supplier to the same standard as the hard disk drives you are used to, or a better one.
The support arrangements should be the same so that you use a familiar and well-understood process. It is helpful for any encryption scheme used with the SSDs to match that of the hard disk drives, and even better if any storage management system supports both media types.
It is helpful for hard disk drives and SSDs to use the same interface, 6Gbps SAS for example, as that means you need be concerned with the connections of only one interface rather than two. Having hard disk drives and SSDs share the same form factor, such as a 2.5in one, makes things easier too.
You need to be able to qualify the SSDs and your supplier should have rigorous quality control procedures so that you develop and ship your systems with consistently reliable media. Look for more than a million-and-a-half hours between failures and an annual failure rate of less than 0.55 per cent.
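The two reliability figures are related, and a rough conversion helps when a supplier quotes only one of them. The sketch below assumes a constant failure rate (the exponential model) and round-the-clock operation. Suppliers may derive their published figures differently, and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # 24 hours x 365 days of continuous operation


def annual_failure_rate(mtbf_hours):
    """Chance of failure within a year under a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def mtbf_for_afr(afr):
    """Mean time between failures implied by an annual failure rate."""
    return -HOURS_PER_YEAR / math.log(1 - afr)


afr = annual_failure_rate(1_500_000)
mtbf = mtbf_for_afr(0.0055)
print(f"{afr * 100:.2f} per cent a year; {mtbf:,.0f} hours")
```

Under this model 1.5 million hours corresponds to an annual failure rate of about 0.58 per cent, so the 0.55 per cent target is the slightly stricter of the two and implies a little under 1.6 million hours.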
A cut above
The four headline items that OEMs, system integrators and system builders need from SSD components are fast and consistent performance, a long and guaranteed working life, the best data integrity available, and a supplier that helps simplify the addition of SSDs to existing systems or the development of all-SSD systems.
Consumer-focused SSDs fail to meet most of these criteria, and drives from suppliers with no history of working with system builders may carry an extra level of risk.
Get the SSD characteristics and supplier criteria right and you are best placed to deliver an excellent product to your customers that will perform at impressive speeds and represent great value for money.