Supercapacitors have the power to save you from data loss
Learn all about them
As solid state drives (SSDs) become a critical part of today's storage, it becomes increasingly important to learn about the supercapacitors that help prevent data loss.
The presence – and type – of supercapacitors in SSDs should be as important a consideration as choosing between MLC, eMLC and SLC-based drives.
Supercapacitors in SSDs ensure that any writes sent to the DRAM cache on the drive are successfully written in the event of a power loss.
To understand their importance, we need a bit of a history lesson. The hows and whys of this discussion will get fairly technical but I will do my best to keep it comprehensible.
Buffer against latency
All modern hard drives, be they of the traditional spinning magnetic variety or the modern solid state persuasion, have a DRAM buffer to improve performance. (The DRAM buffer is generally called a disk cache, though this is an incorrect usage in all but a few configurations.)
The DRAM on most of today's drives is a paltry 64MB, but we usually read and write far more data than 64MB at a time. The result is something that seems counterintuitive if you assume that the DRAM buffer functions in a straightforward first-in, first-out arrangement. It doesn't.
Magnetic drives have to physically move a "head" to different positions to execute reads and writes. The time it takes to receive a command from the computer's operating system, move the head, execute the read or write, then pump either the data or a confirmation back to the operating system is known as latency.
Latency is bad. Ideally you want the least possible time to pass between the execution of your various reads and writes. If a magnetic hard drive were simply to execute the commands in the order in which they were received there is every likelihood of the head flying all over the place in a very sub-optimal pattern.
Solving this problem is the point of the DRAM buffer on a traditional magnetic drive: commands are executed out of the buffer in the order that most reduces latency.
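To make the reordering idea concrete, here is a minimal sketch that compares servicing a queue in arrival order with a simple "elevator" sweep. The track numbers, the starting head position and the function names are all invented for illustration; real drive firmware also accounts for rotational position and is far more sophisticated than this.

```python
def head_travel(start, tracks):
    """Total distance the head moves when servicing tracks in the given order."""
    travel, position = 0, start
    for track in tracks:
        travel += abs(track - position)
        position = track
    return travel


def elevator_order(start, tracks):
    """Service everything at or above the head on the way up, then sweep back down."""
    upward = sorted(t for t in tracks if t >= start)
    downward = sorted((t for t in tracks if t < start), reverse=True)
    return upward + downward


# Hypothetical queue of track numbers sitting in the buffer; head starts at track 50.
queue = [98, 183, 37, 122, 14, 124, 65, 67]

fifo_travel = head_travel(50, queue)
reordered_travel = head_travel(50, elevator_order(50, queue))

print(fifo_travel)       # 643 tracks of movement in arrival order
print(reordered_travel)  # 302 tracks of movement after reordering
```

In this toy case the reordered queue moves the head less than half as far, which is the whole point of holding commands in the buffer before executing them.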
For magnetic drives this was both a blessing and a curse. To get the best results from this bit of DRAM buffer trickery you need to buffer both reads and writes. If, however, you use the disk's DRAM to buffer writes and the power goes out, those writes are lost.
When the power is cut, DRAM loses all the data stored in it. Files are rendered corrupt and sysadmins are called into the pointy-haired boss's office with the door shut.
The 'three Rs': Read, write, RAID
To avoid data loss due to power failure, the first option is a really kick-butt backup power setup.
Make sure your server has a redundant power supply with each power module plugged into a separate uninterruptible power supply (UPS). Make sure no UPS is loaded beyond ~40 per cent to ensure that if something untoward happens to your power setup, one side of your UPS can handle the full load.
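The arithmetic behind the 40 per cent figure can be sketched as follows. This assumes two identical UPS units sharing the load equally, which is a simplification; the function names are mine.

```python
def load_after_failover(load_per_ups):
    """Fraction of its capacity the surviving UPS carries if its twin drops out.

    Assumes two identical units that normally share the load equally.
    """
    return 2 * load_per_ups


def survives_failover(load_per_ups):
    """True if one UPS can carry the full load without exceeding its rating."""
    return load_after_failover(load_per_ups) <= 1.0


print(load_after_failover(0.40))  # 0.8, leaving some headroom
print(survives_failover(0.40))    # True
print(survives_failover(0.55))    # False: the survivor would be at 110 per cent
```

Staying at or below 40 per cent per side means the surviving UPS ends up at no more than 80 per cent, with a margin left over for battery ageing and load growth.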
Load the UPS software onto your servers and configure them to shut down when the UPS detects a power outage. Make sure the UPS is big enough to stay up for the full timeframe required to shut down all your servers. Pray to whatever deity you believe in repeatedly. Test often.
The second option is the battery backup in a RAID card. Here you turn off the cache on your physical hard disks and use the RAID card's DRAM as both write buffer and read cache.
The RAID card can be equipped with a battery backup unit so that if the power fails and the computer goes off, the writes are held in DRAM until the computer is back on. The RAID card will then flush the writes to the disks.
As you might expect, turning off the DRAM buffers on your magnetic disks leads to a massive performance drop on those disks. The DRAM buffers cannot be set to read only: they are on or they are off. You makes your choices and you takes your chances.
To my knowledge, only LSI 3ware cards provide a solution to this conundrum in the form of a proprietary Write Journal feature. This sections off a portion of the RAID card's DRAM to mirror the DRAM buffers on the individual disks. If the power goes out then the pending writes are still stored in the RAID card's battery-backed DRAM.
Look, no spinning
Now that you understand how that all works, let's look at SSDs. In an SSD there is no head that moves across a spinning platter to read and write information. Electrical impulses are sent to chips consisting of multiple layers of integrated circuits which respond in various ways, resulting in either a "read", "write" or "erase" operation.
Also unlike magnetic drives, SSDs read and write in pages but must erase in blocks, and a page can only be written once the block containing it has been erased. The size of both pages and blocks varies according to manufacturer and product.
Let's say that your page size is 4KiB and your block size is 512KiB. To read a single bit the SSD would need to read an entire 4KiB page. It simply wouldn't be capable of operating at smaller increments. But to write a single bit, an entire 512KiB block would have to be erased and all 128 4KiB pages rewritten.
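The numbers in that example work out as below. This is only the naive case described above, where the whole block is erased and rewritten for every change; the variable and function names are mine.

```python
KIB = 1024
PAGE_SIZE = 4 * KIB     # example page size from the text
BLOCK_SIZE = 512 * KIB  # example block size from the text

# How many pages have to be rewritten when a block is erased.
pages_per_block = BLOCK_SIZE // PAGE_SIZE


def naive_write_amplification(bytes_changed):
    """Bytes physically rewritten per byte changed, if the whole block is redone."""
    return BLOCK_SIZE / bytes_changed


# Changing a single bit still rewrites every bit in the block.
bits_rewritten_per_bit = BLOCK_SIZE * 8

print(pages_per_block)                       # 128
print(naive_write_amplification(PAGE_SIZE))  # 128.0 for a one-page change
print(bits_rewritten_per_bit)                # 4194304
```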
The best of the best flash drives, SLC ones, have a typical endurance of 100,000 writes. This means you can erase a block and then write something to its pages about 100,000 times before you can never write to that block again.
MLC is an order of magnitude less capable, with the consumer-grade stuff typically being capable of 10,000 writes. eMLC (short for enterprise MLC) might get 30,000 writes at the outside, though 20,000 is more common.
Given the write limits of SSDs, the idea of erasing an entire block just to write one bit of data is outright lunacy. There has to be a better way – and there is.
In one sense, the use of DRAM buffers on SSDs is not all that different from their use on magnetic disks. The DRAM buffer on an SSD is not merely a first-in, first-out layer of faster storage between the host computer and the NAND cells in the SSD. The DRAM buffer is how manufacturers extend the usable lifetime of SSDs.
SSDs are marvels of mathematics. Each has proprietary algorithms that figure out not only how to write the data sitting in the DRAM buffer to the flash blocks most efficiently, but also how to consolidate the data from sparsely populated blocks into fewer writes in order to pre-emptively free up entire blocks for future use.
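The real algorithms are proprietary, so the following is only a toy illustration of the consolidation idea, not any vendor's method. The four-page block size, the 50 per cent threshold and the function name are all assumptions made to keep the example small.

```python
PAGES_PER_BLOCK = 4  # deliberately tiny, hypothetical block size


def consolidate(blocks, threshold=0.5):
    """Pack live pages from sparsely populated blocks into as few blocks as possible.

    blocks: list of lists, each inner list holding the live pages left in one
    block (stale pages are simply not listed).
    Returns (new_blocks, freed), where freed is the number of blocks released.
    """
    sparse = [b for b in blocks if len(b) / PAGES_PER_BLOCK <= threshold]
    dense = [b for b in blocks if len(b) / PAGES_PER_BLOCK > threshold]

    # Gather every live page from the sparse blocks and repack them tightly.
    live = [page for block in sparse for page in block]
    packed = [live[i:i + PAGES_PER_BLOCK]
              for i in range(0, len(live), PAGES_PER_BLOCK)]

    freed = len(sparse) - len(packed)
    return dense + packed, freed


# Hypothetical drive state: one full block and four sparse ones.
blocks = [["a", "b", "c", "d"], ["e"], ["f", "g"], ["h"], []]
new_blocks, freed = consolidate(blocks)

print(new_blocks)  # [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]
print(freed)       # 3
```

Four half-empty blocks collapse into one full block, leaving three erased blocks ready for incoming writes. Note that the repacking itself needs a spare erased block to write into.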
SSDs are also over-provisioned: a 960GB SSD typically contains 1,024GB of flash, with the extra blocks serving to help wear level the drive. The more blocks you can write to over time, the less you are erasing and rewriting any individual block and the longer the life of the drive.
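As a rough sketch of what those figures imply, the following works out the over-provisioning ratio and an idealised write budget using the endurance numbers quoted earlier. It assumes perfect wear levelling and no write amplification, so treat the results as upper bounds rather than predictions; the function names are mine.

```python
def over_provisioning(raw_gb, usable_gb):
    """Spare flash as a fraction of the advertised capacity."""
    return (raw_gb - usable_gb) / usable_gb


def ideal_write_budget_tb(raw_gb, erase_cycles):
    """Total data writable before wear-out, in TB.

    Assumes every block is worn evenly and each host write costs exactly one
    flash write, which real drives never quite achieve.
    """
    return raw_gb * erase_cycles / 1000


spare = over_provisioning(1024, 960)
mlc_budget = ideal_write_budget_tb(1024, 10_000)   # consumer MLC figure from the text
slc_budget = ideal_write_budget_tb(1024, 100_000)  # SLC figure from the text

print(round(spare * 100, 1))  # 6.7 per cent spare flash
print(mlc_budget)             # 10240.0
print(slc_budget)             # 102400.0
```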
Of course, DRAM buffers on SSDs suffer the same problem as those on magnetic disks: cut the power and the data in the buffer is gone. And pending writes are not written, so it's back to the pointy-haired boss's office for you.
There is also the possibility that the consequences of failure are even worse with SSDs than with spinning disk. The bigger the DRAM buffer on the SSD, the more efficient the life-extension algorithms are, and thus the DRAM buffer on SSDs tends to be much larger.
You can forget the RAID card trick of disabling the DRAM buffer and relying on the battery-backed RAID card's DRAM. Try this and you will annihilate your SSD's write lifetime in short order.
Fortunately, SSDs' lower power consumption combined with the significantly lower write latency means that there is a cheap and simple solution to the problem.
If a job is worth doing…
Supercapacitors are like batteries, but more awesome. Depending on various factors, they can discharge more energy in a shorter time than batteries. They can also survive more charge/discharge cycles than any battery.
Supercapacitors can't store nearly as much energy as a battery, and chaining together enough supercapacitors to get to battery-like levels is fairly expensive.
That is perfectly okay for SSDs, however, as they don't need to be on for very long to dump the contents of their DRAM cache into flash. Typically, they need to remain up for less than a second.
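A back-of-the-envelope check of that claim might look like this. It uses the standard capacitor energy formula, E = ½C(V₁² − V₂²), but every number below (capacitance, voltages, power draw, buffer size, flush speed) is a made-up illustrative value, not a figure from any real drive.

```python
def usable_energy_joules(capacitance_f, v_charged, v_cutoff):
    """Energy released as the capacitor sags from v_charged to v_cutoff."""
    return 0.5 * capacitance_f * (v_charged ** 2 - v_cutoff ** 2)


def hold_up_seconds(energy_j, power_w):
    """How long that energy keeps a constant-power load running."""
    return energy_j / power_w


def flush_seconds(dirty_mb, write_mb_per_s):
    """Time needed to write the dirty buffer contents to flash."""
    return dirty_mb / write_mb_per_s


# All values hypothetical, chosen only to show the shape of the calculation.
energy = usable_energy_joules(capacitance_f=0.09, v_charged=5.0, v_cutoff=3.0)
hold_up = hold_up_seconds(energy, power_w=4.0)
flush = flush_seconds(dirty_mb=64, write_mb_per_s=400)

print(round(energy, 2))   # 0.72 joules
print(round(hold_up, 2))  # 0.18 seconds of power
print(round(flush, 2))    # 0.16 seconds to flush
print(hold_up >= flush)   # True: enough time to save the buffer
```

With these invented numbers the drive has a fraction of a second of power and needs a slightly smaller fraction to flush, which is why a modest bank of capacitors is enough where a spinning disk would need a battery.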
As you might expect, the official advice from The Register is fairly clear: do not, under any circumstances, use SSDs that are not equipped with supercapacitors for important workloads.
Supercapacitor-equipped SSDs are available from almost every SSD manufacturer out there, so there is absolutely no excuse not to be using them. If you have non-supercapacitor SSDs in service today, give some very serious thought to replacing them.
Wet versus dry
Not all supercapacitors are made equal. Supercapacitors fall broadly into two main categories: "wet" and "dry".
Wet supercapacitors wear out over time, a problem exacerbated by higher operating voltages and/or higher temperatures. It is not unheard of for modern enterprise SSDs – especially with all that clever maths – to end up in a position where the flash cell write life exceeds the lifespan of the supercapacitors designed to protect the drive.
A narrow working temperature range for wet supercapacitors means that the common industry practice of linear reflow soldering won't work. Lead-free linear reflow soldering has temperature peaks above 200°C.
This is too far from the "up to 70°C" operating temperatures of wet supercapacitors for these components to withstand, even for the brief period of assembly.
That means that wet supercapacitors are soldered to SSDs by hand, which drives up the price. Dry supercapacitors – or at least some of them – avoid a lot of these problems. They are typically advertised as containing either tantalum or niobium oxide. Wet supercapacitors are usually based on aluminium.
Wet supercapacitors based on graphene and carbon nanotubes promise to change this all over again, but neither is available in volume.
Study the chemistry
"Classic" tantalum capacitors use solid manganese dioxide (MnO2) as the counter electrode. These supercapacitors have demonstrated excellent reliability thanks to their inherent self-healing behaviour.
Unfortunately, the same chemical voodoo that allows Ta-Mn supercapacitors to heal also leads to a small problem with some minor exploding under various circumstances.
For the record, the exploding part is due to MnO2 being used as the cathode, not the use of tantalum.
Seeing as how having your $7500 PCI-E SSD blow up while trying to save your critical data during a power outage would not be the best possible outcome, the industry has largely given up trying to build current limitation circuitry into its supercapacitors and turned instead to tantalum-polymer (Ta-Poly) and niobium oxide.
Ta-Poly supercapacitors also exhibit the self-healing traits of the Ta-Mn variety, but they do not do so quite as readily. The result is a supercapacitor with more current leakage (though this is being worked on) and slightly lower operating temperatures (85°C versus 105°C).
The costs are about the same for both types of tantalum capacitors, as Ta-Mn requires more extensive protection circuitry than Ta-Poly, but you will typically need a few more Ta-Poly supercapacitors to get the job done than Ta-Mn.
Niobium oxide tends to be more reliable than Ta-Mn, and it is competing with the rapidly evolving set of Ta-Poly supercapacitors to succeed classical MnO2-based tantalum supercapacitors.
On paper, niobium oxide is better than Ta-Mn and Ta-Poly in almost every way. In 2013 there was a burst of hype about niobium oxide supercapacitors as the first that might actually be able to do away with batteries for everyday appliances and gadgets. We are still some way from that, but they are starting to show up in SSDs.
The write stuff
The most ultra-conservative of storage managers won't touch drives with niobium oxide supercapacitors for the simple reason that they haven't been on the market long enough to have experienced every edge case.
For those folks there are plenty of other options to choose from. For the rest of us, however, any supercapacitor in our SSDs is better than none.
Niobium is better than tantalum and tantalum is better than anything else. Whatever you do, don't run production workloads without taking the time to think about what happens if the power goes out.
Will your writes be safe? For how long? With magnetics and a RAID card your writes live as long as your RAID card's battery backup. (Do remember to test and replace regularly.)
With supercapacitor-backed SSDs, your writes are written immediately and safe until the flash in the SSD degrades. ®