How Fusion-io redlined its PCIe flash motor to hit 9.6 MEEELION IOPS
64-byte writes ought to be enough for anybody
In its latest party trick, Fusion-io ramps up its ioDrive2 server card - a slab of 365GB MLC flash storage on a PCIe board - to gobble 9.6 million writes a second.
The feat works by mapping the PCIe card's NAND capacity into an application's memory space and having a single thread write 64 bytes at a time - quite likely the PCI backbone's cache line size. So that's 9.6 million 64-byte writes a second, or roughly 586MB/s in real money if you count a megabyte as 2^20 bytes (about 614MB/s in decimal units). Your mileage may vary if your own code deviates from this specific benchmark setup.
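For anyone who wants to check the sums, here is a minimal sketch of the arithmetic. The function name and constants are ours for illustration; the only inputs taken from the benchmark are the 9.6 million writes a second and the 64-byte write size.

```python
# Back-of-envelope throughput for the quoted benchmark figures.
IOPS = 9_600_000      # writes per second, as quoted
WRITE_SIZE = 64       # bytes per write, as quoted


def throughput(iops, write_size):
    """Return (bytes/s, decimal MB/s, binary MiB/s) for a given write rate."""
    bytes_per_sec = iops * write_size
    return bytes_per_sec, bytes_per_sec / 1e6, bytes_per_sec / 2**20


bytes_per_sec, mb_per_sec, mib_per_sec = throughput(IOPS, WRITE_SIZE)
print(f"{bytes_per_sec} bytes/s = {mb_per_sec:.1f} MB/s = {mib_per_sec:.1f} MiB/s")
```

The gap between the two megabyte figures is purely a matter of whether a megabyte is taken as 10^6 or 2^20 bytes.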
Fusion provides a directFS file system and a software development kit to make the ioDrive2's flash memory appear in the application's virtual memory space. As this Intel white paper [PDF] shows, it's perfectly possible to instruct the processor to write directly to a memory-mapped card over the PCIe bus just as if it was on-board RAM.
The data should be aligned to a 64-byte boundary for optimal performance, and atomic read-writes are possible. The idea is to open up an application to a lot of directly coupled non-volatile storage, bypassing the operating system's code and any disk-based capacity. To take advantage of the hardware, the app code must be altered to use the Fusion-io SDK's APIs.
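To give a flavour of the general technique, here is a rough sketch of aligned 64-byte writes into a memory-mapped region. It does not use the Fusion-io SDK or directFS, whose APIs we are not reproducing here: an ordinary temporary file stands in for the flash card, and the helper names (`is_aligned`, `write_line`, `demo`) are hypothetical.

```python
import mmap
import tempfile

CACHE_LINE = 64  # bytes; the write size used in the benchmark


def is_aligned(offset, boundary=CACHE_LINE):
    """True if offset sits exactly on a 64-byte boundary."""
    return offset % boundary == 0


def write_line(mm, offset, payload):
    """Store one 64-byte payload at an aligned offset in the mapping."""
    if not is_aligned(offset):
        raise ValueError("offset must be a multiple of 64 bytes")
    if len(payload) != CACHE_LINE:
        raise ValueError("payload must be exactly 64 bytes")
    mm[offset:offset + CACHE_LINE] = payload


def demo():
    """Map a scratch file (a stand-in for the card) and write one line."""
    with tempfile.TemporaryFile() as f:
        f.truncate(mmap.PAGESIZE)          # give the file one page of space
        with mmap.mmap(f.fileno(), mmap.PAGESIZE) as mm:
            write_line(mm, 128, b"\xab" * CACHE_LINE)
            mm.flush()                     # push the dirty page to the file
            return bytes(mm[128:192])
```

Because the mapping itself starts on a page boundary, an offset that is a multiple of 64 lands on a 64-byte-aligned address. A real deployment would map the device through the vendor's tooling rather than a scratch file, and would not be written in an interpreted language if IOPS mattered.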
By The Reg storage desk's reckoning, 64 bytes is a sweet spot for Fusion-io's ioDrive2. The PCIe hardware sends 64-byte chunks per processor write cycle, assuming the addresses are optimally aligned: there's no speed advantage in sending fewer than 64 bytes in one cycle.
Sending one byte more than 64 will force the chipset to round up to the next multiple of 64 bytes, and thus send 128 bytes over the PCIe bus, which will take two access cycles and, in theory, halve your 9.6-million-writes-a-second figure to 4.8 million IOPS.
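The rounding-up penalty can be sketched as a toy model. This simply encodes the reasoning above (whole 64-byte chunks, one access cycle per chunk, peak rate divided by the cycle count); it is an assumption-laden estimate, not a measurement, and the function names are ours.

```python
CHUNK = 64              # bytes sent per access cycle, per the reasoning above
PEAK_IOPS = 9_600_000   # quoted figure for a single 64-byte write


def cycles_needed(payload, chunk=CHUNK):
    """Number of 64-byte access cycles needed for a payload (ceiling division)."""
    if payload < 1:
        raise ValueError("payload must be at least one byte")
    return -(-payload // chunk)


def bus_bytes(payload, chunk=CHUNK):
    """Bytes actually sent once the payload is rounded up to whole chunks."""
    return cycles_needed(payload, chunk) * chunk


def estimated_iops(payload, peak_iops=PEAK_IOPS, chunk=CHUNK):
    """Theoretical write rate if each extra cycle costs a full write slot."""
    return peak_iops / cycles_needed(payload, chunk)


for size in (1, 64, 65, 128, 129):
    print(size, bus_bytes(size), int(estimated_iops(size)))
```

Under this model anything from 1 to 64 bytes costs the same single cycle, which is why 64 bytes is the sweet spot, and a 65-byte write costs as much as a 128-byte one.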
El Reg thinks PernixData and its distributed hypervisor-resident cache will also play into this memory-mapped flash arena. It appears that primary data storage is in the very early days of transitioning from disk media to solid-state chips, through flash-as-disk to flash-as-memory. Fusion-io is not alone in this; IBM general manager for storage Ambuj Goyal also thinks transaction-class data should be held on solid-state storage. Disk is just too darned slow.
The highest performance rewards from using flash will be achieved by understanding the hardware's characteristics and going with the flow. ®