Fusion-io demos billion IOPS server config
Previews software with serious grunt
Fusion-io has achieved a billion IOPS from eight servers in a demonstration at the DEMO Enterprise event in San Francisco.
The cracking performance needed just eight HP DL370 G6 servers, running Linux 22.214.171.124-45 on two 6-core Intel processors with 96GB of RAM. Each server was fitted with eight 2.4TB ioDrive2 Duo PCIe flash drives; that's 19.2TB of flash per server and 153.6TB of flash in total.
The demo used a custom load-generator that ran at 125 million ops/sec on each server and transferred 64-byte data packets.
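For readers who want to check the sums, here is a minimal Python sketch of the configuration arithmetic. The figures come from the paragraphs above; the constant and function names are our own, chosen purely for illustration.

```python
# Figures quoted in the article; names are illustrative only.
SERVERS = 8
DRIVES_PER_SERVER = 8
DRIVE_CAPACITY_GB = 2400          # one 2.4TB ioDrive2 Duo
OPS_PER_SERVER = 125_000_000      # load generator rate per server


def demo_totals():
    """Return (flash per server in GB, total flash in GB, total ops/sec)."""
    flash_per_server_gb = DRIVES_PER_SERVER * DRIVE_CAPACITY_GB
    total_flash_gb = SERVERS * flash_per_server_gb
    total_ops = SERVERS * OPS_PER_SERVER
    return flash_per_server_gb, total_flash_gb, total_ops


if __name__ == "__main__":
    per_server, total, ops = demo_totals()
    print(f"{per_server / 1000}TB per server, {total / 1000}TB in total")
    print(f"{ops:,} ops/sec across {SERVERS} servers")
```

Eight servers at 125 million ops/sec each is where the billion comes from.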
Fusion said the ioDrives use Auto Commit Memory (ACM) software, a coming function in its ioMemory VSL subsystem. It lets developers directly control the data path to persistent (NAND flash) memory and "significantly reduces latency and system overhead in transferring data."
Fusion says data integrity is maintained "by the ioMemory architecture’s ability to flush all in-flight data, even if the power is abruptly cut, without the need for super capacitors or batteries."
The whole aim is to remove latency and O/S overhead from the access to data. David Flynn, Fusion's chairman and CEO, said: "This … is not something that could be achieved with hardware alone. Intelligent software that optimises NAND flash as a low latency, high-capacity, non-volatile memory solution for enterprise servers can transform the way organisations process the immense amounts of data that powers our lives today."
Woz (Steve Wozniak), Fusion-io's Chief Scientist, said: “Instead of treating flash like storage, where data passes through all of the OS kernel subsystems that were built and optimised for traditional storage, our core ioMemory technology offers a platform with new programming primitives that can provide system and application developers direct access to non-volatile memory.”
Bypassing host O/S I/O subsystem
The demo preview system had 64 ioDrive2 Duos, each with 2.4TB of capacity. Flynn said existing 2.4TB ioDrive2 Duos do around a million IOPS. Each one in the demo system delivered about 16 million IOPS, a 16X - that's right, sixteen-fold - improvement in performance.
This comes from avoiding using the host OS's I/O subsystem at all. Instead the ioDrive capacity is seen by applications as an area of memory. Flynn said apps simply read data from or write data to an area of memory, using CPU Load Store instructions.
Apps use an ACM API to do this, so they would need to be written or rewritten for it. Flynn said apps could be somehow fooled into thinking they were still using the host OS's I/O pathways, even though they are not, which would make the adoption of ACM by existing software somewhat easier.
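The per-drive figure is a rounded one. A quick sketch, using only the numbers quoted above (the names are ours), shows what the division actually gives:

```python
# Figures quoted in the article; names are illustrative only.
TOTAL_IOPS = 1_000_000_000     # headline demo figure
DRIVES = 64                    # ioDrive2 Duos in the demo system
BASELINE_IOPS = 1_000_000      # "around a million" for an existing Duo


def per_drive_iops():
    """Average operations per second delivered by each drive."""
    return TOTAL_IOPS / DRIVES


def speedup():
    """Improvement over the quoted block-device baseline."""
    return per_drive_iops() / BASELINE_IOPS


if __name__ == "__main__":
    print(f"{per_drive_iops():,.0f} IOPS per drive, {speedup():.3f}x the baseline")
```

So the sixteen-fold claim is about 15.6X before rounding, and it rests on a baseline that Flynn himself gave only approximately.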
It is a data transfer, an I/O in that sense, but doesn't look like it to the host operating system. The OS's block I/O subsystem simply isn't used.
Flynn says this means application software effectively gets instantaneous I/O, data transferred at near-memory speed, and the server's CPU doesn't get involved with this data transfer at all.
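Fusion has not published the ACM API, so we can't show it. The closest everyday analogue is an ordinary memory-mapped file, where an app stores bytes into a region of memory instead of issuing write calls. The Python sketch below is only that analogue: it uses the standard `mmap` module on a temporary file, it still goes through the host OS's page cache and block layer, and the function name and 64-byte record layout are our own assumptions.

```python
import mmap
import os
import tempfile

RECORD_BYTES = 64  # same size as the demo's data packets


def store_records(records):
    """Store fixed-size records through a memory mapping and return the
    resulting file contents. An analogy only, not Fusion-io's ACM API."""
    size = len(records) * RECORD_BYTES
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)                  # a mapping needs a non-empty file
        with mmap.mmap(fd, size) as region:
            for i, rec in enumerate(records):
                if len(rec) > RECORD_BYTES:
                    raise ValueError("record larger than one slot")
                off = i * RECORD_BYTES
                # Slice assignment is a plain memory store, not a write() call.
                region[off:off + RECORD_BYTES] = rec.ljust(RECORD_BYTES, b"\0")
            region.flush()                      # ask the OS to persist the pages
        os.close(fd)
        fd = None
        with open(path, "rb") as f:
            return f.read()
    finally:
        if fd is not None:
            os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    data = store_records([b"alpha", b"beta"])
    print(len(data), data[:5], data[64:68])
```

The explicit `flush()` is the step ACM is said to make unnecessary: according to Fusion, in-flight data is committed to flash even if the power is cut.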
Multiplying the effect
It's a multiplier, the company says. This means, for example, that a virtualised server with VMware gets a huge slug of CPU resource released by apps using the cut-through or Fast Path ACM software - so one host could run more virtual machines (VMs). How many more depends upon how bounded by I/O the existing apps are.
El Reg speculates that a host could run half as many VMs again as it does today.
Looking at individual apps, we could ask how many more users or clients they could support with the ACM software. Would it be a tenfold increase or even more? We'll have to wait and see.
Flynn pointed out that ACM is still preview software and will be worked on some more. It could go even faster by the time the software is released.
Fusion has a track record in such demonstrations, starting with the 1 million IOPS Quicksilver demo with IBM's SVC in 2009. It needed a rack of systems. Two years later it has gone a thousand times faster with far fewer but more powerful servers.
By producing software such as Auto Commit Memory, Fusion-io is avoiding the coming commodity PCIe hardware flash trap which will cause prices, and profit margins, to drop.
Auto Commit Memory should be a real product by DEMO Enterprise in April; that's in four months. It will only be available with Fusion's hardware.
With ACM, Fusion has significantly raised the bar in enterprise server I/O and separated itself off from the PCIe flash drive pack. It's no longer enough to have fast flash drive hardware. You need clever, clever software to really make your PCIe NAND flash operate as fast as the brown stuff leaving a shovel.
Think about it. If this works then Fusion-io has just completely re-written the rules of server app I/O. Things will never be the same again. Sounds portentous. It's real. ®
If anyone from FusionIO wants to jump in and correct me that's fine. But it seems clear to me that we can't look at the Duo drive specs in the data sheet because those cards come default as block storage devices. This was a demonstration showing flash as a literal extension to memory. From CPU cache, to RAM, to Fusion flash but none of it going through the OS's storage subsystems.
Here is an article I found that quotes one of Fusion's technology architects and will give you a different perspective on how they view the world: http://lwn.net/Articles/408428/
Wouldn't database admins talk about TPS? I suppose you might hear a storage administrator who supports DBAs talking about IOPS.
If you were talking about RAM would you use 4k as your packet size? I would assume for single-thread receives, processes, and transmits you would want to use 64 bytes.
Maybe what is being missed and what I found so interesting about this demo is it blended storage and memory concepts, or what IBM is calling Storage Class Memory. This was high capacity non-volatile storage that was transferring hundreds of millions of packets per second at the CPU's cache line or data block size.
You're thinking disks
dikrek, you have to leave the 4k paradigm behind because we're talking about a new type of memory, a nonvolatile storage class memory. Since we're talking memory, the 64-byte packet size does make sense for their demo, considering the size of a cache line is 64 bytes.
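To put numbers on the 64-byte versus 4k point, here is a rough Python sketch. It uses only the figures from the article (a billion ops/sec, 64-byte packets) plus the conventional 4096-byte block; the names are made up for illustration.

```python
# Figures from the article plus the conventional 4k block; names are illustrative.
CACHE_LINE_BYTES = 64
BLOCK_BYTES = 4096
OPS_PER_SEC = 1_000_000_000


def throughput_gb_per_sec(ops_per_sec, bytes_per_op):
    """Aggregate data rate in decimal gigabytes per second."""
    return ops_per_sec * bytes_per_op / 1e9


def cache_lines_per_block():
    """How many 64-byte cache lines fit in one 4k block."""
    return BLOCK_BYTES // CACHE_LINE_BYTES


if __name__ == "__main__":
    print(throughput_gb_per_sec(OPS_PER_SEC, CACHE_LINE_BYTES), "GB/s at 64 bytes")
    print(throughput_gb_per_sec(OPS_PER_SEC, BLOCK_BYTES), "GB/s at 4k")
    print(cache_lines_per_block(), "cache lines per 4k block")
```

A billion 64-byte transfers a second works out to 64GB/s across the eight servers, while the same rate at 4k would need 4,096GB/s, which is why the headline number shouldn't be read as a 4k block IOPS figure.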
Anonymous, their power-cut flush feature is nothing new. For example, HP's user guide for the IO Accelerator talks about it. Below is from the HP IO Accelerator Linux User Guide:
"The Remote Power Cut Module ensures in-flight writes are completed to NAND flash in these catastrophic scenarios. The Remote Power Cut Module is not required, but HP recommends the module. NOTE: The power cut feature is built into PCIe IO Accelerators; therefore, no Remote Power Cut Module is necessary."