Original URL: http://www.theregister.co.uk/2009/10/05/averes_ftx/
Avere bets filers are crying out for more tiers
Four storage levels starting with NVRAM
Start-up Avere has designed a 4-tier filer appliance to deliver much faster I/O and dramatically lower the number of spindles needed in a high-performance filer.
With the Avere FTX Series, the company has undertaken a complete overhaul of filer architecture and moved on from the embedded server-plus-drive-trays approach characteristic of most network-attached storage (NAS) products. At the same time its product shares a few characteristics with F5/Acopia's ARX file virtualising resource switch. Specifically it sits in-line, in front of NAS arrays, interfacing them to accessing servers and presenting a single, global file namespace to those servers.
The filer sees Avere's FTX product as a server client and the servers see the FTX as a filer.
Ron Bianchini, Avere co-founder and CEO, says Avere is very conscious of the hard drive access density problem, the trend for hard drive capacity to rise faster than hard drive I/O performance. He says: "The performance per bit of hard drives is showing a 40 per cent decline a year." Or worse: "For 8KB blocks there's a 90 per cent reduction per year in random I/O performance per bit."
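To put those percentages in perspective, here is a minimal sketch of how such a decline compounds. It assumes, purely for illustration, that the same fractional drop applies every year; the function name is hypothetical and the two rates are simply the figures Bianchini quotes.

```python
def remaining_fraction(annual_decline: float, years: int) -> float:
    """Fraction of the starting per-bit performance left after `years`,
    assuming the same fractional decline compounds each year."""
    return (1.0 - annual_decline) ** years


# 0.40 and 0.90 are the two decline rates quoted in the article.
for years in (1, 2, 3):
    general = remaining_fraction(0.40, years)
    small_random = remaining_fraction(0.90, years)
    print(f"year {years}: {general:.3f} (general), {small_random:.3f} (8KB random)")
```

On that assumption, a 40 per cent annual decline leaves roughly a fifth of the original performance per bit after three years.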
There's nothing existing NAS filers can do about this except suffer or over-provision and short-stroke. There are various other ways of trying to get out of the trap, such as clustering filers (Isilon), hardware-accelerating the processing (BlueArc) or caching the controller (NetApp).
The basic lesson seems to be that not all filer I/Os are equal and not all data should be read from or written to disk.
Avere's I/O profiling
What Avere's founders have done is look at the pattern of I/Os that hit filers and aimed to store data in the FTX in the most appropriate storage medium to provide optimum performance and cost per GB. There are three types of storage, apart from DRAM, in the FTX product: NVRAM (battery-backed non-volatile RAM) as a first tier, NAND flash solid state storage as a second tier, and 15,000rpm SAS hard disk drives in a third tier. The fourth tier in the hierarchy is SATA drives, located in a third-party filer and linked to the FTX by Ethernet.
The SATA drives are for long-term archival storage where SATA drives are "ten times cheaper than Fibre Channel which is ten times cheaper than single-level cell (SLC) NAND," according to Avere CEO and co-founder Ron Bianchini.
The I/O type is characterised by whether it's a read or write, sequential vs random, and its size. Avere reckons that bulk reads of large sequential files are best done from SAS hard drives. A DRAM buffer will front-end the drives, with 2MB of file data at a time going into it and 4KB blocks streamed out of it to the accessing server application.
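The read-buffering idea can be sketched roughly as follows. This is an illustration only, not Avere's code: the class and function names are hypothetical, a bytes object stands in for a file on SAS disk, and only the 2MB and 4KB sizes come from the article.

```python
CHUNK = 2 * 1024 * 1024  # 2MB pulled from disk at a time (figure from the article)
BLOCK = 4 * 1024         # 4KB handed to the client at a time (figure from the article)


class ReadAheadBuffer:
    """Hypothetical DRAM buffer in front of a slow sequential source."""

    def __init__(self, backing: bytes):
        self.backing = backing    # stands in for a large file on SAS disk
        self.chunk_start = None   # offset of the chunk currently buffered
        self.chunk = b""
        self.disk_reads = 0       # number of large reads that reached the "disk"

    def read_block(self, offset: int) -> bytes:
        """Return the 4KB block at a block-aligned offset."""
        start = (offset // CHUNK) * CHUNK
        if start != self.chunk_start:
            # Buffer miss: fetch the whole 2MB chunk in one large read.
            self.chunk = self.backing[start:start + CHUNK]
            self.chunk_start = start
            self.disk_reads += 1
        rel = offset - start
        return self.chunk[rel:rel + BLOCK]


def stream(buf: ReadAheadBuffer, size: int):
    """Yield a file of `size` bytes as consecutive 4KB blocks."""
    for offset in range(0, size, BLOCK):
        yield buf.read_block(offset)


data = bytes(range(256)) * 20480  # 5MB of sample data
buf = ReadAheadBuffer(data)
blocks = list(stream(buf, len(data)))
print(len(blocks), "blocks served from", buf.disk_reads, "disk reads")
```

The point of the design is that the disk only ever sees a few large sequential reads, while the client sees a stream of small blocks coming out of memory.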
Suiting storage media to I/O type
Small random reads are best done from DRAM which can sustain 300,000 4KB random IOPS. Bianchini compares this to SLC flash's 24,000 IOPS. The DRAM can be used to store file metadata and similar working data needing the fastest access.
SAS and NVRAM are favoured for writes. Random writes can go into the NVRAM, giving a good balance between speed and storage persistence. Where a large-file (sequential) write is needed, it will be streamed into NVRAM in 4KB blocks and then shipped out to the SAS drives 2MB at a time. Bianchini says that this two-step process is faster than writing to flash with its read-erase-write block cycle.
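Pulling the read and write cases together, the media preferences described so far amount to a small decision table. The sketch below is our reading of it, not Avere's logic: the names are hypothetical, the 1MB cut-off between "small" and "large" is an arbitrary assumption, and combinations the article does not discuss are flagged rather than guessed at.

```python
from dataclasses import dataclass

LARGE_IO = 1024 * 1024  # hypothetical cut-off; the article gives no threshold
UNSTATED = "not stated in the article"


@dataclass
class IORequest:
    is_write: bool
    sequential: bool
    size: int  # bytes


def preferred_medium(req: IORequest) -> str:
    """Map an I/O request to the medium the article says suits it best."""
    large = req.size >= LARGE_IO
    if req.is_write:
        if not req.sequential:
            return "NVRAM"                    # random writes
        return "NVRAM, then SAS" if large else UNSTATED
    if req.sequential and large:
        return "SAS via DRAM buffer"          # bulk sequential reads
    if not req.sequential and not large:
        return "DRAM"                         # small random reads
    return UNSTATED


print(preferred_medium(IORequest(is_write=False, sequential=False, size=4096)))
print(preferred_medium(IORequest(is_write=True, sequential=True, size=8 * LARGE_IO)))
```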
With a log file, again the random writes come in, are batched up in NVRAM and then written sequentially to a log file on hard drives.
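A toy version of that batching shows the general technique. It is a generic log-structured sketch, not Avere's implementation: the class name and batch limit are hypothetical, a list stands in for NVRAM and a bytearray for the on-disk log.

```python
class BatchedLog:
    """Hypothetical sketch: batch random writes, then append them sequentially."""

    def __init__(self, batch_limit: int = 4):
        self.batch_limit = batch_limit  # assumed batch size, for illustration
        self.pending = []               # stands in for NVRAM: (address, data)
        self.log = bytearray()          # stands in for the log file on disk
        self.index = {}                 # address -> (offset, length) in the log
        self.flushes = 0                # number of sequential appends to "disk"

    def write(self, address: int, data: bytes) -> None:
        """Accept a write to any address; it lands in the pending batch."""
        self.pending.append((address, data))
        if len(self.pending) >= self.batch_limit:
            self.flush()

    def flush(self) -> None:
        """Append the whole batch to the end of the log in one pass."""
        if not self.pending:
            return
        for address, data in self.pending:
            self.index[address] = (len(self.log), len(data))
            self.log += data
        self.pending.clear()
        self.flushes += 1

    def read(self, address: int) -> bytes:
        """Return the newest data for an address, pending writes first."""
        for pending_address, data in reversed(self.pending):
            if pending_address == address:
                return data
        offset, length = self.index[address]
        return bytes(self.log[offset:offset + length])


log = BatchedLog(batch_limit=3)
for address, data in [(900, b"a"), (7, b"b"), (42, b"c")]:
    log.write(address, data)
print(log.flushes, bytes(log.log))
```

However scattered the incoming addresses are, the disk only sees appends to the end of the log; the index remembers where each address's latest copy lives.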
Avere has three ways of decreasing disk drive latency: RAM hides access latency for sequential reads; NVRAM hides access latency for sequential writes; and log-based file systems minimize access latency for random writes.
Avere's software also characterises data by its activity level and type as well as by its I/O patterning. It automatically moves data (files and blocks within files) between the four tiers using access patterns, access frequency and data characteristics with no upfront or ongoing policy configuration, or waiting hours or days for promotion/demotion. The company calls this demand-driven storage.
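Avere has not published how its tiering decisions are made, so the following is only a guess at the general shape of access-driven promotion and demotion. The class, the hit-count threshold and the ageing step are all hypothetical; only the four tier names come from the article.

```python
from collections import Counter

# Fastest tier first, as listed in the article.
TIERS = ["NVRAM", "flash", "SAS", "SATA filer"]


class DemandTiering:
    """Hypothetical frequency-based tiering; not Avere's actual algorithm."""

    def __init__(self, promote_after: int = 3):
        self.promote_after = promote_after  # assumed threshold
        self.tier = {}                      # block id -> index into TIERS
        self.hits = Counter()               # accesses since last promotion
        self.recent = set()                 # blocks touched since last age()

    def access(self, block: str) -> str:
        """Record an access and return the tier the block now sits in."""
        # Assumption: data not seen before lives on the back-end filer.
        level = self.tier.setdefault(block, len(TIERS) - 1)
        self.hits[block] += 1
        self.recent.add(block)
        if self.hits[block] >= self.promote_after and level > 0:
            self.tier[block] = level - 1    # promote one tier
            self.hits[block] = 0
        return TIERS[self.tier[block]]

    def age(self) -> None:
        """Demote by one tier every block not touched since the last call."""
        for block, level in self.tier.items():
            if block not in self.recent and level < len(TIERS) - 1:
                self.tier[block] = level + 1
        self.recent.clear()


tiering = DemandTiering(promote_after=2)
print(tiering.access("fileA"), tiering.access("fileA"))
```

The thing to notice is that no administrator-written policy appears anywhere: placement falls out of the access stream itself, which is the gist of the "demand-driven" label.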
Automated data movement is what EMC is promising with FAST, but Avere has got there first for files, as Compellent got there before EMC for block data.
FTX speeds and feeds
The FTX hardware comes as a 2U rack-mount box containing 64GB of DRAM, 1GB of NVRAM, and either 1.2TB of internal SAS drives (FTX 2300) or 3.6TB of internal SAS (FTX 2500 model). The FTX 2300 can scale to 29.2TB per cluster and the FTX 2500 up to 90TB per cluster. There is a maximum of 1.6TB of DRAM per cluster.
There is no flash storage tier, though; that comes in the next product release. Both FTX models run Avere O/S v 1.0, which has its general availability scheduled for October 15. There are redundant network ports, power supplies and cooling.
NFS v3 (TCP/UDP) and CIFS are supported as client protocols with NFS v3 being used to access the back-end SATA NAS storage.
Single FTX performance is said to be 80,000 read ops/sec, 49,000 SPECsfs97 ops/sec, and 23,000 SPECsfs08 ops/sec, with aggregate throughput being 1GB/sec read and 325MB/sec write.
Avere provides a management GUI, with policy-based management features, SNMP support and email alerts.
There can be up to 25 FTX nodes in a cluster, with N+1 failover. FTX tiers are shared globally within a cluster. Customers can buy more FTX nodes as they need to scale performance, with Avere claiming that performance grows linearly as nodes are added.
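The cluster maximums follow mostly from the per-node figures and the 25-node limit, as this back-of-envelope sketch shows. The function names are hypothetical, decimal units (1TB = 1,000GB) are assumed, and the node estimate takes Avere's linear-scaling claim at face value.

```python
import math

MAX_NODES = 25                     # cluster limit quoted in the article
DRAM_GB_PER_NODE = 64              # per-node DRAM quoted in the article
SAS_TB_PER_NODE = {"FTX 2300": 1.2, "FTX 2500": 3.6}
SPECSFS97_OPS_PER_NODE = 49_000    # single-node figure quoted in the article


def cluster_dram_tb(nodes: int = MAX_NODES) -> float:
    """Total DRAM in TB, assuming 1TB = 1,000GB."""
    return nodes * DRAM_GB_PER_NODE / 1000


def cluster_sas_tb(model: str, nodes: int = MAX_NODES) -> float:
    """Total internal SAS capacity in TB for a cluster of one model."""
    return nodes * SAS_TB_PER_NODE[model]


def nodes_for(target_ops: int, per_node: int = SPECSFS97_OPS_PER_NODE) -> int:
    """Nodes needed for a target, ASSUMING perfectly linear scaling."""
    return math.ceil(target_ops / per_node)


print(cluster_dram_tb())
print(cluster_sas_tb("FTX 2300"), cluster_sas_tb("FTX 2500"))
print(nodes_for(100_000))
```

Twenty-five nodes at 64GB each gives the quoted 1.6TB of DRAM, and 25 FTX 2500s give the quoted 90TB of SAS. For the FTX 2300 the same sum comes to 30TB, slightly above the 29.2TB Avere quotes; the difference is not explained in the material we have. If the linear-scaling claim holds, three nodes would be needed to reach 100,000 SPECsfs97 ops/sec.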
FTX performance and alternative approaches
Bianchini says an FTX-front-ended filer or filers needs far fewer spindles to sustain a given performance level. Avere calculated the requirements using various manufacturers' products to achieve 100K SPECsfs97 operations. Compared to an FTX system, Avere says you would need 13 times more disk drives and a 5.1 times increase in cost.
A NetApp alternative would mean 10.7 times as many drives and a 5.2X cost increment, BlueArc would entail 10.5 times as many spindles and a 2.8X cost increment, while a Panasas-based system would need 9.4 times as many drives and a fourfold cost increment.
With FTX use, Avere says there are concomitant reductions in power draw and data centre floor space needs, and in cost per I/O, compared to these other suppliers.
Bianchini offers Avere's views of other ways of scaling filer performance. A filer supplier may rely on over-provisioning fast drives and short-stroking them, which needs more electricity and data centre floor space. There is also a level of management overhead involved in scaling systems. The use of Gear6-like caching appliances applies only to read workloads, is non-persistent, and is typically restricted to just the NFS protocol.
A filer controller cache, like NetApp's PAM, cannot scale separately from the controller and is proprietary. (We note, though, that you can scale PAM within the controller by adding more cards.) Bianchini says of the Texas Memory Systems (RamSan) and Fusion-io solid state storage approaches that there is no tier zero management and there is a high media cost.
The F5/Acopia switch cannot scale outside the switch and is not transparent. Bianchini would say that Avere's FTX technology scales filer performance with none of these disadvantages or restrictions.
The net of this
It's a coincidence that Dataram is coming out with its XcelaSAN flash-based caching front-end layered onto SAN storage arrays at roughly the same time as Avere has its front-end FTX layered onto filer arrays. Both aim to plug into existing environments and both aim to accelerate, respectively, block and file-based I/O from the storage arrays they serve, but Avere uses more techniques to do so.
Avere's technology is innovative and its avoidance of including bulk NAS storage in its product means that customers do not have to rip-and-replace their existing filers with Avere ones. Instead they can front-end the filers with an FTX node or two or more to accelerate filer read and write I/O performance and so support more filer-accessing multi-core and virtualised servers from their filer estate.
Indeed, if Bianchini is right, they can do so from a consolidated filer estate because all filer disk drive over-provisioning and short-stroking can be jettisoned in favour of simple basic SATA filer arrays with all I/O acceleration carried out by an FTX layer. Similarly, filer use of flash drives as a tier zero in front of hard disk drives, or filer controller flash caches, can both be junked in favour of the single FTX accelerator which does its magic on all front-ended filers, being, in that respect, open.
List pricing of the FTX appliances starts at $52,500. Actual benchmark data hasn't been published yet but, with Avere's use of SPEC file operation benchmarks, we can be moderately confident of it being published fairly soon.
Steve Duplessie, the founder of storage consultancy ESG, is quoted in the Avere release: "Conceptually, an architecture like this could quite literally change everything we thought we knew about storage and I/O. If the Avere architecture can perform as intended, it might just turn decades of thinking on its head." ®