Alacritech apprehends an NFS anomaly
Wildly unbalanced filer I/O
Comment Alacritech claims NFS filer I/O is grossly skewed towards reads and suffers from read metadata processing that chokes controller CPUs.
It has just launched the ANX 1500, a filer-accelerating cache product built on its finding that NFS read metadata loads can overwhelm filer processors and delay file delivery.
A couple of years ago Alacritech had a 10gig Ethernet adapter nearing readiness but found that the market had moved on, wanting converged network adapters (CNA) which could do FCoE and iWARP, as well as iSCSI and TCP/IP Offload and basic Ethernet NIC'ing. It would need to have written its own code or licensed IP from Emulex or QLogic and decided, according to marketing VP Doug Rainbolt, it was "not worth it". (Ironically Emulex licenses Alacritech IP for its CNA.)
Alacritech decided to turn aside from the adapter business and, reflecting its founders' Auspex roots, look at accelerating network-attached storage (NAS) file access. Most filer shops use NFS v3. Close inspection of NFS v3 filer I/O patterns showed wild read and write asymmetry. One Fortune 500 company exhibited this pattern:
- Reads – 52 per cent
- Metadata (eg Lookups, GETATTRS) – 47.96 per cent
- Writes – 0.04 per cent
From the point of view of the filer's controller, half of its life was spent getting data off the disk drives and out to accessing host servers and the other half checking the metadata associated with read requests. Write I/O activity was basically inconsequential, particularly from the disk I/O point of view, as writes would be cached in the controller's NVRAM and re-ordered to provide near-sequential I/O. Also, for NetApp users, Rainbolt said WAFL is good for writes.
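To make the asymmetry concrete, here is a minimal Python sketch that only does arithmetic on the three percentages quoted above. The dictionary keys and the `summarize` helper are illustrative names, not anything from Alacritech.

```python
# Operation mix reported for one Fortune 500 site (percentages from the article).
op_mix_pct = {"reads": 52.0, "metadata": 47.96, "writes": 0.04}


def summarize(mix):
    """Derive a few ratios from a percentage breakdown of NFS operations."""
    total = sum(mix.values())
    read_side = mix["reads"] + mix["metadata"]  # everything that is not a write
    return {
        "total_pct": round(total, 2),
        "read_side_pct": round(read_side, 2),
        "metadata_per_read": round(mix["metadata"] / mix["reads"], 3),
        "reads_per_write": round(mix["reads"] / mix["writes"]),
    }


summary = summarize(op_mix_pct)
for name, value in summary.items():
    print(f"{name}: {value}")
```

On these figures there is a little under one metadata operation for every data read, and roughly 1,300 reads for every write.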
Reads cannot be re-ordered because they have to be answered as and when they come in, and the data they ask for is randomly located on the filer's drive platters. The typical answer to this is to use high-speed drives and, if necessary, short-stroke them to minimise head movements (seek time). Both are expensive to do.
But what Alacritech realised was that the randomness of read I/O wasn't the only problem – read metadata was just as big a problem, turning a filer's processors into access bottlenecks if enough metadata checking was needed. Rainbolt said: "The controller is becoming a bottleneck before the disk drives do. The processor can't keep up ... Metadata consumes the CPU like you wouldn't believe."
If you could remove the metadata checking from the filer's CPUs and carry it out some place else, then the filer could get on with its core job of answering read requests and serving files as fast as it is capable of doing.
Alacritech and Isilon
Rainbolt said Isilon's scale-out clustered filers are affected by the same problem even though they serve lots of large files, meaning more sequential than random reads. Accessing clients store lots of Isilon-originated data in their caches and check whether their cache contents are up to date before hitting the Isilon filers with read requests, meaning the Isilon processors can also get hit with metadata requests. Isilon-type systems also struggle when faced with lots of small file requests.
Rainbolt said an example 9-node Isilon system was running 500,000 NFS metadata operations per second. Placing an Alacritech ANX 1500 front-end metadata offload engine in front of it bumped the number up to 2.6 to 3 million NFS metadata ops/sec and the Isilon served more files.
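As a rough worked calculation, the quoted figures imply the following speed-up range. This is a sketch that simply divides the numbers Rainbolt gave; the variable names are illustrative, and the per-node figure assumes the baseline load was spread evenly across the nine nodes, which the article does not state.

```python
# Figures as quoted by Rainbolt for the example 9-node Isilon system.
baseline_ops = 500_000                           # NFS metadata ops/sec without the ANX 1500
accelerated_ops_range = (2_600_000, 3_000_000)   # ops/sec with the ANX 1500 in front
nodes = 9

# Speed-up factor at the low and high end of the quoted range.
speedup_low, speedup_high = (ops / baseline_ops for ops in accelerated_ops_range)

# Assumption: an even spread of the baseline load across nodes.
per_node_baseline = baseline_ops / nodes

print(f"speed-up: {speedup_low:.1f}x to {speedup_high:.1f}x")
print(f"baseline per node (assuming an even spread): {per_node_baseline:,.0f} ops/sec")
```

That works out to a metadata throughput gain of roughly five to six times.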
In other words, Alacritech contends, there is generic filer processor bottlenecking going on, slowing down filer responsiveness to read requests, due to the metadata processing consequent on NFS v3 read requests.
Isilon has added flash to speed up metadata operations.
Alacritech saw an opportunity to cache filer metadata in a front-end device, its ANX 1500 – an NFS metadata offload engine in effect – and remove that burden from the filer. That means filers can stop using lots of expensive short-stroked 15K rpm drives and revert to using fewer, slower and cheaper middle-of-the-road drives.
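The general idea of a front-end metadata cache can be illustrated with a toy Python sketch. It is not a description of how the ANX 1500 works internally: `ToyFiler` and `MetadataFrontEnd` are hypothetical classes, the cache is a plain in-memory dictionary, and invalidation is reduced to dropping an entry when an attribute change passes through the front end.

```python
class ToyFiler:
    """Hypothetical stand-in for a back-end filer; counts metadata calls it serves."""

    def __init__(self, attrs):
        self._attrs = dict(attrs)
        self.getattr_calls = 0

    def getattr(self, path):
        self.getattr_calls += 1  # each call costs filer CPU in this toy model
        return self._attrs[path]

    def setattr(self, path, value):
        self._attrs[path] = value


class MetadataFrontEnd:
    """Hypothetical front end that answers repeated GETATTR-style requests itself."""

    def __init__(self, filer):
        self._filer = filer
        self._cache = {}

    def getattr(self, path):
        if path not in self._cache:  # miss: ask the filer once and remember
            self._cache[path] = self._filer.getattr(path)
        return self._cache[path]

    def setattr(self, path, value):
        # Write through, then drop the cached entry so later reads are not stale.
        self._filer.setattr(path, value)
        self._cache.pop(path, None)


filer = ToyFiler({"/export/a": {"size": 10, "mtime": 1}})
front = MetadataFrontEnd(filer)
for _ in range(1000):
    front.getattr("/export/a")
calls_after_reads = filer.getattr_calls

front.setattr("/export/a", {"size": 20, "mtime": 2})
refreshed = front.getattr("/export/a")
print(calls_after_reads, filer.getattr_calls, refreshed["size"])
```

In this toy model a thousand attribute checks cost the filer a single call, and the filer is consulted again only after the entry has been invalidated.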
Alacritech co-founder Peter Craft said: "We created an appliance to do metadata caching and use SSD (Solid State Drives). It involves our NFS Bridge technology and uses the ASICs from our 10gig Ethernet adapter work. It is very efficient and we have very low CPU utilisation on our box."
The ANX 1500 uses these ASICs with micro-code and has a "very thin, high-performance operating system."
Alacritech and NetApp
Craft said that other people saw there was a file access speed problem and recognised flash was a potential solution, citing NetApp's PAM (Performance Acceleration Module, now called Flash Cache). This is a slug of flash in NetApp's FAS controllers which functions as a read cache. He said: "In SPEC results PAM systems use fewer disk drives but the top end result is the same because they are CPU-bound. Even Avere can only do 22,000 ops. We can scale to hundreds of thousands of (SPEC NFS) ops."
He is saying that the NFS ops scalability of NetApp filers is capped by CPU processing bandwidth and not disk bandwidth. Cache resolves disk bandwidth problems but sits downstream of the CPUs and doesn't fix CPU issues.