Intel eats crow on software RAID
SAS on motherboard shocker
IDF When Intel releases its Sandy Bridge-based two-socket "Romley" platform in the middle of next year, its "Patsburg" platform controller hub (PCH) will include support for serial attached SCSI (SAS).
By putting SAS support on the motherboard, Intel is embracing what it formerly shunned: software RAID.
"I'll plead guilty. We stood up here 10 years ago and told you software RAID sucked, you didn't want it, it wasn't a viable solution," Susan Bobholz of Intel's storage product marketing group told attendees at a Wednesday SAS and RAID session at the Intel Developer Forum in San Francisco.
"But that's one of those things that's starting to change in the industry," she added. "Software RAID is no longer the evil stepchild of the enterprise anymore."
Citing testing done by Tom's Hardware, Bobholz claimed that "Software RAID equals or outperforms hardware RAID these days. And this is because the host processors have gotten so much faster and you've got all the different cores going on."
Referring to software-RAID I/O performance of years past, she said: "Historically, it was slower. It just was. And it's not anymore. And in fact in a lot of cases it's a higher-performing solution."
Bobholz also discussed CPU usage. "Another thing that we tried to convince everyone — which was true way back when — is that in order to free up the CPU on your host processor, you needed to move your processing of your RAID onto another CPU in the system. But as CPUs have gotten better and faster, and you've all these different cores involved, we are anticipating that the CPU load will be very low — in some cases we're looking at less than five per cent of the CPU to get a full-performing RAID subsystem."
When announcing the support for SAS in the Patsburg PCH, Clive D'Souza of Intel's storage technical marketing group said: "Intel has gone and done what naturally complements our solution from the silicon perspective, and we have integrated SAS onto the chipset. That's a major performance statement, so I'm going to give it a few seconds to sink in."
Patsburg will also support SATA and USB. A slide in D'Souza's presentation included the USB 2.0 logo, but a Patsburg engineer with whom The Reg spoke would neither confirm nor deny whether USB 3.0 would join the party.
Intel's implementation of Patsburg's onboard SAS controller will be SAS 2.1 compliant and will support data cables up to 10 meters, RAID 0/1/10/5, and expanders and enclosures. D'Souza also claimed: "We will be exceeding what the requirements of the T10 SAS-specs are."
Although software RAID may no longer be the evil stepchild of the enterprise, "evil" might be a word that SAS host bus adapter (HBA) makers such as Adaptec, HighPoint, LSI and others might want to apply to Intel, as the addition of SAS support to the motherboard obviates the need for their cards.
When listing the advantages of onboard SAS, D'Souza extolled the elimination of PCI HBAs: "The fact that we have integrated our SAS solution onto the chipset, we actually free up a PCI slot — and in the server world that's a big deal."
That's a boon to IT folks, said D'Souza, because "It opens up the possibility for all the data centers to go and add more functionalities that need a PCI slot."
But before HBA makers use their sharp-edged cards to either slash their wrists or attack Intel engineers, they should take comfort in the fact that their software stacks are welcome in Intel's onboard-SAS world.
As D'Souza explained: "By definition of our implementation, we are not restricting or constricting any RAID technology. If your software stack supports it, we support it." Bobholz added: "It's always up to the software," including, for example, the ability to support RAID 6, hot sparing, or variable allocation of host memory for RAID buffer duty.
Bobholz declined to give speeds and feeds for the Patsburg SAS implementation, specifically dodging a question about RAID 5, which has been traditionally problematic in terms of software-RAID performance.
"We've had our processor silicon back for 30 days," she explained. "We've got teams in the lab, locked in the lab, we're feeding them pizzas and candies under the door as they're working on really looking at the actual performance and functionality of our drivers. We will be providing that information down the road after we've done some actual measurements."
As brightly as Bobholz and D'Souza painted the onboard SAS future to be, Bobholz did note that software RAID remains dependent on the robustness of the operating system upon which it is running.
"I know some operating system vendors don't like us to say this," she admitted, "but should the operating system crash, yeah, your RAID stack will go down with it." ®
This means more fakeraids. STOP THIS NONSENSE. It's as idiotic as the Winmodems of 10 years ago. It's bad, and should not be encouraged!
RAID software quality from Intel etc
Several issues come to mind.
Historically, Intel has had soft-RAID "support" in several generations of its ICHs - on top of SATA HBAs, up to six drive channels. A few years ago it was called the "Application Accelerator"; then it was renamed "Matrix Storage". I don't know for sure whether there has ever been a RAID5/XOR accelerator in there, or whether the RAID feature consisted of some ability to change the SATA HBA's PCI IDs at runtime plus dedicated BIOS support (= RAID support consisting of software and PR/advertising, on top of a little chipset hack). Based on the vague response in the article, I'd guess there's still no RAID5 XOR (let alone RAID6 Reed-Solomon) acceleration in the PCH hardware - what they said means they're looking at the performance and trying to squeeze as much as possible out of the software side.

Looks like not much is new on the software part (RAID BIOS + drivers) - the only news is SAS support (how many HBA channels?), which gives you access to some swift and reliable spindles (desktop-grade SATA spindles are neither). If the ports support multi-lane operation, they could be used for external attachment to entry-level HW RAID boxes, and if the claim about expander support is true, you could also attach a beefy JBOD enclosure with many individual drives (unless the setup gets plagued by some expander/HBA/drive compatibility issue, which are not uncommon even with the current "discrete" SAS setups).

I'm wondering about "enclosure management" - something rather new to Intel soft-RAID, but otherwise a VERY useful feature (the per-drive failure LEDs are especially nice to have).
The one safe claim about Intel's on-chip SATA soft-RAID has always been "lack of comfort" (lack of features). The Intel drivers + management software, from Application Accelerator to Matrix Storage, have been so spartan as to be of little use, especially in critical situations (a drive fails and you need to replace it). I've seen worse (onboard HPT/JMicron, I believe), but you can certainly do much more with a pure-SW RAID stack - take Promise, Adaptec HostRAID or even the LSI soft-RAID, for example. It's just that the vanilla Intel implementation has always lacked features (not sure about bugs/reliability; I've never used it in practice). Probably as a consequence, some motherboard vendors used to supply (and still do supply) their Intel ICH-R-based boards with a third-party RAID BIOS option ROM (and OS drivers). I've seen Adaptec HostRAID and the LSI soft-stack. Some motherboards even let you choose in the BIOS setup which soft-stack you prefer: e.g., Intel Matrix Storage or Adaptec HostRAID. Again, based on one note in the article, this practice is likely to continue. I just wish Intel did something to improve the quality of its own vanilla software.
One specific chapter is Linux (FOSS) support. As the commercial software-RAID stacks carry all their "intellectual property" in software, they are very unlikely to get open-sourced. And there's not much point in writing an open-source driver from scratch on top of a reverse-engineered on-disk format. There have been such attempts in the past, and they led pretty much nowhere: any tiny change in the vendor's closed-source firmware or on-disk format would "break" the open driver, and open-source volunteers will never be able to write plausible management utilities from scratch (unless supported by the respective RAID vendor). Linux and FreeBSD nowadays contain pretty good native soft-RAID stacks, and historically the natural tendency has been to work on the native stacks and ignore the proprietary soft-RAID stacks. The Linux/BSD native soft-RAID stacks run quite happily on top of any Intel ICH, whether it has the -R suffix or not :-)
People who are happy to use a soft-RAID hardly ever care about battery-backed write-back cache. Maybe the data is just not worth the additional money, or maybe it's easy to arrange regular backups in other ways, so that the theoretical risk of a dirty server crash becomes a non-issue. Power outages can be handled by a UPS. It's always a tradeoff between your demands and your budget.
As far as performance is concerned:
Parity-less soft-RAIDs are not limited by the host CPU's number-crunching performance (XOR/RS). Setting aside the possibility of a suboptimal soft-RAID stack implementation, the only potential bottleneck that remains is bus throughput: the link from north bridge to south bridge, and the SATA/SAS HBA itself. In the old days, some Intel ICHs' on-chip SATA HBAs used to behave as if two drives shared a virtual SATA channel (just like IDE master + slave) - I'm not sure about the modern AHCI incarnations. The HubLink, too, used to be just 256 MB/s thick. Nowadays the DMI is 1 GB/s+ (full duplex), which is plenty good enough for six modern rotating drives, even if you only care about sequential throughput. Based on practical tests, one thing's for sure: Intel's ICH on-chip SATA HBAs have always been the best performers in their class - the competition was worse, sometimes much worse.
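The bandwidth argument above is easy to check on the back of an envelope. A minimal sketch, with the drive throughput figure an illustrative assumption (not a measured value):

```python
# Does a ~1 GB/s (per direction) DMI link cover six rotating drives
# running flat-out sequentially? All figures are rough assumptions.

DMI_BANDWIDTH_MBPS = 1000   # ~1 GB/s each way, as quoted above
DRIVE_SEQ_MBPS = 130        # assumed sequential rate of one modern spindle
NUM_DRIVES = 6

aggregate = DRIVE_SEQ_MBPS * NUM_DRIVES
headroom = DMI_BANDWIDTH_MBPS - aggregate

print(f"aggregate drive throughput: {aggregate} MB/s")  # 780 MB/s
print(f"DMI headroom: {headroom} MB/s")                 # 220 MB/s
```

Even at an optimistic 130 MB/s per spindle, six drives stay under the link's capacity, which is the point being made.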
As for parity-based RAID levels (5, 6, their derivatives and others): a good indicator may be the Linux native MD RAID's boot messages. When booting, the Linux MD driver benchmarks the (potentially various) number-crunching subsystems available - the plain x86 ALU XOR versus MMX/SSE XOR, or several software algorithm implementations - and picks the best one. On a basic desktop CPU today (Core 2), the fastest benchmark usually reads something like 3 GB/s, and that's for a single CPU core. I recall practical numbers like 80 MB/s of RAID5 sequential writing on a Pentium III @ 350 MHz in the old days.
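The MD driver's "benchmark the candidates, keep the winner" approach can be sketched in miniature. This is a toy illustration of the idea, not the kernel's actual code: it times two XOR parity routines over a test buffer and picks the faster one.

```python
import time

STRIPE = 1 << 20  # 1 MiB test buffer (arbitrary choice)

def xor_bytes_loop(a, b):
    # naive byte-by-byte XOR: the slow baseline routine
    return bytes(x ^ y for x, y in zip(a, b))

def xor_bigint(a, b):
    # XOR via Python's arbitrary-precision integers: usually far faster
    n = len(a)
    return (int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).to_bytes(n, "little")

def mb_per_s(fn, a, b, rounds=3):
    # crude throughput measurement, like MD's boot-time micro-benchmark
    start = time.perf_counter()
    for _ in range(rounds):
        fn(a, b)
    return rounds * len(a) / (time.perf_counter() - start) / 1e6

data = bytes(range(256)) * (STRIPE // 256)
other = bytes(reversed(range(256))) * (STRIPE // 256)

scores = {name: mb_per_s(fn, data, other)
          for name, fn in (("byte-loop", xor_bytes_loop), ("bigint", xor_bigint))}
best = max(scores, key=scores.get)
print(f"fastest XOR routine: {best} ({scores[best]:.0f} MB/s)")
```

The kernel does the same thing with real SIMD candidates (ALU vs. MMX vs. SSE) and prints the winner in dmesg at boot.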
The higher-end internal RAID cards, containing an IOP348 CPU at ~1 GHz, tend to be limited to around 1 GB/s when _not_ crunching the data with XOR (this appears to be a PCIe x8 bus limit). They're slower when number-crunching.
In reality, for many types of load I would expect the practical limit to be set by the spindles' seeking capability - i.e., for loads consisting of smaller transactions and random seeks. A desktop SATA drive can do about 60-75 random seeks per second; enterprise drives can do up to about 150. SSDs are much faster.
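For seek-bound loads the ceiling is simple multiplication: random IOPS scales with spindle count, not with CPU or bus speed. A quick sketch using the per-drive figures quoted above (the six-drive array size is an assumed example):

```python
# Seek-bound random-read ceiling for a striped/mirrored array:
# best case, seeks spread evenly over all spindles.

DESKTOP_SATA_SEEKS = 70   # ~60-75 random seeks/s, desktop SATA drive
ENTERPRISE_SEEKS = 150    # up to ~150 for an enterprise spindle

def array_random_read_iops(drives, per_drive_seeks):
    return drives * per_drive_seeks

print(array_random_read_iops(6, DESKTOP_SATA_SEEKS))  # 420 IOPS
print(array_random_read_iops(6, ENTERPRISE_SEEKS))    # 900 IOPS
```

A few hundred IOPS either way - well below what any modern host CPU or DMI link can saturate, which is why the spindles set the limit.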
The one thing I've recently been wondering about is this: where did Intel get its SAS HBA subsystem from? The IOP348 already contains an 8-way SAS HBA, and now the Sandy Bridge PCH should contain some channels as well. Are they the same architecture? Are they not? Is it Intel's in-house design? Or is it an "IP core" purchased from some incumbent in the SCSI/SAS chipmaking business? (LSI Fusion MPT or Agilent/Avago/PMC Tachyon come to mind.) The LSI-based HBAs tend to be compatible with everything around. Most complaints about SAS incompatibility that I've noticed tend to involve an Intel IOP348 CPU (on boards from, e.g., Areca or Adaptec) combined with a particular expander brand or drive model/firmware version... Sometimes it was about SATA drives hooked up over a SAS expander, etc. The situation gets hazy with other less-known vendors (Broadcom or Vitesse come to mind) producing their own RoCs with on-chip HBAs...
Safe write caching
One of the traditional downsides to software RAID - one that even affects RAID1, which doesn't suffer the CPU overhead - is the need to log writes unless there is some form of safe write caching. Are these new boxes going to provide battery-backed write cache?
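The write logging mentioned above can be pictured as a write-intent bitmap: mark a region dirty before touching either mirror, clear it once both copies are written, and after a crash resync only the marked regions. A toy model of the idea (the class and chunk size are hypothetical, purely for illustration - real drivers work on much larger chunks and persist the bitmap durably):

```python
CHUNK = 16  # toy region size; real bitmaps cover far larger chunks

class Raid1WithBitmap:
    """Toy RAID1 mirror with a write-intent log (a sketch, not a driver)."""

    def __init__(self, size):
        self.mirror_a = bytearray(size)
        self.mirror_b = bytearray(size)
        self.dirty = set()  # chunk numbers with writes in flight

    def write(self, chunk_no, data, crash_mid_write=False):
        assert len(data) == CHUNK
        self.dirty.add(chunk_no)            # 1. log the intent first
        off = chunk_no * CHUNK
        self.mirror_a[off:off + CHUNK] = data
        if crash_mid_write:
            return                          # simulated power loss: mirrors diverge
        self.mirror_b[off:off + CHUNK] = data
        self.dirty.discard(chunk_no)        # 2. both copies written, clear the log

    def resync(self):
        # After a crash, only the logged chunks need copying -
        # without the log, the whole array would have to be compared.
        for c in self.dirty:
            off = c * CHUNK
            self.mirror_b[off:off + CHUNK] = self.mirror_a[off:off + CHUNK]
        self.dirty.clear()
```

Battery-backed cache sidesteps the problem differently: the controller can replay the cached write after power returns, so no mirror ever stays half-written.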