Rackspace: Why we're designing our own cloud servers
Just what will it take to compete with Amazon and Google?
Exclusive Any cloud computing provider that wants to operate at scale and compete against its peers is under pressure to build some kind of custom hardware. It may, in fact, be necessary to compete at all.
That is what Rackspace, which is making the transition from website hosting to cloud systems, believes. And that's why the San Antonio, Texas-based company started up OpenStack - the open-source cloud controller software project - with NASA nearly three years ago, and accepted an invitation from Facebook to join the Open Compute Project, an effort by the social network to design open-source servers and storage and the data centres in which they run.
Rackspace, which was founded in 1998, grew up just as Linux and rack-mounted off-the-shelf servers were starting to make their way into data centres in big numbers, but the company itself had not yet become a fully commercial operation. And its early machines reflected that.
"What most companies did was colocation," said chief technology officer John Engates, referring to the practise of renting data-centre space, and paying for power and internet connectivity, in order to get a server onto the web. Engates was a founder and manager of Internet Direct, one of the original internet service providers in Texas back when the 'net was being commercialised in the mid-1990s.
"We took the model of putting servers up on racks very quickly and turning them on in 24 hours and we called it managed hosting. At the time, all of our founders at Rackspace were Linux geeks and they were all do-it-yourselfers, and they were literally building white-box servers. They were buying motherboards, processors, and everything piecemeal, and we assembled these tower-chassis form-factors on metal bread racks and it was really not very sexy."
Rackspace CTO John Engates
The description sounds precisely like early Beowulf clusters based on cheap PCs or tower servers, halls of machines powering the first dot-com boom, or indeed the early generations of hardware at search engine giant Google. After a few years, Rackspace decided to chase enterprise customers to do their managed hosting, and that meant shifting to higher-end gear.
"We mimicked what the enterprise would do in their data centre to go win business from those enterprises," said Engates. "Enterprises didn't want to think they were being put on a white-box, homemade server. They wanted a real server with redundant power supplies and all that fancy stuff."
Rack servers evolved and matured, giving much better density than a bunch of tower machines stacked on bread shelves, and Rackspace started buying Dell PowerEdge 2650s for the first generation of enterprise-grade kit and then 2850s for the second generation. Today, in its managed hosting business, the split is about 60 per cent Dell iron and about 40 per cent Hewlett-Packard iron, and all of it is, of course, x86 machinery.
Now fast forward to a couple of years ago, and cloud computing gets under way. Instead of dedicating a server to a customer, each machine is thrown a hypervisor that slices up its processing abilities and memory capacity, and clients are sold access to a pool of these CPU and RAM chunks to run their Windows or Linux workloads on demand.
"Now," said Engates, "we are basically back to our own designs because it really doesn't make a lot of sense to put cloud customers on enterprise gear. Clouds are different animals – they are architected and built differently, customers have different expectations, and the competition is doing different things."
At first, when building its public cloud computing service, what Rackspace focussed on was getting custom gear from Dell and HP that better fit its needs. The web biz had the two vendors get all of the gear configured and cabled up in racks to make it easier to buy server and storage capacity and roll it right into the data centre so it could be given power and network and start doing useful work straight away.
And then Frank Frankovsky, vice-president of hardware design and supply chain at Facebook, invited Rackspace to join the open-source computer design efforts of the Open Compute Project (OCP) a little more than three years ago – by sending Engates a message through Facebook, of course. And from that moment, Rackspace has been moving more and more towards self-sufficiency for server and rack design.
Monitor ports, DVD drives, pretty LCD panels, all in the bin
What is good for Facebook is not perfect for Rackspace, as the latter explained at the Open Compute Summit back in January, but the basic rack and server designs can be tweaked to fit the needs of a managed hosting and public cloud provider.
The first OCP machines for servers and storage roll out in the Rackspace data centres in April; Wiwynn and Quanta are building servers and Quanta will build a just-a-bunch-of-disks (JBOD) array that better suits the needs of Rackspace than the giant winged beast that Facebook invented for itself and opened up.
"Everything that is in our multi-tenant business is some non-standard server or storage architecture," said Engates, and that can mean something cooked up by a specialist hardware manufacturer or the custom server business units of Hewlett-Packard or Dell. Most of the dedicated hosting is done on plain vanilla, enterprise-class servers, still.
"But that may change over time because we count private cloud in that category and we do have plans over time to offer Open Compute-powered private clouds. So even in the dedicated business, it is likely to be non-branded gear over time."
The vanity-free design is something that appeals to Rackspace for the same reasons as it appealed to Facebook, and indeed, is why Google started making its own servers many years ago. If you are never going to plug a monitor into a machine, why bother with a console port? You don't need CD-ROMs or DVDs, either, and forget that front LCD panel. All of these things block airflow, add cost, and are potential points of failure (either hardware or software) in the server, so they should be eliminated.
"The goal is to use OCP designs in more locations and to have a lower number of SKUs and fewer parts to stock, and therefore as we increase the number of servers that we buy we can lower the cost," said Engates. "We also improve our ability to maintain them by having fewer machines to train people on; as people understand the machines and get familiar with them, it is easier.
"You homogenise the data centre as much as you can because homogeneity in the data centre is a good thing, you want fewer moving parts in your data centre design and operations, and this is one of the means of getting there. And one of the beautiful things about Open Compute is that we remove things from the servers that we don't need."
'Another reason is that we are dancing with elephants right now'
Rackspace has been pretty quiet about what it has been doing with Open Compute up until earlier this year, and part of the reason is Rackspace's decision to radically change its business with both OpenStack and Open Compute.
"We have been heads down," conceded Rackspace chief operating officer Mark Roenigk. "We are growing rapidly and we are a lot like Google from an engineering perspective – we don't make a lot of splash in the media with regard to what we are going. And another reason is that we are dancing with elephants right now.
"Until we knew we had a great path with Open Compute hardware and could be fully open with everything that we are doing, we kept quiet. Was it all innovation? I don't know. We were just doing what customers were asking us to do. But with the convergence of servers, storage, and networking, and all being controlled by software, then hardware is the next obvious place to do innovation to keep up with the pace of change in the software world."
Rackspace COO Mark Roenigk
Roenigk has been around the IT supply chain for a long, long time. He was hired by Compaq to be a manufacturing engineer in 1988 and spent seven years there managing procurement for the upstart maker of PCs and then servers during the heady client-server era. Then he jumped from hardware procurement to managing Microsoft's licensing in 1997, and then was put in charge of Microsoft's original equipment manufacturer business and then its procurement operations for its supply chain, which included, among many other things, the Xbox game console manufacturing operations.
He did a two-year stint at accounting software supplier Intuit, did two years at XM Satellite Radio building its supply chain from the ground up, did two years at eBay cleaning up the operations across its eBay, PayPal, Skype, and StubHub units, and then finally came to Rackspace to become its COO in January 2010. If you are going to go Open Compute, you need someone to manage the partners who are building the stuff you design.
"I have been in this business about 24 years, and the world is changing very rapidly, and a year equates to about ten years in the old world," Roenigk said with a laugh. "We certainly embrace the standards, but one of the reasons we want to be on an open platform is that we have got to innovate faster, and we want to differentiate in new ways."
OpenStack has thousands of coders whacking away at improving it, which Rackspace could never afford to do on its own, and the software is evolving at a rapid pace. And Open Compute servers and storage have shaved the time to get a new set of infrastructure into the field by between 6 and 8 months, according to Roenigk.
The custom Open Compute machines cost anywhere from 18 to 22 per cent less to build than the bespoke boxes from HP and Dell that make up about 18 per cent of the Rackspace server fleet, which is about 16,300 of the 90,525 boxes that were running at the end of December across the cloud company's data centres. The heavier OCP boxes that Rackspace is building for April can hold more virtual machines than the typical bespoke box, and therefore the hardware savings come to be more in the order of 28 to 30 per cent for hosting virtual server slices.
A rack of tweaked OCP iron designed by Rackspace
"We have all kinds of horses in the barn, but for the past 18 months, we have been only dealing in high-density compute with boxes completely maxed out. We will run almost 20 kilowatts per cabinet, so it is very dense compute," said Roenigk.
And for this reason, the techies in the Rackspace labs are playing around with various kinds of water and liquid cooling, passive cooling, and the ability to operate in higher heat ranges than data centres of the past could do. Rackspace likes all this free cooling, but competitors such as Google, which have maybe one to ten apps running on their monstrous data centres, can fail over from one hall to another if there is a wildfire nearby or a storm hanging over the facility.
But the customers using Rackspace managed hosting are running their own applications and it is not trivial to fail them all over at once. The good news is that the cloud side of the Rackspace business will be given different levels – call them standard, business, and first class – and the higher-class customers will get automagic failover across Rackspace facilities. For a fee, of course.
The managed hosting fleet makes up 82 per cent of the machines, or just over 74,000 servers as of December. These boxes have a field life of four to five years. Roenigk said the OCP machines from Quanta and Wiwynn, which will roll into custom Rackspace racks based on the Open Rack design, can also be used for managed hosting. Rackspace will have multiple specialist hardware manufacturers, and the big-name computer makers will play an important role, but the odds are that role will diminish over time. Incidentally, Rackspace already has plans to convert its first generation of OCP machines to use as JBOD controllers about 18 months from now.
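For readers who want to check the fleet numbers quoted in this piece, the sums are easy to reproduce. The sketch below takes the article's figures at face value (90,525 boxes in total, roughly 18 per cent bespoke HP and Dell gear, roughly 82 per cent managed hosting) and works out the implied machine counts. The variable and function names are purely illustrative, and because the percentages are rounded, the results only approximate whatever Rackspace's own inventory says.

```python
# Figures as quoted in the article (end of December); shares are rounded.
TOTAL_SERVERS = 90_525
BESPOKE_PERCENT = 18   # bespoke HP and Dell boxes
MANAGED_PERCENT = 82   # managed hosting fleet


def share_of_fleet(total: int, percent: int) -> int:
    """Return the whole number of servers in a given percentage of the fleet.

    Integer arithmetic is used so the result is not affected by
    floating-point rounding; any fraction of a server is dropped.
    """
    return total * percent // 100


bespoke = share_of_fleet(TOTAL_SERVERS, BESPOKE_PERCENT)
managed = share_of_fleet(TOTAL_SERVERS, MANAGED_PERCENT)

print(f"bespoke boxes:   {bespoke:,}")
print(f"managed hosting: {managed:,}")
```

Both results land close to the rounded counts given in the text, "about 16,300" and "just over 74,000", so the percentages and the totals are at least consistent with each other.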
So where does that leave HP and Dell? That is the same thing both were asking after Facebook decided to build its own boxes.
"I think that we have gotten to the point where we have the right engineering talent in house, and we have people who are actively designing Open Compute servers," said Engates. "We are intimately familiar with what Open Compute is. But if you were an enterprise trying to take advantage of Open Compute and you didn't have the engineering talent on board, then companies like Dell and HP might add value and be your guide to Open Compute."
"In the history of my career I have seen companies turn things around, so we're not counting them out," Roenigk added, referring to the two dominant x86 server suppliers. "This is one of the things that is driving Dell to go private. They have got to do some really radical things to their business."
In fact, Rackspace has asked Dell and HP to bid on the final integration job for its OCP iron, but with this only being somewhere on the order of two to five points of the cost of a finished rack, it is hard to say how interested they will be. No deal has been inked yet with either server maker.
So, by embracing Open Compute hardware, Rackspace is making life tough for HP and Dell. But isn't Rackspace also making it easier for its competition and therefore harder for itself?
"We are," admitted Roenigk. "But we feel like we have to continue to put pressure on ourselves to innovate, and that is really what we did with OpenStack as well. We burned the boats and said this is what we are going to do, and it changes the entire behaviour and culture of the company when you don't actually have any other choice. We are a scrappy little company from Texas, and we have to try to outmanoeuvre, and speed is extremely important." ®