Twitter: It's the end of the sysadmin as we know it

Grow some Ganglia


Web 2.0 Expo Twitter says it's the end of the sysadmin as we know it.

Speaking this morning at the annual Web 2.0 Expo in San Francisco, Twitter operations man John Adams warned sysadmins they won't succeed on today's intertubes unless they learn to do a bit more than system administration. Sysadmin 2.0, he said, must develop certain talents for analyzing data.

"This is a whole new world," Adams said. "For the longest time, people ran large data systems on a kind of ad hoc basis. We're in a world now where so many people are depending on the real-time web. A system administrator is not just a system administrator anymore. You have to use analytics. You have to grab data. You have to look at where a site is trending and where things are going so you can scale...

"If you don't start doing this work early, if you don't start collecting this data early, you will fail."

That may be stating the obvious. But there you have it.

It's no secret that Twitter endured its own scaling problems in its earlier days, as the digerati embraced the Web2.0rhea service en masse. But in mid-2008, the company ported a portion of its core code from Ruby on Rails to Scala - a new-age programming language that combines functional and object-oriented techniques - and in 2009, according to net-research outfit comScore, the micro-blogger rode a 1,358 per cent traffic leap.

Adams believes the company has handled such growth in large part because it uses open source tools like Ganglia to track performance across the site's back-end infrastructure. Currently, Adams explained, he and his ops team track about 15,000 points of site performance.

"If you're collecting data, you want to look at the aggregate. It's about how many areas across the site where you see errors. A server isn't going to tell you much. It's about the overall application," Adams explained. "We do this in as near-real-time as possible."
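As a rough illustration of looking at the aggregate rather than at any one server, here is a minimal Python sketch. The sample format (the `area`, `requests`, and `errors` fields) is hypothetical and is not how Ganglia or Twitter's own tooling reports data.

```python
def aggregate_errors(samples):
    """Roll per-server samples up into a site-wide view.

    samples: list of dicts with hypothetical keys
             'area', 'requests' and 'errors'.
    """
    total_requests = sum(s["requests"] for s in samples)
    total_errors = sum(s["errors"] for s in samples)
    # Which parts of the site are seeing any errors at all?
    areas = sorted({s["area"] for s in samples if s["errors"] > 0})
    rate = total_errors / total_requests if total_requests else 0.0
    return {"error_rate": rate, "areas_with_errors": areas}


# Made-up numbers, for illustration only.
samples = [
    {"area": "timeline", "requests": 1000, "errors": 10},
    {"area": "timeline", "requests": 1000, "errors": 0},
    {"area": "search", "requests": 2000, "errors": 30},
    {"area": "api", "requests": 1000, "errors": 0},
]
print(aggregate_errors(samples))
```

A single server's count says little on its own. The site-wide rate and the list of affected areas are what show whether the application as a whole is in trouble.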

Well, we would hope so.

Adams acknowledged that Twitter has endured its fair share of performance problems - and in the past, the company has freely admitted that its original infrastructure was ill-suited to rapid scaling - but he insists that careful preparation got the company to where it is today. "In the first year, the site was kind of unstable, but what we were doing was trying to ramp up to a position where we are planning the future."

Such data tracking, he said, helped sidestep the so-called Twitpocalypse, the Y2K of Web2.0rhea. Many suspected that third-party clients would start failing when the unique identifier attached to each Tweet reached first the signed 32-bit integer limit (2^31) and then the unsigned limit (2^32). Mining site data, Adams said, his crew accurately predicted when it would hit these limits - which meant they knew just how soon they needed to make any infrastructure changes.
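Adams did not describe how the prediction was made. One simple approach, sketched here in Python as an assumption rather than Twitter's method, is to extrapolate linearly from the highest Tweet ID observed at two points in time. The function name and the sample values are hypothetical.

```python
from datetime import datetime, timedelta

SIGNED_32_LIMIT = 2**31    # signed 32-bit IDs run out here
UNSIGNED_32_LIMIT = 2**32  # unsigned 32-bit IDs run out here


def estimate_crossing(samples, limit):
    """Linearly extrapolate when an ID counter will reach `limit`.

    samples: list of (datetime, highest_id_seen), ordered by time.
    Returns a datetime, or None if the counter is not growing.
    """
    (t0, id0), (t1, id1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0 or id1 <= id0:
        return None
    rate = (id1 - id0) / elapsed  # IDs issued per second
    return t1 + timedelta(seconds=(limit - id1) / rate)


# Made-up observations, for illustration only.
observed = [
    (datetime(2009, 1, 1), 1_000_000_000),
    (datetime(2009, 1, 11), 1_100_000_000),
]
print(estimate_crossing(observed, SIGNED_32_LIMIT))
print(estimate_crossing(observed, UNSIGNED_32_LIMIT))
```

A straight line understates the urgency on a site whose traffic is accelerating, so a real forecast would refit as new samples arrive.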

Over the past six months, he said, the site's most important infrastructure upgrade was the adoption of the Ruby on Rails application server known as Unicorn. The company still uses Ruby on the front-end, and Unicorn has provided an estimated 30 per cent performance improvement over the previous setup, based on Apache and a server called Mongrel.

Adams compares Unicorn to a grocery store where a single line feeds a row of cashiers - as opposed to the traditional setup where each cashier has a separate line. With Mongrel, each worker had its own request queue. With Unicorn, there's a single request queue, and requests are fed to workers as they become available.

"Ordinarily, if you're standing in line at a grocery store, you don't have any idea how quickly each cashier is going to move people through. You have ten random lines. You stand in one line. And you have no idea how long you're going to wait," he said. "The other model - which pretty much describes Unicorn - is where you have one line, and when a cashier is done, it grabs the next person from that one line. This creates a very rapid response."
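The grocery-store comparison can be played out in a few lines of Python. This is a toy model of the two queueing disciplines, not a model of Unicorn or Mongrel internals. The job list and the round-robin assignment of requests to workers are assumptions made for illustration.

```python
import heapq


def shared_queue_waits(jobs, n_workers):
    """One line: each job goes to whichever worker frees up first.

    jobs: list of (arrival_time, service_time), sorted by arrival.
    Returns how long each job waited before a worker picked it up.
    """
    free_at = [0] * n_workers  # min-heap of times each worker is free
    heapq.heapify(free_at)
    waits = []
    for arrival, service in jobs:
        start = max(arrival, heapq.heappop(free_at))
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service)
    return waits


def per_worker_queue_waits(jobs, assignment, n_workers):
    """Separate lines: job i is pinned to worker assignment[i] up front."""
    free_at = [0] * n_workers
    waits = []
    for (arrival, service), worker in zip(jobs, assignment):
        start = max(arrival, free_at[worker])
        waits.append(start - arrival)
        free_at[worker] = start + service
    return waits


# One slow request followed by three quick ones, all arriving together.
jobs = [(0, 10), (0, 1), (0, 1), (0, 1)]
print(sum(shared_queue_waits(jobs, 2)))
print(sum(per_worker_queue_waits(jobs, [0, 1, 0, 1], 2)))
```

With these inputs the single line adds up to 3 time units of waiting against 11 for the separate lines, because one quick request ends up stuck behind the slow one.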

He added that the 30 per cent performance boost from Unicorn affords only a "few more days of scaling" on a site like Twitter - but, he said, "every small amount helps."

Adams's other big piece of advice for aspiring Sysadmins 2.0 was to avoid - as much as possible - pulling data from disk. "Another discovery that we made when trying to increase the scale of Twitter was that disk is the new tape," he said. "With any sort of social networking operation - juggling followers, sending mail, etc. - disk is extremely slow."

Twitter has worked around the disk problem with, yes, memcached, the open source distributed-memory object-caching system. But he also warns that there's such a thing as too much memcached. "You can't rely too heavily on memcached," he said. "If you put too much data into memcached, you enter this problem where if the memcached server goes down, you have to take all the data out of your database and reinsert it into memcached. You want to cache in a balanced way."
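The refill problem Adams describes shows up even in a toy cache-aside sketch. The Python below uses a plain dictionary as a stand-in for memcached and a hypothetical `load_from_db` callable for the database. It does not use a real memcached client, and the class name is invented for illustration.

```python
class CacheAside:
    """Read-through cache: check memory first, fall back to the database."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # hypothetical database lookup
        self._cache = {}           # stand-in for a memcached server
        self.db_reads = 0          # how often we had to go to disk

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def lose_cache(self):
        """Simulate the cache server going down and coming back empty."""
        self._cache.clear()


database = {"user:1": "alice", "user:2": "bob"}
store = CacheAside(database.__getitem__)

store.get("user:1")
store.get("user:1")    # served from memory, no database read
store.lose_cache()
store.get("user:1")    # cold cache: back to the database
print(store.db_reads)
```

Every key held only in the cache has to be fetched again after an outage, so the more data a site parks there, the bigger the burst of database reads when the cache comes back empty.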

In addition to memcached, MySQL, and the open-source distributed database Cassandra, Twitter leans on a message-queue server known as Kestrel and a kind of follower database known as FlockDB. Both were developed at the company, and both have now been open sourced.

In one sense, Adams and his team exhibit a certain Google-like quality. They believe in data. But their Googleness goes only so far. They're also committed to open sourcing their back end. ®
