Feeds

Facebook open sources live MySQL makeover

Schema reschemed on fly

Intelligent flash storage arrays

Facebook has open sourced a new MySQL utility that lets the social networking colossus update its database indexes and juice query times without staging the changes on test servers. With the tool – known as Online Schema Change for MySQL, or OSC – it can update indexes on live servers.

In the past, according Facebook MySQL guru Mark Callaghan, the company needed a good six months to roll index updates across its sea of MySQL servers. Now, it needs no more than a few days. "This lets us make schema changes much, much faster," Callaghan tells The Register. "And the benefit from the changes is that database queries will be faster."

At Facebook, MySQL is the primary repository for user data, with InnoDB the accompanying storage engine. "When it comes to user data, the workload is transaction processing – many simple database queries with a lot of concurrency," Callaghan explains. "And that workload is something that MySQL, along with InnoDB, excels at supporting. Performance is very good on that workload."

Nonetheless, the company is always working to improve performance, creating new indexes or updating existing indexes as a means of juicing query response time. For Facebook, such changes take to long when they're done through ALTER TABLE statements or the "fast index create" tool offered by the InnoDB plugin in MySQL 5.1. One other option is to first stage changes on test servers: you make the change on an extra server, and then you move the live workload onto that extra server when the change is complete.

But such juggling doesn't suit the sort of massive infrastructure that backs a social networking site serving over 500 million users. "We run a, well, large number of servers," Callaghan says, "and we prefer to not have a large number of extra servers."

All Callaghan will say is that the company runs "X thousands" of MySQL servers. "X" is such a large number, the company needed a way of making index changes on live machines. There's an existing tool that allows for live updates – "oak-online-alter-table," from independent developer Shlomi Noach – but Facebook has merely used this as a starting point.

"We wanted to do some things slightly different," Callaghan says. "And we do a lot in PHP, which [Noach's] tool doesn't use. It's a great tool, but it just wasn't right for us."

Written by Facebook engineer Vamsi Ponnekanti, Open Schema Change lets the company update indexes without user downtime, according to Callaghan. "Previously, we could make an update quickly on a small number of machines, but to do it on all of the machines, it took six months," Callaghan says, explaining that the company could only make updates during off-peak hours. "Now we can do it on all of the machines at pretty the same time and we can do it in about half a day.

"The difference is that there is no downtime, so we have more flexibility in scheduling when we do it."

When updating indexes, Facebook makes copies of its MySQL tables, and this alone can take a few hours. In essence, the OSC utility gives the company a way of accommodating changes that were made to the table while it was being copied.

"We need some way of applying the changes that we missed after the copy was started," he said. OSC does so using database triggers. "When the copy finishing, we replay all of the changes that were logged by the triggers, and then we briefly – for a fraction of a second – block access to the table and then we switch the original table with the copy."

This allows the company to make a change whenever it likes – even in the midst of peak workloads.

The code for OSC is derived from Shlomi Noach's tool, and like Noach's tool, it has been open sourced under a BSD license. You can find the code here.

At Facebook, the tool has been in place for about four weeks. And it's been used to make index changes the company has been planning for well over a year.

Facebook has developed other systems for other data storage and retrieval tasks, including the open source Cassandra for inbox search, the Hadoop-friendly SQL-like language known as Hive for analytics, and the proprietary Haystack for photos. But at least for the time being, MySQL will remain the tool of choice for user data.

"I personally don't have the time [to build a replacement]," Callaghan says. "All of my time is dedicated to improving quality of service and reliability of MySQL. So if it's going to be done, it's not going to be done by me." ®

Update: This story has been updated to mention ALTER TABLE statements and the "fast index create" InnoDB plugin tool as possible options for making index changes.

Top 5 reasons to deploy VMware with Tegile

More from The Register

next story
Preview redux: Microsoft ships new Windows 10 build with 7,000 changes
Latest bleeding-edge bits borrow Action Center from Windows Phone
Google opens Inbox – email for people too thick to handle email
Print this article out and give it to someone tech-y if you get stuck
Microsoft promises Windows 10 will mean two-factor auth for all
Sneak peek at security features Redmond's baking into new OS
UNIX greybeards threaten Debian fork over systemd plan
'Veteran Unix Admins' fear desktop emphasis is betraying open source
Entity Framework goes 'code first' as Microsoft pulls visual design tool
Visual Studio database diagramming's out the window
Google+ goes TITSUP. But WHO knew? How long? Anyone ... Hello ...
Wobbly Gmail, Contacts, Calendar on the other hand ...
DEATH by PowerPoint: Microsoft warns of 0-day attack hidden in slides
Might put out patch in update, might chuck it out sooner
Ubuntu 14.10 tries pulling a Steve Ballmer on cloudy offerings
Oi, Windows, centOS and openSUSE – behave, we're all friends here
prev story

Whitepapers

Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.