Facebook adds Flash to up the tempo of its enormous disk-o-tech

'Anyone wanting to deliver Terabytes to the web might be interested'

Facebook has updated an open source tool that lets admins cheaply wring fast performance out of disk-based arrays by putting PCI-e flash cards in front of them as a cache.

The "Flashcache" tool was updated to version 3.0 by the company on Wednesday. The tool lets the company sit a high-performance cache on PCI-e flash cards to speed access to important data for applications, without having to break the bank and start using all-SSD arrays.

Flashcache is a writeback block caching technology and is implemented as a Linux kernel device mapper target, which makes it easy to use as a general purpose system for highly trafficked applications, Facebook said.
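
To make the "writeback" part concrete, here is a minimal sketch of the general idea in Python. It is not Flashcache's code, which lives in the Linux kernel: the class name, the LRU eviction policy and the dictionary standing in for the disk array are all assumptions made for illustration. The point is only that a write is acknowledged once the fast tier holds it, and the slow tier is updated later.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy write-back block cache: a fast tier in front of a slow backing store."""

    def __init__(self, backing, capacity):
        self.backing = backing       # dict standing in for the disk array (assumption)
        self.capacity = capacity     # maximum number of cached blocks
        self.blocks = OrderedDict()  # block number -> (data, dirty), oldest first

    def read(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # hit: mark as most recently used
            return self.blocks[block_no][0]
        data = self.backing[block_no]          # miss: fetch from the slow tier
        self._insert(block_no, data, dirty=False)
        return data

    def write(self, block_no, data):
        # Write-back: only the cache is updated now; the backing store
        # catches up when the block is evicted or flushed.
        self._insert(block_no, data, dirty=True)

    def _insert(self, block_no, data, dirty):
        self.blocks[block_no] = (data, dirty)
        self.blocks.move_to_end(block_no)
        while len(self.blocks) > self.capacity:
            victim, (victim_data, victim_dirty) = self.blocks.popitem(last=False)
            if victim_dirty:
                self.backing[victim] = victim_data  # write back on eviction

    def flush(self):
        # Push every dirty block to the backing store and mark it clean.
        for block_no, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self.backing[block_no] = data
                self.blocks[block_no] = (data, False)


disk = {0: b"a", 1: b"b", 2: b"c"}
cache = WriteBackCache(disk, capacity=2)
cache.write(0, b"A")   # the disk still holds the old data for block 0
cache.read(1)
cache.read(2)          # the cache is full, so block 0 is evicted and written back
print(disk[0])
```

The trade-off is the usual one for write-back designs: writes are fast, but dirty blocks exist only in the cache until they are flushed.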

"Our setup of enterprise flash plus massive arrays may be interesting to anyone who wants to build a multiple-terabyte system that needs web access latencies - it does not need rewrite of software to get benefits, so investment even at few machine scale is smaller than putting everything on all-flash," Domas Mituzas, a Facebook data engineer, told The Register via email.

Version 3.0 of the technology has been given better read-write distribution by tuning the disk-side and flash-side sizes of sets to disperse hot data over more of the cache and avoid bottlenecks. Facebook also modified its cache eviction and write efficiency techniques to provide more predictable performance.
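
Facebook did not spell out the mapping it uses, so the following is only a generic set-associative sketch of why set sizing matters. The function names, the modulo mapping and every number are hypothetical. It counts how many cache sets a contiguous hot region of disk lands in when long or short runs of disk blocks share a set.

```python
from collections import Counter


def cache_set_for(block_no, disk_set_blocks, num_sets):
    """Map a disk block to a cache set; each run of disk_set_blocks blocks shares a set."""
    return (block_no // disk_set_blocks) % num_sets


def sets_touched(hot_blocks, disk_set_blocks, num_sets):
    """Count how many hot blocks fall into each cache set."""
    return Counter(cache_set_for(b, disk_set_blocks, num_sets) for b in hot_blocks)


hot_region = range(0, 4096)  # hypothetical hot extent of 4,096 contiguous blocks

# Long runs of disk blocks per set: the hot region piles into very few sets.
coarse = sets_touched(hot_region, disk_set_blocks=2048, num_sets=64)
# Short runs of disk blocks per set: the same region spreads across the cache.
fine = sets_touched(hot_region, disk_set_blocks=64, num_sets=64)

print(len(coarse), max(coarse.values()))  # 2 2048
print(len(fine), max(fine.values()))      # 64 64
```

In this toy model the coarse mapping crams the whole hot region into two sets, which would then churn through evictions, while the fine mapping spreads it evenly over all 64.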

Though originally designed at Facebook, the open source technology has received some interest from the wider community. "We see community efforts around it – there is activity on mailing lists, open source code submissions and consulting companies in the database space are providing support for it," Mituzas said.

The next areas of development for Flashcache include restructuring its metadata to make accessing data more efficient, and limiting how much is written into the cache so that the underlying disks are not flooded with queued writes.

"As we end up having multiple terabytes of cache and tens of terabyte of data per machine, we need to cautiously balance usage of memory and CPU," Mituzas explains. "More CPU-efficient algorithms tend to consume more memory. For example, adding additional pointer or timestamp to metadata entry for a system page requires 4GB of RAM if 2TB of cache is being used ... as applications can have great uses for it as well."

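The 4GB figure Mituzas quotes can be checked with a little arithmetic. The sketch below assumes a 4KB system page and an 8-byte pointer or timestamp, neither of which he states, and it uses binary units. Under those assumptions the numbers come out exactly as quoted.

```python
TIB = 2**40
GIB = 2**30


def extra_metadata_bytes(cache_bytes, page_bytes, field_bytes):
    """RAM needed to add one field of field_bytes to every cached page's metadata."""
    entries = cache_bytes // page_bytes  # one metadata entry per cached page
    return entries * field_bytes


# Assumed values: 2TB of cache, 4KB pages, one 8-byte field per entry.
extra = extra_metadata_bytes(cache_bytes=2 * TIB, page_bytes=4096, field_bytes=8)
print(extra / GIB)  # 4.0
```

That is 2^29 metadata entries at 8 bytes apiece, or 4GB of RAM for a single extra field.
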
Facebook's tools are not for everyone, though: a fully integrated, self-built stack only becomes feasible once you have a certain amount of in-house expertise and scale.

"There is significant software work required to shift from more expensive to cheaper technology - which saves lots of money at large scale, and on the other hand, going to more capable storage devices allows to move faster in engineering storage-centric systems," Mituzas said. ®
