How the Yahoo! homepage predicts your clicks

We! Know! You! Want! To! Click! Here!


In the summer of 2008, at an artificial intelligence confab deep in Silicon Valley, Yahoo! senior research scientist Deepak Agarwal revealed that the web giant was using automated algorithms to select news stories on its famous front page. These algorithms, he said, had boosted click-through-rates by 25 to 30 per cent, driving millions of additional dollars in ad revenue.

When we approached Agarwal after his presentation to discuss the new technology and identified ourselves as The Register, he promptly buttoned his lip. We could only hope he would have done the same with The Wall Street Journal or The New York Times or Mom and Pop's Shoestring Guide to All Things Artificial Intelligence, and three years later, our hope is still alive.

This week, Raghu Ramakrishnan – Yahoo!'s chief scientist for search and cloud platforms – sat down with The Register to explain the technology in detail, boasting that click-through rates have now risen more than 270 per cent on the "Today" news module at the heart of the Yahoo! home page. Known as CORE – short for Content Optimization and Relevance Engine – the system doesn't replace human editors. It works alongside them, making many but not all of the decisions, and at the same time, it feeds editors information that can inform their very human thinking.

"To be honest, the entire increase can't be attributed just to the algorithmic aspects. A big part of the increase is due to editors working much more effectively, with some of the interactive data and real-time feedback that they're getting," Ramakrishnan tells The Register. Yes, The Register. "That was one of the key decisions we made early on. We wouldn't try to replace editors. There would be certain things that always came from them."

When Ramakrishnan arrived at Yahoo! in the summer of 2006 – after nearly twenty years as professor of computer sciences at the University of Wisconsin-Madison – the Yahoo! homepage was arranged almost entirely by human editors. But he was soon approached by two other Yahoo! bigwigs – executive vice president Jeff Weiner and his engineering counterpart Venkat Panchapakesan – with the idea of moving to a more automated setup.

This gave rise to a Yahoo! Research project dubbed the Content Optimization Knowledge Engine – COKE, for short – and those behind the project were affectionately known as Cokeheads. With this system, Deepak Agarwal said at the time, human editors still chose the pool of stories that were eligible for the Today module, but then automated algorithms decided which stories got placed where – and for how long.

"The goal was to use data mining and machine learning to optimize the content shown to users on web portals," Agarwal said.

The system was first tested in late 2007 or early 2008, and it was soon selecting Today module stories for all Yahoo! users. According to Ramakrishnan, it immediately boosted click-through-rates 40 per cent. That's a little higher than the figure Agarwal gave three years ago. But they seem to be in agreement that the effect was rather significant. "I still recall Venkat [Panchapakesan] accusing me of sandbagging things," Ramakrishnan says.

Before the system launched, Ramakrishnan predicted a 20 per cent boost, so he too was impressed by the initial spike. But it was merely a start. The initial system, Agarwal said, was based on the Kalman algorithm, a filtering method developed in the early 1960s. In essence, COKE determined where stories should be placed by analyzing millions of user clicks on the fly. "We track user responses," Agarwal said, "and then we respond ourselves - in real-time."
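For the curious, here is a minimal sketch of how a Kalman-style filter might track one story's click-through rate as batches of clicks arrive. It is not Yahoo!'s code. The class name `CtrKalman`, the starting values, and the random-walk model of a drifting click-through rate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CtrKalman:
    """Scalar Kalman filter over one story's click-through rate (hypothetical)."""

    estimate: float = 0.05      # assumed starting guess for the CTR
    variance: float = 0.01      # uncertainty about that guess
    process_var: float = 1e-5   # assumed drift in true CTR between batches

    def update(self, clicks: int, views: int) -> float:
        """Fold one batch of (clicks, views) into the running estimate."""
        if views <= 0:
            return self.estimate  # nothing observed, nothing to learn

        observed = clicks / views

        # Approximate the measurement noise with the binomial variance,
        # clamping the rate so the noise term never collapses to zero.
        p = min(max(self.estimate, 1e-3), 1 - 1e-3)
        obs_var = p * (1 - p) / views

        # Predict: the true CTR may have drifted since the last batch.
        prior_var = self.variance + self.process_var

        # Correct: the gain weighs the new batch against the old estimate.
        gain = prior_var / (prior_var + obs_var)
        self.estimate += gain * (observed - self.estimate)
        self.variance = (1 - gain) * prior_var
        return self.estimate


tracker = CtrKalman()
for _ in range(50):
    tracker.update(clicks=100, views=1000)  # made-up traffic, 10% CTR
```

Large batches pull the estimate toward the observed rate quickly, while small ones barely move it, which is roughly what you want from a system that reacts to clicks "on the fly".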

Back then, Yahoo! had tried to personalize story placement for individual users, but this didn't have a positive effect on the click-through rate. Since then, however, the company has settled on a personalization method that makes that initial 20 to 40 per cent spike look rather small. In January and then again in March, Yahoo! says, the Today module received over one billion clicks, with the click-through rate rising 270 per cent in the US since the automated setup first debuted.

About a year ago, Yahoo! changed the name of the project, dropping COKE for CORE. This is a tad unfortunate, but the system has grown up, spreading beyond the Today module to other Yahoo! services, including its primary news site. And Ramakrishnan says the company plans to plug it into countless other services.

The system still works to predict clicks for Yahoo! users as a whole. But at the same time, it predicts clicks for individual users or segments of users, leaning on information such as their sex and their age (which users supply when they sign up for a Yahoo! account) or even what browser they use.

When a story enters the system, CORE draws on scads of existing data to generate an a priori estimate of how well that story will perform based on "intrinsic features", meaning what words are in the headline and the body of the story. Then, the system tests stories – in real-time – to get a better idea of how they will perform, and – in a matter of minutes – it uses these tests to adjust the way stories will be presented to everyone.

"We don't do this for every story," Ramakrishnan said. "I want to show a few things to a few people and, based on that, have a good estimate for everything in my pool. And then exploit the ones that are the most promising."

"The game here is that I never actually predict true popularity, but I use parsimonious exploration. We have billions of impressions. A tiny fraction of them I'm willing to spend to explore and get an assessment of how popular a story is likely to be – for a given user or for a given segment of users."

Ramakrishnan says the methods used are similar to the "multi-armed bandit" algorithms used in the world of slot machines, but then go much further. "The difference is that we have an extremely dynamic pool of data. We've essentially developed extensions of this statistical approach," he says. "With these, we can come up with a very accurate estimate with how popular a story is likely to be – across the entire population; across a segment, such as males 40 years old and older; with [a particular user]; with [a particular user] while they're having their morning coffee."
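The explore-then-exploit idea can be sketched with the simplest bandit strategy, known as epsilon-greedy. Yahoo! says its own methods go well beyond this, so treat the following as a textbook illustration and not a description of CORE. The class name, the 5 per cent exploration fraction, the story labels, and the click-through rates in the simulation are all invented.

```python
import random


class EpsilonGreedyStoryPicker:
    """Spend a small share of impressions exploring; exploit the rest."""

    def __init__(self, story_ids, explore_fraction=0.05, seed=None):
        self.explore_fraction = explore_fraction
        self.views = {story: 0 for story in story_ids}
        self.clicks = {story: 0 for story in story_ids}
        self.rng = random.Random(seed)

    def estimated_ctr(self, story_id):
        views = self.views[story_id]
        return self.clicks[story_id] / views if views else 0.0

    def choose(self):
        stories = list(self.views)
        # Explore: show a random story on a tiny fraction of impressions.
        if self.rng.random() < self.explore_fraction:
            return self.rng.choice(stories)
        # Exploit: otherwise show the story with the best estimate so far.
        return max(stories, key=self.estimated_ctr)

    def record(self, story_id, clicked):
        self.views[story_id] += 1
        if clicked:
            self.clicks[story_id] += 1


def simulate(true_ctr, impressions=20000, seed=0):
    """Run the picker against made-up 'true' click-through rates."""
    picker = EpsilonGreedyStoryPicker(true_ctr, explore_fraction=0.05, seed=seed)
    user_rng = random.Random(seed + 1)
    for _ in range(impressions):
        story = picker.choose()
        picker.record(story, user_rng.random() < true_ctr[story])
    return picker


picker = simulate({"story_a": 0.02, "story_b": 0.10, "story_c": 0.05})
most_shown = max(picker.views, key=picker.views.get)
```

Keeping one picker per user segment would be the crudest way to get segment-level estimates. The "extensions" Ramakrishnan describes presumably share information across segments and cope with stories entering and leaving the pool, which this sketch does not attempt.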

But even as the system exploits this information on the fly, it feeds data to a web-based dashboard used by Yahoo!'s human editors. These editors use this information to tweak the system's overall "business rules" and choose the pool of stories from which the system chooses. They also have the power to manually override the system at any time.

The system, Yahoo! says, helps editors create over 13 million different combinations of stories on the homepage each day – or 45,000 variations every five minutes. In that time, CORE processes 100GB of user feedback, including clicks, comments, Facebookian "likes", and links from other sites. The result is a system that reaches a middle ground between the human and the inhuman.
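The two figures are roughly consistent with each other, as a quick back-of-the-envelope check shows. The only assumption here is that the 45,000-per-five-minutes rate holds steady around the clock.

```python
MINUTES_PER_DAY = 24 * 60
windows_per_day = MINUTES_PER_DAY // 5        # five-minute windows in a day
variations_per_day = 45_000 * windows_per_day

print(windows_per_day, variations_per_day)    # 288 12960000
```

That works out to just under 13 million a day, so Yahoo!'s "over 13 million" suggests the five-minute figure is rounded down slightly.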

The system knows that women generally favor stories about Brad Pitt, but after some real-time analysis, it can quickly realize that men are far more likely to click on a Brad Pitt story that involves a sports movie. It can realize that aging Baby Boomers enjoy reading about Justin Bieber as much as the teenage set.

On some level, Ramakrishnan says, people are predictable. But on another, they are not. CORE tries to predict the unpredictable. And if statistics are a reliable judge, it has some success. ®
