How the Yahoo! homepage predicts your clicks

We! Know! You! Want! To! Click! Here!

In the summer of 2008, at an artificial intelligence confab deep in Silicon Valley, Yahoo! senior research scientist Deepak Agarwal revealed that the web giant was using automated algorithms to select news stories on its famous front page. These algorithms, he said, had boosted click-through rates by 25 to 30 per cent, driving millions of additional dollars in ad revenue.

When we approached Agarwal after his presentation to discuss the new technology and identified ourselves as The Register, he promptly buttoned his lip. We could only hope he would have done the same with The Wall Street Journal or The New York Times or Mom and Pop's Shoestring Guide to All Things Artificial Intelligence, and three years later, our hope is still alive.

This week, Raghu Ramakrishnan – Yahoo!'s chief scientist for search and cloud platforms – sat down with The Register to explain the technology in detail, boasting that click-through rates have now risen more than 270 per cent on the "Today" news module at the heart of the Yahoo! home page. Known as CORE – short for Content Optimization and Relevance Engine – the system doesn't replace human editors. It works alongside them, making many but not all of the decisions, and at the same time, it feeds editors information that can inform their very human thinking.

"To be honest, the entire increase can't be attributed just to the algorithmic aspects. A big part of the increase is due to editors working much more effectively, with some of the interactive data and real-time feedback that they're getting," Ramakrishnan tells The Register. Yes, The Register. "That was one of the key decisions we made early on. We wouldn't try to replace editors. There would be certain things that always came from them."

When Ramakrishnan arrived at Yahoo! in the summer of 2006 – after nearly twenty years as professor of computer sciences at the University of Wisconsin-Madison – the Yahoo! homepage was arranged almost entirely by human editors. But he was soon approached by two other Yahoo! bigwigs – executive vice president Jeff Weiner and his engineering counterpart Venkat Panchapakesan – with the idea of moving to a more automated setup.

This gave rise to a Yahoo! Research project dubbed the Content Optimization Knowledge Engine – COKE, for short – and those behind the project were affectionately known as Cokeheads. With this system, Deepak Agarwal said at the time, human editors still chose the pool of stories that were eligible for the Today module, but then automated algorithms decided which stories got placed where – and for how long.

"The goal was to use data mining and machine learning to optimize the content shown to users on web portals," Agarwal said.

The system was first tested in late 2007 or early 2008, and it was soon selecting Today module stories for all Yahoo! users. According to Ramakrishnan, it immediately boosted click-through rates by 40 per cent. That's a little higher than the figure Agarwal gave three years ago. But they seem to be in agreement that the effect was rather significant. "I still recall Venkat [Panchapakesan] accusing me of sandbagging things," Ramakrishnan says.

Before the system launched, Ramakrishnan predicted a 20 per cent boost, so he too was impressed by the initial spike. But it was merely a start. The initial system, Agarwal said, was based on the Kalman algorithm, a filtering method developed in the early 1960s. In essence, COKE determined where stories should be placed by analyzing millions of user clicks on the fly. "We track user responses," Agarwal said, "and then we respond ourselves - in real-time."
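For readers who want a feel for what tracking clicks with a Kalman filter might involve, here is a minimal sketch in Python. It is a textbook one-dimensional Kalman filter applied to a story's click-through rate, not Yahoo!'s actual model, which the company has not described at this level of detail. The class name `ScalarKalman`, the `process_var` drift parameter, and the binomial approximation of measurement noise are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Illustrative 1-D Kalman filter for one story's click-through rate."""

    estimate: float     # current belief about the story's CTR
    variance: float     # uncertainty attached to that belief
    process_var: float  # assumed drift of the true CTR between batches

    def update(self, clicks: int, views: int) -> float:
        # Predict step: the true rate may have drifted, so uncertainty grows.
        self.variance += self.process_var
        if views == 0:
            return self.estimate  # nothing observed, keep the prediction

        observed = clicks / views
        # Measurement noise: approximate binomial variance of the observed rate.
        p = min(max(self.estimate, 1e-6), 1 - 1e-6)
        measurement_var = p * (1 - p) / views

        # Update step: blend prediction and observation by their certainty.
        gain = self.variance / (self.variance + measurement_var)
        self.estimate += gain * (observed - self.estimate)
        self.variance *= 1 - gain
        return self.estimate


if __name__ == "__main__":
    tracker = ScalarKalman(estimate=0.05, variance=0.01, process_var=1e-6)
    for batch in range(5):
        # Hypothetical batches: 100 clicks out of 1,000 views each.
        print(batch, round(tracker.update(clicks=100, views=1000), 4))
```

The appeal of this kind of filter for a news module is that each batch of clicks nudges the estimate at once, while the drift term stops an ageing story from coasting on its early numbers.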

Back then, Yahoo! had tried to personalize story placement for individual users, but this didn't have a positive effect on the click-through rate. Since then, however, the company has settled on a personalization method that makes that initial 20 to 40 per cent spike look rather small. In January and then again in March, Yahoo! says, the Today module received over one billion clicks, with the click-through rate rising 270 per cent in the US since the automated setup first debuted.

About a year ago, Yahoo! changed the name of the project, dropping COKE for CORE. This is a tad unfortunate, but the system has grown up, spreading beyond the Today module to other Yahoo! services, including its primary news site. And Ramakrishnan says the company plans to plug it into countless other services.

The system still works to predict clicks for Yahoo! users as a whole. But at the same time, it predicts clicks for individual users or segments of users, leaning on information such as their sex and their age (which users supply when they sign up for a Yahoo! account) or even what browser they use.

When a story enters the system, CORE draws on scads of existing data to generate an a priori estimate of how well the story will perform based on "intrinsic features", meaning the words in the headline and the body of the story. Then, the system tests stories – in real-time – to get a better idea of how they will perform, and – in a matter of minutes – it uses these tests to adjust the way stories will be presented to everyone.

"We don't do this for every story," Ramakrishnan said. "I want to show a few things to a few people and, based on that, have a good estimate for everything in my pool. And then exploit the ones that are the most promising."

"The game here is that I never actually predict true popularity, but I use parsimonious exploration. We have billions of impressions. A tiny fraction of them I'm willing to spend to explore and get an assessment of how popular a story is likely to be – for a given user or for a given segment of users."

Ramakrishnan says the methods used are similar to the "multi-armed bandit" algorithms used in the world of slot machines, but then go much further. "The difference is that we have an extremely dynamic pool of data. We've essentially developed extensions of this statistical approach," he says. "With these, we can come up with a very accurate estimate with how popular a story is likely to be – across the entire population; across a segment, such as males 40 years old and older; with [a particular user]; with [a particular user] while they're having their morning coffee."
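The explore-then-exploit idea Ramakrishnan describes can be sketched with the simplest textbook bandit strategy, known as epsilon-greedy. To be clear, this is not CORE: Yahoo! says it uses its own extensions of the bandit approach, and those are not public. The class `EpsilonGreedy`, the `simulate` helper, the story names, and the click rates below are all hypothetical, chosen only to show a small slice of impressions being spent on exploration.

```python
import random


class EpsilonGreedy:
    """Spend a small fraction of impressions exploring, exploit the rest."""

    def __init__(self, story_ids, explore_fraction=0.01, seed=None):
        self.explore_fraction = explore_fraction
        self.views = {story: 0 for story in story_ids}
        self.clicks = {story: 0 for story in story_ids}
        self.rng = random.Random(seed)

    def ctr(self, story):
        # Observed click-through rate so far (0.0 until the story is shown).
        views = self.views[story]
        return self.clicks[story] / views if views else 0.0

    def choose(self):
        stories = list(self.views)
        if self.rng.random() < self.explore_fraction:
            return self.rng.choice(stories)  # explore: show a random story
        return max(stories, key=self.ctr)    # exploit: show the current leader

    def record(self, story, clicked):
        self.views[story] += 1
        self.clicks[story] += int(clicked)


def simulate(true_ctr, rounds, explore_fraction=0.01, seed=0):
    """Run the bandit against made-up 'true' click rates per story."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(list(true_ctr), explore_fraction, seed=seed + 1)
    for _ in range(rounds):
        story = bandit.choose()
        bandit.record(story, rng.random() < true_ctr[story])
    return bandit


if __name__ == "__main__":
    rates = {"politics": 0.01, "sports": 0.20, "celebrity": 0.05}  # invented
    result = simulate(rates, rounds=20000, explore_fraction=0.1)
    for story in rates:
        print(story, result.views[story], round(result.ctr(story), 3))
```

A fixed exploration fraction is the crudest way to do this. It ignores segments and individual users entirely, and it copes poorly with a pool of stories that changes by the minute, which is exactly where Ramakrishnan says Yahoo!'s version goes further.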

But even as the system exploits this information on the fly, it feeds data to a web-based dashboard used by Yahoo!'s human editors. These editors use this information to tweak the system's overall "business rules" and choose the pool of stories from which the system chooses. They also have the power to manually override the system at any time.

The system, Yahoo! says, helps editors create over 13 million different combinations of stories on the homepage each day – or 45,000 variations every five minutes. In that time, CORE processes 100GB of user feedback, including clicks, comments, Facebookian "likes", and links from other sites. The result is a system that reaches a middle ground between the human and the inhuman.

The system knows that women generally favor stories about Brad Pitt, but after some real-time analysis, it can quickly realize that men are far more likely to click on a Brad Pitt story that involves a sports movie. It can realize that aging Baby Boomers enjoy reading about Justin Bieber as much as the teenage set.

On some level, Ramakrishnan says, people are predictable. But on another, they are not. CORE tries to predict the unpredictable. And if statistics are a reliable judge, it has some success. ®
