
'The most ambitious project at eBay for a long, long time'

Inside the auction leviathan's search megaproject


More data, more Hadoop, more patterns

Cassini exploits Williams' background in finding ways to improve search in order to hugely expand eBay's use of Hadoop to search. The secret to success is not a crushingly hyped idea like natural-language query or committing yourself to some mystical algorithm alchemy. Rather it is making use of the data you have and will continue to amass on users' searches and search behavior – mining it to see what customers want. Doing this will make Cassini more intuitive than Voyager at working out what it is eBay's customers want, Williams reckons.

He gives the example of searching for the Snowboard Kids game for the Wii. Type "snowboard kids" into eBay today and, yes, you'll get the game, but you'll also get snowboards, goggles, boots, gloves, jackets and much, much more. For kids.
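The ambiguity can be pictured with a minimal sketch of behaviour-based ranking. It is not Cassini's method: the click log, the category names and both functions are invented for illustration, on the assumption that past clicks for a query are a usable signal of what searchers meant.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (query typed, category of the item then clicked).
CLICK_LOG = [
    ("snowboard kids", "Video Games"),
    ("snowboard kids", "Video Games"),
    ("snowboard kids", "Video Games"),
    ("snowboard kids", "Sporting Goods"),
    ("kids snowboard boots", "Sporting Goods"),
]


def category_preferences(click_log):
    """Return, for each query, every category's share of past clicks."""
    counts = defaultdict(Counter)
    for query, category in click_log:
        counts[query][category] += 1
    prefs = {}
    for query, counter in counts.items():
        total = sum(counter.values())
        prefs[query] = {cat: n / total for cat, n in counter.items()}
    return prefs


def rank_items(query, items, prefs):
    """Order (title, category) items by the click share of their category."""
    shares = prefs.get(query, {})
    return sorted(items, key=lambda item: shares.get(item[1], 0.0), reverse=True)


# Hypothetical listings that all match the words "snowboard" and "kids".
ITEMS = [
    ("Kids snowboard goggles", "Sporting Goods"),
    ("Snowboard Kids game", "Video Games"),
    ("Kids snowboard jacket", "Sporting Goods"),
]

PREFS = category_preferences(CLICK_LOG)
RANKED = rank_items("snowboard kids", ITEMS, PREFS)
```

With this toy log, three of the four clicks for "snowboard kids" landed on a game, so the game listing moves ahead of the goggles and jacket even though all three match the query words equally well.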

"Voyager doesn't understand the past behaviors of users and intent behind users queries," Williams said.

Cassini will mine data from eBay's 97 million active users using Hadoop in a massively parallel and distributed architecture in order to rank different items. Data patterns will be identified by crunching information on – among other things – corrections made by users to searches, contracted acronyms, expanded acronyms and words that are in different languages.
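As a rough illustration of what mining search corrections can look like, the sketch below counts how often one query is rewritten into another within a session, using the map, shuffle and reduce phases that Hadoop distributes across a cluster. Everything here is an assumption made for the example: the session data is invented, the function names are hypothetical, and the code runs in a single Python process instead of on Hadoop.

```python
from collections import defaultdict

# Hypothetical session log: the ordered queries each user typed in one visit.
SESSIONS = [
    ["snowbord kids", "snowboard kids"],
    ["snowbord kids", "snowboard kids"],
    ["snowbord kids", "snowboard kid"],
    ["wii", "nintendo wii"],
]


def map_session(queries):
    """Map step: emit ((original, rewrite), 1) for each consecutive query pair."""
    for original, rewrite in zip(queries, queries[1:]):
        if original != rewrite:
            yield (original, rewrite), 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the counts for each (original, rewrite) pair."""
    return {key: sum(values) for key, values in grouped.items()}


def best_rewrites(counts):
    """Keep the most frequently observed rewrite for each original query."""
    best = {}
    for (original, rewrite), n in counts.items():
        if original not in best or n > best[original][1]:
            best[original] = (rewrite, n)
    return best


emitted = [pair for session in SESSIONS for pair in map_session(session)]
COUNTS = reduce_counts(shuffle(emitted))
REWRITES = best_rewrites(COUNTS)
```

Because each session is mapped independently and each key is reduced independently, the same logic can be spread over many machines, which is the property that makes this kind of log mining a natural fit for a cluster.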

'We will throw more data at [Hadoop] – more data and mining of that data to create richer tasks' – Hugh Williams

"Understanding the user intent is a data rather than an algorithm task," Williams asserts. "We have been using Hadoop for a while – it has been around for two years. We haven't used it extensively for the things we are talking about now, but it was a component in the Voyager system. We will throw more data at it – more data and mining of that data to create richer tasks."

Hadoop is the open-source architecture inspired by Google's MapReduce – and initially championed by Yahoo! – to process huge sets of data by harnessing the power of large numbers of clustered servers. Hadoop's code is available under an Apache Software Foundation licence but it has received commercial support from Cloudera, the start-up that is home to Hadoop founder Doug Cutting. In June Yahoo! spun out the remaining members of its Hadoop engineering team to create Hortonworks, backed by the venture capitalist Rob Bearden from Benchmark Capital. Today Hadoop is used by Facebook and Twitter among other web-scale giants besides eBay.

At eBay, Hadoop is used by the search science team. Williams' engineers are working with a Hadoop engineering team and a Hadoop product team elsewhere at eBay on the company's implementation. The engineering team works on the changes to Hadoop and HBase, the Hadoop database modelled on Google's BigTable for distributed storage, which eBay also uses. The team refines things like scheduling and makes sure the right jobs have the right priorities on eBay's Hadoop set-up, while delivering as much concurrency as possible across the thousands of servers running Hadoop.

The product team, meanwhile, deals with what comes out of the Hadoop changes that the first group have built. They clean the data spat out, manage the grid, and work with the 41 eBay marketplaces that rely on the search service and, by extension, Hadoop.

Real-time challenge

The search science team is working with the other two groups under Williams' control: the search back-end and search front-end teams. The back-end team is taking items from customers; processing and tagging them; constructing a product and shipping index; working with the caching layer; and handling updates. Twenty per cent of the goods in eBay leave the system each day, making this a full-time process. "It's probably the most challenging real-time search environment," Williams said.

The front-end team works on presenting the search results. The interface is built using mostly Java with some HTML, CSS and JavaScript.

Williams says that Cassini is one of the most ambitious projects he has been involved with, but at least he has the Bing project in his back pocket. "It's technically very hard in the search world and it's an engineering challenge because it involves so many people," Williams says. ®
