Revealed: Google's manual for its unseen humans who rate the web

Technology? Yes, but also toiling home-workers

It's widely believed that Google search results are produced entirely by computer algorithms - in large part because Google would like this to be widely believed. But in fact a little-known group of home-worker humans plays a large part in the Google process. The way these raters go about their work has always been a mystery. Now, The Register has seen a copy of the guidelines Google issues to them.

The 160-page manual gives raters detailed advice on relevance, spamminess and - more controversially - the elusive "quality". For relevance, raters are told to choose between "Vital", "Useful", "Relevant", "Slightly Relevant", "Off-Topic or Useless" and "Unratable".

Raters may also be asked to give a spam rating: "Not Spam", "Maybe Spam", "Spam", "Porn" and "Malicious".

Interestingly, raters are not told to rate websites with out-of-date security certificates as Spam or Malicious. At the time the guide was written, the US Army portal, for instance, was using an out-of-date certificate.

Raters are asked to second-guess "user intent". "What was the user trying to accomplish when he typed this query?" asks the manual. Google classifies intentions into three categories. The first is "action intent", where a user wants to "accomplish a goal or engage in an activity"; the Chocolate Factory calls these "do queries". The others are informational, or "know queries", and navigational, or "go queries". The categories aren't mutually exclusive, the guide stresses, and some queries are ambiguous, such as "iPad".

Raters are advised to look for content less than four months old - if it's older, it shouldn't be rated "Vital".

Much of this part of the guideline document is intended to cope with sites attempting to game Google. One blog, for instance, is held up as an example of "gibberish". Google's PageRank system was originally devised to rank authority according to popularity, as measured by incoming links. This worked for academic papers, where frequently cited documents tended to be the most important. Other tweaks were added later. But the growing popularity of weblogs in 2003 caused all kinds of problems for Google, as bloggers gamed the PageRank algorithm so effectively, creating a rat's nest of links.

By 2006, automated tools could create hundreds of blogs in just a few minutes - see our contemporary interview with the author of 'Blog Mass Installer' - populating them with machine-generated content that even humans found hard to distinguish from a human-generated site. This also posed an ethical business dilemma for Google, which had begun to grow rapidly from low-cost keyword search advertising placed on blogs. Google needed the blogs to help it grow, as each blog was a potential advertising space. But it couldn't afford to populate the search results with low quality, spammy blog results.

It's actually a reminder of how tricky it is to create good search results. What seems obvious to us - that a chain of hotels for pets isn't a suitable result for the query "hotels" - isn't obvious to an algorithm. But isn't a pet hotel part of the web's rich tapestry, too? It's a deeply subjective decision. Here's where humans come in. It's astonishing that anyone thinks such a decision could be anything other than a subjective human choice, and believing otherwise is a sign that we childishly think computers are magic.

Google joked that trained pigeons rate the web. In fact, it's humans.

Google's human raters must also make decisions about pornographic material. Here, too, the rater has to decide what the searcher intended. The query "spanking" is cited as an example. Information from the University of Maine on parents spanking children is rated "Relevant", while a page about a spanking fetish is "Slightly Relevant" and triggers the Porn flag. Porn is still deemed relevant - just not so much.

"Please do not assign a Porn flag to a non-porn page, just because the query has porn intent. If the landing page is not porn, it should not be flagged", says the guide.

But relevance isn't the only subjective rating. There's also Page Quality - and that's a far more controversial and ambiguous yardstick.

Raters are invited to infer a website's reputation. For example, Google asks Raters: "What kind of Reputation Does the Website Have? ... negative or malicious reputation ... Mixed reputation ... Positive or OK reputation ... little or no information found ..."

It goes on to explain:

"Reputation research in Page Quality rating is very important. A positive reputation from a consensus of experts is often what distinguishes an overall Highest quality page from a High quality page. A negative reputation should not be ignored and is a reason to give an overall Page Quality rating of Low or Lowest."

It's controversial for a number of reasons. The web isn't a reliable feedback system: anonymous complaints are noisy and rife, and may not be representative. A site's detractors may also be motivated by an agenda that isn't obvious to a rater. And Google's advice to look for "a consensus of experts" doesn't always help, because it depends on who the "experts" are. Some academics, such as Evgeny Morozov, have already called for search engines to put warnings next to climate sites that disagree with the "consensus". That would take search engines fully into the editorial process.

Google is sensitive to the accusation that contractors could game the system. Matt Cutts, head of Google's webspam team, insisted last year that "even if multiple search quality raters mark something as spam or non-relevant, that doesn't affect a site's rankings or throw up a flag". So Google employs a network of site raters and devises a complex manual for them to follow, then ignores their judgements?

Who are the Raters?

Google outsources the ratings to contractors Leapforce and Lionbridge, which employ home workers. Lionbridge describes itself as a "global crowdsourcing" agency and lists the advertisements here. According to one Leapforce job ad, there are 1,500 raters. The work is flexible but demanding: raters must pass an examination and are continually evaluated by Google. For example, each rater is given a "TTR" ("Time to Rate") score, which measures how quickly they make their decisions. Here's one contractor's tale, and an interview at SEO site SearchEngineLand with another.

It's amazing how sharply the image Google likes to promote - and politicians believe - of high-tech boffinry and magical algorithms contrasts with the reality. Outsourced home workers are keeping the machine running. Glamorous, it isn't. ®