Revealed: Google's manual for its unseen humans who rate the web
Technology? Yes, but also toiling home-workers
It's widely believed that Google search results are produced entirely by computer algorithms - in large part because Google would like this to be widely believed. But in fact a little-known group of home-worker humans plays a large part in the Google process. The way these raters go about their work has always been a mystery. Now, The Register has seen a copy of the guidelines Google issues to them.
The 160-page manual gives detailed advice for raters - on relevance, spamminess, and - more controversially - the elusive "quality". For relevance, raters are advised to give a rating of "Vital", "Useful", "Relevant", "Slightly Relevant", "Off-Topic or Useless" or "Unratable".
Raters may also be asked to give a spam rating: "Not Spam", "Maybe Spam", "Spam", "Porn" and "Malicious".
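For illustration, the two rating scales might be modelled as simple enumerations - a hypothetical sketch of our own; only the label strings come from the guide:

```python
from enum import Enum

class Relevance(Enum):
    """Relevance scale as listed in the rater guidelines."""
    VITAL = "Vital"
    USEFUL = "Useful"
    RELEVANT = "Relevant"
    SLIGHTLY_RELEVANT = "Slightly Relevant"
    OFF_TOPIC = "Off-Topic or Useless"
    UNRATABLE = "Unratable"

class SpamFlag(Enum):
    """Spam scale as listed in the rater guidelines."""
    NOT_SPAM = "Not Spam"
    MAYBE_SPAM = "Maybe Spam"
    SPAM = "Spam"
    PORN = "Porn"
    MALICIOUS = "Malicious"
```

A page gets one label from each scale, which is why - as the guide notes below - a page can be both "Slightly Relevant" and flagged Porn at the same time.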
Interestingly, the guidelines do not advise raters to rate websites with out-of-date security certificates as Spam or Malicious. At the time the rating guide was written, the US Army portal - for instance - used an out-of-date certificate.
Raters are asked to second-guess "user intent". "What was the user trying to accomplish when he typed this query?" asks the manual. Google classifies intentions into three categories: the first is "action intent" - a user wanting to "accomplish a goal or engage in an activity" - which the Chocolate Factory calls "do queries". Then there are informational, or "know queries", and navigational, or "go queries". They're not mutually exclusive, the guide stresses, and some are ambiguous: such as the search query "iPad".
Raters are advised to look for websites with content fresher than four months old - if it's older, it shouldn't be rated "Vital".
Much of this part of the guideline document is intended to cope with sites attempting to game Google. For example, this blog is cited as an example of "gibberish". Google's PageRank system was originally devised to rank authority according to popularity. This worked for academic papers, where frequently-cited documents tended to be the most important. Other tweaks were then added. But the increasing popularity of weblogs in 2003 caused all kinds of problems for Google, as blogs gamed the PageRank algorithm so effectively, creating a rat's nest of links.
By 2006, automated tools could create hundreds of blogs in just a few minutes - see our contemporary interview with the author of 'Blog Mass Installer' - populating them with machine-generated content that even humans found hard to distinguish from a human-generated site. This also posed an ethical business dilemma for Google, which had begun to grow rapidly from low-cost keyword search advertising placed on blogs. Google needed the blogs to help it grow, as each blog was a potential advertising space. But it couldn't afford to populate the search results with low quality, spammy blog results.
It's actually a reminder of how tricky it is to create good search results. What appears obvious to us - that a chain of hotels for pets is not suitable for a search query "hotels" - is not obvious to an algorithm. But isn't a pet hotel part of the web's rich tapestry, too? It's a deeply subjective decision. Here's where humans come in: to think such a decision could be anything other than a subjective human choice is a sign that we childishly believe computers are magic.
Google's human raters must also make decisions on pornographic material. Here, too, the rater has to decide what the searcher's intention is. The example of "spanking" is cited: information from the University of Maine on parents spanking children is regarded as "Relevant", while a page about spanking fetishism is "Slightly Relevant" and triggers the Porn flag. Porn is still deemed relevant - just not so much.
"Please do not assign a Porn flag to a non-porn page, just because the query has porn intent. If the landing page is not porn, it should not be flagged", says the guide.
But a subjective rating isn't all there is. In addition to relevance, there's Page Quality - and that's a far more controversial and ambiguous yardstick.
Raters are invited to infer a website's reputation. For example, Google asks raters: "What kind of Reputation Does the Website Have? ... negative or malicious reputation ... Mixed reputation ... Positive or OK reputation ... little or no information found ..."
It goes on to explain:
"Reputation research in Page Quality rating is very important. A positive reputation from a consensus of experts is often what distinguishes an overall Highest quality page from a High quality page. A negative reputation should not be ignored and is a reason to give an overall Page Quality rating of Low or Lowest."
It's controversial for a number of reasons. The web isn't a reliable feedback system - anonymous complaints are noisy and rife, and may not be representative. A site's detractors may also be motivated by an agenda that isn't obvious to a rater. And the Google advice to look for "a consensus of experts" doesn't always help. It depends on who the "experts" are. As an example, some academics - such as Evgeny Morozov - have already called for search engines to put warnings by climate sites that disagree with the "consensus" - fully entering into the editorial process.
Google is sensitive to the accusation that contractors could game the system. Matt Cutts insisted last year that "even if multiple search quality raters mark something as spam or non-relevant, that doesn't affect a site's rankings or throw up a flag". So, Google employs a network of site raters, devises a complex manual for them to follow, then ignores their judgements?
Who are the Raters?
Google outsources the ratings to contractors Leapforce and Lionbridge, who employ home workers. Lionbridge describes itself as a "global crowdsourcing" agency and lists the advertisements here. According to one Leapforce job ad, there are 1,500 raters. The work is flexible but demanding - raters must pass an examination and are continually evaluated by Google. For example, a rater is given a "TTR" score - "Time to Rate" - which measures how quickly they make their decisions. Here's one contractor's tale, and an interview at SEO site SearchEngineLand with another.
It's amazing how the image Google likes to promote - and politicians believe - of high-tech boffinry and magical algorithms contrasts with the reality. Outsourced home workers are keeping the machine running. Glamorous, it isn't. ®
Re: Devil's Advocate
The linked article (interview with LionBridge user) states:
One thing I think the SEO community is missing is that this program has nothing to do with SEO or rankings. What this program does is help Google refine their algorithm. For example, the Side-by-Side tasks show the results as they are next to the results with the new algorithm change in them. Google doesn't hire these raters to rate the web; they hire them to rate how they are doing in matching users' queries with the best source of information.
If it looks too good to be true..
Judging from the general thrust of comments on one of the blogs linked in the article, a job as a rater doesn't seem much different from the "work from home" horror stories of yesteryear - the ones where poor or desperate people get sucked into investing time and money in 'home assembling ballpoint pens' or 'stuffing envelopes for marketing companies', only to find they either make no money for a lot of effort, or in fact lose money through up-front investment: time that could be spent more productively, purchasing materials, or paying for compulsory 'training'.
Prospective raters appear to be required to take an initial simple test based on the instruction manual (probably to weed out those who can't actually read). After this they are required to take a 140+ question 'test' evaluating actual sites. What is interesting is that a lot of the posters appear to have to "wait" until some test data is available (why aren't they using a bank of standardised tests?). When they inevitably fail, they are either immediately hit with a request to re-take the test with new data (some several times), or after a long period of begging they are suddenly 'allowed' to re-take it (presumably reflecting either that there is a load of actual work at that time, or that they have to wait until more becomes available). Of course, they don't get paid for any of this testing!
A suspicious person might conclude that most of the 'testing' is unpaid processing of new data supplied by actual customers, in a bid to keep overheads down and profits up. After all, they already have a basic ability to weight a candidate's findings based on the short initial test, and since they don't have to pay these chumps, they are probably amalgamating the results across a whole bunch of them, plus those of a small number of paid testers whom they have already found to be reliable - classic crowdsourcing with a twist.
There are a few really positive comments that reek of astroturfing, and a few more genuine-looking ones from people who are getting paid, most claiming they book 8 or 9 hours of work a week but have to invest significantly more than that in the research side of the evaluation, which they cannot book. One guy claims the going rate is 9 Euros an hour (i.e. about a quid more than UK minimum wage), and one suspects the 'contractor' is responsible for all tax, NI or other equivalents, plus 30-60 day payment terms. Factor in the non-paid research overheads and you'd probably be better off getting a paper round.
Interesting job requirements
"This is a Personalized Search Engine Evaluator position. As a Personalized Search Engine Evaluator, you will be given tasks that are generated from your personalized content based on your Google account linked to your Gmail address that you use to register with Leapforce. Ideal candidates will be highly active users of Google's search engine and other products; use Google Play at least once per week; use Google+ more than once per month and have more than 11 people per circle and have a Gmail account with web history turned on."
Interesting that you have to have Web History turned on to get the job - I wonder why... Are they going to look it up?
Also interesting that they only list these requirements for jobs outside the US, Canada and Egypt. Employment law issues? Maybe rejecting candidates for their web history or Google+ posts isn't well received by some authorities?
Alas, I miss out on all of these. Shame.