Google crowdsources card index for 'humanity's last library'

Garbage in, garbage out

Google has responded to criticism of the quality of its books metadata - by inviting anyone to write anything they want. Before you read on, remember that Google Books could become the world's digital library by default - it's been called "the last library" - since nobody is likely to do the scanning ever again.

However, for researchers and scholars, a collection is only as good as its metadata - and the quality of the metadata at Google Books falls far short of any library in history. Last year Stanford linguist and columnist Geoffrey Nunberg, writing in the Chronicle of Higher Education, described the errors as "disastrous".

Nunberg found that potentially hundreds of thousands of books were misdated, with titles credited to authors before they were born. Google Books showed books from the Victorian era discussing Jimi Hendrix, or the microprocessor, for example.

Freud had strong views on web browsers

Attribution errors were common, with Madame Bovary credited to Henry James. And bizarre classification errors abound. A Mae West biography was filed under Religion, for example. Jane Eyre showed up under Love Stories, Architecture, and Antiques and Collectables. And on top of this mass of errors was a superstructure of erroneous links. Google's "related books" rarely point to anything related.

In short, if this is humanity's last ever library, humanity's last ever scholars won't get very far with their research.

"Our reputation precedes us" - The Victorians discuss Jimi Hendrix

(We've also highlighted problems due to lack of care and attention at Google Books here.)

When Salon revisited Google Books earlier this month, things hadn't improved. And worse, the answer to 'garbage out' is 'more garbage in' - crowdsourcing.

A Google engineer called "SofiaF" now invites us to nominate books that are out of print. They're only suggestions, but given that none of us are as dumb as all of us, can we expect the quality of the metadata to improve? As with classification, knowing the copyright status of a work requires expertise, particularly in the intricacies of territorial copyright. It's not something a helpful amateur with time on their hands can usefully do.

For Nunberg, Google's haste to complete the project is the problem - it prefers to get it finished, for competitive reasons, rather than devote expert resources to getting it right.

"People at Google are also saying, 'Let's crowdsource this,' but that is a stupid idea. You and I are both smart, knowledgeable people, but I wouldn't trust either of us to do the skilled work of cataloging a 1890 edition of Madame Bovary," Nunberg told Salon.

He suggests that Google devote more expert resources to the problem - which is expensive - and that librarians, who have up until now trusted Google Books to get it right, become more feisty and pro-active. ®

Related link

Google Book errors, illustrated [PDF, 1.6MB]
