Google crowdsources card index for 'humanity's last library'

Garbage in, garbage out

Google has responded to criticism of the quality of its books metadata - by inviting anyone to write anything they want. Before you read on, remember that Google Books could become the world's digital library by default - it's been called "the last library" - since nobody is likely to do the scanning ever again.

However, for researchers and scholars, a collection is only as good as its metadata - and the quality of the metadata at Google Books falls far short of that of any library in history. Last year Stanford linguist and columnist Geoffrey Nunberg, writing in the Chronicle of Higher Education, described the errors as "disastrous".

Nunberg found that potentially hundreds of thousands of books were misdated, with titles credited to authors before they were born. Google Books showed books from the Victorian era discussing Jimi Hendrix, or the microprocessor, for example.

Freud had strong views on web browsers

Attribution errors were common, with Madame Bovary credited to Henry James. And bizarre classification errors abound. A Mae West biography was filed under Religion, for example. Jane Eyre showed up under Love Stories, Architecture, and Antiques and Collectables. And on top of this mass of errors sat a superstructure of erroneous links. Google's "related books" rarely point to anything related.

In short, if this is humanity's last ever library, humanity's last ever scholars won't get very far with their research.

"Our reputation precedes us" - The Victorians discuss Jimi Hendrix

(We've also highlighted problems due to lack of care and attention at Google Books here.)

When Salon revisited Google Books earlier this month, things hadn't improved. And worse, Google's answer to 'garbage out' is 'more garbage in' - crowdsourcing.

A Google engineer called "SofiaF" now invites us to nominate books that are out of print. They're only suggestions, but given that none of us are as dumb as all of us, can we expect the quality of the metadata to improve? As with classification, determining the copyright status of a work requires expertise, particularly in the intricacies of territorial copyright. It's not something a helpful amateur with time on their hands can usefully do.

For Nunberg, Google's haste to complete the project is the problem - it prefers to get it finished, for competitive reasons, rather than devote expert resources to getting it right.

"People at Google are also saying, 'Let's crowdsource this,' but that is a stupid idea. You and I are both smart, knowledgeable people, but I wouldn't trust either of us to do the skilled work of cataloging a 1890 edition of Madame Bovary," Nunberg told Salon.

He suggests that Google devote more expert resources to the problem - which is expensive - and that librarians, who have up until now trusted Google Books to get it right, become more feisty and pro-active. ®

Related link

Google Book errors, illustrated [PDF, 1.6MB]
