
Wikidata makes Wikipedia a database. Let the fun begin

What are the ten largest cities with female mayors? For the first time, machines can answer

Aliens have invaded! They threaten to destroy the Earth, unless we can answer a simple question: What are the ten largest cities on the planet with female mayors?

Where would you even begin to answer that question? Back in the pre-Web era, you’d visit the information desk in the library, summon the reference librarian, pull every almanac for every nation and do a lot of photocopying and list-compiling.

Long before that, those aliens would have given us failing marks, leaving Earth a dissipating cloud of vapor. Oops.

Even now, in the era of information-at-your-fingertips and Wikipedia, this is a hard question to answer. The information in Wikipedia is designed to be digested by humans, and humans are slow and error-prone. We’d get to an answer faster than before, but we’re still talking hours - and there are likely to be at least a few mistakes. ZZZAP, vapor, great disturbance in the Force, etc.

Actually, I’d hope someone would be smart enough to feed all of that data into IBM’s Watson, which could then structure it into something relational and quickly searchable. A few JOINs later, and BLAM! - you’ve got an answer even before those aliens get an itch in their trigger tentacles.

That’s part of IBM’s sales pitch for Watson - it translates the human realm of information into something that can be understood and accelerated by software. It’s a worthy effort, and IBM will do bang-up business in years to come, just on that feature alone.

But there’s a lot of data in this world, and a lot of questions we could answer - if only we could bring the data to hand.

That, in essence, is the mission statement of the Wikidata project.

We float in a sea of data - not just the data companies like Google and Facebook gather about us, or the data that intelligence agencies hoover up about everyone, but the vast ocean of public data - shared out of generosity, or via mandate - that sits mostly unused because it’s just too hard to get at.

Wikidata offers two possible solutions to this problem: dump the nicely-structured data and metadata into Wikidata, which will then serve it up to others, or create the interfaces that allow any Wikidata user to access the data in your own systems. Centralised or decentralised: pick your flavour of data openness.

Wikidata already has sixteen million ‘items’ of data - much of which is structured metadata pointing back into Wikipedia. That’s hugely important, because rather than relying on an AI to suck in and understand Wikipedia, Wikidata presents Wikipedia as structured data. Wikipedia, via Wikidata, is now a giant database.
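To make that concrete, here is a minimal sketch of how the aliens' question might be put to Wikidata from Python. It rests on assumptions that go beyond this article: that the query goes to Wikidata's public SPARQL endpoint, that "mayor" is close enough to the head-of-government property (P6), and that "city" means anything classed under Q515. The function names and the User-Agent string are made up for illustration. Treat the results as a rough cut, since a city with several population figures on record could show up more than once.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Wikidata's public SPARQL endpoint (not named in the article).
ENDPOINT = "https://query.wikidata.org/sparql"

# IDs used below: P31 instance of, P279 subclass of, Q515 city,
# P6 head of government, P21 sex or gender, Q6581072 female,
# P1082 population.
QUERY = """
SELECT ?cityLabel ?mayorLabel ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;
        wdt:P6 ?mayor ;
        wdt:P1082 ?population .
  ?mayor wdt:P21 wd:Q6581072 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?population)
LIMIT 10
"""


def fetch(query=QUERY, endpoint=ENDPOINT):
    """Send the query over HTTP and return the decoded JSON response."""
    url = endpoint + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url,
        headers={
            # Hypothetical client name, used only to identify this sketch.
            "User-Agent": "female-mayors-sketch/0.1",
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def top_cities(results):
    """Flatten SPARQL JSON results into (city, mayor, population) tuples."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append(
            (
                binding["cityLabel"]["value"],
                binding["mayorLabel"]["value"],
                int(float(binding["population"]["value"])),
            )
        )
    return rows
```

Calling top_cities(fetch()) would return the rows in the order the query sorts them, largest population first. The heavy lifting happens in the query itself: the machine-readable statements about cities, mayors and populations are already there, so nobody has to read a single article.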

That presents some really interesting possibilities. Smartphone apps can make inquiries to Wikipedia, based on location, to find out what’s around. Where a city has integrated its own data into Wikidata, that search becomes much richer, as the city exposes itself in detail.
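As a rough idea of what such a what's-around-me inquiry could look like, the sketch below builds a query for Wikidata's SPARQL service using its geospatial "around" search. The function name, its parameters and the default radius are hypothetical, and it assumes that the things you want to find carry a coordinate location (property P625). It only builds the query text; sending it off is left to whatever HTTP client the app already uses.

```python
def nearby_query(lat, lon, radius_km=1.0, limit=20, language="en"):
    """Build a SPARQL query for items located within radius_km of a point."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    if radius_km <= 0:
        raise ValueError("radius must be positive")

    # WKT points are written longitude first, then latitude.
    return f"""
SELECT ?place ?placeLabel ?dist WHERE {{
  SERVICE wikibase:around {{
    ?place wdt:P625 ?location .
    bd:serviceParam wikibase:center "Point({lon} {lat})"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "{radius_km}" .
    bd:serviceParam wikibase:distance ?dist .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}
ORDER BY ?dist
LIMIT {int(limit)}
"""
```

A phone that knows its own position would call something like nearby_query(48.8584, 2.2945, radius_km=0.5) and get back query text asking for the closest items first.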

Put all of that into a bit of augmented reality kit, like Microsoft’s Hololens, and you’d really have something interesting. The transformation wrought by Wikidata is profound. Wikipedia has shown the importance of instant access to factual knowledge, but it has essentially ignored our machines. Wikidata closes that gap, giving our connected devices the same capacities to learn the facts of the world in real-time, just as we do.

This world-as-database has been a long time coming. In some ways it’s the fulfillment of Sir Tim Berners-Lee’s ‘semantic web’, where copious metadata acts as a guide for both humans and machines. In other ways it’s the final realisation of big data - one database to rule them all.

In contrast to Wikipedia, it’s easy to imagine an ecology of very lucrative apps built atop Wikidata. (Follow that link to learn why that generated attention from all sorts of vested interests.) The data may be free and universally accessible, but how that data gets put to work to solve a problem will be a fertile area of commerce for at least the next generation.

Wikipedia has wisely resisted the siren’s lure of advertising, staying true to its ideals. Paradoxically, if Wikidata stays true to its own ideals of connecting and opening all the data, it amplifies the value of the commercial ecosystem it supports.

There was a time, back around ten years ago, when everything factual was being sucked into Wikipedia. It grew from 14,000 articles to tens of millions. That same virtuous cycle is nearly upon us with Wikidata, as everyone comes to understand the power - and the huge commercial value - of connecting and sharing data designed to be efficiently searched, explored, and built upon. ®
