Feeds

How to build your own Watson Jeopardy! supermachine

To rule humanity, download the following open source code...

Secure remote control for conventional and virtual desktops

If you don't want your own Watson question-and-answer machine after watching the supercomputer whup the human race on Jeopardy! last week, you must be a lawyer. Only lawyers think they already have all the answers.

But if you grew up watching Robbie the Robot in Lost in Space, HAL in 2001: A Space Odyssey, the unnamed but certainly capitalized Computer in Star Trek, R2D2 in Star Wars, Ahh-nold in The Terminator, and Number Six in Battlestar Galactica – we'll stop now – you desperately want a Watson: something that can answer all your questions and maybe even rule the world. So why not build your own Watson-style QA machine?

As it turns out, the basic foundations are there for the taking.

Let's start with the iron – it really isn't that much hardware, after all. With the beta version of the Watson software, IBM started out with a few racks of its BlueGene/P parallel supercomputers, a grandson of the Deep Blue RS/6000 SP PowerParallel machine that played a chess tournament against Gary Kasparov – and beat him – back in 1997. But because the Watson effort was not just a technical challenge, but also a killer marketing campaign for the current Power7-based Power Systems lineup, Big Blue eventually switched the Watson DeepQA software stack to a cluster of Power 750 midrange servers.

To have enough memory and bandwidth to store all the necessary data, IBM put 90 of these Power 750 servers into ten server racks. Each server is configured with four of IBM's eight-core Power7 chips running at 3.55GHz. That gives Watson 2,880 cores and 11,520 threads on which to run its software stack. If the DeepQA software is thread-heavy – and there's every reason to believe it is – you'll need iron with lots of threads.

The 90 servers underpinning the Watson machine had a combined 16TB of main memory, but it looks like that was not evenly distributed across the nodes. The math works out to 182GB per machine, which is a silly, non-base-2 number.

David Gondek – who was in on both the system strategy and algorithms teams behind the "Blue J" project, as Watson was known internally – tells El Reg that the DeepQA system creates an in-memory database of the information that's pumped into the system. The machines are networked together, obviously, but being a software guy, Gondek didn't know what network IBM used. I would guess 40Gb/sec InfiniBand or 10 Gigabit Ethernet with Remote Direct Memory Access (RDMA) support to speed up the communication between nodes. Gondek said that the data that's put on memory and disk is replicated and distributed around the system for both speed and high availability.

The Watson box has 4TB of data capacity, which is not all that much, really. IBM did not say if it was disk drives or flash, but if most of the data used by Watson is stored in main memory, there is no reason to use the more expensive flash technology. But what the heck. Let's use flash anyway so it doesn't run so hot.

Because Linux is the fastest operating system on IBM's Power platforms (at least according to the SPEC family of benchmarks), Big Blue chose a variant of Linux to run on the Power 750 nodes. In this case, Novell's SUSE Linux Enterprise Server 11. SLES has a lot of tuning for HPC workloads and dominates supercomputers – although Red Hat is getting some traction in HPC now as Novell's fate has been uncertain for the past year or so: SGI has certified both SLES 11 and RHEL 6 on its latest massively parallel boxes as well as Windows Server 2008, where it formerly only did SLES on prior iron.

Internet Security Threat Report 2014

More from The Register

next story
The cloud that goes puff: Seagate Central home NAS woes
4TB of home storage is great, until you wake up to a dead device
Azure TITSUP caused by INFINITE LOOP
Fat fingered geo-block kept Aussies in the dark
You think the CLOUD's insecure? It's BETTER than UK.GOV's DATA CENTRES
We don't even know where some of them ARE – Maude
Intel offers ingenious piece of 10TB 3D NAND chippery
The race for next generation flash capacity now on
Want to STUFF Facebook with blatant ADVERTISING? Fine! But you must PAY
Pony up or push off, Zuck tells social marketeers
Oi, Europe! Tell US feds to GTFO of our servers, say Microsoft and pals
By writing a really angry letter about how it's harming our cloud business, ta
SAVE ME, NASA system builder, from my DEAD WORKSTATION
Anal-retentive hardware nerd in paws-on workstation crisis
prev story

Whitepapers

Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Getting started with customer-focused identity management
Learn why identity is a fundamental requirement to digital growth, and how without it there is no way to identify and engage customers in a meaningful way.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Choosing a cloud hosting partner with confidence
Download Choosing a Cloud Hosting Provider with Confidence to learn more about cloud computing - the new opportunities and new security challenges.
Intelligent flash storage arrays
Tegile Intelligent Storage Arrays with IntelliFlash helps IT boost storage utilization and effciency while delivering unmatched storage savings and performance.