IBM serves System S streaming super

Prototype lands in Canada

Information - which is what happens to data when you filter out useless stuff and add context so human beings can make decisions - cannot be easily generated or quickly integrated with business processes. The advent of the Internet and its various forms of media complicates the task of turning data into information, or what government people are fond of calling "actionable intelligence." Mashing up various text, video, and audio streams with databases and other data storehouses is a grand challenge, one that needs something that looks and smells like a supercomputer.

Which is why the techies at IBM Research have been working for more than six years on a project called the System S. This stream computing system runs on IBM's BlueGene massively parallel supercomputing iron, but it puts the iron to work running very different kinds of software than is typically used in a supercomputer to do weather or financial modeling.

IBM started talking publicly, and very sketchily, about System S back in June 2007, and this week, the company announced that TD Securities, the investment banking arm of Toronto Dominion Bank, has taken the first prototype of the System S machine, which runs a bit of software that Big Blue calls InfoSphere Streams atop a BlueGene/P supercomputer.

The streaming software - which was created at the T.J. Watson Research Center, in Hawthorne, New York - is designed to do more than run complex queries against data that is a bit more amorphous than fields stored in a database. As the streaming part of its name suggests, it is also meant to continuously update that data as it changes in real time.

As IBM explains in this whitepaper, in a normal information system, you ask a bunch of questions of a relatively static database and you get data that you need to make a decision. With a streaming system, huge amounts of raw information from as many sources as you can stomach are streamed into the box, and the InfoSphere Streams software keeps a database of your queries and constantly updates the data it provides to decision makers.

In a conventional system, you ask a database to list all the people with the last name of Smith who live within 100 miles of the center of the city. With the System S, you ask that question once, and it taps all the available information you feed into it - government databases, Web traffic, email, GPS data, sensors, badge swipes, video feeds, audio feeds, what have you - and it tells you in real time how the Smiths identified in the original query are moving around within that 100-mile radius (presumably including when Smiths leave and new ones arrive).
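The difference between a one-shot query and a standing one is easier to see in code. What follows is a minimal sketch in Python, not InfoSphere Streams code: the names `Sighting`, `StandingQuery`, and `miles_between` are hypothetical, the coordinates are arbitrary, and the real product's interfaces are not described in this article. The query is registered once, and every incoming event updates its result set.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


@dataclass
class Sighting:
    """One event from any feed: who was seen, and where (hypothetical schema)."""
    person_id: str
    last_name: str
    lat: float
    lon: float


class StandingQuery:
    """Registered once, then re-evaluated on every incoming event."""

    def __init__(self, last_name, centre, radius_miles):
        self.last_name = last_name
        self.centre = centre              # (lat, lon) of the city centre
        self.radius_miles = radius_miles
        self.inside = {}                  # person_id -> last known (lat, lon)

    def on_event(self, s):
        """Update the result set; return 'arrived', 'moved', 'left' or None."""
        if s.last_name != self.last_name:
            return None
        in_range = miles_between(*self.centre, s.lat, s.lon) <= self.radius_miles
        was_inside = s.person_id in self.inside
        if in_range:
            self.inside[s.person_id] = (s.lat, s.lon)
            return "moved" if was_inside else "arrived"
        if was_inside:
            del self.inside[s.person_id]
            return "left"
        return None


# Ask the question once, then keep feeding it events.
query = StandingQuery("Smith", centre=(43.6532, -79.3832), radius_miles=100)
feed = [
    Sighting("p1", "Smith", 43.70, -79.40),      # a few miles from the centre
    Sighting("p2", "Jones", 43.70, -79.40),      # wrong name, ignored
    Sighting("p1", "Smith", 45.4215, -75.6972),  # more than 200 miles away
]
changes = [query.on_event(s) for s in feed]
print(changes)  # ['arrived', None, 'left']
```

A database would need the question asked again after every update. Here the question stays put and the data moves past it, which is the inversion the streaming approach relies on.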

Personally, I can't imagine why anyone would want such information, but remember System S when you are tweeting your freaking brains out like a teenager or sending text messages over your cell phone.

Anyway, the System S super is not just about surveillance, and TD Bank isn't interested in the box for that reason. But the same InfoSphere Streams software can be used to consume vast amounts of news feeds, financial information databases, and other sources of data to make decisions about stock trades. TD Securities says that it has in fact put the System S through its paces and created an options trading system front-ended by the super that can process 21 times more information than the prior systems that the bank's securities trading experts have put together. (That doesn't mean people are 21 times smarter at using that data, unfortunately.)

According to the Financial Information Forum, the amount of data generated by the securities and options trading systems in the world has been doubling every year since 2003, and TD Securities took the System S prototype from Big Blue because it wants to create an options trading system that will be able to cope with the data streams it expects two to three years from now. And IBM slapped the InfoSphere Streams software on the BlueGene/P, its most scalable server, to give TD Securities plenty of scalability room. That said, IBM says that the software works just fine on anywhere from 50 to 500 server nodes and that it did development, testing, and production on a one-rack BlueGene/P machine.
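The doubling claim is easy to turn into numbers. This is a back-of-the-envelope sketch in Python with hypothetical function names. It assumes the annual doubling simply continues, and it treats the 21-fold capacity figure as headroom over today's data volume, which the bank has not actually said.

```python
from math import log2


def growth_factor(years, doubling_period_years=1.0):
    """How many times larger the data stream is after `years`."""
    return 2 ** (years / doubling_period_years)


def years_of_headroom(capacity_multiple, doubling_period_years=1.0):
    """Years until a stream that keeps doubling outgrows the extra capacity."""
    return log2(capacity_multiple) * doubling_period_years


print(growth_factor(2), growth_factor(3))  # 4.0 8.0
print(round(years_of_headroom(21), 1))     # 4.4
```

On those assumptions, the streams TD Securities expects two to three years out would be four to eight times today's volume, and a system with 21 times the capacity would last a little over four years before it needed help.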

The BlueGene/P super, you will recall, puts four 850 MHz PowerPC 450 cores onto a single processor card and links them by symmetric multiprocessing so they can share 2 GB of DDR2 main memory. A single rack has 1,024 of these four-core processor nodes, if you can believe it, and would be rated at around 13.9 teraflops of number-crunching performance if it were running simulations.
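The 13.9 teraflops figure can be checked from the clock speed and core counts above. The sketch below is Python arithmetic, not a benchmark. The clock, cores per node, and nodes per rack come from this article. The figure of four floating point operations per core per clock (two floating point pipelines, each doing a fused multiply-add) is an assumption the article does not state.

```python
CLOCK_HZ = 850e6       # from the article: 850 MHz
CORES_PER_NODE = 4     # from the article: four cores per node
NODES_PER_RACK = 1024  # from the article: 1,024 nodes per rack
FLOPS_PER_CYCLE = 4    # assumption: 2 FPU pipelines x fused multiply-add


def peak_teraflops(nodes=NODES_PER_RACK):
    """Theoretical peak for `nodes` nodes, in teraflops."""
    return CLOCK_HZ * FLOPS_PER_CYCLE * CORES_PER_NODE * nodes / 1e12


print(f"{peak_teraflops():.1f} teraflops per rack")  # 13.9 teraflops per rack
```

That is a theoretical peak. Real workloads, and certainly a stream-processing workload that is not mostly floating point math, will see something different.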

The prototype options trading system built atop the System S setup is able to crunch 5 million options valuations per second, which is 20 times the record for this kind of trading, apparently. So System S can consume 21 times the data and do options trading 20 times faster. Milliseconds are millions of dollars in this racket, so it is hard to imagine IBM won't be selling these machines to every financial services firm very shortly.

This prototype System S machine installed at TD Securities runs Red Hat's Fedora 8 development Linux for PowerPC chips, which has been tweaked to support BlueGene hardware and software extensions. And strictly speaking, the InfoSphere Streams software is not supported on BlueGene/P iron. But clearly, if you have money, IBM has the support.

You can find a little more detail about the System S and the InfoSphere Streams software here. IBM plans to offer commercial versions of this platform in the first half of 2010, and my guess is that it will be on Power7-based server platforms, not BlueGene/P or its kickers. ®
