IBM lobs 3,500 staffers at Apache Spark

Big Blue researchers pile into cluster parade

IBM has thrown its full weight behind Spark, Apache’s open-source cluster computing framework.

Spark will form the basis of all of Big Blue's analytics and commerce platforms and its Watson Health Cloud. The framework will also be sold as a service on its Bluemix cloud.

IBM will commit more than 3,500 of its researchers and developers to Spark-related projects and has promised a Spark Technology Center in San Francisco, California, where data scientists and developers can work with IBM designers and architects.

The giant also committed to releasing, under open-source terms, its SystemML family of machine-learning libraries.

Spark was invented by researchers at the University of California at Berkeley in 2009, under Matei Zaharia, and donated to Apache in 2013.

Written in Java, Scala and Python, Spark is an in-memory system for processing large data sets. It consists of scheduling and dispatching, a SQL-style programming language, a machine-learning framework and a distributed graph-processing framework.

Spark can scale to more than 8,000 production nodes and, while it works with Hadoop and MapReduce, is claimed to also be faster on certain workloads. Up until last year, Spark had just 465 contributors.

The presence of IBM can make or break open-source projects.

IBM adopted the Eclipse framework early on, making it the basis of its Rational programming tools. Serving as the foundation of IBM’s tools helped establish Eclipse as one of the industry’s biggest development environments, behind Microsoft’s Visual Studio, and guaranteed an entire ecosystem of ISVs building Eclipse plug-ins.

It’s been a virtuous circle: IBM is freed from having to maintain the IDE plumbing, ISVs and devs get an open, pluggable tools platform, and IBM benefits from advances and partners.

At the other extreme, you have Harmony – also an Apache project, this one building an independent alternative to the Java from the now non-existent Sun Microsystems.

IBM threw in its lot with Harmony because it vied with Sun for stewardship of Java.

After Sun ceased to exist, bought by Oracle, IBM withdrew from Harmony in October 2010 to join the OpenJDK project with Apple and Oracle.

Drained of its biggest backer, Harmony shut down 12 months later.

Oracle sought to make amends of a kind with Apache in 2011 by punting its OpenOffice productivity suite under the open-source project shop’s auspices.

Announcing its backing for Apache's Spark Monday, IBM painted Spark as a platform for data and analytics, the analogy being Linux – which IBM also contributes to – as a platform for apps.

The parallel, though, would seem closer to Eclipse. ®