Elders tell cluster tool Apache Spark it's time to quit chillin' in the crib

Hadoop Swiss Army knife software graduates from Incubator to full-blown project

The Apache Software Foundation has promoted a fast data-processing tool out of the Apache Incubator, a further sign of the maturity of the Hadoop family.

Apache Spark is a fast processing layer for computing on data stored in the open-source Hadoop Distributed File System (HDFS) or in other shared file systems such as NFS. It supports Scala, Java, and Python. In some tests it has run up to 100 times faster than Hadoop MapReduce on in-memory data sets, and up to 10 times faster on data held on disk.

On Sunday, Spark was unanimously voted out of the Incubator. Those voting included Hadoop luminaries such as the technology's creator, Doug Cutting.

Now that Spark has been promoted, a project management committee will be established for the software, and Databricks co-founder and former AMPLab PhD student Matei Zaharia will be appointed to the role of "Vice President, Apache Spark".

Like Hadoop, Spark has become the foundation for other data-processing engines, such as Shark for SQL-on-Hadoop queries, MLlib for machine learning, Spark Streaming for streaming data, and GraphX for graph processing.

The technology's users include Baidu, Databricks, IBM's Almaden research group, Trend Micro, Yahoo! and Alibaba.

The graduation of Apache Spark caps off a vertiginous rise for the data-processing system, which was created at the University of California at Berkeley's AMPLab in 2009 and was published as open source in 2010.

Since then, the system has gained a vigorous developer community, with more than 120 developers from 25 companies contributing source code. There seems to be enough activity around the software for businesses to smell money: last week, Hadoop hothouse Cloudera announced commercial support for the tool. ®