
VMware mashes up Hadoop with Spring

Corralling that big data elephant inside a framework


VMware has taken its Spring Java application framework and integrated it with the open source Apache Hadoop distribution to create a mashup that it's calling – somewhat unimaginatively – Spring Hadoop.

VMware might be the juggernaut of server virtualization and a serious contender for building public and private clouds, but the company knows that it has to move up the stack from infrastructure to application platforms if it wants to keep growing. That means leveraging the Spring application development framework to hook into all kinds of modern applications, such as the Hadoop MapReduce data muncher.

Analytics departments are bound to welcome anything that makes it easier to combine Apache Hadoop, its Hadoop Distributed File System (HDFS), and add-ons such as the Pig high-level data analytics language and the Hive data warehouse with its SQL-like ad hoc query language. That's the raison d'être of Spring Hadoop, which is part of the Spring Data "umbrella" that lets the framework hook into relational databases, data grids, key/value stores, document stores, and MapReduce tools such as Hadoop.

VMware will unveil Spring Hadoop at the Strata Conference currently underway in Santa Clara, California, but Costin Leau, a staff engineer at the SpringSource division of VMware, let the elephant out of the bag in a blog post ahead of the formal unveiling of Spring Hadoop 1.0.0.M1.

"Whether one is writing stand-alone, vanilla MapReduce applications, interacting with data from multiple data stores across the enterprise, or coordinating a complex workflow of HDFS, Pig, or Hive jobs, or anything in between, Spring Hadoop stays true to the Spring philosophy offering a simplified programming model and addresses 'accidental complexity' caused by the infrastructure," Leau explained.
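For readers new to the model Leau refers to, a "vanilla" MapReduce job splits work into a map step that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce step that folds each group into a result. The following is a single-process sketch in plain Java, using no Hadoop or Spring Hadoop classes; `WordCountSketch` and its methods are hypothetical names chosen for illustration, and it assumes Java 9 or later (for `List.of`), not the JDK 6 baseline Spring Hadoop targets.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the MapReduce model (not Hadoop's API).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group emitted values by key (Hadoop does this between map and reduce).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the counts collected for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Drive the three phases over all input lines and return word -> count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(pairs).entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {elephant=2, spring=1, the=2}
        System.out.println(wordCount(List.of("the elephant", "The spring elephant")));
    }
}
```

On a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; that plumbing is what frameworks such as Spring Hadoop aim to keep out of application code.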

Spring Hadoop is available for download for free, and is open source under the Apache 2.0 license, just like Apache Hadoop and the Spring framework.

For you Hadoop-heads out there, here are the features that VMware is calling out with the first Spring Hadoop release:

  • Support for configuration, creation, and execution of MapReduce, Streaming, Hive, Pig, and Cascading jobs via the Spring container
  • Comprehensive HDFS data access support through JVM scripting languages (Groovy, JRuby, Jython, Rhino, etc.)
  • Declarative configuration support for HBase
  • Dedicated Spring Batch support for developing workflow solutions that incorporate HDFS operations and all types of Hadoop jobs
  • Support for use with Spring Integration that provides access to a wide range of existing systems using an extensible event-driven pipes and filters architecture
  • Hadoop configuration options and a templating mechanism for client connections to Hadoop
  • Declarative and programmatic support for Hadoop Tools, including FsShell and DistCp
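Spring's template classes, such as JdbcTemplate, generally work by owning a resource's lifecycle and error handling while application code supplies only a callback. Assuming Spring Hadoop's client-connection templates follow the same idea, a minimal sketch of the pattern looks like this; `ClientConnection`, `ConnectionCallback`, and `ClientTemplate` are hypothetical names, not the Spring Hadoop API, and the "connection" is a stand-in that only records what happens to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the template/callback pattern (not Spring Hadoop classes).
public class TemplateSketch {

    // Made-up stand-in for a client connection; it logs open/run/close events.
    static class ClientConnection implements AutoCloseable {
        final List<String> log;

        ClientConnection(List<String> log) {
            this.log = log;
            log.add("open");
        }

        String run(String command) {
            log.add("run " + command);
            return "ok: " + command;
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    // The only piece application code has to write.
    interface ConnectionCallback<T> {
        T doWithConnection(ClientConnection conn) throws Exception;
    }

    // Hypothetical template: opens and closes the connection around each callback.
    static class ClientTemplate {
        final List<String> log = new ArrayList<>();

        <T> T execute(ConnectionCallback<T> callback) {
            try (ClientConnection conn = new ClientConnection(log)) {
                return callback.doWithConnection(conn);
            } catch (Exception e) {
                // Wrap any failure in an unchecked exception so callers need no try/catch,
                // much as JdbcTemplate translates SQLException into DataAccessException.
                throw new IllegalStateException("callback failed", e);
            }
        }
    }

    public static void main(String[] args) {
        ClientTemplate template = new ClientTemplate();
        String result = template.execute(conn -> conn.run("ls /user"));
        // prints ok: ls /user [open, run ls /user, close]
        System.out.println(result + " " + template.log);
    }
}
```

Because the template, not the caller, opens and closes the connection, a callback cannot leak it, which is the sort of infrastructure-induced "accidental complexity" Leau says the framework sets out to remove.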

Take a peek into the reference manual, and you'll see that you need systems configured with JDK 6.0 (the same as required by Hadoop itself), with Spring Framework 3.1 recommended, although the 3.0 release is technically supported as well. Apache Hadoop 0.20.2 works, but VMware recommends the 1.0.0 release. HBase 0.90.x, Hive 0.7.x, and Pig 0.9.x and later are supported. ®
