Feeds

EMC cranks Greenplum database to 4.2

Goosing Hadoop links and warehouse backups

Top three mobile application threats

Big data is not just a problem because it is big, but because it keeps swelling. That goes as much for traditional data warehouses as it does for more modern Hadoop MapReduce data munchers. And with the latest update of its eponymous database, the Greenplum division of IT conglomerate EMC has made some tweaks to its homegrown database to make wrestling with big data a bit easier.

The Greenplum Database is available in two forms, just like its predecessor. One runs on Greenplum's own hardware appliance (which is based on hardware from an unspecified server OEM partner), and the other is a software-only distribution that customers can run on any x86 server machine that supports Red Hat Enterprise Linux, Oracle Solaris, or Apple's OS X.

The Greenplum database is a parallelized and heavily customized version of the open source PostgreSQL database, and has been optimized for ad hoc queries instead of transaction processing. It is a massively parallel, shared-nothing database and has "polymorphic data storage" to allow database administrators to carve up a range of data in a database table and choice the row or column orientation a query should use as well as what storage, execution, or compression settings that should apply to this segment of data.

Like other data warehouse engines, the Greenplum Database is a heavy user of data compression to speed up queries and reduce disk storage capacity needs.

Greenplum's Hadoop distributions are similarly available on the same hardware appliances – with some tweaks – as well as a software-only product that can run on any Linux-based x86 server.

Back in December, Greenplum unveiled its long-range plan to mashup its data warehousing and Hadoop stacks to create a giant data muncher called the Unified Analytics Platform.

Greenplum Database 4.2

The building blocks of the Greenplum Database

With Greenplum Database 4.2, the EMC unit is doing a few different things. First, on as it promised back in December, Greenplum has tweaked its parallel data warehouse loading technology, called gNET, so it can import and export data in parallel from a warehouse to a Hadoop cluster.

Equally significant is that the gNET feature in the 4.2 release of the relational database actually allows for gNET to reach into the Hadoop cluster and query data right where it is sitting, using some of the Hadoop cluster resources instead of burdening the iron running the data warehouse.

"This used to be a read-only tool," explains Mike Maxey, senior director of product marketing at the Greenplum. "Now it leaves more data in Hadoop and does more processing inside Hadoop."

Greenplum Database 4.2 also includes a new management console called Command Center, which replaces an older tool called PerfMon that database admins have been using up until now. Maxey says Command Center, unlike PerfMon, is a Web-based tool and has more functions that database admins have been looking for, such as the ability to start, stop and initialize databases on the fly, recover and rebalance database mirrors, and search, prioritize, or cancel any query on the system.

Command Center is also able to reach out across the network into a Greenplum HD or MR Hadoop cluster and check the state of the cluster from inside this console. "Over time, Command Center will evolve to have broader and deeper coverage of the database and Hadoop platforms," says Maxey.

The initial release of Command Center is available initially with the Data Computing Appliance 1.2 system, and will eventually be available in the software-only distribution.

The 4.2 release of the database has the requisite performance tweaks, including dynamic partition elimination and query memory optimization. The database also has a new package manager that does automatic installation and updating of extensions to the database on a running system with multiple nodes and different features running hither and yon.

Finally, EMC has integrated its Data Domain Boost data de-duplication backup software with Greenplum Database 4.2. In benchmark tests, EMC was able to back up a 173TB data warehouse in under less than eight hours. This was achieved by spreading parts of the Data Domain de-duping operations over the data warehouse nodes in an appliance, thus parallelizing the massive job and making the backup run faster because the de-duping was faster.

At the Strata Conference today in Santa Clara, California, in addition to launching the new database release, Greenplum is also talking up its ability to run Greenplum MR Hadoop atop Cisco Systems' C-Series rack-based servers. El Reg already told you all about that two weeks ago. ®

Seven Steps to Software Security

More from The Register

next story
NO MORE ALL CAPS and other pleasures of Visual Studio 14
Unpicking a packed preview that breaks down ASP.NET
Captain Kirk sets phaser to SLAUGHTER after trying new Facebook app
William Shatner less-than-impressed by Zuck's celebrity-only app
Apple fanbois SCREAM as update BRICKS their Macbook Airs
Ragegasm spills over as firmware upgrade kills machines
Cheer up, Nokia fans. It can start making mobes again in 18 months
The real winner of the Nokia sale is *drumroll* ... Nokia
Mozilla fixes CRITICAL security holes in Firefox, urges v31 upgrade
Misc memory hazards 'could be exploited' - and guess what, one's a Javascript vuln
Put down that Oracle database patch: It could cost $23,000 per CPU
On-by-default INMEMORY tech a boon for developers ... as long as they can afford it
Google shows off new Chrome OS look
Athena springs full-grown from Chromium project's head
prev story

Whitepapers

Top three mobile application threats
Prevent sensitive data leakage over insecure channels or stolen mobile devices.
Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.
Designing a Defense for Mobile Applications
Learn about the various considerations for defending mobile applications - from the application architecture itself to the myriad testing technologies.
Build a business case: developing custom apps
Learn how to maximize the value of custom applications by accelerating and simplifying their development.