
EMC cranks Greenplum database to 4.2

Goosing Hadoop links and warehouse backups


Big data is not just a problem because it is big, but because it keeps swelling. That goes as much for traditional data warehouses as it does for more modern Hadoop MapReduce data munchers. And with the latest update of its eponymous database, the Greenplum division of IT conglomerate EMC has made some tweaks to its homegrown database to make wrestling with big data a bit easier.

The Greenplum Database is available in two forms, just like its predecessor. One runs on Greenplum's own hardware appliance (which is based on hardware from an unspecified server OEM partner), and the other is a software-only distribution that customers can run on any x86 server machine that supports Red Hat Enterprise Linux, Oracle Solaris, or Apple's OS X.

The Greenplum database is a parallelized and heavily customized version of the open source PostgreSQL database, and has been optimized for ad hoc queries instead of transaction processing. It is a massively parallel, shared-nothing database and has "polymorphic data storage", which lets database administrators carve up a range of data in a database table and choose the row or column orientation a query should use, as well as the storage, execution, or compression settings that should apply to that segment of data.
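To see why the choice of row or column orientation matters for a query, here is a toy Python sketch. It is not Greenplum code and says nothing about the product's on-disk format; the table, the field names, and the "values touched" counter are all made up for illustration. It simply totals one field from the same three records stored both ways.

```python
# Toy illustration only: hypothetical data, not Greenplum's storage format.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 30.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(records, field):
    """Row orientation: each whole record is read to get at one field."""
    total = 0.0
    values_touched = 0
    for record in records:
        values_touched += len(record)  # every field in the row is read
        total += record[field]
    return total, values_touched


def total_from_columns(columns, field):
    """Column orientation: only the requested column is read."""
    column = columns[field]
    return sum(column), len(column)


columns = to_columns(rows)
row_total, row_touched = total_from_rows(rows, "amount")
col_total, col_touched = total_from_columns(columns, "amount")
print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts give the same total, but the column layout reads a third as many values here. That is the general reason analytic queries that touch a few fields tend to favor columns, while workloads that fetch whole records tend to favor rows.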

Like other data warehouse engines, the Greenplum Database is a heavy user of data compression to speed up queries and reduce disk storage capacity needs.

Greenplum's Hadoop distributions are similarly available on the same hardware appliances – with some tweaks – as well as in a software-only product that can run on any Linux-based x86 server.

Back in December, Greenplum unveiled its long-range plan to mash up its data warehousing and Hadoop stacks to create a giant data muncher called the Unified Analytics Platform.

Greenplum Database 4.2

The building blocks of the Greenplum Database

With Greenplum Database 4.2, the EMC unit is doing a few different things. First, as it promised back in December, Greenplum has tweaked its parallel data warehouse loading technology, called gNET, so it can import and export data in parallel between a warehouse and a Hadoop cluster.

Equally significant, gNET in the 4.2 release can reach into the Hadoop cluster and query data right where it is sitting, using some of the Hadoop cluster's resources instead of burdening the iron running the data warehouse.

"This used to be a read-only tool," explains Mike Maxey, senior director of product marketing at Greenplum. "Now it leaves more data in Hadoop and does more processing inside Hadoop."

Greenplum Database 4.2 also includes a new management console called Command Center, which replaces an older tool called PerfMon that database admins have been using up until now. Maxey says Command Center, unlike PerfMon, is a Web-based tool and has more functions that database admins have been looking for, such as the ability to start, stop and initialize databases on the fly, recover and rebalance database mirrors, and search, prioritize, or cancel any query on the system.

Command Center is also able to reach out across the network into a Greenplum HD or MR Hadoop cluster and check the state of the cluster from inside this console. "Over time, Command Center will evolve to have broader and deeper coverage of the database and Hadoop platforms," says Maxey.

Command Center is available initially with the Data Computing Appliance 1.2 system, and will eventually be available in the software-only distribution.

The 4.2 release of the database has the requisite performance tweaks, including dynamic partition elimination and query memory optimization. The database also has a new package manager that does automatic installation and updating of extensions to the database on a running system with multiple nodes and different features running hither and yon.
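Dynamic partition elimination, in general terms, means deciding which partitions of a big table to scan only once the query is running and the values from the other side of a join are known. The toy Python sketch below shows the idea and nothing more: the tables, the month-based partition keys, and the function names are hypothetical, and it makes no claim about how Greenplum's planner or executor does the job.

```python
# Toy illustration only: hypothetical tables, not Greenplum internals.
# Fact table, partitioned by month ("YYYY-MM").
sales_partitions = {
    "2012-01": [("2012-01-15", 100), ("2012-01-20", 50)],
    "2012-02": [("2012-02-03", 70)],
    "2012-03": [("2012-03-09", 20), ("2012-03-30", 10)],
}

# Small dimension table: which days belonged to which promotion.
promo_days = [
    {"day": "2012-01-20", "promo": "winter"},
    {"day": "2012-03-09", "promo": "spring"},
    {"day": "2012-03-30", "promo": "spring"},
]


def partitions_needed(dim_rows, promo):
    """Filter the dimension first, then derive the partition keys to scan."""
    days = {row["day"] for row in dim_rows if row["promo"] == promo}
    keys = sorted({day[:7] for day in days})  # "YYYY-MM" prefix of each day
    return days, keys


def promo_sales(partitions, dim_rows, promo):
    """Sum sales on promo days, scanning only partitions that can match."""
    days, keys = partitions_needed(dim_rows, promo)
    total = 0
    scanned = []
    for key in keys:
        if key not in partitions:
            continue
        scanned.append(key)
        total += sum(amount for day, amount in partitions[key] if day in days)
    return total, scanned


total, scanned = promo_sales(sales_partitions, promo_days, "spring")
print(total, scanned)
```

The partition list cannot be worked out from the query text alone, because the query names a promotion rather than a date range. It falls out of the dimension rows at run time, and the other two months are never read.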

Finally, EMC has integrated its Data Domain Boost data de-duplication backup software with Greenplum Database 4.2. In benchmark tests, EMC was able to back up a 173TB data warehouse in under eight hours. This was achieved by spreading parts of the Data Domain de-duping operations over the data warehouse nodes in an appliance, thus parallelizing the massive job and making the backup run faster because the de-duping was faster.
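The general shape of that trick can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Data Domain Boost itself: the chunking, the SHA-256 fingerprints, the in-memory `store` standing in for the backup target, and every function name here are hypothetical. Each "node" fingerprints its own chunks in parallel, and only chunks the target has not already seen are stored.

```python
# Simplified sketch of source-side de-duplication; not the DD Boost protocol.
import hashlib
from concurrent.futures import ThreadPoolExecutor


def fingerprint(chunk: bytes) -> str:
    """Content hash used to recognize identical chunks."""
    return hashlib.sha256(chunk).hexdigest()


def node_fingerprints(chunks):
    """Work done locally on one warehouse node: hash each of its chunks."""
    return [(fingerprint(chunk), chunk) for chunk in chunks]


def backup(nodes):
    """Fingerprint on all nodes in parallel, then store only new chunks."""
    with ThreadPoolExecutor() as pool:
        per_node = list(pool.map(node_fingerprints, nodes))

    store = {}  # fingerprint -> chunk; stands in for the backup target
    total = 0
    sent = 0
    for results in per_node:
        for digest, chunk in results:
            total += 1
            if digest not in store:  # target has not seen this chunk yet
                store[digest] = chunk
                sent += 1
    return store, sent, total


# Three hypothetical nodes with overlapping data.
nodes = [
    [b"alpha", b"beta", b"alpha"],
    [b"beta", b"gamma"],
    [b"gamma", b"delta", b"alpha"],
]
store, sent, total = backup(nodes)
print(sent, total)
```

In this toy run, eight chunks are fingerprinted but only four distinct ones are stored. The hashing, which is the CPU-heavy part, is spread across the nodes rather than piled onto the backup target.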

At the Strata Conference today in Santa Clara, California, in addition to launching the new database release, Greenplum is also talking up its ability to run Greenplum MR Hadoop atop Cisco Systems' C-Series rack-based servers. El Reg already told you all about that two weeks ago. ®
