EMC cranks Greenplum database to 4.2

Goosing Hadoop links and warehouse backups

Big data is not just a problem because it is big, but because it keeps swelling. That goes as much for traditional data warehouses as it does for more modern Hadoop MapReduce data munchers. And with the latest update of its eponymous database, the Greenplum division of IT conglomerate EMC has made some tweaks to its homegrown database to make wrestling with big data a bit easier.

The Greenplum Database is available in two forms, just like its predecessor. One runs on Greenplum's own hardware appliance (which is based on hardware from an unspecified server OEM partner), and the other is a software-only distribution that customers can run on any x86 server machine that supports Red Hat Enterprise Linux, Oracle Solaris, or Apple's OS X.

The Greenplum database is a parallelized and heavily customized version of the open source PostgreSQL database, and has been optimized for ad hoc queries instead of transaction processing. It is a massively parallel, shared-nothing database and has "polymorphic data storage", which lets database administrators carve up a range of data in a database table and choose the row or column orientation a query should use, as well as the storage, execution, and compression settings that should apply to that segment of data.
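To make the row-versus-column choice concrete, here is a minimal Python sketch of storing the same records in either orientation, one setting per table segment. It is an illustration of the general idea only, not Greenplum's implementation; `PartitionSpec`, `store`, and the segment names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PartitionSpec:
    """Hypothetical per-segment storage settings (illustrative names only)."""
    name: str
    orientation: str  # "row" or "column"
    compression: str  # label only; no compression is performed in this sketch


def store(rows, spec):
    """Lay out a list of dict records according to the segment's orientation."""
    if spec.orientation == "row":
        # Row orientation: each record stays together, which suits
        # lookups that fetch whole rows.
        return [tuple(r.values()) for r in rows]
    if spec.orientation == "column":
        # Column orientation: values of one column sit together, which
        # suits scans that touch only a few columns.
        columns = {}
        for r in rows:
            for key, value in r.items():
                columns.setdefault(key, []).append(value)
        return columns
    raise ValueError(f"unknown orientation: {spec.orientation}")


rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}]
recent = PartitionSpec("sales_recent", "row", "none")
archive = PartitionSpec("sales_archive", "column", "zlib")

print(store(rows, recent))
print(store(rows, archive))
```

In this sketch the recent segment keeps whole records together while the archive segment groups values by column, which is the kind of per-segment choice the paragraph above describes.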

Like other data warehouse engines, the Greenplum Database is a heavy user of data compression to speed up queries and reduce disk storage capacity needs.

Greenplum's Hadoop distributions are similarly available on the same hardware appliances – with some tweaks – and as a software-only product that can run on any Linux-based x86 server.

Back in December, Greenplum unveiled its long-range plan to mash up its data warehousing and Hadoop stacks to create a giant data muncher called the Unified Analytics Platform.

Greenplum Database 4.2

The building blocks of the Greenplum Database

With Greenplum Database 4.2, the EMC unit is doing a few different things. First, as it promised back in December, Greenplum has tweaked its parallel data warehouse loading technology, called gNET, so it can import and export data in parallel between a warehouse and a Hadoop cluster.

Equally significant, gNET in the 4.2 release of the relational database can reach into the Hadoop cluster and query data right where it is sitting, using some of the Hadoop cluster's resources instead of burdening the iron running the data warehouse.

"This used to be a read-only tool," explains Mike Maxey, senior director of product marketing at Greenplum. "Now it leaves more data in Hadoop and does more processing inside Hadoop."

Greenplum Database 4.2 also includes a new management console called Command Center, which replaces an older tool called PerfMon that database admins have been using up until now. Maxey says Command Center, unlike PerfMon, is a Web-based tool and has more functions that database admins have been looking for, such as the ability to start, stop and initialize databases on the fly, recover and rebalance database mirrors, and search, prioritize, or cancel any query on the system.

Command Center is also able to reach out across the network into a Greenplum HD or MR Hadoop cluster and check the state of the cluster from inside this console. "Over time, Command Center will evolve to have broader and deeper coverage of the database and Hadoop platforms," says Maxey.

Command Center is available initially with the Data Computing Appliance 1.2 system, and will eventually be available in the software-only distribution.

The 4.2 release of the database has the requisite performance tweaks, including dynamic partition elimination and query memory optimization. The database also has a new package manager that does automatic installation and updating of extensions to the database on a running system with multiple nodes and different features running hither and yon.
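For readers unfamiliar with the term, dynamic partition elimination generally means deciding at run time, from values discovered while the query executes, which partitions of a table need to be scanned at all. The following Python sketch shows that general idea under simplifying assumptions; it is not Greenplum's implementation, and the function name, partition names, and key ranges are hypothetical.

```python
def partitions_to_scan(partitions, runtime_keys):
    """Return the partitions whose [low, high) key range holds a runtime key.

    `partitions` maps a partition name to its (low, high) key range.
    `runtime_keys` stands in for join keys that only become known while
    the query runs, for example after filtering a dimension table.
    """
    return [
        name
        for name, (low, high) in partitions.items()
        if any(low <= key < high for key in runtime_keys)
    ]


# Hypothetical fact table partitioned by month number, one partition per quarter.
partitions = {
    "p_q1": (1, 4),
    "p_q2": (4, 7),
    "p_q3": (7, 10),
    "p_q4": (10, 13),
}

# Months that survived a filter evaluated at run time.
runtime_keys = {2, 11}

print(partitions_to_scan(partitions, runtime_keys))  # ['p_q1', 'p_q4']
```

Here only two of the four partitions are read, so the scan skips half the table without the query author having to name the partitions up front.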

Finally, EMC has integrated its Data Domain Boost data de-duplication backup software with Greenplum Database 4.2. In benchmark tests, EMC was able to back up a 173TB data warehouse in under eight hours. This was achieved by spreading parts of the Data Domain de-duping operations over the data warehouse nodes in an appliance, thus parallelizing the massive job and making the backup run faster because the de-duping was faster.
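As a back-of-the-envelope check on what that benchmark implies, the short Python sketch below turns the 173TB and eight-hour figures quoted above into an aggregate rate. It assumes decimal terabytes (10^12 bytes) and treats eight hours as the upper bound on elapsed time, so the result is a floor on the logical backup rate, not a measured figure; the function name is hypothetical.

```python
def min_backup_rate_gb_per_s(terabytes, hours):
    """Lower bound on aggregate backup rate in GB/s.

    Assumes decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
    """
    total_bytes = terabytes * 10**12
    seconds = hours * 3600
    return total_bytes / seconds / 10**9


rate = min_backup_rate_gb_per_s(173, 8)
print(f"at least {rate:.2f} GB/s, or {173 / 8:.1f} TB per hour")
# prints: at least 6.01 GB/s, or 21.6 TB per hour
```

Because de-duplication means not every byte has to be written to the backup target, this is the rate at which warehouse data was processed, not necessarily the rate at which data was written.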

At the Strata Conference today in Santa Clara, California, in addition to launching the new database release, Greenplum is also talking up its ability to run Greenplum MR Hadoop atop Cisco Systems' C-Series rack-based servers. El Reg already told you all about that two weeks ago. ®
