Welcome to the Petabyte Club

It's not big, it's BI+g data

Hype alert; hype alert; Big Data is coming our way. A new volcano has blasted its way above the surface of the marketing sea, spewing out "big data" messages in enormous flows of thought leader bullshit. What the heck is this big data thing?

EMC says it's to do with handling data at the petabyte scale, where things like compression, deduplication, thin provisioning and management facilities can become much more important because they enable large savings of cash.

Big data in EMC's mind is connected with Atmos (cloud), Greenplum (business intelligence or BI) and Isilon (scale-out NAS) and with its petabyte club customers, those with more than a petabyte of storage.

In other people's minds, such as ESG's Steve Duplessie, big data is connected to BI systems and use of the phrase is a good marketing tactic drawing attention to Oracle and obstructing an Exadata marketing strategy.

The backdrop is that Oracle is trying to get more of the data warehousing/BI pie at the expense of Teradata, IBM-acquired Netezza, and others. What it's done in the usual Oracle way is to cut costs everywhere in the BI stack except for the Oracle software and so represent to customers that it saves them money. This stack, the Exadata bundle, is made of Sun servers, storage and Oracle's own software.

Duplessie has blogged that Oracle, with its complex licensing terms and conditions, is able to go to its customers, run a software audit, find they are breaking those terms and conditions, and tell them they need to buy more licensed stuff. But Oracle says buy Exadata systems instead for your BI work and that licensing problem goes away. It seems amazing but is, apparently, true.

Competitors like EMC, HP and IBM are now using the "big data" idea to alert potential customers to the need to think differently about petabyte-class storage applications and to persuade Oracle customers that they don't have to play ball with Oracle by default; there are alternatives, such as IBM/Netezza or EMC/Greenplum.

Is BI big data different from the petabyte data seen in film post-production work and in the oil and gas industry's seismic data? In sheer size terms, no, but in data characteristic terms, yes.

BI data is held in databases and based on transactions. It is copied data, not original, and often held in storage area networks (SANs) with block access. The media and seismic-type data is not transaction based and is original data, often stored in files, in NAS (network-attached storage) arrays, typically called scale-out NAS.

Oracle does not have an offering in this space. EMC (Isilon), IBM (SONAS), and HP (Ibrix) do, as do BlueArc and DataDirect Networks. These products are often about parallel access to files.

BI big data is susceptible to being stored and analysed in a single integrated system, like Exadata, or like a vBlock Greenplum bundle. File-based big data has not been treated in the same way, there being no file-based equivalent of an Exadata box or a vBlock Greenplum system. That may be because multiple end-user systems work on the data and not a single, multi-core, multi-processor server. Also there are no single, dominating application types here in the same way as an analytics app working on BI data.

When people - suppliers - talk about big data ask if they are talking about data analytics big data (BI+g data) or file-based big data. It makes a difference in terms of the product pitches that come your way.

Oddly, no-one yet is talking much about compressing and deduplicating big data. Duplessie mentioned this in an Infosmack podcast. It's odd because such data reduction would have a huge pay-off in disk capacity purchase terms.

My presumption is that this deduplication blind spot is due to performance concerns. But Rainstor (Clearpace as was) can deduplicate and reduce databases in size. Permabit's marketing message about its Albireo software is that it can work its data reduction magic without affecting performance. BlueArc has a license for it.

Big data is only going to get bigger; transactions just accumulate and never get thrown away, being digital spoil heaps that can be mined for ever. High-definition, computer graphic-enhanced movies seem to get larger and larger too. Ways to lower the cost per petabyte of storing the stuff and managing it will surely become vastly more important.

A last thought: where is Dell in big data? It's relatively nowhere, and observers are suggesting it might buy Aster Data to stake its claim in the big data gold rush. ®
