Welcome to the Petabyte Club

It's not big, it's BI+g data

Hype alert; hype alert; Big Data is coming our way. A new volcano has blasted its way above the surface of the marketing sea, spewing out "big data" messages in enormous flows of thought leader bullshit. What the heck is this big data thing?

EMC says it's to do with handling data at the petabyte scale, where things like compression, deduplication, thin provisioning and management facilities can become much more important because they enable large savings of cash.

Big data in EMC's mind is connected with Atmos (cloud), Greenplum (business intelligence or BI) and Isilon (scale-out NAS) and with its petabyte club customers, those with more than a petabyte of storage.

In other people's minds, such as ESG's Steve Duplessie, big data is connected to BI systems and use of the phrase is a good marketing tactic drawing attention to Oracle and obstructing an Exadata marketing strategy.

The backdrop is that Oracle is trying to get more of the data warehousing/BI pie at the expense of Teradata, IBM-acquired Netezza, and others. What it's done in the usual Oracle way is to cut costs everywhere in the BI stack except for the Oracle software and so represent to customers that it saves them money. This stack, the Exadata bundle, is made of Sun servers, storage and Oracle's own software.

Duplessie has blogged that Oracle, with its complex licensing terms and conditions, is able to go to its customers, run a software audit, find they are breaking those terms and conditions, and tell them they need to buy more licensed stuff. But Oracle says buy Exadata systems instead for your BI work and that licensing problem goes away. It seems amazing but is, apparently, true.

Competitors like EMC, HP and IBM are now using the "big data" idea to alert potential customers to the need to think differently about petabyte-class storage applications and to persuade Oracle customers that they don't have to play ball with Oracle by default; there are alternatives, such as IBM/Netezza or EMC/Greenplum.

Is BI big data different from the petabyte data seen in film post-production work and in the oil and gas industry's seismic data? In sheer size terms, no, but in data characteristic terms, yes.

BI data is held in databases and based on transactions. It is copied data, not original, and often held in storage area networks (SANs) with block access. The media and seismic-type data is not transaction based and is original data, often stored in files, in NAS (network-attached storage) arrays, typically called scale-out NAS.

Oracle does not have an offering in this space. EMC (Isilon), IBM (SONAS), and HP (Ibrix) do, as do BlueArc and DataDirect Networks. These products are often about parallel access to files.

BI big data is susceptible to being stored and analysed in a single integrated system, like Exadata, or like a vBlock Greenplum bundle. File-based big data has not been treated in the same way, there being no file-based equivalent of an Exadata box or a vBlock Greenplum system. That may be because multiple end-user systems work on the data and not a single, multi-core, multi-processor server. Also there are no single, dominating application types here in the same way as an analytics app working on BI data.

When people - suppliers - talk about big data, ask whether they mean data analytics big data (BI+g data) or file-based big data. It makes a difference in terms of the product pitches that come your way.

Oddly, no-one yet is talking much about compressing and deduplicating big data. Duplessie mentioned this in an Infosmack podcast. It's odd because such data reduction would have a huge pay-off in disk capacity purchase terms.

My presumption is that this deduplication blind spot is due to performance concerns. But Rainstor (Clearpace as was) can deduplicate and reduce databases in size. Permabit's marketing message about its Albireo software is that it can work its data reduction magic without affecting performance. BlueArc has a license for it.
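
To put a rough number on that pay-off, here is a minimal back-of-the-envelope sketch. The reduction ratios and the cost per terabyte are purely illustrative assumptions, not figures from any supplier, and the function names are made up for this example.

```python
def capacity_after_reduction(raw_tb: float, reduction_ratio: float) -> float:
    """Physical capacity (TB) needed once data is reduced by reduction_ratio:1."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_tb / reduction_ratio


def purchase_saving(raw_tb: float, reduction_ratio: float, cost_per_tb: float) -> float:
    """Disk purchase cost avoided by storing reduced data instead of raw data."""
    stored_tb = capacity_after_reduction(raw_tb, reduction_ratio)
    return (raw_tb - stored_tb) * cost_per_tb


if __name__ == "__main__":
    raw_tb = 1000.0       # one petabyte, expressed in decimal terabytes
    cost_per_tb = 100.0   # hypothetical price per TB, in any currency
    for ratio in (2.0, 4.0, 10.0):  # hypothetical reduction ratios
        stored = capacity_after_reduction(raw_tb, ratio)
        saved = purchase_saving(raw_tb, ratio, cost_per_tb)
        print(f"{ratio:g}:1 -> store {stored:g} TB, avoid {saved:g} in disk spend")
```

The saving flattens out as the ratio climbs: under these assumptions, going from no reduction to 2:1 halves the disk bill, while going from 4:1 to 10:1 trims only another 15 per cent of the original spend.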

Big data is only going to get bigger; transactions just accumulate and never get thrown away, being digital spoil heaps that can be mined for ever. High-definition, computer graphic-enhanced movies seem to get larger and larger too. Ways to lower the cost per petabyte of storing the stuff and managing it will surely become vastly more important.

A last thought; where is Dell in big data? It's relatively nowhere, and observers are suggesting it might buy Aster Data to stake its claim in the big data gold rush. ®
