Welcome to the Petabyte Club

It's not big, it's BI+g data

Hype alert, hype alert: Big Data is coming our way. A new volcano has blasted its way above the surface of the marketing sea, spewing out "big data" messages in enormous flows of thought-leader bullshit. What the heck is this big data thing?

EMC says it's to do with handling data at the petabyte scale, where things like compression, deduplication, thin provisioning and management facilities become much more important because they can save large amounts of cash.

Big data in EMC's mind is connected with Atmos (cloud), Greenplum (business intelligence or BI) and Isilon (scale-out NAS) and with its petabyte club customers, those with more than a petabyte of storage.

Others, such as ESG's Steve Duplessie, connect big data with BI systems and see use of the phrase as a good marketing tactic, one that draws attention to Oracle and obstructs its Exadata marketing strategy.

The backdrop is that Oracle is trying to get more of the data warehousing/BI pie at the expense of Teradata, IBM-acquired Netezza, and others. What it has done, in the usual Oracle way, is cut costs everywhere in the BI stack except for the Oracle software, and so present the stack to customers as saving them money. This stack, the Exadata bundle, is made of Sun servers, storage and Oracle's own software.

Duplessie has blogged that Oracle, with its complex licensing terms and conditions, is able to go to its customers, run a software audit, find they are breaking their licensing terms and conditions, and tell them they need to buy more licensed stuff. But Oracle says buy Exadata systems instead for your BI work and that licensing problem goes away. It seems amazing but is, apparently, true.

Competitors like EMC, HP and IBM are now using the "big data" idea to alert potential customers to the need to think differently about petabyte-class storage applications, and to persuade Oracle customers that they don't have to play ball with Oracle by default; there are alternatives, such as IBM/Netezza or EMC/Greenplum.

Is BI big data different from the petabyte data seen in film post-production work and in the oil and gas industry's seismic data? In sheer size terms, no, but in data characteristic terms, yes.

BI data is held in databases and based on transactions. It is copied data, not original, and often held in storage area networks (SANs) with block access. The media and seismic-type data is not transaction-based and is original data, often stored as files on network-attached storage (NAS) arrays of the kind typically called scale-out NAS.

Oracle does not have an offering in this space. EMC (Isilon), IBM (SONAS), and HP (Ibrix) do, as do BlueArc and DataDirect Networks. These products are often about parallel access to files.

BI big data lends itself to being stored and analysed in a single integrated system, like Exadata, or like a vBlock Greenplum bundle. File-based big data has not been treated in the same way, there being no file-based equivalent of an Exadata box or a vBlock Greenplum system. That may be because multiple end-user systems work on the data, rather than a single multi-core, multi-processor server. Nor is there a single dominant application type here, in the way that analytics apps dominate work on BI data.

When people (suppliers, that is) talk about big data, ask whether they mean data analytics big data (BI+g data) or file-based big data. It makes a difference to the product pitches that come your way.

Oddly, no-one yet is talking much about compressing and deduplicating big data. Duplessie mentioned this in an Infosmack podcast. It's odd because such data reduction would have a huge pay-off in disk capacity purchase terms.

My presumption is that this deduplication blind spot is due to performance concerns. But Rainstor (Clearpace as was) can deduplicate and reduce databases in size. Permabit's marketing message about its Albireo software is that it can work its data reduction magic without affecting performance. BlueArc has a license for it.

Big data is only going to get bigger; transactions just accumulate and never get thrown away, being digital spoil heaps that can be mined for ever. High-definition, computer-graphics-enhanced movies seem to get larger and larger too. Ways to lower the cost per petabyte of storing and managing the stuff will surely become vastly more important.

A last thought; where is Dell in big data? It's relatively nowhere, and observers are suggesting it might buy Aster Data to stake its claim in the big data gold rush. ®
