
Welcome to the Petabyte Club

It's not big, it's BI+g data


Hype alert; hype alert; Big Data is coming our way. A new volcano has blasted its way above the surface of the marketing sea, spewing out "big data" messages in enormous flows of thought leader bullshit. What the heck is this big data thing?

EMC says it's to do with handling data at the petabyte scale, where things like compression, deduplication, thin provisioning and management facilities can become much more important because they enable large savings of cash.

Big data in EMC's mind is connected with Atmos (cloud), Greenplum (business intelligence or BI) and Isilon (scale-out NAS) and with its petabyte club customers, those with more than a petabyte of storage.

In other people's minds, such as ESG's Steve Duplessie, big data is connected to BI systems and use of the phrase is a good marketing tactic drawing attention to Oracle and obstructing an Exadata marketing strategy.

The backdrop is that Oracle is trying to get more of the data warehousing/BI pie at the expense of Teradata, IBM-acquired Netezza, and others. What it's done in the usual Oracle way is to cut costs everywhere in the BI stack except for the Oracle software and so represent to customers that it saves them money. This stack, the Exadata bundle, is made of Sun servers, storage and Oracle's own software.

Duplessie has blogged that Oracle, with its complex licensing terms and conditions, is able to go to its customers, run a software audit, find they are breaking those terms and conditions, and tell them they need to buy more licensed stuff. But Oracle says buy Exadata systems instead for your BI work and that licensing problem goes away. It seems amazing but is, apparently, true.

Competitors like EMC, HP and IBM are now using the "big data" idea to alert potential customers to the need to think differently about petabyte-class storage applications and to persuade Oracle customers that they don't have to play ball with Oracle by default; there are alternatives, such as IBM/Netezza or EMC/Greenplum.

Is BI big data different from the petabyte data seen in film post-production work and in the oil and gas industry's seismic data? In sheer size terms, no, but in data characteristic terms, yes.

BI data is held in databases and based on transactions. It is copied data, not original, and often held in storage area networks (SANs) with block access. The media and seismic-type data is not transaction based and is original data, often stored in files, in NAS (network-attached storage) arrays, typically called scale-out NAS.

Oracle does not have an offering in this space. EMC (Isilon), IBM (SONAS), and HP (Ibrix) do, as do BlueArc and DataDirect Networks. These products are often about parallel access to files.

BI big data is susceptible to being stored and analysed in a single integrated system, like Exadata, or like a vBlock Greenplum bundle. File-based big data has not been treated in the same way, there being no file-based equivalent of an Exadata box or a vBlock Greenplum system. That may be because multiple end-user systems work on the data and not a single, multi-core, multi-processor server. Also, there is no single, dominating application type here in the way that an analytics app dominates work on BI data.

When people - suppliers - talk about big data, ask if they are talking about data analytics big data (BI+g data) or file-based big data. It makes a difference in terms of the product pitches that come your way.

Oddly, no-one yet is talking much about compressing and deduplicating big data. Duplessie mentioned this in an Infosmack podcast. It's odd because such data reduction would have a huge pay-off in disk capacity purchase terms.

My presumption is that this deduplication blind spot is due to performance concerns. But Rainstor (Clearpace as was) can deduplicate and reduce databases in size. Permabit's marketing message about its Albireo software is that it can work its data reduction magic without affecting performance. BlueArc has a license for it.
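To put rough numbers on the pay-off from data reduction, here is a back-of-the-envelope sketch in Python. The function name, the 2:1 reduction ratio and the price per terabyte are all made-up assumptions for illustration; none of them comes from EMC, Oracle, Permabit or anyone else mentioned here.

```python
def reduction_payoff(logical_pb, reduction_ratio, cost_per_tb):
    """Estimate disk capacity and cash saved by compression/deduplication.

    logical_pb      -- data held before reduction, in decimal petabytes
    reduction_ratio -- e.g. 2 means 2:1 (hypothetical figure)
    cost_per_tb     -- purchase price per terabyte (hypothetical figure)
    Returns (physical_tb, saved_tb, saved_cost).
    """
    if reduction_ratio < 1:
        raise ValueError("reduction_ratio must be >= 1")
    logical_tb = logical_pb * 1000           # 1PB = 1,000TB (decimal)
    physical_tb = logical_tb / reduction_ratio
    saved_tb = logical_tb - physical_tb
    return physical_tb, saved_tb, saved_tb * cost_per_tb


# Illustrative only: 1PB of data, an assumed 2:1 ratio, an assumed 500 per TB.
physical, saved, cash = reduction_payoff(logical_pb=1, reduction_ratio=2, cost_per_tb=500)
print(f"Disk needed: {physical:.0f}TB, disk avoided: {saved:.0f}TB, cash saved: {cash:.0f}")
```

Even at a modest assumed ratio, half the disk purchase disappears, and the saving scales linearly with every extra petabyte.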

Big data is only going to get bigger; transactions just accumulate and never get thrown away, being digital spoil heaps that can be mined for ever. High-definition, computer graphic-enhanced movies seem to get larger and larger too. Ways to lower the cost per petabyte of storing the stuff and managing it will surely become vastly more important.

A last thought: where is Dell in big data? It's relatively nowhere, and observers are suggesting it might buy Aster Data to stake its claim in the big data gold rush. ®
