Welcome to the Petabyte Club

It's not big, it's BI+g data

Hype alert; hype alert; Big Data is coming our way. A new volcano has blasted its way above the surface of the marketing sea, spewing out "big data" messages in enormous flows of thought leader bullshit. What the heck is this big data thing?

EMC says it's to do with handling data at the petabyte scale, where things like compression, deduplication, thin provisioning and management facilities can become much more important because they enable large savings of cash.

Big data in EMC's mind is connected with Atmos (cloud), Greenplum (business intelligence or BI) and Isilon (scale-out NAS) and with its petabyte club customers, those with more than a petabyte of storage.

In other people's minds, such as ESG's Steve Duplessie, big data is connected to BI systems, and use of the phrase is a good marketing tactic, drawing attention to Oracle and obstructing an Exadata marketing strategy.

The backdrop is that Oracle is trying to get more of the data warehousing/BI pie at the expense of Teradata, IBM-acquired Netezza, and others. What it's done in the usual Oracle way is to cut costs everywhere in the BI stack except for the Oracle software and so represent to customers that it saves them money. This stack, the Exadata bundle, is made of Sun servers, storage and Oracle's own software.

Duplessie has blogged that Oracle, with its complex licensing terms and conditions, is able to go to its customers, run a software audit, find they are breaking those terms and conditions, and tell them they need to buy more licensed stuff. But Oracle says buy Exadata systems instead for your BI work and that licensing problem goes away. It seems amazing but is, apparently, true.

Competitors like EMC, HP and IBM are now using the "big data" idea to alert potential customers to the need to think differently about petabyte-class storage applications and to persuade Oracle customers that they don't have to play ball with Oracle by default; there are alternatives, such as IBM/Netezza or EMC/Greenplum.

Is BI big data different from the petabyte data seen in film post-production work and in the oil and gas industry's seismic data? In sheer size terms, no, but in data characteristic terms, yes.

BI data is held in databases and based on transactions. It is copied data, not original, and often held in storage area networks (SANs) with block access. The media and seismic-type data is not transaction based and is original data, often stored in files, in NAS (network-attached storage) arrays, typically called scale-out NAS.

Oracle does not have an offering in this space. EMC (Isilon), IBM (SONAS), and HP (Ibrix) do, as do BlueArc and DataDirect Networks. These products are often about parallel access to files.

BI big data is susceptible to being stored and analysed in a single integrated system, like Exadata, or like a vBlock Greenplum bundle. File-based big data has not been treated in the same way, there being no file-based equivalent of an Exadata box or a vBlock Greenplum system. That may be because multiple end-user systems work on the data and not a single, multi-core, multi-processor server. Also there are no single, dominating application types here in the same way as an analytics app working on BI data.

When people - suppliers - talk about big data, ask if they are talking about data analytics big data (BI+g data) or file-based big data. It makes a difference in terms of the product pitches that come your way.

Oddly, no-one yet is talking much about compressing and deduplicating big data. Duplessie mentioned this in an Infosmack podcast. It's odd because such data reduction would have a huge pay-off in disk capacity purchase terms.

My presumption is that this deduplication blind spot is due to performance concerns. But Rainstor (Clearpace as was) can deduplicate and reduce databases in size. Permabit's marketing message about its Albireo software is that it can work its data reduction magic without affecting performance. BlueArc has a license for it.

Big data is only going to get bigger; transactions just accumulate and never get thrown away, being digital spoil heaps that can be mined for ever. High-definition, computer graphic-enhanced movies seem to get larger and larger too. Ways to lower the cost per petabyte of storing the stuff and managing it will surely become vastly more important.

A last thought; where is Dell in big data? It's relatively nowhere, and observers are suggesting it might buy Aster Data to stake its claim in the big data gold rush. ®