Isilon and a question of Big Data

Or was that ingestion?

Interview Xiotech technology VP Rob Peglar has moved to Isilon, now an EMC business, to become chief technology officer (CTO) for the Americas.

We interviewed Rob and asked him questions that reveal quite a lot about Isilon's prospects, big data, the role of flash in scale-out filers, deduplication and Isilon, and what we should think about archiving data from Isilon clusters.

El Reg Why did you join Isilon?

Rob Peglar: Primarily, for a personal reason - to take the CTO Americas role. Secondarily, significant parts of the industry are moving towards greater use of file-based storage and the resultant use (gathering, analysis, reduction) of data stored in files. Isilon is an innovator and leader in that space, and I joined to help end users realize new capabilities in their use of file data, as well as to be a key participant in the next generation(s) of file-based storage architectures.

El Reg What does the CTO Americas do that's different from the overall CTO?

Rob Peglar: The CTO Americas role is an allied position to the corporate CTO (Paul Rutherford). Isilon has a thrice-distributed CTO function across world geographies: Americas (basically, the Western Hemisphere), EAME and Asia-Pacific (AP). These roles have an outward-facing (i.e. towards end users and channels) function as well as an inward-facing (i.e. towards products, roadmap, strategy, engineering, etc.) function. In my role, I will be facing customers and channels to give them a thorough understanding not only of what Isilon does, and how and why we do it, but also of higher-level industry trends, techniques and technologies, along with executive-level briefings on the strategic implications of file data for businesses and organizations.

El Reg Is big data in general different from big data in the HPC world and, if so, how?

Rob Peglar: In general, it is. While there are some similarities – both being unstructured data, for example – there are typically differences between big data in the commercial/business world and big data in the traditional HPC/supercomputing world. I am fortunate to have experience in both worlds, dating back to 1978 on the traditional HPC side. HPC typically involves the analysis of very large but ‘fixed’ sets of data, i.e. a dataset describing an initial condition. That data is then ingested and subjected to an iterative process: typically a very large job that simulates and analyzes the forward-in-time progress of the computation, running a computational model based on the initial condition.

During the job, large intermediate files are produced to save the job’s state and its data at a given time step. This process is often referred to as ‘checkpointing’.  Checkpoints are taken because HPC jobs may run for weeks at a time; restarting a job from its initial condition is to be avoided, for all the obvious reasons. The end result of the HPC job may actually be very little data; just a set of results or a visualisation, computed over a given time interval. Or, the net result may be another very large dataset which would then in turn undergo yet another set of analysis, perhaps by a different job.
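By way of illustration, a minimal checkpoint/restart loop might look like the following Python sketch. The file name, step function and pickle-based format are assumptions made for illustration only; real HPC codes use parallel I/O libraries such as MPI-IO or HDF5 rather than anything this simple.

```python
# A minimal, illustrative sketch of checkpoint/restart for a long-running
# iterative job. The state layout, step function and pickle-based format
# are assumptions, not drawn from any particular HPC code.
import os
import pickle

CHECKPOINT = "job_state.ckpt"   # hypothetical checkpoint file
CHECKPOINT_INTERVAL = 1_000     # time steps between saves
TOTAL_STEPS = 10_000

def advance_one_step(state):
    """Stand-in for one time step of the computational model."""
    return [x * 0.999 for x in state]

def load_or_init():
    """Resume from the latest checkpoint if present, else start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)          # (step, state)
    return 0, [1.0] * 1_000                # the 'initial condition' dataset

step, state = load_or_init()
while step < TOTAL_STEPS:
    state = advance_one_step(state)
    step += 1
    if step % CHECKPOINT_INTERVAL == 0:
        # Save intermediate state so a failure costs at most one
        # interval, not the whole multi-week run.
        with open(CHECKPOINT, "wb") as f:
            pickle.dump((step, state), f)
```

If the job dies, rerunning the script picks up from the most recent checkpoint rather than the initial condition, which is the whole point Peglar makes about multi-week runs.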

Contrast this with commercial/business ‘big data’ as generated and stored by what I call ‘constantly running’ applications, e.g. web hits, cookie-based widgets, error logs, transaction logs, streaming apps, and the like. This kind of data, while unstructured like its HPC cousin, is constantly changing and being appended to by the outside world.

Data analysis jobs in this world typically take a ‘chunk’ of this big data and attempt to reduce it for specific analysis, pattern matching, searching, and/or general data mining, seeking to understand the data itself for a business purpose. The key to this kind of big data is that it is constantly evolving, whereas data in the HPC world typically is not. Both types of big data, however, require large, reliable and – the seminal characteristic by far – scalable storage.
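As a rough sketch of that chunk-and-reduce pattern, consider the Python below. The log name, record format and ERROR field are hypothetical; in practice this kind of reduction would run across many chunks in parallel under a framework such as MapReduce.

```python
# A minimal sketch of chunk-oriented reduction over an append-only log.
# The file name and record format are hypothetical, chosen purely to
# illustrate the pattern Peglar describes.
from collections import Counter

def analyze_chunk(path, start, size):
    """Reduce one byte range of a growing log to error counts by code."""
    counts = Counter()
    with open(path, "rb") as f:
        f.seek(start)
        chunk = f.read(size)
    for line in chunk.splitlines():
        fields = line.decode("utf-8", errors="replace").split()
        if len(fields) >= 2 and fields[0] == "ERROR":
            counts[fields[1]] += 1   # e.g. errors per status code
    return counts

# The log keeps growing underneath us, so each run fixes its chunk
# boundaries up front and analyzes only the bytes that existed then.
summary = analyze_chunk("transactions.log", start=0, size=64 * 1024 * 1024)
print(summary.most_common(10))
```

The design point is the one Peglar raises: because the data is constantly appended to, a job pins down a fixed chunk first and reduces that, rather than iterating over one static dataset as an HPC job would.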
