Feeds

Isilon and a question of Big Data

Or was that ingestion?

Internet Security Threat Report 2014

Interview Xiotech technology VP Rob Peglar has moved to Isilon, now an EMC business, to become chief technology officer (CTO) for the Americas.

We interviewed Rob and asked him questions that reveal quite a lot about Isilon's prospects, big data, the role of flash in scale-out filers, reduplication and Isilon, and what we should think about archiving data from Isilon clusters.

El REg Why did you join Isilon?

Rob Peglar: Primarily, for a personal reason - to take the CTO Americas role.  Secondarily, significant parts of the industry are moving towards greater use of file-based storage and the resultant use (gathering, analysis, reduction) of data stored in files. Isilon is an innovator and leader in that space and I joined to help end users realize new capabilities in their use of file data as well as be a key participant in the next generation(s) of file-based storage architectures.

El Reg What does the CTO Americas do that's different from the overall CTO?

Rob Peglar: CTO Americas role is an allied position to the corporate CTO (Paul Rutherford).  Isilon has a thrice-distributed CTO function in world geographies; Americas (basically, the Western Hemisphere), EAME and Asia-Pacific (AP). These roles have an outward (i.e. towards end users and channels) function as well as an inward (i.e. towards products, roadmap, strategy, engineering, etc.) function. In my role, I will be facing customers and channels to give them a thorough understand of not only what Isilon does, how and why we do it, and so on, but also higher-level industry trends, techniques, technologies, and executive-level briefings on the strategic implications of file data to businesses and organizations.

El Reg Is big data in general different from big data in the HPC world and, if so, how?

Rob Peglar: In general, it is. While there are some similarities – both being unstructured data, for example – there are typically differences between big data in the commercial/business world and big data in the traditional HPC/supercomputing world. I am fortunate to have experience in both worlds, dating back to 1978 on the traditional HPC side. HPC typically involves the analysis of very large but ‘fixed’ sets of data, i.e. a dataset describing an initial condition. That data is then ingested and subjected to an iterative process, typically a very large job which simulates and analyzes the forward-in-time progress of the computation, performing a certain computational model based on the initial condition.

During the job, large intermediate files are produced to save the job’s state and its data at a given time step. This process is often referred to as ‘checkpointing’.  Checkpoints are taken because HPC jobs may run for weeks at a time; restarting a job from its initial condition is to be avoided, for all the obvious reasons. The end result of the HPC job may actually be very little data; just a set of results or a visualisation, computed over a given time interval. Or, the net result may be another very large dataset which would then in turn undergo yet another set of analysis, perhaps by a different job.

Contrast this with commercial/business ‘big data’ as being generated and stored by what I call ‘constantly running’ applications, e.g. web hits, cookie-based widgets, error logs, transaction logs, streaming apps, and the like. This kind of data, while unstructured like its HPC cousins, is constantly changing and being appended to by the outside world.

Data analysis jobs in this world typically take a ‘chunk’ of this big data and attempt to reduce it for specific analysis, pattern matching, searching, and/or general data mining, seeking to understand the data itself for a business purpose. The key to this kind of big data is that it’s constantly evolving, whereas data in the HPC world typically doesn’t. Both types of big data, however, require large, reliable and – the seminal characteristic by far – scalable storage.

Internet Security Threat Report 2014

More from The Register

next story
Docker's app containers are coming to Windows Server, says Microsoft
MS chases app deployment speeds already enjoyed by Linux devs
IBM storage revenues sink: 'We are disappointed,' says CEO
Time to put the storage biz up for sale?
'Hmm, why CAN'T I run a water pipe through that rack of media servers?'
Leaving Las Vegas for Armenia kludging and Dubai dune bashing
Facebook slurps 'paste sites' for STOLEN passwords, sprinkles on hash and salt
Zuck's ad empire DOESN'T see details in plain text. Phew!
SDI wars: WTF is software defined infrastructure?
This time we play for ALL the marbles
Windows 10: Forget Cloudobile, put Security and Privacy First
But - dammit - It would be insane to say 'don't collect, because NSA'
Oracle hires former SAP exec for cloudy push
'We know Larry said cloud was gibberish, and insane, and idiotic, but...'
prev story

Whitepapers

Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Cloud and hybrid-cloud data protection for VMware
Learn how quick and easy it is to configure backups and perform restores for VMware environments.
Three 1TB solid state scorchers up for grabs
Big SSDs can be expensive but think big and think free because you could be the lucky winner of one of three 1TB Samsung SSD 840 EVO drives that we’re giving away worth over £300 apiece.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.