Cray puts super stake in the big data ground

Crunch this

Big data may or may not pan out for users, but it is a bit of a boom for IT vendors, who are scrambling to prove their data analytics chops and chase the easiest money in the market these days. And to that end, supercomputer maker Cray is setting up a dedicated division to go after the big data biz.

The division, called YarcData, is a bit of a private joke. YARC is an acronym that is short for "Yet Another Router Chip", and it is the architectural name that Cray slapped onto the high-radix router at the heart of the experimental "BlackWidow" supercomputer. This was commercialized as none other than the "Gemini" XE interconnect inside its latest XE6 Opteron-based massively parallel supers as well as the XK6 hybrid Opteron-Tesla machines. Yarc is also Cray spelled backwards, so presumably the new division is "a tad Cray."

Cray already had a knowledge management practice, but has decided to create a proper division – pulling in employees from research and development, marketing, sales, services, and support and dedicating them to creating and supporting hardware and software for running big data and analytics workloads (as distinct from the kinds of simulation workloads that Cray's gear generally runs).

"Cray is best known for building supercomputers that can run massive scientific and engineering simulations, and from that work we have developed unique technologies and amassed significant experience working with some of the largest data-intensive environments in the world," explained Peter Ungaro, Cray's president and CEO, in a statement announcing the new division. "This makes our entry into the big data market a natural evolution."

Cray has hired a manager from outside the company to run the division: Arvind Parthasarathi, who was named senior vice president and general manager of YarcData. Prior to joining Cray, Parthasarathi was senior vice president and general manager of Informatica's Master Data Management (MDM) business unit, and he was previously vice president of product management for the company's data quality products. (Which means, by the way, that Parthasarathi has a keen understanding of the fact that the biggest problem that big companies have with big data projects is that their information is largely garbage.)

Before joining Informatica, Parthasarathi was director of product management at i2 Technologies (now part of JDA Software), running its RFID, product information management, supply chain integration, and supply chain event management products. He started his career at Oracle, where he was a product line manager in charge of the software giant's Intel Technologies division. Parthasarathi has a BS in computer science from the Indian Institute of Technology and an MS in computer science from the Massachusetts Institute of Technology.

So here's the fun bit: Trying to figure out what Cray is actually going to do in the big data racket. Cray did not speak of such things today, of course, but here's what is obvious from El Reg's systems desk. First, Cray can build server clusters with tens of thousands of cores and wonking clustered file systems with a high-speed XE interconnect linking nodes to each other. If you could beef up a Cray XE blade with some disk drives, you could make a hell of a Hadoop cluster.

Also, the Cray Linux Environment (a variant of SUSE Linux) has a nifty feature called Cluster Compatibility Mode, which makes the XE interconnect look like a standard Ethernet controller as far as Linux applications are concerned. CLE 4.0, the latest release, supports the Java JDK 1.6.0 and can therefore run Java applications.
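The point of Cluster Compatibility Mode is that off-the-shelf network code needs no Cray-specific API: anything written against standard sockets should run unchanged, because CCM presents the XE interconnect to the application as an ordinary Ethernet device. A minimal illustrative sketch in Python (nothing here is from Cray's documentation; it is just stock sockets code of the kind CCM is meant to accommodate):

```python
import socket
import threading

def echo_server(sock):
    # Accept one connection and echo back whatever it receives.
    conn, _ = sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

# Bind to loopback on an ephemeral port so the sketch is self-contained;
# under CCM the same code would see the XE interconnect as a normal NIC.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=echo_server, args=(server,))
t.start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello over the wire")
reply = client.recv(1024)
client.close()
t.join()
server.close()
```

No driver-specific calls, no special interconnect library – which is exactly the compatibility that lets ordinary Linux (and Java) applications run on the XE machines.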

And the Hadoop MapReduce framework and its HDFS file system are one humongous Java app. At the moment, Hadoop tops out at around 4,000 nodes, and Cray could certainly help the open source Apache project do a better job scaling across more nodes. There's no reason why the open source R stats program could not be parallelized, as Revolution Analytics has done, and run across a Cray XE6 super – and run in conjunction with Hadoop, chewing on the reduced data.
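For readers who haven't peeked under Hadoop's hood, the MapReduce model boils down to three steps: map input records to key-value pairs, shuffle the pairs so each key's values land together, and reduce each group to a result. A toy single-process word count in Python sketches the programming model (Hadoop itself is Java and spreads these phases across the cluster; this is just the shape of the computation):

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record.
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key, as Hadoop does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: collapse each key's list of values; here, sum the counts.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big iron", "big clusters"]
counts = reduce_phase(shuffle(map_phase(lines)))
# counts == {"big": 3, "data": 1, "iron": 1, "clusters": 1}
```

The reduced output – a modest table distilled from a mountain of input – is precisely the sort of thing a stats package like R would then chew on.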

Supercomputer rival Silicon Graphics has been going on about how its shared memory parallel supers, the UV 1000 Xeon-based machines, can scale Windows Server 2008 R2 across 256 cores and 2TB of memory – the upper limit of that Microsoft operating system – making it an ideal box for running big databases for online transaction processing and data warehousing. Since last fall, SGI has been selling variants of its Rackable rackish servers with the Cloudera CDH3 commercial Hadoop distribution. SGI has taken down a number of Hadoop deals with as many as 1,200 nodes each in the quarter ended in December.

Cray would have to do some substantial engineering to the XE interconnect to create a shared memory architecture that could match the Windows Server scalability that SGI has. But on parallel commercial workloads like Hadoop, and maybe even on NoSQL data stores, the engineering job is do-able. ®
