IBM's tools give Big Data a good seeing to

Company shares nothing but Hadoop and GPFS

IBM is using Hadoop to make its General Parallel File System capable of dealing with Big Data - extremely large data sets - for cloud-based analytic computing.

Announced at the Supercomputing 2010 conference, the General Parallel File System-Shared Nothing Cluster (GPFS-SNC) project at IBM Research Almaden involves an architecture designed to provide higher availability through clustering technologies, dynamic file system management and replication.

GPFS is the basis for IBM's High Performance Computing Systems, Information Archive, Scale-Out NAS (SONAS), and Smart Business Compute Cloud. GPFS-SNC is a distributed, shared-nothing, computing architecture in which each node is self-sufficient; tasks are divided up between these independent computers and no one node waits on any other.

Hadoop, which is used by Yahoo!, evolved from Google's MapReduce technology for computations involving petabyte-level data sets distributed across thousands of commodity hardware-based computational nodes. The Hadoop Distributed File System (HDFS) is a distributed, scalable and portable file system, written in Java, involving a cluster of data nodes.

HDFS is aware of the location, in a network switch sense, of servers (worker nodes) in the cluster, and the system uses this to ensure they process data local to them and thus reduce data traffic across the network. Different copies of data are kept on different sets of worker nodes, with data being replicated across nodes this way to provide redundancy and high availability, without RAID, should a worker node, rack or network switch fail.
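To make the idea of rack-aware replication concrete, here is a minimal sketch in Python. It is not HDFS code and not its actual placement policy: the function names, the topology dictionary and the rule used (first copy on the writing node, remaining copies on another rack where possible) are assumptions chosen purely for illustration.

```python
def place_replicas(nodes_by_rack, writer_node, replicas=3):
    """Choose nodes for the copies of one block, spreading them across racks.

    nodes_by_rack is a hypothetical topology: rack name -> list of node names.
    """
    rack_of = {node: rack for rack, nodes in nodes_by_rack.items() for node in nodes}
    local_rack = rack_of[writer_node]

    # First copy stays on the writing node, so local reads cross no switch.
    chosen = [writer_node]

    # Prefer nodes on other racks next, then fall back to the local rack.
    remote_nodes = [
        node
        for rack in sorted(nodes_by_rack)
        if rack != local_rack
        for node in nodes_by_rack[rack]
    ]
    local_nodes = [n for n in nodes_by_rack[local_rack] if n != writer_node]

    for node in remote_nodes + local_nodes:
        if len(chosen) == replicas:
            break
        chosen.append(node)
    return chosen


def surviving_copies(placement, nodes_by_rack, failed_rack):
    """Return the replicas still reachable after a whole rack (or its switch) fails."""
    lost = set(nodes_by_rack[failed_rack])
    return [node for node in placement if node not in lost]


# Hypothetical two-rack cluster.
topology = {"rack-a": ["a1", "a2"], "rack-b": ["b1", "b2"]}
placement = place_replicas(topology, "a1")
print(placement)
print(surviving_copies(placement, topology, "rack-a"))
```

Because the copies straddle two racks, losing either rack in this toy cluster still leaves at least one readable copy, which is the property the article describes as availability without RAID.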

HDFS is not POSIX-compliant, and one aim of the GPFS-SNC project is to provide POSIX compliance. GPFS on its own is POSIX-compliant.

IBM says running data analytics applications in the cloud on extremely large data sets is gaining traction because it is affordable and the underlying infrastructure can store and compute the immense amount of data involved. A POSIX interface means traditional applications using POSIX interfaces can use the cloud resources.

The end-user apps IBM has in mind are things like business intelligence, digital media processing and surveillance video searches. GPFS-SNC technology decomposes the large computation involved into a set of smaller parallelisable computations. IBM reckons GPFS-SNC can work around the frequent failures expected in large-scale commodity server and storage deployments, while being an efficient user of compute, storage and network resources.
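The decomposition into smaller parallelisable computations can be sketched roughly as follows. This is a toy illustration only, not GPFS-SNC or Hadoop code: threads on a single machine stand in for cluster nodes, and the function names and sample records are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, parts):
    """Divide the records into `parts` independent chunks."""
    return [records[i::parts] for i in range(parts)]


def count_matches(chunk, keyword):
    """The small computation: count records in one chunk containing the keyword."""
    return sum(1 for record in chunk if keyword in record)


def parallel_count(records, keyword, workers=4):
    """Run the small computation on every chunk in parallel, then combine."""
    chunks = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles its own chunk and needs nothing from the others.
        partials = list(pool.map(lambda chunk: count_matches(chunk, keyword), chunks))
    return sum(partials)


# Hypothetical log records.
records = ["error disk", "ok", "error net", "ok", "error"]
print(parallel_count(records, "error", workers=2))
```

Since no chunk depends on another, a failed worker's chunk could simply be rerun elsewhere, which is one way such designs tolerate the failures expected on commodity hardware.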

IBM's announcement statement says GPFS-SNC "will convert terabytes of pure information into actionable insights twice as fast as previously possible... the design provides a common file system and namespace across disparate computing platforms, streamlining the process and reducing disk space."

The GPFS-SNC project is likely to be used in the EU-funded, IBM-led VISION cloud project announced at the beginning of November. ®
