Amazon sounds death knell for rocket-science grids
Clustered instances semi-standard
Comment Amazon's Cluster Compute Instances officially sounded the death knell for grid computing efforts that once held promise as the "next big thing".
Cluster Compute Instances take multiple x64 servers and link them together using 10 Gigabit Ethernet interfaces and switches. The EC2 virtual server slices function just like any other sold by Amazon, except that the HPC variants have 10 Gigabit Ethernet links and a specific hardware profile that allows applications to be fine-tuned.
This semi-standardization of clustered instances reduces not only the cost to run a grid or high-performance computing (HPC) application, but also the vast complexity associated with building grids and the associated applications.
And it's all thanks to the cloud. No, really, grid applications are one of the best use-cases for cloud services yet. Not only does the cloud have scale, but there are simple deployment methods and far fewer operational concerns. And the cloud has market momentum versus Grid's scientific and academic connection.
Much in the same way that Linux usurped the marketing crown from Unix - as well as eventual market share - cloud computing took away all the glory from grid computing, which circa 2004/2005 was the term used to describe large-scale distributed computing systems - unless of course you listened to pundit Nicholas Carr and called it Utility Computing. Either way, cloud won.
And while the technological approaches underlying grid and cloud are a bit different - an oversimplified explanation involves the fact that most clouds run stacks atop virtual machines whereas grids tend to use whole machines for processing - the underlying notion of elasticity and pay-as-you-go consumption is roughly the same, although the implementation and operations require different approaches and skillsets.
So why cloud and not grid? Grid computing has tended to focus on computationally intense operations, whereas cloud is more oriented toward scale and ease of deployment. HPC applications are typically designed to perform one specific set of functions on a specific set of hardware, whereas new-school data processing tools like Hadoop were developed to run on distributed systems that care much less about the underlying infrastructure.
I'm not suggesting that new-school applications would or should only run in the cloud. What I am saying is these new architectural patterns mean that developers can mimic a distributed environment much more easily, and that data can cross enterprise and data center boundaries in new ways. There are also many more deployment options when you are targeting clouds than your own data center.
With the exception of very specific privacy and security issues - which can arguably be addressed anyway - there are fewer and fewer reasons why any organization would want or need to run their own massive server farm.
This is not to suggest that grid and HPC will become completely obsolete, but rather that, going forward, they will exist in the context of cloud and will be prime candidates to parcel out to providers that can supply a vast amount of on-demand compute capacity.
In place of large numbers of servers that have to be procured and managed, cloud-based grid application deployments will look a lot more like XML and a lot less like rocket science.
Perhaps what matters most is the way developers and system administrators interact with a large amount of computing resources. It's not so much the specific code or application infrastructure that makes the cloud more appealing but the methods and capabilities that make the cloud significantly easier to use and manage.
To be clear, the new AWS offering is not a "complete" solution. Just as AWS lacks tooling for standard AMIs, so too do you need the proper tooling to manage your HPC applications on the new cluster instances. But it doesn't matter. You no longer have to own, deploy and manage hundreds of boxes to run an HPC application. You simply deploy a bunch of AMIs and kill them when the job is done.
The last iteration of grid computing required too much hardware, too much software and way too much money to reach its true potential. Clouds, both public and private, are a giant step on the data processing evolutionary scale. ®
"There are also many more deployment options when you are targeting clouds than your own data center.
With the exception of very specific privacy and security issues - which can arguably be addressed anyway - there are fewer and fewer reasons why any organization would want or need to run their own massive server farm."
I would argue that any company that had a serious ongoing need for HPC would rather target a cloud *in* their own data centre. Privacy and security issues cannot just be glossed over with "can arguably be addressed" at all. There are plenty of organisations out there that would potentially use HPC - let's take financial institutions for example - that cannot just squirt data around the globe to old-mate's cloud because regulators don't allow it. Users of HPC are also unlikely to want their IP floating around in someone else's cloud either. Companies like to be masters of their own destiny which is why they run their own server farms.
Then we have the practicality of all of this - a lot of HPC functions don't perform well in virtual environments. I can name Matlab as one shining example of something that works rapidly on bare-iron, has its own inbuilt grid functionality, and runs like shit on virtualised hardware. As soon as you virtualise you add overhead and speed bumps and, much as vendors like to spout "typical slowdowns in the region of only 5-10%", I have witnessed intensive CPU<->Memory tasks (essentially what HPC is) suffer a slowdown such that a job will take almost twice as long on virtualised hardware. Virtualisation is great for many things, but HPC just isn't one of them - unless you're a company that cannot afford to run a server farm.
Cloud? Grid? HPC?
Can you really compare the three? As I understand the terms, each of them is meant to address a very different problem:
1) Cloud: Lots of users with relatively small processing requirements per user and small amounts of data to transfer per user. Perfect examples are applications running on top of a database to which each individual user cannot add more than a little bit of data, such as search and email.
2) Grid: A relatively small number of users whose computational needs can be broken up into separate tasks, each of which requires a very small amount of data to specify its input and output, but a very large amount of processing time, which is then spun off to other computers on the grid to exploit the leftover processing power of other individual users. The fact that there are very few users scheduling tasks is important, because if you assume that half of the users are scheduling tasks, then all they get on average is one other machine available to perform work. Folding@home is the perfect example.
3) HPC: A single task that is monstrously computationally intensive and most likely requiring a very large amount of data to be moved around. Think solving partial differential equations on huge grids.
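The arithmetic behind the grid point above can be sketched in a few lines. This is only an illustration, assuming one machine per user and that every user who is not scheduling work donates a whole machine; the function name and the sample numbers are made up for the example.

```python
def spare_machines_per_scheduler(total_users: int, scheduling_fraction: float) -> float:
    """Average number of donated machines available to each scheduling user.

    Assumes (hypothetically) one machine per user, and that every user who
    is not scheduling work donates their whole machine to the grid.
    """
    schedulers = total_users * scheduling_fraction
    if schedulers <= 0:
        raise ValueError("at least one user must be scheduling tasks")
    donors = total_users - schedulers
    return donors / schedulers


# Half of the users scheduling: each gets one extra machine on average.
print(spare_machines_per_scheduler(1000, 0.5))
# One user in a hundred scheduling: each gets 99 extra machines on average.
print(spare_machines_per_scheduler(1000, 0.01))
```

Under these assumptions the payoff is simply (1 - f) / f for a scheduling fraction f, which is why the model only works when schedulers are rare.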
The cloud does not have the bandwidth necessary to move huge grids into or out of it.
The cloud is not leftover computing capacity, but rather paid-for capacity.
So how exactly is it that the cloud can do what grid or HPC are meant to do?
Grid is not HPC
Grid computing does not attempt to replace HPC - HPC is what it is, and has been established for a long time. Grid computing does not focus so much on high performance but high throughput. The issue for scientific Grid computing is not the processing power needed to run, but the size and location of the data sets that must be operated on. Cloud services from the likes of Amazon do provide great alternatives for various situations, but will not help a scientific research group with petabytes of data spread across multiple collaborating sites, because getting that data within range of the processing units is the hard part. Using Grid computing techniques, each site can run its own Grid on local hardware that is close to the large dataset, and results can be centrally collated. The type of site involved in this sort of project - e.g. a large University - has plenty of machines with spare CPU cycles available to perform the operations. Grid computing is an attempt to harness those spare cycles and put them to use.
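To put a rough number on why moving petabytes is the hard part, here is a back-of-the-envelope sketch. The 10 Gb/s link speed is an assumption chosen for illustration (a dedicated link running at full line rate with no protocol overhead), and the function name is hypothetical.

```python
def transfer_days(dataset_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move a dataset over a link running at full line rate."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    seconds = dataset_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15          # decimal petabyte, in bytes
TEN_GIGABIT = 10 * 10**9   # 10 Gb/s, an assumed best-case link speed

# Even with a dedicated, fully saturated 10 Gb/s link, one petabyte
# takes a little over nine days to move.
print(f"{transfer_days(PETABYTE, TEN_GIGABIT):.1f} days")
```

Real wide-area links are usually shared and slower than this best case, so the figure is a lower bound rather than an estimate.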