Feeds

Amazon fluffs up fat EC2 images for big data munching

Flips switch on Data Pipeline automagic bit shifter

Choosing a cloud hosting partner with confidence

The big fat storage instances that Amazon Web Services was promising to deliver back at its re:Invent user conference in November are now shipping, and we now know a few more things about them – such as how expensive they are.

Amazon has also fired up the Data Pipeline service, which is a workflow-based tool for moving data between various AWS services and into and out of third-party databases and data stores.

The High Storage Eight Extra Large instance, abbreviated hs1.8xlarge by AWS, has 117GB of virtual memory with two dozen 2TB drives for a total of 48TB of capacity associated with it. It has 16 virtual cores assigned to it for a total of 35 EC2 Compute Units (EC2s) of processing power, which is a little less than half of the generic 8XL EC2 instance on which it is based, which has 88 ECUs of virtual oomph. AWS said in a blog post that those local drives in the physical server can deliver 2.4GB/sec of I/O performance through the customized Xen hypervisor that underlies all EC2 instances on the AWS cloud.

Amazon recommends that customers using these High Storage instances turn on RAID 1 mirroring or RAID 5 or 6 data striping and parity protection to secure their data, and says further that a clustered file system such as Gluster (also known as Red Hat Storage Server if you use the commercial version) or a distributed storage system such as the Hadoop Distributed File System (HDFS) to provide fault tolerance. And, as you might expect, Amazon also wants customers to back up the data they put on these storage-heavy compute nodes onto its S3 object storage.

Amazon says that the High Storage instance is aimed at Hadoop data munching, data warehousing, log processing, and seismic-analysis workloads where having lots of local storage on the nodes and high sequential I/O are important.

At the moment, the High Storage instances are only available from Amazon's US East region in northern Virginia, and other regions around the globe will get these fat storage nodes in the coming months.

And they're not cheap, at $4.60 per hour for on-demand instances running Linux and $4.931 per hour running Windows. A regular8XL instance (also known as a Cluster Compute instance) costs $2.40 per hour running Linux and $2.97 per hour running Windows. Those 8XL instances have a little more than twice as much compute, but hardly any local storage. That's US East region pricing on EC2; other regions will have slight different pricing.

The High Storage instances are being used for Amazon's own Redshift data warehousing service and are options for the Elastic MapReduce Hadoop service, as well.

On Friday, Amazon also turned on its Data Pipeline service so customers can start using it, as you can see in this blog post. The service provides a workflow to automatically move information from Amazon's S3, Relational Data Service database, DynamoDB NoSQL data store, and Elastic MapReduce Hadoopery, or into it from applications or across these various services as data is chewed and sorted for various applications.

Data Pipeline has a free usage tier, just like EC2 instances, and at the moment is only available in the US East region, just like the fat storage server slices. You can run five "low frequency" activities, which means they are scheduled to run no more than once a day, in this free tier. The High frequency tier is not free, and it is for data movements that occur more than once a day.

You have to use Amazon's graphical tool to build pipelines to move data between services, and you pay 60 cents per month for a low-frequency data movement and $1 per month for high-frequency data movements. You have to pay $1 per month for each inactive pipeline you have set up but not used, and if you want to do data movements either out to or in from outside data sources, then a low-frequency data movement will cost you $1.50 per month to set up and $2.50 per month if you do it more than once a day.

These Data Pipeline service fees do not include any bandwidth or storage fees associated with core AWS infrastructure services. ®

Top 5 reasons to deploy VMware with Tegile

More from The Register

next story
'Kim Kardashian snaps naked selfies with a BLACKBERRY'. *Twitterati gasps*
More alleged private, nude celeb pics appear online
Wanna keep your data for 1,000 YEARS? No? Hard luck, HDS wants you to anyway
Combine Blu-ray and M-DISC and you get this monster
US boffins demo 'twisted radio' mux
OAM takes wireless signals to 32 Gbps
Google+ GOING, GOING ... ? Newbie Gmailers no longer forced into mandatory ID slurp
Mountain View distances itself from lame 'network thingy'
EMC, HP blockbuster 'merger' shocker comes a cropper
Stand down, FTC... you can put your feet up for a bit
Apple flops out 2FA for iCloud in bid to stop future nude selfie leaks
Millions of 4chan users howl with laughter as Cupertino slams stable door
Students playing with impressive racks? Yes, it's cluster comp time
The most comprehensive coverage the world has ever seen. Ever
Run little spreadsheet, run! IBM's Watson is coming to gobble you up
Big Blue's big super's big appetite for big data in big clouds for big analytics
prev story

Whitepapers

Secure remote control for conventional and virtual desktops
Balancing user privacy and privileged access, in accordance with compliance frameworks and legislation. Evaluating any potential remote control choice.
Intelligent flash storage arrays
Tegile Intelligent Storage Arrays with IntelliFlash helps IT boost storage utilization and effciency while delivering unmatched storage savings and performance.
WIN a very cool portable ZX Spectrum
Win a one-off portable Spectrum built by legendary hardware hacker Ben Heck
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Beginner's guide to SSL certificates
De-mystify the technology involved and give you the information you need to make the best decision when considering your online security options.