Amazon fluffs up fat EC2 images for big data munching

Flips switch on Data Pipeline automagic bit shifter

The big fat storage instances that Amazon Web Services was promising to deliver back at its re:Invent user conference in November are now shipping, and we know a few more things about them, such as how much they cost.

Amazon has also fired up the Data Pipeline service, which is a workflow-based tool for moving data between various AWS services and into and out of third-party databases and data stores.

The High Storage Eight Extra Large instance, abbreviated hs1.8xlarge by AWS, has 117GB of virtual memory and two dozen 2TB drives for a total of 48TB of capacity. It has 16 virtual cores assigned to it for a total of 35 EC2 Compute Units (ECUs) of processing power, a little less than half the oomph of the generic 8XL EC2 instance on which it is based, which is rated at 88 ECUs. AWS said in a blog post that those local drives in the physical server can deliver 2.4GB/sec of sequential I/O performance through the customized Xen hypervisor that underlies all EC2 instances on the AWS cloud.
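
If you want to kick the tyres, launching one of these is no different from launching any other instance type. Here's a minimal sketch using the boto Python library, assuming your AWS credentials are already configured; the AMI ID below is a placeholder, not a real image:

    # Minimal sketch: launch a High Storage instance with the boto library.
    # Assumes AWS credentials are configured; the AMI ID is a placeholder.
    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')  # only region with hs1.8xlarge so far
    reservation = conn.run_instances(
        image_id='ami-00000000',       # placeholder - substitute your own AMI
        instance_type='hs1.8xlarge',   # 16 virtual cores, 117GB RAM, 24 x 2TB local drives
        min_count=1,
        max_count=1,
    )
    print(reservation.instances[0].id)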

Amazon recommends that customers using these High Storage instances turn on RAID 1 mirroring or RAID 5 or 6 striping with parity protection to secure their data, and further suggests running a clustered file system such as Gluster (also known as Red Hat Storage Server if you use the commercial version) or a distributed storage system such as the Hadoop Distributed File System (HDFS) on top to provide fault tolerance. And, as you might expect, Amazon also wants customers to back up the data they put on these storage-heavy compute nodes onto its S3 object storage.
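
As a rough sketch of what that RAID advice looks like in practice, here's a hedged Python snippet that shells out to mdadm to lash the two dozen ephemeral drives into a single RAID 6 array. The /dev/xvdb through /dev/xvdy device names are an assumption about how the Xen hypervisor presents the disks, so check with lsblk before running anything:

    # Hedged sketch: stripe the 24 ephemeral drives into one RAID 6 array.
    # Run as root; device names are assumed, so verify them first.
    import string
    import subprocess

    devices = ['/dev/xvd%s' % c for c in string.ascii_lowercase[1:25]]  # xvdb..xvdy
    subprocess.check_call(
        ['mdadm', '--create', '/dev/md0',
         '--level=6',                   # striping with double parity, per Amazon's advice
         '--raid-devices=%d' % len(devices)] + devices)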

Amazon says that the High Storage instance is aimed at Hadoop data munching, data warehousing, log processing, and seismic-analysis workloads where having lots of local storage on the nodes and high sequential I/O are important.

At the moment, the High Storage instances are only available from Amazon's US East region in northern Virginia, and other regions around the globe will get these fat storage nodes in the coming months.

And they're not cheap, at $4.60 per hour for on-demand instances running Linux and $4.931 per hour running Windows. A regular 8XL instance (also known as a Cluster Compute instance) costs $2.40 per hour running Linux and $2.97 per hour running Windows. Those 8XL instances have a little more than twice as much compute, but hardly any local storage. That's US East region pricing on EC2; other regions will have slightly different pricing.
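
A quick bit of arithmetic shows what you are actually paying for; the sketch below uses the Linux on-demand prices quoted above:

    # Back-of-the-envelope sums on the US East Linux on-demand prices.
    HS1_PER_HOUR = 4.60   # hs1.8xlarge
    CC2_PER_HOUR = 2.40   # regular 8XL Cluster Compute instance
    HOURS_PER_MONTH = 24 * 30

    print('hs1 per month:    $%.2f' % (HS1_PER_HOUR * HOURS_PER_MONTH))  # $3312.00
    print('hs1 per TB-hour:  $%.3f' % (HS1_PER_HOUR / 48))               # ~$0.096
    print('hs1 per ECU-hour: $%.3f' % (HS1_PER_HOUR / 35))               # ~$0.131
    print('cc2 per ECU-hour: $%.3f' % (CC2_PER_HOUR / 88))               # ~$0.027

In other words, you pay roughly five times as much per ECU for the privilege of all that local disk; the premium buys spindles, not cores.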

The High Storage instances are being used for Amazon's own Redshift data warehousing service and are options for the Elastic MapReduce Hadoop service, as well.

On Friday, Amazon also turned on its Data Pipeline service so customers can start using it, as you can see in this blog post. The service provides a workflow to automatically move information between Amazon's S3, Relational Database Service (RDS), DynamoDB NoSQL data store, and Elastic MapReduce Hadoopery, or into and out of them from outside applications and data stores, as data is chewed and sorted for various applications.
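
For a flavour of what a pipeline looks like, here's an illustrative definition for a once-a-day copy between two S3 locations, written as a Python dict mirroring the service's JSON format. The object types and field names are our best reading of the launch documentation, and the bucket paths are placeholders, so treat it as a sketch rather than gospel:

    # Illustrative pipeline definition: a daily S3-to-S3 copy.
    # Types and field names follow the launch docs as best we can tell.
    import json

    pipeline = {
        'objects': [
            {'id': 'Daily', 'type': 'Schedule',
             'period': '1 day', 'startDateTime': '2012-12-21T00:00:00'},
            {'id': 'Input', 'type': 'S3DataNode',
             'filePath': 's3://my-bucket/logs/input.csv',   # placeholder path
             'schedule': {'ref': 'Daily'}},
            {'id': 'Output', 'type': 'S3DataNode',
             'filePath': 's3://my-bucket/logs/output.csv',  # placeholder path
             'schedule': {'ref': 'Daily'}},
            {'id': 'Copy', 'type': 'CopyActivity',
             'input': {'ref': 'Input'}, 'output': {'ref': 'Output'},
             'schedule': {'ref': 'Daily'}},
        ],
    }
    print(json.dumps(pipeline, indent=2))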

Data Pipeline has a free usage tier, just like EC2 instances, and at the moment it is only available in the US East region, just like the fat storage server slices. In the free tier you can run five "low frequency" activities, meaning ones scheduled to run no more than once a day. The high-frequency tier, covering data movements that occur more than once a day, is not free.

You have to use Amazon's graphical tool to build pipelines that move data between services, and you pay 60 cents per month for a low-frequency data movement and $1 per month for a high-frequency one. Each inactive pipeline you have set up but not used costs $1 per month, and if you want to move data out to or in from outside data sources, a low-frequency data movement will cost you $1.50 per month and a high-frequency one $2.50 per month.
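
The fee schedule is simple enough to total up yourself; here's a minimal sketch using the prices quoted above:

    # Rough monthly Data Pipeline bill from the published per-activity fees.
    LOW_FREQ = 0.60           # low-frequency activity, within AWS
    HIGH_FREQ = 1.00          # high-frequency activity, within AWS
    LOW_FREQ_OUTSIDE = 1.50   # low-frequency, outside data source
    HIGH_FREQ_OUTSIDE = 2.50  # high-frequency, outside data source
    INACTIVE = 1.00           # per inactive pipeline

    def monthly_bill(low=0, high=0, low_out=0, high_out=0, inactive=0):
        """Monthly Data Pipeline fees, excluding bandwidth and storage."""
        return (low * LOW_FREQ + high * HIGH_FREQ +
                low_out * LOW_FREQ_OUTSIDE + high_out * HIGH_FREQ_OUTSIDE +
                inactive * INACTIVE)

    # Ten daily in-cloud copies, two hourly pulls from outside, one idle pipeline:
    print('$%.2f' % monthly_bill(low=10, high_out=2, inactive=1))  # $12.00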

These Data Pipeline service fees do not include any bandwidth or storage fees associated with core AWS infrastructure services. ®
