Amazon fluffs up fat EC2 images for big data munching

Flips switch on Data Pipeline automagic bit shifter

The big fat storage instances that Amazon Web Services promised to deliver back at its re:Invent user conference in November are now shipping, and we know a few more things about them – such as how expensive they are.

Amazon has also fired up the Data Pipeline service, which is a workflow-based tool for moving data between various AWS services and into and out of third-party databases and data stores.

The High Storage Eight Extra Large instance, abbreviated hs1.8xlarge by AWS, has 117GB of virtual memory with two dozen 2TB drives for a total of 48TB of capacity associated with it. It has 16 virtual cores assigned to it for a total of 35 EC2 Compute Units (ECUs) of processing power, which is a little less than half of the generic 8XL EC2 instance on which it is based, which has 88 ECUs of virtual oomph. AWS said in a blog post that those local drives in the physical server can deliver 2.4GB/sec of I/O performance through the customized Xen hypervisor that underlies all EC2 instances on the AWS cloud.

Amazon recommends that customers using these High Storage instances turn on RAID 1 mirroring or RAID 5 or 6 data striping and parity protection to secure their data, and further suggests using a clustered file system such as Gluster (also known as Red Hat Storage Server if you use the commercial version) or a distributed storage system such as the Hadoop Distributed File System (HDFS) to provide fault tolerance. And, as you might expect, Amazon also wants customers to back up the data they put on these storage-heavy compute nodes onto its S3 object storage.
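To put rough numbers on that RAID advice, here is a minimal Python sketch of how much of the 48TB survives each protection scheme. It assumes, purely for illustration, that all 24 drives sit in a single array; the function name `usable_tb` and its parameters are hypothetical, and real layouts (several smaller arrays, hot spares) will give different figures.

```python
def usable_tb(level, drives=24, drive_tb=2):
    """Usable capacity in TB for one array built from identical drives."""
    if level == "raid1":
        # Mirrored pairs: half of the raw capacity holds copies.
        return drives // 2 * drive_tb
    if level == "raid5":
        # Single parity: one drive's worth of space goes to parity.
        return (drives - 1) * drive_tb
    if level == "raid6":
        # Double parity: two drives' worth of space goes to parity.
        return (drives - 2) * drive_tb
    raise ValueError(f"unsupported RAID level: {level}")


for level in ("raid1", "raid5", "raid6"):
    print(level, usable_tb(level), "TB usable of", 24 * 2, "TB raw")
```

On those assumptions, mirroring leaves 24TB of usable space, RAID 5 leaves 46TB, and RAID 6 leaves 44TB.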

Amazon says that the High Storage instance is aimed at Hadoop data munching, data warehousing, log processing, and seismic-analysis workloads where having lots of local storage on the nodes and high sequential I/O are important.

At the moment, the High Storage instances are only available from Amazon's US East region in northern Virginia, and other regions around the globe will get these fat storage nodes in the coming months.

And they're not cheap, at $4.60 per hour for on-demand instances running Linux and $4.931 per hour running Windows. A regular 8XL instance (also known as a Cluster Compute instance) costs $2.40 per hour running Linux and $2.97 per hour running Windows. Those 8XL instances have a little more than twice as much compute, but hardly any local storage. That's US East region pricing on EC2; other regions will have slightly different pricing.
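A quick back-of-the-envelope calculation makes the trade-off plain. The sketch below uses only the US East on-demand Linux prices and the ECU and capacity figures quoted in this story; the variable and function names are made up for illustration.

```python
# Figures quoted in the story: US East, on-demand, Linux.
HS1_PRICE, HS1_ECUS, HS1_TB = 4.60, 35, 48   # hs1.8xlarge
CC_PRICE, CC_ECUS = 2.40, 88                 # regular 8XL


def per_unit(price_per_hour, units):
    """Hourly price divided by the amount of resource it buys."""
    return price_per_hour / units


hs1_per_ecu = per_unit(HS1_PRICE, HS1_ECUS)
cc_per_ecu = per_unit(CC_PRICE, CC_ECUS)
hs1_per_tb = per_unit(HS1_PRICE, HS1_TB)

print(f"hs1.8xlarge: ${hs1_per_ecu:.4f} per ECU-hour, ${hs1_per_tb:.4f} per TB-hour")
print(f"8XL:         ${cc_per_ecu:.4f} per ECU-hour")
print(f"hs1.8xlarge costs {hs1_per_ecu / cc_per_ecu:.1f}x as much per ECU")
```

In other words, you pay roughly 13 cents per ECU-hour on the storage instance against under 3 cents on the 8XL, and a shade under 10 cents per hour for each terabyte of local disk.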

The High Storage instances are being used for Amazon's own Redshift data warehousing service and are options for the Elastic MapReduce Hadoop service, as well.

On Friday, Amazon also turned on its Data Pipeline service so customers can start using it, as you can see in this blog post. The service provides a workflow to automatically move information out of Amazon's S3, Relational Database Service database, DynamoDB NoSQL data store, and Elastic MapReduce Hadoopery, or into them from applications, or across these various services as data is chewed and sorted for various applications.

Data Pipeline has a free usage tier, just like EC2 instances, and at the moment is only available in the US East region, just like the fat storage server slices. You can run five "low frequency" activities, which means they are scheduled to run no more than once a day, in this free tier. The high-frequency tier is not free, and it is for data movements that occur more than once a day.

You have to use Amazon's graphical tool to build pipelines to move data between services, and you pay 60 cents per month for a low-frequency data movement and $1 per month for high-frequency data movements. You have to pay $1 per month for each inactive pipeline you have set up but not used, and if you want to do data movements either out to or in from outside data sources, then a low-frequency data movement will cost you $1.50 per month to set up and $2.50 per month if you do it more than once a day.
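Those rates are easier to compare when totted up for a whole account. The following is a minimal sketch, not an official AWS calculator: the function `monthly_fee_cents` and its parameters are hypothetical, the prices are the ones quoted in this story (held in cents to avoid floating-point rounding), and it assumes the five free low-frequency activities are deducted from activities running between AWS services, which this story does not confirm.

```python
# Monthly prices quoted in the story, in US cents.
LOW_AWS, HIGH_AWS = 60, 100            # activity between AWS services
LOW_OUTSIDE, HIGH_OUTSIDE = 150, 250   # activity touching outside data sources
INACTIVE_PIPELINE = 100                # pipeline set up but not used
FREE_LOW_FREQ = 5   # free-tier allowance; how it is applied is an assumption


def monthly_fee_cents(low_aws=0, high_aws=0, low_outside=0,
                      high_outside=0, inactive=0):
    """Estimated monthly Data Pipeline fee in cents, excluding other AWS charges."""
    # Assumption: the free tier only offsets low-frequency AWS-to-AWS activities.
    billable_low_aws = max(0, low_aws - FREE_LOW_FREQ)
    return (billable_low_aws * LOW_AWS
            + high_aws * HIGH_AWS
            + low_outside * LOW_OUTSIDE
            + high_outside * HIGH_OUTSIDE
            + inactive * INACTIVE_PIPELINE)


fee = monthly_fee_cents(low_aws=8, high_aws=2, low_outside=1, inactive=1)
print(f"${fee / 100:.2f} per month")
```

For that made-up mix of eight low-frequency and two high-frequency AWS activities, one low-frequency outside activity, and one idle pipeline, the estimate comes to $6.30 per month.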

These Data Pipeline service fees do not include any bandwidth or storage fees associated with core AWS infrastructure services. ®
