
Amazon fluffs up fat EC2 images for big data munching

Flips switch on Data Pipeline automagic bit shifter


The big fat storage instances that Amazon Web Services was promising to deliver back at its re:Invent user conference in November are now shipping, and we know a few more things about them – such as how expensive they are.

Amazon has also fired up the Data Pipeline service, which is a workflow-based tool for moving data between various AWS services and into and out of third-party databases and data stores.

The High Storage Eight Extra Large instance, abbreviated hs1.8xlarge by AWS, has 117GB of virtual memory and two dozen 2TB drives for a total of 48TB of local capacity. It has 16 virtual cores assigned to it for a total of 35 EC2 Compute Units (ECUs) of processing power, which is a little less than half of the 88 ECUs of virtual oomph in the generic 8XL EC2 instance on which it is based. AWS said in a blog post that those local drives in the physical server can deliver 2.4GB/sec of I/O performance through the customized Xen hypervisor that underlies all EC2 instances on the AWS cloud.
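
To get a feel for what that buys you, here's a minimal sketch of spinning one up with the Python boto library – the AMI ID, key pair and security group are placeholders, not real values:

    # Hypothetical sketch: launch an hs1.8xlarge in the US East region with boto.
    # The AMI ID, key pair and security group below are placeholders.
    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")   # picks up your AWS credentials

    reservation = conn.run_instances(
        image_id="ami-xxxxxxxx",        # placeholder AMI
        instance_type="hs1.8xlarge",    # the new High Storage instance type
        key_name="my-keypair",          # placeholder key pair
        security_groups=["default"],
    )
    instance = reservation.instances[0]
    print(instance.id, instance.state)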

Amazon recommends that customers using these High Storage instances turn on RAID 1 mirroring or RAID 5 or 6 data striping and parity protection to secure their data, and further suggests running a clustered file system such as Gluster (also known as Red Hat Storage Server if you use the commercial version) or a distributed storage system such as the Hadoop Distributed File System (HDFS) to provide fault tolerance. And, as you might expect, Amazon also wants customers to back up the data they put on these storage-heavy compute nodes onto its S3 object storage.
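
If you take that advice, a short sketch along these lines would handle the RAID part of the job – assuming the two dozen ephemeral drives show up as /dev/xvdb through /dev/xvdy, which will vary by AMI and is an assumption here:

    # Hypothetical sketch: build a software RAID 6 array across the 24 ephemeral
    # drives of an hs1.8xlarge, per Amazon's parity-protection suggestion.
    # The device names are an assumption; check what your AMI actually exposes.
    import string
    import subprocess

    devices = ["/dev/xvd%s" % c for c in string.ascii_lowercase[1:25]]   # xvdb..xvdy

    subprocess.check_call(
        ["mdadm", "--create", "/dev/md0",
         "--level=6",                              # striping with double parity
         "--raid-devices=%d" % len(devices)] + devices)

    subprocess.check_call(["mkfs.ext4", "/dev/md0"])   # or hand /dev/md0 to HDFS or Gluster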

Amazon says that the High Storage instance is aimed at Hadoop data munching, data warehousing, log processing, and seismic-analysis workloads where having lots of local storage on the nodes and high sequential I/O are important.

At the moment, the High Storage instances are only available from Amazon's US East region in northern Virginia, and other regions around the globe will get these fat storage nodes in the coming months.

And they're not cheap, at $4.60 per hour for on-demand instances running Linux and $4.931 per hour running Windows. A regular 8XL instance (also known as a Cluster Compute instance) costs $2.40 per hour running Linux and $2.97 per hour running Windows. Those 8XL instances have a little more than twice as much compute, but hardly any local storage. That's US East region pricing on EC2; other regions will have slightly different pricing.
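
A quick bit of arithmetic on those list prices shows where the premium goes: per unit of compute the High Storage box is poor value, but the disks work out at roughly $70 per TB-month if you run the thing around the clock. A rough sketch, Linux on-demand prices only:

    # Rough cost-per-unit comparison using the Linux on-demand prices quoted above.
    hs1_price, hs1_ecus, hs1_tb = 4.60, 35, 48    # hs1.8xlarge
    cc2_price, cc2_ecus = 2.40, 88                # regular 8XL (Cluster Compute)

    print("hs1.8xlarge: $%.3f per ECU-hour" % (hs1_price / hs1_ecus))   # ~$0.131
    print("8XL:         $%.3f per ECU-hour" % (cc2_price / cc2_ecus))   # ~$0.027

    # Treat the storage as the thing you are really paying for:
    hours_per_month = 730
    print("hs1.8xlarge: ~$%.0f per TB-month flat out" % (hs1_price * hours_per_month / hs1_tb))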

The High Storage instances are being used for Amazon's own Redshift data warehousing service and are options for the Elastic MapReduce Hadoop service, as well.

On Friday, Amazon also turned on its Data Pipeline service so customers can start using it, as you can see in this blog post. The service provides a workflow to automatically move information out of Amazon's S3, Relational Database Service, DynamoDB NoSQL data store, and Elastic MapReduce Hadoopery, into them from outside applications, or across these various services as data is chewed and sorted.
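
Conceptually, a pipeline is just a small graph of objects – a schedule, data nodes and activities – that the service walks through for you. Here's a purely illustrative sketch of a daily S3-to-Elastic MapReduce job expressed as Python data; the object types mirror what the console exposes, but the field names and the bucket are assumptions, not the documented schema:

    # Purely illustrative: the rough shape of a Data Pipeline definition.
    # Object types echo the console (Schedule, S3DataNode, EmrActivity), but the
    # exact field names and the bucket below are assumptions for illustration.
    daily = {"id": "RunOnceADay", "type": "Schedule", "period": "1 day"}

    raw_logs = {"id": "RawLogs", "type": "S3DataNode",
                "directoryPath": "s3://my-bucket/logs/",     # placeholder bucket
                "schedule": "RunOnceADay"}

    crunch = {"id": "CrunchLogs", "type": "EmrActivity",
              "input": "RawLogs",
              "step": "s3://my-bucket/jobs/wordcount.jar",   # placeholder job
              "schedule": "RunOnceADay"}

    pipeline_definition = [daily, raw_logs, crunch]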

Data Pipeline has a free usage tier, just like EC2 instances, and at the moment is only available in the US East region, just like the fat storage server slices. In this free tier you can run five "low frequency" activities, which means they are scheduled to run no more than once a day. The high-frequency tier is not free, and it covers data movements that occur more than once a day.

You have to use Amazon's graphical tool to build pipelines to move data between services, and you pay 60 cents per month for a low-frequency data movement and $1 per month for high-frequency data movements. You also pay $1 per month for each inactive pipeline you have set up but not used, and if you want to move data out to or in from sources outside AWS, a low-frequency data movement will cost you $1.50 per month and a high-frequency one – more than once a day – will cost $2.50 per month.
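
So a modest setup – say two daily in-cloud moves, one hourly move, one daily pull from an outside database, and one pipeline parked for later – adds up to pocket change. A rough tally, ignoring the free tier and any bandwidth or storage charges:

    # Rough monthly Data Pipeline bill for a hypothetical mix of activities,
    # using the per-activity prices quoted above (free tier ignored).
    low_freq, high_freq = 0.60, 1.00    # in-cloud moves, per activity per month
    ext_low, ext_high = 1.50, 2.50      # moves to or from outside AWS
    inactive = 1.00                     # parked-but-unused pipelines

    bill = 2 * low_freq + 1 * high_freq + 1 * ext_low + 1 * inactive
    print("Monthly charge: $%.2f" % bill)   # $4.70 for this mix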

These Data Pipeline service fees do not include any bandwidth or storage fees associated with core AWS infrastructure services. ®
