Amazon preparing 'disruptive' big data AWS service?

Seattle hunts for folk comfortable with exabyte-scale analytics

Exclusive: Amazon Web Services looks set to launch a "disruptive" big data service that is sure to put the frighteners on traditional IT vendors.

The as-yet-unnamed product will be run within AWS Data Services – an internal cloud product team that also handles the AWS Data Pipeline, AWS RDS and AWS Redshift, among others – according to information The Register has gleaned from various job openings published on the Amazon Careers site.

"If you're excited about building a distributed system capable of handling exabytes of data, this is your dream job," reads an ad that was posted on Monday.

"The successful Support Engineer will be instrumental in building, operating, and scaling a massive near-realtime distributed system," specifies another.

To fit with Amazon's love of acronyms, The Reg has nicknamed this product the Mystery-Amazon-Data-Service, or MADS.

Its capabilities will include "highly available, highly reliable processing of data in near-realtime", according to the body copy of one job advert.

MADS has a big mouth – the ads state it will have to ingest between two and five million records per second "at launch," and eventually scale to handle over a hundred times that.
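Some back-of-the-envelope arithmetic puts those figures in context. The ads say nothing about record size, so the 1KB used below is purely an illustrative guess – but it shows why the job postings talk about exabytes:

```python
# Back-of-the-envelope scale check on the figures in the job ads.
# Record size is NOT stated anywhere; 1 KB is an assumed, illustrative value.
launch_rate = 5_000_000            # records/sec at launch (upper bound from the ads)
scaled_rate = launch_rate * 100    # "over a hundred times that"
record_bytes = 1024                # hypothetical record size

per_day = scaled_rate * 86_400     # records ingested per day at full scale
bytes_per_day = per_day * record_bytes

print(f"{per_day:.3e} records/day, ~{bytes_per_day / 1e15:.1f} PB/day")
```

At roughly 44PB a day under that assumed record size, an exabyte piles up in under a month – which squares neatly with the "exabytes of data" language in the adverts.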

The engineering adverts want people with expertise in distributed systems, consistent hashing, distributed locking, replication, and load balancing.
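Those skills map onto well-known distributed-systems building blocks. As a purely illustrative sketch – nothing here comes from Amazon – consistent hashing is the standard trick for spreading records across nodes so that adding or losing a node remaps only a small slice of the keyspace rather than reshuffling everything:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable hash so key placement is identical across processes."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class ConsistentHashRing:
    """Toy consistent-hash ring: a key maps to the first node clockwise
    of its hash, so adding a node remaps only ~1/N of the keyspace."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted (hash, node) pairs; vnodes smooth the load
        for node in nodes:
            self.add_node(node, vnodes)

    def add_node(self, node, vnodes=100):
        for i in range(vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str):
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

A production system would layer replication and failure detection on top, but the core property – incremental rebalancing as nodes come and go – is exactly what the ads' skill list points at.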

This suggests MADS will be some kind of near-real-time analytics database. Asking for both distributed locking and replication expertise implies MADS is meant to replicate data widely, using distributed locking to keep those replicas consistent without a punishing latency hit.

Because the advert explicitly says "records", there's an indication it will ingest data from relational databases: "record" (or "tuple") is another term for a table row.

This hints at a system with capabilities equivalent to Google's secretive distributed datastore Spanner, or recent-AWS-partner TransLattice's geo-replicable database tech.

MADS could be a way for Amazon to get around some of the shortcomings of ParAccel, the PostgreSQL database on which Redshift is based, as ParAccel finds it difficult to handle small writes economically. MADS could make it possible to create a buffer database to let customers easily replicate OLTP databases straight into Redshift without having to batch load.
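The buffering idea can be sketched in a few lines. This is a hypothetical illustration, not Amazon's design – the class and parameter names are invented – but it shows the economics: absorb a stream of small OLTP-style writes in memory and hand them off as occasional bulk loads, which is what a warehouse like Redshift digests far more happily than a trickle of single-row inserts:

```python
import time


class BatchBuffer:
    """Hypothetical sketch of a write buffer in front of a warehouse:
    small writes accumulate and are flushed as one bulk load, either
    when the batch is full or when it has sat around too long."""

    def __init__(self, flush_size=1000, flush_interval=5.0, loader=None):
        self._buf = []
        self.flush_size = flush_size          # records per bulk load
        self.flush_interval = flush_interval  # max seconds between loads
        self._last_flush = time.monotonic()
        self._loader = loader or (lambda batch: None)  # e.g. a COPY-style load
        self.flushes = 0

    def write(self, record):
        self._buf.append(record)
        if (len(self._buf) >= self.flush_size
                or time.monotonic() - self._last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        if self._buf:
            self._loader(self._buf)  # one bulk load instead of many tiny writes
            self.flushes += 1
            self._buf = []
        self._last_flush = time.monotonic()
```

Feeding the buffer 25 records with a batch size of 10 produces three bulk loads instead of 25 individual writes – the trade, as ever, is a little added latency for a lot of throughput.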

This also lines up with the exabyte data-ingestion specification.

Alternatively, Amazon could be trying to spin up a global database-as-a-service tech using MADS for massively distributed low-latency data storage and processing, with applications ranging from online payments to gaming – and it could also leave existing partners out in the cold.

Either way, it seems like MADS will take a while to go public, as the adverts suggest it is in its early stages. A "System Engineer" can expect to do the following during their first twelve months: define the structure of the system, write infrastructure management tools, "participate in all phases of the development of a large distributed system," manage data-center gear, and perform various bread-and-butter admin tasks, according to an advert posted last Monday.

In other words, somewhere within Amazon there are some very clever people with some architectural ideas and a wishlist of specifications, and now they need to bring in more architects, engineers, and developers to make the MADS system hum.

AWS already has the AWS Data Pipeline, which helps administrators schedule and shuttle data among various services; AWS Redshift, a data warehouse that lets people store large quantities of data in the cloud and run queries against it; its NoSQL, SSD-backed DynamoDB; and its Relational Database Service (RDS). So where does MADS fit?

The Reg's take is that MADS will allow Amazon to build services that can net together the above components and help automate the passing of data among them. It may also become a standalone product in its own right, based on its similarities to the TransLattice and Google Spanner tech.

This fits with Amazon's general business strategy: tighten links among its own cloud products, implement technology pioneered by its competitors or partners, and then offer the result either for free – see Tuesday's OpsWorks announcement – or at a price so low that a price war becomes inevitable. And Amazon is very, very good at leading and winning price wars.

We're sure that traditional IT vendors will welcome Amazon's ambition in this area...

Amazon had not responded to multiple requests for comment at the time of writing.®
