Amazon preparing 'disruptive' big data AWS service?

Seattle hunts for folk comfortable with exabyte-scale analytics


Exclusive Amazon Web Services looks set to launch a "disruptive" big data service that is sure to put the frighteners on traditional IT vendors.

The as-yet-unnamed product will be run within AWS Data Services – an internal cloud product team that also handles the AWS Data Pipeline, AWS RDS and AWS Redshift, among others – according to information The Register has gleaned from various job openings published on the Amazon Careers site.

"If you're excited about building a distributed system capable of handling exabytes of data, this is your dream job," reads an ad that was posted on Monday.

"The successful Support Engineer will be instrumental in building, operating, and scaling a massive near-realtime distributed system," specifies another.

To fit with Amazon's love of acronyms, The Reg has nicknamed this product the Mystery-Amazon-Data-Service, or MADS.

Its capabilities will include "highly available, highly reliable processing of data in near-realtime", according to the body copy of one job advert.

MADS has a big mouth – the ads state it will have to ingest between two and five million records per second "at launch," and eventually scale to handle over a hundred times that.
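To put those ingest figures in perspective, here is a back-of-the-envelope sketch. The rates and the hundredfold scale factor come from the adverts; the average record size (`ASSUMED_RECORD_BYTES`) is purely our assumption, since the adverts give no figure, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the job adverts.
LAUNCH_RATE_LOW = 2_000_000     # records per second
LAUNCH_RATE_HIGH = 5_000_000    # records per second
SCALE_FACTOR = 100              # "over a hundred times" the launch rate

# ASSUMPTION: the adverts do not state a record size; 1 KB is a guess.
ASSUMED_RECORD_BYTES = 1_000
EXABYTE = 10**18                # bytes, decimal definition


def records_per_day(rate_per_second: int) -> int:
    """Records ingested in one day at a sustained per-second rate."""
    return rate_per_second * SECONDS_PER_DAY


def days_to_exabyte(rate_per_second: int,
                    record_bytes: int = ASSUMED_RECORD_BYTES) -> float:
    """Days of sustained ingest needed to accumulate one exabyte."""
    bytes_per_day = records_per_day(rate_per_second) * record_bytes
    return EXABYTE / bytes_per_day


if __name__ == "__main__":
    for label, rate in [
        ("launch, low", LAUNCH_RATE_LOW),
        ("launch, high", LAUNCH_RATE_HIGH),
        ("scaled, low", LAUNCH_RATE_LOW * SCALE_FACTOR),
        ("scaled, high", LAUNCH_RATE_HIGH * SCALE_FACTOR),
    ]:
        print(f"{label:>12}: {records_per_day(rate):,} records/day, "
              f"{days_to_exabyte(rate):,.1f} days per exabyte")
```

Under that 1 KB assumption, the top launch rate would take more than six years to pile up an exabyte, while the scaled-up top rate would get there in a little over three weeks.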

The engineering adverts want people with expertise in distributed systems, consistent hashing, distributed locking, replication, and load balancing.

This suggests MADS will be some kind of near-real-time analytics database. The combination of distributed locking and replication requirements implies that MADS will use a distributed locking system to replicate data widely without suffering a latency hit.
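Of the skills the adverts list, consistent hashing is the easiest to show in a few lines. The sketch below is a generic textbook hash ring, not anything known about MADS itself: the `HashRing` class, the node names, the use of MD5, and the virtual-node count are all our own illustrative choices.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # MD5 is used only to spread keys evenly, not for security.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        # Each node owns several points so load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"record-{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("node-d")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

The point of the technique is that adding a node only pulls keys onto the newcomer; keys never shuffle between the existing nodes, which matters when every reshuffle means copying data across a cluster.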

The advert's explicit use of the word "records" suggests the service will ingest data from relational databases, as record is another term for row (or tuple).

This hints at a system with capabilities equivalent to Google's secretive distributed datastore Spanner, or recent-AWS-partner TransLattice's geo-replicable database tech.

MADS could be a way for Amazon to get around some of the shortcomings of ParAccel, the PostgreSQL-based database on which Redshift is built, as ParAccel finds it difficult to handle small writes economically. MADS could make it possible to create a buffer database that lets customers easily replicate OLTP databases straight into Redshift without having to batch load.

This also lines up with the exabyte data-ingestion specification.
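The buffer-database idea boils down to absorbing many small writes and handing them on in bulk. A minimal sketch of that pattern follows; it is our illustration of generic micro-batching, not a description of how MADS or Redshift works. `WriteBuffer`, its thresholds, and the `flush` callback (a stand-in for a warehouse bulk load) are all hypothetical.

```python
import time
from typing import Any, Callable, List


class WriteBuffer:
    """Collects small writes and forwards them as larger batches."""

    def __init__(self,
                 flush: Callable[[List[Any]], None],
                 max_records: int = 1000,
                 max_age_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self._flush = flush              # hypothetical bulk-load callback
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock
        self._pending: List[Any] = []
        self._oldest = 0.0               # arrival time of the oldest pending record

    def append(self, record: Any) -> None:
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(record)
        # Flush on size, or on age. Age is only checked when a write
        # arrives; a real system would also flush on a timer.
        too_many = len(self._pending) >= self._max_records
        too_old = self._clock() - self._oldest >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending as one batch; a no-op when empty."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._flush(batch)


if __name__ == "__main__":
    batches: List[List[Any]] = []
    buf = WriteBuffer(batches.append, max_records=3, max_age_seconds=3600)
    for i in range(7):
        buf.append({"row": i})
    buf.flush()  # drain the remainder
    print([len(b) for b in batches])
```

The trade-off is latency against load efficiency: a larger `max_records` means fewer, cheaper bulk loads, while a smaller `max_age_seconds` keeps the warehouse closer to real time.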

Alternatively, Amazon could be trying to spin up a global database-as-a-service tech using MADS for massively distributed low-latency data storage and processing, which would have applications ranging from online payments to gaming, and could also put existing partners out in the cold.

Either way, it seems like MADS will take a while to go public, as the adverts suggest it is in its early stages. A "System Engineer" can expect to do the following during their first twelve months: define the structure of the system, write infrastructure management tools, "participate in all phases of the development of a large distributed system," manage data-center gear, and perform various bread-and-butter admin tasks, according to an advert posted last Monday.

In other words, somewhere within Amazon there are some very clever people with some architectural ideas and a wishlist of specifications, and now they need to bring in more architects, engineers, and developers to make the MADS system hum.

AWS already has the AWS Data Pipeline, which helps administrators schedule and shuttle data among various services; AWS Redshift for data warehousing, which lets people store large quantities of data in the cloud and run queries on it; its NoSQL SSD-backed DynamoDB; and its Relational Database Service (RDS). So where does MADS fit?

The Reg's take is that MADS will allow Amazon to build services that can net together the above components and help automate the passing of data among them. It may also become a standalone product in its own right, based on its similarities to the TransLattice and Google Spanner tech.

This fits with Amazon's general business strategy: tighten links among its own cloud products, and try to implement technology systems pioneered by either its competitors or partners, then bring the product in either for free – see Tuesday's OpsWorks announcement – or for a price so low that a price war becomes inevitable. And Amazon is very, very good at grinding down competitors by leading and winning price wars.

We're sure that traditional IT vendors will welcome Amazon's ambition in this area...

Amazon had not responded to multiple requests for comment at the time of writing.®
