Amazon preparing 'disruptive' big data AWS service?

Seattle hunts for folk comfortable with exabyte-scale analytics

Beginner's guide to SSL certificates

Exclusive Amazon Web Services looks set to launch a "disruptive" big data service that is sure to put the frighteners on traditional IT vendors.

The as-yet-unnamed product will be run within AWS Data Services – an internal cloud product team that also handles the AWS Data Pipeline, AWS RDS and AWS RedShift, among others – according to information The Register has gleaned from various job openings published on the Amazon Careers site.

"If you're excited about building a distributed system capable of handling exabytes of data, this is your dream job," reads an ad that was posted on Monday.

"The successful Support Engineer will be instrumental in building, operating, and scaling a massive near-realtime distributed system," specifies another.

To fit with Amazon's love of acronyms, The Reg has nicknamed this product the Mystery-Amazon-Data-Service, or MADS.

Its capabilities will include "highly available, highly reliable processing of data in near-realtime", according to the body copy of one job advert.

MADS has a big mouth – the ads state it will have to ingest between two and five million records per second "at launch," and eventually scale to handle over a hundred times that.

The engineering adverts want people with expertise in distributed systems, consistent hashing, distributed locking, replication, and load balancing.

This suggests MADS will be some kind of near-real-time analytics database. The combination of distributed locking and replication requirements implies that MADS will be able to replicate data widely without suffering a latency hit by employing a distributed locking system.

Because it explicitly says "records" in the advert, there's an indication it will ingest data from relational databases, as another term for row is record (or tuple).

This hints at a system with capabilities equivalent to Google's secretive distributed datastore Spanner, or recent-AWS-partner TransLattice's geo-replicable database tech.

MADS could be a way for Amazon to get around some of the shortcomings of ParAccel, the PostgreSQL database on which Redshift is based, as ParAccel finds it difficult to handle small writes economically. MADS could make it possible to create a buffer database to let customers easily replicate OLTP databases straight into Redshift without having to batch load.

This also lines up with the exabyte data-ingestion specification.

Alternatively, Amazon could be trying to spin up a global database-as-a-service tech using MADS for massively distributed low-latency data storage and processing, which would have a range of applications ranging from online payments to gaming, and could also put existing partners out in the cold.

Either way, it seems like MADS will take a while to go public, as the adverts suggest it is in its early stages. A "System Engineer" can expect to do the following during their first twelve months: define the structure of the system, write infrastructure management tools, "participate in all phases of the development of a large distributed system," manage data-center gear, and perform various bread-and-butter admin tasks, according to an advert posted last Monday.

In other words, somewhere within Amazon there are some very clever people with some architectural ideas and a wishlist of specifications, and now they need to bring in more architects, engineers, and developers to make the MADS system hum.

AWS already has the AWS Data Pipeline, which helps administrators schedule and shuttle data among various services, AWS Redshift for data warehousing which lets people store large quantities of data in the cloud and run queries on it, its NoSQL SSD-backed DynamoDB, and its Relational Database Service (RDS). So where does MADS fit?

The Reg's take is that MADS will allow Amazon to build services that can net together the above components and help automate the passing of data among them. It may also become a standalone product in its own right, based on its similarities to the TransLattice and Google Spanner tech.

This fits with Amazon's general business strategy: tighten links among its own cloud products, and try to implement technology systems pioneered by either its competitors or partners, then bring the product in either for free – see Tuesday's OpsWorks announcement – or for a price so low that a price war becomes inevitable. And Amazon is very, very good at grinding down competitors leading and winning price wars.

We're sure that traditional IT vendors will welcome Amazon's ambition in this area...

Amazon had not responded to multiple requests for comment at the time of writing.®

Security for virtualized datacentres

More from The Register

next story
It's Big, it's Blue... it's simply FABLESS! IBM's chip-free future
Or why the reversal of globalisation ain't gonna 'appen
'Hmm, why CAN'T I run a water pipe through that rack of media servers?'
Leaving Las Vegas for Armenia kludging and Dubai dune bashing
Bitcasa bins $10-a-month Infinite storage offer
Firm cites 'low demand' plus 'abusers'
Facebook slurps 'paste sites' for STOLEN passwords, sprinkles on hash and salt
Zuck's ad empire DOESN'T see details in plain text. Phew!
CAGE MATCH: Microsoft, Dell open co-located bit barns in Oz
Whole new species of XaaS spawning in the antipodes
Microsoft and Dell’s cloud in a box: Instant Azure for the data centre
A less painful way to run Microsoft’s private cloud
prev story


Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.