Amazon preparing 'disruptive' big data AWS service?

Seattle hunts for folk comfortable with exabyte-scale analytics


Exclusive: Amazon Web Services looks set to launch a "disruptive" big data service that is sure to put the frighteners on traditional IT vendors.

The as-yet-unnamed product will be run within AWS Data Services – an internal cloud product team that also handles the AWS Data Pipeline, AWS RDS and AWS Redshift, among others – according to information The Register has gleaned from various job openings published on the Amazon Careers site.

"If you're excited about building a distributed system capable of handling exabytes of data, this is your dream job," reads an ad that was posted on Monday.

"The successful Support Engineer will be instrumental in building, operating, and scaling a massive near-realtime distributed system," specifies another.

To fit with Amazon's love of acronyms, The Reg has nicknamed this product the Mystery-Amazon-Data-Service, or MADS.

Its capabilities will include "highly available, highly reliable processing of data in near-realtime", according to the body copy of one job advert.

MADS has a big mouth – the ads state it will have to ingest between two and five million records per second "at launch," and eventually scale to handle over a hundred times that.
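To put those ingest rates next to the "exabytes" in the earlier ad, here is a back-of-envelope sketch. The adverts give records per second but no record size, so the 1 KB average used below (`RECORD_BYTES`) is purely our assumption, as is the helper name `days_to_exabyte`; change the figure and the answer moves in proportion.

```python
# Back-of-envelope only. RECORD_BYTES is a hypothetical figure, NOT from the adverts.
RECORD_BYTES = 1_000        # assumed average record size: 1 KB
EXABYTE = 10**18            # bytes, decimal definition
SECONDS_PER_DAY = 86_400


def days_to_exabyte(records_per_second: int, record_bytes: int = RECORD_BYTES) -> float:
    """Days of sustained ingest needed to accumulate one exabyte."""
    bytes_per_second = records_per_second * record_bytes
    return EXABYTE / bytes_per_second / SECONDS_PER_DAY


# Rates quoted in the adverts: 2-5 million records/s "at launch",
# eventually "over a hundred times that".
LAUNCH_LOW, LAUNCH_HIGH = 2_000_000, 5_000_000
SCALE_FACTOR = 100

for label, rate in [
    ("launch, low end", LAUNCH_LOW),
    ("launch, high end", LAUNCH_HIGH),
    ("100x, low end", LAUNCH_LOW * SCALE_FACTOR),
    ("100x, high end", LAUNCH_HIGH * SCALE_FACTOR),
]:
    print(f"{label}: {rate:,} records/s -> {days_to_exabyte(rate):,.0f} days per exabyte")
```

Under that assumed record size, launch-level ingest would take years to pile up an exabyte, while the hundred-fold target would get there in a matter of weeks.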

The engineering adverts want people with expertise in distributed systems, consistent hashing, distributed locking, replication, and load balancing.

This suggests MADS will be some kind of near-real-time analytics database. The combination of distributed locking and replication requirements implies that MADS will replicate data widely, using a distributed locking system to keep the copies consistent without taking a heavy latency hit.
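For readers unfamiliar with consistent hashing, one of the skills the adverts ask for, the sketch below shows the general technique: keys and nodes are hashed onto the same ring, and each key belongs to the next node clockwise. It is a generic illustration, not a description of how MADS works. The `HashRing` class, the node names, and the choice of 64 virtual nodes per server are all our own assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position using MD5 (for spread, not security)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes: int = 64):
        self._vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node is placed at several positions to even out the load.
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"record-{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("node-d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding a fourth node")
```

The point of the design is that adding a server only pulls keys onto the newcomer; records already assigned to the other nodes stay where they are, which matters when a system is ingesting millions of records per second.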

Because the advert explicitly says "records", there's an indication it will ingest data from relational databases, as record is another term for a row (or tuple).

This hints at a system with capabilities equivalent to Google's secretive distributed datastore Spanner, or recent-AWS-partner TransLattice's geo-replicable database tech.

MADS could be a way for Amazon to get around some of the shortcomings of ParAccel, the PostgreSQL-derived database on which Redshift is based, as ParAccel finds it difficult to handle small writes economically. MADS could make it possible to create a buffer database that lets customers easily replicate OLTP databases straight into Redshift without having to batch load.

This also lines up with the exabyte-scale data-handling specification.

Alternatively, Amazon could be trying to spin up a global database-as-a-service tech using MADS for massively distributed low-latency data storage and processing, which would have applications ranging from online payments to gaming, and could also put existing partners out in the cold.

Either way, it seems like MADS will take a while to go public, as the adverts suggest it is in its early stages. A "System Engineer" can expect to do the following during their first twelve months: define the structure of the system, write infrastructure management tools, "participate in all phases of the development of a large distributed system," manage data-center gear, and perform various bread-and-butter admin tasks, according to an advert posted last Monday.

In other words, somewhere within Amazon there are some very clever people with some architectural ideas and a wishlist of specifications, and now they need to bring in more architects, engineers, and developers to make the MADS system hum.

AWS already has the AWS Data Pipeline, which helps administrators schedule and shuttle data among various services; AWS Redshift for data warehousing, which lets people store large quantities of data in the cloud and run queries on it; its NoSQL SSD-backed DynamoDB; and its Relational Database Service (RDS). So where does MADS fit?

The Reg's take is that MADS will allow Amazon to build services that can net together the above components and help automate the passing of data among them. It may also become a standalone product in its own right, based on its similarities to the TransLattice and Google Spanner tech.

This fits with Amazon's general business strategy: tighten links among its own cloud products, and try to implement technology systems pioneered by either its competitors or partners, then bring the product in either for free – see Tuesday's OpsWorks announcement – or for a price so low that a price war becomes inevitable. And Amazon is very, very good at grinding down competitors by leading and winning price wars.

We're sure that traditional IT vendors will welcome Amazon's ambition in this area...

Amazon had not responded to multiple requests for comment at the time of writing.®
