Amazon preparing 'disruptive' big data AWS service?

Seattle hunts for folk comfortable with exabyte-scale analytics

Exclusive: Amazon Web Services looks set to launch a "disruptive" big data service that is sure to put the frighteners on traditional IT vendors.

The as-yet-unnamed product will be run within AWS Data Services – an internal cloud product team that also handles the AWS Data Pipeline, AWS RDS and AWS Redshift, among others – according to information The Register has gleaned from various job openings published on the Amazon Careers site.

"If you're excited about building a distributed system capable of handling exabytes of data, this is your dream job," reads an ad that was posted on Monday.

"The successful Support Engineer will be instrumental in building, operating, and scaling a massive near-realtime distributed system," specifies another.

To fit with Amazon's love of acronyms, The Reg has nicknamed this product the Mystery-Amazon-Data-Service, or MADS.

Its capabilities will include "highly available, highly reliable processing of data in near-realtime", according to the body copy of one job advert.

MADS has a big mouth – the ads state it will have to ingest between two and five million records per second "at launch," and eventually scale to handle over a hundred times that.
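To put those ingest figures in perspective, here is a rough back-of-the-envelope sketch. The record rates and the hundredfold scale-up come from the adverts; the 1KB average record size is purely our assumption for illustration, as the ads give no record size.

```python
# Back-of-the-envelope arithmetic for the advertised ingest rates.
# ASSUMPTION: 1,000 bytes per record (not stated in the job adverts).
SECONDS_PER_DAY = 24 * 60 * 60
EXABYTE = 10**18  # bytes, decimal definition

LAUNCH_RATES = (2_000_000, 5_000_000)  # records per second "at launch"
SCALE_FACTOR = 100                     # "over a hundred times that"
ASSUMED_RECORD_BYTES = 1_000           # hypothetical average record size


def records_per_day(records_per_second: int) -> int:
    """Records ingested in one day at a constant rate."""
    return records_per_second * SECONDS_PER_DAY


def days_to_exabyte(records_per_second: int, record_bytes: int) -> float:
    """Days of constant ingest needed to accumulate one exabyte."""
    bytes_per_day = records_per_day(records_per_second) * record_bytes
    return EXABYTE / bytes_per_day


scaled_rates = tuple(rate * SCALE_FACTOR for rate in LAUNCH_RATES)

if __name__ == "__main__":
    for label, rates in (("launch", LAUNCH_RATES), ("scaled", scaled_rates)):
        for rate in rates:
            print(
                f"{label}: {rate:,} rec/s -> {records_per_day(rate):,} rec/day, "
                f"{days_to_exabyte(rate, ASSUMED_RECORD_BYTES):,.1f} days per exabyte"
            )
```

Under that assumed record size, the launch rates would take years to accumulate an exabyte, while the scaled-up rates would get there in weeks.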

The engineering adverts want people with expertise in distributed systems, consistent hashing, distributed locking, replication, and load balancing.
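For readers unfamiliar with consistent hashing, the following is a minimal sketch of the general technique, not anything known about how MADS works. The `HashRing` class, the node names, and the choice of 64 virtual nodes per server are all hypothetical. The point it illustrates is that adding a server moves only a fraction of the keys, and every key that moves lands on the new server.

```python
import bisect
import hashlib


class HashRing:
    """A toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of SHA-256 give a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        """Place `vnodes` points for this node on the ring."""
        for i in range(self._vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owners:  # vanishingly rare collision: skip it
                continue
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def remove(self, node: str) -> None:
        """Drop every ring point owned by this node."""
        self._positions = [p for p in self._positions if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring point clockwise of the key."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, self._hash(key))
        return self._owners[self._positions[idx % len(self._positions)]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"record-{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("node-d")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```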

This suggests MADS will be some kind of near-real-time analytics database. The combination of distributed locking and replication requirements implies that, by employing a distributed locking system, MADS will be able to replicate data widely without suffering a latency hit.

Because the advert explicitly says "records", there's an indication MADS will ingest data from relational databases, as "record" (or "tuple") is another term for a row.

This hints at a system with capabilities equivalent to Google's secretive distributed datastore Spanner, or recent-AWS-partner TransLattice's geo-replicable database tech.

MADS could be a way for Amazon to get around some of the shortcomings of ParAccel, the PostgreSQL-derived database on which Redshift is based, as ParAccel finds it difficult to handle small writes economically. MADS could make it possible to create a buffer database to let customers easily replicate OLTP databases straight into Redshift without having to batch load.
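The buffering idea can be sketched in a few lines: accept many small writes cheaply, then hand them to the warehouse in bulk. This is our illustration of the general pattern only. The `WriteBuffer` class, its thresholds, and the `flush` callback are hypothetical and do not reflect any Amazon API or anything in the adverts.

```python
import time
from typing import Any, Callable, List


class WriteBuffer:
    """Collects small writes and flushes them downstream as one batch."""

    def __init__(
        self,
        flush: Callable[[List[Any]], None],  # hypothetical bulk-load callback
        max_rows: int = 1000,
        max_age_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._flush = flush
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows: List[Any] = []
        self._first_at = 0.0

    def write(self, row: Any) -> None:
        """Buffer one row; flush if the batch is full or has grown too old."""
        if not self._rows:
            self._first_at = self._clock()
        self._rows.append(row)
        too_big = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._first_at >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Send any buffered rows downstream as a single batch."""
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)


if __name__ == "__main__":
    loaded: List[List[int]] = []
    buf = WriteBuffer(loaded.append, max_rows=3, max_age_s=60.0)
    for i in range(7):
        buf.write(i)
    buf.flush()  # push the final partial batch
    print(loaded)
```

Note that the age check only runs when a new row arrives, so a real system would also need a timer to flush an idle buffer.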

This also lines up with the exabyte data-ingestion specification.

Alternatively, Amazon could be trying to spin up a global database-as-a-service tech using MADS for massively distributed low-latency data storage and processing, which would have applications ranging from online payments to gaming, and could also put existing partners out in the cold.

Either way, it seems like MADS will take a while to go public, as the adverts suggest it is in its early stages. A "System Engineer" can expect to do the following during their first twelve months: define the structure of the system, write infrastructure management tools, "participate in all phases of the development of a large distributed system," manage data-center gear, and perform various bread-and-butter admin tasks, according to an advert posted last Monday.

In other words, somewhere within Amazon there are some very clever people with some architectural ideas and a wishlist of specifications, and now they need to bring in more architects, engineers, and developers to make the MADS system hum.

AWS already has the AWS Data Pipeline, which helps administrators schedule and shuttle data among various services; AWS Redshift for data warehousing, which lets people store large quantities of data in the cloud and run queries on it; its NoSQL SSD-backed DynamoDB; and its Relational Database Service (RDS). So where does MADS fit?

The Reg's take is that MADS will allow Amazon to build services that can net together the above components and help automate the passing of data among them. It may also become a standalone product in its own right, based on its similarities to the TransLattice and Google Spanner tech.

This fits with Amazon's general business strategy: tighten links among its own cloud products, and try to implement technology systems pioneered by either its competitors or partners, then bring the product in either for free – see Tuesday's OpsWorks announcement – or for a price so low that a price war becomes inevitable. And Amazon is very, very good at grinding down competitors by leading and winning price wars.

We're sure that traditional IT vendors will welcome Amazon's ambition in this area...

Amazon had not responded to multiple requests for comment at the time of writing.®
