Amazon opens doors to Kinesis: The Hotel California of the cloud

You can check in your data anytime you like, but as for leaving...

re:Invent 2013 The more data you put in a cloud, the harder it is to migrate away. And so Amazon's new "Kinesis" data ingester is a neat piece of technology, and at the same time a canny way to turn Amazon Web Services into the Hotel California of the cloud.

Kinesis was announced by the web bazaar's chief technology officer Werner Vogels in a speech at the company's re:Invent conference today. It's essentially Amazon's attempt to fire up a commercial variant of open-source data processing and messaging engines Storm, Spark Streaming, and Kafka.

The difference between Kinesis and these systems is that Amazon handles all the pesky infrastructure management and provisioning, and simply exposes the system to a developer as a service that lets the programmer pick what data to ingest, how much, and where to feed it to.

Kinesis will also compete with commercial systems, like Google's BigQuery – though the ad-slinger's streaming capabilities are rudimentary in comparison to Kinesis's data-huffing tech.

Amazon's service can "collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources," the biz wrote in a document discussing the tech. It can then stream this out rapidly to other complementary AWS technologies such as DynamoDB or RedShift for analysis or presentation.

Some of the apps this makes possible include social media data-mining software, live data analytics around financial markets, or a way of feeding changing inventory information from massive stores into software that re-orders stock.

"In a few clicks and a couple of lines of code, you can start building applications which respond to changes in your data stream in seconds, at any scale, while only paying for the resources you use," the company wrote.

Admins can set up a Kinesis service by specifying how much input and output capacity they need in blocks of 1MBps named 'Shards', via the AWS console, API, or SDKs. The size of the stream can be adjusted without needing a restart, and data is loaded in with HTTP PUTs.
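As a rough sketch of what that model implies for a developer: capacity planning is just dividing your ingress rate into 1MBps shards, and each record pushed in carries a stream name, a partition key (to route it to a shard), and the payload. The field names and JSON shape below are illustrative assumptions, not Amazon's documented wire format.

```python
import base64
import json
import math

# Kinesis sells capacity in 1MBps input blocks ("Shards"), so sizing a
# stream is a ceiling division on your expected ingress rate.
def shards_needed(ingress_mbps):
    return math.ceil(ingress_mbps / 1.0)  # one shard per 1MBps of input

# Hypothetical sketch of the kind of JSON body an HTTP PUT into a stream
# might carry: the target stream, a partition key used to route the record
# to a shard, and the record data (binary payloads base64-encoded).
def build_put_record(stream_name, partition_key, data_bytes):
    return json.dumps({
        "StreamName": stream_name,
        "PartitionKey": partition_key,
        "Data": base64.b64encode(data_bytes).decode("ascii"),
    })

print(shards_needed(10))  # the 10MBps pricing example needs 10 shards
print(build_put_record("clickstream", "user-42", b"page=/home"))
```

In practice Amazon's client library (below) would wrap this plumbing, but the shard arithmetic is what drives the bill.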

Data placed in Kinesis is available for analysis "within seconds", the company said. It is stored across several bit barns for 24 hours, during which time it can be "read, re-read, backfilled, and analyzed," or moved into storage services such as S3 and RedShift.

Amazon has also produced a client library for the service that handles adapting to "changes in stream volume, load-balancing streaming data, coordinating distributed services, and processing data with fault-tolerance," the company said.

However, Kinesis is an ambitious project and perhaps more prone to performance wobbles than other AWS services. "Real-time is definitely one of the areas where we have to crack a lot of nuts still," Vogels told El Reg. We'll be keeping a close eye on how the service holds up.

Kinesis costs $0.015 per 'Shard' per hour, along with $0.028 per 1,000,000 PUT transactions. Initially, the tech is available from the US East Region as a "Limited Preview" that developers will need to apply for. Inbound data transfer is free and there's no cost to transmit data from Kinesis into other AWS apps.

An AWS pricing example says Kinesis could be used to create an app ingesting 10MBps of data while feeding info out to two real-time processing applications for $4.22 a day.
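The arithmetic behind that figure roughly checks out against the rates above: 10MBps of input means 10 shards, and the shard-hours alone come to $3.60 a day. The remaining $0.62 would be PUT charges, which implies a transaction count AWS doesn't state; the ~22 million PUTs per day below is our inference, not Amazon's number.

```python
# Back-of-envelope check of AWS's $4.22/day pricing example,
# using the published rates from the article.
SHARD_HOUR = 0.015        # $ per shard per hour
PER_MILLION_PUTS = 0.028  # $ per 1,000,000 PUT transactions

shards = 10               # 10MBps of input at 1MBps per shard
shard_cost = shards * 24 * SHARD_HOUR          # $3.60 per day
put_budget = 4.22 - shard_cost                 # what's left of $4.22
implied_puts = put_budget / PER_MILLION_PUTS   # millions of PUTs per day

print(f"shard cost: ${shard_cost:.2f}/day")
print(f"implied PUTs: ~{implied_puts:.1f} million/day")
```

Note that feeding data out to the two real-time processing apps adds nothing to the bill, since transfer from Kinesis into other AWS apps is free.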

Kinesis looks like the outcome of a big-data service which El Reg spotted in February, and dubbed the Mystery-Amazon-Data-Service (MADS).

MADS was going to be capable of "highly available, highly reliable processing of data in near-realtime". Recruitment adverts at the time said the service would have to slurp between two and five million database records per second at launch, and eventually scale to deal with hundreds of millions – which is exactly the sort of capability Kinesis requires.

With Kinesis, Amazon is giving companies a way to analyze streams of changing web data without operating any complex hardware or infrastructure. It also gives the biz an app that will pour more data than ever before into its cloud, fattening its margins and letting it buy storage gear at lower prices than ever. That lets Bezos & Co make money twice: first from the customers paying them, and then from the discounts they can squeeze out of their equipment suppliers. ®
