Heroku tech change leaves customer with bill-shock

PaaS clouds not easy to run if you don't RTFM

The cloud is not as easy or as simple as its providers' marketing departments may want you to believe – that's the moral of the story of a startup and its platform provider Heroku.

Web-startup Rap Genius, which pays Heroku $20,000 a month in fees, alleged in a forceful blog post published on Tuesday that the Salesforce-owned PaaS had "swindled its customers".

"In mid-2010, Heroku quietly redesigned its routing system, and the change – nowhere documented, nowhere instrumented – radically degraded throughput on the platform," James Somers, an engineer at Rap Genius, wrote. "Dollar for dollar, a dyno became worth a fraction of its former self."

A dyno is the fundamental resource unit that Heroku deals in. Any time spent waiting for a request to reach a dyno directly impacts performance.

The change that caused cost-overruns at Rap Genius was Heroku's switch from an "intelligent routing" network structure to a "random routing" structure in mid-2010. This altered how requests were sent to dynos.

Instead of making sure that requests immediately found their way to an available compute resource – a dyno – Heroku began assigning the requests to random dynos, Somers wrote. This meant that some requests would end up being queued, which could lower throughput – a critical issue for large web applications.
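The difference between the two routing styles can be illustrated with a toy simulation. This is a minimal sketch, not Heroku's router: the dyno count, arrival rate, service-time distribution, and the `simulate` function are all assumptions chosen for illustration. Each dyno serves one request at a time; "intelligent" here means sending each request to the dyno that frees up soonest, while "random" picks a dyno without regard to its backlog.

```python
import random


def simulate(policy, num_dynos=10, num_requests=5000, arrival_rate=8.0, seed=1):
    """Return the mean time a request waits in a queue before a dyno starts it.

    Hypothetical model: Poisson arrivals, exponential service times with
    mean 1.0, and one request at a time per dyno.
    """
    workload = random.Random(seed)      # drives arrivals and service times
    router = random.Random(seed + 1)    # drives random routing only
    free_at = [0.0] * num_dynos         # time at which each dyno becomes idle
    now = 0.0
    total_wait = 0.0

    for _ in range(num_requests):
        now += workload.expovariate(arrival_rate)
        service = workload.expovariate(1.0)

        if policy == "random":
            # Pick any dyno, even one that is already busy.
            dyno = router.randrange(num_dynos)
        else:
            # Pick the dyno that will be available soonest.
            dyno = min(range(num_dynos), key=free_at.__getitem__)

        start = max(now, free_at[dyno])  # wait if the chosen dyno is busy
        total_wait += start - now
        free_at[dyno] = start + service

    return total_wait / num_requests


if __name__ == "__main__":
    for policy in ("intelligent", "random"):
        print(policy, "mean queue wait:", round(simulate(policy), 3))
```

Both policies see the same stream of requests, so any gap in mean wait comes from the routing choice alone. Under these assumed parameters the random policy queues requests behind busy dynos while others sit idle, which is the effect Somers describes.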

Rap Genius discovered the problem when a JavaScript problem in its site forced it to run a range of ApacheBench tests. It found that the request time Heroku's monitoring partner New Relic reported for a static page was 40ms, while its own tools said 6330ms.

The company asked Heroku what was going on, and an engineer told them that the numbers reflected the time it took to serve a request after it had passed out of the queue. Rap Genius had presumed Heroku's "intelligent routing" meant there would never be any queuing.

As of this Thursday, Heroku's site says this about how it routes requests: "Incoming web traffic is automatically routed to web dynos, with intelligent distribution of load instantly as you scale." If you click through to the routing page, however, it clearly states "the routing mesh uses a random selection algorithm for HTTP request load balancing across web processes."

Where the situation gets complicated is that Heroku did – slowly – reference the change from intelligent routing to random routing in its documentation. But Rap Genius feels it got a raw deal because Heroku did not explicitly reach out to the young startup and tell it about the change.

On top of this, Heroku continues to use the "intelligent routing" term, previously associated with steering requests to available dynos, even though it now routes requests randomly.

Other developers have run into this problem, and Heroku has acknowledged that it has done a poor job of communicating the change to users. Developer Tim Watson ran into the same issue as Rap Genius in mid-2011 and queried Heroku. The company's CTO Adam Wiggins replied:

You're correct, the routing mesh does not behave in quite the way described by the docs. We're working on evolving away from the global backlog concept in order to provide better support for different concurrency models, and the docs are no longer accurate. The current behavior is not ideal, but we're on our way to a new model which we'll document fully once it's done.

Some industry insiders were sympathetic to Rap Genius's performance hit, but bridled at the aggressive tone the startup used in its blog post.

"Pay attention to your app and performance," Mark Imbriaco, former operations director at Heroku, told The Register via Twitter. "This shouldn't sneak up on you after that long," he said.

"I understand why [Rap Genius] is upset with the performance they see, it sucks," he wrote.

Rap Genius's cofounder Tom Lehman admitted in conversation with The Register that "there are definitely ways to mitigate this and Rap Genius should do more of them, but the problem is still really bad."

The startup feels misled by Heroku, and wrote the blog post after Heroku rebuffed its attempts to get a price reduction while it restructured its site.

The Register has seen an email from Heroku's CTO to Rap Genius that indicates Heroku advised the company to re-engineer its application.

"I'm convinced that the best path forward is for one of your developers to work closely with [name redacted by Reg to preserve anonymity] to modernize and optimize your web stack," Wiggins writes. "If you invest this time I think it's very likely you'll end up with an app that performs the way you want it to at a price within your budget."

At present the situation is unresolved: Rap Genius is clamoring for developers to email Heroku's support desk, while Heroku has remained silent apart from a single statement which was sent to The Register.

"Our customers' success is our top priority," the company wrote. "We are working hard to get to the bottom of this situation and give our customers a clear and transparent understanding of our next steps. We'll provide more information as soon as possible on our blog."

From The Reg's point of view, the events illustrate the troubling nature of modern cloud infrastructure: platform providers promise to take on much of the development work a company would otherwise have to do itself, but if the company does not pay close attention to its platform provider, then architectural changes can cause cost overruns.

It's a problem that's only going to get worse, and if – like the startup in this story – a company discovers a fault only after its application has garnered significant traction, then moving away from the platform can be difficult.

"Moving off of Heroku only gets harder," Lehman said.

Bootnote

Rap Genius is a community wiki for the etymology and cultural significance of lyrics in rap music and other works of art. It has around 15 million users a month, according to the company. Rappers such as Kidd Kidd, Pusha T, and Bryant Dope are verified members of the site.
