Original URL: http://www.theregister.co.uk/2009/10/19/sidekick_rac/
Oracle and Sun fingered for Sidekick fiasco
Just think what they'll do to Microsoft when they merge
The combination of Oracle and Sun has apparently scored a massive hit on Microsoft even before the firms consummate their merger.
The Sidekick service crash involved an Oracle RAC database and Sun Solaris and Linux servers, according to reports.
Oracle RAC (Real Application Clusters) involves a single database running across a cluster of servers for fault tolerance, performance and scalability reasons.
Users of T-Mobile's Sidekick phones experienced a severe data outage thought at first to mean unrecoverable data loss. Microsoft, the owner of the infrastructure service behind Sidekick, said that most if not all the data was recoverable.
Sidekick mobile phones have their data, contact lists, calendars and so forth stored by Danger, a subsidiary of Microsoft, on its servers and storage arrays. Danger was founded in January 2000 by Apple veterans Joe Britt, Matt Hershenson and Andy Rubin. Apple co-founder Steve Wozniak sat on its board for a while from 2001. Its first customer was VoiceStream, which was subsequently bought by T-Mobile International.
Britt was described as being responsible for the software and intellectual property side of Danger's business, and he became Danger's chief technology officer. Hershenson was senior VP for hardware.
Danger had revenues of $56.4m, almost 1.2 million subscribing users, about 300 staff and some 60 contractors, and was close to profitability when acquired by Microsoft in April last year for an undisclosed amount.
It had filed for a $100m IPO in December 2007, which implies Microsoft paid at least that much to buy the company, possibly as much as half a billion dollars. By the time of the acquisition Andy Rubin had left and joined a startup that became part of Google's Android mobile phone platform.
Britt and Hershenson both stayed with Danger as it became part of Microsoft's Premium Mobile Experiences group headed by corporate VP Roz Ho.
Job adverts and a LinkedIn profile make it apparent that Danger's Service Delivery Engine infrastructure is a network-based system that hosts content, provisions applications and negotiates client communications.
It involves 20 or so CentOS Linux servers and more than eight Sun servers, both SPARC and x86-based, running Solaris. The infrastructure also includes an Oracle RAC system and NFS file servers. The back-end storage is not known, although a Sun SAN has been mentioned.
An August Danger job ad for a senior Unix administrator said: "A key priority is automating reliable reporting log file transfer and database load functionality - existing environment has fragile software and is unreliable, requiring manual DB cleanup and re-run of data loads, retrieving missing files, etc."
We know from a Microsoft email that the data crash occurred because a server failure caused data loss in both the main and the backup database. The LinkedIn profile referred to above states: "Oracle RAC (is used) for the back-end database."
Microsoft has said that it is rebuilding the Danger system piece by piece and recovering more data at each step.
The main system components involved in the outage appear to be Sun servers. A failure of some sort there was followed by the inability to access user data in the Oracle database and its backup. It seems, though, that the data was not actually deleted; it just couldn't be found until the system was rebuilt and access to it regained. On this reading, Sun servers corrupted during an update process fed garbage to the Oracle RAC database, and this fouled up access to the data.
Oracle has declined to comment on this story. Sun has so far been unable to respond. ®