GitLab.com luckily found lost data on a staging server
And restored itself. But the code locker lost about six hours of data for ~707 users
GitLab.com, the wannabe GitHub alternative that yesterday went down hard and reported data loss, has confirmed that some data is gone but that its services are now operational again.
The incident did not result in Git repos disappearing. Which may be why the company's PR reps characterised the lost data as “peripheral metadata that was written during a 6-hour window”. But in a prose account of the incident, GitLab says “issues, merge requests, users, comments, snippets, etc.” were lost. The Register imagines many developers may not be entirely happy with those data types being considered peripheral to their efforts.
GitLab's PR flaks added that the incident impacted “less than 1% of our user base.” But the firm's incident log says 707 users have lost data.
That log also reveals that the restoration of data appears to be more the result of good luck than good management, as the source from which it is restoring is a staging server described in the log as “the only available snapshot.”
As we reported yesterday, the log also says “out of 5 backup/replication techniques deployed none are working reliably or set up in the first place.”
The incident log describes the full impact of the incident as follows:
- About 6 hours of data loss
- 4,613 regular projects, 74 forks, and 350 imports are lost, roughly; 5,037 projects in total. Since Git repositories were not lost, "we can recreate all of the projects whose user/group existed before the data loss, but we cannot restore any of these projects’ issues, etc."
- About 4,979 comments lost
- 707 users lost, potentially; it's "hard to tell for certain from the Kibana logs."
- Webhooks created before Jan 31, 5.20pm were restored; those created after this time are lost.
Online opinion about the outage blends admiration and condemnation. GitLab is being praised for posting the incident report and making it public, thereby wearing the mistake. That it ignored known best practice and seemingly didn't test its backups is being widely condemned.
It's great that @gitlab is being so open and honest about their failure. It's also kinda crazy these kinds of failures are possible in 2017.— Dave Laribee (@laribee) February 2, 2017
GitLab's prose account of the incident says “Losing production data is unacceptable and in a few days we'll post the 5 why's of why this happened and a list of measures we will implement.”
The Register awaits those posts with interest and will also continue our efforts to interview representatives of the company. GitLab has offered The Register an interview but telephone and email tag has, to date, prevented that interview from taking place. ®