GitLab assures users of data recovery after data loss incident
A short outage this week on GitLab's hosted code service sent a mix of fear and sympathy through the tech community and offered a sharp reminder of the importance of testing your backups again and again (and again).
On Tuesday, a GitLab administrator accidentally erased a directory of live production data during a routine database replication. While restoring from the last backup, taken six hours earlier, the company discovered that none of its five backup routines worked entirely correctly. The incident report, which GitLab posted online, noted that the erasure affected issues and merge requests but not the Git repositories themselves.
There were two issues, Anglade explained: a straightforward underlying database issue, which GitLab took the service offline to resolve, and a separate data log issue, which unearthed a problem with GitLab's restore process. In this case, GitLab was using the open source PostgreSQL database.
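The lesson the incident underscores is that a backup only counts once it has actually been restored somewhere and checked against the source. A minimal sketch of that verification loop is below; SQLite stands in for PostgreSQL purely so the example is self-contained, and the table name and row counts are illustrative, not drawn from GitLab's systems.

```python
import sqlite3

def dump_database(conn):
    """Serialize the database to a SQL script -- this is the 'backup'."""
    return "\n".join(conn.iterdump())

def verify_restore(dump_sql, table, expected_rows):
    """Restore the dump into a fresh scratch database and check row counts.

    A dump that cannot be restored, or restores with missing rows, fails
    verification -- which is exactly the check that catches a silently
    broken backup routine before an incident does.
    """
    scratch = sqlite3.connect(":memory:")
    scratch.executescript(dump_sql)
    (count,) = scratch.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    scratch.close()
    return count == expected_rows

# Build a toy 'production' database (hypothetical schema for illustration).
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT)")
prod.executemany("INSERT INTO issues (title) VALUES (?)",
                 [("bug",), ("feature",), ("task",)])
prod.commit()

backup = dump_database(prod)
ok = verify_restore(backup, table="issues", expected_rows=3)
print("backup verified" if ok else "backup FAILED verification")
```

In a real PostgreSQL setup the same loop would be driven by the database's own dump and restore tooling against a scratch instance, run on a schedule rather than once.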
In keeping with company policy, GitLab handled the incident with full transparency, communicating with its community via tweets, blog posts and even a live-streamed YouTube channel (now offline) that shared the progress of the recovery.
In an interview on Wednesday, Anglade conceded that the policy of openness created more concern and fear than expected, but the recovery team stayed committed to letting the community know what was happening every step of the way. Headlines calling it a “meltdown” probably didn't help much either.