Backups and consistency

Hello,

The 42l non-profit explained how they back up their Gitea instance, which is (I think) typical of what most organizations do:

  • File system backup
  • Database backup

I wanted to reply with a reminder highlighting the importance of backup consistency in disaster recovery scenarios. In a nutshell:

  • File system backup happens at 9am
  • Database backup happens at 11am
  • The Gitea host crashes and burns at 11:30am

Gitea is restored on a new machine, and all pull requests created between 9am and 11am reference files that do not exist in the file system. They are broken and must be manually repaired (how exactly, I’m not sure). There are obviously other inconsistencies as well, because the restored database is 2 hours ahead of the restored file system.
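To make the scenario concrete, here is a minimal Python sketch of the gap between the two backups. It does not talk to Gitea at all: the date, the function names and the pull request names and timestamps are all made up for illustration.

```python
from datetime import datetime


def inconsistency_window(fs_backup_at, db_backup_at):
    """Return (start, end) of the interval covered by only one of the two
    backups, or None when both were taken at the same instant."""
    if fs_backup_at == db_backup_at:
        return None
    return (min(fs_backup_at, db_backup_at), max(fs_backup_at, db_backup_at))


def affected(events, window):
    """Return the names of events that fall inside the window, i.e. that
    are present in one backup but missing from the other."""
    if window is None:
        return []
    start, end = window
    return [name for name, at in events if start < at <= end]


# Hypothetical day and timestamps, matching the timeline above.
day = datetime(2024, 1, 1)
fs_backup_at = day.replace(hour=9)
db_backup_at = day.replace(hour=11)

# Hypothetical pull requests and their creation times.
pull_requests = [
    ("PR-1", day.replace(hour=8, minute=30)),   # in both backups
    ("PR-2", day.replace(hour=9, minute=45)),   # in the database only
    ("PR-3", day.replace(hour=11, minute=15)),  # in neither backup
]

window = inconsistency_window(fs_backup_at, db_backup_at)
broken = affected(pull_requests, window)
print(broken)
```

Only the pull request created inside the two-hour gap ends up in the list. The one created after 11am is simply lost, which is unfortunate but at least consistent.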

I was about to suggest shutting down the Gitea instance while the backups run, but realized this is most probably not practical for them: they would first need to synchronize the two backups so that both run during the same downtime. They host a few hundred repositories, and shutting down Gitea, or even switching to a maintenance or read-only mode for an extended period of time, would inconvenience at least dozens of users.

The only solution I can think of is to investigate the idea of synchronizing PostgreSQL snapshots with file system snapshots (which depends on the file system being used) and to keep a record of them for disaster recovery purposes.
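As a rough sketch of what "keeping a record" could look like, the Python below pairs each database snapshot with the closest file system snapshot and writes the pairs to a JSON manifest. It assumes the snapshots already exist, however they were taken. The snapshot names, the one-minute tolerance and the manifest layout are all assumptions of mine, not something Gitea or PostgreSQL provides.

```python
import json
from datetime import datetime, timedelta


def pair_snapshots(db_snapshots, fs_snapshots, tolerance):
    """Pair each database snapshot with the closest file system snapshot
    taken within `tolerance` of it. Unmatched snapshots are skipped."""
    pairs = []
    for db_name, db_at in db_snapshots:
        candidates = [
            (abs(db_at - fs_at), fs_name)
            for fs_name, fs_at in fs_snapshots
            if abs(db_at - fs_at) <= tolerance
        ]
        if candidates:
            skew, fs_name = min(candidates)  # smallest time difference wins
            pairs.append({
                "database": db_name,
                "filesystem": fs_name,
                "skew_seconds": skew.total_seconds(),
            })
    return pairs


# Hypothetical snapshot names and timestamps.
day = datetime(2024, 1, 1)
db_snapshots = [
    ("db-0900", day.replace(hour=9, second=5)),
    ("db-1100", day.replace(hour=11)),
]
fs_snapshots = [
    ("fs-0900", day.replace(hour=9)),
    ("fs-1000", day.replace(hour=10)),
]

pairs = pair_snapshots(db_snapshots, fs_snapshots, timedelta(minutes=1))
manifest = json.dumps(pairs, indent=2)  # the record kept for disaster recovery
print(manifest)
```

With such a record, a restore would only use a database snapshot together with the file system snapshot listed next to it. A snapshot with no partner within the tolerance (here the 11am database snapshot) would not be used on its own.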

Thoughts?