The instructions to upgrade a Gitea instance only require three to four steps. They work fine most of the time but the documentation is lacking a “Troubleshooting” section to help out when something goes wrong. Maintaining instructions on how to diagnose and fix upgrade problems is an ambitious undertaking and requires updates every time a new case is discovered.
An inventory of the known upgrade issues was started to figure out how to structure such a section in the documentation. The release notes were analyzed all the way back to Gitea 1.9.6 and the work is still in progress. Here is a sample of the tips that will be included:
- Upgrade directly to the latest Gitea version; there is no need to go through intermediate versions.
- If the upgrade from version x.y to version x.y+2 fails and there is a need to narrow down the problem, upgrade one step at a time instead: go to the latest point release of each intermediate version (x.y+1, then x.y+2) and verify that the instance works after each step.
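The second tip can be sketched in a few lines of Python. This is only an illustration of the idea, not a tool provided by Gitea: the `upgrade_path` function and the `RELEASES` list are hypothetical, and the version numbers in the list are placeholders rather than an authoritative list of Gitea releases. It assumes versions are always written as three numbers separated by dots.

```python
def parse(version):
    """Turn '1.16.5' into (1, 16, 5) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def upgrade_path(current, target, releases):
    """Return the latest point release of each series after `current`, up to `target`."""
    cur, tgt = parse(current), parse(target)
    latest = {}  # maps a series such as (1, 10) to its highest point release
    for release in releases:
        ver = parse(release)
        # Only keep releases newer than the current one and not past the target.
        if cur < ver <= tgt:
            series = ver[:2]
            if series not in latest or ver > latest[series]:
                latest[series] = ver
    return [".".join(str(n) for n in latest[series]) for series in sorted(latest)]


# Hypothetical release list, for illustration only.
RELEASES = ["1.9.6", "1.10.0", "1.10.3", "1.11.0", "1.11.2", "1.12.1"]

print(upgrade_path("1.9.6", "1.11.2", RELEASES))  # prints ['1.10.3', '1.11.2']
```

Each version in the resulting list is a step to install and verify before moving on to the next one, which makes it possible to tell which step introduced the problem.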
However, even with the best documentation, someone will eventually run into a new problem, and fixing it without compromising the integrity of the data will be challenging. This is best demonstrated by a real-world example that was concluded a few days ago.
After upgrading a Gitea instance from 1.9.6 to 1.16.5, the tests conducted manually did not uncover any problem. However, after going to production, some users saw a blank page after login and had to manually type the URL of the project they wanted to see in the browser. The person in charge of the upgrade had never had to diagnose a Gitea problem and reached out in the Gitea forum.
Tip: explain the problem in a public forum as early as possible to get help from the community
In their post in the forum they explained how they attempted to diagnose the problem and why they thought that only users created a few years ago were impacted. It was a detailed analysis that concluded with a partial copy of the logs. Unfortunately, it was missing key information that was provided only three days later. In the meantime, as they could not figure out the source of the problem, they were on the verge of accepting the loss of the entire Gitea database and starting over from the repositories. However, once all the details were available, a workaround was suggested in the forum.
Tip: focus more on providing detailed facts than on describing the attempted diagnosis
There was hope of fixing Gitea, and in the following days they applied the workaround. They also tried to improve it, without success, and eventually accepted a partial data loss as inevitable. They then reported their success back to the forum.
Tip: when getting support from the community, providing feedback is the best token of appreciation
The Hostea Clinic is a collective of individuals and companies that provides professional services to Gitea admins. They are active members of the Gitea community who help out as volunteers. They can also be hired to resolve the more complicated cases.
The Gitea instance that was in trouble required more than a few minutes of work and access to the database content for a proper diagnosis. Members of the Hostea Clinic offered their assistance; the offer was well received but not accepted.
When the Gitea admin explained how they chose to resolve the problem on the forum, it confirmed the workaround was viable and the root problem was identified. That was enough to figure out a fix for the underlying bug with a rather simple patch that was merged and backported in the following days. But it happened too late to avoid the data loss.
To summarize with a timeline, here is what happened:
- J+1: The problem is discovered by users who see a blank page after login and the Gitea admin tries to diagnose the problem
- J+2: A message is sent to ask for help in the community
- J+2 to J+6: Three people in the community suggest ideas but the Gitea admin cannot figure out the root cause and is on the verge of accepting the loss of all Gitea data and restarting from the git repositories
- J+6: A workaround is suggested by the community
- J+7 to J+17: The Gitea admin applies the workaround and only loses part of the Gitea data
And in retrospect, here is what could have happened instead:
- J+1: The problem is discovered by users who see a blank page after login
- J+1: The Gitea admin reaches out to someone at the Hostea Clinic
- J+2: The logs of the Gitea instance are analyzed, the root cause is diagnosed and a patch is created to fix it
- J+3: If necessary a Gitea binary is created with the patch and used as a temporary replacement until the next point release is published with the backport. The Gitea admin runs the patched Gitea binary in the meantime. There is no data loss.
It does not mean all upgrade problems can be resolved so easily. But it shows, with an example, that in some cases it makes sense to get professional help.