#3007 Issue open: Disaster Recovery for our own ReaR upstream project

Labels: enhancement, critical / security / legal, ReaR Project

jsmeix opened issue at 2023-06-07 06:51:

Since a longer time it worries me
what we could do if a disaster happens
to our own ReaR upstream project.

As far as I know we have
neither a risk analysis
nor a disaster recovery plan.

Possible threats that I currently imagine:

  1. "Move":
    GitHub gets suddenly unfriendly to Free Software projects
    so we need to move to another git hosting service.
    How can we keep our history that is to a greater extent
    stored in comments of GitHub issues and pull requests?

  2. "Restart":
    We let various kind of third-party software "do something"
    with our sources via so called "GitHub Actions".
    Because of Murphy's law ("anything that can go wrong will go wrong")
    it will happen sonner or later that our sources get corrupted,
    regardless if maliciously ("supply chain attack") or by accident.
    How can we recover our sources to a clean state?
    If our sources got completely corrupted at GitHub (including history),
    how can we restart from scratch (with reasonable history)
    at GitHub or at another git hosting service?

gdha commented at 2023-06-07 07:34:

See https://docs.github.com/en/codespaces/customizing-your-codespace/setting-your-default-region-for-github-codespaces

jsmeix commented at 2023-06-07 08:12:

I do not see how setting the region where GitHub data is held
could help with recovering from a disaster at GitHub?

gdha commented at 2023-06-07 08:39:

@jsmeix perhaps you like article https://gitprotect.io/blog/github-backup-best-practices/ better?

schlomo commented at 2023-06-07 08:43:

Good point @jsmeix!

The region is IMHO irrelevant.

The source code is actually the least of our worries as we ship a nearly full copy as part of our "binary" packages and all distros also archive the full source as part of their source packages. That will allow us to restart the ReaR project (without the history) without any preparation.

All maintainers probably have the full git checkout on multiple computers, at least I do. That allows us to recover even with history.

Finally, we can easily add different data backup/distribution schemes:

What is really important in my opinion would be to do something and then to make sure that all maintainers are also setup as admins on the other systems.

We also should include our mailinglist and DNS domain when thinking about disaster recovery.

jsmeix commented at 2023-06-07 08:55:

My personal primary concern is not the plain git data
i.e. what one gets via

# git clone https://github.com/rear/rear.git

because that is duplicated several times on our local computers.

My personal primary concern is all the other data on GitHub
in particular our history that is stored in comments
of GitHub issues and pull requests.

The comments of GitHub issues and pull requests
provide very most of the resoning behind
why something was changed (e.g. what had happened
at a specific user that finally led to a change).

For example see
https://github.com/rear/rear/issues/2990#issuecomment-1562481526
where I would have had no idea why wipefs is called with '--force'
if the originating issue https://github.com/rear/rear/issues/1327
was no longer available.

E.g. think about
https://github.com/rear/rear/blob/master/doc/rear-release-notes.txt
when all the issues and pull requests that are mentioned therein
would be no longer accessible or available.

schlomo commented at 2023-06-07 08:56:

How much money are we (all the maintainers together) willing to spend on a commercial backup plan that covers that?

For backup, it is important that the backup data can be used, e.g. restored somewhere or viewed in the backup system. Otherwise it is just a blob of useless data.

jsmeix commented at 2023-06-07 09:05:

Because SUSE does a lot of its upstream work at GitHub
I will ask internally at SUSE about disaster recovery
for GitHub projects where SUSE does upstream work.

@pcahyna
I guess it is the same for Red Hat?

schlomo commented at 2023-06-07 09:08:

If SUSE and Red Hat can cover ReaR's GitHub org with their backup procedures that would be really great, thanks a lot!

gdha commented at 2023-06-07 09:10:

If SUSE and Red Hat can cover ReaR's GitHub org with their backup procedures that would be really great, thanks a lot!

That will not cover our mailing list nor DNS domain, right?

schlomo commented at 2023-06-07 09:12:

If SUSE and Red Hat can cover ReaR's GitHub org with their backup procedures that would be really great, thanks a lot!

That will not cover our mailing list nor DNS domain, right?

Yes, it won't. I think you own the domain and also have access to the mailing list, but probably nobody else can do anything with them. I'd mostly be worried about continuity in case you can't deal with that any more, and about transitioning ownership / access to other maintainers in such a situation.

gdha commented at 2023-06-07 09:13:

Found https://github.com/dwyl/github-backup to backup issues

gdha commented at 2023-06-07 09:16:

If SUSE and Red Hat can cover ReaR's GitHub org with their backup procedures that would be really great, thanks a lot!

That will not cover our mailing list nor DNS domain, right?

Yes, it won't. I think you own the domain and also have access to the mailing list, but probably nobody else can do anything with them. I'd mostly be worried about continuity in case you can't deal with that any more, and about transitioning ownership / access to other maintainers in such a situation.

Sorry, but I do not own the domain - I think it is owned by @dagwieers (to be confirmed by Dag itself)?
About the mailing list I'm an admin of the list, but not of the storage behind it - that is my biggest concern on this moment. Not sure who owns it today.

jsmeix commented at 2023-06-07 09:20:

@gdha
thank you for the link to
https://github.com/dwyl/github-backup

Such functionality is what I am looking for.

But because it reads

Additionally we use Amazon Web Services (AWS)
Simple Storage Service (S3) for storing
the issue comment history with "high availability"
and reliability.
To run github-backup on your localhost you will need
to have an AWS account and an S3 "bucket".

I think this one is not useful for us in general
(at least I have no plans to get an AWS account).

And the need of various secrets like private keys
does not look very promising to me.

gdha commented at 2023-06-07 09:23:

@jsmeix just build a kubernetes cluster with its own block storage ;-) but then you need to cover DPR for kubernetes as well and then we are just shifting problems

schlomo commented at 2023-06-07 09:32:

Maybe
https://github.com/mattduck/gh2md#github-workflow-backup-issues-as-a-markdown-file-in-your-repo
could be a cheap solution for the issue/PR backup?
The idea would be to create a dump as markdown files and host that for free on Cloudflare Pages or Firebase Hosting (10GB free) or another free hosting service.
We can run this every night.

jsmeix commented at 2023-06-07 10:58:

At least on first glance the very traditional and simple

# w3m -no-graph https://github.com/rear/rear/issues/3007 >issue3007.txt

seems to produce not too bad results.
At least plain texts look sufficiently readable
which is the most important thing from my point of view.


[Export of Github issue for rear/rear.]