#3007 Issue closed: Disaster Recovery for our own ReaR upstream project

Labels: enhancement, critical / security / legal, ReaR Project, no-issue-activity

jsmeix opened issue at 2023-06-07 06:51:

Since a longer time it worries me
what we could do if a disaster happens
to our own ReaR upstream project.

As far as I know we have
neither a risk analysis
nor a disaster recovery plan.

Possible threats that I currently imagine:

  1. "Move":
    GitHub gets suddenly unfriendly to Free Software projects
    so we need to move to another git hosting service.
    How can we keep our history that is to a greater extent
    stored in comments of GitHub issues and pull requests?

  2. "Restart":
    We let various kind of third-party software "do something"
    with our sources via so called "GitHub Actions".
    Because of Murphy's law ("anything that can go wrong will go wrong")
    it will happen sonner or later that our sources get corrupted,
    regardless if maliciously ("supply chain attack") or by accident.
    How can we recover our sources to a clean state?
    If our sources got completely corrupted at GitHub (including history),
    how can we restart from scratch (with reasonable history)
    at GitHub or at another git hosting service?

gdha commented at 2023-06-07 07:34:

See https://docs.github.com/en/codespaces/customizing-your-codespace/setting-your-default-region-for-github-codespaces

jsmeix commented at 2023-06-07 08:12:

I do not see how setting the region where GitHub data is held
could help with recovering from a disaster at GitHub?

gdha commented at 2023-06-07 08:39:

@jsmeix perhaps you like article https://gitprotect.io/blog/github-backup-best-practices/ better?

schlomo commented at 2023-06-07 08:43:

Good point @jsmeix!

The region is IMHO irrelevant.

The source code is actually the least of our worries as we ship a nearly full copy as part of our "binary" packages and all distros also archive the full source as part of their source packages. That will allow us to restart the ReaR project (without the history) without any preparation.

All maintainers probably have the full git checkout on multiple computers, at least I do. That allows us to recover even with history.

Finally, we can easily add different data backup/distribution schemes:

What is really important in my opinion would be to do something and then to make sure that all maintainers are also setup as admins on the other systems.

We also should include our mailinglist and DNS domain when thinking about disaster recovery.

jsmeix commented at 2023-06-07 08:55:

My personal primary concern is not the plain git data
i.e. what one gets via

# git clone https://github.com/rear/rear.git

because that is duplicated several times on our local computers.

My personal primary concern is all the other data on GitHub
in particular our history that is stored in comments
of GitHub issues and pull requests.

The comments of GitHub issues and pull requests
provide very most of the resoning behind
why something was changed (e.g. what had happened
at a specific user that finally led to a change).

For example see
https://github.com/rear/rear/issues/2990#issuecomment-1562481526
where I would have had no idea why wipefs is called with '--force'
if the originating issue https://github.com/rear/rear/issues/1327
was no longer available.

E.g. think about
https://github.com/rear/rear/blob/master/doc/rear-release-notes.txt
when all the issues and pull requests that are mentioned therein
would be no longer accessible or available.

schlomo commented at 2023-06-07 08:56:

How much money are we (all the maintainers together) willing to spend on a commercial backup plan that covers that?

For backup, it is important that the backup data can be used, e.g. restored somewhere or viewed in the backup system. Otherwise it is just a blob of useless data.

jsmeix commented at 2023-06-07 09:05:

Because SUSE does a lot of its upstream work at GitHub
I will ask internally at SUSE about disaster recovery
for GitHub projects where SUSE does upstream work.

@pcahyna
I guess it is the same for Red Hat?

schlomo commented at 2023-06-07 09:08:

If SUSE and Red Hat can cover ReaR's GitHub org with their backup procedures that would be really great, thanks a lot!

gdha commented at 2023-06-07 09:10:

If SUSE and Red Hat can cover ReaR's GitHub org with their backup procedures that would be really great, thanks a lot!

That will not cover our mailing list nor DNS domain, right?

schlomo commented at 2023-06-07 09:12:

If SUSE and Red Hat can cover ReaR's GitHub org with their backup procedures that would be really great, thanks a lot!

That will not cover our mailing list nor DNS domain, right?

Yes, it won't. I think you own the domain and also have access to the mailing list, but probably nobody else can do anything with them. I'd mostly be worried about continuity in case you can't deal with that any more, and about transitioning ownership / access to other maintainers in such a situation.

gdha commented at 2023-06-07 09:13:

Found https://github.com/dwyl/github-backup to backup issues

gdha commented at 2023-06-07 09:16:

If SUSE and Red Hat can cover ReaR's GitHub org with their backup procedures that would be really great, thanks a lot!

That will not cover our mailing list nor DNS domain, right?

Yes, it won't. I think you own the domain and also have access to the mailing list, but probably nobody else can do anything with them. I'd mostly be worried about continuity in case you can't deal with that any more, and about transitioning ownership / access to other maintainers in such a situation.

Sorry, but I do not own the domain - I think it is owned by @dagwieers (to be confirmed by Dag itself)?
About the mailing list I'm an admin of the list, but not of the storage behind it - that is my biggest concern on this moment. Not sure who owns it today.

jsmeix commented at 2023-06-07 09:20:

@gdha
thank you for the link to
https://github.com/dwyl/github-backup

Such functionality is what I am looking for.

But because it reads

Additionally we use Amazon Web Services (AWS)
Simple Storage Service (S3) for storing
the issue comment history with "high availability"
and reliability.
To run github-backup on your localhost you will need
to have an AWS account and an S3 "bucket".

I think this one is not useful for us in general
(at least I have no plans to get an AWS account).

And the need of various secrets like private keys
does not look very promising to me.

gdha commented at 2023-06-07 09:23:

@jsmeix just build a kubernetes cluster with its own block storage ;-) but then you need to cover DPR for kubernetes as well and then we are just shifting problems

schlomo commented at 2023-06-07 09:32:

Maybe
https://github.com/mattduck/gh2md#github-workflow-backup-issues-as-a-markdown-file-in-your-repo
could be a cheap solution for the issue/PR backup?
The idea would be to create a dump as markdown files and host that for free on Cloudflare Pages or Firebase Hosting (10GB free) or another free hosting service.
We can run this every night.

jsmeix commented at 2023-06-07 10:58:

At least on first glance the very traditional and simple

# w3m -no-graph https://github.com/rear/rear/issues/3007 >issue3007.txt

seems to produce not too bad results.
At least plain texts look sufficiently readable
which is the most important thing from my point of view.

gdha commented at 2023-06-14 15:35:

Maybe https://github.com/mattduck/gh2md#github-workflow-backup-issues-as-a-markdown-file-in-your-repo could be a cheap solution for the issue/PR backup? The idea would be to create a dump as markdown files and host that for free on Cloudflare Pages or Firebase Hosting (10GB free) or another free hosting service. We can run this every night.

Made a docker image to create an Issues History in our ReaR User Guide - https://relax-and-recover.org/rear-user-guide/issues/index.html

jsmeix commented at 2023-06-15 07:24:

@gdha
awesome!
Thank you!

The issues in
https://relax-and-recover.org/rear-user-guide/issues/index.html
look very good, e.g. blockquotes are clearly visible.

It even provides (high resolution) images e.g.
https://github.com/rear/rear/assets/1429783/54de2505-4b33-41b8-a12b-da61aec9eda4
in
https://relax-and-recover.org/rear-user-guide/issues/2023-05-19.2990.issue.open.html

jsmeix commented at 2023-06-15 07:29:

Ah! This is the commit about "adding issues history":
https://github.com/rear/rear-user-guide/commit/df824721d0d573686a16e59207614caf77e1dcc4

jsmeix commented at 2023-06-15 08:06:

I am wondering how much (disk) space it needs:

# git clone https://github.com/rear/rear-user-guide.git
...
Receiving objects: 100% (6711/6711), 29.72 MiB | 4.68 MiB/s, done.

# du -hc rear-user-guide
...
32M     rear-user-guide/docs/issues
...
64M     total

# du -h rear-user-guide/docs/issues/2023-05-19.2990.issue.open.md
20K     rear-user-guide/docs/issues/2023-05-19.2990.issue.open.md

The rear-user-guide git repository contains
only the texts of the issues as markdown files
which are only 32M.

Images - in particular high resolution images - are
kept outside of the rear-user-guide git repository
which is good!

So we have now via

# git clone https://github.com/rear/rear-user-guide.git

the texts of our issues and pull requests
duplicated several times on our local computers, cf.
https://github.com/rear/rear/issues/3007#issuecomment-1580234940

I think this basically solves this issue - at least
as far as I need it from my personal point of view.

jsmeix commented at 2023-06-15 08:18:

Only a spontaneous offhanded idea:

What do you think about having
only the texts of our issues and pull request
as markdown files in the 'rear' git repository
so that the 'rear' git repository would become
self-contained regarding its history?

The currently needed (disk) space
of the 'rear' git repository is

# git clone https://github.com/rear/rear.git
...
Receiving objects: 100% (58405/58405), 11.22 MiB | 6.33 MiB/s, done.

# du -hs rear
22M     rear

so adding the markdown files in rear-user-guide/docs/issues
would make the 'rear' git repository more than two times
as large as it currently is from 22M to 22M + 32M = 54M
but it would be still rather small.

For comparison:
One single high resolution image like
https://github.com/rear/rear/assets/1429783/54de2505-4b33-41b8-a12b-da61aec9eda4
is already more than 5M.

github-actions commented at 2023-08-15 02:00:

Stale issue message


[Export of Github issue for rear/rear.]