#3242 PR merged: Fix IPv6 address support in OUTPUT_URL/BACKUP_URL

Labels: enhancement, bug, cleanup, fixed / solved / done

rmetrich opened issue at 2024-06-10 09:15:

Relax-and-Recover (ReaR) Pull Request Template


Pull Request Details:
  • Type: Bug Fix / Enhancement

  • Impact: Normal

  • How was this pull request tested? With both NFS and SSHFS schemes

  • Description of the changes in this pull request:

Currently there is no proper support for IPv6 addresses in OUTPUT_URL/BACKUP_URL properties:

  • OUTPUT_URL=nfs://2001:db8:ca2:6::101/root/REAR is not recognized (and should not be)
  • OUTPUT_URL=nfs://[2001:db8:ca2:6::101]/root/REAR fails because of bash substitutions occurring in various places
  • OUTPUT_URL="nfs://\[2001:db8:ca2:6::101\]/root/REAR" works but is somehow cumbersome (double quotes + square brackets quoting).

This patch makes the following (proper) URL syntax work with both NFS and SSHFS:

OUTPUT_URL=nfs://[2001:db8:ca2:6::101]/root/REAR
OUTPUT_URL=sshfs://[2001:db8:ca2:6::101]/root/REAR

To make this work, a lot of bash quoting had to be fixed.
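
To illustrate where the previous handling broke, here is a minimal sketch of URL host extraction that keeps a bracketed IPv6 host intact; the function name extract_url_host is only a stand-in for this example and is not ReaR's actual helper:

# Stand-in example, not ReaR code: extract the host part of a URL.
extract_url_host() {
    local url="$1"
    local host_and_path="${url#*://}"    # drop the scheme:// prefix
    local host="${host_and_path%%/*}"    # keep everything before the first '/'
    # The quotes matter: an unquoted $host containing [...] could be
    # mangled by bash pathname expansion.
    echo "$host"
}

host="$(extract_url_host 'nfs://[2001:db8:ca2:6::101]/root/REAR')"
echo "host is $host"    # -> host is [2001:db8:ca2:6::101]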

jsmeix commented at 2024-06-10 14:43:

@rmetrich
awesome "convict labour"!
Thank you so much for it!

As far as I see on
https://github.com/rear/rear/pull/3242/files
missing bash quoting (at a zillion places)
was the only change that you needed to make?
Or did I overlook some other changes that were needed
to get IPv6 address support in OUTPUT_URL/BACKUP_URL?

Regardless of whether missing bash quoting was the only change,
this issue seems to be a very good example of why proper
bash variable quoting is actually needed nowadays.

Let's wait and see when things like '[...]'
will become regular characters in system file names
or device node names and so on - then we need to add
missing bash quoting at many more zillions of places.
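
To make the pitfall concrete, here is a tiny self-contained demonstration (the directory and file names are made up for this example) of how an unquoted variable containing '[...]' gets mangled by bash pathname expansion:

# Demonstration only: to bash, an unquoted bracketed string is a glob pattern.
mkdir -p /tmp/quoting-demo && cd /tmp/quoting-demo
touch 2 d    # single-character file names that match the bracket class below

host='[2001:db8:ca2:6::101]'
echo $host      # unquoted: expanded against the matching files, prints '2 d'
echo "$host"    # quoted: prints [2001:db8:ca2:6::101] literally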

rmetrich commented at 2024-06-10 14:48:

Yes, all this mess was due to quoting, so I did a pass over everything that was closely related.

jsmeix commented at 2024-06-11 07:44:

@rear/contributors
could (at least) one of you - as time permits - please
also have a look here and approve it if it looks OK.

Provided it is OK I would like to merge it rather soon
(within a few days) to avoid that other merged code changes
at the various code places touched here cause merge conflicts
for this pull request which @rmetrich would then have to resolve.

jsmeix commented at 2024-06-11 12:22:

@rear/contributors
unless objections appear,
I would like to merge it tomorrow afternoon.

pcahyna commented at 2024-06-12 10:14:

I reviewed the CI results: the Fedora 40 / Rawhide errors are known and unrelated. There was a failure on CentOS Stream 9 due to this:

WARNING: tar ended with return code 1 and below output (last 5 lines):
                                ---snip---
                                tar: /var/log/audit/audit.log: file changed as we read it
                                ----------
                              This means that files have been modified during the archiving
                              process. As a result the backup may not be completely consistent
                              or may not be a perfect copy of the system. Relax-and-Recover
                              will continue, however it is highly advisable to verify the
                              backup in order to be sure to safely recover this system.

This is a harmless issue unrelated to the PR, and a rerun fixed it. The check for warnings in the test should probably be less zealous.

Differential ShellCheck says:
ℹ️ No Fixes!

which suggests that we should probably increase the strictness of the syntax checks, because this change definitely fixes shell syntax issues that ShellCheck is able to recognize.
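
For reference, the quoting problems fixed here correspond to ShellCheck finding SC2086 ("Double quote to prevent globbing and word splitting"), which ShellCheck reports at the "info" severity level. A plain ShellCheck run that includes such findings could look like the following sketch (the file path is a placeholder, and the corresponding setting of the Differential ShellCheck action may be named differently):

# Include info-level findings such as SC2086 in the report:
shellcheck --shell=bash --severity=info path/to/changed-script.sh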

jsmeix commented at 2024-06-12 11:46:

@rmetrich
take your time to fix the one mistake.
I will not merge it today.
I will wait until you have finished.

If you find the time I would much appreciate it
if you could add quoting where @pcahyna mentioned it,
because now we have the detailed review of @pcahyna
which would be lost if this pull request were merged
without the additional quoting he mentioned.

@pcahyna
as always your precise and accurate review is so helpful,
in particular for our users who are hit by fewer bugs
and also for us who need to fix fewer bugs.
Thank you so much for it!

jsmeix commented at 2024-06-12 12:00:

@pcahyna
regarding your
https://github.com/rear/rear/pull/3242#issuecomment-2162636120

... we should probably increase the strictness
of the syntax checks. This change is definitely a fix
of the shell syntax that ShellCheck is able to recognize

As far as I know our Differential ShellCheck
only checks the syntax of the actual changes.

In this case I agree to try to change the level
of our Differential ShellCheck syntax checks
to let it report also quoting problems.

Since my
https://github.com/rear/rear/pull/3242#issuecomment-2158560575

Let's wait and see when things like '[...]'
will become regular characters in system file names
or device node names and so on - then we need to add
missing bash quoting at many more zillions of places.

I understand that proper quoting basically everywhere
is needed to avoid that bash expansion - in particular
bash pathname expansion - changes string values in
unintended ways.

BUT:
Proper quoting of all string values everywhere
requires no longer using strings for multiple values
in one (string) variable, as in

for word in $string ; do
    CODE THAT USES $word
done

but always using bash arrays for multiple values
in one (array) variable.
And this would be a huge cleanup effort.
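
A minimal sketch of the array-based alternative (the variable names are illustrative only, not taken from ReaR's code):

# Keep multiple values in a bash array instead of one whitespace-separated string:
words=( "first value" "second" "third [with] brackets" )

# "${words[@]}" expands to exactly one word per array element, even when an
# element contains spaces or glob characters, so every use can stay quoted:
for word in "${words[@]}" ; do
    printf 'word: %s\n' "$word"
done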

pcahyna commented at 2024-06-12 12:07:

@jsmeix

... And this would be a huge cleanup effort.

Fortunately the check is differential, so it would not complain about existing code. But, when changing existing code which uses this idiom, one would need to do the cleanup anyway to avoid ShellCheck warnings on the changed lines, so some cleanup would be needed from time to time indeed.

jsmeix commented at 2024-06-12 12:16:

@pcahyna
and such small "by the way" cleanups, done step by step
as code is touched, are the intended benefit: they move our
code step by step towards proper quoting.

Let's just wait and see what happens when e.g. the line

mount_url "$BACKUP_URL" "$BUILD_DIR/outputfs" $BACKUP_OPTIONS

is touched again and Differential ShellCheck complains
about the unquoted $BACKUP_OPTIONS ;-)
I had a look at that thingy right now and that single piece
looks rather - well - I don't find proper words... ;-)

pcahyna commented at 2024-06-12 12:29:

@rmetrich to be clear, I certainly don't want to request fixing quoting in more and more places if the changes you already did are sufficient for your purpose. That would be a huge effort. I merely suggested to take the opportunity to add more quoting to lines that you already touched, for consistency.

jsmeix commented at 2024-06-14 04:55:

@rmetrich
are you finished so it can be merged,
or do you need to make more changes?

rmetrich commented at 2024-06-14 06:24:

I'm good, but would prefer that @pcahyna reviews again :-)

jsmeix commented at 2024-06-14 12:00:

@rmetrich
you are right: currently this pull request's state is
"changes requested" by @pcahyna,
so he needs to approve it if things are now OK for him.

pcahyna commented at 2024-06-17 17:48:

@rmetrich I reviewed again and found some unaddressed (low-priority) comments. Is that an oversight, or did you not intend to address them? I also found another (low-priority) issue that was present in the former code and not addressed by your changes.

pcahyna commented at 2024-06-18 10:02:

@rmetrich thank you for the new changes, I reviewed all of them and marked all comments that they address as resolved. Do you plan to address the rest (see the unresolved comments)? Some of them are in network protocols (CIFS, DAV) that would IMO be affected by IPv6 address problems just as NFS was. The other issues are unrelated and the changes would be merely for consistency with your other changes.

rmetrich commented at 2024-06-18 10:53:

I consider the changes already quite invasive; I wouldn't want to add risk by fixing more stuff.

pcahyna commented at 2024-06-18 11:56:

@jsmeix , if you made a mistake, please reset your branch to the previous commit 4485326 and force-push.

jsmeix commented at 2024-06-18 11:56:

@rmetrich
I am sorry, I messed up your added quoting
with my too hasty attempt to solve a merge conflict via
https://github.com/rear/rear/pull/3242/commits/690e03bb8a01ea5bfbdc08b033f0551efa85788c

I don't see how I could fix that myself here.

jsmeix commented at 2024-06-18 11:58:

@pcahyna
I think I don't have a branch.
I used the GitHub web UI to mess up the merge conflict.

pcahyna commented at 2024-06-18 11:58:

@jsmeix I will reset it then, ok?

jsmeix commented at 2024-06-18 11:59:

@pcahyna
yes please - of course it is OK for me.
Thank you!

pcahyna commented at 2024-06-18 12:03:

And now my turn to make mistakes... I reset to a wrong commit, and I can't force-push anymore, damn.

@rmetrich can you please force-push again your commit 4485326 ?

EDIT: When I commented on the code, it became possible to reopen the PR, which allowed me to force-push again.

pcahyna commented at 2024-06-18 12:06:

... into your rmetrich:ipv6 branch.

pcahyna commented at 2024-06-18 12:07:

Done, ufff. @jsmeix so now the branch is in the same state as when @rmetrich last touched it. What do you need to do now?

pcahyna commented at 2024-06-18 12:11:

@jsmeix I will merge the PR and resolve the conflict if that's all that you wanted to do.

rmetrich commented at 2024-06-18 12:12:

pushed again

jsmeix commented at 2024-06-18 12:26:

@pcahyna @rmetrich
thank you for fixing my mess!
I am sorry for that.

jsmeix commented at 2024-06-18 12:30:

@pcahyna
feel free to merge as you like.

pcahyna commented at 2024-06-18 13:28:

I opened two issues for the remaining unaddressed comments. Fixing them would need a good test coverage of the affected use cases though.

jsmeix commented at 2024-06-19 04:50:

@pcahyna
thank you for the two issues about the remaining missing quoting.

Because - as far as I know - no ReaR user was hit by it
(i.e. no ReaR user reported an issue here to us)
I think fixing the two issues is not required for ReaR 3.0,
but they should be fixed (as time permits) for ReaR 3.1.

With the additional time we may even find a way to
set up automated CI tests for the affected use cases.

I am in particular interested to learn how I could set up
automated CI tests for various SUSE specific standard systems,
cf. what I had manually tested for SLES12 and SLES15 in
https://github.com/rear/rear/wiki/Test-Matrix-rear-2.6#sles
But this is something for after ReaR 3.0 was released.

@pcahyna
if you agree I will set the milestone to "ReaR v3.1"
for those two issues.

pcahyna commented at 2024-06-19 09:04:

@jsmeix

@pcahyna thank you for the two issues about the remaining missing quoting.

Because - as far as I know - no ReaR user was hit by it (i.e. no ReaR user reported an issue here to us) I think fixing the two issues is not required for ReaR 3.0, but they should be fixed (as time permits) for ReaR 3.1.

With the additional time we may even find a way to set up automated CI tests for the affected use cases.

Yes, but I first need to make the current test more generic, so that it is easier to add variations of it without having to copy it as a whole and modify it.

OTOH, for testing these bugs it may be helpful to add tests that are not full end-to-end tests (backup & recovery), but just backup or mkrescue tests, as the bugs will already show up at backup time. Such tests are much simpler and will not need to reuse Anton's and Lukas's current backup&recover script. Of course this is not 100% coverage, as the wrong quoting may be fixed in "mkbackup" scripts but left in "recover" scripts, and such a bug would be especially nasty from the user's POV.

I am in particular interested to learn how I could set up automated CI tests for various SUSE specific standard systems, cf. what I had manually tested for SLES12 and SLES15 in https://github.com/rear/rear/wiki/Test-Matrix-rear-2.6#sles But this is something for after ReaR 3.0 was released.

Do you want full backup & recovery tests, or just mkbackup tests? The latter are much easier. Do you want a CI test here (on GitHub, triggered on every PR), or one internal to SUSE? Our experience with Anton has been that the difficult part of public testing is finding a public infrastructure that has the features that you need. E.g. GitHub Actions does not support rebooting the machine being tested. Testing Farm does, but only CentOS (Stream) and Fedora systems are available in TF. Perhaps support for other distros will appear eventually (I know there is already some support for Ubuntu and Debian, but not for VMs yet). Or maybe SUSE sponsors a different public infrastructure for such purposes?

@pcahyna if you agree I will set the milestone to "ReaR v3.1" for those two issues.

Sure, go ahead.

jsmeix commented at 2024-06-19 12:30:

Because I would like to automate what I had
manually tested for SLES12 and SLES15 in
https://github.com/rear/rear/wiki/Test-Matrix-rear-2.6#sles
I would need full "rear mkbackup" plus "rear recover" tests.

Regarding missing support for
"rebooting the machine being tested":

My immediate offhanded and totally untested idea is
to use 'kexec' to boot the rear recovery system, cf.
"Launching the ReaR recovery system via kexec" in
https://en.opensuse.org/SDB:Disaster_Recovery
and after "rear recover" use 'kexec' again
to boot the recreated system, cf.
https://github.com/rear/rear/issues/2186

Because 'kexec' should leave the disk as is,
it should be possible to store what we like on disk
in the old system before 'kexec' and have that
available in the new system after 'kexec'.
So when we use 'kexec' to boot the recreated system
we could add some testing scripts that are run
at the end when the recreated system started up.
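
A rough sketch of that kexec idea (the paths and kernel parameters below are placeholders, not ReaR's actual file names):

# Load the recovery system's kernel and initrd without a firmware reboot:
kexec -l /path/to/rescue/kernel \
      --initrd=/path/to/rescue/initrd.cgz \
      --append="console=ttyS0"

# Jump into the loaded kernel; the disk contents are left as they are:
kexec -e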

Regarding "finding a public infrastructure"
that can run SLES12 or SLES15:

My wishful thinking is that I could provide
a manually "ready-to-use" prepaded '*.qcow2' image
that can be booted by some public infrastructure
in some QEMU/KVM virtual machine.
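
For illustration, booting such a prepared image could look like this (the image name, memory and CPU values are placeholders, and the infrastructure would need working KVM or nested virtualization):

# Boot a prepared qcow2 image in a QEMU/KVM virtual machine:
qemu-system-x86_64 \
    -enable-kvm \
    -m 4096 -smp 2 \
    -drive file=sles15-rear-test.qcow2,format=qcow2,if=virtio \
    -nographic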

Cf. my
https://github.com/rear/rear/pull/3239#issuecomment-2173626556
(excerpt):

In particular I would prefer - if possible - using
fixed OS versions for our CI tests.
For example - if possible - by using fixed OS images
of stable OS versions that we keep in our 'rear' space
at GitHub under our control: First and foremost
the stable enterprise Linux distributions RHEL and SLES

pcahyna commented at 2024-06-19 13:20:

Because I would like to automate what I had manually tested for SLES12 and SLES15 in https://github.com/rear/rear/wiki/Test-Matrix-rear-2.6#sles I would need full "rear mkbackup" plus "rear recover" tests.

Regarding missing support for "rebooting the machine being tested":

My immediate offhanded and totally untested idea is to use 'kexec' to boot the rear recovery system, cf. "Launching the ReaR recovery system via kexec" in https://en.opensuse.org/SDB:Disaster_Recovery and after "rear recover" use 'kexec' again to boot the recreated system, cf. #2186

Because 'kexec' should leave the disk as is, it should be possible to store what we like on disk in the old system before 'kexec' and have that available in the new system after 'kexec'. So when we use 'kexec' to boot the recreated system we could add some testing scripts that are run at the end when the recreated system started up.

Not sure whether this will work; it depends on the reason why reboot is not possible. Many test harnesses do not like to be killed and started again - the test script needs to run uninterrupted from the start of the test till the end. (The test harness we use in Testing Farm does not suffer from this limitation: we can kill the script, or even the whole machine, and start it again, and the test will continue from the beginning.)

Regarding "finding a public infrastructure" that can run SLES12 or SLES15:

My wishful thinking is that I could provide a manually prepared "ready-to-use" '*.qcow2' image that can be booted by some public infrastructure in some QEMU/KVM virtual machine.

This would work, but you need to find an infrastructure that allows running VMs - as the test machines are usually VMs themselves, this requires working nested virtualization and enough resources. It looks like this is now supported in GitHub Actions, but only on larger runners (not the default ones): https://github.blog/changelog/2023-02-23-hardware-accelerated-android-virtualization-on-actions-windows-and-linux-larger-hosted-runners/

We can try it and see whether Testing Farm VMs support that.

jsmeix commented at 2024-06-19 14:01:

@pcahyna
thank you for the background information.
Now I can imagine a little bit better
what the root of the various problems is.
Unsurprisingly RFC 1925 item 6a comes to mind ;-)

Yes, I think that an infrastructure that allows
running VMs directly (e.g. by providing a '*.qcow2' image)
could be the simplest and most straightforward way.

As time permits I will ask at (open)SUSE
if we perhaps already have such an infrastructure.
E.g. currently I know basically nothing about openQA
http://open.qa/
which is what (open)SUSE uses for testing.

pcahyna commented at 2024-06-19 15:00:

Fedora is also using OpenQA ( https://fedoraproject.org/wiki/OpenQA ), but it does not seem to solve our problems: it does not look like a test infrastructure that will provide VMs to you, but rather like a test system that you would run on infrastructure that you already have. But I am not sure here. Also, the description does not make it look particularly suitable:

openQA is an automated test system designed around operating system-level testing. Its key feature is that it interacts with the system under test much like a human would, by sending input using a virtual keyboard and mouse (or serial console input), and monitoring the screen, serial console and audio outputs for output.

While this kind of simulated interactive testing could be somewhat useful for ReaR, thanks to ReaR's good unattended/automatic capabilities it is not at all the main problem to solve when starting to develop a backup/recovery test (the main problem is rather the interaction of the test with rebooting the test VM itself).

jsmeix commented at 2024-06-19 15:11:

I was thinking more about possibly "misusing" openQA,
but it would not surprise me if openQA does not provide
what I would like to have - nevertheless I will ask;
perhaps someone at SUSE knows of an infrastructure
like the one I would like to have.

But I would like to postpone that advanced CI testing
to after the ReaR 3.0 release (as time permits).

pcahyna commented at 2024-06-25 09:42:

@jsmeix yes, certainly try asking around to see whether there is an established way to test on SUSE distros on public infrastructure (just as Testing Farm is for Fedora-derived distros). I agree to keep it until after 3.0. I will also investigate support for spawning your VMs inside Testing Farm VMs.


[Export of GitHub issue for rear/rear.]