#2907 PR merged: Update 310_network_devices.sh

Labels: enhancement, bug, fixed / solved / done

hpannenb opened issue at 2023-01-08 18:50:

Relax-and-Recover (ReaR) Pull Request Template

Please fill in the following items before submitting a new pull request:

Pull Request Details:
  • Type: Bug Fix

  • Impact: Normal

  • Reference to related issue (URL):
    #2902

  • How was this pull request tested?
    Tested on a CentOS7 VM with two NICs; 2nd NIC IPv6 only;
    tested on lab and production RHEL7 VMs with hybrid IPv4/6 and IPv6 only NICs.

  • Brief description of the changes in this pull request:
    With this PR all available intefaces (IPv4 & IPv6) will be chosen for the rescue environment.
    Previously just IPv4 interfaces with IP addresses and routing only were selected.

jsmeix commented at 2023-01-09 12:52:

@hpannenb
thank you for your pull request!

Could you please add a reference to your issue
https://github.com/rear/rear/issues/2902
by enhancing your existing comment like

# Use output of 'ls /sys/class/net/' to select all available interfaces
# in particular to make networking also work with IPv6 only NICs
# see https://github.com/rear/rear/issues/2902

so others can later easily understand the reason behind,
cf. https://github.com/rear/rear/wiki/Coding-Style

jsmeix commented at 2023-01-09 13:51:

@hpannenb
thanks!

jsmeix commented at 2023-01-10 13:33:

@rear/contributors
could someone please also review it if time permits?

pcahyna commented at 2023-01-11 13:12:

I am checking how it behaves on a machine with many NICs.

pcahyna commented at 2023-01-11 14:27:

On a machine with 6 NICs, 2 of which are configured:

ip r | awk '$2 == "dev" && $8 == "src" { print $3 }' | sort -u

enp2s0f0
enp2s0f2

ls /sys/class/net/

enp14s0  enp15s0  enp2s0f0  enp2s0f1  enp2s0f2  enp2s0f3  lo

pcahyna commented at 2023-01-11 19:01:

I have not yet found a case where the new code produces a different output from the old code, but so far I have not invested much in it. I still feel that it would be better to keep the old code. I believe there is probably a good reason to keep only interfaces with routing information and I am afraid of breaking it. Unfortunately, currently I have very little time for investigating it.
I don't entirely agree with the reasoning in https://github.com/rear/rear/issues/2902#issuecomment-1369567575 "there are new user options available for NICs (i.e. EXCLUDE_NETWORK_INTERFACES=()) giving the user enough power to excluded unwanted NICs instead of implicitly filtering them elsewhere." It would be unfortunate if we forced users for whom the previous autodetection has worked well to use an explicit configuration (which then would need to be maintained - interfaces can change their names, be different on different machines even if the machies are otherwise similar, etc.)
@hpannenb could you please change you approach to use the output of ip -6 r, as suggested by @jsmeix https://github.com/rear/rear/issues/2902#issuecomment-1368967611 ? (I know that it is not straightforward, because the IPv6 routes lack the src keyword used in the filter for the IPv4 routes.)

pcahyna commented at 2023-01-12 23:52:

I have finally found one meaningful difference between the result of the old code and of the proposed code.

diff -U10 -r /var/tmp/rear.cjIYxe4t6h4ksM2/rootfs/etc/scripts/system-setup.d/60-network-devices.sh /var/tmp/rear.PgacJne54eJTRRI/rootfs/etc/scripts/system-setup.d/60-network-devices.sh

--- /var/tmp/rear.cjIYxe4t6h4ksM2/rootfs/etc/scripts/system-setup.d/60-network-devices.sh       2023-01-12 18:43:30.128396285 -0500
+++ /var/tmp/rear.PgacJne54eJTRRI/rootfs/etc/scripts/system-setup.d/60-network-devices.sh       2023-01-12 04:00:48.923395070 -0500
@@ -18,10 +18,12 @@
 # The following is autogenerated code to setup network interfaces
 # in the recovery system which have all these on the original system:
 # - they are UP
 # - they have an IP address
 # - they are somehow linked to a physical device
 # For details see the rescue/GNU/Linux/310_network_devices.sh script.
 ip link set dev eno1 up
 ip link set dev eno1 mtu 1500
 ip addr add 10.16.216.49/23 dev eno1
 ip addr add 2620:52:0:10d8:1658:d0ff:fed3:31ab/64 dev eno1
+ip link set dev eno2 up
+ip link set dev eno2 mtu 1500

This is on a machine with two interfaces, one of them (eno2) is connected, but not configured.
I am not sure if the difference matters or not - probably not.
In any case, the comment

# - they have an IP address

is no longer correct with the proposed code.

hpannenb commented at 2023-01-13 20:34:

@pcahyna Well done and valid point.

So the task in the code is to gather all interfaces that are configured with an IP (either IPv4 and/or IPv6).

hpannenb commented at 2023-01-15 08:42:

In the earlier times of the code all available network interfaces were selected. My PR is reverting to this approach. See e.g.

https://github.com/rear/rear/blob/38131ce55bede4a80fe75bbff416df32ca855945/usr/share/rear/rescue/GNU/Linux/310_network_devices.sh#L90

With the commit https://github.com/rear/rear/commit/15567ede425401b008e5b1680db36a2c62752b8f this changed.

Maybe @rmetrich can explain the intention behind introducing his approach.

rmetrich commented at 2023-01-16 09:39:

In the earlier times of the code all available network interfaces were selected. My PR is reverting to this approach. See e.g.

https://github.com/rear/rear/blob/38131ce55bede4a80fe75bbff416df32ca855945/usr/share/rear/rescue/GNU/Linux/310_network_devices.sh#L90

With the commit 15567ed this changed.

Maybe @rmetrich can explain the intention behind introducing his approach.

Hello,
the idea was to select only interfaces backed on physical device and which would be useful when starting the recovery (i.e. to fetch the backup).
Everything not backed by a physical interface doesn't need to be restored while in the rescue env.

jsmeix commented at 2023-01-16 10:43:

@rmetrich
thank you for your explanation!

What @pcahyna reports in his above
https://github.com/rear/rear/pull/2907#issuecomment-1381116451
is

... two interfaces, one of them (eno2) is connected, but not configured.

I think this means the connected but not configured interface
is "somehow linked to a physical device"
but it has no IP when "rear mkrescue" is run.

I think any interface that is "somehow linked to a physical device"
could be later useful to fetch the backup during "rear recover"
(the admin would then have to assign an IP manually)
regardless whether or not an interface was actually usable
when making the backup during "rear mkbackup".

In particular for third party backup tools normally
"rear mkbackup" is useless (because normally ReaR does
not implement making a backup with third party backup tools)
so only "rear mkrescue" is used for third party backup tools
(also for BACKUP=REQUESTRESTORE)
and then it is possible that an interface is not usable
during "rear mkrescue" time but that interface could be used
when the backup is made with a third party backup tool.
Of course this example is rather theoretical (I don't assume
admins disable special interfaces that are only used for
special things like making a backup with some backup tool)
but this is only an offhanded example for my actual point
which is:

I wonder if ReaR should automatically exclude something
when there is no hard reason to exclude it?

I think something should be automatically excluded only when
including it would cause problems in the ReaR recovery system.
E.g. when things fail in the recovery system if it was inculded
or possible security/privacy issues if secrets were included.

I am not a sufficient networking expert to make a decision
if it could cause problems in the ReaR recovery system
when interfaces are included that are linked to a physical device
but have no IP.

jsmeix commented at 2023-01-16 11:24:

Via
https://github.com/rear/rear/commit/2cf67aed8a1ae7e6709d29e13a225d098b25c370
I added a comment that also appears in network_devices_setup_script
that shows the idea behind which interfaces are currently selected
according to
https://github.com/rear/rear/pull/2907#issuecomment-1383758409

This is only to show how the current code (without this change here)
is meant to work.
Of course that comment (and likely some other comments)
change when we decide here to include more interfaces.

hpannenb commented at 2023-01-16 11:37:

Everything not backed by a physical interface doesn't need to be restored while in the rescue env.

@rmetrich Thanks for the insights. Indeed this is how this mechanism works (well documented by You in the code). The interesting question to me is what YOur intentions was to change the network interface selection mechanism from previous approach to "select IPv4 with routing NICs only". BTW, so far this seems to have worked flawlessly.

rmetrich commented at 2023-01-16 11:49:

Everything not backed by a physical interface doesn't need to be restored while in the rescue env.

@rmetrich Thanks for the insights. Indeed this is how this mechanism works (well documented by You in the code). The interesting question to me is what YOur intentions was to change the network interface selection mechanism from previous approach to "select IPv4 with routing NICs only". BTW, so far this seems to have worked flawlessly.

Honestly I cannot remember. Probably I never got a pure IPv6 system and never saw any Red Hat customer have such, so in the end restoring the network for IPv4 only was sufficient.

hpannenb commented at 2023-01-16 12:04:

I am not a sufficient networking expert to make a decision
if it could cause problems in the ReaR recovery system
when interfaces are included that are linked to a physical device
but have no IP.

@jsmeix Likewise. But it seems this has never been (reported as) an issue until now with either the old or the new approach of collecting the network interfaces.

hpannenb commented at 2023-01-16 12:09:

Honestly I cannot remember.

Sure. I just raised the questions because it could have been a special decision by You to do so.

Probably I never got a pure IPv6 system and never saw any Red Hat customer have such, so in the end restoring the network for IPv4 only was sufficient.

The system I am dealing with is a hybrid: E.g. the management interface is IPv4 only whereas the connection to the backup service is via IPv6 only.

hpannenb commented at 2023-01-16 12:40:

I wonder if ReaR should automatically exclude something when there is no hard reason to exclude it?

I think something should be automatically excluded only when including it would cause problems in the ReaR recovery system. E.g. when things fail in the recovery system if it was inculded or possible security/privacy issues if secrets were included.

@jsmeix Under point 4 in my https://github.com/rear/rear/issues/2902#issuecomment-1369567575 I mentioned it is possible to exclude NICs with EXCLUDE_NETWORK_INTERFACES =() already in case they should be excluded from the rescue environment. So user has the power to do something in case it is harming on the NIC side.

BTW, from purely looking at the code this approach should have made the test passed of @pcahyna putting the eno2 onto this list. Obviously the comment section of the algorithm needs a slight update then.

hpannenb commented at 2023-01-28 08:56:

I adjusted the comments in my PR to align it with my code change.

Since this change is working as such (as already shown with earlier ReaR versions) my opinion is my PR is a lightweight change to support IPv4 and IPv6 NICs in the rescue environment.

@jsmeix @rmetrich @rear/contributors : Please leave Your comments.

hpannenb commented at 2023-02-19 09:44:

It is a great piece of code @rmetrich provided since it covers almost any situation in different NIC setups already. I just changed a single line of code to include IPv6 only NICs again.

@schlomo @jsmeix Appreciate both Your approvals on this; waiting for @pcahyna about his view/thoughts on my change.

schlomo commented at 2023-03-01 09:48:

@pcahyna any comments from you or can we merge this?

jsmeix commented at 2023-03-23 10:40:

@rear/contributors
I would like to be bold here
and merge it next Monday (27. March) afternoon
unless there are objections.

@pcahyna
I think because we are in ReaR 2.8 development phase
it is OK to "just merge" something like this one because
it does not look as if instantly hell broke loose
because of this changes (at least it "just works"
for @hpannenb so it cannot be wrong in any case)
and whether or not it is wrong in some cases
we learn best how it actually goes wrong
when various interested users who use our
current GitHub master code try it out
in their individual networking environments.


[Export of Github issue for rear/rear.]