#1597 PR merged: Trigger udev to reload changed rules (and 1s sleep time as final fallback)

Labels: enhancement, fixed / solved / done, minor bug

schabrolles opened issue at 2017-11-25 17:13:

Without the 1s sleep time, I got the following error Cannot find device "eth0" during a network migration (sles12sp2).

Running 55-migrate-network-devices.sh...
The only original network interface eth0 00:1a:4a:16:01:cc is not available
and no mapping is specified in /etc/rear/mappings/mac
Mapping it to the only available eth1 1a:f4:ea:94:64:0c
Reloading udev ... done.
Running 58-start-dhclient.sh...
Running 60-network-devices.sh...
Cannot find device "eth0"
Cannot find device "eth0"
Cannot find device "eth0"
Running 62-routing.sh...
Cannot find device "eth0"
Running 63-teaming.sh...
Running 65-sysctl.sh...
Running 67-check-by-label-cdrom.sh...
Running 99-makedev.sh...

This is solve by adding a 1s delay before triggering udev.

jsmeix commented at 2017-11-25 20:37:

Only FYI:
In general think triggering udev should usually not be needed, cf.
https://github.com/rear/rear/issues/791
but I am not a udev expert to make a decision here.
Perhaps all what is actually needed is the "sleep 1"
to enforce waiting a bit until udev had done its stuff (hopefully).
Perhaps the real solution would be to wait for the particular
network interface to actually appear?

jsmeix commented at 2017-11-25 20:59:

@schabrolles
as always with udev issues I do not understand what actually goes on.
Can you perhaps describe how it happens that you get a

Cannot find device "eth0"

error message when ReaR had reported

The only original network interface eth0 00:1a:4a:16:01:cc is not available
...
Mapping it to the only available eth1 1a:f4:ea:94:64:0c

so that after ReaR had applied the mapping in 55-migrate-network-devices.sh
I would understand that a

Cannot find device "eth1"

error message can happen as long as udev had not finished
but I do not understand an error message about "eth0" which
does not exist according to what ReaR reported.

schabrolles commented at 2017-11-26 09:12:

@jsmeix, Let me give you more details about the issue I got.

  • Scenario: Migration from system-A to system-B
  • system-A has 1 network interface eth0 with mac address MAC-A and IP set to IP-A in ifcfg-eth0 and 60-network-devices.sh (when booting in rescue image).
  • When I boot system-B the rescue image generated from system-A, a new interface eth1 is discovered (in sles12, udev rules file already contains MAC-A which is related to the eth0 device name).
  • during the rescue-image boot phase, the script 55-migrate-network-devices.sh is suppose to modify the udev rule file to replace MAC-A to MAC-B (eth1 MAC address). That's why you got the message:
Mapping it to the only available eth1 1a:f4:ea:94:64:0c

=> MAC-B should be re-named eth0 after udev trigger (which is by the way done by 55-migrate-network-devices.sh)
This was perfectly working before commit dd0c711. But now I got:

Running 60-network-devices.sh...
Cannot find device "eth0"
Cannot find device "eth0"
Cannot find device "eth0"
  • 60-network-devices.sh contains information about eth0, but eth0 is not there. it is still eth1.
    That is why I got cannot find device eth0.
  • Because it is still eth1 when the system is completely booted, this mean adding sleep time after udev triger would not help. (so it is not related to #791).
  • It works before dd0c711, and this commit add the automatic renaming of interface when only one is available (which is a good feature). Udev rule file is well updated with MAC-B instead of MAC-A... so it is definitely related to udev trigger. If I force udev trigger again (when loggin into the rescue image) eth1 is well renamed in eth0.
  • Previously (before dd0c711), the user add to select the interface from the prompt which add a bit of delay somewhere.
    (It is also working if I add the delay at the end of the modification made in dd0c711:
[...]
   if test ${#ORIGINAL_MACS[@]} -eq 1 -a ${#NEW_DEVICES[@]} -eq 1 ; then
        index=0
        old_dev=${ORIGINAL_DEVICES[$index]}
        old_mac=${ORIGINAL_MACS[$index]}
        choice="${NEW_DEVICES[$index]}"
        # Split choice="dev mac driver" into words:
        dev_mac_driver=( $choice )
        new_dev=${dev_mac_driver[0]}
        new_mac=${dev_mac_driver[1]}
        # Output the old_mac->new_mac mapping for later use below:
        echo "$old_mac $new_mac $old_dev" >>$MAC_MAPPING_FILE
        # Get new device name from current MAC address:
        new_dev=$( get_device_by_hwaddr "$new_mac" )
        # Tell the user about the automated mapping (and how he could avoid it):
        echo "The only original network interface $old_dev $old_mac is not available"
        echo "and no mapping is specified in $MAC_MAPPING_FILE"
        echo "Mapping it to the only available $new_dev $new_mac"
        sleep 1
    else
[...]

Sorry, I don't have the real cause, so I can't propose the perfect solution here. That's why I'm sharing my findings here. May be you will have a better solution.

schabrolles commented at 2017-11-26 14:09:

@jsmeix I change the sleep 1 with udevadm control --reload-rules
This command forces udev to reload rules files. As the file was just modify to change MAC address, Forcing reload the files looks a good idea before a udev trigger.
I does the job for SLES12 sp2.

jsmeix commented at 2017-11-27 10:56:

@schabrolles
many thanks for your great explanation.
I fully agree with the current fix (i.e. udevadm control --reload-rules)
because now it really looks like the right solution.
I guess the delay helps by chance because - as far as I vaguely
remember - udev watches its rules files for changes (via inotify
or something like this if I remember correctly) and that may need
a bit of time so that a "slelep 1" delay helps (usually).

Regarding the udevadm control command syntax:
On my SLE11 system "man udevadm" reads (excerpts):

  udevadm control
    ...
    --reload-rules
    Signal udevd to reload the rules files.
    The udev daemon detects changes automatically,
    this option is usually not needed. Reloading rules
    does not apply any changes to already existing devices.

while in contrast
on my SLE12 system "man udevadm" reads (excerpts):

  udevadm control
    ...
    -R, --reload
    Signal systemd-udevd to reload the rules files
    and other databases like the kernel module index.
    Reloading rules and databases does not apply
    any changes to already existing devices; the new
    configuration will only be applied to new events.

but is seems "udevadm control --reload-rules"
works backward compatible on my SLE12 system:

# udevadm control --reload-rules && echo ok
ok

so that we can assume (at least for now) that
"udevadm control --reload-rules"
should be the right way.

@schabrolles
just merge it so that the automated tests of @gdha are run.

schabrolles commented at 2017-11-27 11:44:

@jsmeix what about using :
« udevadm control —reload-rules || udevadm control —reload || sleep 1 ».
Just to be safe... what do you think ?

jsmeix commented at 2017-11-27 11:58:

Absolutely perfect for me!
But please add a comment why that two fallbacks are there,
cf. "Dirty hacks welcome" in
https://github.com/rear/rear/wiki/Coding-Style

FYI
why I still think it is a "dirty hack" that is same as
https://github.com/rear/rear/issues/791

I think the real solution could be to
replace in 55-migrate-network-devices.sh
the whole "Reload udev if we have MAC mappings" part with a
"wait for the expected new network interfaces to appear"
and just let udev do its stuff without interfering with udev.

The latter requires we know what new network interfaces
to wait for, cf. my "TODO" in 55-migrate-network-devices.sh
about "$old_mac $new_mac $old_dev $new_dev".

With that we could wait for each $new_dev to appear.

Then we could even show meaningful information to the user
when one does not appear after a reasonable timeout,
perhaps together with a subsequent user dialog where
he could set up missing network interfaces manually.

But that would be definitely something for a future ReaR
release (if at all).

jsmeix commented at 2017-11-27 12:29:

@schabrolles
as always many thanks for your testing!
It helps so much because it reveals shortcomings in the code.
It seems on my KVM/QEMU virtual machines udev works
too fast to let such issues show up.


[Export of Github issue for rear/rear.]