#770 Issue closed: /etc/udev/rules.d/70-persistent-net.rules

Labels: enhancement, bug

gdha opened issue at 2016-02-12 11:02:

I was doing a test with a VM restore (cloning) of a RHEL6 system and noticed that the eth0 device was renamed to eth2, and eth1 to eth3. After the recovery the system came up, but with no IP address configured on eth2.

In the recover log we find the following:

2016-02-11 05:34:29 Including finalize/GNU/Linux/30_create_mac_mapping.sh
cat: /sys/class/net/eth1/address: No such file or directory
2016-02-11 05:34:29 Including finalize/GNU/Linux/41_migrate_udev_rules.sh
2016-02-11 05:34:29 Updating udev configuration (70-persistent-net.rules)

However, this is easy to fix by:

  • deleting the /etc/udev/rules.d/70-persistent-net.rules file and rebooting, or
  • editing /etc/udev/rules.d/70-persistent-net.rules to delete the old entries and change the NAME entries (eth2 to eth0, and eth3 to eth1), then rebooting
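
The second option can be sketched as a small sed edit. This is only an illustration under stated assumptions: the helper name rename_new_net_rules is hypothetical, the old entries are assumed to be the rules that still carry NAME="eth0" and NAME="eth1", and GNU sed is assumed for the -i option.

```shell
# rename_new_net_rules RULES_FILE   (hypothetical helper, not part of ReaR)
# Drops the stale rules that still claim eth0/eth1, then renames the
# newly generated eth2/eth3 rules to eth0/eth1.
# A backup of the original file is kept as RULES_FILE.bak.
rename_new_net_rules() {
    rules=$1
    sed -i.bak \
        -e '/NAME="eth[01]"/d' \
        -e 's/NAME="eth2"/NAME="eth0"/' \
        -e 's/NAME="eth3"/NAME="eth1"/' \
        "$rules"
}

# Usage (on the recovered system, followed by a reboot):
#   rename_new_net_rules /etc/udev/rules.d/70-persistent-net.rules
```

The delete expression comes first so that a stale rule is removed before the rename expressions could turn a new rule into a second eth0 or eth1 entry.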

This makes me think: wouldn't it be better to remove this restored /etc/udev/rules.d/70-persistent-net.rules file during the recover process, as this file gets recreated during the reboot?
Any ideas or suggestions?

schlomo commented at 2016-02-12 11:09:

Isn't that a bug earlier up in the rescue part? E.g. when we boot the system and decide how to map the old interfaces of the source system to the new interfaces of the recovery system?

The logic in 41_migrate_udev_rules.sh suggests that all decisions have been made much earlier and not here.

jsmeix commented at 2016-02-12 12:05:

On my SLES12 system /etc/udev/rules.d/70-persistent-net.rules contains:

# This file was automatically generated by the /usr/lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.
# PCI device 0x8086:0x10de (e1000e)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:26:b9:82:21:7a", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

I think it is a generic issue what to do during backup restore with files that are in the backup but should not be restored because they are automatically created and maintained by the system.

The generic traditional example of such a file is /etc/mtab. As long as it was a regular file, it should not have been restored with outdated content from a backup. Nowadays it is a symbolic link to /proc/self/mounts which should probably be restored to ensure that link is available (I don't know if that link is automatically created by the system).

I think in general the backup should contain all files to be complete and it is the task of the admin to explicitly exclude files from his backup restore that he does not want to restore via EXCLUDE_RESTORE.

I think in general rear should handle the backup and restore task as "completely separated third party stuff" and not try to mess around with it.

Nevertheless rear could have in usr/share/rear/conf/default.conf a well documented predefined EXCLUDE_RESTORE list of well-known files that usually never make sense to be restored.

schlomo commented at 2016-02-12 12:18:

Guys, I think we have a bug somewhere earlier in our process. Why did this suddenly pop up after working flawlessly for years?

jsmeix commented at 2016-02-12 12:22:

@schlomo
As far as I understand what @gdha wrote "After the recover the system came up" it means that the issue was not in the rear recovery system but that "rear recover" had worked but afterwards when booting the recovered system the issue appeared.

I am not at all a networking configuration expert but I would assume that rear does not need to change the networking configuration after the backup restore.

I think that by default the new hardware where "rear recover" is run has to be sufficiently compatible with the old hardware where "rear mkbackup" was run so that the networking configuration files from the backup would still work on the new hardware.

I think only configuration files regarding storage (partitioning, file system, mount points - e.g. /etc/fstab) and configuration files regarding the bootloader need to be adapted by rear after the backup restore.
But I could be wrong (in particular because I am not at all a networking configuration expert).

schlomo commented at 2016-02-12 12:26:

Yes, exactly my point. If the recovered system comes up and does not work properly then this is a bug in ReaR. The buggy code was executed in the environment of the rescue system. Either in the boot phase or in the rear recover phase.

It worked before, even with RHEL6. So there must be a bug somewhere. Either RHEL did an update or we have a change or both.

jsmeix commented at 2016-02-12 12:27:

In general whenever "they" change basic system stuff, then "suddenly" new interesting issues pop up in rear, in particular here it seems to be systemd/udev related.

jsmeix commented at 2016-02-12 12:31:

@gdha
Can you exclude /etc/udev/rules.d/70-persistent-net.rules from the backup restore via EXCLUDE_RESTORE and try again?
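
For illustration, such an exclusion might look like the following in /etc/rear/local.conf. This is a sketch, not a tested setting: it assumes the backup method honours EXCLUDE_RESTORE, and the relative path form (no leading slash) is an assumption about how the restore program matches archive members.

```shell
# /etc/rear/local.conf (sketch)
# Keep whatever is already excluded and add the udev rules file.
# Assumption: paths are matched relative to the archive root, so the
# leading slash is left out here. Adjust if your backup program differs.
EXCLUDE_RESTORE=( "${EXCLUDE_RESTORE[@]}" 'etc/udev/rules.d/70-persistent-net.rules' )
```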

If it works then, it proves (from my point of view) that this issue is just one more instance of the generic issue that files that are automatically created and maintained by the system should be excluded from the backup restore.

gdha commented at 2016-02-12 18:53:

@jsmeix A retest on that system has to be requested first, but perhaps we could do it on another system.
@schlomo A bug? Maybe. The MAC address should be remapped; I need to dig deeper to be sure.

gdha commented at 2016-02-13 13:09:

@schlomo I think it has to do with device renaming and as

2016-02-11 05:34:29 Including finalize/GNU/Linux/30_create_mac_mapping.sh
cat: /sys/class/net/eth1/address: No such file or directory

suggests, it did not find the device. udevd had already kicked off, read the original file (with unmodified MAC addresses) and created 2 new entries. I've seen this many times already (and noticed the same in several reported issues). When you use DHCP it goes unnoticed.
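
The cat error quoted above is what one would expect when a script reads the address of an interface name that no longer exists in the rescue system. A minimal sketch of a more defensive lookup follows. The function name print_mac and its optional sysfs root argument are hypothetical and are not taken from 30_create_mac_mapping.sh.

```shell
# print_mac IFACE [SYSFS_ROOT]   (hypothetical helper for illustration)
# Prints the MAC address of IFACE, or returns 1 with a message on
# stderr when the interface does not exist under SYSFS_ROOT.
print_mac() {
    iface=$1
    sysfs_root=${2:-/sys/class/net}   # overridable, e.g. for testing
    addr_file="$sysfs_root/$iface/address"
    if [ -r "$addr_file" ]; then
        cat "$addr_file"
    else
        echo "interface $iface not found under $sysfs_root" >&2
        return 1
    fi
}

# Usage: show which of the interface names from this issue are present.
for dev in eth0 eth1 eth2 eth3; do
    mac=$(print_mac "$dev" 2>/dev/null) && echo "$dev $mac"
done
```

Running the loop in the rescue system would show directly whether the devices came up as eth0/eth1 or as eth2/eth3.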

Personally, I think adding this file to EXCLUDE_RESTORE could prevent this. I have not yet tested this theory.

schlomo commented at 2016-02-13 20:07:

I think that we need to find a generic solution that works regardless of the restore options. EXCLUDE_RESTORE would only cover NETFS but not the other backup methods. Imagine ReaR failing because TSM restored that file.

jsmeix commented at 2016-02-15 09:49:

@schlomo I have a generic question:
Is rear meant to automatically exclude files from the backup restore that should not be restored, or is this something that the admin must do?

schlomo commented at 2016-02-15 12:48:

IMHO for ReaR there are three independent concerns:

  1. bare metal recovery (block devices, boot loaders ...)
  2. restoring files (either through builtin backup or via external backup)
  3. system reconfiguration (network card mapping, disk mapping, drivers matching ...)

So far the part of restoring files is very independent from the other parts. Hence nothing magic here.

I would also advise against a magic integration and instead always assume that the backup solution restored all files. If we need to remove some as part of the system reconfiguration then this is OK.

jsmeix commented at 2016-02-16 10:44:

@schlomo many thanks for your explanation - I always appreciate your valuable background information - it helps a lot!

Now (at least for me) it is perfectly clear.

If needed rear should remove restored files as part of the system reconfiguration.

Therefore I created https://github.com/rear/rear/issues/779
"Move away restored files that should not have been restored."

If you agree I will implement it - but probably after the 1.18 release.

jsmeix commented at 2016-02-16 13:57:

If you like, I implemented right now something for the 1.18 release, see https://github.com/rear/rear/pull/780

gdha commented at 2016-02-18 17:16:

Can be closed, as the pull request for #779 has been checked in.

jsmeix commented at 2016-02-19 11:23:

Unfortunately I think https://github.com/rear/rear/issues/779 "Move away restored files that should not have been restored" may not help regarding what @gdha wrote here in his initial comment https://github.com/rear/rear/issues/770#issue-133211130

Reason (as far as I see):
Via https://github.com/rear/rear/issues/779 files are removed immediately after backup restore. Accordingly with

BACKUP_RESTORE_MOVE_AWAY_FILES=( /etc/udev/rules.d/70-persistent-net.rules )

/etc/udev/rules.d/70-persistent-net.rules gets removed immediately after backup restore but later usr/share/rear/finalize/GNU/Linux/41_migrate_udev_rules.sh is run that creates /etc/udev/rules.d/70-persistent-net.rules anew.
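
The move-away step described here can be sketched roughly as follows. This is an illustration and not the code from the pull request: the function name move_away_files and its arguments are hypothetical, and GNU cp is assumed for the --parents option.

```shell
# move_away_files ROOT SAVE_DIR FILE...   (hypothetical helper)
# For each FILE (path inside the restored tree ROOT), keep a copy with
# its directory structure under ROOT/SAVE_DIR, then remove the original.
move_away_files() {
    root=$1
    save_dir=$2
    shift 2
    (
        cd "$root" || exit 1
        mkdir -p "$save_dir"
        for f in "$@"; do
            f=${f#/}                  # make the path relative to ROOT
            [ -e "$f" ] || continue   # nothing restored, nothing to move
            cp -a --parents "$f" "$save_dir/" && rm -rf "$f"
        done
    )
}

# Usage (paths as they appear in this issue):
#   move_away_files /mnt/local var/lib/rear/moved_away_after_backup_restore \
#       /etc/udev/rules.d/70-persistent-net.rules
```

Because this runs right after the restore, any later finalize script that writes the same path will simply create the file again, which is the ordering problem discussed in this comment.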

When @gdha needs to remove /etc/udev/rules.d/70-persistent-net.rules in the rear recovery system after "rear recover" finished and before rebooting into the recreated system, then https://github.com/rear/rear/issues/779 does not help and the real fix is probably (as @schlomo suggested) in the rear recovery code that deals with udev rules.

FYI why it works in my particular case with SLE12-SP1:

In my case finalize/GNU/Linux/41_migrate_udev_rules.sh does the following (excerpt from my "rear -d -D recover" log):

+ source /usr/share/rear/restore/default/99_move_away_restored_files.sh
++ pushd /mnt/local
...
++ cp -a --parents etc/udev/rules.d/70-persistent-net.rules var/lib/rear/moved_away_after_backup_restore/
...
++ rm -rf etc/udev/rules.d/70-persistent-net.rules
...
++ popd
.
.
.
+ source /usr/share/rear/finalize/GNU/Linux/41_migrate_udev_rules.sh
...
++ echo -e 'Updating udev configuration (70-persistent-net.rules)'
++ cp /mnt/local//etc/udev/rules.d/70-persistent-net.rules /mnt/local/root/rear-70-persistent-net.rules.old
cp: cannot stat '/mnt/local//etc/udev/rules.d/70-persistent-net.rules': No such file or directory
++ cp /etc/udev/rules.d/70-persistent-net.rules /mnt/local//etc/udev/rules.d/70-persistent-net.rules

In my case 70-persistent-net.rules in the rear recovery system and the one in the original system are identical:

# diff -wups /var/lib/rear/moved_away_after_backup_restore/etc/udev/rules.d/70-persistent-net.rules /etc/udev/rules.d/70-persistent-net.rules  
Files /var/lib/rear/moved_away_after_backup_restore/etc/udev/rules.d/70-persistent-net.rules and /etc/udev/rules.d/70-persistent-net.rules are identical

so that all that deletion and re-creation does not actually change anything in my particular case.

jsmeix commented at 2016-02-19 11:27:

I think there is perhaps a bug in 41_migrate_udev_rules.sh but I am not enough of a systemd/udev expert to really help here.

I will now blindly test what happens on my system when I change 41_migrate_udev_rules.sh so that it does not do anything with 70-persistent-net.rules ...

jsmeix commented at 2016-02-19 12:20:

I use two SLES12-SP1 test systems: Two KVM/QEMU virtual machines with one NIC and DHCP on each where one is the "original system" where I run "rear mkbackup" and the other one is the "replacement system" where I run "rear recover".

I changed 41_migrate_udev_rules.sh to do nothing at all by adding at its beginning

# do nothing at all:
return 0

For me the final result is the same:

99_move_away_restored_files.sh removes /etc/udev/rules.d/70-persistent-net.rules

That is all that is done with it during "rear recover" (that ends on 2016-02-19 11:49:38.597966929 - I assume the "rear recover" time is UTC).

After reboot into the recovered system 70-persistent-net.rules gets re-created identical to what it was before

# ls -l /etc/udev/rules.d/70-persistent-net.rules
-rw-r--r-- 1 root root 439 Feb 19 12:56 /etc/udev/rules.d/70-persistent-net.rules
# diff -wups /var/lib/rear/moved_away_after_backup_restore/etc/udev/rules.d/70-persistent-net.rules /etc/udev/rules.d/70-persistent-net.rules
Files /var/lib/rear/moved_away_after_backup_restore/etc/udev/rules.d/70-persistent-net.rules and /etc/udev/rules.d/70-persistent-net.rules are identical

As expected my DHCP networking still "just works".

gdha commented at 2016-03-02 11:52:

@jsmeix @schlomo Yesterday we were able to retest this on the same hardware where we noticed the behavior described in this issue with the latest snapshot of 20160222.
In the /etc/rear/local.conf file we added BACKUP_RESTORE_MOVE_AWAY_FILES=( /etc/udev/rules.d/70-persistent-net.rules ) and ran a recover process where we noticed the following:

2016-03-01 05:50:18 Including finalize/GNU/Linux/41_migrate_udev_rules.sh
diff: /mnt/local//etc/udev/rules.d/70-persistent-net.rules: No such file or directory
2016-03-01 05:50:18 Updating udev configuration (70-persistent-net.rules)
cp: cannot stat `/mnt/local//etc/udev/rules.d/70-persistent-net.rules`: No such file or directory
2016-03-01 05:50:18 Including finalize/GNU/Linux/42_migrate_network_configuration_files.sh
2016-03-01 05:50:18 SED_SCRIPT: ';s/00:50:56:99:1b:49/00:50:56:99:24:97/g;s/00:50:56:99:1B:49/00:50:56:99:24:97/g;s/00:50:56:99:f7:e9/00:50:56:99:b1:25/g;s/00:50:56:99:F7:E9/00:50:56:99:B1:25/g'
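
The SED_SCRIPT line shows one substitution per MAC address in lower case and one in upper case. A sketch of how such a script could be assembled from old/new MAC pairs is given below. The function name build_mac_sed_script is hypothetical and this is not the code of 42_migrate_network_configuration_files.sh.

```shell
# build_mac_sed_script OLD_MAC NEW_MAC [OLD_MAC NEW_MAC ...]
# (hypothetical helper) Prints a sed script that replaces each old MAC
# by its new MAC, once in lower case and once in upper case.
build_mac_sed_script() {
    script=''
    while [ $# -ge 2 ]; do
        old_lc=$(printf '%s' "$1" | tr 'A-F' 'a-f')
        new_lc=$(printf '%s' "$2" | tr 'A-F' 'a-f')
        old_uc=$(printf '%s' "$1" | tr 'a-f' 'A-F')
        new_uc=$(printf '%s' "$2" | tr 'a-f' 'A-F')
        script="$script;s/$old_lc/$new_lc/g;s/$old_uc/$new_uc/g"
        shift 2
    done
    printf '%s\n' "$script"
}

# Usage with the two MAC pairs from the log above:
build_mac_sed_script \
    00:50:56:99:1b:49 00:50:56:99:24:97 \
    00:50:56:99:f7:e9 00:50:56:99:b1:25
```

With these two pairs the function prints the same string as the SED_SCRIPT entry in the log. The result could then be applied to a configuration file with sed -e "$script".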

Indeed the /mnt/local//etc/udev/rules.d/70-persistent-net.rules was successfully removed, and recovered from the rescue image by script 42_migrate_network_configuration_files.sh (in the meantime I improved the verbosity of this script).
However, the final /mnt/local//etc/udev/rules.d/70-persistent-net.rules file was looking good before we rebooted the system (and was still ok after the reboot as well).
This concludes my quest for this issue. If no one objects, can we close this one?


[Export of Github issue for rear/rear.]