#3042 Issue closed: use_devicesfile in RHEL9

Labels: bug, fixed / solved / done

MiguelSanders opened issue at 2023-08-17 15:29:

ReaR version ("/usr/sbin/rear -V"):
rear-2.6-17.el9.x86_64

If your ReaR version is not the current version, explain why you can't upgrade:

OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):
Red Hat Enterprise Linux release 9.2 (Plow)

ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
BACKUP=NETFS
BACKUP_URL=nfs://XXX.uniforce.eu/GPFS/var/rear
OUTPUT=ISO
ONLY_INCLUDE_VG=('rootvg')

Hardware vendor/product:

System architecture:
x86_64

Firmware:
BIOS, GRUB2

Storage :
Local Disk

Storage layout: Not able to get any log for the disk layout

Description of the issue:

As of RHEL9, use_devicesfile in LVM seems to be enabled by default. The devices file contains information on the allowed PVIDs.
As we may be restoring on a different PV, the devices file needs to be recreated prior to rebooting after the restore.

Possible fix

chroot /mnt/local /bin/bash -c "rm /etc/lvm/devices/system.devices && /usr/sbin/vgimportdevices -a"

pcahyna commented at 2023-08-24 13:14:

See also #2917. I don't think one can do /usr/sbin/vgimportdevices -a. The problem is this: imagine you have several volume groups, one for the system (rootvg) and one for data (datavg), on different disks. Now your system is broken and you want to recover it, but the data are unaffected and you want them to survive the recovery procedure. For this, it is advisable to disconnect the data disks during the recovery procedure to avoid accidentally reformatting them. But, as they are not connected, /usr/sbin/vgimportdevices -a will not find datavg, so it will not add its PVs to the devices file, and when you reboot after recovery with the disks attached again, they will not be found by LVM and datavg will not be activated.
I was thinking about some automated procedure to update only the obsolete entries in lvmdevices and keep all others, but I have not found anything simple. I therefore believe the only reliable solution is to remove the devices file at the end of recovery, since running without it is preferable to a broken system due to some fragile attempt at an "ideal" solution.
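
A minimal sketch of that removal step, assuming the restored system is mounted at /mnt/local as in the command above. The function name remove_lvm_devices_file is hypothetical and not part of ReaR; it only illustrates the idea and takes the target root as a parameter so it can be tried on a scratch directory first.

```shell
#!/bin/bash
# Hypothetical helper, not part of ReaR: remove the LVM devices file from a
# restored system tree so that the recovered system runs without one.
# $1 = root of the restored system (assumed to be /mnt/local by default).
remove_lvm_devices_file() {
    local target_root="${1:-/mnt/local}"
    local devices_file="$target_root/etc/lvm/devices/system.devices"

    if [ -e "$devices_file" ]; then
        rm -f "$devices_file" || return 1
        echo "Removed $devices_file"
    else
        # Nothing to remove is not an error: the goal state is already reached.
        echo "No devices file at $devices_file, nothing to do"
    fi
}
```

Calling it as remove_lvm_devices_file /mnt/local at the end of recovery would correspond to the removal described above. Unlike the "rm ... && vgimportdevices -a" one-liner, it does not fail when the file is already absent.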

pcahyna commented at 2023-08-24 16:42:

@MiguelSanders is it a problem for you if the devices file is removed at the end of recovery and thus missing from the recovered system?

MiguelSanders commented at 2023-08-24 17:36:

@pcahyna Just a thought but can't we just merge /etc/lvm/devices/system.devices from the recovery environment with the actual /etc/lvm/devices/system.devices from the backup?

pcahyna commented at 2023-08-24 17:42:

@MiguelSanders I thought about that too, but that would leave duplicate entries in the resulting system.devices (old possibly invalid and new valid) and every LVM command will then complain about it. It will not be broken, but it will be quite annoying.

pcahyna commented at 2023-08-24 17:44:

If by "merge" you mean something more sophisticated than merely concatenate them (like using PVID as the merge key and removing duplicates), I would love to do that, but there is no tool to perform this ATM.
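
For illustration only, a PVID-keyed merge could look roughly like the sketch below. It assumes that every device entry is a single line carrying a "PVID=<id>" token and that comment lines start with "#". Both are assumptions about an undocumented format, and the function name merge_devices_by_pvid is hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: merge device entries from two devices files, using
# PVID as the key. Entries from the first file win over entries with the
# same PVID in the second file.
# ASSUMPTION: one entry per line, each containing a "PVID=<id>" token.
# $1 = file with the new entries, $2 = file with the old entries.
merge_devices_by_pvid() {
    local new_file="$1" old_file="$2"
    awk '
        # Return the value of the PVID= token on the current line, or "".
        # The token is located by name, not by column position.
        function pvid(   i) {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^PVID=/) return substr($i, 6)
            return ""
        }
        /^[[:space:]]*#/ { next }      # skip comment lines
        { id = pvid() }
        id == "" { next }              # skip lines without a PVID (headers, blanks)
        !(id in seen) { seen[id] = 1; print }
    ' "$new_file" "$old_file"
}
```

The output contains device entries only; header lines are dropped, so it is not a drop-in replacement for system.devices, and it inherits every fragility of parsing a format that may change.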

pcahyna commented at 2023-08-24 17:48:

See the discussion in https://bugzilla.redhat.com/show_bug.cgi?id=2196688

MiguelSanders commented at 2023-08-24 21:50:

@pcahyna If we decided to disconnect unaffected VGs before we initiate the recovery phase, we could still

  1. Remove /etc/lvm/devices/system.devices inside the chroot after the recovery
  2. Run vgimportdevices -a inside the chroot after the recovery.
  3. Run vgimportdevices -a again once we have reattached the other PVs after we booted the recovered system. Importing existing VGs should be considered a BAU operation. I don't believe ReaR has to take any responsibility for this step.

Alternatively we could also do a simple merge of the (A) /etc/lvm/devices/system.devices (from ramdisk) with the (B) /etc/lvm/devices/system.devices (from chroot) as follows
EXCLUDE=$(echo $(for i in $(cat A | grep PVID | awk '{print $4}' | cut -d'=' -f2); do echo $i; done) | tr ' ' '|')
cat B | grep PVID | grep -Ev "$EXCLUDE" >> A
cp A B
Of course, if we are missing PVs, LVM commands will complain about those but I consider this a good thing as this will tell us that the LVM layout of our recovered system differs from the source system and needs to be amended/corrected.

I can give this a try tomorrow if you want.
Let me know your thoughts.

pcahyna commented at 2023-08-25 09:51:

@MiguelSanders

> @pcahyna If we decided to disconnect unaffected VGs before we initiate the recovery phase, we could still
>
> 1. Remove /etc/lvm/devices/system.devices inside the chroot after the recovery
> 2. Run vgimportdevices -a inside the chroot after the recovery.
> 3. Run vgimportdevices -a again once we have reattached the other PVs after we booted the recovered system. Importing existing VGs should be considered a BAU operation. I don't believe ReaR has to take any responsibility for this step.

I see at least two problems with this approach:

  1. Will the recovered system even boot if you reattach the data drives but the data VG is not found? Usually systemd drops to emergency mode when volumes referenced from fstab can't be found. This could cause lots of admin frustration.
  2. Is it widely known that importing existing VGs is a BAU operation? I have already encountered several bug reports that show a lack of awareness about the lvmdevices "feature" and its implications, especially when moving content to a new disk, including system cloning (and not just with ReaR).

> Alternatively we could also do a simple merge of the (A) /etc/lvm/devices/system.devices (from ramdisk) with the (B) /etc/lvm/devices/system.devices (from chroot) as follows
>
> EXCLUDE=$(echo $(for i in $(cat A | grep PVID | awk '{print $4}' | cut -d'=' -f2); do echo $i; done) | tr ' ' '|')
> cat B | grep PVID | grep -Ev "$EXCLUDE" >> A
> cp A B
>
> Of course, if we are missing PVs, LVM commands will complain about those but I consider this a good thing as this will tell us that the LVM layout of our recovered system differs from the source system and needs to be amended/corrected.
>
> I can give this a try tomorrow if you want. Let me know your thoughts.

Thank you for your thoughts and the offer to help. Writing a parser in shell for this (undocumented) format was exactly what I wanted to avoid, though, as I would consider it to be very fragile. For example, where does the print $4 in your awk statement come from? How do you know that the thing you are interested in is the 4th column? You don't take comments into account (although this does not seem to be a problem with the usual file content, who knows what might appear in a comment).

As simplicity and reliability are my highest priorities (there is already too much complicated and fragile code in ReaR making lots of assumptions about what things look like), I strongly prefer to remove the file, with simple concatenation and the resulting duplicates (and warnings) as the only viable alternative. Concatenation is not without its pitfalls, though. Is it guaranteed that two files can simply be concatenated like that while preserving the meaning of the entries in both? A simple problem like a missing terminal newline will ruin it. Or consider this scenario:

  • new version of LVM introduces a new incompatible format for the file
  • as it needs to support the old format in existing installations, it recognizes the new format using a header at the top
  • your system uses the old format, but your LVM tools are upgraded to ones that use the new format
  • you recover the system, the LVM tools create a new file in ramdisk (A) in the new format, but the file in chroot (B) remains in the old format
  • you concatenate them and the result is now broken, because it contains a mixture of entries in different formats

There is not even a guarantee that the file format will support concatenation in the future: the format may be changed to something like a binary database, XML, or JSON. Undocumented formats can be changed at any time.
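
To make the missing-newline pitfall concrete, here is a small sketch of a guarded append. The function name append_with_newline_guard is hypothetical. It addresses only the unterminated last line and does nothing about mixed or changed formats, which remain the more serious objection.

```shell
#!/bin/bash
# Hypothetical sketch: append file $2 to file $1, first adding a newline to
# $1 if its last line is unterminated, so that the last entry of $1 and the
# first entry of $2 cannot run together on one line.
append_with_newline_guard() {
    local dest="$1" src="$2"
    # tail -c 1 prints the last byte. Command substitution strips a trailing
    # newline, so a non-empty result means the file does not end in one.
    if [ -s "$dest" ] && [ -n "$(tail -c 1 "$dest")" ]; then
        printf '\n' >> "$dest"
    fi
    cat "$src" >> "$dest"
}
```

A plain "cat B >> A" on a file A whose last entry lacks the final newline would glue that entry to the first line of B; the guard above avoids exactly that and nothing more.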

Last but not least, removal of the LVM devices file is not a solution that I made up myself. It is one of the operations performed by virt-sysprep as a preparation of a VM for cloning, so there is prior art. See the description of lvm-system-devices under https://man.archlinux.org/man/virt-sysprep.1.en#OPERATIONS (it is marked with an asterisk, meaning that it is enabled by default).

MiguelSanders commented at 2023-08-25 09:58:

@pcahyna I completely agree that the format might change at any time. As long as there is no option to lvmdevices that could help us, I also believe that the safest thing to do right now is to simply remove the system.devices file.

Thanks for your feedback.

pcahyna commented at 2023-08-25 17:35:

@MiguelSanders thank you for your review.

Indeed, if one wants to be perfect and keep correct lvmdevices, the correct way to do it is to have some option of lvmdevices(8) or vgimportdevices(8) that would do what we need without having to touch the file ourselves (i.e. treat it as opaque, which is the right way of dealing with undocumented and thus internal file formats).

I now realized that one such option (which does not rely on yet unimplemented features of lvmdevices(8) or vgimportdevices(8)) could be to keep the old lvmdevices in the rescue ramdisk. This way, the pvcreate commands will update it with new IDs, while keeping the old entries. We could then copy it to the recovered chroot at the end.

This approach is not without its own problems. The file could not have come from the backup, since we recreate the layout before extracting the backup. It would need to be embedded in the ramdisk (in upstream ReaR it is actually already there). This would be a problem if the file changes between creating the rescue media (rear mkrescue) and taking the backup (rear mkbackuponly), as these operations are in principle independent. This is a solvable problem, as we already have a check for inconsistencies of this type. Another pitfall is that pvcreate used to crash if it found an existing PVID already in the file (see https://bugzilla.redhat.com/show_bug.cgi?id=2117937), and therefore we exclude the file from the rescue ramdisk in RHEL as a workaround. I realize I should submit this patch to ReaR upstream too: although the bug in LVM is fortunately fixed now, some distributions/versions might still have a buggy version. The patch will then be incompatible with this approach.
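
The final copy step of this idea could be sketched as follows, assuming the rescue system keeps the file at /etc/lvm/devices/system.devices and the restored system is mounted at /mnt/local. The function name copy_devices_file_to_target is hypothetical, and the sketch does not include the consistency check between rear mkrescue and rear mkbackuponly mentioned above.

```shell
#!/bin/bash
# Hypothetical sketch: copy the devices file that pvcreate updated in the
# rescue ramdisk into the restored system, replacing the restored copy.
# $1 = root of the rescue system (normally /)
# $2 = root of the restored system (assumed to be mounted at /mnt/local)
copy_devices_file_to_target() {
    local rescue_root="${1%/}" target_root="${2%/}"
    local rel="etc/lvm/devices/system.devices"

    if [ ! -f "$rescue_root/$rel" ]; then
        echo "No devices file in the rescue system, leaving target untouched" >&2
        return 0
    fi
    mkdir -p "$target_root/etc/lvm/devices" || return 1
    # -p preserves the mode and timestamps of the file.
    cp -p "$rescue_root/$rel" "$target_root/$rel"
}
```

At the end of recovery this would be called as copy_devices_file_to_target / /mnt/local. If the rescue ramdisk excludes the file (as in the RHEL workaround), the function leaves the restored copy alone.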


[Export of Github issue for rear/rear.]