#3265 Issue open: Issue with mkrescue on EFI servers with multipath devices

Labels: bug

damm620 opened issue at 2024-07-03 08:22:

  • ReaR version ("/usr/sbin/rear -V"):
    rear -V
    Relax-and-Recover 2.7 / Git

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):

cat /etc/os-release
NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
sudo tail -n5000 /etc/rear/*.conf
==> /etc/rear/local.conf <==
# Default is to create Relax-and-Recover rescue media as ISO image
# set OUTPUT to change that
# set BACKUP to activate an automated (backup and) restore of your data
# Possible configuration values can be found in /usr/share/rear/conf/default.conf
#
# This file (local.conf) is intended for manual configuration. For configuration
# through packages and other automated means we recommend creating a new
# file named site.conf next to this file and to leave the local.conf as it is.
# Our packages will never ship with a site.conf.
ONLY_INCLUDE_VG=("evg00") # Only the OS is restored. The customer disks vg01/IMAPL are not restored

==> /etc/rear/os.conf <==
OS_VENDOR=RedHatEnterpriseServer
OS_VERSION="8"

==> /etc/rear/site.conf <==
OUTPUT=ISO               # Creates an ISO file - other e.g. RAMDISK
BACKUP=TSM               # Backup/Restore method - other e.g. rsync
COPY_AS_IS_TSM=( "/etc/adsm/ /opt/tivoli/tsm/client /usr/local/ibm/gsk8* /SZIR/data/inclexcl2 /SZIR/data/dsm.sys /SZIR/data/dsm.opt" )  # Files which should be in the ISO
# Skip language files to reduce the ISO size
COPY_AS_IS_EXCLUDE_TSM=( "/opt/tivoli/tsm/client/ba/bin/ZH_CN/*" "/opt/tivoli/tsm/client/ba/bin/ZH_TW/*" "/opt/tivoli/tsm/client/ba/bin/IT_IT/*" "/opt/tivoli/tsm/client/ba/bin/CS_CZ/*" "/opt/tivoli/tsm/client/ba/bin/ES_ES*" "/opt/tivoli/tsm/client/ba/bin/FR_FR/*" "/opt/tivoli/tsm/client/ba/bin/HU_HU/*" "/opt/tivoli/tsm/client/ba/bin/KO_KR/*" "/opt/tivoli/tsm/client/ba/bin/PL_PL/*" "/opt/tivoli/tsm/client/ba/bin/PT_BR/*" "/opt/tivoli/tsm/client/ba/bin/JA_JP/*" "/opt/tivoli/tsm/client/ba/bin/RU_RU/*" "/opt/tivoli/tsm/client/ba/bin/plugins/*" "/opt/tivoli/tsm/client/api/bin64/CS_CZ*" "/opt/tivoli/tsm/client/api/bin64/DE_DE/*" "/opt/tivoli/tsm/client/api/bin64/dsmenc" "/opt/tivoli/tsm/client/api/bin64/dsm.opt.smp" "/opt/tivoli/tsm/client/api/bin64/dsm.sys.smp" "/opt/tivoli/tsm/client/api/bin64/dsmtca" "/opt/tivoli/tsm/client/api/bin64/EN_US" "/opt/tivoli/tsm/client/api/bin64/ES_ES" "/opt/tivoli/tsm/client/api/bin64/FR_FR" "/opt/tivoli/tsm/client/api/bin64/HU_HU" "/opt/tivoli/tsm/client/api/bin64/IT_IT" "/opt/tivoli/tsm/client/api/bin64/JA_JP" "/opt/tivoli/tsm/client/api/bin64/KO_KR" "/opt/tivoli/tsm/client/api/bin64/PL_PL" "/opt/tivoli/tsm/client/api/bin64/PT_BR" "/opt/tivoli/tsm/client/api/bin64/RU_RU" "/opt/tivoli/tsm/client/api/bin64/sample" "/opt/tivoli/tsm/client/api/bin64/ZH_CN" "/opt/tivoli/tsm/client/api/bin64/ZH_TW" )
ONLY_INCLUDE_VG=("evg00") # Only the OS will be restored. Customer volumegroups/FS vg01/IMAPL/evg01 aren't restored
CHECK_CONFIG_FILES=( '/etc/adsm/' '/etc/drbd/' '/etc/drbd.conf' '/etc/lvm/lvm.conf' '/etc/multipath.conf' '/etc/rear/' '/etc/udev/udev.conf' '/opt/tivoli/tsm/client/ba/bin/dsmc' )
PROGS_TSM=(dsmc)
PROGS=( ${PROGS[@]} "vconfig vi vim" )
NON_FATAL_BINARIES_WITH_MISSING_LIBRARY=("/opt/tivoli/tsm/client/*|/usr/local/ibm/gsk8_64/bin/")
GRUB_RESCUE=y
AUTOEXCLUDE_MULTIPATH=n  # Default value=y - multipath devices are not created/saved. Which is usually not necessary
                         # If these must be restored nevertheless e.g. "rm -rf /". The restore must be performed manually afterwards
TSM_RESULT_SAVE=n        # Don't save ISO in TSM
KEEP_BUILD_DIR=n         # Remove build directory (dir in /tmp) in any case to prevent full filesystem
USER_INPUT_TIMEOUT=900   # Increase timeout for user input from 5 to 15 minutes
USE_DHCLIENT="n"
  • Hardware vendor/product (PC or PowerNV BareMetal or ARM) or VM (KVM guest or PowerVM LPAR):
    Cisco UCS Blade

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    x64

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    UEFI

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    multipath

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT"):

lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT
NAME                                                         KNAME      PKNAME    TRAN   TYPE  FSTYPE       LABEL   SIZE MOUNTPOINT
/dev/sda                                                     /dev/sda             fcoe   disk  mpath_member         102G
|-/dev/sda1                                                  /dev/sda1  /dev/sda         part  none                 200M
|-/dev/sda2                                                  /dev/sda2  /dev/sda         part  none                   1G
|-/dev/sda3                                                  /dev/sda3  /dev/sda         part  none               100.8G
`-/dev/mapper/mpd_boot_1                                     /dev/dm-0  /dev/sda         mpath                      102G
  |-/dev/mapper/mpd_boot_1p1                                 /dev/dm-1  /dev/dm-0        part  vfat                 200M /boot/efi
  |-/dev/mapper/mpd_boot_1p2                                 /dev/dm-2  /dev/dm-0        part  xfs                    1G /boot
  `-/dev/mapper/mpd_boot_1p3                                 /dev/dm-3  /dev/dm-0        part  LVM2_member        100.8G
    |-/dev/mapper/evg00-root                                 /dev/dm-4  /dev/dm-3        lvm   xfs                    2G /
    |-/dev/mapper/evg00-swap                                 /dev/dm-5  /dev/dm-3        lvm   swap                   4G [SWAP]
    |-/dev/mapper/evg00-usr                                  /dev/dm-6  /dev/dm-3        lvm   xfs                    4G /usr
....
/dev/sdb                                                     /dev/sdb             fcoe   disk  mpath_member         102G
|-/dev/sdb1                                                  /dev/sdb1  /dev/sdb         part  none                 200M
|-/dev/sdb2                                                  /dev/sdb2  /dev/sdb         part  none                   1G
|-/dev/sdb3                                                  /dev/sdb3  /dev/sdb         part  none               100.8G
`-/dev/mapper/mpd_boot_1                                     /dev/dm-0  /dev/sdb         mpath                      102G
  |-/dev/mapper/mpd_boot_1p1                                 /dev/dm-1  /dev/dm-0        part  vfat                 200M /boot/efi
  |-/dev/mapper/mpd_boot_1p2                                 /dev/dm-2  /dev/dm-0        part  xfs                    1G /boot
  `-/dev/mapper/mpd_boot_1p3                                 /dev/dm-3  /dev/dm-0        part  LVM2_member        100.8G
    |-/dev/mapper/evg00-root                                 /dev/dm-4  /dev/dm-3        lvm   xfs                    2G /
    |-/dev/mapper/evg00-swap                                 /dev/dm-5  /dev/dm-3        lvm   swap                   4G [SWAP]
    |-/dev/mapper/evg00-usr                                  /dev/dm-6  /dev/dm-3        lvm   xfs                    4G /usr
....
  • Description of the issue (ideally so that others can reproduce it):
    Take a UEFI server with multipath devices where /boot/efi is mounted via /dev/mapper/mpd_boot_1p1.
    Then run rear mkrescue.
    It produces this output:
TMPDIR=/tmp /usr/sbin/rear -D mkrescue; echo $?
Relax-and-Recover 2.7 / Git
Running rear mkrescue (PID 540612 date 2024-07-03 10:07:29)
Command line options: /usr/sbin/rear -D mkrescue
Using log file: /var/log/rear/rear-gtunxlnt00696.log
Using build area: /tmp/rear.XSejQLnsWJDoXzl
Running 'init' stage ======================
Running workflow mkrescue on the normal/original system
Running 'prep' stage ======================
Found EFI system partition /dev/mapper/mpd_boot_1p1 on /boot/efi type vfat
Using UEFI Boot Loader for Linux (USING_UEFI_BOOTLOADER=1)
Using '/bin/xorrisofs' to create ISO filesystem images
Using autodetected kernel '/boot/vmlinuz-4.18.0-553.5.1.el8.x86_64' as kernel in the recovery system
Running 'layout/save' stage ======================
Creating disk layout
Overwriting existing disk layout file /var/lib/rear/layout/disklayout.conf
Automatically excluding multipath slave /dev/sda
Automatically excluding multipath slave /dev/sdb
Disabling excluded components in /var/lib/rear/layout/disklayout.conf
Using guessed bootloader 'EFI' for 'rear recover' (found in first bytes on /dev/sda)
Verifying that the entries in /var/lib/rear/layout/disklayout.conf are correct
Created disk layout (check the results in /var/lib/rear/layout/disklayout.conf)
Running 'rescue' stage ======================
Creating recovery system root filesystem skeleton layout
Adding biosdevname=0 to KERNEL_CMDLINE
Adding net.ifnames=0 to KERNEL_CMDLINE
Handling network interface 'eth0'
eth0 is a physical device
Handled network interface 'eth0'
Included current keyboard mapping (via 'dumpkeys -f')
Included default US keyboard mapping /lib/kbd/keymaps/legacy/i386/qwerty/defkeymap.map.gz
Included other keyboard mappings in /lib/kbd/keymaps
Trying to find what to use as UEFI bootloader...
Trying to find a 'well known file' to be used as UEFI bootloader...
Using '/boot/efi/EFI/centos/grubx64.efi' as UEFI bootloader file
Copying logfile /var/log/rear/rear-gtunxlnt00696.log into initramfs as '/tmp/rear-gtunxlnt00696-partial-2024-07-03T10:07:37+02:00.log'
Running 'build' stage ======================
Copying files and directories
Copying binaries and libraries
Copying all kernel modules in /lib/modules/4.18.0-553.5.1.el8.x86_64 (MODULES contains 'all_modules')
Failed to copy all contents of /lib/modules/4.18.0-553.5.1.el8.x86_64 (dangling symlinks could be a reason)
Copying all files in /lib*/firmware/
Skip copying broken symlink '/etc/mtab' target '/proc/558267/mounts' on /proc/ /sys/ /dev/ or /run/
Testing that the recovery system in /tmp/rear.XSejQLnsWJDoXzl/rootfs contains a usable system
Testing each binary with 'ldd' and look for 'not found' libraries within the recovery system
Testing that the existing programs in the PROGS array can be found as executable command within the recovery system
Testing that each program in the REQUIRED_PROGS array can be found as executable command within the recovery system
Running 'pack' stage ======================
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (708437039 bytes) in 57 seconds
Running 'output' stage ======================
Configuring GRUB2 kernel /isolinux/kernel
Configuring GRUB2 initrd /isolinux/initrd.cgz
Configuring GRUB2 root device as 'set root=cd0'
GRUB2 modules to load: fat part_gpt part_msdos xfs
Did not find /boot/grub2/locale files (minor issue for UEFI ISO boot)
Searching whole /usr for SYSLINUX modules directory (you may specify SYSLINUX_MODULES_DIR)
Making ISO image
Wrote ISO image: /var/lib/rear/output/rear-gtunxlnt00696.iso (792M)
Setting up GRUB_RESCUE: Adding Relax-and-Recover rescue system to the local UEFI boot manager
Anyone who can select UEFI boot entries can boot it and replace the current system via 'rear recover'
GRUB2 modules to load: fat part_gpt part_msdos xfs
ERROR: Failed to create 'Relax-and-Recover' UEFI boot entry
Some latest log messages since the last called script 940_grub2_rescue.sh:
  grub2-mkstandalone: info: reading /usr/lib/grub/x86_64-efi/memdisk.mod.
  grub2-mkstandalone: info: reading /usr/lib/grub/x86_64-efi/archelp.mod.
  grub2-mkstandalone: info: reading /usr/lib/grub/x86_64-efi/tar.mod.
  grub2-mkstandalone: info: reading /tmp/grub.diT5Uz.
  grub2-mkstandalone: info: kernel_img=0x7fbdd5b64010, kernel_size=0x27c00.
  grub2-mkstandalone: info: the core size is 0x6f4420.
  grub2-mkstandalone: info: writing 0x6f6200 bytes.
  Could not prepare Boot variable: No such file or directory
Aborting due to an error, check /var/log/rear/rear-gtunxlnt00696.log for details
Exiting rear mkrescue (PID 540612) and its descendant processes ...
Running exit tasks
Terminated
143

I made some changes to the code and it worked for me:

diff /usr/share/rear/output/default/940_grub2_rescue.sh /usr/share/rear/output/default/940_grub2_rescue.sh_edit
194,195c194,196
<         efi_disk_part=$( grep -w /boot/efi /proc/mounts | awk '{print $1}' )
<         efi_disk=$( echo $efi_disk_part | sed -e 's/[0-9]//g' )
---
>         efi_disk_part=$( lsblk -nrpo PKNAME,KNAME,MOUNTPOINT | grep '/boot/efi'|sort -u )
>         efi_disk=$( echo $efi_disk_part | awk '{print $1}' )
>         efi_disk=$( get_device_name $efi_disk )
197c198,200
<         efi_part=$( echo $efi_disk_part | sed -e 's/[^0-9]//g' )
---
>         efi_part=$( echo $efi_disk_part | awk '{print $2}' )
>         efi_part=$( get_device_name $efi_part )
>         efi_part=$( echo $efi_part | grep -Eo '[0-9]+$' )
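
For illustration, a minimal sketch of what the original sed expressions (lines 194, 195 and 197 of 940_grub2_rescue.sh in the diff above) produce for a plain disk name versus the multipath name from this system. The helper names old_efi_disk and old_efi_part are hypothetical and exist only for this sketch; it covers the string handling only and does not read /proc/mounts, call get_device_name, or run efibootmgr.

```shell
#!/usr/bin/env bash
# Sketch only: reproduces the string handling of the original code,
# without touching /proc/mounts or calling efibootmgr.

# Original approach: drop every digit to get the "disk".
old_efi_disk() { echo "$1" | sed -e 's/[0-9]//g'; }

# Original approach: keep every digit to get the "partition number".
old_efi_part() { echo "$1" | sed -e 's/[^0-9]//g'; }

for dev in /dev/sda1 /dev/mapper/mpd_boot_1p1; do
    echo "$dev -> disk '$(old_efi_disk "$dev")' part '$(old_efi_part "$dev")'"
done
```

For /dev/sda1 this gives disk /dev/sda and part 1. For /dev/mapper/mpd_boot_1p1 it gives disk /dev/mapper/mpd_boot_p and part 11, neither of which appears in the lsblk output above, which would be consistent with the "Could not prepare Boot variable: No such file or directory" message.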

After this change, it worked for me on this server:

TMPDIR=/tmp /usr/sbin/rear -D mkrescue; echo $?
Relax-and-Recover 2.7 / Git
Running rear mkrescue (PID 571152 date 2024-07-03 10:14:25)
Command line options: /usr/sbin/rear -D mkrescue
Using log file: /var/log/rear/rear-gtunxlnt00696.log
Using build area: /tmp/rear.FNnzwZoOXHpjcDk
Running 'init' stage ======================
Running workflow mkrescue on the normal/original system
Running 'prep' stage ======================
Found EFI system partition /dev/mapper/mpd_boot_1p1 on /boot/efi type vfat
Using UEFI Boot Loader for Linux (USING_UEFI_BOOTLOADER=1)
Using '/bin/xorrisofs' to create ISO filesystem images
Using autodetected kernel '/boot/vmlinuz-4.18.0-553.5.1.el8.x86_64' as kernel in the recovery system
Running 'layout/save' stage ======================
Creating disk layout
Overwriting existing disk layout file /var/lib/rear/layout/disklayout.conf
Automatically excluding multipath slave /dev/sda
Automatically excluding multipath slave /dev/sdb
Disabling excluded components in /var/lib/rear/layout/disklayout.conf
Using guessed bootloader 'EFI' for 'rear recover' (found in first bytes on /dev/sda)
Verifying that the entries in /var/lib/rear/layout/disklayout.conf are correct
Created disk layout (check the results in /var/lib/rear/layout/disklayout.conf)
Running 'rescue' stage ======================
Creating recovery system root filesystem skeleton layout
Adding biosdevname=0 to KERNEL_CMDLINE
Adding net.ifnames=0 to KERNEL_CMDLINE
Handling network interface 'eth0'
eth0 is a physical device
Handled network interface 'eth0'
Included current keyboard mapping (via 'dumpkeys -f')
Included default US keyboard mapping /lib/kbd/keymaps/legacy/i386/qwerty/defkeymap.map.gz
Included other keyboard mappings in /lib/kbd/keymaps
Trying to find what to use as UEFI bootloader...
Trying to find a 'well known file' to be used as UEFI bootloader...
Using '/boot/efi/EFI/centos/grubx64.efi' as UEFI bootloader file
Copying logfile /var/log/rear/rear-gtunxlnt00696.log into initramfs as '/tmp/rear-gtunxlnt00696-partial-2024-07-03T10:14:33+02:00.log'
Running 'build' stage ======================
Copying files and directories
Copying binaries and libraries
Copying all kernel modules in /lib/modules/4.18.0-553.5.1.el8.x86_64 (MODULES contains 'all_modules')
Failed to copy all contents of /lib/modules/4.18.0-553.5.1.el8.x86_64 (dangling symlinks could be a reason)
Copying all files in /lib*/firmware/
Skip copying broken symlink '/etc/mtab' target '/proc/588455/mounts' on /proc/ /sys/ /dev/ or /run/
Testing that the recovery system in /tmp/rear.FNnzwZoOXHpjcDk/rootfs contains a usable system
Testing each binary with 'ldd' and look for 'not found' libraries within the recovery system
Testing that the existing programs in the PROGS array can be found as executable command within the recovery system
Testing that each program in the REQUIRED_PROGS array can be found as executable command within the recovery system
Running 'pack' stage ======================
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (708424370 bytes) in 57 seconds
Running 'output' stage ======================
Configuring GRUB2 kernel /isolinux/kernel
Configuring GRUB2 initrd /isolinux/initrd.cgz
Configuring GRUB2 root device as 'set root=cd0'
GRUB2 modules to load: fat part_gpt part_msdos xfs
Did not find /boot/grub2/locale files (minor issue for UEFI ISO boot)
Searching whole /usr for SYSLINUX modules directory (you may specify SYSLINUX_MODULES_DIR)
Making ISO image
Wrote ISO image: /var/lib/rear/output/rear-gtunxlnt00696.iso (792M)
Setting up GRUB_RESCUE: Adding Relax-and-Recover rescue system to the local UEFI boot manager
Anyone who can select UEFI boot entries can boot it and replace the current system via 'rear recover'
GRUB2 modules to load: fat part_gpt part_msdos xfs
Finished GRUB_RESCUE setup: Added 'Relax-and-Recover' UEFI boot manager entry
Exiting rear mkrescue (PID 571152) and its descendant processes ...
Running exit tasks
0

It also worked on a server with /dev/sda devices.
The only storage type left to test is NVMe. I only tested the code change (so that efibootmgr would be called correctly), but did not actually run efibootmgr.
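
As a rough check of the string handling for NVMe-style names, here is a small sketch. It assumes a hypothetical device name /dev/nvme0n1p1 (not taken from this system), and the helper names trailing_part_number and all_digits are made up for this sketch. It is not a test on real NVMe hardware and does not call get_device_name or efibootmgr.

```shell
#!/usr/bin/env bash
# Sketch only: string handling on an NVMe-style name, not a test on real NVMe hardware.
# /dev/nvme0n1p1 is a hypothetical example name, not taken from this system.

# Same expression as in the proposed change: keep only the trailing digits.
trailing_part_number() { echo "$1" | grep -Eo '[0-9]+$'; }

# Original expression for comparison: keep every digit in the name.
all_digits() { echo "$1" | sed -e 's/[^0-9]//g'; }

dev=/dev/nvme0n1p1
echo "$dev -> trailing '$(trailing_part_number "$dev")' all digits '$(all_digits "$dev")'"
```

The trailing-digits expression yields 1 for this name, whereas keeping every digit yields 011. Whether the rest of the change behaves correctly on NVMe still needs a real test.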

My question is:
Is this a good kind of fix (using the get_device_name function), or would you prefer another approach? I am asking because I do not know the codebase well enough to know whether there is a smarter way to do this :)

gdha commented at 2024-07-04 12:10:

@damm620 Thank you for your excellent analysis and proposed bug fix. Could you make a PR for this?

damm620 commented at 2024-07-04 13:13:

PR created


[Export of Github issue for rear/rear.]