#2826 Issue closed: RedHatEnterpriseServer/8 - There is no code to install a boot loader on the recovered system

Labels: support / question, fixed / solved / done

Githopp192 opened issue at 2022-06-21 18:33:

Relax-and-Recover (ReaR) Issue Template
Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):

Relax-and-Recover 2.6 / 2020-06-17

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):
VERSION="8.5 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.5"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.5 (Ootpa)"
  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
BACKUP=BACULA
export TMPDIR="/home/TMPDIR"
COPY_AS_IS_BACULA=( /usr/lib64/libbac* /opt/bacula/working /etc/bacula /var/spool/bacula )
COPY_AS_IS_EXCLUDE_BACULA=( /var/lib/bacula )
PROGS_BACULA=( bacula bacula-fd bconsole bacula-console bextract bls bscan btape smartctl gawk )
BACULA_CLIENT=baculaclient-fd
CLONE_ALL_USERS_GROUPS=y
OUTPUT_URL=nfs://nas/volumex/Backup/REAR
OUTPUT_PREFIX=server2.opn9.9opn
  • Hardware vendor/product (PC or PowerNV BareMetal or ARM) or VM (KVM guest or PowerVM LPAR):

HP Proliant DL360 G7

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):

x86

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):

BIOS, Bootloader GRUB

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):

local disks (DM (mdadm))

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT"):
NAME                              KNAME      PKNAME     TRAN TYPE  FSTYPE            LABEL          SIZE MOUNTPOINT
/dev/sda                          /dev/sda              sas  disk                                 279.4G
|-/dev/sda1                       /dev/sda1  /dev/sda        part  ext4                               1G /mnt/local/boot
`-/dev/sda2                       /dev/sda2  /dev/sda        part  linux_raid_member server2:pv00 278.4G
  `-/dev/md127                    /dev/md127 /dev/sda2       raid1 LVM2_member                    278.2G
    |-/dev/mapper/cl_server2-home /dev/dm-0  /dev/md127      lvm   xfs               home            60G /mnt/local/home
    |-/dev/mapper/cl_server2-root /dev/dm-1  /dev/md127      lvm   xfs               root            50G /mnt/local
    `-/dev/mapper/cl_server2-swap /dev/dm-2  /dev/md127      lvm   swap              swap            20G
/dev/sdb                          /dev/sdb              sas  disk                                 279.4G
`-/dev/sdb1                       /dev/sdb1  /dev/sdb        part  linux_raid_member server2:pv00 278.4G
  `-/dev/md127                    /dev/md127 /dev/sdb1       raid1 LVM2_member                    278.2G
    |-/dev/mapper/cl_server2-home /dev/dm-0  /dev/md127      lvm   xfs               home            60G /mnt/local/home
    |-/dev/mapper/cl_server2-root /dev/dm-1  /dev/md127      lvm   xfs               root            50G /mnt/local
    `-/dev/mapper/cl_server2-swap /dev/dm-2  /dev/md127      lvm   swap              swap            20G
/dev/sdc                          /dev/sdc              usb  disk                                   1.9G
`-/dev/sdc1                       /dev/sdc1  /dev/sdc        part  vfat              RELAXRECOVE    1.9G
  • Description of the issue (ideally so that others can reproduce it):
dracut: *** Stripping files done ***
dracut: *** Creating image file '/boot/initramfs-4.18.0-372.9.1.el8.x86_64.img' ***
dracut: *** Creating initramfs image file '/boot/initramfs-4.18.0-372.9.1.el8.x86_64.img' done ***
2022-06-14 12:45:36.378711439 Updated initrd with new drivers for kernel 4.18.0-372.9.1.el8.x86_64.
2022-06-14 12:45:36.386270072 Including finalize/Linux-i386/610_EFISTUB_run_efibootmgr.sh
2022-06-14 12:45:36.393015395 Including finalize/Linux-i386/630_install_grub.sh
/sbin/grub2-probe
2022-06-14 12:45:36.396131289 Skip installing GRUB Legacy boot loader because GRUB 2 is installed (grub-probe or grub2-probe exist).
2022-06-14 12:45:36.403333237 Including finalize/Linux-i386/640_install_lilo.sh
2022-06-14 12:45:36.410207701 Including finalize/Linux-i386/650_install_elilo.sh
2022-06-14 12:45:36.416834713 Including finalize/Linux-i386/660_install_grub2.sh
/sbin/grub2-probe
2022-06-14 12:45:36.419869106 Installing GRUB2 boot loader...
Generating grub configuration file ...
done
2022-06-14 12:45:52.295025695 Determining where to install GRUB2 (no GRUB2_INSTALL_DEVICES specified)
2022-06-14 12:45:52.306608019 Cannot install GRUB2 (unable to find a /boot or / partition)
2022-06-14 12:45:52.313893146 Including finalize/Linux-i386/670_run_efibootmgr.sh
2022-06-14 12:45:52.320646224 Including finalize/default/880_check_for_mount_by_id.sh
2022-06-14 12:45:52.329801215 Including finalize/default/890_finish_checks.sh
2022-06-14 12:45:52.340399684 WARNING:
For this system
RedHatEnterpriseServer/8 on Linux-i386 (based on Fedora/8/i386)
there is no code to install a boot loader on the recovered system
or the code that we have failed to install the boot loader correctly.
Please contribute appropriate code to the Relax-and-Recover project,
see http://relax-and-recover.org/development/
Take a look at the scripts in /usr/share/rear/finalize - for example
for PC architectures like x86 and x86_64 see the script
/usr/share/rear/finalize/Linux-i386/660_install_grub2.sh
and for POWER architectures like ppc64le see the script
/usr/share/rear/finalize/Linux-ppc64le/660_install_grub2.sh
---------------------------------------------------
|  IF YOU DO NOT INSTALL A BOOT LOADER MANUALLY,  |
|  THEN YOUR SYSTEM WILL NOT BE ABLE TO BOOT.     |
---------------------------------------------------
You can use 'chroot /mnt/local bash --login'
to change into the recovered system and
manually install a boot loader therein.

2022-06-14 12:45:52.347734289 Including finalize/default/900_remount_sync.sh
2022-06-14 12:45:52.350216411 Finished running 'finalize' stage in 271 seconds
2022-06-14 12:45:52.352338551 ======================
2022-06-14 12:45:52.354267487 Running 'wrapup' stage
2022-06-14 12:45:52.356258575 ======================
2022-06-14 12:45:52.368024420 Including wrapup/default/500_post_recovery_script.sh
2022-06-14 12:45:52.374734324 Including wrapup/default/980_good_bye.sh
2022-06-14 12:45:52.381432044 Including wrapup/default/990_copy_logfile.sh
removed '/mnt/local//root/rear-2022-06-11T01:35:05+02:00.log'
'/mnt/local//root/rear-2022-06-14T12:45:52+02:00.log' -> '/var/log/rear/recover/rear-server2.log'
2022-06-14 12:45:52.396646758 Finished running 'wrapup' stage in 0 seconds
2022-06-14 12:45:52.398795849 Finished running recover workflow
2022-06-14 12:45:52.406599457 Exiting rear recover (PID 1771) and its descendant processes ...
2022-06-14 12:45:55.451808639 rear,1771 /bin/rear recover
  • Workaround, if any:

What I did before the reboot (I cannot tell whether the system would have rebooted without these steps).
As a precaution, the only way for me was to do the following steps (a consolidated sketch of these commands follows below):

- chroot /mnt/local bash --login
- grub2-install /dev/sda
- grub2-install /dev/sdb
- grub2-mkconfig -o /boot/grub2/grub.cfg

There was no issue finding the config files!

The system rebooted successfully!
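
For reference, a consolidated sketch of these precautionary commands as one non-interactive call (it assumes the BIOS/GRUB2 setup and the /dev/sda and /dev/sdb RAID1 member disks from this report):

chroot /mnt/local /bin/bash --login <<'EOF'
# reinstall GRUB2 into the MBR of both RAID1 member disks
grub2-install /dev/sda
grub2-install /dev/sdb
# regenerate the GRUB2 configuration
grub2-mkconfig -o /boot/grub2/grub.cfg
EOF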

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

jsmeix commented at 2022-06-22 07:16:

@Githopp192

I think

2022-06-14 12:45:52.295025695 Determining where to install GRUB2 (no GRUB2_INSTALL_DEVICES specified)
2022-06-14 12:45:52.306608019 Cannot install GRUB2 (unable to find a /boot or / partition)

tells it.

I.e. you should specify GRUB2_INSTALL_DEVICES,
see its description in usr/share/rear/conf/default.conf

I experienced this kind of issue with RAID1 and for me

GRUB2_INSTALL_DEVICES="/dev/sda /dev/sdb"

helped.

I don't know the reason why the automatism in
usr/share/rear/finalize/Linux-i386/660_install_grub2.sh
is unable to find a /boot or / partition with RAID1.
Perhaps this happens only when LVM is used on top of RAID1?
But I didn't spend time analyzing the root cause
because specifying GRUB2_INSTALL_DEVICES "just worked",
at least for me (and in case of doubt I prefer specifying
the right facts over relying on possibly fragile automatisms).
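
For example, a minimal sketch of the corresponding /etc/rear/local.conf entry (the device names are the ones from this report and must be adapted to the actual RAID1 member disks):

GRUB2_INSTALL_DEVICES="/dev/sda /dev/sdb"

and the description of that variable can be located in the installed default.conf, e.g. with

# grep -n GRUB2_INSTALL_DEVICES /usr/share/rear/conf/default.conf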

Githopp192 commented at 2022-06-27 20:09:

Thx @jsmeix - yes, exactly - I am using md software RAID1, too.

Thanks for the hint - I have now added GRUB2_INSTALL_DEVICES="/dev/sda /dev/sdb" to /etc/rear/local.conf

You mentioned "Perhaps this happens only when LVM is used on top of RAID1?"

I'm using ext4 for the /boot partition:

NAME         FSTYPE            LABEL           UUID   MOUNTPOINT
sda
├─sda1       linux_raid_member server2:pv00    xxx
│ └─md125    ext4                              yyy    /boot

pcahyna commented at 2022-06-27 20:36:

That's weird, I believe I tested this (boot from RAID 1) in the past and it worked. Is this the ReaR package shipped with RHEL (not the upstream release or the upstream development version)? Are you able to reproduce it and re-run the recovery with the -D flag and provide the log, please?

Githopp192 commented at 2022-06-27 21:24:

Name : rear
Version : 2.6
Release : 4.el8
Architecture : x86_64
Size : 2.5 M
Source : rear-2.6-4.el8.src.rpm
Repository : @System
From repo : rhel-8-for-x86_64-appstream-rpms
Summary : Relax-and-Recover is a Linux disaster recovery and system migration tool
URL : http://relax-and-recover.org/
License : GPLv3

I was not able to reproduce it - I ran the "workaround" commands (see above) and the system booted.

**ReaR log (without -D flag):**

rear.log

jsmeix commented at 2022-06-30 11:51:

In https://github.com/rear/rear/files/8995598/rear.log
something looks completely wrong during
restore via Bacula (excerpts from that log file):

2022-06-14 12:24:09.643981496 Running 'layout/recreate' stage
2022-06-14 12:24:09.645905076 ======================
2022-06-14 12:24:09.657149747 Including layout/recreate/default/100_confirm_layout_code.sh
2022-06-14 12:24:09.664109658 UserInput: called in /usr/share/rear/layout/recreate/default/100_confirm_layout_code.sh line 26
2022-06-14 12:24:09.669788565 UserInput: Default input in choices - using choice number 1 as default input
2022-06-14 12:24:09.672459197 Confirm or edit the disk recreation script
2022-06-14 12:24:09.675164986 1) Confirm disk recreation script and continue 'rear recover'
2022-06-14 12:24:09.677714576 2) Edit disk recreation script (/var/lib/rear/layout/diskrestore.sh)
2022-06-14 12:24:09.680203480 3) View disk recreation script (/var/lib/rear/layout/diskrestore.sh)
2022-06-14 12:24:09.682721249 4) View original disk space usage (/var/lib/rear/layout/config/df.txt)
2022-06-14 12:24:09.685252236 5) Use Relax-and-Recover shell and return back to here
2022-06-14 12:24:09.687811608 6) Abort 'rear recover'
2022-06-14 12:24:09.690405561 (default '1' timeout 300 seconds)
2022-06-14 12:24:15.103391197 UserInput: 'read' got as user input ''
2022-06-14 12:24:15.112834641 User confirmed disk recreation script
2022-06-14 12:24:15.121691888 Including layout/recreate/default/200_run_layout_code.sh
2022-06-14 12:24:15.128314936 Start system layout restoration.
+++ create_component /dev/sda disk
...
+++ set +x
2022-06-14 12:24:26.600104307 Disk layout created.
2022-06-14 12:24:26.606502665 UserInput: called in /usr/share/rear/layout/recreate/default/200_run_layout_code.sh line 98
2022-06-14 12:24:26.611988093 UserInput: Default input in choices - using choice number 1 as default input
2022-06-14 12:24:26.614679470 Confirm the recreated disk layout or go back one step
2022-06-14 12:24:26.617250002 1) Confirm recreated disk layout and continue 'rear recover'
2022-06-14 12:24:26.619781925 2) Go back one step to redo disk layout recreation
2022-06-14 12:24:26.622359749 3) Use Relax-and-Recover shell and return back to here
2022-06-14 12:24:26.624873542 4) Abort 'rear recover'
2022-06-14 12:24:26.627594781 (default '1' timeout 300 seconds)
2022-06-14 12:24:29.903325386 UserInput: 'read' got as user input ''
2022-06-14 12:24:29.912567488 User confirmed recreated disk layout
2022-06-14 12:24:29.919874994 Including layout/recreate/default/250_verify_mount.sh
2022-06-14 12:24:29.925660839 Finished running 'layout/recreate' stage in 20 seconds
2022-06-14 12:24:29.927805368 ======================
2022-06-14 12:24:29.929762906 Running 'restore' stage
2022-06-14 12:24:29.931697422 ======================
2022-06-14 12:24:29.943297934 Including restore/Fedora/050_copy_dev_files.sh
2022-06-14 12:24:29.986066623 Including restore/default/050_remount_async.sh
2022-06-14 12:24:29.992730746 Including restore/BACULA/default/400_restore_backup.sh
2022-06-14 12:24:30.000032073 
The system is now ready for a restore via Bacula. bconsole will be started for
you to restore the required files. It's assumed that you know what is necessary
to restore - typically it will be a full backup.

Do not exit 'bconsole' until all files are restored

WARNING: The new root is mounted under '/mnt/local'.

Press ENTER to start bconsole
2022-06-14 12:27:10.910173212 UserInput: 'read' timed out with non-zero exit code
2022-06-14 12:27:10.918487118 Continuing 'rear recover' by default
2022-06-14 12:27:10.938294118 Including layout/prepare/default/310_remove_exclusions.sh
2022-06-14 12:27:11.023995593 Including layout/prepare/default/320_apply_mappings.sh
2022-06-14 12:27:11.028876350 Completely identical layout mapping in /var/lib/rear/layout/disk_mappings
2022-06-14 12:27:11.033922193 Completely identical layout mapping in /var/lib/rear/layout/disk_mappings
2022-06-14 12:27:11.039040265 Completely identical layout mapping in /var/lib/rear/layout/disk_mappings
2022-06-14 12:27:11.048128038 Including layout/prepare/default/420_autoresize_last_partitions.sh
2022-06-14 12:27:11.066318633 Including layout/prepare/default/430_autoresize_all_partitions.sh
2022-06-14 12:27:11.072619974 Including layout/prepare/default/500_confirm_layout_file.sh
2022-06-14 12:27:11.078923853 UserInput: called in /usr/share/rear/layout/prepare/default/500_confirm_layout_file.sh line 26
2022-06-14 12:27:11.084274851 UserInput: Default input in choices - using choice number 1 as default input
2022-06-14 12:27:11.086888456 Confirm or edit the disk layout file
2022-06-14 12:27:11.096314934 1) Confirm disk layout and continue 'rear recover'
2022-06-14 12:27:11.105779579 2) Edit disk layout (/var/lib/rear/layout/disklayout.conf)
2022-06-14 12:27:11.115397571 3) View disk layout (/var/lib/rear/layout/disklayout.conf)
2022-06-14 12:27:11.124761486 4) View original disk space usage (/var/lib/rear/layout/config/df.txt)
2022-06-14 12:27:11.134471847 5) Use Relax-and-Recover shell and return back to here
2022-06-14 12:27:11.144432964 6) Abort 'rear recover'
2022-06-14 12:27:11.154225635 (default '1' timeout 300 seconds)
2022-06-14 12:32:11.167158596 UserInput: 'read' timed out with non-zero exit code
2022-06-14 12:32:11.175733399 Continuing 'rear recover' by default
2022-06-14 12:32:11.189436650 Including layout/prepare/default/510_list_dependencies.sh
2022-06-14 12:32:11.197735577 Including layout/prepare/default/520_exclude_components.sh
2022-06-14 12:32:11.206363445 Including layout/prepare/default/540_generate_device_code.sh
2022-06-14 12:32:11.222679445 Including layout/prepare/default/550_finalize_script.sh
2022-06-14 12:32:11.230471911 Including layout/prepare/default/600_show_unprocessed.sh
2022-06-14 12:32:11.240738865 Including layout/prepare/default/610_exclude_from_restore.sh
2022-06-14 12:32:11.243015639 Finished running 'layout/prepare' stage in 601 seconds
2022-06-14 12:32:11.244965195 ======================
2022-06-14 12:32:11.246846378 Running 'layout/recreate' stage
2022-06-14 12:32:11.248701520 ======================
2022-06-14 12:32:11.259451517 Including layout/recreate/default/100_confirm_layout_code.sh
2022-06-14 12:32:11.265990050 UserInput: called in /usr/share/rear/layout/recreate/default/100_confirm_layout_code.sh line 26
2022-06-14 12:32:11.271356385 UserInput: Default input in choices - using choice number 1 as default input
2022-06-14 12:32:11.273986991 Confirm or edit the disk recreation script
2022-06-14 12:32:11.283764062 1) Confirm disk recreation script and continue 'rear recover'
2022-06-14 12:32:11.293397365 2) Edit disk recreation script (/var/lib/rear/layout/diskrestore.sh)
2022-06-14 12:32:11.303133280 3) View disk recreation script (/var/lib/rear/layout/diskrestore.sh)
2022-06-14 12:32:11.312837962 4) View original disk space usage (/var/lib/rear/layout/config/df.txt)
2022-06-14 12:32:11.322805021 5) Use Relax-and-Recover shell and return back to here
2022-06-14 12:32:11.332638899 6) Abort 'rear recover'
2022-06-14 12:32:11.342440230 (default '1' timeout 300 seconds)
2022-06-14 12:37:11.355345402 UserInput: 'read' timed out with non-zero exit code
2022-06-14 12:37:11.364124335 Continuing 'rear recover' by default
2022-06-14 12:37:11.379461132 Including layout/recreate/default/200_run_layout_code.sh
2022-06-14 12:37:11.385350246 Start system layout restoration.
2022-06-14 12:37:11.461599976 Disk layout created.
2022-06-14 12:37:11.474899829 UserInput: called in /usr/share/rear/layout/recreate/default/200_run_layout_code.sh line 98
2022-06-14 12:37:11.479722113 UserInput: Default input in choices - using choice number 1 as default input
2022-06-14 12:37:11.482209213 Confirm the recreated disk layout or go back one step
2022-06-14 12:37:11.492305515 1) Confirm recreated disk layout and continue 'rear recover'
2022-06-14 12:37:11.502614861 2) Go back one step to redo disk layout recreation
2022-06-14 12:37:11.512881281 3) Use Relax-and-Recover shell and return back to here
2022-06-14 12:37:11.523002129 4) Abort 'rear recover'
2022-06-14 12:37:11.533043088 (default '1' timeout 300 seconds)
2022-06-14 12:40:50.741821868 bconsole finished with zero exit code
2022-06-14 12:40:50.744491991 
Please verify that the backup has been restored correctly to '/mnt/local'
in the provided shell. When finished, type exit in the shell to continue
recovery.

2022-06-14 12:41:20.801562293 Including restore/default/500_selinux_autorelabel.sh

During the 'restore' stage it somehow runs
the 'layout/recreate' stage a second time,
which is (at least normally) plain wrong.

Fortunately the second run of the 'layout/recreate' stage
does not actually do anything (i.e. it leaves the target system
unchanged) because the layout code in ReaR "remembers" what it
had already recreated, so there is nothing left to do for the
second run of the 'layout/recreate' stage.

But it seems with BACKUP=BACULA things do not behave
during "rear recover" as they normally should.

I am not a BACKUP=BACULA user so I cannot actually reproduce
issues with BACKUP=BACULA but I did some kind of "dry run"
with BACKUP=BACULA versus BACKUP=NETFS as follows:

# grep -v '^#' etc/rear/local.conf
...
OUTPUT=USB
BACKUP=BACULA
BACKUP_URL=usb:///dev/disk/by-label/REAR-000
...

# usr/sbin/rear -s recover | egrep 'recreate/|restore/'
Source layout/recreate/default/100_confirm_layout_code.sh
Source layout/recreate/default/120_confirm_wipedisk_disks.sh
Source layout/recreate/default/150_wipe_disks.sh
Source layout/recreate/default/200_run_layout_code.sh
Source layout/recreate/default/220_verify_layout.sh
Source layout/recreate/default/250_verify_mount.sh
Source restore/default/050_remount_async.sh
Source restore/BACULA/default/400_restore_backup.sh
Source restore/default/500_selinux_autorelabel.sh
Source restore/default/900_create_missing_directories.sh
Source restore/SUSE_LINUX/910_create_missing_directories.sh
Source restore/default/990_move_away_restored_files.sh
Source restore/default/995_remount_sync.sh

# grep -v '^#' etc/rear/local.conf
...
OUTPUT=USB
BACKUP=NETFS
BACKUP_URL=usb:///dev/disk/by-label/REAR-000
...

# usr/sbin/rear -s recover | egrep 'recreate/|restore/'
Source layout/recreate/default/100_confirm_layout_code.sh
Source layout/recreate/default/120_confirm_wipedisk_disks.sh
Source layout/recreate/default/150_wipe_disks.sh
Source layout/recreate/default/200_run_layout_code.sh
Source layout/recreate/default/220_verify_layout.sh
Source layout/recreate/default/250_verify_mount.sh
Source restore/default/050_remount_async.sh
Source restore/NETFS/default/100_mount_NETFS_path.sh
Source restore/NETFS/default/200_remove_relative_rsync_option.sh
Source restore/NETFS/default/380_prepare_multiple_isos.sh
Source restore/NETFS/default/400_restore_backup.sh
Source restore/NETFS/default/500_selinux_autorelabel.sh
Source restore/default/500_selinux_autorelabel.sh
Source restore/NETFS/Linux-i386/510_selinux_fixfiles_exclude_dirs.sh
Source restore/NETFS/default/510_set_capabilities.sh
Source restore/default/900_create_missing_directories.sh
Source restore/SUSE_LINUX/910_create_missing_directories.sh
Source restore/NETFS/default/980_umount_NETFS_dir.sh
Source restore/default/990_move_away_restored_files.sh
Source restore/default/995_remount_sync.sh

so for me, also with a plain BACKUP=BACULA test,
the 'layout/recreate' stage is run only once, as it should be.
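
On the affected system the same thing can also be checked against the actual recover log; a hedged one-liner (it assumes the usual "Running '<stage>' stage" log lines and a log file named rear.log; a count of 2 confirms the duplicate run):

# grep -c "Running 'layout/recreate' stage" rear.log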

jsmeix commented at 2022-06-30 12:15:

It seems there were several "rear recover" processes
running in parallel.

The following ReaR scripts were run twice:

layout/prepare/default/310_remove_exclusions.sh
layout/prepare/default/320_apply_mappings.sh
layout/prepare/default/420_autoresize_last_partitions.sh
layout/prepare/default/430_autoresize_all_partitions.sh
layout/prepare/default/500_confirm_layout_file.sh
layout/prepare/default/510_list_dependencies.sh
layout/prepare/default/520_exclude_components.sh
layout/prepare/default/540_generate_device_code.sh
layout/prepare/default/550_finalize_script.sh
layout/prepare/default/600_show_unprocessed.sh
layout/prepare/default/610_exclude_from_restore.sh
layout/recreate/default/100_confirm_layout_code.sh
layout/recreate/default/200_run_layout_code.sh
layout/recreate/default/250_verify_mount.sh
restore/BACULA/default/400_restore_backup.sh
restore/default/050_remount_async.sh
restore/Fedora/050_copy_dev_files.sh

while other things were run even three times:

2022-06-14 12:41:35.888780755 Running mkinitrd...
2022-06-14 12:43:22.759435618 Running mkinitrd...
2022-06-14 12:44:26.001667243 Running mkinitrd...
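
A quick way to spot such duplicates is to count the "Including <script>" lines in the recover log; a hedged helper (it assumes that log line format and a log file named rear.log):

# grep -o 'Including [^ ]*\.sh' rear.log | sort | uniq -c | sort -rn | awk '$1 > 1'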

Githopp192 commented at 2022-06-30 17:40:

Thanks very much, Johannes.

I cannot exclude that I made a mistake (did several runs) - if that was the case, my apologies.

jsmeix commented at 2022-07-01 08:18:

The issue is that normally it should not be possible for
several "rear recover" (or "rear mkrescue/mkbackup") processes
to run in parallel because we have a check for that in usr/sbin/rear
(I already had "some fun" with that check in the past ;-)

But that check does not yet work sufficiently reliably,
in particular not when 'rear' is run from a GitHub checkout/clone
so that the 'rear' program is /path/to/checkout/usr/sbin/rear
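
To illustrate the general idea of such a check (and why it is fragile), here is a hypothetical sketch - illustration only, not the actual code in usr/sbin/rear:

# look for other processes whose command line contains the 'rear' script
# ('pgrep -f' matches against the full command line) and skip our own PID:
for pid in $( pgrep -f '(^|/)rear ' ) ; do
    test "$pid" = "$$" && continue
    echo "ERROR: rear is already running (PID $pid), not starting again" >&2
    exit 1
done
# a pattern like the above can miss call forms it does not anticipate,
# which is exactly the kind of limitation discussed in this thread.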

jsmeix commented at 2022-07-01 08:21:

Via
https://github.com/rear/rear/commit/e04125b03d1ac2f10e965852e9f33d40fe913447
I enhanced the check in usr/sbin/rear
for other simultaneously running 'rear' instances
so that it finds them
independently of how 'rear' was called by the user,
in particular also when 'rear' is run
from a GitHub checkout/clone.

jsmeix commented at 2022-07-01 08:31:

@Githopp192 @pcahyna
I assume my enhancement
https://github.com/rear/rear/commit/e04125b03d1ac2f10e965852e9f33d40fe913447
does not help in this particular case, because
https://github.com/rear/rear/files/8995598/rear.log
contains

2022-06-14 12:23:55.694183313 Command line options: /bin/rear recover

so it seems that in this particular case the 'rear' program
is not installed under a /sbin/ (sub-)directory
but under /bin/, which is not what is checked.

For comparison, this is how it looks for me
on openSUSE Leap 15.3 with openSUSE's 'rear23a' RPM package:

2022-07-01 10:17:37.031049342 Command line options: /usr/sbin/rear -s mkrescue

where I called 'rear' as

# rear -s mkrescue

and on the same system from a GitHub checkout

2022-07-01 10:28:30.283543904 Command line options: usr/sbin/rear -D -s mkrescue

where I called 'rear' as

# cd /path/to/checkout

# usr/sbin/rear -D -s mkrescue

pcahyna commented at 2022-07-01 08:33:

@jsmeix here the issue would be running multiple rear recover in parallel, not multiple rear mkrescue. Where is rear found on the rescue system when run from a Git checkout?

jsmeix commented at 2022-07-01 08:34:

@pcahyna
ARGH! - I fear you are right - let me check...

jsmeix commented at 2022-07-01 08:59:

In the ReaR recovery system it is always

/sbin:     symbolic link to bin
/usr/sbin: symbolic link to ../bin
/usr/bin:  symbolic link to ../bin
/bin:      directory

and the 'rear' bash script is /bin/rear,
regardless of whether "rear mkrescue" was run from a Git checkout

# chroot /var/tmp/rear.dWgO3DKyGchoKRa/rootfs/

bash-4.4# type -a rear
rear is /sbin/rear
rear is /usr/sbin/rear
rear is /usr/bin/rear
rear is /bin/rear

bash-4.4# file /sbin /usr/sbin /usr/bin /bin     
/sbin:     symbolic link to bin
/usr/sbin: symbolic link to ../bin
/usr/bin:  symbolic link to ../bin
/bin:      directory

bash-4.4# echo $PATH
/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin

or run from an installed RPM package

# chroot /tmp/rear.rMOyIy61Yg3EyNC/rootfs/

bash-4.4# type -a rear
rear is /sbin/rear
rear is /usr/sbin/rear
rear is /usr/bin/rear
rear is /bin/rear

bash-4.4# file /sbin /usr/sbin /usr/bin /bin
/sbin:     symbolic link to bin
/usr/sbin: symbolic link to ../bin
/usr/bin:  symbolic link to ../bin
/bin:      directory

bash-4.4# echo $PATH
/sbin:/usr/sbin:/usr/local/sbin:/root/bin:/usr/local/bin:/usr/bin:/bin

Interestingly, 'type -a rear' shows something different
from inside a booted ReaR recovery system:

RESCUE localhost:~ # type -a rear
rear is /bin/rear

RESCUE localhost:~ # file /sbin /usr/sbin /usr/bin /bin
/sbin:     symbolic link to bin
/usr/sbin: symbolic link to ../bin
/usr/bin:  symbolic link to ../bin
/bin:      directory

RESCUE localhost:~ # echo $PATH
/bin

because there $PATH is only '/bin',
and accordingly inside a booted ReaR recovery system one gets

RESCUE localhost:~ # rear -D -s recover 1>/dev/null

RESCUE localhost:~ # head /var/log/rear/rear-localhost.log
...
2022-07-01 10:48:24.647073881 Command line options: /bin/rear -D -s recover

q.e.d.

I need to further enhance that check in the 'rear' main script...

RESCUE localhost:~ # pwd
/root

# ../bin/rear -D -s recover 1>/dev/null & sleep 1 ; ps auwx | grep rear
[1] 3789
root      3789 24.2  0.3  17236  6448 pts/0    S    11:02   0:00 /bin/bash ../bin/rear -D -s recover

RESCUE localhost:~ # head /var/log/rear/rear-localhost.log
...
2022-07-01 11:00:09.894053570 Command line options: ../bin/rear -D -s recover

so also in the ReaR recovery system one cannot
rely on 'rear' being called as '/bin/rear'

jsmeix commented at 2022-07-01 09:25:

When 'rear' is called normally in the ReaR recovery system, like

RESCUE localhost:~ # rear recover

then $SCRIPT_FILE is '/bin/rear'
and the check works (also the non-enhanced check):

RESCUE localhost:~ # rear -D recover & rear -D recover
[1] 18691
ERROR: rear is already running, not starting again
RESCUE localhost:~ # Relax-and-Recover 2.6 / Git
Running rear recover (PID 18691 date 2022-07-01 11:24:03)
Command line options: /bin/rear -D recover
...
[Ctrl]+[C]
[1]+  Terminated              rear -D recover

The check fails when 'rear' is not called in the normal way
in the ReaR recovery system, for example

RESCUE localhost:~ # ../bin/rear -D recover & ../bin/rear -D recover
[1] 20315
Relax-and-Recover 2.6 / Git
Relax-and-Recover 2.6 / Git
Running rear recover (PID 20315 date 2022-07-01 11:28:26)
Running rear recover (PID 20316 date 2022-07-01 11:28:26)
Command line options: ../bin/rear -D recover
Command line options: ../bin/rear -D recover
...

or

RESCUE localhost:~ # cd /bin

RESCUE localhost:/bin # ./rear -D recover & ./rear -D recover
[1] 21850
Relax-and-Recover 2.6 / Git
Relax-and-Recover 2.6 / Git
Running rear recover (PID 21850 date 2022-07-01 11:30:53)
Running rear recover (PID 21851 date 2022-07-01 11:30:53)
Command line options: ./rear -D recover
Command line options: ./rear -D recover
...

but the last case works with the enhanced check because
it also checks for "./$PROGRAM" which is ./rear
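
A generic way to make this kind of check independent of how the script was called is to canonicalize the invocation path first; a hedged sketch of the idea only (not necessarily what the commits referenced in this issue do):

# hypothetical illustration: 'readlink -f' resolves relative path components
# and symlinks, so inside the ReaR recovery system (where /sbin, /usr/sbin and
# /usr/bin are symlinks to /bin) '../bin/rear', './rear' and '/usr/sbin/rear'
# all resolve to the same canonical '/bin/rear':
canonical_rear_script="$( readlink -f "$0" )"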

jsmeix commented at 2022-07-01 10:31:

Via
https://github.com/rear/rear/commit/63e0540e3617d1b7a024f7c188d5135cce733095
I added a comment in usr/sbin/rear that describes
how the check for other simultaneously running 'rear'
works from inside a booted ReaR recovery system,
cf. https://github.com/rear/rear/issues/2826#issuecomment-1172111440
and I described the limitations of that check,
cf. https://github.com/rear/rear/issues/2826#issuecomment-1172138188

In particular, this check is not meant to detect every possible way
in which another rear instance might have been called.

Githopp192 commented at 2022-07-04 20:44:

It's really a fantastic job you do, @jsmeix - I'm very impressed by how much knowledge you've got about all the different possible recovery scenarios. Really, really excellent work!

jsmeix commented at 2022-07-05 06:48:

@Githopp192
thank you for your plaudit.

Regardless of how much experience I get,
there is always at least one unnoticed case left,
and then I need others' help, like here from @pcahyna via his
https://github.com/rear/rear/issues/2826#issuecomment-1172088217
to get some more cases covered, but even then there
will still be at least one more unnoticed case left,
and so on ad infinitum ;-)

This is the reason why

There is no such thing as a disaster recovery solution that "just works".

https://en.opensuse.org/SDB:Disaster_Recovery#Inappropriate_expectations

jsmeix commented at 2022-07-07 06:30:

I assume this issue is sufficiently solved, so I can close it.
Further comments can still be added even though it is closed,
or it can be reopened if needed.

Githopp192 commented at 2022-07-07 07:03:

Yes, Johannes - you can close it. Thx for investigating.
Btw - I did a full recovery yesterday on the same server and the "automatic recover" ran smoothly without any error.

I even had another disk in the mdadm array that was larger - and ReaR automatically enlarged it.

Perfect.

If you are interested in the ReaR logs - just tell me.


[Export of Github issue for rear/rear.]