#2807 Issue closed: Wrong 'df' program (third-party 'Dialafile') in PATH makes ReaR recover fail

Labels: support / question, external tool, not ReaR / invalid

robpomeroy opened issue at 2022-05-13 10:35:

  • ReaR version: Relax-and-Recover 2.6 / 2020-06-17

  • OS version:

NAME="Oracle Linux Server"
VERSION="8.5"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="8.5"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Oracle Linux Server 8.5"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:8:5:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 8"
ORACLE_BUGZILLA_PRODUCT_VERSION=8.5
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=8.5
  • ReaR configuration file (/etc/rear/site.conf):
OUTPUT=ISO
OUTPUT_URL="cifs://pomnas2.lan.pomeroy.me/rear-backup"
OUTPUT_OPTIONS="cred=/etc/rear/cifs_credentials"
BACKUP=NETFS
BACKUP_URL="cifs://pomnas2.lan.pomeroy.me/rear-backup"
BACKUP_OPTIONS="cred=/etc/rear/cifs_credentials"
  • Hardware vendor/product: VMware ESXi 6.7

  • System architecture: x86_64

  • Firmware and bootloader: UEFI/Grub2

  • Storage: 2 x 100GB virtual disks (paravirtualised) - note: I'm using MD RAID

  • Storage layout:

NAME                           KNAME      PKNAME     TRAN   TYPE  FSTYPE            LABEL                             SIZE MOUNTPOINT
/dev/sda                       /dev/sda                     disk                                                      100G 
|-/dev/sda1                    /dev/sda1  /dev/sda          part  linux_raid_member pv01                               40G 
| `-/dev/md126                 /dev/md126 /dev/sda1         raid1 LVM2_member                                          40G 
|   `-/dev/mapper/vg_u-u       /dev/dm-1  /dev/md126        lvm   xfs               u                                  20G /u
|-/dev/sda2                    /dev/sda2  /dev/sda          part  linux_raid_member epcs-ol8-lvm.lan.pomeroy.me:md2    16G 
| `-/dev/md2                   /dev/md2   /dev/sda2         raid1 swap              md2                                16G [SWAP]
|-/dev/sda3                    /dev/sda3  /dev/sda          part  linux_raid_member epcs-ol8-lvm.lan.pomeroy.me:md0     1G 
| `-/dev/md0                   /dev/md0   /dev/sda3         raid1 xfs               md0                                 1G /boot
|-/dev/sda4                    /dev/sda4  /dev/sda          part  linux_raid_member epcs-ol8-lvm.lan.pomeroy.me:md4     1G 
| `-/dev/md4                   /dev/md4   /dev/sda4         raid1 vfat              md4                                 1G /boot/efi
`-/dev/sda5                    /dev/sda5  /dev/sda          part  linux_raid_member pv00                               40G 
  `-/dev/md127                 /dev/md127 /dev/sda5         raid1 LVM2_member                                          40G 
    `-/dev/mapper/vg_root-root /dev/dm-0  /dev/md127        lvm   xfs               root                               20G /
/dev/sdb                       /dev/sdb                     disk                                                      100G 
|-/dev/sdb1                    /dev/sdb1  /dev/sdb          part  linux_raid_member pv01                               40G 
| `-/dev/md126                 /dev/md126 /dev/sdb1         raid1 LVM2_member                                          40G 
|   `-/dev/mapper/vg_u-u       /dev/dm-1  /dev/md126        lvm   xfs               u                                  20G /u
|-/dev/sdb2                    /dev/sdb2  /dev/sdb          part  linux_raid_member epcs-ol8-lvm.lan.pomeroy.me:md2    16G 
| `-/dev/md2                   /dev/md2   /dev/sdb2         raid1 swap              md2                                16G [SWAP]
|-/dev/sdb3                    /dev/sdb3  /dev/sdb          part  linux_raid_member epcs-ol8-lvm.lan.pomeroy.me:md0     1G 
| `-/dev/md0                   /dev/md0   /dev/sdb3         raid1 xfs               md0                                 1G /boot
|-/dev/sdb4                    /dev/sdb4  /dev/sdb          part  linux_raid_member epcs-ol8-lvm.lan.pomeroy.me:md4     1G 
| `-/dev/md4                   /dev/md4   /dev/sdb4         raid1 vfat              md4                                 1G /boot/efi
`-/dev/sdb5                    /dev/sdb5  /dev/sdb          part  linux_raid_member pv00                               40G 
  `-/dev/md127                 /dev/md127 /dev/sdb5         raid1 LVM2_member                                          40G 
    `-/dev/mapper/vg_root-root /dev/dm-0  /dev/md127        lvm   xfs               root                               20G /
/dev/sdc                       /dev/sdc              usb    disk                                                      3.7T 
|-/dev/sdc1                    /dev/sdc1  /dev/sdc          part  ext2                                                  2M 
`-/dev/sdc2                    /dev/sdc2  /dev/sdc          part  ext2                                                3.7T 
/dev/sr0                       /dev/sr0              sata   rom   iso9660           RELAXRECOVER                    515.2M
  • Description of the issue:

Create full backup with rear -v mkbackup.
Provision identical VM and boot from newly-created ReaR ISO.

ReaR says

Ambiguous disk layout needs manual configuration (more than one disk with the same size...)
Select "2" - Confirm identical disk mapping and proceed without manual configuration

Restoration dies with

ERROR: No filesystem mounted on 'mnt/local'. Stopping.

Excerpt from log says

Cannot change working directory to DFFILES: No such file or directory

Presumably ReaR is struggling with the combination of MD RAID and LVM?

  • Workaround, if any:

None.

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

N/A

  • Comments:

I'm more than willing to try dev branches and provide further detailed feedback if that helps.

jsmeix commented at 2022-05-13 13:46:

@robpomeroy
MD RAID1 with LVM has worked for me in several tests
in the past, so MD RAID1 with LVM should work in ReaR.

But I test on QEMU/KVM virtual machines.
I do not have VMware.

To find out why it does not work in your particular case
we need "rear -D mkrescue/mkbackup/recover" debug log files.

See the section
"Debugging issues with Relax-and-Recover" in
https://en.opensuse.org/SDB:Disaster_Recovery
and in particular its sub-section
"To analyze and debug a 'rear recover' failure ..."

You provided most of the info in your initial comment.

What is missing is:

  • Debug log file of "rear -d -D mkbackup" or "rear -d -D mkrescue" (in /var/log/rear/) that matches the "rear recover" failure (i.e. debug log from the original system of the "rear -d -D mkbackup" or "rear -d -D mkrescue" command that created the ReaR recovery system where "rear recover" failed)
  • Debug log file of "rear -d -D recover" (in /var/log/rear/) from the ReaR recovery system where "rear recover" failed
  • Contents of the /var/lib/rear/ directory and in its sub-directories from the ReaR recovery system where "rear recover" failed

When "rear -d -D recover" fails, you need to
save the log file out of the ReaR recovery system
(where "rear -d -D recover" was run and where it had failed)
before you shut down the ReaR recovery system - otherwise
the log file would be lost (because the ReaR recovery system
runs in a ramdisk).

Additionally the files in the /var/lib/rear/ directory
and in its sub-directories in the ReaR recovery system
(in particular /var/lib/rear/layout/disklayout.conf
and /var/lib/rear/layout/diskrestore.sh) are needed
to analyze a "rear -d -D recover" failure.

See the "First steps with Relax-and-Recover" section in
https://en.opensuse.org/SDB:Disaster_Recovery
how to access the ReaR recovery system
from remote via ssh so that you can use 'scp'
to get files out of the ReaR recovery system.
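For example something like this (a sketch - the
recovery system IP address 192.168.100.17 is only
a placeholder, use the address your recovery system got):

# Run on another machine while the ReaR recovery system is still up.
# Copy the debug log files out of the recovery system's ramdisk:
scp -r root@192.168.100.17:/var/log/rear ./rear-recover-logs
# Copy /var/lib/rear/ (disklayout.conf, diskrestore.sh, ...):
scp -r root@192.168.100.17:/var/lib/rear ./rear-recover-var-lib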

robpomeroy commented at 2022-05-13 15:26:

Hello @jsmeix.

The various files are here (link expires in 7 days).

I can see one problem immediately - I don't know if it's fatal. In /var/lib/rear/layout/config/df.txt you'll see an error message: "Dialafile has been scheduled with an illegal command -PL ". This is because, unfortunately, for historical reasons these systems have a binary named "df" in the search path which is not the "disk free space" command - it's a proprietary program called Dialafile. I wasn't around when that decision was made! 😂

Other than that, I can't see any obvious issues. But I don't know ReaR well enough, so my eyes are uneducated!

pcahyna commented at 2022-05-13 15:33:

I am afraid this will very likely be fatal. ReaR uses the df command in many places, and if your df program deviates in its behavior from the usual df program, strange errors may happen.
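To see what 'df' actually resolves to, you can check the search path directly, for example (a sketch; the exact paths vary by system):

# show every 'df' in $PATH in the order the shell finds them (bash)
type -a df
# the coreutils 'df' identifies itself; a third-party binary likely won't
df --version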

robpomeroy commented at 2022-05-13 15:37:

Thanks @pcahyna. I will see if I can remove it from the path for the duration of backup & restoration.

robpomeroy commented at 2022-05-13 16:07:

@jsmeix & @pcahyna - thanks so much for your guidance. Setting $PATH correctly before running rear mkbackup fixes this.

I have the wrong default option in the grub boot menu on my restored system, but for that I shall consult the documentation.

Again, thank you!
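For the record, the fix amounted to something like this (a sketch - the directory order shown is an assumption; adjust to your system):

# make sure the standard system directories come first in the search path
export PATH=/usr/sbin:/usr/bin:/sbin:/bin:$PATH
# confirm that the first 'df' found is now the coreutils one
command -v df
# then create the backup as before
rear -v mkbackup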

pcahyna commented at 2022-05-13 16:15:

@robpomeroy glad it helped!

jsmeix commented at 2022-05-16 07:45:

@robpomeroy
thank you for your clear description in
https://github.com/rear/rear/issues/2807#issuecomment-1126177354
of the root cause of why ReaR (and everything that uses 'df') failed.

FYI:
This was not the first case we have had here at ReaR
where a disaster recovery test with ReaR revealed
a broken setup of the original system.
In particular we have already had various "fun" with
different kinds of third-party software
that had messed things up in the original system.
So a disaster recovery setup plus a recovery test with ReaR
also helps to check whether the basic setup of the
original system is reasonably "normal".

robpomeroy commented at 2022-05-16 08:09:

@jsmeix - I probably should have realised anyway. I have to use absolute paths so often, for exactly this reason. I've had a few scripts fail because $PATH isn't as expected.

I suppose because ReaR runs on so many systems, you can't really plan for every possible scenario. 🙂 Thanks!

jsmeix commented at 2022-05-16 08:32:

I think ReaR should not call basic programs by full path
because ReaR should work generically on all systems,
and so ReaR should respect a system admin's configuration.

For example, when an admin has modified '$PATH',
all programs should respect his will
and behave as the admin specified.

I think that via hardcoded full paths the programmer
who hardcoded them imposes his will on the user,
who may have specified things differently.

Of course when the user specifies "wrong" things
his system behaves "wrong" (from others' point of view)
but in general I regard the user's will as sacrosanct.
Accordingly my general principle is:

Final power to the user!

Only kids are sometimes an exception where
"parental override" is good - e.g. when a kid
is about to cross the main street all of a sudden.

robpomeroy commented at 2022-05-16 09:05:

@jsmeix - I completely agree Johannes!


[Export of Github issue for rear/rear.]