#2827 PR merged: Use fail-safe 'yes' pipe for "lvm lvcreate"

Labels: enhancement, bug, fixed / solved / done

jsmeix opened issue at 2022-06-22 11:43:

  • Type: Bug Fix

  • Impact: Normal

  • Reference to related issue (URL):

https://github.com/rear/rear/issues/513
https://github.com/rear/rear/issues/2820

  • How was this pull request tested?

Works for me on SLES11 SP4 and SLES15 SP3

  • Brief description of the changes in this pull request:

In layout/prepare/GNU/Linux/110_include_lvm_code.sh
pipe as many 'y' answers into "lvm lvcreate" as it asks for.
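
In effect each "lvm lvcreate" call that gets generated into diskrestore.sh
is prefixed with a guarded 'yes' pipe, roughly like the following sketch
($lv_options, $lv_name and $vg_name are illustrative placeholders - see the
actual diskrestore.sh excerpts in the test results below):

    # Sketch only: keep answering 'y' for as long as "lvm lvcreate" asks
    # and let '|| true' keep a terminated 'yes' from failing the whole pipe
    if ! ( yes 2>/dev/null || true ) | lvm lvcreate $lv_options -n $lv_name $vg_name ; then
        LogPrintError "Failed to create LVM volume '$vg_name/$lv_name'"
    fi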

jsmeix commented at 2022-06-22 13:04:

@pcahyna
thank you for having a look!

I am currently fighting with testing it on SLES11
...
e.g. there is no 'git' in the default SLES11 repository
(how was life possible in those ancient times without 'git'? :-)

jsmeix commented at 2022-06-22 14:53:

To test it one must be in MIGRATION_MODE during "rear recover"
(e.g. via export MIGRATION_MODE='true' before rear recover)
because when not in MIGRATION_MODE the code in
layout/prepare/GNU/Linux/110_include_lvm_code.sh
restores the LVM setup via 'vgcfgrestore'
so the traditional 'vgcreate/lvcreate' commands
are not used to restore the LVM setup.
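
Roughly, that decision looks like the following sketch (illustrative only,
with simplified names - not the actual code of 110_include_lvm_code.sh):

    # Illustrative sketch - not the actual script code:
    if ! is_true "$MIGRATION_MODE" ; then
        # Same disk layout as before: restore the saved VG metadata as-is
        lvm vgcfgrestore -f "$saved_vg_config_file" "$vg_name"
    else
        # Migration (e.g. changed disk sizes): recreate the VG and its LVs from scratch
        lvm vgcreate "$vg_name" "$pv_device"
        ( yes 2>/dev/null || true ) | lvm lvcreate -L "$lv_size" -n "$lv_name" "$vg_name"
    fi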

On SLES11 SP4:

Original system (with LVM that is LUKS encrypted):

# lsblk -o NAME,KNAME,FSTYPE,LABEL,SIZE,MOUNTPOINT,UUID
NAME                     KNAME FSTYPE      LABEL  SIZE MOUNTPOINT UUID
sda                      sda                       15G            
├─sda1                   sda1  ext3               156M /boot      d5e7c2fd-14cf-402c-86c7-c66a34530ae3
└─sda2                   sda2  crypto_LUKS       14.9G            70fc2b5a-d3b8-4ba9-8ff0-6f3e5d0fa4a9
  └─cr_sda2 (dm-0)       dm-0  LVM2_member       14.9G            ba31sh-fZwU-2EvK-smlc-3q3z-UxCi-X94KZc
    ├─system-root (dm-1) dm-1  ext3                10G /          9e3dd4f2-cc78-4eb3-a79e-6b3ae61ce85d
    └─system-swap (dm-2) dm-2  swap                 2G [SWAP]     85800711-afb6-406e-b7bc-97c1a4c1f88c

Recovery (excerpts):

RESCUE linux-a5v5:~ # export MIGRATION_MODE='true'

RESCUE linux-a5v5:~ # rear -D recover
...
Start system layout restoration.
Disk '/dev/sda': creating 'msdos' partition table
Disk '/dev/sda': creating partition number 1 with name 'primary'
Disk '/dev/sda': creating partition number 2 with name 'primary'
Creating LUKS volume cr_sda2 on /dev/sda2
Set the password for LUKS volume cr_sda2 (for 'cryptsetup luksFormat' on /dev/sda2):
Enter LUKS passphrase: 
Enter the password for LUKS volume cr_sda2 (for 'cryptsetup luksOpen' on /dev/sda2):
Enter passphrase for /dev/sda2: 
Creating LVM PV /dev/mapper/cr_sda2
Creating LVM VG 'system' (some properties may not be preserved)
Creating LVM volume 'system/swap' (some properties may not be preserved)
Creating LVM volume 'system/root' (some properties may not be preserved)
Creating filesystem of type ext3 with mount point / on /dev/mapper/system-root.
Mounting filesystem /
Creating filesystem of type ext3 with mount point /boot on /dev/sda1.
Mounting filesystem /boot
Creating swap on /dev/mapper/system-swap
Disk layout created.
...
Backup restore program 'tar' started in subshell (PID=4306)
Restored 145 MiB [avg. 49520 KiB/sec] 
...
Restored 2667 MiB in 212 seconds [avg. 12882 KiB/sec]
Restoring finished (verify backup restore log messages in /var/lib/rear/restore/recover.backup.tar.gz.914.restore.log)
...
Running 'finalize' stage ======================
/etc/mtab: FAILED
md5sum: WARNING: 1 computed checksum did NOT match
Error: Restored files do not match the recreated system in /mnt/local
...
Recreated initrd (/sbin/mkinitrd).
Installing GRUB Legacy boot loader:
...
Finished 'recover'. The target system is mounted at '/mnt/local'.

RESCUE linux-a5v5:~ # grep -1 lvcreate /var/lib/rear/layout/diskrestore.sh

    if ! ( yes 2>/dev/null || true ) | lvm lvcreate --stripes 1 -L 2147483648b -n swap system ; then
        LogPrintError "Failed to create LVM volume 'system/swap' with lvcreate --stripes 1 -L 2147483648b -n swap system"
        if ( yes 2>/dev/null || true ) | lvm lvcreate --stripes 1 -l 100%FREE -n swap system ; then
            LogPrintError "Created LVM volume 'system/swap' using fallback options lvcreate --stripes 1 -l 100%FREE -n swap system"
        else
            LogPrintError "Also failed to create LVM volume 'system/swap' with lvcreate --stripes 1 -l 100%FREE -n swap system"
            # Explicit 'false' is needed to let the whole 'if then else fi' command exit with non zero exit state
--

    if ! ( yes 2>/dev/null || true ) | lvm lvcreate --stripes 1 -L 10737418240b -n root system ; then
        LogPrintError "Failed to create LVM volume 'system/root' with lvcreate --stripes 1 -L 10737418240b -n root system"
        if ( yes 2>/dev/null || true ) | lvm lvcreate --stripes 1 -l 100%FREE -n root system ; then
            LogPrintError "Created LVM volume 'system/root' using fallback options lvcreate --stripes 1 -l 100%FREE -n root system"
        else
            LogPrintError "Also failed to create LVM volume 'system/root' with lvcreate --stripes 1 -l 100%FREE -n root system"
            # Explicit 'false' is needed to let the whole 'if then else fi' command exit with non zero exit state

RESCUE linux-a5v5:~ # grep -3 lvcreate /var/log/rear/rear-linux-a5v5.log
2022-06-22 16:22:45.512783556 Creating LVM volume 'system/swap' (some properties may not be preserved)
+++ Print 'Creating LVM volume '\''system/swap'\'' (some properties may not be preserved)'
+++ yes
+++ lvm lvcreate --stripes 1 -L 2147483648b -n swap system
+++ true
  Logical volume "swap" created
+++ component_created /dev/mapper/system-swap lvmvol
--
2022-06-22 16:22:45.680975780 Creating LVM volume 'system/root' (some properties may not be preserved)
+++ Print 'Creating LVM volume '\''system/root'\'' (some properties may not be preserved)'
+++ yes
+++ lvm lvcreate --stripes 1 -L 10737418240b -n root system
+++ true
  Logical volume "root" created
+++ component_created /dev/mapper/system-root lvmvol

RESCUE linux-a5v5:~ # lsblk -o NAME,KNAME,FSTYPE,LABEL,SIZE,MOUNTPOINT,UUID
NAME                     KNAME FSTYPE      LABEL  SIZE MOUNTPOINT      UUID
sda                      sda                       15G                 
|-sda1                   sda1  ext3               156M /mnt/local/boot d5e7c2fd-14cf-402c-86c7-c66a34530ae3
`-sda2                   sda2  crypto_LUKS       14.9G                 d8b60a40-8163-49e0-871f-bdc45024d2d0
  `-cr_sda2 (dm-0)       dm-0  LVM2_member       14.9G                 ba31sh-fZwU-2EvK-smlc-3q3z-UxCi-X94KZc
    |-system-swap (dm-1) dm-1  swap                 2G                 85800711-afb6-406e-b7bc-97c1a4c1f88c
    `-system-root (dm-2) dm-2  ext3                10G /mnt/local      9e3dd4f2-cc78-4eb3-a79e-6b3ae61ce85d

The UUID of the sda2 crypto_LUKS volume has changed
because cryptsetup 1.1.3 in SLES11 SP4
does not support setting the UUID and some other things,
so some manual adaptations in disklayout.conf before "rear recover"
plus LUKS_CRYPTSETUP_OPTIONS="--iter-time 200" in local.conf
(i.e. removal of the --use-random default option that is
not supported on SLES11) were needed for LUKS recovery
(see the local.conf snippet further below).
But at least the changed UUID should not matter
because it is not used according to

RESCUE linux-a5v5:~ # cat /mnt/local/etc/crypttab 
cr_sda2         /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part2 none       none

RESCUE linux-a5v5:~ # ls -l /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part2
lrwxrwxrwx 1 root root 10 Jun 22 16:26 /dev/disk/by-id/ata-QEMU_HARDDISK_QM00001-part2 -> ../../sda2

and at least for me the recreated system boots OK.
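
For reference, the local.conf setting used for this SLES11 test
(as mentioned above, it overrides the default cryptsetup options
so that --use-random is not used):

LUKS_CRYPTSETUP_OPTIONS="--iter-time 200"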

The recreated system:

# lsblk -o NAME,KNAME,FSTYPE,LABEL,SIZE,MOUNTPOINT,UUID
NAME                     KNAME FSTYPE      LABEL  SIZE MOUNTPOINT UUID
sda                      sda                       15G            
├─sda1                   sda1  ext3               156M /boot      d5e7c2fd-14cf-402c-86c7-c66a34530ae3
└─sda2                   sda2  crypto_LUKS       14.9G            d8b60a40-8163-49e0-871f-bdc45024d2d0
  └─cr_sda2 (dm-0)       dm-0  LVM2_member       14.9G            ba31sh-fZwU-2EvK-smlc-3q3z-UxCi-X94KZc
    ├─system-swap (dm-1) dm-1  swap                 2G [SWAP]     85800711-afb6-406e-b7bc-97c1a4c1f88c
    └─system-root (dm-2) dm-2  ext3                10G /          9e3dd4f2-cc78-4eb3-a79e-6b3ae61ce85d

# file /etc/mtab
/etc/mtab: ASCII text

Only for the fun of it:
That /etc/mtab is ASCII text in SLES11
causes the false alarm during "rear recover"

/etc/mtab: FAILED
md5sum: WARNING: 1 computed checksum did NOT match
Error: Restored files do not match the recreated system in /mnt/local

but I don't care about such cosmetic issues on old SLES11.
On the other hand it proves that our new verification tests do work.

jsmeix commented at 2022-06-22 15:04:

My plan for tomorrow is to test it on SLES15.

jsmeix commented at 2022-06-23 13:46:

Tested on SLES15 SP3:

Original system VM:

# lsblk -ipo NAME,KNAME,PKNAME,TYPE,FSTYPE,SIZE,MOUNTPOINT,UUID
NAME                                               KNAME     PKNAME    TYPE  FSTYPE       SIZE MOUNTPOINT UUID
/dev/sda                                           /dev/sda            disk                15G            
|-/dev/sda1                                        /dev/sda1 /dev/sda  part                 8M            
`-/dev/sda2                                        /dev/sda2 /dev/sda  part  crypto_LUKS   15G            e528a556-1ae8-4c47-b76e-bff0d6daccf4
  `-/dev/mapper/cr_ata-QEMU_HARDDISK_QM00001-part2 /dev/dm-0 /dev/sda2 crypt LVM2_member   15G            0CbZRG-P7o5-uoh6-22nG-iQ8V-uv0o-41gMpL
    |-/dev/mapper/system-rootLV                    /dev/dm-1 /dev/dm-0 lvm   btrfs         12G /          1b7f1eb0-93ff-484c-abab-38ea4440f3cd
    |-/dev/mapper/system-swapLV                    /dev/dm-2 /dev/dm-0 lvm   swap           1G [SWAP]     9ba95ad9-7aac-457d-888c-8770019f1be5
    `-/dev/mapper/system-homeLV                    /dev/dm-3 /dev/dm-0 lvm   xfs            1G /home      3dfef038-fd46-4db8-88a9-37ac3a6037aa

# grep -v '^#' etc/rear/local.conf 
OUTPUT=ISO
BACKUP=NETFS
BACKUP_OPTIONS="nfsvers=3,nolock"
BACKUP_URL=nfs://192.168.122.1/nfs
REQUIRED_PROGS+=( snapper chattr )
PROGS+=( lsattr )
COPY_AS_IS+=( /usr/lib/snapper/installation-helper /etc/snapper/config-templates/default )
BACKUP_PROG_INCLUDE=( /boot/grub2/x86_64-efi /boot/grub2/i386-pc /opt /root /srv /usr/local /tmp /var )
POST_RECOVERY_SCRIPT=( 'if snapper --no-dbus -r $TARGET_FS_ROOT get-config | grep -q "^QGROUP.*[0-9]/[0-9]" ; then snapper --no-dbus -r $TARGET_FS_ROOT set-config QGROUP= ; snapper --no-dbus -r $TARGET_FS_ROOT setup-quota && echo snapper setup-quota done || echo snapper setup-quota failed ; else echo snapper setup-quota not used ; fi' )
SSH_ROOT_PASSWORD='rear'
USE_DHCLIENT="yes"
FIRMWARE_FILES=( 'no' )
MODULES=( 'loaded_modules' )
PROGRESS_MODE="plain"
PROGRESS_WAIT_SECONDS="3"
LUKS_CRYPTSETUP_OPTIONS+=" --force-password"
DISKS_TO_BE_WIPED=''

On replacement system VM (with same virtual disk size):

RESCUE localhost:~ # export MIGRATION_MODE='true'

RESCUE localhost:~ # rear -D recover
...
Start system layout restoration.
Disk '/dev/sda': creating 'gpt' partition table
Disk '/dev/sda': creating partition number 1 with name ''sda1''
Disk '/dev/sda': creating partition number 2 with name ''sda2''
Creating LUKS volume cr_ata-QEMU_HARDDISK_QM00001-part2 on /dev/sda2
Set the password for LUKS volume cr_ata-QEMU_HARDDISK_QM00001-part2 (for 'cryptsetup luksFormat' on /dev/sda2):
Enter passphrase for /dev/sda2: 
Enter the password for LUKS volume cr_ata-QEMU_HARDDISK_QM00001-part2 (for 'cryptsetup luksOpen' on /dev/sda2):
Enter passphrase for /dev/sda2: 
Creating LVM PV /dev/mapper/cr_ata-QEMU_HARDDISK_QM00001-part2
Creating LVM VG 'system' (some properties may not be preserved)
Creating LVM volume 'system/homeLV' (some properties may not be preserved)
Creating LVM volume 'system/swapLV' (some properties may not be preserved)
Creating LVM volume 'system/rootLV' (some properties may not be preserved)
Creating filesystem of type btrfs with mount point / on /dev/mapper/system-rootLV.
Mounting filesystem /
Running snapper/installation-helper
Creating filesystem of type xfs with mount point /home on /dev/mapper/system-homeLV.
Mounting filesystem /home
Creating swap on /dev/mapper/system-swapLV
Disk layout created.
...
Finished 'recover'. The target system is mounted at '/mnt/local'

RESCUE localhost:~ # grep -1 lvcreate /var/lib/rear/layout/diskrestore.sh

    if ! ( yes 2>/dev/null || true ) | lvm lvcreate -L 1073741824b -n homeLV system ; then
        LogPrintError "Failed to create LVM volume 'system/homeLV' with lvcreate -L 1073741824b -n homeLV system"
        if ( yes 2>/dev/null || true ) | lvm lvcreate -l 100%FREE -n homeLV system ; then
            LogPrintError "Created LVM volume 'system/homeLV' using fallback options lvcreate -l 100%FREE -n homeLV system"
        else
            LogPrintError "Also failed to create LVM volume 'system/homeLV' with lvcreate -l 100%FREE -n homeLV system"
            # Explicit 'false' is needed to let the whole 'if then else fi' command exit with non zero exit state
--

    if ! ( yes 2>/dev/null || true ) | lvm lvcreate -L 1073741824b -n swapLV system ; then
        LogPrintError "Failed to create LVM volume 'system/swapLV' with lvcreate -L 1073741824b -n swapLV system"
        if ( yes 2>/dev/null || true ) | lvm lvcreate -l 100%FREE -n swapLV system ; then
            LogPrintError "Created LVM volume 'system/swapLV' using fallback options lvcreate -l 100%FREE -n swapLV system"
        else
            LogPrintError "Also failed to create LVM volume 'system/swapLV' with lvcreate -l 100%FREE -n swapLV system"
            # Explicit 'false' is needed to let the whole 'if then else fi' command exit with non zero exit state
--

    if ! ( yes 2>/dev/null || true ) | lvm lvcreate -L 12884901888b -n rootLV system ; then
        LogPrintError "Failed to create LVM volume 'system/rootLV' with lvcreate -L 12884901888b -n rootLV system"
        if ( yes 2>/dev/null || true ) | lvm lvcreate -l 100%FREE -n rootLV system ; then
            LogPrintError "Created LVM volume 'system/rootLV' using fallback options lvcreate -l 100%FREE -n rootLV system"
        else
            LogPrintError "Also failed to create LVM volume 'system/rootLV' with lvcreate -l 100%FREE -n rootLV system"
            # Explicit 'false' is needed to let the whole 'if then else fi' command exit with non zero exit state

RESCUE localhost:~ # grep -3 lvcreate /var/log/rear/rear-localhost.log 
+++ echo '2022-06-23 15:36:05.613519646 Creating LVM volume '\''system/homeLV'\'' (some properties may not be preserved)'
2022-06-23 15:36:05.613519646 Creating LVM volume 'system/homeLV' (some properties may not be preserved)
+++ Print 'Creating LVM volume '\''system/homeLV'\'' (some properties may not be preserved)'
+++ lvm lvcreate -L 1073741824b -n homeLV system
+++ yes
+++ true
  Logical volume "homeLV" created.
--
+++ echo '2022-06-23 15:36:05.726581428 Creating LVM volume '\''system/swapLV'\'' (some properties may not be preserved)'
2022-06-23 15:36:05.726581428 Creating LVM volume 'system/swapLV' (some properties may not be preserved)
+++ Print 'Creating LVM volume '\''system/swapLV'\'' (some properties may not be preserved)'
+++ lvm lvcreate -L 1073741824b -n swapLV system
+++ yes
+++ true
  Logical volume "swapLV" created.
--
+++ echo '2022-06-23 15:36:05.826265780 Creating LVM volume '\''system/rootLV'\'' (some properties may not be preserved)'
2022-06-23 15:36:05.826265780 Creating LVM volume 'system/rootLV' (some properties may not be preserved)
+++ Print 'Creating LVM volume '\''system/rootLV'\'' (some properties may not be preserved)'
+++ lvm lvcreate -L 12884901888b -n rootLV system
+++ yes
+++ true
  Logical volume "rootLV" created.

RESCUE localhost:~ # lsblk -ipo NAME,KNAME,PKNAME,TYPE,FSTYPE,SIZE,MOUNTPOINT,UUID
NAME                                               KNAME     PKNAME    TYPE  FSTYPE       SIZE MOUNTPOINT      UUID
/dev/sda                                           /dev/sda            disk                15G                 
|-/dev/sda1                                        /dev/sda1 /dev/sda  part                 8M                 
`-/dev/sda2                                        /dev/sda2 /dev/sda  part  crypto_LUKS   15G                 e528a556-1ae8-4c47-b76e-bff0d6daccf4
  `-/dev/mapper/cr_ata-QEMU_HARDDISK_QM00001-part2 /dev/dm-0 /dev/sda2 crypt LVM2_member   15G                 0CbZRG-P7o5-uoh6-22nG-iQ8V-uv0o-41gMpL
    |-/dev/mapper/system-homeLV                    /dev/dm-1 /dev/dm-0 lvm   xfs            1G /mnt/local/home 3dfef038-fd46-4db8-88a9-37ac3a6037aa
    |-/dev/mapper/system-swapLV                    /dev/dm-2 /dev/dm-0 lvm   swap           1G                 9ba95ad9-7aac-457d-888c-8770019f1be5
    `-/dev/mapper/system-rootLV                    /dev/dm-3 /dev/dm-0 lvm   btrfs         12G /mnt/local      1b7f1eb0-93ff-484c-abab-38ea4440f3cd

The rebooted recreated system VM:

# lsblk -ipo NAME,KNAME,PKNAME,TYPE,FSTYPE,SIZE,MOUNTPOINT,UUID
NAME                                               KNAME     PKNAME    TYPE  FSTYPE       SIZE MOUNTPOINT UUID
/dev/sda                                           /dev/sda            disk                15G            
|-/dev/sda1                                        /dev/sda1 /dev/sda  part                 8M            
`-/dev/sda2                                        /dev/sda2 /dev/sda  part  crypto_LUKS   15G            e528a556-1ae8-4c47-b76e-bff0d6daccf4
  `-/dev/mapper/cr_ata-QEMU_HARDDISK_QM00001-part2 /dev/dm-0 /dev/sda2 crypt LVM2_member   15G            0CbZRG-P7o5-uoh6-22nG-iQ8V-uv0o-41gMpL
    |-/dev/mapper/system-rootLV                    /dev/dm-1 /dev/dm-0 lvm   btrfs         12G /          1b7f1eb0-93ff-484c-abab-38ea4440f3cd
    |-/dev/mapper/system-homeLV                    /dev/dm-2 /dev/dm-0 lvm   xfs            1G /home      3dfef038-fd46-4db8-88a9-37ac3a6037aa
    `-/dev/mapper/system-swapLV                    /dev/dm-3 /dev/dm-0 lvm   swap           1G [SWAP]     9ba95ad9-7aac-457d-888c-8770019f1be5

jsmeix commented at 2022-06-23 13:57:

@rear/contributors
if there are no objections
I would like to merge it tomorrow afternoon.

pcahyna commented at 2022-07-01 16:18:

Looks good. I can try an actual test in several days.

Test done. It fails, because there is no 'yes' in the rescue system, and things are actually worse now, because previously there was at least one 'y'. The debug log does not contain any "yes: command not found" message, because it was swallowed by the 2>/dev/null redirection. Oops.

jsmeix commented at 2022-07-04 07:44:

Sigh :-(
Nowadays computing things are so convoluted, with subtle interdependencies,
that at least for me it is impossible in practice with reasonable effort
to understand what actually goes on.

jsmeix commented at 2022-07-04 07:52:

@pcahyna
once more a big thank you for your careful testing of the actual issue
by setting up a specifically prepared test case in
https://github.com/rear/rear/pull/2834#issue-1291807720
where "lvm lvcreate" actually needs 'yes' input.

pcahyna commented at 2022-07-04 08:10:

@jsmeix glad to have helped! Actually, the special test case was not needed to test this particular problem, because if one restores in migration mode to existing disks, one needs at least one 'y' and the change here removed even that (but I have not tested this simple case, because I had not expected a regression). Of course, in order to have trust in the original problem being fixed, one should do a test crafted specifically to trigger it.

jsmeix commented at 2022-07-04 08:17:

Now I wonder why my tests above had "just worked" for me.
I guess I did not do a second "rear recover" on the recovered system,
i.e. I think I did not test with a disk that already has LVM metadata
at the exact same bytes where it will be recreated.

pcahyna commented at 2022-07-04 08:42:

The key must be

On replacement system VM (with same virtual disk size):

that seems to imply you restored to clean disks. My tests almost always restore to disks with existing data.

jsmeix commented at 2022-07-04 10:09:

Yes, usually I test with one "original system"
that I do not overwrite because I like to be able to adapt
things in the original system if something does not work.
I run "rear recover" on another VM usually with same disk size.

In this particular case I was unable to reproduce
the initial issue, see
https://github.com/rear/rear/issues/2820#issuecomment-1161902999
so I could only test that the 'yes' pipe does not cause regressions,
but I didn't imagine that the fail-safe 'yes' pipe could have
silently failed because my ReaR debug log messages like

+++ lvm lvcreate ...
+++ yes
+++ true
  Logical volume "..." created.

matched my wishful expectations so well.

Right now I checked again that the running 'true'
does not indicate that 'yes' had really failed,
so the running 'true' is the normal case when all is well:

# ( set -x ; ( yes 2>/dev/null || true ) | head -n2 )
+ yes
+ head -n2
y
y
+ true

# ( set -x ; ( yes || true ) | head -n2 )
+ yes
+ head -n2
y
y
yes: standard output: Broken pipe
+ true

because all is well when 'yes' fails with "Broken pipe"
(at least on my openSUSE Leap 15.3 system).
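
For completeness, a quick check (a sketch, with 'head' and 'false' as stand-ins
for "lvm lvcreate") that the '|| true' guard cannot hide a failing lvcreate:
the guarded left side of the pipe always exits 0, and the exit code of the
whole pipe is the one of its last command, so the 'if !' test in
diskrestore.sh still notices when lvcreate fails:

# ( yes 2>/dev/null || true ) | head -n2 ; echo $?
y
y
0

# ( yes 2>/dev/null || true ) | false ; echo $?
1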

jsmeix commented at 2022-07-04 10:17:

Strictly speaking it is not sufficient to only verify
that the original problem is fixed via a test crafted
specifically to trigger the original problem.
It is also needed to verify that the fix does not introduce
regressions for the various normal cases without the original problem.
So every code change would have to be tested multiple times,
which is more than what we can do in practice currently.

jsmeix commented at 2022-07-04 10:43:

@pcahyna
since your tests almost always restore to disks with existing data,
I would very much appreciate it if you could also test with

DISKS_TO_BE_WIPED=''

because I need feedback on how wiping disks during "rear recover"
works in practice on other systems "out there in the wild".

pcahyna commented at 2022-07-04 13:01:

Strictly speaking it is not sufficient to only verify
that the original problem is fixed via a test crafted
specifically to trigger the original problem.
It is also needed to verify that the fix does not introduce
regressions for the various normal cases without the original problem.

yes. We were lucky in this case that the relevant normal cases were a subset of the original problem (both involve migration mode and old data on target disks).
We have been working on automated tests that will run on each commit (it is a diploma thesis topic). Stay tuned...

DISKS_TO_BE_WIPED=''

yes, I saw your changes related to wiping disks and I am sorry for not having reviewed them in detail. I can test that, but not immediately. Please describe the test case that you need (migration mode on or off? One disk or many? Same disk sizes or different sizes? Data to be preserved on some of the disks or everything to be wiped?) and I will test it when I can.

pcahyna commented at 2022-07-04 13:06:

because all is well when 'yes' fails with "Broken pipe"

indeed, the problem is that for 'yes' an error exit is part of its normal operation (why can't it handle SIGPIPE?), so if you hide that, you also hide real problems. You could avoid the stderr redirection; this way you would have some "broken pipe" messages in the log, but also real diagnostics in case of real errors.
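
For illustration, a sketch of that (with a deliberately nonexistent command
in place of a missing 'yes' binary): both variants still let the whole pipe
"succeed", but only the variant without the stderr redirection leaves the
real error message on stderr and thus in the log:

# ( /no/such/yes 2>/dev/null || true ) | head -n2 ; echo $?   # prints 0, the error message vanishes into /dev/null
# ( /no/such/yes || true ) | head -n2 ; echo $?               # also prints 0, but "No such file or directory" reaches stderr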

pcahyna commented at 2022-07-04 13:08:

Sigh :-( Nowadays computing things are so convoluted, with subtle interdependencies, that at least for me it is impossible in practice with reasonable effort to understand what actually goes on.

I wanted to avoid the 'yes' pipeline for this reason: to avoid increasing the amount of complexity and interdependencies. I think we should remove it and replace it with -y as soon as you stop caring about old SLES versions.

jsmeix commented at 2022-07-04 13:24:

Currently (in particular for ReaR 2.7) there is

DISKS_TO_BE_WIPED='false'

in default.conf to avoid issues until this feature has been tested more thoroughly,
so we have plenty of time to test it.
Basically I would like to test just everything with

DISKS_TO_BE_WIPED=''

because - provided it works sufficiently well
which means first and foremost sufficiently fail-safe
(no wrong disk must ever be wiped by accident) -
my intent is to have that in ReaR 2.8 (or in any
later version if things don't work sufficiently well).

In particular when MIGRATION_MODE is automatically set to true,
wiping disks is likely not needed as much, because then the disks
are different, so remainders on a used disk are unlikely
to reappear after partitioning during "rear recover".

Where automated disk wiping is likely needed much more
is when the disks on the replacement system are the same
as the ones on the original system - in particular when
"rear recover" is run on the original system (e.g. to
recover from soft errors like messed-up filesystems)
because then it is more likely that remainders on the
original disk reappear during "rear recover".

jsmeix commented at 2022-07-04 13:40:

An ultimate fail-safe false-alarm-handling 'yes' pipe:

# ( set -x ; ( yes || echo "A 'broken pipe' would be false alarm from 'yes'" 1>&2 ) | head -n2 )
+ yes
+ head -n2
y
y
yes: standard output: Broken pipe
+ echo 'A '\''broken pipe'\'' would be false alarm from '\''yes'\'''
A 'broken pipe' would be false alarm from 'yes'

I don't plan to use that - it gets over the top.

After ReaR 2.7 has been released I will replace the 'yes' pipe with

lvm lvcreate -y ...

and then let's see how that might fail in unexpected ways,
e.g. in some cases with some (future) LVM versions
one might need more than one -y or additionally -f
or even -ff -yy or whatever ;-)


[Export of GitHub issue for rear/rear.]