#2820 Issue closed: wipefs will fail - Follow-up error: "/var/lib/rear/layout/diskrestore.sh: line 280: 3560 Aborted (core dumped)"

Labels: bug, support / question, fixed / solved / done

Githopp192 opened issue at 2022-06-10 13:47:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):

Relax-and-Recover 2.6 / 2020-06-17

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):
VERSION="8.5 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.5"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.5 (Ootpa)"
  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
BACKUP=BACULA
export TMPDIR="/home/TMPDIR"
COPY_AS_IS_BACULA=( /usr/lib64/libbac* /opt/bacula/working /etc/bacula /var/spool/bacula )
COPY_AS_IS_EXCLUDE_BACULA=( /var/lib/bacula )
PROGS_BACULA=( bacula bacula-fd bconsole bacula-console bextract bls bscan btape smartctl gawk )
BACULA_CLIENT=baculaclient-fd
CLONE_ALL_USERS_GROUPS=y
OUTPUT_URL=nfs://nas/volumex/Backup/REAR
OUTPUT_PREFIX=server2.opn9.9opn
  • Hardware vendor/product (PC or PowerNV BareMetal or ARM) or VM (KVM guest or PowerVM LPAR):
    HP Proliant DL360 G7

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    x86

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):

BIOS, Bootloader GRUB

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):

local disks (DM (mdadm))

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT"):
NAME                                 KNAME      PKNAME     TRAN TYPE  FSTYPE            LABEL             SIZE MOUNTPOINT
/dev/sda                             /dev/sda              sas  disk                                    279.4G
|-/dev/sda1                          /dev/sda1  /dev/sda        part  ext4                                  1G /mnt/local/boot
`-/dev/sda2                          /dev/sda2  /dev/sda        part  linux_raid_member server2:pv00 278.4G
  `-/dev/md127                       /dev/md127 /dev/sda2       raid1 LVM2_member                       278.2G
    |-/dev/mapper/cl_server2-home /dev/dm-0  /dev/md127      lvm   xfs               home               60G /mnt/local/home
    |-/dev/mapper/cl_server2-root /dev/dm-1  /dev/md127      lvm   xfs               root               50G /mnt/local
    `-/dev/mapper/cl_server2-swap /dev/dm-2  /dev/md127      lvm   swap              swap               20G
/dev/sdb                             /dev/sdb              sas  disk                                    279.4G
`-/dev/sdb1                          /dev/sdb1  /dev/sdb        part  linux_raid_member server2:pv00 278.4G
  `-/dev/md127                       /dev/md127 /dev/sdb1       raid1 LVM2_member                       278.2G
    |-/dev/mapper/cl_server2-home /dev/dm-0  /dev/md127      lvm   xfs               home               60G /mnt/local/home
    |-/dev/mapper/cl_server2-root /dev/dm-1  /dev/md127      lvm   xfs               root               50G /mnt/local
    `-/dev/mapper/cl_server2-swap /dev/dm-2  /dev/md127      lvm   swap              swap               20G
/dev/sdc                             /dev/sdc              usb  disk                                      1.9G
`-/dev/sdc1                          /dev/sdc1  /dev/sdc        part  vfat              RELAXRECOVE       1.9G
  • Description of the issue (ideally so that others can reproduce it):

wipefs fails, with the follow-up error "/var/lib/rear/layout/diskrestore.sh: line 280: 3560 Aborted (core dumped)":

+++ Log 'Creating filesystem of type xfs with mount point / on /dev/mapper/server2-root.'
+++ echo '2022-06-10 12:37:20.597201555 Creating filesystem of type xfs with mount point / on /dev/mapper/server2-root.'
2022-06-10 12:37:20.597201555 Creating filesystem of type xfs with mount point / on /dev/mapper/server2-root.
+++ Print 'Creating filesystem of type xfs with mount point / on /dev/mapper/server2-root.'
+++ wipefs --all --force /dev/mapper/server2-root
wipefs: error: /dev/mapper/server2-root: probing initialization failed: No such file or directory
+++ wipefs --all /dev/mapper/server2-root
wipefs: error: /dev/mapper/server2-root: probing initialization failed: No such file or directory
+++ dd if=/dev/zero of=/dev/mapper/server2-root bs=512 count=1
1+0 records in
1+0 records out
512 bytes copied, 3.9181e-05 s, 13.1 MB/s
+++ mkfs.xfs -f -m uuid=37c0e068-87f8-4cf4-bf8a-840c443f1720 -i size=512 -d agcount=4 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=0 -d swidth=0 -l version=2 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/server2-root
mkfs.xfs: xfs_mkfs.c:2569: validate_datadev: Assertion `cfg->dblocks' failed.
/var/lib/rear/layout/diskrestore.sh: line 280:  3558 Aborted                 (core dumped) mkfs.xfs -f -m uuid=37c0e068-87f8-4cf4-bf8a-840c443f1720 -i size=512 -d agcount=4 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=0 -d swidth=0 -l version=2 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/server2-root 1>&2
+++ mkfs.xfs -f -i size=512 -d agcount=4 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=0 -d swidth=0 -l version=2 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/server2-root
mkfs.xfs: xfs_mkfs.c:2569: validate_datadev: Assertion `cfg->dblocks' failed.
/var/lib/rear/layout/diskrestore.sh: line 280:  3560 Aborted                 (core dumped) mkfs.xfs -f -i size=512 -d agcount=4 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=0 -d swidth=0 -l version=2 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/server2-root 1>&2
2022-06-10 12:37:20.622458874 UserInput: called in /usr/share/rear/layout/recreate/default/200_run_layout_code.sh line 127
2022-06-10 12:37:20.628167932 UserInput: Default input in choices - using choice number 1 as default input
2022-06-10 12:37:20.630880621 The disk layout recreation script failed

Running the command manually with "<<<y" also fails (see below).

  • Workaround, if any:

Running the lvm lvcreate command manually with "<<<y" fails:

lvm lvcreate -L 53687091200b -n root cl_server2 <<<y

WARNING: xfs_external_log signature detected on /dev/cl_server2/root at offset 14336. Wipe it? [y/n]: 
  Wiping xfs_external_log signature on /dev/cl_server2/root.
WARNING: xfs_external_log signature detected on /dev/cl_server2/root at offset 15360. Wipe it? [y/n]: [n]
  Aborted wiping of xfs_external_log.
  1 existing signature left on the device.
  Failed to wipe signatures on logical volume cl_server2/root.
  Aborting. Failed to wipe start of new LV.

Running lvm lvcreate without "<<<y" succeeds:

RESCUE server2:~ # lvm lvcreate -L 53687091200b -n root cl_server2
WARNING: xfs_external_log signature detected on /dev/cl_server2/root at offset 15360. Wipe it? [y/n]: y
  Wiping xfs_external_log signature on /dev/cl_server2/root.
WARNING: xfs_external_log signature detected on /dev/cl_server2/root at offset 16384. Wipe it? [y/n]: y
  Wiping xfs_external_log signature on /dev/cl_server2/root.
WARNING: xfs_external_log signature detected on /dev/cl_server2/root at offset 17408. Wipe it? [y/n]: y
  Wiping xfs_external_log signature on /dev/cl_server2/root.
WARNING: xfs_external_log signature detected on /dev/cl_server2/root at offset 18432. Wipe it? [y/n]: y
  Wiping xfs_external_log signature on /dev/cl_server2/root.
WARNING: xfs_external_log signature detected on /dev/cl_server2/root at offset 19456. Wipe it? [y/n]: y
  Wiping xfs_external_log signature on /dev/cl_server2/root.
WARNING: xfs_external_log signature detected on /dev/cl_server2/root at offset 20480. Wipe it? [y/n]: y
  Wiping xfs_external_log signature on /dev/cl_server2/root.
WARNING: xfs_external_log signature detected on /dev/cl_server2/root at offset 21504. Wipe it? [y/n]: y
  Wiping xfs_external_log signature on /dev/cl_server2/root.
WARNING: xfs_external_log signature detected on /dev/cl_server2/root at offset 22528. Wipe it? [y/n]: y
  Wiping xfs_external_log signature on /dev/cl_server2/root.
  Logical volume "root" created.

Afterwards ReaR completed successfully.

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):


jsmeix commented at 2022-06-13 09:20:

The <<<y in lvm lvcreate ... $vg <<<y in
usr/share/rear/layout/prepare/GNU/Linux/110_include_lvm_code.sh
originates from
https://github.com/rear/rear/commit/5bc24808da0a5d2b6d711c428fef0aa415f2fc01
which mentions
https://github.com/rear/rear/issues/492
where there is the comment

lvcreate hangs because it expects yes prompt to be entered
during restore if existing filesystem signature detected
https://github.com/rear/rear/issues/513

and there it was suggested to use

yes | lvm lvcreate

which should provide as many 'y' answers as lvm lvcreate needs,
not only a single 'y' as via lvm lvcreate ... <<<y
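
As an illustration, a minimal bash sketch (with a hypothetical
read_answers function standing in for lvm lvcreate) of the
difference between the two forms:

```
# Hypothetical consumer standing in for 'lvm lvcreate':
# it reads two yes/no answers from stdin (illustration only).
read_answers() { read a1; read a2; echo "got: '$a1' '$a2'"; }

read_answers <<<y    # stdin holds exactly one "y" line, then EOF:
                     # prints: got: 'y' ''
yes | read_answers   # 'yes' supplies "y" lines for as long as the
                     # consumer reads: prints: got: 'y' 'y'
```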

jsmeix commented at 2022-06-13 09:28:

@Githopp192
does it work without manual intervention when you replace, in
usr/share/rear/layout/prepare/GNU/Linux/110_include_lvm_code.sh,
code like

lvm lvcreate ... <<<y

with

yes | lvm lvcreate ...

?

For a quick test you could make this change in the booted
ReaR recovery system before you run "rear recover" therein
to test if with that change "rear recover" works
without manual intervention.

Alternatively you can do this change on your original system
but then you need to run "rear mkrescue" or "rear mkbackup"
first to get an updated ReaR recovery system, then you boot
that updated ReaR recovery system on your replacement hardware
and test if "rear recover" works without manual intervention.

pcahyna commented at 2022-06-13 09:44:

Hi, I believe a proper fix (suggested by @rmetrich some time ago) is to pass "-y" to "lvcreate" instead of supplying "y" to it on stdin.
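
For example, applied to the lvcreate call from the original report
(assuming an LVM version whose lvcreate supports '-y'; see the
version checks below):

```
# '-y' answers yes to all prompts (e.g. wiping detected signatures),
# so no input needs to be fed via stdin:
lvm lvcreate -y -L 53687091200b -n root cl_server2
```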

jsmeix commented at 2022-06-13 10:05:

I was wondering all the time why 'y' via stdin is used.
Perhaps old versions of lvcreate did not support '-y'?
I will check SLE10...

pcahyna commented at 2022-06-13 10:24:

Yes, please check the oldest version of SLES that you care about. I checked RHEL 6; -y is supported there.

jsmeix commented at 2022-06-13 10:32:

On SLES 10 SP4 with

# lvm version
  LVM version:     2.02.17 (2006-12-14)
  Library version: 1.02.13 (2006-11-28)
  Driver version:  4.7.0

"man lvcreate" does not show '-y' or '--yes'
(only "man pvcreate" shows '-y' and '--yes').

On SLES 11 SP4 with

# lvm version
  LVM version:     2.02.98(2) (2012-10-15)
  Library version: 1.03.01 (2011-10-15)
  Driver version:  4.25.0

"man lvcreate" does not show '-y' or '--yes'
(only "man pvcreate" shows '-y' and '--yes').

On SLES 12 SP5 with

# lvm version
  LVM version:     2.02.180(2) (2018-07-19)
  Library version: 1.03.01 (2018-07-19)
  Driver version:  4.39.0

"man lvcreate" and "man pvcreate" and "man vgcreate"
show '-y' and '--yes'

I would not mind SLE10 because
we (i.e. SUSE) never provided ReaR for SLE10.

But I do not want to actively break SLE11 support because
SLE11 was the first release for which we (i.e. SUSE) provided ReaR
(i.e. it is me who gets all ReaR support issues at SUSE),
so I prefer not to break SLE11 support in ReaR
regardless that SLE11 is no longer officially supported
by upstream ReaR according to our release notes
https://github.com/rear/rear/blob/master/doc/rear-release-notes.txt

If there is a good reason why

yes | lvm lvcreate ...

has drawbacks in practice for our users compared to

lvm lvcreate -y ...

then we should break SLE11 support because it is more
important that current ReaR works well with current
Linux distributions than keeping support for rather
old Linux distributions.

jsmeix commented at 2022-06-13 10:56:

https://github.com/rear/rear/pull/2821
is a currently untested proposal for a fix.
Please have a look at whether we could and should include it
right now in ReaR 2.7 or better postpone it to ReaR 2.8.

pcahyna commented at 2022-06-13 11:14:

"man lvcreate" does not show '-y' or '--yes'
(only "man pvcreate" shows '-y' and '--yes').

Check the lvm(8) man page - it lists the common options; on RHEL 6 --yes is there and the lvcreate command accepts it, despite being missing from the lvcreate(8) man page.

I will try to provide a reliable reproducer (I tried already, but it is a bit difficult and I ran out of time) and then we can test whether a fix adding -y works properly or whether we need to provide y's on stdin.

I would not block 2.8 on it.

Githopp192 commented at 2022-06-13 11:24:

@Githopp192 does it work without manual intervention when you replace, in usr/share/rear/layout/prepare/GNU/Linux/110_include_lvm_code.sh, code like

lvm lvcreate ... <<<y

with

yes | lvm lvcreate ...

?

For a quick test you could make this change in the booted ReaR recovery system before you run "rear recover" therein to test if with that change "rear recover" works without manual intervention.

Alternatively you can do this change on your original system but then you need to run "rear mkrescue" or "rear mkbackup" first to get an updated ReaR recovery system, then you boot that updated ReaR recovery system on your replacement hardware and test if "rear recover" works without manual intervention.

Yes, I'm going to test this soon - when I'm in the ReaR console again.
Thanks in the meantime.

jsmeix commented at 2022-06-13 11:31:

I had checked "man lvm" on SLES10 and SLES11
and both do not show '-y' or '--yes'.

I see we already have some '--yes' in
layout/prepare/GNU/Linux/110_include_lvm_code.sh
so the current code may already no longer work on SLE11.

Checking whether the current code no longer works on SLE11:

Currently we have:

... lvm pvcreate -ff --yes ...

On SLES11 "man pvcreate" reads

-f, --force
Force  the  creation  without any confirmation.
You can not recreate (reinitialize) a physical volume
belonging to an existing volume group.
In an emergency you can override this behaviour with -ff.
...
-y, --yes
Answer yes to all questions.

We also have:

lvm vgremove --force --force --yes $vg >&2 || true

On SLES11 "man vgremove" only describes '-f, --force'
but without repeating it and there is no '--yes'
but when that command fails it doesn't matter
because of the '|| true'.

So the current code in
layout/prepare/GNU/Linux/110_include_lvm_code.sh
should still work with SLE11.

pcahyna commented at 2022-06-21 12:35:

@jsmeix

so the current code may already no longer work on SLE11

Did you mean "SLE10", because otherwise it contradicts what you wrote below "So the current code in
layout/prepare/GNU/Linux/110_include_lvm_code.sh
should still work with SLE11" ?

Concerning "both do not show '-y' or '--yes'." - could you please try on a SLES11 system whether the -y or --yes option actually work? I saw on a RHEL 6 system that the option is there, even if the man page forgets to document it.
Concerning "but when that command fails it doesn't matter because of the '|| true'." I believe it matters, because the command is there for a reason, and if it fails because of an unknown option, the result is not what we want, || true merely hides the problem. Am I missing something?

jsmeix commented at 2022-06-21 12:56:

@pcahyna

I meant SLES11, and in
https://github.com/rear/rear/issues/2820#issuecomment-1153802966
I added a line like

Checking whether the current code no longer works on SLE11:

to avoid that the subsequent text looks like a contradiction.

I think the code

lvm vgremove --force --force --yes $vg >&2 || true

is meant to try to remove the VG
(e.g. when it is already there on a used disk)
but if that is not possible for whatever reason
we proceed "bona fide".

# git log --follow -p usr/share/rear/layout/prepare/GNU/Linux/110_include_lvm_code.sh

shows that the vgremove originated at
https://github.com/rear/rear/commit/70acf6fa39b3d133c0a632e3496e5a273dc9ef27
which points to
https://github.com/rear/rear/pull/2564

pcahyna commented at 2022-06-21 13:11:

We have been working with @vcrhonek on a reliable reproducer (the original issue is usually quite random depending on what signatures are on the disk and where).

Here is already a minimal reproducer to show the principle without any involvement of ReaR. Assuming that /dev/vdb is a free disk:

umount /mnt
wipefs -a /dev/vdb
yum install -y lvm2 xfsprogs
LOOPFILE=loopbackfile.img
pvcreate /dev/vdb
vgcreate sigvg /dev/vdb
lvcreate sigvg -n xfsloglv -l 100%FREE
dd if=/dev/zero of="$LOOPFILE" bs=100M count=10
LOOPDEV=$(losetup -f)
losetup -f "$LOOPFILE"
# Use the LV as an external XFS log device so it receives a log signature
MKFSOUT=$(mkfs.xfs -l logdev=/dev/sigvg/xfsloglv,size=2048b "$LOOPDEV")
losetup -d "$LOOPDEV"
# Extract the log device block size from the mkfs.xfs output
BSIZE=$(echo $MKFSOUT | sed "s/.*\/dev\/sigvg\/xfsloglv bsize=\([^ ]*\)[ ]*.*/\1/")
# Copy the LV onto itself shifted by one block so that stale copies of
# the log signature end up at many offsets on the underlying disk
dd if=/dev/sigvg/xfsloglv of=/dev/sigvg/xfsloglv bs="$BSIZE" seek=1
lvremove /dev/sigvg/xfsloglv
vgremove /dev/sigvg
pvremove /dev/vdb
# Recreate PV/VG/LV on the disk that still carries the stale signatures
pvcreate /dev/vdb
vgcreate sigvg /dev/vdb
lvcreate sigvg -n pokuslv -l 100%FREE

The last lvcreate command will ask for confirmation many times; if one adds -y, it succeeds without asking.

@jsmeix could you please try it on the old SLES versions that you care about? (I will try RHEL 6.)

EDIT: reproducer updated to avoid options that old tools don't accept.

pcahyna commented at 2022-06-21 13:39:

I have checked that lvcreate on RHEL 6 accepts -y. (lvremove too).

jsmeix commented at 2022-06-21 15:16:

The reproducer doesn't reproduce
for me on SLES11 SP4
where /dev/sdb is a 1GiB free (virtual) disk:

# lsblk
NAME   MAJ:MIN RM   SIZE RO MOUNTPOINT
sdb      8:16   0     1G  0 
sda      8:0    0    12G  0 
├─sda1   8:1    0     2G  0 [SWAP]
└─sda2   8:2    0    10G  0 /
sr0     11:0    1   3.2G  0

# lvm version
  LVM version:     2.02.98(2) (2012-10-15)
  Library version: 1.03.01 (2011-10-15)
  Driver version:  4.25.0

# DISK=/dev/sdb

# wipefs -a $DISK

# LOOPFILE=loopbackfile.img

# pvcreate $DISK
  Physical volume "/dev/sdb" successfully created

# vgcreate sigvg $DISK
  Volume group "sigvg" successfully created

# lvcreate sigvg -n xfsloglv -l 100%FREE
  Logical volume "xfsloglv" created

# dd if=/dev/zero of="$LOOPFILE" bs=100M count=10
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 4.08083 s, 257 MB/s

# LOOPDEV=$(losetup -f)

# echo $LOOPDEV
/dev/loop0

# lsblk -o NAME,KNAME,FSTYPE,LABEL,SIZE,MOUNTPOINT /dev/sdb
NAME                    KNAME FSTYPE      LABEL   SIZE MOUNTPOINT
sdb                     sdb   LVM2_member           1G 
└─sigvg-xfsloglv (dm-0) dm-0                     1020M

# losetup -f "$LOOPFILE"

# MKFSOUT=$(mkfs.xfs -l logdev=/dev/sigvg/xfsloglv,size=2048b "$LOOPDEV")

# losetup -d "$LOOPDEV"

# BSIZE=$(echo $MKFSOUT | sed "s/.*\/dev\/sigvg\/xfsloglv bsize=\([^ ]*\)[ ]*.*/\1/")

# echo $BSIZE
4096

# dd if=/dev/sigvg/xfsloglv of=/dev/sigvg/xfsloglv bs="$BSIZE" seek=1
dd: writing `/dev/sigvg/xfsloglv': No space left on device
261120+0 records in
261119+0 records out
1069543424 bytes (1.1 GB) copied, 8.20103 s, 130 MB/s

# lvremove /dev/sigvg/xfsloglv
Do you really want to remove active logical volume xfsloglv? [y/n]: y
  Logical volume "xfsloglv" successfully removed

# vgremove /dev/sigvg
  Volume group "sigvg" successfully removed

# pvremove $DISK
  Labels on physical volume "/dev/sdb" successfully wiped

# pvcreate $DISK
  Physical volume "/dev/sdb" successfully created

# vgcreate sigvg $DISK
  Volume group "sigvg" successfully created

# lvcreate sigvg -n pokuslv -l 100%FREE
  Logical volume "pokuslv" created

On SLES11 SP4 'lvremove' does not accept '-y'
but yes | lvremove ... works:

# lvremove -y /dev/sigvg/pokuslv
lvremove: invalid option -- 'y'
  Error during parsing of command line.

# yes | lvremove /dev/sigvg/pokuslv
Do you really want to remove active logical volume pokuslv? [y/n]:   Logical volume "pokuslv" successfully removed

Currently I don't know how to make 'lvcreate'
ask for confirmation.

pcahyna commented at 2022-06-21 15:23:

On RHEL 6, lvcreate did not ask for confirmation either. I think that in old LVM versions lvcreate did not look at signatures. But -y worked in the sense that it was harmless.
Concerning lvcreate on SLES11, you could try whether it accepts -y or fails with invalid option -- 'y' or similar.

jsmeix commented at 2022-06-21 15:43:

Sigh - I am too old and it's too late in the evening
so I needed your hint to just try out
if 'lvcreate' accepts '-y' on SLES11 SP4:

# lvcreate sigvg -y -n pokuslv -l 100%FREE
lvcreate: invalid option -- 'y'
  Error during parsing of command line.

# yes | lvcreate sigvg -n pokuslv -l 100%FREE
  Logical volume "pokuslv" created

jsmeix commented at 2022-06-21 15:50:

@pcahyna
what is the reason why you don't like
the traditional yes | ... pipe?

It was suggested in
https://github.com/rear/rear/issues/513
where it has worked at least for the reporter of that issue.

But it was implemented differently via <<<y,
which behaves like a limited yes | ... pipe
that inputs only a single 'y',
and I don't see a reason for that limitation.

pcahyna commented at 2022-06-22 07:53:

I prefer to use the switch that exists for this purpose rather than a more convoluted solution. Of course, if the option is not there in the releases that you care about, there is not much choice remaining.
Please be aware that yes | lvcreate returns an error status if one uses set -o pipefail, so you need to check that pipefail is not enabled in any place where the change will be made, or that the exit code of the pipeline does not matter. And one will have to remember to never use pipefail in places that use the yes pipeline.

# set -o pipefail
# yes | lvremove /dev/sigvg/xfsloglv
Do you really want to remove active logical volume xfsloglv? [y/n]:   Logical volume "xfsloglv" successfully removed
# echo $?
141

pcahyna commented at 2022-06-22 08:26:

Or, if one cares about pipefail and exit codes, one may use this "truism":

# ( yes || true ) | lvremove /dev/sigvg/xfsloglv
Do you really want to remove active logical volume xfsloglv? [y/n]:   Logical volume "xfsloglv" successfully removed
# echo $?
0

jsmeix commented at 2022-06-22 08:40:

@pcahyna
thank you so much for your explanation
about complications when a pipe is used!

In my
https://github.com/rear/rear/pull/2821
things already smelled fishy to me because I got a
"yes: standard output: Broken pipe" stderr message
but my "fix" yes 2>/dev/null | ... doesn't fix anything
instead it even makes things worse by hiding information
because I failed to see the consequences
when yes gets SIGPIPE (141 - 128 = 13 = SIGPIPE).

pcahyna commented at 2022-06-22 08:58:

I should have looked at #2821 and the comment about broken pipe. Interestingly, I never saw this Broken pipe error message, although I was expecting to see one (indeed, yes gets terminated rather ungracefully by SIGPIPE).

jsmeix commented at 2022-06-22 09:53:

I think there is no other way than to let 'yes' get a SIGPIPE.
At least I don't know how in producer | consumer the consumer
could gracefully terminate the producer (in a simple way)
because I think normally the consumer does not know
(and should not care about) that stdin comes from a pipe.
I think normally a consumer just reads its stdin (fd0)
until it finally exits - likely without explicit closing fd0.
I don't know if closing fd0 would avoid SIGPIPE for the producer.

I think the problem is when 'yes' does not properly handle SIGPIPE,
because SIGPIPE is likely a common way how 'yes' terminates
regularly in practice (like in this example here).
But I am no 'yes' expert ;-)
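
A minimal sketch that shows this in isolation, with 'head -n 1'
standing in for any consumer that stops reading early:

```
yes | head -n 1            # prints: y
echo "${PIPESTATUS[@]}"    # prints: 141 0
                           # 141 = 128 + 13 (SIGPIPE killed 'yes')
                           # 0 = exit code of 'head' itself
```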

pcahyna commented at 2022-06-22 09:57:

I think the problem is when 'yes' does not properly handle SIGPIPE,
because SIGPIPE is likely a common way how 'yes' terminates
regularly in practice

Exactly my thought. So, what to do? Will you need support for such old SLES versions even in ReaR 2.8? If so, ( yes 2> /dev/null || true ) is probably the best way to handle it.

jsmeix commented at 2022-06-22 10:21:

Yes, ( yes 2> /dev/null || true ) | ...
is what I prefer to do for now
strictly in compliance with our
"Try hard to care about possible errors" and
"Maintain backward compatibility" and
"Dirty hacks welcome" in
https://github.com/rear/rear/wiki/Coding-Style
:-)
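
Applied to the lvcreate call from this issue, the resulting form
would look like this sketch (not necessarily the exact code in the
pull request):

```
# The subshell exits 0 even when 'yes' is killed by SIGPIPE
# (so 'set -o pipefail' stays safe), and stderr is silenced
# to suppress the "yes: standard output: Broken pipe" message:
( yes 2>/dev/null || true ) | lvm lvcreate -L 53687091200b -n root cl_server2
```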

FYI
why I am so much interested
in SLE11 backward compatibility in ReaR 2.7:

ReaR 2.7 has very many generic enhancements and fixes
because the last release, ReaR 2.6, was two years ago, and
SLE11 is the oldest SLE version that officially provides ReaR,
which means there might be some exceptional SLE11 customers
who need some specific ReaR 2.7 fix or enhancement
for their ReaR use cases in their particular environments,
and then I would like to be able to "just provide" it to them
without introducing regressions at unexpected other places.

I hope to release ReaR 2.8 in a much shorter time
than two years (preferably in one year) and I think
I can drop SLE11 support in ReaR 2.8.

Perhaps we should drop even more, e.g. bash 3.x support
(so we could use associative arrays) and things like that.

Perhaps then we better call it ReaR 3.0 cf.
https://github.com/rear/rear/issues/1390

pcahyna commented at 2022-06-22 10:30:

I think I can drop SLE11 support in ReaR 2.8

Why worry about it here then, if #2821 has its milestone set to ReaR 2.8? Or do you plan to merge it before releasing 2.7?

jsmeix commented at 2022-06-22 10:40:

Yesterday I postponed
https://github.com/rear/rear/pull/2821
from ReaR 2.7 to ReaR 2.8
because I got no feedback from @Githopp192 here about
https://github.com/rear/rear/issues/2820#issuecomment-1153688878
whether or not yes | lvm lvcreate ... works for him.

jsmeix commented at 2022-06-22 11:46:

I messed up https://github.com/rear/rear/pull/2821
so I closed it and did the same again from scratch as
https://github.com/rear/rear/pull/2827

pcahyna commented at 2022-06-22 12:27:

I think it will be hard to get feedback from the original reporter. This issue occurs randomly and infrequently, depending on where the signatures end up on the disks. If the original disks are wiped, it is not easily possible to reproduce the problem again. I can try to reproduce the problem using the strategy above, but not immediately.

Githopp192 commented at 2022-06-22 14:18:

That's correct, @pcahyna - I ran ReaR several times on the same machine and the issue did not re-occur.

jsmeix commented at 2022-06-22 15:32:

@Githopp192
thank you for your feedback!
It helps us to know how things behave in your environment
(in this case that things actually behave in your environment
as we imagine they would).

By the way:
You may have a look at the section
"Prepare replacement hardware for disaster recovery" in
https://en.opensuse.org/SDB:Disaster_Recovery

In our current GitHub master code we additionally have
support for the new DISKS_TO_BE_WIPED config variable,
described in default.conf under "Wiping disks during
'rear recover' before recreating the disk layout", currently at
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L440

For some details about that see
https://github.com/rear/rear/blob/master/usr/share/rear/layout/recreate/default/README.wipe_disks
What I wrote therein is certainly not some "final wisdom".
It is mainly a collection of what I had found out
up to that point in time when I wrote it.
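
For example, a hypothetical /etc/rear/local.conf snippet (the device
names here are only placeholders; check default.conf on your ReaR
version for the authoritative syntax and supported special values):

```
# Assumption for illustration: explicitly list the disks that
# "rear recover" should wipe before recreating the disk layout:
DISKS_TO_BE_WIPED='/dev/sda /dev/sdb'
```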

To try out our current GitHub master code see the section
"Testing current ReaR upstream GitHub master code" in
https://en.opensuse.org/SDB:Disaster_Recovery

jsmeix commented at 2022-06-24 13:13:

With https://github.com/rear/rear/pull/2827 merged
this issue should hopefully be avoided
(but we could not reproduce it).

Githopp192 commented at 2022-06-27 21:48:

@Githopp192 thank you for your feedback! It helps us to know how things behave in your environment (in this case that things actually behave in your environment as we imagine they would).

By the way: You may have a look at the section "Prepare replacement hardware for disaster recovery" in https://en.opensuse.org/SDB:Disaster_Recovery

In our current GitHub master code we additionally have support for the new DISKS_TO_BE_WIPED config variable, described in default.conf under "Wiping disks during 'rear recover' before recreating the disk layout", currently at https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L440

For some details about that see https://github.com/rear/rear/blob/master/usr/share/rear/layout/recreate/default/README.wipe_disks What I wrote therein is certainly not some "final wisdom". It is mainly a collection of what I had found out up to that point in time when I wrote it.

To try out our current GitHub master code see the section "Testing current ReaR upstream GitHub master code" in https://en.opensuse.org/SDB:Disaster_Recovery

Thanks a lot @jsmeix - now I've got a clearer view of what is going on.

More and more I perceive how complex the whole matter is and how much work is behind it. For this - thank you!

So far I have been able to restore every system with ReaR.
But I could only do that because I know a lot about bash shell programming and I regularly spend hours debugging :-)


[Export of Github issue for rear/rear.]