#2612 Issue closed: 400_restore_rsync_backup.sh does not show an error when restore rsync archive failed

Labels: enhancement, bug, cleanup, fixed / solved / done

cvijayvinoth opened issue at 2021-05-06 07:14:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.6 / Git

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
OUTPUT=ISO
BACKUP=RSYNC
RSYNC_PREFIX="diskimage_${HOSTNAME}"
BACKUP_PROG="/var/www/html/imageBackup/rsync"
OUTPUT_URL=rsync://diskimage@xxxx::rsync_backup
BACKUP_URL=rsync://diskimage@xxxx::rsync_backup
BACKUP_RSYNC_OPTIONS+=(-z --progress )
ISO_DIR="/var/www/html/imageBackup/iso/$HOSTNAME"
MESSAGE_PREFIX="$$: "
PROGRESS_MODE="plain"
AUTOEXCLUDE_PATH=( /tmp )
PROGRESS_WAIT_SECONDS="1"
export TMPDIR="/var/www/html/imageBackup/iso/"
PXE_RECOVER_MODE=automatic
ISO_FILES=("/var/www/html/imageBackup/rsync")
ISO_PREFIX="${HOSTNAME}"
ISO_DEFAULT="automatic"
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR):
    Virtual machine

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    x86 compatible

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    UEFI and GRUB

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    local

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT" or "lsblk" as makeshift):

NAME              MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda                 8:0    0   20G  0 disk
├─sda1              8:1    0  977M  0 part
│ └─md127           9:127  0  976M  0 raid1 /boot
├─sda2              8:2    0  601M  0 part
│ └─md125           9:125  0  601M  0 raid1 /boot/efi
└─sda3              8:3    0 17.3G  0 part
  └─md126           9:126  0 17.3G  0 raid1
    ├─centos-root 253:0    0 13.5G  0 lvm   /
    └─centos-swap 253:1    0  3.8G  0 lvm   [SWAP]
sdb                 8:16   0   20G  0 disk
├─sdb1              8:17   0  977M  0 part
│ └─md127           9:127  0  976M  0 raid1 /boot
├─sdb2              8:18   0  601M  0 part
│ └─md125           9:125  0  601M  0 raid1 /boot/efi
└─sdb3              8:19   0 17.3G  0 part
  └─md126           9:126  0 17.3G  0 raid1
    ├─centos-root 253:0    0 13.5G  0 lvm   /
    └─centos-swap 253:1    0  3.8G  0 lvm   [SWAP]
sr0                11:0    1 1024M  0 rom
  • Description of the issue (ideally so that others can reproduce it):
    Unable to perform the recovery option.
    Getting an error like "Failed to create EFI Boot Manager entries".
    Note: The package was downloaded only today.

  • Workaround, if any:

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):
    rear-localhost.log

jsmeix commented at 2021-05-06 08:00:

@cvijayvinoth
the message

Failed to create EFI Boot Manager entries
 (UEFI bootloader '/boot/efi/EFI/centos/grubx64.efi' not found under target /mnt/local)

appears now because of the recent changes via
https://github.com/rear/rear/pull/2608
that implement better error checking, so now errors are shown
instead of being silently skipped as they were before in this case.

@pcahyna
could you please also have a look here?

The relevant excerpt from
https://github.com/rear/rear/files/6432513/rear-localhost.log
is

+ source /usr/share/rear/finalize/Linux-i386/670_run_efibootmgr.sh
++ is_true 1
++ case "$1" in
++ return 0
++ is_true no
++ case "$1" in
++ return 1
++ LogPrint 'Creating EFI Boot Manager entries...'
++ Log 'Creating EFI Boot Manager entries...'
++ echo '1035: 2021-05-06 02:57:19.347292204 Creating EFI Boot Manager entries...'
1035: 2021-05-06 02:57:19.347292204 Creating EFI Boot Manager entries...
++ Print 'Creating EFI Boot Manager entries...'
++ local esp_mountpoint esp_mountpoint_inside boot_efi_parts boot_efi_dev
++ test -f /mnt/local//boot/efi/EFI/centos/grubx64.efi
++ LogPrintError 'Failed to create EFI Boot Manager entries (UEFI bootloader '\''/boot/efi/EFI/centos/grubx64.efi'\'' not found under target /mnt/local)'
++ Log 'Failed to create EFI Boot Manager entries (UEFI bootloader '\''/boot/efi/EFI/centos/grubx64.efi'\'' not found under target /mnt/local)'
++ echo '1035: 2021-05-06 02:57:19.350667526 Failed to create EFI Boot Manager entries (UEFI bootloader '\''/boot/efi/EFI/centos/grubx64.efi'\'' not found under target /mnt/local)'
1035: 2021-05-06 02:57:19.350667526 Failed to create EFI Boot Manager entries (UEFI bootloader '/boot/efi/EFI/centos/grubx64.efi' not found under target /mnt/local)
++ PrintError 'Failed to create EFI Boot Manager entries (UEFI bootloader '\''/boot/efi/EFI/centos/grubx64.efi'\'' not found under target /mnt/local)'
++ return 1
+ source_return_code=1

jsmeix commented at 2021-05-06 08:06:

@cvijayvinoth
I wonder why you wrote "Unable to perform the recovery option"
because as far as I see in your
https://github.com/rear/rear/files/6432513/rear-localhost.log

++ Log 'Finished rear recover in 98 seconds'
++ echo '1035: 2021-05-06 02:57:22.600478801 Finished rear recover in 98 seconds'
1035: 2021-05-06 02:57:22.600478801 Finished rear recover in 98 seconds
++ is_true 1
++ case "$1" in
++ return 0

so "rear recover" shows that it
"Failed to create EFI Boot Manager entries"
because "UEFI bootloader '/boot/efi/EFI/centos/grubx64.efi' not found under target /mnt/local"
but nevertheless "rear recover" finished successfully.

So my question:
Does the recreated system boot?
If not, did it ever boot for your tests before?

pcahyna commented at 2021-05-06 08:17:

@cvijayvinoth can you please provide a similar debug log for ReaR before my latest change (merged yesterday) on the same system? I wonder how it worked (or maybe did not) back then.

jsmeix commented at 2021-05-06 08:24:

As far as I can see from looking at the matching code change here

-# (cf. https://github.com/rear/rear/pull/2051/files#r258826856):
-test -f "$TARGET_FS_ROOT/$UEFI_BOOTLOADER" || return 0
+# (cf. https://github.com/rear/rear/pull/2051/files#r258826856)
+# but using BIOS conflicts with USING_UEFI_BOOTLOADER is true
+# i.e. we should create EFI Boot Manager entries but we cannot:
+if ! test -f "$TARGET_FS_ROOT/$UEFI_BOOTLOADER" ; then
+    LogPrintError "Failed to create EFI Boot Manager entries (UEFI bootloader '$UEFI_BOOTLOADER' not found under target $TARGET_FS_ROOT)"
+    return 1
+fi

the actual behaviour here did not change.
Before it skipped things silently,
now it skips verbosely with a well explanatory message.

pcahyna commented at 2021-05-06 08:45:

+if ! test -f "$TARGET_FS_ROOT/$UEFI_BOOTLOADER" ; then
+    LogPrintError "Failed to create EFI Boot Manager entries (UEFI bootloader '$UEFI_BOOTLOADER' not found under target $TARGET_FS_ROOT)"

the actual behaviour here did not change.
Before it skipped things silently,
now it skips verbosely with a well explanatory message.

The explanatory message above is only one among several error messages. According to the log, we got this printed:

WARNING:
For this system
RedHatEnterpriseServer/7 on Linux-i386 (based on Fedora/7/i386)
there is no code to install a boot loader on the recovered system
or the code that we have failed to install the boot loader correctly.
Please contribute appropriate code to the Relax-and-Recover project,
see http://relax-and-recover.org/development/
Take a look at the scripts in /usr/share/rear/finalize - for example
for PC architectures like x86 and x86_64 see the script
/usr/share/rear/finalize/Linux-i386/660_install_grub2.sh
and for POWER architectures like ppc64le see the script
/usr/share/rear/finalize/Linux-ppc64le/660_install_grub2.sh
---------------------------------------------------
|  IF YOU DO NOT INSTALL A BOOT LOADER MANUALLY,  |
|  THEN YOUR SYSTEM WILL NOT BE ABLE TO BOOT.     |
---------------------------------------------------
You can use 'chroot /mnt/local bash --login'
to change into the recovered system and
manually install a boot loader therein.

and this gets printed in the case of $NOBOOTLOADER. This was not changed. So it is surprising that the same message at the end was not printed before. Or was it? We need to know the previous behavior.

pcahyna commented at 2021-05-06 08:53:

By the way, what bothers me about NOBOOTLOADER is that it is not clear how to test its value. In usr/share/rear/finalize/Linux-i386/660_install_grub2.sh we have this

is_true $NOBOOTLOADER || return 0

and this got copied into usr/share/rear/finalize/Linux-i386/670_run_efibootmgr.sh now.
OTOH, in usr/share/rear/finalize/default/890_finish_checks.sh, where the big WARNING gets printed, we have:

elif test "$NOBOOTLOADER" ; then
    LogPrint "WARNING:

so it is not clear whether nonempty means true or one needs to set a "true-ish" value (cf. is_true()) to mean true. (According to the comment in https://github.com/rear/rear/blob/33bce12e1ca1c1f49f65565a7326e7177b794ec3/usr/share/rear/finalize/default/050_prepare_checks.sh#L7 the latter is correct.)
But I don't think that's the problem here.
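
For illustration, a small self-contained sketch of the discrepancy (the is_true function below is a simplified stand-in modeled on ReaR's helper, not copied verbatim):

function is_true () {
    case "$1" in
        ([tT] | [yY] | [yY][eE][sS] | [tT][rR][uU][eE] | 1) return 0 ;;
    esac
    return 1
}

NOBOOTLOADER="no"    # non-empty, but a "false-ish" value

# the 660_install_grub2.sh style check treats it as false, so bootloader handling gets skipped:
is_true $NOBOOTLOADER && echo "bootloader still missing - try to install one"

# the 890_finish_checks.sh style check treats it as true, so the big WARNING still gets printed:
test "$NOBOOTLOADER" && echo "WARNING: no boot loader installed"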

pcahyna commented at 2021-05-06 09:06:

@cvijayvinoth we need to know especially whether before yesterday's change you also saw this

WARNING:
For this system
RedHatEnterpriseServer/7 on Linux-i386 (based on Fedora/7/i386)
there is no code to install a boot loader on the recovered system

Since rear recover is always run in verbose mode and this message gets printed using LogPrint, it should have been shown, if this point was reached.

By the way, on my CentOS 7 system /boot/efi/EFI/centos/grubx64.efi does exist. What files do you have under /boot/efi/EFI/centos?

cvijayvinoth commented at 2021-05-06 09:14:

@cvijayvinoth
I wonder why you wrote "Unable to perform the recovery option"
because as far as I see in your
https://github.com/rear/rear/files/6432513/rear-localhost.log

++ Log 'Finished rear recover in 98 seconds'
++ echo '1035: 2021-05-06 02:57:22.600478801 Finished rear recover in 98 seconds'
1035: 2021-05-06 02:57:22.600478801 Finished rear recover in 98 seconds
++ is_true 1
++ case "$1" in
++ return 0

so "rear recover" shows that it
"Failed to create EFI Boot Manager entries"
because "UEFI bootloader '/boot/efi/EFI/centos/grubx64.efi' not found under target /mnt/local"
but nevertheless "rear recover" finished successfully.

So my question:
Does the recreated system boot?
If not, did it ever boot for your tests before?

This is the first time I am trying a RAID scenario with OS mirroring in ReaR 2.6. I haven't tested it in previous versions.

cvijayvinoth commented at 2021-05-06 09:17:

/boot/efi/EFI/centos/

Yes, the /boot/efi/EFI/centos/grubx64.efi file exists on the source machine.

pcahyna commented at 2021-05-06 09:19:

/boot/efi/EFI/centos/

Yes, the /boot/efi/EFI/centos/grubx64.efi file exists on the source machine.

And when recover completes, what do you see under /mnt/local? In particular, under /mnt/local/boot/efi/EFI/ ? It looks like something was not restored properly.

pcahyna commented at 2021-05-06 09:27:

I must say that the messages

1035: 2021-05-06 02:56:46.839801963 Restoring rsync archive from '192.168.1.123:rsync_backup'
@ERROR: auth failed on module rsync_backup
rsync error: error starting client-server protocol (code 5) at main.c(1683) [receiver=2.6.9]

and the subsequent

chroot: failed to run command '/bin/bash': No such file or directory

look quite suspect...

jsmeix commented at 2021-05-06 09:31:

@pcahyna
regarding that NOBOOTLOADER thingy:
I have had my own fights with it a longer time ago.
Personally I do not like it: Neither its semantics nor its code.
But I found no energy to also clean up that stuff so I kept things as is.

jsmeix commented at 2021-05-06 09:51:

@cvijayvinoth
the root cause is what @pcahyna shows in
https://github.com/rear/rear/issues/2612#issuecomment-833379533

I.e. restoring your rsync archive failed.

Matching excerpts from
https://github.com/rear/rear/files/6432513/rear-localhost.log

+ source /usr/share/rear/restore/RSYNC/default/400_restore_rsync_backup.sh
...
1035: 2021-05-06 02:56:46.839801963 Restoring rsync archive from '192.168.1.123:rsync_backup'
...
++ rsync --sparse --archive --hard-links --numeric-ids --stats -z --progress --password-file=/var/www/html/imageBackup/user_profile/diskimage/rsync_pass rsync://diskimage@192.168.1.123:1873/rsync_backup/diskimage_localhost/backup/ /mnt/local/
@ERROR: auth failed on module rsync_backup
rsync error: error starting client-server protocol (code 5) at main.c(1683) [receiver=2.6.9]
++ echo 5
...
++ ProgressStop
++ echo '1035: OK'
...
+ source_return_code=0
...
1035: 2021-05-06 02:56:51.854871947 Leaving debugscript mode (back to previous bash flags and options settings).

So restoring your rsync archive lasted only 5 seconds (from 02:56:46 until 02:56:51)
which is clearly not a successful restore of all the files of the system.

@pcahyna
sigh - again a (not so) nice case where ReaR blindly proceeds
and then shows errors somewhere later at an unrelated place
with a weird-looking error message which leads to false assumptions
about what the root cause is, cf. "Try hard to care about possible errors" in
https://github.com/rear/rear/wiki/Coding-Style

jsmeix commented at 2021-05-06 10:02:

@pcahyna
FYI what I often use to extract from a ReaR log file
almost all the messages that ReaR shows on the user's terminal:

# grep ' Print' rear-localhost.log.issue2612 | less

For https://github.com/rear/rear/files/6432513/rear-localhost.log
this results in (excerpts):

++ Print 'Restoring rsync archive from '\''192.168.1.123:rsync_backup'\'''
++ Print 'Created SELinux /mnt/local/.autorelabel file : after reboot SELinux will relabel all files'
++ Print 'Recreating directories (with permissions) from /var/lib/rear/recovery/directories_permissions_owner_group'
++ PrintError 'Failed to '\''chown root:root etc'\'' '
...
++ PrintError 'Failed to '\''chown root:root var/tmp'\'' '
...
++ Print 'Creating EFI Boot Manager entries...'
++ PrintError 'Failed to create EFI Boot Manager entries (UEFI bootloader '\''/boot/efi/EFI/centos/grubx64.efi'\'' not found under target /mnt/local)'
++ Print 'WARNING:

Those many 'Failed to chown ...' messages also indicate
that nothing was restored, i.e. that /mnt/local is basically empty.

jsmeix commented at 2021-05-06 10:08:

It seems BACKUP=RSYNC is yet another "needs cleanup place" in ReaR,
cf. https://github.com/rear/rear/pull/2577

pcahyna commented at 2021-05-06 10:24:

grep ' Print' rear-localhost.log.issue2612 | less

I used grep -v '^\+' rear-localhost-issue2612.log | less. I suspect your method will not show multiline output messages?

jsmeix commented at 2021-05-06 10:32:

@pcahyna
grep -v '^+' shows basically everything except the set -x output,
so it shows much more than what the user gets on his terminal.
In contrast, grep ' Print' does not show multi-line output
and some other things like UserOutput.
To get an initial useful overview of what the user sees
on his terminal, grep ' Print' should be mostly sufficient.

cvijayvinoth commented at 2021-05-06 10:57:

The issue was due to a network fluctuation on the rsync server and a permission issue with the password file. I fixed it on my end. Thanks @pcahyna @jsmeix

pcahyna commented at 2021-05-13 16:05:

I believe that the problem is that we store the exit code from rsync here:
https://github.com/rear/rear/blob/3a6993968852791a63fac49bce6e81d52e258622/usr/share/rear/restore/RSYNC/default/400_restore_rsync_backup.sh#L40
but then we don't use it, we instead use the exit code of the background process that runs rsync:
https://github.com/rear/rear/blob/3a6993968852791a63fac49bce6e81d52e258622/usr/share/rear/restore/RSYNC/default/400_restore_rsync_backup.sh#L75
https://github.com/rear/rear/blob/3a6993968852791a63fac49bce6e81d52e258622/usr/share/rear/restore/RSYNC/default/400_restore_rsync_backup.sh#L76
Unfortunately, the exit code of the background process is not the exit code of rsync: it is the exit code of the last command in the subshell, which is not rsync but echo (see the first reference), and that virtually always succeeds. So error returns from rsync are ignored, although they are properly stored in $TMPDIR/tmp/retval (I checked).
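
A minimal standalone sketch of that pattern (hypothetical temp file and paths, not the actual ReaR script) shows that the value reported by wait hides the rsync failure while the stored exit code keeps it:

tmpfile=$(mktemp)

( rsync --archive /nonexistent/source/ /tmp/rsync-target/
  echo $? >"$tmpfile"    # the rsync exit code is stored here for later use
) &
backup_pid=$!

wait $backup_pid
_rc=$?    # exit code of the subshell = exit code of the final 'echo', i.e. 0

echo "wait reported: $_rc"                           # 0 even though rsync failed
echo "stored rsync exit code: $(cat "$tmpfile")"     # the real failure code (e.g. 23)

# the fix: test the stored value instead of the subshell's exit code
_rc=$(cat "$tmpfile")
test "$_rc" -gt 0 && echo "WARNING: rsync failed with exit code $_rc"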

pcahyna commented at 2021-05-13 16:08:

Also, even if this code worked properly, would it be enough to react with a LogPrint "WARNING !, as the code intends now? https://github.com/rear/rear/blob/3a6993968852791a63fac49bce6e81d52e258622/usr/share/rear/restore/RSYNC/default/400_restore_rsync_backup.sh#L79
Shouldn't we rather abort the recover process? (I have thought that warnings are a waste of everyone's time :-) )

pcahyna commented at 2021-05-15 11:24:

@jsmeix please assign the issue to me, I will attempt a cleanup of the rsync code (while #2577 is probably not a problem anymore, the discussion shows several more bugs in this area).

jsmeix commented at 2021-05-17 11:55:

@pcahyna
thank you in advance for your work here!

jsmeix commented at 2021-05-17 13:00:

@pcahyna
regarding your question "Shouldn't we rather abort the recover process?":

In general I don't like to abort "rear recover" after the backup was restored.
After the backup was restored I would only abort "rear recover"
when it is actually impossible to continue in any reasonable way.

Reason:
Usually restoring the backup takes up most of the "rear recover" time.
When the backup restore program exits with a non-zero exit code it can mean a fatal error
or a non-fatal error, depending on the particular backup restore program,
e.g. "man tar" reads (excerpts):

0      Successful
1      Some files differ
2      Fatal error

For a more interesting example see the exit code handling in
restore/TSM/default/400_restore_with_tsm.sh
which also shows an example where a backup restore program
is called several times so a hard Error exit is likely not good
when only one particular backup restore program call failed.

But even when a backup restore program exits with "fatal error" it depends
on the particular backup restore program and the user's environment
what that actually means for the user who runs "rear recover".
Perhaps the backup restore program could not restore 1234 files
but all those files are in /some/unimportant/directory/
where the user does not care if that stuff was restored during "rear recover"
because he can easily restore it later in the recreated system as needed.
But in this case the user would care that "rear recover" continues and
installs the bootloader of the recreated system and things like that
to get his system back recreated in a basically usable state.

Cf. what we do in restore/NETFS/default/400_restore_backup.sh

LogPrint "Backup restore program $BACKUP_PROG failed with exit code $backup_prog_exit_code, check $RUNTIME_LOGFILE and $backup_restore_log_file and the restored system"
is_true "$BACKUP_INTEGRITY_CHECK" && Error "Integrity check failed, restore aborted because BACKUP_INTEGRITY_CHECK is enabled"

(That LogPrint should be LogPrintError but in practice it does not matter currently.)

pcahyna commented at 2021-05-17 13:20:

Should I return a nonzero exit code from the script if rsync returns an error? I know that it does not make a difference now, but it gives a chance to detect such (possibly fatal, possibly non-fatal) errors globally in the future.
Should I also change the LogPrint "WARNING ! to LogPrintError "WARNING ! ?
And what about 500_make_rsync_backup? It has this

# everyone should see this warning, even if not verbose
test "$_rc" -gt 0 && VERBOSE=1 LogPrint "WARNING !
There was an error (${rsync_err_msg[$_rc]}) during archive creation.

Should I use LogPrintError "WARNING ! instead of VERBOSE=1 LogPrint "WARNING ! ? It seems to have basically the same meaning and is more idiomatic.
More importantly, shouldn't we abort the backup run when this happens? The warning message says that

Since errors are often related to files that cannot be saved by
$BACKUP_PROG, we will continue the $WORKFLOW process. However, you MUST
verify the backup yourself before trusting it !

but is that reasoning correct? Who will read the warning message before it is too late and one needs to restore from a broken backup?

jsmeix commented at 2021-05-17 13:46:

Of course scripts that actually fail should return 1
and LogPrintError should be used for error messages
where ReaR should not error out.

In contrast to not erroring out during "rear recover" after the backup was restored
(even when there have been errors during backup restore),
we could error out during "rear mkbackup"
when there have been real errors while making the backup.

Cf. what we do in backup/NETFS/default/500_make_backup.sh
in case of tar as backup program

if (( $backup_prog_rc == 1 )); then
    LogUserOutput "WARNING: ..."
else
    rc=$(cat $FAILING_BACKUP_PROG_RC_FILE)
    Error "$prog failed with return code $rc ..."

jsmeix commented at 2021-05-17 14:06:

@pcahyna
only FYI if you have a closer look at our current
backup/NETFS/default/500_make_backup.sh and
restore/NETFS/default/400_restore_backup.sh
you may notice things smell somewhat, cf.
https://github.com/rear/rear/issues/2265

pcahyna commented at 2021-05-17 14:25:

Of course scripts that actually fail should return 1
and LogPrintError should be used for error messages
where ReaR should not error out.

Now it is not clear to me what exactly qualifies as "actually fail". We seem to have two possible cases:

  1. ReaR should abort
  2. something is broken so we should be very loud about it but we should not abort according to your reasoning "... even when a backup restore program exits with "fatal error" it depends on the particular backup restore program and the user's environment what that actually means for the user who runs "rear recover".
    Perhaps the backup restore program could not restore 1234 files but all those files are in /some/unimportant/directory/ where the user does not care ..."

According to your reasoning most of the cases should fall under 2. and 1. should be used sparsely, if at all.
Now, "scripts that actually fail should return 1" refers to 1. or to 2. ? Recently we added return 1 to some error branches if bootloader installation scripts and we agreed that we should not abort in those cases, only show an error message and continue, i.e. case 2. (Makes sense, because a restored system without EFI variables properly set is quite easy to repair for an experienced user, thus falls under "recreated in a basically usable state".)

By the way,

Cf. what we do in restore/NETFS/default/400_restore_backup.sh

LogPrint "Backup restore program $BACKUP_PROG failed with exit code $backup_prog_exit_code, check $RUNTIME_LOGFILE and $backup_restore_log_file and the restored system"
is_true "$BACKUP_INTEGRITY_CHECK" && Error

Sounds good; should I use BACKUP_INTEGRITY_CHECK even in the case of rsync? I.e. when it is defined, be more strict about rsync error codes and Error out, instead of printing a message and continuing? That sounds good for consistency between NETFS and RSYNC.
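
A rough sketch of how that could look for RSYNC (an assumption about future code, not the current script; the retval file below is a placeholder for wherever the restore script stores the rsync exit code):

# read back the rsync exit code that the restore script stored earlier:
backup_prog_rc=$(cat "$TMP_DIR/retval")
if test "$backup_prog_rc" -gt 0 ; then
    LogPrintError "rsync restore failed with exit code $backup_prog_rc, check $RUNTIME_LOGFILE and the restored system"
    # only abort when the user explicitly asked for strict behaviour:
    is_true "$BACKUP_INTEGRITY_CHECK" && Error "Restore aborted because BACKUP_INTEGRITY_CHECK is enabled"
    return 1
fi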

In contrast to not error out during "rear recover" after the backup was restored
even when there have been errors during backup restore
we could error out during "rear mkbackup"
when there have been real errors when making the backup.

Cf. what we do in backup/NETFS/default/500_make_backup.sh
in case of tar as backup program

if (( $backup_prog_rc == 1 )); then
    LogUserOutput "WARNING: ..."
else
    rc=$(cat $FAILING_BACKUP_PROG_RC_FILE)
    Error "$prog failed with return code $rc ..."

Sure, but that's not what the code is doing currently, with the reasoning that "Since errors are often related to files that cannot be saved by $BACKUP_PROG, we will continue the $WORKFLOW process. ". The change to Error would be a functional change, so I wanted to ask whether it is desirable.

pcahyna commented at 2021-05-17 14:33:

@pcahyna
only FYI if you have a closer look at our current
backup/NETFS/default/500_make_backup.sh and
restore/NETFS/default/400_restore_backup.sh
you may notice things smell somewhat, cf.
#2265

I actually did not intend to extend the cleanup to NETFS, so this is not good news :-/

pcahyna commented at 2021-05-17 22:19:

@jsmeix first version of the intended cleanup and fixes is here: https://github.com/pcahyna/rear/commits/cleanup-fix-rsync

jsmeix commented at 2021-05-18 11:00:

@pcahyna
regarding your
https://github.com/rear/rear/issues/2612#issuecomment-842369024

It is also not clear to me what exactly qualifies as "actually fail".
I decide things case by case via some kind of "reasonable gut feeling"
with the following general rules (except exceptions, of course):

Currently scripts that actually fail should return 1,
which has no real effect except a debug message in the log. That makes
it easier for us to see where things failed when a user provides his log,
and the return 1 in the script code makes it clear that things failed at that place,
so currently a return 1 in a script is also explicit syntax that makes it easier
for others who read the code to understand what goes on.

All I know about BACKUP_INTEGRITY_CHECK is what default.conf reads

Enable integrity check of the backup archive
(only with BACKUP=NETFS and BACKUP_PROG=tar)

Currently BACKUP_INTEGRITY_CHECK is used only in

conf/default.conf

backup/NETFS/default/500_make_backup.sh
verify/NETFS/default/550_check_backup_archive.sh
restore/NETFS/default/400_restore_backup.sh

verify/YUM/default/550_check_backup_archive.sh
restore/YUM/default/410_restore_backup.sh

but not in backup/YUM/default/500_make_backup.sh (no idea why not).

pcahyna commented at 2021-06-18 12:03:

@jsmeix can you please look at the branch again? I think the code is complete (but untested). I will do some tests and open a formal PR.

jsmeix commented at 2021-06-18 12:25:

@pcahyna
I will have a look - but likely not this week (I'd like to leave rather soon today).

jsmeix commented at 2021-06-21 07:16:

@pcahyna
could you rebase your https://github.com/pcahyna/rear/commits/cleanup-fix-rsync
on the current upstream master so that I could see only the differences
that actually belong to your cleanup-fix-rsync branch?

What I did:

# git clone https://github.com/pcahyna/rear.git

# mv rear rear.pcahyna.cleanup-fix-rsync

# cd rear.pcahyna.cleanup-fix-rsync

# git checkout cleanup-fix-rsync

But now I fail to find a git command that shows only the sum of the changes
(i.e. not a sequence of individual commits) in diff -u format
that actually belong to the current cleanup-fix-rsync branch.

E.g. with

# git merge-base master cleanup-fix-rsync
d0d78bf2aa0c8d69242741dfb5cbe77de1d3950d

then git diff d0d78bf2aa0c8d69242741dfb5cbe77de1d3950d
shows 323 changes, most of which do not actually belong to this branch.

Also things like
https://stackoverflow.com/questions/4649356/how-do-i-run-git-log-to-see-changes-only-for-a-specific-branch
or
https://stackoverflow.com/questions/14848274/git-log-to-get-commits-only-for-a-specific-branch
or
https://stackoverflow.com/questions/20808892/git-diff-between-current-branch-and-master-but-not-including-unmerged-master-com/20809283
did not really help me in this case (I tried a lot but none did what I actually wanted).

Perhaps it is just my usual problem with git that I do not yet see the right command :-(

As a workaround to find the commit that the changes of the cleanup-fix-rsync branch
are actually based on, I did

# git log --format="%ae %H" --graph | less

* pcahyna@redhat.com b39e9873ed7df310f4084050e484f360dd64bac9
* pcahyna@redhat.com 28dbe39f19206e8c1c5ebd6fcc99b0c1d1dbeafe
* pcahyna@redhat.com 06f2e4d80a6502a42e165e1eb27fb1e5c1fd2c45
* pcahyna@redhat.com c51f439129524170300044fd6c5c2d494d91b2cb
* pcahyna@redhat.com fb5db6df98356b5b77e00d47860b20416a6ad23e
* pcahyna@redhat.com a1318e6d5a8b680a6253b7f9d72e472a8ec5c6b5
* pcahyna@redhat.com 045b55da8f63dac7e63132127ba069f5dc23ed3a
* pcahyna@redhat.com d59cbd9e8afed7e8e7d5e21d03a2978d5272541e
* pcahyna@redhat.com e8906949ebdb9fcd830855e662cd8cd465b81c51
* pcahyna@redhat.com 89002e356c3ba79bec48e1e7d07a805f78c46625
* pcahyna@redhat.com 40d4cd8be70e78475a6926d5e4ead4db01d05ffc
* pcahyna@redhat.com 7a2feba3697eba851169c11ce47549c6f4cde575
* pcahyna@redhat.com 89f0f12738201865be9d546d1d87bb821214eeeb
* pcahyna@redhat.com 20383420e6ce94f43134617e47441c866a9627d5
* pcahyna@redhat.com d4c56df04f0299e7d89a392d5db6ddab58bc681e
* pcahyna@redhat.com 7617cbd29fe4769fc4292223fd40b946b73850e0
* pcahyna@redhat.com 817b801eaf8211a0548c626eaad0b198fffe8df4
* pcahyna@redhat.com 6749089fd350ab2aa1a05988734741bb938a3b54
* pcahyna@redhat.com fcab5ebf85ab43c1525926c9344554cdbbdcb38c
* pcahyna@redhat.com 0112f7ef4e0aae6de552af43b39b5b47157402cd
* pcahyna@redhat.com d48d451e0ec4247bd8be1fa80182f059c143d8be
* pcahyna@redhat.com d491907ec9787e9f0edcb56ea562646f49f1d3e7
* pcahyna@redhat.com 7afb163418da249d4a6b06d87025ac4b81a10575
* pcahyna@redhat.com 35f6dee662bdd523157c4ed5add06797a35579c8
* pcahyna@redhat.com 2f5e9e35780a8e28b62de0d109cccce1a2cde1c7
* pcahyna@redhat.com 58c8bd56c21fa9a59367d53c2cf6df38673f5c81
*   jsmeix@suse.com c6500f9c42f1e68a5aaf36d556b144cbb8e69369
...

So I could assume the changes of the cleanup-fix-rsync branch
are actually based on c6500f9c42f1e68a5aaf36d556b144cbb8e69369
(but that contradicts what git merge-base master cleanup-fix-rsync shows)
and then git diff c6500f9c42f1e68a5aaf36d556b144cbb8e69369
shows only 19 changes which looks reasonable.

pcahyna commented at 2021-06-21 08:21:

git diff c6500f9c42f1e68a5aaf36d556b144cbb8e69369 is the right command and it works for me. This is also what git merge-base master cleanup-fix-rsync shows for me. The problem is, in your clone, master is the one from my fork, not from the official repo, and I have not updated my fork in a while.
But yes, I will rebase the branch.
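
For reference, a sketch of how the diff against the merge base could be reproduced in a fresh clone of the fork (the remote name "upstream" is an assumption, not something used in this thread):

# git clone https://github.com/pcahyna/rear.git
# cd rear
# git remote add upstream https://github.com/rear/rear.git
# git fetch upstream
# git checkout cleanup-fix-rsync
# git diff upstream/master...cleanup-fix-rsync

The three-dot form diffs from the merge base of upstream/master and the branch tip, i.e. it shows only the changes that belong to the branch.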

jsmeix commented at 2021-06-21 08:23:

Based on what I see by plain looking at what
git diff c6500f9c42f1e68a5aaf36d556b144cbb8e69369
shows in the cleanup-fix-rsync branch:

In your usr/share/rear/restore/RSYNC/default/200_remove_relative_rsync_option.sh
there is

-if grep -q -- "--relative" <<< $(echo ${BACKUP_RSYNC_OPTIONS[@]}); then
+if grep -q -- "--relative" <<< "${BACKUP_RSYNC_OPTIONS[*]}" ; then

-if grep -q -- "-R" <<< $(echo ${BACKUP_RSYNC_OPTIONS[@]}); then
+if grep -q -- "-R" <<< "${BACKUP_RSYNC_OPTIONS[@]}" ; then

I think the second case should also be "${BACKUP_RSYNC_OPTIONS[*]}"

Right now I did not see other real issues.

Only if you like and when you think it could also be included "by the way"
in a pull request for this issue here:

In your usr/share/rear/prep/RSYNC/default/150_check_rsync_protocol_version.sh
there is now

if [ "${RSYNC_USER}" != "root" -a $RSYNC_PROTO = "ssh" ]; then
    if [ $RSYNC_PROTOCOL_VERSION -gt 29 ]; then
        if grep -q "no xattrs" "$TMP_DIR/rsync_protocol"; then
            # no xattrs available in remote rsync, so --fake-super is not possible
            Error "rsync --fake-super not possible on system ($RSYNC_HOST) (no xattrs compiled in rsync)"
        else
            ...
        fi
    else
        Error "rsync --fake-super not possible on system ($RSYNC_HOST) (please upgrade rsync to 3.x)"
    fi
fi

I think this could be simplified to something like

if [ "${RSYNC_USER}" != "root" -a $RSYNC_PROTO = "ssh" ] ; then
    test $RSYNC_PROTOCOL_VERSION -gt 29 || Error "rsync --fake-super not possible on rsync host '$RSYNC_HOST' using rsync protocol version '$RSYNC_PROTOCOL_VERSION' (at least 3.x is needed)"
    # no xattrs available in remote rsync, so --fake-super is not possible
    grep -q "no xattrs" "$TMP_DIR/rsync_protocol" && Error "rsync --fake-super not possible on rsync host '$RSYNC_HOST' (no xattrs compiled in rsync)"
    ...
fi

i.e. less nested conditions (not needed here because Error exits)
cf. "Return early, return often" in
https://github.com/rear/rear/wiki/Coding-Style
which means here "Error out early in the code text".

In general stdout and stderr redirections are no longer needed
since stdout and stderr are redirected to /dev/null or to the log in debug modes,
so you may remove stdout and stderr redirections if there is no real need for them.
There is a real need to redirect stdout and/or stderr to /dev/null to avoid overly verbose
command messages or to avoid messages that are false alarms in any case.
Cf. https://github.com/rear/rear/issues/1395
and "What to do with stdin, stdout, and stderr" in
https://github.com/rear/rear/wiki/Coding-Style
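
For illustration, a hedged sketch of the two cases (hypothetical commands, not taken from an actual ReaR script):

# redirection that can simply be dropped - stdout and stderr of commands
# already go to /dev/null or to the log depending on the mode:
mdadm --detail --scan

# redirection that is still legitimate - hide an expected message
# that would only be a false alarm (e.g. "not mounted"):
umount "$TARGET_FS_ROOT/proc" 2>/dev/null || true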

Rather old ReaR scripts may still have 8 blanks or tab character indentation
but 4 blanks indentation is the recommended way, cf. "Text layout" in
https://github.com/rear/rear/wiki/Coding-Style

jsmeix commented at 2021-06-21 08:25:

@pcahyna
thank you for the explanation why git merge-base master cleanup-fix-rsync
did not work for me - as so often, one needs to know the details behind it.

jsmeix commented at 2021-06-21 09:16:

Hmmm...
I do not understand why git merge-base master cleanup-fix-rsync
works for you but does not work for me because I did

git clone https://github.com/pcahyna/rear.git

so I expected to have the same (i.e. a true clone of) what you have.
Strange...
I will never ever understand git.

pcahyna commented at 2021-06-21 09:19:

You don't have what I have locally, you have what I have in my fork on GitHub.
Locally, I have several remotes: one is the official repo, the other is my fork. And the master branch is from the official repo, not from my fork.

pcahyna commented at 2021-06-21 09:51:

Based on what I see by plain looking at what
git diff c6500f9c42f1e68a5aaf36d556b144cbb8e69369
shows in the cleanup-fix-rsync branch:

In your usr/share/rear/restore/RSYNC/default/200_remove_relative_rsync_option.sh
there is

-if grep -q -- "--relative" <<< $(echo ${BACKUP_RSYNC_OPTIONS[@]}); then
+if grep -q -- "--relative" <<< "${BACKUP_RSYNC_OPTIONS[*]}" ; then

-if grep -q -- "-R" <<< $(echo ${BACKUP_RSYNC_OPTIONS[@]}); then
+if grep -q -- "-R" <<< "${BACKUP_RSYNC_OPTIONS[@]}" ; then

I think the second case should also be "${BACKUP_RSYNC_OPTIONS[*]}"

Thanks for catching this, fixed now.

jsmeix commented at 2021-06-21 10:09:

Ahh - you had meant your local checkout - now it is clear - even for me :-)
So it is yet another "level of indirection" issue (RFC 1925 item 6a)
or "tertium datur" - in practice there is always "one more case" (RFC 1925 item 8).

pcahyna commented at 2021-06-21 12:11:

@jsmeix, I rebased and submitted a PR. I have not implemented your suggested simplification above in order to keep the amount of changes small (it is quite a big diff already).

jsmeix commented at 2021-06-24 12:48:

With https://github.com/rear/rear/pull/2632 merged
this issue should be fixed.


[Export of Github issue for rear/rear.]