#2258 Issue closed: 500_make_backup.sh fails inexplicably because of a hidden 'no space left on device' error

Labels: enhancement, support / question, fixed / solved / done, minor bug

jacekadamus88 opened issue at 2019-10-18 13:37:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • Relax-and-Recover 2.5 / 2019-05-10

  • Centos 7

OUTPUT=ISO
OUTPUT_URL=null
ISO_DIR=/var/backup/main
ISO_PREFIX=dr-backup-$(hostname)-$(date "+%Y-%m-%dT%H-%M-%S")
BACKUP=NETFS
BACKUP_URL=iso://backup
BACKUP_PROG_EXCLUDE=("${BACKUP_PROG_EXCLUDE[@]}" '/media/*' '/mnt/*' '/var/tmp/*' '/var/crash/*' '/var/backup/*' '/tmp/*')
USING_UEFI_BOOTLOADER=1
SECURE_BOOT_BOOTLOADER="/boot/efi/EFI/centos/shim.efi"
  • BareMetal

  • x86_64

  • UEFI

  • NVMe

  • rear -v mkbackup

  • Haven't found any. Everything was used to work a month ago, last successful backup has been taken 22nd SEP. Nothing really changed on the system, expect updates and new apps.

  • Currently there is a problem with creating a tar archive due to unknown reasons, or at least I couldn't find out the reason by myself. Would really appreciate any piece of advice on how to get Rear working again.

Console output of the rear -dv mkbackup command

```
rear mkbackup -dv
Relax-and-Recover 2.5 / 2019-05-10
Running rear mkbackup (PID 22225)
Using log file: /var/log/rear/rear-main.log
Using backup archive '/tmp/rear.Azj8RJHxymh4LIx/tmp/isofs/backup/backup.tar.gz'
Found EFI system partition /dev/nvme0n1p1 on /boot/efi type vfat
Using UEFI Boot Loader for Linux (USING_UEFI_BOOTLOADER=1)
Using autodetected kernel '/boot/vmlinuz-3.10.0-1062.1.2.el7.x86_64' as kernel in the recovery system
Creating disk layout
Docker is running, skipping filesystems mounted below Docker Root Dir /var/lib/docker
Using guessed bootloader 'EFI' (found in first bytes on /dev/nvme0n1)
Verifying that the entries in /var/lib/rear/layout/disklayout.conf are correct ...
Creating root filesystem layout
Skipping 'docker0': not bound to any physical interface.
Handling network interface 'eno1'
eno1 is a physical device
Handled network interface 'eno1'
To log into the recovery system via ssh set up /root/.ssh/authorized_keys or specify SSH_ROOT_PASSWORD
Using '/boot/efi/EFI/centos/shim.efi' as UEFI bootloader file
Copying logfile /var/log/rear/rear-main.log into initramfs as '/tmp/rear-main-partial-2019-10-18T13:48:08+0200.log'
Copying files and directories
Copying binaries and libraries
Copying all kernel modules in /lib/modules/3.10.0-1062.1.2.el7.x86_64 (MODULES contains 'all_modules')
Copying all files in /lib*/firmware/
Broken symlink '/usr/lib/firmware/3.10.0-957.12.2.el7.x86_64/intel-ucode/06-55-06' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.12.2.el7.x86_64/intel-ucode/06-55-05' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.12.2.el7.x86_64/intel-ucode/06-55-03' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.21.3.el7.x86_64/intel-ucode/06-55-06' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.21.3.el7.x86_64/intel-ucode/06-55-05' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.21.3.el7.x86_64/intel-ucode/06-55-03' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.12.1.el7.x86_64/intel-ucode/06-55-06' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.12.1.el7.x86_64/intel-ucode/06-55-05' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.12.1.el7.x86_64/intel-ucode/06-55-03' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.el7.x86_64/intel-ucode/06-55-06' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.el7.x86_64/intel-ucode/06-55-05' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/firmware/3.10.0-957.el7.x86_64/intel-ucode/06-55-03' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/modules/3.10.0-1062.1.2.el7.x86_64/source' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/modules/3.10.0-1062.1.2.el7.x86_64/build' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/etc/grub2.cfg' in recovery system because 'readlink' cannot determine its link target
Skip copying broken symlink '/etc/mtab' target '/proc/3462/mounts' on /proc/ /sys/ /dev/ or /run/
Testing that the recovery system in /tmp/rear.Azj8RJHxymh4LIx/rootfs contains a usable system
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (257938795 bytes) in 18 seconds
Creating tar archive '/tmp/rear.Azj8RJHxymh4LIx/tmp/isofs/backup/backup.tar.gz'
```

Bottom of the /var/log/rear/rear-main.log file, containing the error message details

```
2019-10-18 13:49:04.918909713 Creating tar archive '/tmp/rear.Azj8RJHxymh4LIx/tmp/isofs/backup/backup.tar.gz'
2019-10-18 14:04:08.976409464 ERROR:  failed with return code  and below output:
  ---snip---

  ----------
This means that the archiving process ended prematurely, or did
not even start. As a result it is unlikely you can recover this
system properly. Relax-and-Recover is therefore aborting execution.

===== Stack trace =====
Trace 0: /sbin/rear:541 main
Trace 1: /usr/share/rear/lib/mkbackup-workflow.sh:20 WORKFLOW_mkbackup
Trace 2: /usr/share/rear/lib/framework-functions.sh:116 SourceStage
Trace 3: /usr/share/rear/lib/framework-functions.sh:56 Source
Trace 4: /usr/share/rear/backup/NETFS/default/500_make_backup.sh:252 source
=== End stack trace ===
2019-10-18 14:04:08.981680420 Exiting rear mkbackup (PID 22225) and its descendant processes ...
2019-10-18 14:04:12.095056301 rear,22225 /sbin/rear mkbackup -dv
  `-rear,28022 /sbin/rear mkbackup -dv
      `-pstree,28023 -Aplau 22225
/usr/share/rear/lib/_input-output-functions.sh: line 146: kill: (28028) - No such process
2019-10-18 14:04:12.127059373 Running exit tasks
2019-10-18 14:04:12.128386525 Exit task 'rmdir -v /tmp/rear.Azj8RJHxymh4LIx/outputfs >&2'
rmdir: removing directory, '/tmp/rear.Azj8RJHxymh4LIx/outputfs'
2019-10-18 14:04:12.130638577 Exit task 'cleanup_build_area_and_end_program'
2019-10-18 14:04:12.131872508 Finished in 968 seconds
2019-10-18 14:04:12.133013772 You should also rm -Rf /tmp/rear.Azj8RJHxymh4LIx
2019-10-18 14:04:12.134191381 End of program reached
2019-10-18 14:04:12.135424790 Exit task '(( EXIT_FAIL_MESSAGE )) && echo 'rear mkbackup failed, check /var/log/rear/rear-main.log for details' 1>&8'
2019-10-18 14:04:12.136660273 Exit task 'exec 8>&-'
2019-10-18 14:04:12.137862904 Exit task 'exec 7>&-'
2019-10-18 14:04:12.139110342 Exit task 'exec 6<&-'
2019-10-18 14:04:12.140367359 Exit task ''
```

jacekadamus88 commented at 2019-10-21 00:07:

Seems like, after many hours, I have managed to work around the problem.

Despite the fact that I had my S3 buckets excluded from the backup, they have been root cause of this issue, Everything has been resolved after umounting them completely. It was quite a big surprise to me. I have tried to understand the reason. Although it is not my area of expertise, I strongly believe this might be related to the dd command.

I would appreciate it if someone with bigger knowledge could explain the reason and tell me whether this is a bug in the REAR or somewhere else.

jsmeix commented at 2019-10-21 10:59:

@jacekadamus88
I never used anything like Amazon Simple Storage Service (S3)
where its storage objects are organized into so called 'buckets'
(I read https://en.wikipedia.org/wiki/Amazon_S3 for the first time)
so I know nothing at all about its implications in relation to tar.

According to your
https://github.com/rear/rear/issues/2258#issuecomment-544307011

I had my S3 buckets excluded from the backup,
they have been root cause of this issue,
Everything has been resolved after umounting them completely.

I guess the following happened in your case:

By default ReaR has some automatism what disk components
(like partitions, filesystems, mountpounts) get included or excluded
and some (other but related) automatism what directories get
included or excluded in the backup which do most of the time
what is usually expected but no automatism works perfect.

From my experience with other issues the backup include/exclude functionality
versus the disk components include/exclude functionality in ReaR gets "hairy"
when it comes to unusual cases.

By default ReaR includes basically "what is mounted"
while "rear mkrescue/mkbackup" is running
except some automatically excluded things,
cf. https://github.com/rear/rear/issues/2229

Regarding the backup I implemented some time ago
BACKUP_ONLY_INCLUDE and BACKUP_ONLY_EXCLUDE
see its documentation in the default.conf file
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L999
that should help to specify explicitly what gets included in the backup.

Note that things like BACKUP_ONLY_INCLUDE and BACKUP_ONLY_EXCLUDE
and many other BACKUP_* variables only

apply to all cases of internal Relax-and-Recover backup

see
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L893

Run rear -D mkbackup (-D is debugscript mode)
and inspect the log file what exactly happens while
usr/share/rear/backup/NETFS/default/400_create_include_exclude_files.sh
https://github.com/rear/rear/blob/master/usr/share/rear/backup/NETFS/default/400_create_include_exclude_files.sh
is running.
That should (hopefully) show why a particular directory
is automatically included in the backup,
cf. the section "Debugging issues with Relax-and-Recover" in
https://en.opensuse.org/SDB:Disaster_Recovery

Furthermore use KEEP_BUILD_DIR="yes" cf.
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L128
so that you can after rear mkbackup inspect the contents of the files

$TMPDIR/rear.XXXXXXXXXXXXXXX/tmp/backup-include.txt
$TMPDIR/rear.XXXXXXXXXXXXXXX/tmp/backup-exclude.txt

to see what directories are actually included and excluded in the backup.

gdha commented at 2019-10-21 11:38:

@jacekadamus88 were the s3 buckets mounted according article https://tecadmin.net/mount-s3-bucket-centosrhel-ubuntu-using-s3fs/ ?
If that is the case, then we can confirm that this is not (yet) supported by ReaR. However, you are most welcome to prepare a PR for it - we would really appreciate it.

jsmeix commented at 2019-10-21 11:41:

@jacekadamus88
can we please have a complete rear -D mkbackup debug log file
from your case where it errors out with

2019-10-18 13:49:04.918909713 Creating tar archive '/tmp/rear.Azj8RJHxymh4LIx/tmp/isofs/backup/backup.tar.gz'
2019-10-18 14:04:08.976409464 ERROR:  failed with return code  and below output:
  ---snip---

  ----------

because the matching code in
usr/share/rear/backup/NETFS/default/500_make_backup.sh
is

                rc=$(cat $FAILING_BACKUP_PROG_RC_FILE)
                Error "$prog failed with return code $rc and below output:
  ---snip---
$(grep "^$prog: " "${TMP_DIR}/${BACKUP_PROG_ARCHIVE}.log" | sed -e 's/^/  /' | tail -n3)
  ----------

but in your case both cat $FAILING_BACKUP_PROG_RC_FILE
and grep '^tar: ' "${TMP_DIR}/${BACKUP_PROG_ARCHIVE}.log"
result empty output which is not how things are expected to work.
I would like to understand why in your case there is no such output.
I guess your case is a special case where there is no such output
but then we should be prepared for that case with reasonable
fallback values to avoid such emptiness in the output.

jsmeix commented at 2019-10-21 11:59:

@gdha
because
https://tecadmin.net/mount-s3-bucket-centosrhel-ubuntu-using-s3fs/
reads (excerpt)

S3FS is FUSE (File System in User Space) based solution
to mount an Amazon S3 buckets

I wonder about if rear mkbackup could get in trouble
in general when there are things mouned via FUSE?

jacekadamus88 commented at 2019-10-22 00:51:

@jsmeix
I appreciate your detailed reply about ReaR configuration.
Regarding your reply

By default ReaR has some automatism what disk components
(like partitions, filesystems, mountpounts) get included or excluded
and some (other but related) automatism what directories get
included or excluded in the backup which do most of the time
what is usually expected but no automatism works perfect.

From my experience with other issues the backup include/exclude functionality
versus the disk components include/exclude functionality in ReaR gets "hairy"
when it comes to unusual cases.

By default ReaR includes basically "what is mounted"
while "rear mkrescue/mkbackup" is running
except some automatically excluded things,
cf. #2229

Regarding the backup I implemented some time ago
BACKUP_ONLY_INCLUDE and BACKUP_ONLY_EXCLUDE

I haven't tested this yet, but these two configuration parameters could probably be considered the second workaround in this particular case.

I have recovered the system from the backup into the virtual machine and run rear -D mkbackup as advised.
Thankfully issue is reproducible after mounting s3fs.

df -P -T /mnt/*
Filesystem     Type       1024-blocks  Used    Available Capacity Mounted on
s3fs           fuse.s3fs 274877906944     0 274877906944       0% /mnt/s3archive
s3fs           fuse.s3fs 274877906944     0 274877906944       0% /mnt/s3backups

Output from the execution (shrink)
rear_debug_mode.txt

Additionally:
backup-include.txt

/home
/
/boot/efi
/boot

backup-exclude.txt

/tmp/*
/dev/shm/*
/var/lib/rear/output/*
/media/*
/mnt/*
/var/tmp/*
/var/crash/*
/var/backup/*
/tmp/*
/tmp/rear.DoKltaTo385Ap5K

@gdha
I believe, more or less to the mentioned article.
These are exact commands i use to mount s3 buckets

s3fs -o umask=0007,uid=0,gid=1008 -o allow_other aws-backups /mnt/s3backups
s3fs -o umask=0007,uid=0,gid=1008 -o allow_other aws-archive /mnt/s3archive

I wonder about if rear mkbackup could get in trouble
in general when there are things mouned via FUSE?

I am afraid this is the case here, although the issue might be related to one of the internally used commands (tar, dd)?

jsmeix commented at 2019-10-22 09:18:

@jacekadamus88
thank you for your prompt and detailed reply.

Regarding

the issue might be related to one of the
internally used commands (tar, dd)?

ReaR calls tar with the --one-file-system option, see
https://github.com/rear/rear/blob/master/usr/share/rear/backup/NETFS/default/500_make_backup.sh#L136
for the whole tar command.

So you could try out how tar behaves in your particular case
while you have things mounted via FUSE at some sub-directory of /mnt/
when you try out things like

tar -czvf /dev/null --no-wildcards-match-slash --one-file-system --anchored --exclude='/mnt/*' --exclude='/tmp/*' /  2>&1 | tee /tmp/tar.out

For me (on openSUSE Leap 15.0) that seems to work.
For my test I use a sshfs mounted via FUSE (excerpts):

# pwd
/root

# mkdir ~/fusessh

# sshfs root@localhost:/tmp/ ~/fusessh
Password: ...

# mount | grep fuse
...
root@localhost:/tmp/ on /root/fusessh type fuse.sshfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0)
...

# tar -czvf /dev/null --no-wildcards-match-slash --one-file-system --anchored /root/ 2>&1 | grep fuse
/root/fusessh/
tar: /root/fusessh/: file is on a different filesystem; not dumped

so for me tar --one-file-system excludes my sshfs mounted via FUSE.

jacekadamus88 commented at 2019-10-23 00:43:

@jsmeix
Thank you for providing that tar command.
tar -czvf /dev/null --no-wildcards-match-slash --one-file-system --anchored --exclude='/mnt/*' --exclude='/tmp/*' / 2>&1 | tee /tmp/tar.out
Seems like it executed successfully.

I have run rear -v mkbackuponly and still no success.

Since tar behaves properly, I would say the issue might be somewhere within "exit code" logic. In order, to track it down probably some 500_make_backup.sh code changes are necessary.

jsmeix commented at 2019-10-23 10:24:

@jacekadamus88
because your
https://github.com/rear/rear/files/3753431/rear_debug_mode.txt
contains only (excerpts)

++ ProgressInfo 'Archived 12506 MiB [avg 5349 KiB/sec] '
++ echo -en '\e[2K\rArchived 12506 MiB [avg 5349 KiB/sec] \e7'
++ sleep 1
++ kill -0 14877
++ ProgressStop
++ echo -e '\e8\e[KOK'
++ transfertime=2395
++ wait 14877
++ backup_prog_rc=2

I also need your backup.log file which contains
the debug output for the backup program subshell
that is run by backup/NETFS/default/500_make_backup.sh

Usually the backup.log file is copied to your NETFS output location
but you use OUTPUT_URL=null.
The backup.log file ist is also available where rear -D mkbackup is run
there in /tmp/rear.XXXXXXXX/tmp/backup.log
provided you have KEEP_BUILD_DIR="yes" set,
see the KEEP_BUILD_DIR description in default.conf
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L128

For example one of my backup.log files contains (excerpts)

++ dd of=/tmp/rear.y426ClQIfRgufom/outputfs/linux-44ml/backup.tar.gz
+++ cat /tmp/rear.y426ClQIfRgufom/tmp/backup-include.txt
++ tar --warning=no-xdev --sparse --block-number --totals --verbose --no-wildcards-match-slash --one-file-system --ignore-failed-read --anchored --xattrs --xattrs-include=security.capability --xattrs-include=security.selinux --acls --gzip -X /tmp/rear.y426ClQIfRgufom/tmp/backup-exclude.txt -C / -c -f - / /root/rear.jsmeix/var/log/rear/rear-linux-44ml.log
tar: Removing leading `/' from member names
block 0: /
block 3: /etc/
block 6: /etc/fstab
...
block 6279792: /root/rear.jsmeix/var/log/rear/rear-linux-44ml.log
Total bytes written: 3215257600 (3.0GiB, 15MiB/s)
3390977+1 records in
3390977+1 records out
1736180480 bytes (1.7 GB, 1.6 GiB) copied, 217.798 s, 8.0 MB/s
++ pipes_rc=(${PIPESTATUS[@]})
++ let index=3-1
++ '[' 2 -ge 0 ']'
++ rc=0
++ '[' 0 -ne 0 ']'
++ let index--
++ '[' 1 -ge 0 ']'
++ rc=0
++ '[' 0 -ne 0 ']'
++ let index--
++ '[' 0 -ge 0 ']'
++ rc=0
++ '[' 0 -ne 0 ']'
++ let index--
++ '[' -1 -ge 0 ']'
++ exit 0

Additionally to better see how the Exit code logic
actually behaves in your case please insert in your
usr/share/rear/backup/NETFS/default/500_make_backup.sh
the line

for e in "${pipes_rc[@]}" ; do echo "pipes_rc array element is '$e' " ; done

at

  let index=${#pipes_rc[@]}-1
  for e in "${pipes_rc[@]}" ; do echo "pipes_rc array element is '$e' " ; done
  while [ $index -ge 0 ] ; do

and then provide your backup.log file
for a `rear -D mkbackup' run that fails for you.

Furthermore you should inspect your backup.log file
for suspicious (error) messages that my indicate
why things fail in your particular case.

jacekadamus88 commented at 2019-10-24 20:33:

@jsmeix
Thank you for your next great reply

I have followed all suggested hints including KEEP_BUILD_DIR="yes" and exit logic debug

for e in "${pipes_rc[@]}" ; do echo "pipes_rc array element is '$e' " ; done

Looks like those things helped me to find out the real error being thrown on that VM.
It was simle

No space left on device

I have recovered some space to accommodate the "build" directory and iso image.
And it went through.

The output from the backup.log

Total bytes written: 28176240640 (27GiB, 11MiB/s)
23338923+1 records in
23338923+1 records out
11949529084 bytes (12 GB) copied, 2542.86 s, 4.7 MB/s
++ pipes_rc=(${PIPESTATUS[@]})
++ let index=2-1
++ for e in '"${pipes_rc[@]}"'
++ echo 'pipes_rc array element is '\''1'\'' '
++ for e in '"${pipes_rc[@]}"'
++ echo 'pipes_rc array element is '\''0'\'' '
++ '[' 1 -ge 0 ']'
++ rc=0
++ '[' 0 -ne 0 ']'
++ let index--
++ '[' 0 -ge 0 ']'
++ rc=1
++ '[' 1 -ne 0 ']'
++ echo tar
++ echo 1
++ '[' 1 -eq 1 ']'
++ '[' tar '!=' tar ']'
++ exit 1

Although this doesn't answer what was the root cause of the issue on the Real server, however, space issues can be excluded from the list of unexpected issues.

Currently, I am not able to kick off the backup on real server, however, I will make use of the suggested hints to track it down there whenever possible.

I claimed that the root cause of the issue is somewhere within Rear, however, this is not true. Seems like ReaR does handle s3fs well (at least it ignores it properly when configured to do so).

I believe we can close that issue.

I would like to thank @jsmeix for the pieces of advice, that helped me to get to the bottom of the problem. Hopefully, anyone else struggling with a similar problem will benefit from this conversation.

jsmeix commented at 2019-10-25 09:34:

@jacekadamus88
thank you for your explanatory feedback.

As time permits I will try to reproduce it on my VMs
because I would like to understand why such a hard error
like "No space left on device" was somehow hidden
so that 500_make_backup.sh failed inexplicably.

If I can reproduce it I wish I could get the root cause error message
so that 500_make_backup.sh can show that error message to the user.

And this issue shows another problem in 500_make_backup.sh:

Parts of the set -e debug output are not in the usual log file
but hidden in the separated backup.log file which is bad
because we needed so much time and effort to find out
where the root cause error actually is.
The separated backup.log file contradicts

All what could be useful for later debugging
in case of issues must appear in ReaR's log file.

in the section "What to do with stdin, stdout, and stderr" in
https://github.com/rear/rear/wiki/Coding-Style

My immediate untested offhanded idea is to somehow get
the set -e debug output of the backup subshell process
also in the normal log file (in addition to the backup.log file).

Furthermore this issue may show a redirection problem
in 500_make_backup.sh:

(
case "$(basename ${BACKUP_PROG})" in
    ...
esac 2> "${TMP_DIR}/${BACKUP_PROG_ARCHIVE}.log"
...
) &
BackupPID=$!

Only stderr is redirected into the backup.log file
but not also stdout as it should be normally done nowadays
according to "What to do with stdin, stdout, and stderr" in
https://github.com/rear/rear/wiki/Coding-Style
so stdout messages from the backup subshell would
appear in the normal ReaR log file needlessly split apart
matching stderr messages from the backup subshell.

Finally this issue shows one more problem in ReaR:

The backup.log file does not appear as the normal ReaR log file
in var/log/rear locally on the system where 'rear mkbackup' was run
at least not for me with

OUTPUT=ISO
BACKUP=NETFS
BACKUP_URL=nfs://your.NFS.server.IP/path/to/your/rear/backup

where the backup.log file only appears on my remote NFS share
but not locally in var/log/rear.

jsmeix commented at 2019-10-25 09:35:

The support question part of this issue is solved
but the found minor bugs and needed enhancements
keep this issue open.

jsmeix commented at 2019-10-25 09:40:

@jacekadamus88
if you find futher issues in ReaR because of what still fails for you
I would prefer if you open new separated GitHub issues
each one for each separated issue in ReaR so that
we can handle separated issues separatedly.

pcahyna commented at 2019-10-25 10:03:

PR #1914 was supposed to fix error reporting from tar (e.g. in case of no space left on device).

jsmeix commented at 2019-10-25 10:27:

Yes, I know,
but it seems in this case the tar exit code is 1
see "The output from the backup.log" in
https://github.com/rear/rear/issues/2258#issuecomment-546091056
which should output (excerpt)

This means that files have been modified during the archiving

but that does not match the initial comment here
https://github.com/rear/rear/issues/2258#issue-509086552
where the output is (excerpt)

This means that the archiving process ended prematurely

which happens when exit code > 1 according to the code
https://github.com/rear/rear/blob/master/usr/share/rear/backup/NETFS/default/500_make_backup.sh#L262

Puzzling...

jsmeix commented at 2019-10-25 12:33:

I can reproduce it:

I use a virtual machine with a separated filesystem for /tmp
that has 1GiB size which is too small to store the backup
there via BACKUP_URL=file:///tmp.

I use in etc/rear/local.conf

OUTPUT=ISO
BACKUP=NETFS
BACKUP_OPTIONS="nfsvers=3,nolock"
BACKUP_URL=file:///tmp
SSH_ROOT_PASSWORD="rear"
USE_DHCLIENT="yes"
PROGRESS_MODE=plain
PROGRESS_WAIT_SECONDS=3
KEEP_BUILD_DIR="yes"
FIRMWARE_FILES=( no )
AUTOEXCLUDE_PATH=( /media /run /mnt /tmp )

I get no var/log/rear/backup.log file.
I have only the one in /tmp/rear.XXXX/tmp
I.e. without KEEP_BUILD_DIR="yes" I would have
no backup.log file at all.

In my /tmp/rear.XXXX/tmp/backup.log there is (excerpts)

...
++++ echo tar
+++ basename tar
++++ awk '{ print $1 }'
++++ echo '/usr/bin/openssl des3 -salt -k '
+++ basename /usr/bin/openssl
++++ awk '{ print $1 }'
++++ echo 'dd of=/tmp/linux-4lv0/backup.tar.gz'
+++ basename dd
...
++ dd of=/tmp/linux-4lv0/backup.tar.gz
+++ cat /tmp/rear.lwYlarbzswNQcsq/tmp/backup-include.txt
++ tar --warning=no-xdev --sparse --block-number --totals --verbose --no-wildcards-match-slash --one-file-system --ignore-failed-read --anchored --xattrs --xattrs-include=security.capability --xattrs-include=security.selinux --acls --gzip -X /tmp/rear.lwYlarbzswNQcsq/tmp/backup-exclude.txt -C / -c -f - / /home /root/rear.github.master/var/log/rear/rear-linux-4lv0.log
tar: Removing leading `/' from member names
block 0: /
block 3: /mnt/
...
block 1930133: /usr/lib/git/git-http-push
dd: writing to '/tmp/linux-4lv0/backup.tar.gz': No space left on device
750497+0 records in
750496+0 records out
384253952 bytes (384 MB, 366 MiB) copied, 51.0999 s, 7.5 MB/s
++ pipes_rc=(${PIPESTATUS[@]})
++ let index=2-1
++ '[' 1 -ge 0 ']'
++ rc=1
++ '[' 1 -ne 0 ']'
++ echo openssl
/root/rear.github.master/usr/share/rear/backup/NETFS/default/500_make_backup.sh: line 162: echo: write error: No space left on device
++ echo 1
/root/rear.github.master/usr/share/rear/backup/NETFS/default/500_make_backup.sh: line 163: echo: write error: No space left on device
++ '[' 1 -eq 1 ']'
++ '[' openssl '!=' tar ']'
++ rc=2
++ exit 2

In my var/log/rear/rear-linux-4lv0.log there is (excerpt):

++ case "$(basename $BACKUP_PROG)" in
+++ basename tar
++ ((  2 != 0  ))
+++ cat /tmp/rear.lwYlarbzswNQcsq/tmp/failing_backup_prog
++ prog=
++ ((  2 == 1  ))
+++ cat /tmp/rear.lwYlarbzswNQcsq/tmp/failing_backup_prog_rc
++ rc=
+++ sed -e 's/^/  /'
+++ tail -n3
+++ grep '^: ' /tmp/rear.lwYlarbzswNQcsq/tmp/backup.log
++ Error ' failed with return code  and below output:
  ---snip---

  ----------
This means that the archiving process ended prematurely, or did
not even start. As a result it is unlikely you can recover this
system properly. Relax-and-Recover is therefore aborting execution.
'

The reason for the emptiness of exit code and
backup program error message in the output
is the emptiness of the files
/tmp/rear.lwYlarbzswNQcsq/tmp/failing_backup_prog
and
/tmp/rear.lwYlarbzswNQcsq/tmp/failing_backup_prog_rc
that should contain that info but cannot contain that info
because those files cannot be written

++ echo openssl
... echo: write error: No space left on device
++ echo 1
... echo: write error: No space left on device

This shows there is one more bug in 500_make_backup.sh
that is related to the code

        # Variable used to record the short name of piped commands in case of
        # error, e.g. ( "tar" "cat" "dd" ) in case of unencrypted and unsplit backup.
        backup_prog_shortnames=(
            "$(basename $(echo "$BACKUP_PROG" | awk '{ print $1 }'))"
            "$(basename $(echo "$BACKUP_PROG_CRYPT_OPTIONS" | awk '{ print $1 }'))"
            "$(basename $(echo "$SPLIT_COMMAND" | awk '{ print $1 }'))"
        )

that leads to

backup_prog_shortnames[0]="tar"
backup_prog_shortnames[1]="openssl"
backup_prog_shortnames[2]="dd"

But openssl is not used in the backup pipe in my case
because BACKUP_PROG_CRYPT_ENABLED is not true
which leads to a wrong echo openssl instead of echo dd
for the failed program name in the backup pipe
which would result a wrong $prog value for the code

            else
                rc=$(cat $FAILING_BACKUP_PROG_RC_FILE)
                Error "$prog failed with return code $rc and below output:
  ---snip---
$(grep "^$prog: " "${TMP_DIR}/${BACKUP_PROG_ARCHIVE}.log" | sed -e 's/^/  /' | tail -n3)
  ----------
This means that the archiving process ended prematurely, or did
not even start. As a result it is unlikely you can recover this
system properly. Relax-and-Recover is therefore aborting execution.
"
            fi

This last found bug in 500_make_backup.sh is a result
of my https://github.com/rear/rear/pull/2156

jsmeix commented at 2019-10-25 12:36:

@jacekadamus88
thank you for your issue report.
It revealed several issues in backup/NETFS/default/500_make_backup.sh
and perhaps also in restore/NETFS/default/400_restore_backup.sh ...

gdha commented at 2019-10-25 12:49:

@jsmeix perhaps a simple trick like I used in https://github.com/gdha/upgrade-ux/blob/master/opt/upgrade-ux/scripts/postexecute/default/96_call_for_action_after_preview.sh would be sufficient? We should not try to do/foresee everything in ReaR.

jsmeix commented at 2019-10-25 12:58:

@gdha
yes, we must not try to do/foresee everything in ReaR
but we should be prepared for unexpected cases
with a reasonable fallback behaviour, cf.
https://github.com/rear/rear/issues/2258#issuecomment-544476253
and "Try hard to care about possible errors" in
https://github.com/rear/rear/wiki/Coding-Style

So I will now try even harder to care about possible errors in
NETFS 500_make_backup.sh and 400_restore_backup.sh
;-)

jsmeix commented at 2019-10-25 13:01:

@pcahyna
I think I will now really add a new section

Beware of the emptiness!

to https://github.com/rear/rear/wiki/Coding-Style
cf. https://github.com/rear/rear/issues/2220#issuecomment-527489396

jsmeix commented at 2019-10-25 13:04:

FYI:
Next week I am not in the office so I wish all of you
a nice weekend, a relaxed working week, and one more nice weekend!

jsmeix commented at 2019-10-25 13:06:

I close this issue because the found minor bugs
and needed enhancements will be handled via
https://github.com/rear/rear/issues/2265


[Export of Github issue for rear/rear.]