#3027 PR merged: Make sure rescue contains all COPY_AS_IS files
Labels: bug, fixed / solved / done
rmetrich opened issue at 2023-07-19 08:18:
Relax-and-Recover (ReaR) Pull Request Template
Please fill in the following items before submitting a new pull request:
Pull Request Details:
- Type: Bug Fix
- Impact: Normal
- Reference to related issue (URL): N/A
- How was this pull request tested?
  Using an automated reproducer implemented using systemtap
  - Create a file in /dev which will be embedded in the rescue environment
        # dd if=/dev/urandom of=/dev/shrinking bs=10K count=3
        3+0 records in
        3+0 records out
        30720 bytes (31 kB, 30 KiB) copied, 0.000689566 s, 44.5 MB/s
  - Execute the systemtap script in charge of shrinking the file while being copied
        # stap -v -g ./shrinking.stp
        [...]
        Pass 5: starting run.
  - Execute rear mkrescue from another terminal
- Brief description of the changes in this pull request:
The files copied as-is into the rescue environment are copied using a tar -c | tar -x command.
It may happen that a file shrinks during the copy (e.g. /dev/mqueue/nnsc (HPE device)), which can cause the tar -x command to stop extracting the files that follow, due to the padding zeros inserted for the shrunken file.
This leads to an error when checking the integrity of the rescue environment.
The solution is to add the -i option when extracting, so that processing continues.
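For illustration, here is a minimal sketch of such a copy pipeline, assuming GNU tar. The function name `copy_as_is_sketch` and the throwaway directories are hypothetical and are not ReaR's actual code; only the `-i` flag on the extracting side reflects the change described in this pull request.

```shell
#!/bin/sh
# Hypothetical helper, NOT ReaR's actual code: copy the tree SRC into DEST
# through a tar pipe, with -i (--ignore-zeros) on the extracting side.
copy_as_is_sketch() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # "-f -" makes both sides use stdout/stdin explicitly.
    # -C changes directory first, so member names are relative to SRC.
    tar -c -f - -C "$src" . | tar -x -i -f - -C "$dest"
}

# Demo with throwaway directories (assumed names, for illustration only).
work=$(mktemp -d)
mkdir -p "$work/src/dev"
printf 'a\n' > "$work/src/dev/first"
printf 'b\n' > "$work/src/dev/second"
copy_as_is_sketch "$work/src" "$work/dest"
ls "$work/dest/dev"
```

The exit status of the function is that of the extracting tar, so a caller that also cares about the creating side would need to check it separately.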
rmetrich commented at 2023-07-19 08:21:
Without the fix:
ERROR: ReaR recovery system in '/var/tmp/rear.tpfZyNy6ayS53wP/rootfs' not usable (required libraries are missing)
Some latest log messages since the last called script 990_verify_rootfs.sh:
wipefs is /bin/wipefs
mkfs is /bin/mkfs
mkfs.xfs is /bin/mkfs.xfs
xfs_admin is /bin/xfs_admin
mkswap is /bin/mkswap
cryptsetup is /bin/cryptsetup
dmsetup is /bin/dmsetup
ldconfig is /bin/ldconfig
The reason is that everything after /dev/shrinking is not copied, because tar -x stopped silently (without error, believing the end of the archive was reached):
# grep -A 2 shrinking /var/tmp/rear.tpfZyNy6ayS53wP/tmp/copy-as-is-filelist
/dev/shrinking
tar: /dev/shrinking: File shrank by 5120 bytes; padding with zeros
/dev/vcsa6
/dev/vcs6
# ls -l /var/tmp/rear.tpfZyNy6ayS53wP/rootfs/dev/shrinking /var/tmp/rear.tpfZyNy6ayS53wP/rootfs/dev/vcsa6
ls: cannot access '/var/tmp/rear.tpfZyNy6ayS53wP/rootfs/dev/vcsa6': No such file or directory
-rw-r--r--. 1 root root 30720 Jul 19 10:12 /var/tmp/rear.tpfZyNy6ayS53wP/rootfs/dev/shrinking
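The manual check above can be generalized. The following is a rough sketch, assuming the file list contains one absolute path per line mixed with tar warnings, as in the excerpt shown. The helper name `list_missing_in_rootfs` and the demo paths are made up for this example and do not exist in ReaR.

```shell
#!/bin/sh
# Hypothetical helper: print every path from LIST (one absolute path per
# line) that does not exist below ROOTFS. Lines that are not absolute
# paths, such as tar warnings captured in the same file, are skipped.
list_missing_in_rootfs() {
    list=$1
    rootfs=$2
    while IFS= read -r path; do
        case $path in
            /*) ;;          # looks like a path: check it
            *) continue ;;  # e.g. "tar: ...: File shrank by ..." warnings
        esac
        # -e fails on dangling symlinks, so test -L as well.
        [ -e "$rootfs$path" ] || [ -L "$rootfs$path" ] || printf '%s\n' "$path"
    done < "$list"
}

# Demo with a fake rootfs that only received the first file.
work=$(mktemp -d)
mkdir -p "$work/rootfs/dev"
: > "$work/rootfs/dev/shrinking"
printf '%s\n' \
    /dev/shrinking \
    'tar: /dev/shrinking: File shrank by 5120 bytes; padding with zeros' \
    /dev/vcsa6 > "$work/filelist"
missing=$(list_missing_in_rootfs "$work/filelist" "$work/rootfs")
printf '%s\n' "$missing"
```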
rmetrich commented at 2023-07-19 08:22:
systemtap shrinking.stp script:
    global openat
    global to_catch
    global shrinking_fd
    global nread

    # Entering openat(): only track tar opening a file named "shrinking".
    probe syscall.openat {
        if (execname() != "tar") next
        if (filename_unquoted != "shrinking") next
        to_catch[tid()] = 1
        openat[tid()] = 1
    }

    # Leaving openat(): remember the file descriptor and reset the read counter.
    probe syscall.openat.return {
        if (! openat[tid()]) next
        delete openat[tid()]
        shrinking_fd = retval
        nread = 0
        printf("FD is %ld\n", retval)
    }

    # Stop tracking the thread once it closes the tracked descriptor.
    probe syscall.close {
        if (! to_catch[tid()]) next
        if (fd != shrinking_fd) next
        delete to_catch[tid()]
    }

    # On the third read of the tracked descriptor, truncate the file, then exit.
    probe syscall.read {
        if (! to_catch[tid()]) next
        if (fd != shrinking_fd) next
        nread++
        if (nread < 3) next
        printf("Truncating the file\n")
        system("truncate -s 25K /dev/shrinking")
        mdelay(2000)
        exit()
    }
gdha commented at 2023-07-19 08:33:
@rmetrich Hi - is this related to Message Queue brokers? Could you say something about the background of this issue? I also saw from time to time that dd seems to hang, but this could be the same issue as I saw it on kubernetes nodes with RabbitMQ pods...
rmetrich commented at 2023-07-19 08:47:
> @rmetrich Hi - is this related to Message Queue brokers? Could you say something about the background of this issue? I also saw from time to time that dd seems to hang, but this could be the same issue as I saw it on kubernetes nodes with RabbitMQ pods...

Yes, this was seen with the /dev/mqueue/nnsc device node, something shipped by HPE apparently. One of my customers constantly hits the issue when building the ISO.
pcahyna commented at 2023-07-19 09:00:
Is /dev/mqueue/nnsc really a device node, or a regular file? I think the content of device special files should not be copied - only the inode (metadata) ...
rmetrich commented at 2023-07-19 09:44:
> Is /dev/mqueue/nnsc really a device node, or a regular file? I think the content of device special files should not be copied - only the inode (metadata) ...

It seems to be a regular file:
/dev/mqueue:
total 0
drwxrwxrwt. 2 0 0 60 Jun 17 20:04 .
drwxr-xr-x. 23 0 0 5740 Jun 18 19:30 ..
---x--x--T. 1 0 0 80 Jun 21 13:30 nnsc
But none of this really matters; all we must ensure is that archives with zero padding can be extracted properly.
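In the listing above, the leading `-` in the mode string of nnsc is what marks it as a regular file. As a small sketch, the same question can be answered from a script with POSIX test operators. The helper name `file_kind` is invented for this example, and /dev/null plus a temporary file stand in for the real paths.

```shell
#!/bin/sh
# Hypothetical helper: classify a path with POSIX test operators.
file_kind() {
    if [ -L "$1" ]; then echo "symlink"
    elif [ -c "$1" ]; then echo "character device"
    elif [ -b "$1" ]; then echo "block device"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -f "$1" ]; then echo "regular file"
    else echo "other or missing"
    fi
}

# Stand-ins for illustration: a well-known device node and a plain file.
tmp=$(mktemp)
file_kind /dev/null
file_kind "$tmp"
```

The symlink test comes first because the other operators follow symlinks and would report the type of the target instead.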
jsmeix commented at 2023-07-19 11:38:
@rmetrich
thank you for your prompt explanatory comment!
jsmeix commented at 2023-07-19 11:38:
@rear/contributors
I would like to merge it tomorrow afternoon
unless there are objections
jsmeix commented at 2023-07-19 11:50:
Only for the log (for completeness):
I will never fully understand how 'tar' works.
Neither the old GNU tar version 1.15.1 man page (see above)
nor the newer GNU tar version 1.34 man page (in openSUSE Leap 15.4)
-i, --ignore-zeros
Ignore zeroed blocks in archive.
Normally two consecutive 512-blocks filled with zeroes mean EOF
and tar stops reading after encountering them.
This option instructs it to read further and
is useful when reading archives created with the -A option.
explain why "blocks filled with zeroes mean EOF" in 'tar'.
At least for me it is unexpected behaviour of a program
that reading a sequence of zero bytes means EOF
so "sequence of zero bytes" == "end of file"?
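For background, the tar archive format marks the end of an archive with two consecutive 512-byte blocks of zeros, which is why a reader stops there. A small sketch of the use case the man page mentions, assuming GNU tar: two archives are concatenated with plain cat (not with -A), so the end-of-archive marker of the first one ends up in the middle. All file names here are made up for the demonstration, and this is not the reproducer from this pull request.

```shell
#!/bin/sh
# Illustration only (assumed file names): concatenating two archives with
# cat leaves the first archive's end-of-archive zero blocks in the middle.
work=$(mktemp -d)
printf 'one\n' > "$work/first"
printf 'two\n' > "$work/second"
tar -c -f "$work/a.tar" -C "$work" first
tar -c -f "$work/b.tar" -C "$work" second
cat "$work/a.tar" "$work/b.tar" > "$work/both.tar"

# Without -i the listing is expected to stop at the zero blocks.
without_i=$(tar -t -f "$work/both.tar")
# With -i (--ignore-zeros) the reader continues past them.
with_i=$(tar -t -i -f "$work/both.tar")

echo "without -i:"
printf '%s\n' "$without_i"
echo "with -i:"
printf '%s\n' "$with_i"
```

With GNU tar, the first listing should contain only `first`, while the second should contain both members.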
rmetrich commented at 2023-07-19 11:51:
> @rmetrich nice find and thanks for making ReaR more robust in general.
> Beyond that, does the rescue system actually need those mqueue files? If not then I'd suggest to also exclude them from the rescue image, as it will debloat or unclutter the rescue image.
> If this is a common thing then we can also add it to the default exclude list, IMHO.

It's probably not useful, like most of the devices in /dev (knowing that everything there is supposed to be generated, on RHEL at least).
jsmeix commented at 2023-07-19 12:00:
Related to this pull request
but not actually belonging to this pull request:
We copy by default all in /dev/ into the recovery system
# find /dev | wc -l
732
# usr/sbin/rear -D mkrescue
...
# find /var/tmp/rear.nSzlPYGQ1Nk9v9w/rootfs/dev | wc -l
728
because of
COPY_AS_IS+=( /dev ...
in usr/share/rear/conf/GNU/Linux.conf
https://github.com/rear/rear/blob/master/usr/share/rear/conf/GNU/Linux.conf#L231
which has been there since the beginning according to
# git log --follow -p usr/share/rear/conf/GNU/Linux.conf
I made a separate issue
https://github.com/rear/rear/issues/3028
jsmeix commented at 2023-07-20 13:16:
@rmetrich
thank you for your problem analysis and your fix!
It is much appreciated to get ReaR working
more fail-safe even in corner cases like this one.
[Export of Github issue for rear/rear.]