#2273 Issue `closed`: Coredump during rear mkrescue¶

Labels: support / question, fixed / solved / done

adatum opened issue at 2019-11-10 18:47:¶

ReaR version ("/usr/sbin/rear -V"):
Relax-and-Recover 2.5-git.3350.aa82834d.unknown / 2019-05-10
OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"):
Fedora 31 MATE
ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):

PROGS=( "${PROGS[@]}" /home/[USER]/src/borg-linux64 iw wpa_supplicant wpa_passphrase mount.cifs )

AUTOEXCLUDE_PATH=( /run/media /mnt )

EXCLUDE_RECREATE=( "${EXCLUDE_RECREATE[@]}" $( EXCLUDE_RECREATE_MOUNTPOINTS="/media/veracrypt"; if [ -n "$EXCLUDE_RECREATE_MOUNTPOINTS" ]; then grep -Eo "(^|\s)($EXCLUDE_RECREATE_MOUNTPOINTS)\S*" /proc/mounts | sed -r 's/\s(.*)/ fs:\1 /' | tr -d '\n'; fi ) )

Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PoverVM LPAR):
PC
System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
x86_64
Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
UEFI, GRUB
Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
local SSD
Description of the issue (ideally so that others can reproduce it):
Since upgrading to Fedora 31, the following coredump messages are found in journalctl when rear mkrescue is run:

systemd-coredump[1410909]: Process 1410907 (ld-linux-x86-64) of user 0 dumped core.

                                                Stack trace of thread 1410907:
                                                #0  0x00007efea9828f80 n/a (ld-linux-x86-64.so.2)
                                                #1  0x00007efea982b316 n/a (ld-linux-x86-64.so.2)

systemd-coredump[1410921]: Process 1410919 (ld-linux.so.2) of user 0 dumped core.

                                                Stack trace of thread 1410919:
                                                #0  0x00000000f7f69546 _dl_map_object_from_fd (/usr/lib/ld-2.30.so)
                                                #1  0x00000000f7f6b80c _dl_map_object (/usr/lib/ld-2.30.so)
                                                #2  0x00000000f7f6347a map_doit (/usr/lib/ld-2.30.so)
                                                #3  0x00000000f7f7ceae _dl_catch_exception (/usr/lib/ld-2.30.so)
                                                #4  0x00000000f7f7cf25 _dl_catch_error (/usr/lib/ld-2.30.so)
                                                #5  0x00000000f7f66ad8 dl_main (/usr/lib/ld-2.30.so)
                                                #6  0x00000000f7f7bd49 _dl_sysdep_start (/usr/lib/ld-2.30.so)
                                                #7  0x00000000f7f63fc5 _dl_start (/usr/lib/ld-2.30.so)
                                                #8  0x00000000f7f6313b _start (/usr/lib/ld-2.30.so)

Nevertheless:

rear mkrescue finished with zero exit code

I am not certain but I think the coredump appears at the point when the verbose output shows:

Testing that the recovery system in /tmp/rear.PLNrcqee0qcN2Vb/rootfs contains a usable system

I tested that the rear image boots in a VM, but I did not test a recovery.

Other potentially unrelated warnings or errors during rear -v mkrescue:

Symlink '/usr/lib/modules/5.3.8-300.fc31.x86_64/build' -> '/usr/src/kernels/5.3.8-300.fc31.x86_64' refers to a non-existing directory on the recovery system.
It will not be copied by default. You can include '/usr/src/kernels/5.3.8-300.fc31.x86_64' via the 'COPY_AS_IS' configuration variable.

Broken symlink '/etc/grub2.cfg' in recovery system because 'readlink' cannot determine its link target

Did not find /boot/grub2/locale files (minor issue for UEFI ISO boot)

Again I'm not certain, but I think those messages were coming up before the coredump issue began.

jsmeix commented at 2019-11-11 12:11:¶

@adatum
I am not a Fedora user so that I cannot reprocude your issue.

I think for debugging what the root cause is
it should help to use KEEP_BUILD_DIR="yes"
so that after rear mkrescue you can chroot into
the ReaR recovery system in $TMPDIR/rear.XXXX/rootfs/
and therein try out this or that commands that look suspicious
to cause the coredump in your case.

Furthermore in ReaR's log file there are time stamps
for each of ReaR's Log* message function calls
(run it in full debug mode with rear -D mkrescue)
that could help to identify what commands are run
at the time when the coredump happens.

adatum commented at 2019-11-11 13:08:¶

In the debug log from rear -vD mkrescue the following is what is shortly before the coredump is timestamped in the system logs:

Testing 'ldd /bin/bash' to ensure 'ldd' works for the subsequent 'ldd' tests within the recovery system
++ chroot /tmp/rear.TyVuD78faQnlzbF/rootfs /bin/ldd /bin/bash
    linux-vdso.so.1 (0x00007ffdee2e4000)
    libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007fb7a3c51000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fb7a3c4a000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fb7a3a81000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fb7a3db3000)

The system kernel log also shows this segfault message just before the coredump posted previously.

kernel: show_signal_msg: 52 callbacks suppressed
kernel: ld-linux-x86-64[198094]: segfault at 7f2679d943c8 ip 00007f2679d71f80 sp 00007ffe3b6f8080 error 7 in ld-2.30.so[7f2679d6c000+20000]
kernel: Code: 41 d1 f9 83 fe f8 0f 86 5d 01 00 00 44 89 da 44 29 ca 48 89 04 d1 e9 b1 fc ff ff 48 85 ff 74 79 49 8b 44 24 60 48 85 c0 74 04 <48> 01 78 08 49 8b 44 24 58 48 85 c0 74 04 48 01 78 08 49 8b 44 24

gdha commented at 2019-11-29 09:03:¶

@adatum Perhaps it is worthwhile to open a bugzilla report with Fedora31?

adatum commented at 2019-11-29 20:31:¶

@gdha As far as I know, this coredump happens only with ReaR. What would a bug report with Fedora31 be about?

Do you have suggestions on what to try to debug this? The "good" part is the coredump is consistent: it happens every time mkrescue is run.

gdha commented at 2020-01-23 15:32:¶

@adatum Can you reproduce this behavior outside of rear? If you run rear -vD mkrescue and then in the /tmp/rear.xxxx/rootfs directory doing a chroot .
Therefore, doing a ldd /bin/bash - what is the result?

adatum commented at 2020-01-23 17:05:¶

@gdha Thanks for the follow up. I could not reproduce the coredump with your suggestion.

After rear -vD mkrescue, cd /tmp/rear.xxxx/rootfs, and chroot ., I get this:

bash-5.0# ldd /bin/bash
    linux-vdso.so.1 (0x00007ffd4a1e7000)
    libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007f5c7f855000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f5c7f84e000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f5c7f685000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f5c7f9b7000)

jsmeix commented at 2020-06-24 11:27:¶

Perhaps ldd /bin/bash is not what causes the coredump
but something that happens directly afterwards.
The matching code in ReaR is in
usr/share/rear/build/default/990_verify_rootfs.sh
and therein see in particular
https://github.com/rear/rear/blob/master/usr/share/rear/build/default/990_verify_rootfs.sh#L104

sometimes running ldd on kernel modules causes
needless errors because sometimes that segfaults

so perhaps this issue is a similar one as
https://github.com/rear/rear/issues/2177

Because "no news is good news" I blindly guess that this issue here
was also fixed by https://github.com/rear/rear/pull/2179
so that I can close this issue here.

[Export of Github issue for rear/rear.]

#2273 Issue closed: Coredump during rear mkrescue¶