#3064 Issue closed: Missing libraries in the recovery system for symlinks in COPY_AS_IS

Labels: enhancement, bug, fixed / solved / done

pcahyna opened issue at 2023-11-02 13:58:

  • ReaR version ("/usr/sbin/rear -V"): Relax-and-Recover 2.7 / Git

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):
    NAME="Red Hat Enterprise Linux"
    VERSION="9.4 (Plow)"
    ID="rhel"
    ID_LIKE="fedora"
    VERSION_ID="9.4"
    PLATFORM_ID="platform:el9"
    PRETTY_NAME="Red Hat Enterprise Linux 9.4 Beta (Plow)"
    ANSI_COLOR="0;31"
    LOGO="fedora-logo-icon"
    CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
    empty

  • Description of the issue (ideally so that others can reproduce it):
    When chkconfig is installed, rear mkrescue aborts when validating libraries in the recovery system:

Running 'build' stage ======================
Copying files and directories
Copying binaries and libraries
Copying all kernel modules in /lib/modules/5.14.0-380.el9.x86_64 (MODULES contains 'all_modules')
Copying all files in /lib*/firmware/
Skip copying broken symlink '/etc/mtab' target '/proc/68455/mounts' on /proc/ /sys/ /dev/ or /run/
Testing that the recovery system in /var/tmp/rear.OlQzDSRNwPCduIQ/rootfs contains a usable system
Testing each binary with 'ldd' and look for 'not found' libraries within the recovery system
/bin/chkconfig requires additional libraries (fatal error)
        libpopt.so.0 => not found
/usr/lib64/systemd/libsystemd-core-252.so requires additional libraries
        libsystemd-shared-252.so => not found
ReaR recovery system in '/var/tmp/rear.OlQzDSRNwPCduIQ/rootfs' needs additional libraries, check /root/rear/var/log/rear/rear-vm-10-0-187-79.log for details
Build area kept for investigation in /var/tmp/rear.OlQzDSRNwPCduIQ, remove it when not needed
Testing that the existing programs in the PROGS array can be found as executable command within the recovery system
Testing that each program in the REQUIRED_PROGS array can be found as executable command within the recovery system
ERROR: ReaR recovery system in '/var/tmp/rear.OlQzDSRNwPCduIQ/rootfs' not usable (required libraries are missing)

libpopt.so.0 is missing. In order to reproduce the problem, no other binary that requires libpopt should be included in the recovery system, otherwise it would pull libpopt and the problem would be hidden. In particular, rsync and cryptsetup use libpopt. Remove them before reproducing the problem.

The underlying issue is that chkconfig got there as the symlink target of /usr/lib/systemd/systemd-sysv-install (build/default/490_fix_broken_links.sh), which got there in build/GNU/Linux/100_copy_as_is.sh because /usr/lib/systemd/systemd-* is in COPY_AS_IS (see prep/GNU/Linux/280_include_systemd.sh).

# ls -l /usr/lib/systemd/systemd-sysv-install
lrwxrwxrwx. 1 root root 27 May 23 08:59 /usr/lib/systemd/systemd-sysv-install -> ../../../usr/sbin/chkconfig

and unlike other code that adds binaries to the recovery system, the build/GNU/Linux/100_copy_as_is.sh and build/default/490_fix_broken_links.sh tandem does not add required libraries if COPY_AS_IS contains a symlink.

The problematic code is here: https://github.com/rear/rear/pull/1521/files#r1377909379 - it skips library dependency resolution for COPY_AS_IS items that are symlinks.

To trigger the problem, several condition must be met:

  • a binary shoud be added to recovery system via a symlink
  • the symlink is in COPY_AS_IS (OTOH PROGS handles symlinks properly)
  • the binary needs additional libraries
  • no other binary added to the recovery system causes the libraries to be added (cf. the remark about rsync and cryptsetup above)

The problem is thus quite rare (PROGS are the preferred way to add executables to the recovery system, which works fine), but one can trigger it using something else than chkconfig. For example, COPY_AS_IS+=( /usr/bin/cdrecord ). On RHEL, cdrecord is a symlink to xorriso, which needs quite special libraries, so:

# usr/sbin/rear mkrescue
/bin/xorriso requires additional libraries (fatal error)
        libisoburn.so.1 => not found

PROGS+=( cdrecord ) works fine.

  • Workaround, if any:
    Add chkconfig to PROGS - this will trigger library dependency resolution for it

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

Relevant parts of debug log:

2023-11-02 09:15:44.001826257 Including build/GNU/Linux/100_copy_as_is.sh
2023-11-02 09:15:44.004404528 Entering debugscript mode via 'set -x'.
+ source /root/rear/usr/share/rear/build/GNU/Linux/100_copy_as_is.sh
(...)
2023-11-02 09:15:44.032533864 Files being copied: /root/rear/usr/share/rear /root/rear/var/lib/rear /dev /etc/inputrc /etc/protocols /etc/services /etc/rpc /etc/termcap /etc/terminfo /usr/share/terminfo /etc/netconfig /etc/mke2fs.conf /etc/os-release /etc/redhat-release /etc/system-release /etc/localtime /etc/magic /usr/share/misc/magic /etc/dracut.conf /etc/dracut.conf.d /usr/lib/dracut /sbin/modprobe.ksplice-orig /etc/sysctl.conf /etc/sysctl.d /etc/e2fsck.conf /usr/libexec/vi /etc/ssl/certs/* /etc/pki/* /usr/lib/ssl/* /usr/share/ca-certificates/* /etc/ca-certificates/* /usr/lib/dhcpcd/* /usr/share/systemd /etc/dbus-1 /usr/lib/systemd/systemd-ac-power /usr/lib/systemd/systemd-backlight /usr/lib/systemd/systemd-binfmt /usr/lib/systemd/systemd-bless-boot /usr/lib/systemd/systemd-boot-check-no-failures /usr/lib/systemd/systemd-cgroups-agent /usr/lib/systemd/systemd-coredump /usr/lib/systemd/systemd-cryptsetup /usr/lib/systemd/systemd-export /usr/lib/systemd/systemd-fsck /usr/lib/systemd/systemd-growfs /usr/lib/systemd/systemd-hibernate-resume /usr/lib/systemd/systemd-hostnamed /usr/lib/systemd/systemd-initctl /usr/lib/systemd/systemd-integritysetup /usr/lib/systemd/systemd-journald /usr/lib/systemd/systemd-localed /usr/lib/systemd/systemd-logind /usr/lib/systemd/systemd-makefs /usr/lib/systemd/systemd-measure /usr/lib/systemd/systemd-modules-load /usr/lib/systemd/systemd-network-generator /usr/lib/systemd/systemd-pcrphase /usr/lib/systemd/systemd-pstore /usr/lib/systemd/systemd-quotacheck /usr/lib/systemd/systemd-random-seed /usr/lib/systemd/systemd-remount-fs /usr/lib/systemd/systemd-reply-password /usr/lib/systemd/systemd-rfkill /usr/lib/systemd/systemd-shutdown /usr/lib/systemd/systemd-sleep /usr/lib/systemd/systemd-socket-proxyd /usr/lib/systemd/systemd-sulogin-shell /usr/lib/systemd/systemd-sysctl /usr/lib/systemd/systemd-sysroot-fstab-check /usr/lib/systemd/systemd-sysupdate /usr/lib/systemd/systemd-sysv-install (...)
2023-11-02 09:15:46.241542887 copy_as_is_executables = (... ) (does not contain /usr/lib/systemd/systemd-sysv-install )
(...)
2023-11-02 09:16:01.446960166 Including build/default/490_fix_broken_links.sh
2023-11-02 09:16:01.449411488 Entering debugscript mode via 'set -x'.
+ source /root/rear/usr/share/rear/build/default/490_fix_broken_links.sh
(...)
++ for broken_symlink in $broken_symlinks
+++ readlink -v -e /usr/lib/systemd/systemd-sysv-install
++ link_target=/usr/sbin/chkconfig
++ [[ -d /usr/sbin/chkconfig ]]
++ test /usr/sbin/chkconfig
++ egrep -q '^/(proc|sys|dev|run)/'
++ echo /usr/sbin/chkconfig
+++ dirname /usr/sbin/chkconfig
++ mkdir -v -p ./usr/sbin
++ cp -v --preserve=all /usr/sbin/chkconfig ./usr/sbin/chkconfig
'/usr/sbin/chkconfig' -> './usr/sbin/chkconfig'

jsmeix commented at 2023-11-02 14:10:

For now "discuss / RFC" until we came to a conclusion
how COPY_AS_IS versus PROGS are meant to work.

Regardless that I did a lot of code changes related
to COPY_AS_IS, I do not really know how COPY_AS_IS
is or was actually meant to work from the beginning.
Perhaps COPY_AS_IS means "only copy 'as is' but nothing more"?

Excerpt from default.conf:

# Files and directories to copy as-is (with tar) into the ReaR recovery system.
# As its name tells COPY_AS_IS is primarily meant to "only copy as is".
# In particular tar does not follow symlinks when copying
# (for the reason behind see the build/GNU/Linux/100_copy_as_is.sh script).
# To get libraries into the recovery system, use the LIBS array.
# To get non-mandatory programs into the recovery system, use the PROGS array.
# To get mandatory programs into the recovery system, use the REQUIRED_PROGS array.
# For elements in the LIBS, PROGS, and REQUIRED_PROGS arrays
# the RequiredSharedObjects function is called to determine required
# shared objects (libraries) that get also copied into the recovery system.
# For what is specified via the COPY_AS_IS array there is an exception to the
# above "only copy as is" rule that for executables RequiredSharedObjects is called
# to also get their needed libraries copied into the recovery system.
# But when libraries are specified via COPY_AS_IS that are not executable
# no required other libraries get automatically copied into the recovery system.
# The reasoning behind is when programs are specified via COPY_AS_IS they
# should be handled gracefully but COPY_AS_IS is not meant for libraries.
...

I fear that exception to the above "only copy as is" rule bites us
because it makes the original intent vague so things are expected
to "always work" because such things "sometimes work" because
we try to behave "gracefully" :-(

pcahyna commented at 2023-11-02 14:18:

Perhaps COPY_AS_IS means "only copy 'as is' but nothing more"?

The problem is, even if COPY_AS_IS copies the symlink as it is and leaves a dangling symlink, build/default/490_fix_broken_links.sh then copies the target. So, if COPY_AS_IS is really meant to just "copy as is", build/default/490_fix_broken_links.sh should not exist.

pcahyna commented at 2023-11-02 14:19:

@jsmeix without this exception, we would have the same problem even for non-symlinks and ReaR would fail all the time.

jsmeix commented at 2023-11-02 14:21:

@pcahyna
welcome to the shadowy world how ReaR behaves in practice :-)

I do not argue against further enhancing the COPY_AS_IS code
to make it behave more reliable and fail-safe.

I only like to provide some reasoning for the current behaviour.

jsmeix commented at 2023-11-02 14:28:

@pcahyna
could you provide an (artificial) example for your

without this exception, we would have the same problem
even for non-symlinks and ReaR would fail all the time

I think you see and understand this shortcoming
of the COPY_AS_IS code much better than I do.
I like to also see and understand what is
generically wrong with our current COPY_AS_IS code.
I.e. when it even fails (under special conditions)
when COPY_AS_IS is used as it is primarily meant to be used.

pcahyna commented at 2023-11-02 14:43:

@jsmeix, maybe I was wrong. Without this exception, what I get are these additional messages (which are not fatal errors):

/usr/lib/systemd/systemd-cryptsetup requires additional libraries
        libcryptsetup.so.12 => not found
/usr/lib/systemd/systemd-integritysetup requires additional libraries
        libcryptsetup.so.12 => not found
/usr/lib/systemd/systemd-veritysetup requires additional libraries
        libcryptsetup.so.12 => not found
/usr/lib/udev/rename_device requires additional libraries
        libglib-2.0.so.0 => not found

I don't know whether this indicates a real problem in the recovery system or not (that is, whether e.g. /usr/lib/udev/rename_device is actually used during the recovery system operation or not).

pcahyna commented at 2023-11-02 14:45:

I would say though that by default executables under /usr/lib or /usr/libexec copied via COPY_AS_IS should be assumed to be useful and not including their required libraries seems dangerous.

jsmeix commented at 2023-11-09 08:18:

A totally offhanded and untested idea
how to possibly get rid of the complications
with executables in the COPY_AS_IS code:

Copy executables in COPY_AS_IS to PROGS
and handle them via PROGS
instead of handling them in COPY_AS_IS.

My assumption is that handling arbitrary files
that are executable (i.e. where 'test -x' is true)
via PROGS works well (i.e. sufficiently fail-safe).

If my assumption holds, this could much clean up
the COPY_AS_IS code and make things behave more
consistent because then it makes no difference
if a program was specified in COPY_AS_IS or in PROGS
and we would have only one code place that deals
with programs or executables and their libraries:
build/GNU/Linux/390_copy_binaries_libraries.sh

I think duplicate entries should be filtered out
in the 'all_binaries' and 'all_libs arrays' in
build/GNU/Linux/390_copy_binaries_libraries.sh
as currently done with COPY_AS_IS.

pcahyna commented at 2023-11-09 21:30:

A totally offhanded and untested idea how to possibly get rid of the complications with executables in the COPY_AS_IS code:

Copy executables in COPY_AS_IS to PROGS and handle them via PROGS instead of handling them in COPY_AS_IS.

My assumption is that handling arbitrary files that are executable (i.e. where 'test -x' is true) via PROGS works well (i.e. sufficiently fail-safe).

The problem is, PROGS is intended for executables in $PATH, while we are using COPY_AS_IS for helper executables outside $PATH (under /usr/libexec or /usr/lib or similar).

A program in PROGS will be unconditionally copied to /bin (see the use of copy_binaries in build/GNU/Linux/390_copy_binaries_libraries.sh), but the COPY_AS_IS items need to be copied to the same path as in the original system.

jsmeix commented at 2023-11-10 07:34:

Yes, sigh!
While sleeping on my "totally offhanded and untested idea"
I also saw that this cannot work because of how PROGS behaves
(i.e. handling arbitrary files that are executable
via PROGS will not work well).

So currently our only way out is to more and more enhance
the COPY_AS_IS code so that it behaves sufficiently OK
for arbitrary files that are executable.

@pcahyna
I would appreciate it if you would implement
what you think works OK for your current case
with reasonable effort (i.e. make a pull request).

Then let's just wait and see how things behave in general
and when new issues appear let's just fix them one by one
as usual as far as possible with reasonable effort.

A general addedum:

From my curent point of view our whole "copy things" code
looks like a (long grown over time) pile of mess.

I think the root cause is that currently the ReaR recovery system
is in general not a truthful and self-consistent copy
of the needed parts from the original system.
It works sufficiently OK for our known cases but the current
implementation is insufficient in this or that special cases.

This leads to the (apparently at least in practice unsolvable?)
problem how to truthfully and self-consistently copy "something".

Cf. the part about symbolic links
https://github.com/rear/rear/issues/2972#issuecomment-1669295963

... investigate how to really properly copy "something"
also when there are symbolic links "in between"
so that the whole structure is copied to a new place
i.e. that after copying
the link target with its owner,group,permissions,...
and its name plus the symlink exist at the new place.

A self-consistent copy also means that linked libraries
get automatically copied.
I think our RequiredSharedObjects function is sufficient
for that.

I think other required things to get a self-consistent copy
like other files that are neded which are not linked
is not possible in an automated way.

pcahyna commented at 2023-11-10 09:57:

@jsmeix (or anyone else), have you ever considered using dracut to generate the ReaR initrd? It is a tool that generates the initrd on Fedora (and probably some other distros).

I would prefer to focus on areas where ReaR is really unique (mainly, layout saving and restoration) and leave the "boring work" of constructing the initrd to something else, if possible.

jsmeix commented at 2023-11-10 12:18:

@pcahyna
thank you for your encouraging proposal!

I fully agree with you to use as much as possible
already existing tools for certain functionality in ReaR
instead of reimplementing this or that functionality in ReaR.

I never considered using dracut to generate the ReaR initrd.
Also SUSE/openSUSE uses dracut to generate its initrd for
normal system boot and system startup.

Caution! ;-)
here my next "totally offhanded and untested idea":

Based on our findings and assumptions in
https://github.com/rear/rear/pull/3031#issuecomment-1668940477
and
https://github.com/rear/rear/pull/3031#issuecomment-1669490742
we might do something similar for ReaR's initrd:

Make ReaR's initrd as an enhancement of the already existing
initrd for boot and startup of the original system,
for example somehow like
https://github.com/rear/rear/issues/3017#issuecomment-1645348480
i.e. basically something like

  1. unpack the already existing initrd into some directory
  2. add ReaR's specific additional stuff to that directory
  3. pack that directory as ReaR's initrd

My reasoning is same as in
https://github.com/rear/rear/pull/3031#issuecomment-1668940477

When the initrd of the Linux distribution
makes the original system start up,
then the contents of that initrd
should also make the ReaR recovery system start up
on replacement hardware
because replacement hardware must be sufficiently
compatible with the original system hardware.

Another (hopefully reasonable) assumption is that
the initrd of the Linux distribution
should be sufficiently feature complete
to start up the original system on various hardware.

So only adding ReaR's specific additional stuff
to the initrd of the Linux distribution
should be (hopefully) sufficient
to make ReaR's things work in the ReaR recovery system.

I.e. my "totally offhanded and untested idea" is
to not use dracut and generate a ReaR initrd from scratch
but perhaps simpler and perhaps even more fail-safe and
hopefully more generically working on various Linux distributions
because it does not depend on dracut - provided a sufficiently
"normal" initrd already exists for system boot and startup.

pcahyna commented at 2023-11-13 14:48:

@jsmeix using an already existing initrd would have the advantage of not relying on a specific tool (dracut), but I am afraid that the original initrd is not just feature complete (although that might be a problem as well, for example I suspect that it does not have the all the kernel modules and firmware), but that it does in fact too much. It needs to activate all the storage on the original disks, mount the rootfs and pivot to it. We need to avoid all this (activating the original storage components should be preferably avoided if you are going to wipe them in the next step), which means that we would not need to just "add ReaR's specific additional stuff" but to remove lots of stuff as well. This would require us to have a good knowledge of the specifics of the particular initrd.


[Export of Github issue for rear/rear.]