#1560 PR merged: Let /bin/ldd detect *.so with relative paths

Labels: enhancement, bug, fixed / solved / done, external tool

rowens275 opened issue at 2017-11-01 15:05:

This allows the "broken_binaries" code to properly detect *.so files that are referenced with relative paths. Prior to this change, executables whose *.so files had relative paths would produce "not found" in the ldd output, thus adding those executables to the $broken_binaries array. This caused problems for one executable in FDR/Upstream version 4.0.

I've tested that this works properly, but I'd appreciate it if somebody could take a look at the code change and let me know if you see any problems. This is code that gets used by all ReaR users, not just FDR/Upstream customers.
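
For readers who do not have the changed script at hand: the check in question is, in rough outline, a loop that runs 'ldd' on each binary in the recovery system and records every binary whose output contains "not found". A minimal sketch (not the actual ReaR code; the binaries list and the exact test command are placeholders):

    broken_binaries=()
    for binary in $binaries ; do
        # Relative references like ./bin/aps.so only resolve when ldd
        # runs in the directory that contains the binary, which is why
        # such binaries used to end up in broken_binaries:
        if ldd $binary | grep -q 'not found' ; then
            broken_binaries+=( "$binary" )
        fi
    done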

jsmeix commented at 2017-11-06 11:28:

@rowens275
please add an explanatory comment to the code
(cf. https://github.com/rear/rear/wiki/Coding-Style)
so that at any later time others understand why
that "overcomplicated" looking way via 'bash' is
needed here instead of "just calling ldd directly",
cf. https://github.com/rear/rear/issues/862
therein in particular
https://github.com/rear/rear/issues/862#issuecomment-274068914
and the subsequent comments.
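
For readers without issue #862 at hand, the indirection referred to above looks roughly like this (a sketch; $ROOTFS_DIR and $binary stand for whatever the surrounding script provides, and the exact reasoning is in the linked comment):

    # run ldd via a login shell inside the recovery system so that the
    # test behaves as it would in the actual recovery system environment
    chroot "$ROOTFS_DIR" /bin/bash --login -c "ldd $binary"

    # rather than calling the command directly, e.g.
    chroot "$ROOTFS_DIR" /bin/ldd "$binary"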

rowens275 commented at 2017-11-09 17:39:

I looked into this more and see that I made a mistake. You are correct, it is overcomplicated. I was seeing errors that I mistakenly attributed to the code in 980_verify_rootfs.sh.

I have a question. When ldd is run in 980_verify_rootfs.sh, any 'not found' messages cause ReaR to fail with "ERROR: ReaR recovery system in '/tmp/rear.xxxx/rootfs' not usable". But when ldd is run in linux-functions.sh, 'not found' messages do not cause failure. Why is that?

I can modify linux-functions.sh to handle relative paths for ldd, but since it is only causing error messages and not real failures, I'm not sure if I should bother.

jsmeix commented at 2017-11-10 10:47:

In general when build/default/980_verify_rootfs.sh
fails for particular third-party backup software
(like FDR/Upstream version 4.0 in this case)
have a look at
https://github.com/rear/rear/issues/1533
which is the same kind of issue for TSM.

Usually third-party backup software needs special handling
in ReaR, which is in general expected for third-party software.
I assume FDR/Upstream also needs additional special handling,
since that is exactly what the test in build/default/980_verify_rootfs.sh is there to catch.

@rowens275
I do not yet fully understand what the actual issue is
because I do not have this issue on my systems
(probably because I do not use third-party backup software)
and currently I fail to imagine what is going on on your system.
Could you provide an example of *.so files that have relative paths
on your system, together with the 'ldd' output on your system
that shows those relative paths?

The old SharedObjectFiles() function had some
inexplicable additional weird code that called "readlink",
which I removed in the new RequiredSharedObjects() function, cf.
https://github.com/rear/rear/commit/3973399cbf79c51fb9901ade8039a34740418ef8
because after too much time I gave up trying to understand
what that code does and, even more importantly, what the
reason behind that code could be.
Perhaps your issue can now explain what the reason
behind that code was?

Regarding your question:

I used the ldd call in the RequiredSharedObjects() function
in lib/linux-functions.sh basically as it had existed "since ever"
in ReaR, where it was previously in the SharedObjectFiles() function,
and that function had been copied from mkinitrd of SUSE 9.3
(I guess it was "just copied as is").

But I do not know the history of why those functions did not
error out in case of "not found" shared objects, mainly
because - as usual - there was not any explanatory
comment WHY that function behaves this way.

I assume it was not intended to error out the whole
running "rear mkrescue/mkbackup" program
only because the SharedObjectFiles() function
detected one single "not found" shared object.

And I agree with that assumption.
I.e. I think that the current RequiredSharedObjects() function
should also not abort the whole "rear mkrescue/mkbackup"
only because of a single "not found" shared object.

But both the old SharedObjectFiles() function
and the new RequiredSharedObjects() function
report "not found" shared objects in the ReaR log
so that one can (hopefully) debug issues
with "not found" shared objects more easily.

Reasoning:

In general a function call should not abort its caller program
unless it is really hopeless to proceed in any way.
For the fun of it (actually no fun) you may have a look at
https://github.com/apple/cups/issues/5143

In particular the RequiredSharedObjects() function is
only intended to determine all required shared objects,
but it is not intended to also verify whether what 'ldd' found
really is all of the required shared objects, so
a shared object reported as "not found" by 'ldd' is not a reason
to assume it is really hopeless to proceed in any way,
because it does not mean that this by-ldd-not-found shared object
will actually be missing later in the ReaR recovery system.

For example the user may know about that special
by-ldd-not-found-shared-object and get it explicitly
copied into the ReaR recovery system via an
appropriate COPY_AS_IS config variable, cf.
https://github.com/rear/rear/issues/1533
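
For example, a user could add such a directory in etc/rear/local.conf (a hedged illustration; the path is only an example, and in the FDR/Upstream case below this is effectively what the vendor-specific COPY_AS_IS_FDRUPSTREAM default already does):

    COPY_AS_IS+=( /opt/fdrupstream )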

Therefore, to actually verify whether or not all the
required shared objects really are in the ReaR recovery system,
a well separated test is implemented in
build/default/980_verify_rootfs.sh
that can be tuned as needed independently of the
RequiredSharedObjects() function implementation, cf.
https://github.com/rear/rear/pull/1562

FWIW regarding "well separated":
I do prefer to Keep Separated Issues Separated (KSIS ;-)
cf. RFC 1925 item (5).

rowens275 commented at 2017-11-10 15:32:

Thank you for the detailed response. The by-ldd-not-found-shared-object in my case is already copied in by COPY_AS_IS_FDRUPSTREAM. So I think I will not make any changes to linux-functions.sh, and I will simplify my changes to 980_verify_rootfs.sh as you suggested. This will make ReaR work with FDR/Upstream, but will show some error messages in the log. That is ok with me.

I can't supply you with a binary without going through an approval process over here. But I will give you an example of what my original problem was:

# cd /opt/fdrupstream
# ldd uscmd1
        linux-vdso.so.1 =>  (0x00007ffe391b9000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f6658402000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f66581fd000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f6657efb000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f6657ce2000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6657ac5000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f66578ab000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f66576a3000)
        ./bin/ioOptimizer.so (0x00007f665747d000)
        ./bin/log.so (0x00007f6657267000)
        ./bin/aps.so (0x00007f6657059000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f6656e42000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f6656b3a000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6656924000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f6656560000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007f665635d000)
        /lib64/ld-linux-x86-64.so.2 (0x0000558063527000)

Note the three lines which start with ./
Those are the relative paths I was talking about. ldd works OK if you are in the /opt/fdrupstream directory, but if you are in another directory, ldd reports those libraries as "not found":

# cd /
# ldd /opt/fdrupstream/uscmd1
        linux-vdso.so.1 =>  (0x00007ffc9d584000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f8a7cd86000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f8a7cb81000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f8a7c87f000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f8a7c666000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8a7c449000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f8a7c22f000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f8a7c027000)
        ./bin/ioOptimizer.so => not found
        ./bin/log.so => not found
        ./bin/aps.so => not found
        libz.so.1 => /lib64/libz.so.1 (0x00007f8a7be0f000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f8a7bb07000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8a7b8f1000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f8a7b52d000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007f8a7b32a000)
        /lib64/ld-linux-x86-64.so.2 (0x000055611b697000)

linux-functions.sh logs these 'not found' errors, but does not cause a failure. 980_verify_rootfs.sh, however, logs the errors and does cause a failure.
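
In other words, the example above shows that the relative ./bin/*.so references only resolve when ldd runs from the directory that contains the binary, e.g. (a sketch using the paths from the listing above):

    # works: relative .so paths resolve against the current directory
    ( cd /opt/fdrupstream && ldd uscmd1 )

    # yields "./bin/...so => not found" when run from elsewhere
    ldd /opt/fdrupstream/uscmd1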

I know what I have to do to simplify my fix. What is the best way forward, git-wise? Do I cancel this pull request and generate a new one, after I have reverted these changes and made new changes? Or does updating my InnovationDataProcessing:ldd_relative_paths branch automatically update this pull request?

jsmeix commented at 2017-11-10 16:10:

@rowens275
many thanks for your explanatory description of how
things look on your system - it helps me so much
to understand the root cause of this issue!

Accordingly the right way to solve this issue is
to enhance the 'ldd' test in 980_verify_rootfs.sh,
and if you provide explanatory comments in
whatever code you need in 980_verify_rootfs.sh
to make it work especially with FDR/Upstream,
I look forward to "just accepting" your changes
(provided you did not really break something in ReaR),
cf. "Dirty hacks welcome" at
https://github.com/rear/rear/wiki/Coding-Style

rowens275 commented at 2017-11-10 16:49:

OK, changes have been uploaded and I see that the pull request automatically detects the new changes. (I am relatively new to git, so I was unsure what to expect).

rowens275 commented at 2017-11-14 17:04:

I re-read your comments. My testing of this code on CentOS 7 shows no difference whether using --login or not, but I will include it because I see that it is probably safer that way. Perhaps it will make a difference on some distributions.

After the 'bash --login -c', is it preferable to specify the full path to executables or not? Specifically: '...ldd $binary' or '.../bin/ldd $binary'?

jsmeix commented at 2017-11-17 08:52:

@rowens275
according to my
https://github.com/rear/rear/issues/862#issuecomment-274068914
I think in case of 'bash --login -c' one should call executables
as one would call them in a normal working shell.

One should call programs by their basename without a path
so that calling programs works independently of where each
particular Linux distribution installs the programs, as long as
the program is in one of the directories in $PATH.

Only special programs whose directory is not in $PATH
must be called with their full path, but that is then also
the same as one would call such special programs
in a normal working shell.
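
Applied to the ldd test discussed here, that advice amounts to something like the following (a sketch; $ROOTFS_DIR and $binary are placeholders for what the surrounding script provides):

    # preferred: call ldd by its basename so the login shell's $PATH
    # resolves it, regardless of where the distribution installs it
    chroot "$ROOTFS_DIR" /bin/bash --login -c "ldd $binary"

    # not: a hard-coded location like .../bin/ldd, which may differ
    # between Linux distributions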

rowens275 commented at 2017-11-17 19:13:

Thanks for the input, jsmeix. I've updated the code per your suggestions.

jsmeix commented at 2017-11-20 15:35:

With https://github.com/rear/rear/pull/1585 merged
there is now the new config variable
NON_FATAL_BINARIES_WITH_MISSING_LIBRARY
that provides a generic method to avoid
issues like this one; see default.conf for how to use it.


[Export of Github issue for rear/rear.]