#1494 Issue closed: Recovery system libraries not consistent (e.g. when /lib64/noelision/libpthread is used kernel panics because init fails)

Labels: bug, support / question, fixed / solved / done

linuxstar2017 opened issue at 2017-09-15 10:41:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • rear version (/usr/sbin/rear -V):
l9995988:~ # rpm -qa | grep -i rear
rear-1.18-3.x86_64
  • OS version (cat /etc/rear/os.conf or lsb_release -a):
    SLES12
9995988:~ # uname -a
Linux l9995988 4.4.49-92.11.1.12564.2.PTF.1027273-default #1 SMP Thu Mar 9 10:34:40 UTC 2017 (6122209) x86_64 x86_64 x86_64 GNU/Linux
  • rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf):
l9995988:~ # cat /etc/rear/local.conf
MODULES=af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock coretemp crct10dif_pclmul crc32_pclmul drbg ansi_cprng aesni_intel aes_x86_64 ppdev lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 joydev pcspkr shpchp mptctl vmw_balloon e1000 vmw_vmci battery parport_pc parport acpi_cpufreq fjes processor ac xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc32c_intel serio_raw drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mptspi drm scsi_transport_spi mptscsih mptbase floppy ata_piix ahci libahci libata button dm_mirror dm_region_hash dm_log sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 
BACKUP=NSR
OUTPUT=ISO
ISO_DIR=/var/lib/rear/output
ISO_PREFIX=t32-rescue-l9995988
ISO_VOLID=t32-rescue-l9995988
AUTOEXCLUDE_PATH=( /media /tmp )
AUTOEXCLUDE_AUTOFS=y
AUTOEXCLUDE_MULTIPATH=y
AUTOEXCLUDE_DISKS=y
NSR_RETENTION_TIME="1 month"
NSR_POOL_NAME=REAR
COPY_AS_IS=(${COPY_AS_IS[@]} /lib64/noelision/)
  • Are you using legacy BIOS or UEFI boot?
    legacy BIOS
  • Brief description of the issue:
    ReaR ISO recovery image leads to a kernel kernel panic.
    /init: error while loading shared library: libpthread.s0.0: cannot open shared object file
    Problem can be reproduced on physical and virtual servers and also
    with ReaR version 2.x, found on OBS
    Even we adapted the rear conf to include /lib64/noelision (it is actually copied over),
    libpthread is not found
  • Work-around, if any:
    Remove /lib64/noelision from /etc/ld.so.conf
    Execute ldconfig
    Create rescue ISO

jsmeix commented at 2017-09-15 11:19:

@linuxstar2017
I don't know about noelision stuff (and quick Googling did not
show something explanatory - only various bug reports).

Nevertheless I try some guesswork:
On my SLES12 system I have

e205:~/rear.master # find /lib64 | wc -l
234

but in the matching ReaR recovery system I have only

RESCUE e205:~ # find /lib64 | wc -l
71

In general the ReaR recovery system is by default very minimal
so that this or that additional stuff might be missing which
leads to this error.

@linuxstar2017
could you try if

COPY_AS_IS=( "${COPY_AS_IS[@]}" /lib64/ )

helps?
With that I get in my recovery system:

RESCUE e205:~ # find /lib64 | wc -l
230

That looks better but it is still 4 files less
that in the original system which are those links

/lib64/libfuse.so.2 -> /usr/lib64/libfuse.so.2
/lib64/libfuse.so.2.9.3 -> /usr/lib64/libfuse.so.2.9.3
/lib64/libss.so.2 -> /usr/lib64/libss.so.2
/lib64/libss.so.2.0 -> /usr/lib64/libss.so.2.0

It seems there is a general problem
with links and COPY_AS_IS ...
but I guess that is not related to this issue here.

@linuxstar2017 onyl FYI
you might have a possibly severe syntax issue in your

COPY_AS_IS=(${COPY_AS_IS[@]} /lib64/noelision/)

In almost all cases one must use quoted "${arr[@]}", cf.
https://github.com/rear/rear/issues/1068
and see in default.conf

# Some variables are actually bash arrays and should be treated with care.
# Use VAR=() to set an empty array.
# Use VAR=( "${VAR[@]}" 'value' ) to add a fixed value to an array.
# Use VAR=( "${VAR[@]}" "$var" ) to add a variable value to an array.
# Whether or not the latter case works as intended depends on when and
# how "$var" is set and evaluated by the Relax-and-Recover scripts.
# In general using ${VAR[*]} is problematic and using ${VAR[@]} without
# double-quotes is also problematic, see 'Arrays' in "man bash" and
# see https://github.com/rear/rear/issues/1068 for some examples.

jsmeix commented at 2017-09-15 12:00:

@linuxstar2017
can you explain to me how I could reproduce it
on a SLES12 virtual QEMU/KVM machine?
On my current SLES12 virtual QEMU/KVM machine
I do not have any noelision in my /etc/ld.so.conf.

linuxstar2017 commented at 2017-09-15 12:24:

@jsmeix

Sure ...
(on my lab server)

denbvslnxs1:/lib64/noelision # ls
libpthread-2.22.so libpthread.so.0
denbvslnxs1:/lib64/noelision # rpm -q --whatprovides /lib64/noelision/libpthread-2.22.so
glibc-2.22-61.3.x86_64

Normally, you don't need it !

But in our case ..

/etc/ld.so.conf need to have the "noelision" library path at first to prevent
application crashes, which are caused by bad coded applications (like Legato Networker's
nsrexecd) and lead to a double-unlock.

I sent you in parallel a mail (in German) with all the background ...

jsmeix commented at 2017-09-15 13:22:

@linuxstar2017
I also have the noelision library in /lib64/noelision/
but I do not have any noelision in my /etc/ld.so.conf
so that I like to know what exact "noelision" library path
you have in your /etc/ld.so.conf that might help me
to reproduce it - preferably just post your whole
/etc/ld.so.conf here so that I do not need to guess.

Additionally to really reproduce it it seems one needs
"a badly coded application" which requires the noelision library.
Do you know a free software that requires the noelision library?

linuxstar2017 commented at 2017-09-15 13:34:

@jsmeix

l9995988:~ # cat /etc/ld.so.conf
/lib64/noelision
/usr/local/lib64
/usr/local/lib
include /etc/ld.so.conf.d/*.conf
# /lib64, /lib, /usr/lib64 and /usr/lib gets added
# automatically by ldconfig after parsing this file.
# So, they do not need to be listed.

I have no clue of any free sw, which requires
this .. but for me, you don't need that step
of verification ... the original issue need just
to be fixed ... to do no kernel panic anymore
and boot from recovery ISO.

Tests for the EMC Networker Recovery need
to be done anyway later by the customer,
as he owns the license and EMC is the original
cause of the issues.

Just IMHO

jsmeix commented at 2017-09-15 13:56:

@linuxstar2017
thanks - and have a nice weekend!

jsmeix commented at 2017-09-18 11:27:

I can reproduce it and
I know a working workaround with the code as is and
I know the solution (which needs a fix in a script):

I use always

KEEP_BUILD_DIR="yes"

so that I can 'chroot' into the ReaR recovery system
directly on the system where I made it with "rear mkrescue"
without the need to boot the ReaR recovery system
on another (virtual) machine only to do some tests
in the ReaR recovery system which is impossible
when the ReaR recovery system fails to come up
like here when the kernel panics because init fails.

I can reproduce it with

COPY_AS_IS=( "${COPY_AS_IS[@]}" /lib64/noelision/ )

I make a ReaR recovery system and
'chroot' into it and do some tests therein:

e205:~/rear.master # usr/sbin/rear -d -D mkrescue
...
Creating recovery/rescue system initramfs/initrd
...
You should also rm -Rf /tmp/rear.mqtc7xtv52uuRgu

e205:~/rear.master # chroot /tmp/rear.mqtc7xtv52uuRgu/rootfs/

bash-4.3# export LC_ALL=C LANG=C

bash-4.3# ldd /bin/sort | grep pthread
grep: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory

bash-4.3# echo -e 'foo\nbar\nbaz' | sort
sort: error while loading shared libraries: libpthread.so.0: cannot open shared object file: No such file or directory

The working workaround without code changes is

COPY_AS_IS=( "${COPY_AS_IS[@]}" /lib64/ )

cf. "could you try if ... helps" in my above
https://github.com/rear/rear/issues/1494#issuecomment-329754377

With that I get:

e205:~/rear.master # usr/sbin/rear -d -D mkrescue
...
Creating recovery/rescue system initramfs/initrd
...
You should also rm -Rf /tmp/rear.ICBQgUUoYcvDCap

e205:~/rear.master # chroot /tmp/rear.ICBQgUUoYcvDCap/rootfs/

bash-4.3# export LC_ALL=C LANG=C

bash-4.3# ldd /bin/sort | grep pthread
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fc6a7267000)

bash-4.3# echo -e 'foo\nbar\nbaz' | sort
bar
baz
foo

The solution (which needs a fix in a ReaR script) is
to run 'ldconfig' inside the ReaR recovery system before
"Creating recovery/rescue system initramfs/initrd ..."

e205:~/rear.master # chroot /tmp/rear.mqtc7xtv52uuRgu/rootfs/

bash-4.3# export LC_ALL=C LANG=C

bash-4.3# ldconfig -v | egrep 'noelision|pthread'
...
/lib64/noelision:
        libpthread.so.0 -> libpthread-2.22.so
bash-4.3# ldd /bin/sort | grep pthread
        libpthread.so.0 => /lib64/noelision/libpthread.so.0 (0x00007f295567a000)

bash-4.3# echo -e 'foo\nbar\nbaz' | sort
bar
baz
foo

@linuxstar2017
verify with your customer that

COPY_AS_IS=( "${COPY_AS_IS[@]}" /lib64/ )

also works for him with his particular backup program
(I cannot test if his particular backup program also works).

jsmeix commented at 2017-09-18 12:43:

At the end of
usr/share/rear/build/GNU/Linux/390_copy_binaries_libraries.sh
there is

#ldconfig $v -r "$ROOTFS_DIR" >&2 || Error "Could not configure libraries with ldconfig"

which would have helped in this case if it was enabled.

With this line enabled and

COPY_AS_IS=( "${COPY_AS_IS[@]}" /lib64/noelision/ )

everything works for me.

@linuxstar2017
verify with your customer that enabling the "#ldconfig ..."
line at the end of build/GNU/Linux/390_copy_binaries_libraries.sh
is sufficient so that it also works with only

COPY_AS_IS=( "${COPY_AS_IS[@]}" /lib64/noelision/ )

Because - as ususal :-( - there is not any comment that explains
why the 'ldconfig' is disabled in 390_copy_binaries_libraries.sh
I need to do forensics - but fortunately we have Git where

git log -p --follow usr/share/rear/build/GNU/Linux/390_copy_binaries_libraries.sh

shows that it was disabled via
https://github.com/rear/rear/commit/d62a555fd3460a618e47e8a6c6288e13cbd940fb
which points to
https://github.com/rear/rear/issues/772
and therein the reason behind is described in
https://github.com/rear/rear/issues/772#issuecomment-183457540
and the commit
https://github.com/rear/rear/commit/d62a555fd3460a618e47e8a6c6288e13cbd940fb
shows that the intent was to run 'ldconfig' in the booted
recovery system but now as 'ldconfig -X' (WHY now the '-X'?)
which is wrong because now it tries to boot the
recovery system with inconsistent libraries configuration
which can badly fail as in this case.

As far as I understand it the actually right solution seems
to create a consistent libraries configuration at build-time
of the recovery system with right confing files in the
$ROOTFS_DIR/etc/ld.so.conf.d/ directory and
Error out if that fails because that could mean
the recovery system may fail to boot.

schlomo commented at 2017-09-18 13:19:

We actually have usr/share/rear/build/default/980_verify_rootfs.sh which does a chroot test.

It seems to me that we should improve this test to be more thorough. At the moment it only checks bash which apparently does not have this problem. Can you maybe expand the test there to be more thorough and to also check for such problems?

linuxstar2017 commented at 2017-09-18 13:40:

Feedback:

Perfect, that change solved the issue ! Server is now recovering ....

Many thanks for your help, will now update the SR.....

jsmeix commented at 2017-09-19 07:53:

@linuxstar2017
please be explicit so that I do not need to guess.
If you do not invest your time to write explicit information
others may not spend their time to investigate your issues.
What change solved the issue?

COPY_AS_IS=( "${COPY_AS_IS[@]}" /lib64/ )

or

COPY_AS_IS=( "${COPY_AS_IS[@]}" /lib64/noelision/ )

plus in build/GNU/Linux/390_copy_binaries_libraries.sh

ldconfig $v -r "$ROOTFS_DIR" >&2 || Error " ..."

or did even both changes solve the issue?
In the latter case what was the preferred way and why?
I need such feedback to better understand user expectations.

jsmeix commented at 2017-09-19 07:56:

I reopen this issue because we still have a bug in ReaR
that needs to be fixed.

@schlomo
I will enhance build/default/980_verify_rootfs.sh
but that is of secondary importance.

First and foremost I need some help how to implement correctly
that the libraries in the recovery system are consistent because
plain 'ldconfig $v -r "$ROOTFS_DIR"' seems to have
unwanted side-effects https://github.com/rear/rear/issues/772

linuxstar2017 commented at 2017-09-19 07:58:

@jsmeix

Sorry, I was explicit ... but agreed, only in the SR solution comment:

ReaR adapted as per below:

File:
/usr/share/rear/build/GNU/Linux/39_copy_binaries_libraries.sh: ldconfig $v -r "$ROOTFS_DIR" >&2 || Error "Could not configure libraries with ldconfig"
==> (line uncommented)

File:
/etc/rear/local.conf: COPY_AS_IS=( "${COPY_AS_IS[@]}" /lib64/noelision/ )
==> (/lib64/noelision added)

schlomo commented at 2017-09-19 08:04:

@jsmeix thanks. ATM I only can offer a wild guess: The way how ReaR copies libraries to the standard paths can mess up some setups to put "override" libraries in special paths. To be 100% correct we might have to:

  • Take ld.so.conf* as it is
  • replicate the library directories when we copy libs
  • Have special code to handle known special cases

gdha commented at 2017-09-19 09:39:

@linuxstar2017 are you really sure it did not work with /usr/share/rear/build/GNU/Linux/39_copy_binaries_libraries.sh and leaving #ldconfig $v -r "$ROOTFS_DIR" >&2 || Error "Could not configure libraries with ldconfig" ==> (line commented)
It worked like that for 1,5 year already with commented line. I suspect the missing /lib64/noelision/ was the real reason.
However, we need to be sure.

linuxstar2017 commented at 2017-09-19 10:04:

@gdha
At least this is , what we tried/tested .... after un-commenting the ldconfig line, it worked,
before not.

The addition of the library path was days before ... it was copied over (verified),
but failed nevertheless....

jsmeix commented at 2017-09-19 10:08:

@linuxstar2017
I cannot read a "SR solution comment" and even more important:
Nobody else from ReaR upstream can.

jsmeix commented at 2017-09-19 10:18:

@gdha
you can reproduce it when you have
/lib64/noelision/libpthread* libraries
on your system.

Add a topmost line
/lib64/noelision
to your /etc/ld.so.conf cf.
https://github.com/rear/rear/issues/1494#issuecomment-329784111
and then do a 'ldconfig' so that you get on your system

# ldconfig -v | egrep 'libpthread|noelision'
...
/lib64/noelision:
        libpthread.so.0 -> libpthread-2.22.so
        libpthread.so.0 -> libpthread-2.22.so

# ldd /bin/sort | grep pthread
        libpthread.so.0 => /lib64/noelision/libpthread.so.0

I use /bin/sort as an example for a binary that is
linked with libpthread which is now linked with
the noelision version of libpthread.

Then reproduce it as I described in
https://github.com/rear/rear/issues/1494#issuecomment-330192635

It worked like that for 1,5 year already with commented line
because for 1,5 year already no ReaR user uses the
noelision version of libpthread.

Simply put:
As soon as a special libraries configuration is used,
the libraries in the recovery system are likely inconsistent
and when a library is involved where init is linked with,
the recovery system fails to boot with kernel panic
because init fails.

gdha commented at 2017-09-19 11:17:

@jsmeix @linuxstar2017 Thank you both for the clear explanation of the issue - now I'm following

jsmeix commented at 2017-09-19 14:28:

With https://github.com/rear/rear/pull/1502 merged
issues like this one should now be avoided.

jsmeix commented at 2017-09-26 13:47:

The recovery system libraries are still not fully consistent,
see https://github.com/rear/rear/issues/1518 but
since https://github.com/rear/rear/pull/1514 is merged
for programs (i.e. files in a .../bin/... or .../sbin/... directory)
a missing library in the recovery system is now an Error.

linuxstar2017 commented at 2017-09-27 06:33:

O.K. let me ask more concrete ... what my customer need to know ....

When is the SUSE SLES-supported ReaR package able to work
non-modfied ? As a consequence, customer expects a PTF or
fix going upstream.

If no ETA or PTF (customer would challenge, why ?), what according
to your experience, need the customer to implement to have current, non-working
situation corrected ?

For the given scenario ... including /lib64/noelision, a bootable recovery image and
working Networker recovery at the end .. pls. keep in mind, this is the customers
recovery environment, they rely on !

Many thanks ....

jsmeix commented at 2017-09-27 08:47:

@linuxstar2017

The fix is upstream:
https://github.com/rear/rear/pull/1502
https://github.com/rear/rear/issues/1494#issuecomment-330557699
https://github.com/rear/rear/commit/6afc9a0a8b5b80429fb0aba545302c1af8ccbe70

I do not understand why
"customer ... have current, non-working situation"
regardless of your
https://github.com/rear/rear/issues/1494#issuecomment-330224625
"that change solved the issue ! Server is now recovering"
Why cannot the customer adapt his ReaR scripts?
That's why ReaR is implemented as bash scripts, read
"Disaster recovery with Relax-and-Recover (ReaR)" in
https://en.opensuse.org/SDB:Disaster_Recovery
Regarding the upstream fix
https://github.com/rear/rear/commit/6afc9a0a8b5b80429fb0aba545302c1af8ccbe70
When the customer uses an older ReaR version
there is no LogPrintError function.
In this case use the LogPrint function instead.

In general:

The public ReaR upstream GitHub is
neither the right place for questions about SUSE's
(or any company) official customer support handling
nor for lecturing what people should "keep in mind"
when working as ReaR upstream maintainers or contributors
(where most of them do that even on a voluntary base).

The public ReaR upstream GitHub is the right place
to report the plain facts about a ReaR issue
to get the issue analyzed and ideally even fixed
via direct communication with people who are here
as ReaR upstream maintainers and contributors.
This way an issue could get solved in a shorter time
than via an indirectly working official company support.

The public ReaR upstream GitHub does not replace
an official company support but it can complement it
by providing an efficient direct communication with the
ReaR upstream maintainers and contributors plus
either fixes so that all official company support teams
could get the fixes from the public ReaR upstream GitHub
or alternatively by providing a public visible explanation
why a particular issue cannot be fixed in ReaR
(e.g. when the cause of an issue is not "inside" ReaR).

gdha commented at 2017-09-27 09:47:

@linuxstar2017 As @jsmeix said for ReaR packages from SLES itself you should open a software case at SuSE support center, and do not bother us with these kind of questions as we have nothing to say what SuSE does or has in the pipe-line with ReaR in their products.
We are volunteers working on public ReaR upstream project and cannot speak for RHEL, SuSE, Ubuntu or what-ever Linux distro.

linuxstar2017 commented at 2017-09-27 09:54:

@gdha @jsmeix
Sorry (as explained already to jsmeix) my fault ....
Just forget it ... it's already channeled again the right way ...

jsmeix commented at 2017-10-04 14:18:

With https://github.com/rear/rear/pull/1521 merged
the whole binaries and libraries copying code is now
cleaned up and simplified.


[Export of Github issue for rear/rear.]