#2240 Issue closed: sshd segfault in ReaR recovery system (also segfaults in SLES15-SP1 system)

Labels: support / question, external tool, not ReaR / invalid

shaunsJM opened issue at 2019-09-25 19:24:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.5 / 2019-05-10

  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"):

NAME="SLES"
VERSION="15-SP1"
VERSION_ID="15.1"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp1"
  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
    LOCAL.CONF
# Begin example setup for SLE12-SP2 with default btrfs subvolumes.
# Since SLE12-SP1 what is mounted at '/' is a btrfs snapshot subvolume
# see https://github.com/rear/rear/issues/556
# and since SLE12-SP2 btrfs quota via "snapper setup-quota" is needed
# see https://github.com/rear/rear/issues/999
# You must adapt "your.NFS.server.IP/path/to/your/rear/backup" at BACKUP_URL.
# You must decide whether or not you want to have /home/* in the backup.
# It depends on the size of your harddisk whether or not /home is by default
# a btrfs subvolume or a separated xfs filesystem on a separated partition.
# You may activate SSH_ROOT_PASSWORD and adapt the "password_on_the_rear_recovery_system".
# For basic information see the SLE12-SP2 manuals.
# Also see the support database article "SDB:Disaster Recovery"
# at http://en.opensuse.org/SDB:Disaster_Recovery
# In particular note:
# There is no such thing as a disaster recovery solution that "just works".
# Regarding btrfs snapshots:
# Recovery of btrfs snapshot subvolumes is not possible.
# Only recovery of "normal" btrfs subvolumes is possible.
# On SLE12-SP1 and SP2 the only exception is the btrfs snapshot subvolume
# that is mounted at '/' but that one is not recreated but instead
# it is created anew from scratch during the recovery installation with the
# default first btrfs snapper snapshot subvolume path "@/.snapshots/1/snapshot"
# by the SUSE tool "installation-helper --step 1" (cf. below).
# Other snapshots like "@/.snapshots/234/snapshot" are not recreated.
# Create rear recovery system as ISO image:
OUTPUT=ISO
# Store the backup file via NFS on a NFS server:
BACKUP=NETFS
# BACKUP_OPTIONS variable contains the NFS mount options and
# with 'mount -o nolock' no rpc.statd (plus rpcbind) are needed:
BACKUP_OPTIONS="nolock,credentials=/etc/rear/cifs-rear"
# If the NFS server is not an IP address but a hostname,
# DNS must work in the rear recovery system when the backup is restored.
BACKUP_URL=cifs://den-vmutility/Linux-Rear-Migrations
# Keep an older copy of the backup in a HOSTNAME.old directory
# provided there is no '.lockfile' in the HOSTNAME directory:
NETFS_KEEP_OLD_BACKUP_COPY=yes
# Have all modules of the original system in the recovery system with the
# same module loading ordering as in the original system by using the output of
#   lsmod | tail -n +2 | cut -d ' ' -f 1 | tac | tr -s '[:space:]' ' '
# as value for MODULES_LOAD (cf. https://github.com/rear/rear/issues/626):
#MODULES_LOAD=( )
# On SLE12-SP1 and SP2 with default btrfs subvolumes what is mounted at '/' is a btrfs snapshot subvolume
# that is controlled by snapper so that snapper is needed in the recovery system.
# In SLE12-SP1 and SP2 some btrfs subvolume directories (/var/lib/pgsql /var/lib/libvirt/images /var/lib/mariadb)
# have the "no copy on write (C)" file attribute set so that chattr is required in the recovery system
# and accordingly also lsattr is useful to have in the recovery system (but not strictly required):
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" snapper chattr lsattr )
# Snapper setup by the recovery system uses /usr/lib/snapper/installation-helper
# that is linked to all libraries where snapper is linked to
# (except libdbus that is only needed by snapper).
# "installation-helper --step 1" creates a snapper config based on /etc/snapper/config-templates/default
COPY_AS_IS=( "${COPY_AS_IS[@]}" /usr/lib/snapper/installation-helper /etc/snapper/config-templates/default /etc/rear/cifs-rear )
# Files in btrfs subvolumes are excluded by 'tar --one-file-system'
# so that such files must be explicitly included to be in the backup.
# Files in the following SLE12-SP2 default btrfs subvolumes are
# in the below example not included to be in the backup
#   /.snapshots  /var/crash
# but files in /home are included to be in the backup.
# You may use a command like
#   findmnt -n -r -o TARGET -t btrfs | grep -v '^/$' | egrep -v 'snapshots|crash'
# to generate the values:
BACKUP_PROG_INCLUDE=( /var/cache /var/lib/mailman /var/tmp /var/lib/pgsql /usr/local /opt /var/lib/libvirt/images /boot/grub2/i386-pc /var/opt /srv /boot/grub2/x86_64-efi /var/lib/mariadb /var/spool /var/lib/mysql /tmp /home /var/log /var/lib/named /var/lib/machines )
# The following POST_RECOVERY_SCRIPT implements during "rear recover"
# btrfs quota setup for snapper if that is used in the original system:
POST_RECOVERY_SCRIPT=( 'if snapper --no-dbus -r $TARGET_FS_ROOT get-config | grep -q "^QGROUP.*[0-9]/[0-9]" ; then snapper --no-dbus -r $TARGET_FS_ROOT set-config QGROUP= ; snapper --no-dbus -r $TARGET_FS_ROOT setup-quota && echo snapper setup-quota done || echo snapper setup-quota failed ; else echo snapper setup-quota not used ; fi' )
# This option defines a root password to allow SSH connection
# without a public/private key pair
SSH_ROOT_PASSWORD='$1$dQS13sX5$L3Y4je4sdHLqveto/Ic4t.'
ONLY_INCLUDE_VG=( "system" )
AUTOEXCLUDE_MULTIPATH=n
ISO_MKISOFS_BIN="$( type -p xorrisofs || type -p mkisofs || type -p genisoimage )"
EXCLUDE_VG=( "vgapp" "vgarchlog" "vgu00-04" "vgu05-09" )
EXCLUDE_MOUNTPOINTS=('/app' '/archlog' '/u00' '/u01' '/u02' '/u03' '/u04' '/u05' '/u06' '/u07' '/u08' '/u09')
CLONE_ALL_USERS_GROUPS="true"
# Let the rear recovery system run dhclient to get an IP address
# instead of using the same IP address as the original system:
#USE_DHCLIENT="yes"
# End example setup for SLE12-SP2 with default btrfs subvolumes.
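
    (Side note: the SSH_ROOT_PASSWORD value above is an MD5 crypt hash
    rather than a plain-text password. As a generic OpenSSL note, not a
    ReaR-specific recommendation, such a hash can be generated with
      openssl passwd -1 'password_on_the_rear_recovery_system'
    where the quoted argument is the desired recovery system password.)
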
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR): VMware guest

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    x86_64

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    BIOS and GRUB2

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    CIFS mount to a Windows server

  • Description of the issue (ideally so that others can reproduce it):
    While booted into the ReaR rescue ISO, sshd segfaults when you try to ssh to the server to run "rear recover".
    This is my first use of ReaR with SUSE SLES 15. Other SLES versions do not have this issue.

  • Workaround, if any:
    None

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):
    I have console screen printouts since I could not capture the journalctl log by using ssh.
    rear-rescue-sshd-segfault.docx

jsmeix commented at 2019-09-26 12:55:

@shaunsJM
in your attached
https://github.com/rear/rear/files/3654405/rear-rescue-sshd-segfault.docx
there are sshd related error messages
Could not load host key: /etc/ssh/ssh_host_*_key
so it seems those key files are missing in the ReaR recovery system,
cf. the SSH_* config variables and their explanation in default.conf
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L1321

shaunsJM commented at 2019-09-26 22:01:

I believe the "Could not load host key" message is not the real issue and is a distraction.

If local.conf is configured with a SSH_ROOT_PASSWORD then a public/private key pair is not needed.

# SSH_ROOT_PASSWORD defines a password for remote access to the recovery system as 'root' via SSH
# without requiring a public/private key pair. This password is valid only while the recovery system
# is running and will not allow access afterwards to the restored target system.

This issue only happens on SLES 15, not on SLES 12 with a very similar local.conf configuration that also has an SSH_ROOT_PASSWORD defined.

jsmeix commented at 2019-09-27 12:01:

@shaunsJM

On my ReaR test systems I always use SSH_ROOT_PASSWORD="rear"
(never ever use your original root password as plain text there).
I had never noticed an ssh issue with that, in particular not
at the time when I did my tests on SLES12 and SLES15.
I am not at all an ssh expert so I cannot actually debug ssh issues.
Perhaps something is "unusual" on your particular SLES15 system?
Perhaps there was some recent ssh related update on SLES15
that causes this issue in ReaR as a side effect?
In any case a segfault of sshd is usually first and foremost
an issue in sshd itself or in one of its libraries.

After "rear mkrescue/mkbackup" you can check things inside the
ReaR recovery system by using KEEP_BUILD_DIR="yes",
see the KEEP_BILD_DIR description in default.conf
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L128
via chroot $TMPDIR/rear.XXXXXXXXXXXXXXX/rootfs/
and then try to start sshd from inside the ReaR recovery system.
I guess there could be certain issues when starting a service
that usually needs the outer network to work properly
from within the limited chroot environment
but for some basic tests it could help,
at least sshd should not segfault.
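
A minimal sketch of that test procedure (the rear.XXXXXXXXXXXXXXX
directory name stands for whatever build directory "rear mkrescue" reports):

  # keep the build area instead of cleaning it up after "rear mkrescue":
  echo 'KEEP_BUILD_DIR="yes"' >> /etc/rear/local.conf
  rear -D mkrescue
  # chroot into the kept recovery system rootfs:
  chroot $TMPDIR/rear.XXXXXXXXXXXXXXX/rootfs /bin/bash
  # inside the chroot, start sshd in the foreground with debug output:
  /bin/sshd -D -d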

I have been out of the office for some weeks and will be for some more,
so I cannot do much for ReaR.
I cannot try out or test anything for ReaR - in particular not on SLES.
I expect to be back in the office at about the beginning of October.
But I also expect that I have to do first and foremost other stuff with higher priority,
cf. https://github.com/rear/rear/issues/799#issuecomment-531247109

gozora commented at 2019-09-27 13:13:

@shaunsJM can you verify your openssh installation on your ORIGINAL system, by running rpm --verify openssh | grep ^S ?

V.

shaunsJM commented at 2019-09-27 14:48:

@gozora Thanks for the help. Here is the rpm --verify output:
S.5....T. c /etc/ssh/sshd_config
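
For reference, the flag column follows standard rpm --verify semantics:

  S  file size differs     5  digest differs
  T  mtime differs         c  (prefix) marks a config file

i.e. the only deviation from the packaged openssh is the locally edited
sshd_config, which is expected.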

shaunsJM commented at 2019-09-27 16:04:

@jsmeix I'm thinking something about sshd changed between SLES 12 and 15. During my testing yesterday, I managed to stumble upon a scenario where, after running a ReaR backup, any new ssh connections into the server would produce an sshd segfault in the messages log. I will spend some time trying to reproduce it. If I can reproduce it, I will log a call with SUSE to have them look at why.
When you get back and have some time, try loading openSUSE Leap 15.1 and see if you get the same results. Like I said to begin with, ReaR is working great and I can recover without any issues if I start the recovery from the console. It is just nice to be able to ssh into the recovery environment.
Thanks! Shaun

gozora commented at 2019-09-27 19:09:

@shaunsJM hmm, looks good at first sight.
Can you tell me, unless it is a secret of course, what changes you've made in your /etc/ssh/sshd_config ?

I just tested an ssh connection to the ReaR recovery system of my SUSE Linux Enterprise Server 15 with openssh-7.6p1-7.8.x86_64, and all works fine...

Btw, my sshd in the ReaR recovery system also logs error messages like the ones mentioned earlier:

RESCUE sle15:~ # systemctl status sshd
● sshd.service - Relax-and-Recover sshd service
   Loaded: loaded (/usr/lib/systemd/system/sshd.service; static; vendor preset: enabled)
   Active: active (running) since Fri 2019-09-27 20:59:41 CEST; 1min 13s ago
 Main PID: 106 (sshd)
    Tasks: 5 (limit: 4915)
   CGroup: /system.slice/sshd.service
           ├─106 /bin/sshd -D
           ├─744 sshd: root@pts/0
           ├─746 -bash
           ├─751 systemctl status sshd
           └─752 less

Sep 27 20:59:41 sle15 sshd[106]: Server listening on 0.0.0.0 port 22.
Sep 27 20:59:41 sle15 sshd[106]: Server listening on :: port 22.
Sep 27 21:00:18 sle15 sshd[742]: error: Could not load host key: /etc/ssh/ssh_host_dsa_key
Sep 27 21:00:18 sle15 sshd[742]: error: Could not load host key: /etc/ssh/ssh_host_ecdsa_key
Sep 27 21:00:18 sle15 sshd[742]: error: Could not load host key: /etc/ssh/ssh_host_ed25519_key
Sep 27 21:00:18 sle15 sshd[742]: Connection closed by 192.168.0.1 port 60840 [preauth]
Sep 27 21:00:23 sle15 sshd[744]: error: Could not load host key: /etc/ssh/ssh_host_dsa_key
Sep 27 21:00:23 sle15 sshd[744]: error: Could not load host key: /etc/ssh/ssh_host_ecdsa_key
Sep 27 21:00:23 sle15 sshd[744]: error: Could not load host key: /etc/ssh/ssh_host_ed25519_key

But this is most probably caused by a ReaR security feature, described in default.conf.

For your reference, here is how my sshd is linked:

RESCUE sle15:~ # ldd /bin/sshd
    linux-vdso.so.1 (0x00007ffc0a946000)
    libaudit.so.1 => /usr/lib64/libaudit.so.1 (0x00007fd66a99a000)
    libpam.so.0 => /lib64/libpam.so.0 (0x00007fd66a78b000)
    libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fd66a562000)
    libsystemd.so.0 => /usr/lib64/libsystemd.so.0 (0x00007fd66a2cd000)
    libcrypto.so.1.1 => /usr/lib64/libcrypto.so.1.1 (0x00007fd669e42000)
    libutil.so.1 => /lib64/libutil.so.1 (0x00007fd669c3f000)
    libz.so.1 => /lib64/libz.so.1 (0x00007fd669a28000)
    libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fd6697ed000)
    libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x00007fd6695a2000)
    libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x00007fd6692c8000)
    libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007fd6690c4000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fd668d0a000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fd668b06000)
    libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007fd668879000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fd66ae8f000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fd668662000)
    libcap.so.2 => /usr/lib64/libcap.so.2 (0x00007fd66845d000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fd668255000)
    liblzma.so.5 => /usr/lib64/liblzma.so.5 (0x00007fd66801a000)
    liblz4.so.1 => /usr/lib64/liblz4.so.1 (0x00007fd667e05000)
    libgcrypt.so.20 => /usr/lib64/libgcrypt.so.20 (0x00007fd667ae9000)
    libgpg-error.so.0 => /usr/lib64/libgpg-error.so.0 (0x00007fd6678c9000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd6676ab000)
    libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x00007fd667479000)
    libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x00007fd66726c000)
    libkeyutils.so.1 => /usr/lib64/libkeyutils.so.1 (0x00007fd667068000)

V.

shaunsJM commented at 2019-09-27 22:31:

@gozora Any chance you can upgrade to SLES 15 SP1? My openssh version is openssh-7.9p1-4.7.x86_64 on SLES 15 SP1.
My only change to sshd_config is setting "PermitRootLogin without-password". But I also tried setting it to yes.

@jsmeix and @gozora
Possible progress: I can easily reproduce the sshd segfault on a SLES 15 SP1 server without ReaR installed, so it doesn't appear to be ReaR having the issue. Something in the way sshd reads the system-wide host keys in /etc/ssh is the trigger. I did create a workaround that allows me to get on the server using ssh.
I created a setup-sshd.sh in /etc/rear and have it included in the making of the ISO. Once in the ReaR recovery shell I run /etc/rear/setup-sshd.sh and I can then log in using ssh.
Here is the script so you can see what it does:
#!/bin/bash
rm /etc/ssh/ssh_host_ecdsa_key /etc/ssh/ssh_host_ed25519_key
ssh-keygen -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key -q -P ""
ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -q -P ""
systemctl restart sshd

This stops the segfault and allows me to login using ssh.
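
As an aside (an alternative not used in this thread): OpenSSH's
"ssh-keygen -A" generates any missing default host keys with empty
passphrases in one step, so a shorter sketch of the same idea would be:

  # regenerate whatever default host keys are missing in /etc/ssh:
  ssh-keygen -A
  systemctl restart sshd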

Big thanks to @jsmeix and @gozora for looking at this with me. I will create a case with SUSE to see why the segfault is happening. Thanks!

gozora commented at 2019-09-28 07:20:

@shaunsJM the following configuration allowed ReaR to copy the host keys into the ReaR recovery image:

RESCUE sle15:~ # cat /etc/rear/local.conf 
...

SSH_FILES=/etc/ssh
SSH_UNPROTECTED_PRIVATE_KEYS="yes"

With this setup you should be able to connect to ReaR recovery system without your workaround (which is fully valid by the way ;-)).
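
To confirm the keys actually made it into the recovery system, a quick
check (assuming KEEP_BUILD_DIR="yes" as jsmeix described above, so the
build area is preserved) is:

  ls -l $TMPDIR/rear.XXXXXXXXXXXXXXX/rootfs/etc/ssh/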

V.

shaunsJM commented at 2019-09-30 15:26:

@gozora I debated using SSH_UNPROTECTED_PRIVATE_KEYS="yes" but had some concerns about security. So I was leaving that as my last option. But thanks for pointing it out! :)

jsmeix commented at 2019-10-01 15:42:

According to
https://github.com/rear/rear/issues/2240#issuecomment-536116399

I can ... reproduce the sshd segfault on a SLES 15 SP1 server
without ReaR installed.

the root cause is not in ReaR, so I close this issue accordingly.
Nevertheless further comments can of course be added here
to describe how things go on with sshd on SLES15-SP1.
@shaunsJM
thank you for your explicit feedback on where the root cause of this issue is.

shaunsJM commented at 2019-10-10 19:43:

A quick update, I logged the issue with SUSE and they are releasing a patch to fix the ssh segfault issue.

jsmeix commented at 2019-10-11 09:37:

@shaunsJM
out of curiosity: could you send me your SUSE bugzilla bug number
(or post it here if it is a publicly accessible SUSE bugzilla bug)
or your SUSE issue ticket number (or whatever that thingy is called)
to my SUSE mail address: jsmeix@suse.com
so that I could see what goes on there?
Thank you in advance!

shaunsJM commented at 2019-10-22 15:26:

This should be my last update for this... SUSE has created a PTF (Program Temporary Fix) for sshd. I've been able to test it out and confirm that sshd will no longer segfault if the host keys are missing. SUSE didn't give me any ETA on when this fix will be included in the next update of sshd.
I further tested out a ReaR recovery and I'm able to ssh into the ReaR recovery console without any problems. Again thanks for the help @jsmeix and @gozora

jsmeix commented at 2019-10-23 09:25:

@shaunsJM
thank you for your positive feedback.

FYI
regarding PTF versus maintenance update:

A PTF is fully supported so you do not need to have a maintenance update
to have full support from SUSE for what you currently have installed.

Usually some time after you got a PTF you will get a maintenance update
that includes what is fixed by the PTF.
But there is no time limit when such a maintenance update will come.
In certain cases it may even never come (e.g. when a PTF contains
a special fix for a corner case), so a "temporary" fix could stay
in place for as long as a SLES version is supported.

For more details see for example
https://www.suse.com/support/handbook/


[Export of Github issue for rear/rear.]