#3415 PR merged: Create an ed25519 ssh host key, and /usr/share/empty.sshd in the rescue image #3413

Labels: enhancement, external tool

gdha opened issue at 2025-02-28 16:09:

Relax-and-Recover (ReaR) Pull Request Template

Please fill in the following items before submitting a new pull request:

Pull Request Details:
  • Type: Bug Fix and Enhancement

  • Impact: High

  • Reference to related issue (URL): #3413

  • How was this pull request tested? Manually in recovery mode

  • Description of the changes in this pull request: EL9 requires the ed25519 SSH key and the directory /usr/share/empty.sshd to be present.

schlomo commented at 2025-03-02 10:29:

BTW, it seems like we use the /etc/inittab file as a trigger for actually starting SSH, maybe time to let that go?

gdha commented at 2025-03-03 09:53:

BTW, it seems like we use the /etc/inittab file as a trigger for actually starting SSH, maybe time to let that go?

Yes indeed. However, this PR is meant for ReaR v2.6 (RHEL) and would like to keep the /etc/inittab dependency removal for another issue

gdha commented at 2025-03-13 09:48:

@rear/contributors If no-one has any objections I would like to merge this PR tomorrow afternoon (14/Mar/2025)?

pcahyna commented at 2025-03-17 10:49:

Hi @gdha , thank you for the PR, but I am not able to reproduce the original issue. On EL9, sshd works for me in the rescue system without any changes (but, I am not using Putty, is it doing anything special?). Do you please have some logs from the failed sshd start-up that might help with reproducing the problem? Do you have a custom sshd_config?

gdha commented at 2025-03-17 10:58:

Hi @gdha , thank you for the PR, but I am not able to reproduce the original issue. On EL9, sshd works for me in the rescue system without any changes (but, I am not using Putty, is it doing anything special?). Do you please have some logs from the failed sshd start-up that might help with reproducing the problem? Do you have a custom sshd_config?

Unfortunately, no logs as it was during the boot up and ReaR did enable the RSA sshd key, but our putty enforce the ed25519 key and that was not present, therefore, no login via putty was possible.

schlomo commented at 2025-03-17 11:15:

So the root cause was an SSH key policy on the SSH client side that triggered the problem? And our fix is to also generate modern SSH keys and not only old SSH keys?

I think that this is good for us to support modern and more secure standards...

I was wondering if maybe instead of looking for missing keys under a default name we could either check for missing files from the configuration or maybe even check the configuration on the source system and remember the SSH key files and types to re-created?

E.g. something like this could be used to remember the key files and types:

# sshd -T | grep "hostkey /" | while read _ key ; do method=( $(ssh-keygen -y -f $key) ) ; echo $key $method ; done
/etc/ssh/ssh_host_rsa_key ssh-rsa
/etc/ssh/ssh_host_ecdsa_key ecdsa-sha2-nistp256
/etc/ssh/ssh_host_ed25519_key ssh-ed25519

and then we generate them all and the code would be a bit more future proof with regard to key file names and types. As we copy over the original sshd_config this might make ReaR more robust as we don't depend on hard-coded SSH host key file names.

pcahyna commented at 2025-03-17 11:43:

our putty enforce the ed25519 key and that was not present, therefore, no login via putty was possible

Thanks for the explanation. Makes sense, with this information I think I will be able to reproduce it easily (I would not describe the problem as "On EL9 sshd will not start anymore", though .... )

pcahyna commented at 2025-03-17 13:21:

I was wondering if maybe instead of looking for missing keys under a default name we could either check for missing files from the configuration or maybe even check the configuration on the source system and remember the SSH key files and types to re-created?

There is a script in the build stage that generates a key if present on the original system and not present in the rescue ramdisk, but the algorithm is a bit different (it does not check the configuration, it checks the existence of the actual files, which could be even better, as we want to have the rescue ramdisk close to the original system if feasible).

https://github.com/rear/rear/blob/bce2750933fae8c8bd328e5378c8ad435d0166d1/usr/share/rear/build/default/500_ssh_setup.sh#L147-L162

schlomo commented at 2025-03-17 13:35:

@pcahyna I suppose that our current approach is based on hard coded assumptions around the actual file names of the host keys:
https://github.com/rear/rear/blob/bce2750933fae8c8bd328e5378c8ad435d0166d1/usr/share/rear/build/default/500_ssh_setup.sh#L140

I was wondering if it is safe to assume that these file names are indeed the same across all distros and SSHD versions?

pcahyna commented at 2025-03-17 13:52:

I can reproduce the problem with the missing host key:
ssh -o 'HostKeyAlgorithms=ssh-ed25519' root@...
prints:
Unable to negotiate with ...: no matching host key type found. Their offer: rsa-sha2-512,rsa-sha2-256,ssh-rsa

pcahyna commented at 2025-03-17 13:59:

I am not able to reproduce the problem with the missing /usr/share/empty.sshd though. In my test, the directory exists in the rescue image. It has been created here:
https://github.com/rear/rear/blob/bce2750933fae8c8bd328e5378c8ad435d0166d1/usr/share/rear/rescue/default/500_ssh.sh#L73-L79
@gdha is there something unusual on the system if the directory does not get created? On EL9, getent passwd sshd should print this:

sshd:x:74:74:Privilege-separated SSH:/usr/share/empty.sshd:/usr/sbin/nologin

You can check what is in the rescue ramdisk by using rear -d mkrescue without rebooting - the build area will be left under /var/tmp for examination.

pcahyna commented at 2025-03-17 14:35:

rear -D mkrescue and the resulting log is even more informative. I see this:

++ local 'sshd_usernames=sshd ssh'
+++ getent passwd sshd ssh
++ local 'getent_passswd_ssh=sshd:x:74:74:Privilege-separated SSH:/usr/share/empty.sshd:/usr/sbin/nologin'
++ test 'sshd:x:74:74:Privilege-separated SSH:/usr/share/empty.sshd:/usr/sbin/nologin'
++ local sshd_user sshd_password sshd_uid sshd_gid sshd_gecos sshd_homedir junk
++ IFS=:
++ read sshd_user sshd_password sshd_uid sshd_gid sshd_gecos sshd_homedir junk
++ CLONE_USERS+=($sshd_user)
++ mkdir -v -p /var/tmp/rear.olGxluYuADkVTij/rootfs//usr/share/empty.sshd
mkdir: created directory '/var/tmp/rear.olGxluYuADkVTij/rootfs//usr/share'
mkdir: created directory '/var/tmp/rear.olGxluYuADkVTij/rootfs//usr/share/empty.sshd'
++ chmod -v 0700 /var/tmp/rear.olGxluYuADkVTij/rootfs//usr/share/empty.sshd
mode of '/var/tmp/rear.olGxluYuADkVTij/rootfs//usr/share/empty.sshd' changed from 0755 (rwxr-xr-x) to 0700 (rwx------)

gdha commented at 2025-03-17 14:44:

@pcahyna I see the following:

++ CLONE_USERS+=($sshd_user)
++ mkdir -v -p /oem/rear.BLHLE9eXuu48WDU/rootfs//var/empty/sshd
mkdir: created directory '/oem/rear.BLHLE9eXuu48WDU/rootfs//var/empty/sshd'
++ chmod -v 0700 /oem/rear.BLHLE9eXuu48WDU/rootfs//var/empty/sshd
mode of '/oem/rear.BLHLE9eXuu48WDU/rootfs//var/empty/sshd' retained as 0700 (rwx------)

and the output of:

#-> getent passwd sshd
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

However, I'm 100% sure that after the failed ssh login session I had to create on the console the directory /usr/share/empty.ssh. I wonder if this is something embedded in sshd?

pcahyna commented at 2025-03-17 15:00:

Indeed it seems hardcoded:

# strings /usr/sbin/sshd | grep empty
(...)
/usr/share/empty.sshd

sshd(8) manual page reads:

     /usr/share/empty.sshd
             chroot(2) directory used by sshd during privilege separation in
             the pre-authentication phase.  The directory should not contain
             any files and must be owned by root and not group or world-
             writable.

and it does not mention that the directory is configurable.
On Debian it reads:

     /run/sshd
             chroot(2) directory used by sshd during privilege separation in
             the pre-authentication phase.  The directory should not contain
             any files and must be owned by root and not group or world-
             writable.

So apparently the directory is set during build time (and distro-specific), and the build embeds it both in the sshd binary and the manual page.

I am afraid you have an inconsistent system (wrong user entry for sshd). Do you know where it has come from?

gdha commented at 2025-03-17 15:16:

I am afraid you have an inconsistent system (wrong user entry for sshd). Do you know where it has come from?

The system was upgraded last year from RHEL 8 to RHEL 9 and for the rest it is a pretty standard RHEL system. No special software installed that could impact sshd, except perhaps we enforcing the use of ed25519 key.

pcahyna commented at 2025-03-17 15:23:

The system was upgraded last year from RHEL 8 to RHEL 9

Indeed on RHEL 8:

# getent passwd sshd
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin

so what we see here is actually a result of a bad upgrade procedure - I need to check whether it is a bug in the upgrade tooling or the upgrade was somehow botched.

gdha commented at 2025-03-17 15:26:

Oh boy - serious impact that would be for the already >1000 systems upgraded...

jsmeix commented at 2025-03-17 15:28:

For such cases we have
https://github.com/rear/rear/wiki/Coding-Style#dirty-hacks-welcome

pcahyna commented at 2025-03-17 15:38:

whether it is a bug in the upgrade tooling

This - there is apparently nothing in the sshd package to change the user when the directory changes, same problem on Fedora - older version used /var/empty/sshd, new versions use /usr/share/empty.sshd, but on an upgraded Fedora, the user still has /var/empty/sshd as $HOME.

gdha commented at 2025-03-17 15:41:

okay a glitch in the leapp software?

pcahyna commented at 2025-03-17 15:42:

https://src.fedoraproject.org/rpms/openssh/pull-request/14 is the source of the change

pcahyna commented at 2025-03-17 15:43:

should rather be in the package scriptlet than in Leapp as Fedora needs it as well. Anyway, a conversion step needs to be somewhere. I will report it to the openssh package maintainers.


[Export of Github issue for rear/rear.]