#1709 Issue closed: Default setting of USE_SERIAL_CONSOLE considered dangerous in migration mode

Labels: documentation, discuss / RFC

OliverO2 opened issue at 2018-01-25 13:31:

In ReaR 2.3, the default of USE_SERIAL_CONSOLE is determined by checking for at least one of /dev/ttyS[0-9]* being operational (200_include_serial_console.sh). This leads to an incomplete boot on a rescue system if none of /dev/ttyS[0-9]* is connected to an actual console.

I have seen the following behavior with rescue images generated on a bare metal server with ttyS[012]:

Boot appears to hang (/dev/ttyS0 operational on the recovery system but not connected to an actual console)

Only kernel messages were appearing on the virtual console device:

[   19.918450] EDD information not available.
[   20.099473] Freeing unused kernel memory: 1548K (ffffffff83b6a000 - ffffffff83ced000)
[   20.508457] Write protecting the kernel read-only data: 14336k
[   20.738752] Freeing unused kernel memory: 1396K (ffff8d09640a3000 - ffff8d0964200000)
[   21.078851] Freeing unused kernel memory: 260K (ffff8d09645bf000 - ffff8d0964600000)
[   21.433765] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[   21.681982] random: systemd: uninitialized urandom read (16 bytes read)
[   22.018999] random: systemd: uninitialized urandom read (16 bytes read)
[   22.368529] random: systemd: uninitialized urandom read (16 bytes read)
[   22.640103] random: systemd: uninitialized urandom read (16 bytes read)
[   22.709686] random: systemd: uninitialized urandom read (16 bytes read)
[   22.714752] random: systemd: uninitialized urandom read (16 bytes read)
[   23.110639] random: systemd: uninitialized urandom read (16 bytes read)
[   23.408483] random: systemd: uninitialized urandom read (16 bytes read)
[   23.708502] random: systemd: uninitialized urandom read (16 bytes read)
[   24.029080] random: systemd: uninitialized urandom read (16 bytes read)

...and the system appeared to hang...

...while the important stuff was hidden from view (only showing up on the serial port /dev/ttyS0 which was not connected to an actual console):

Running 42-engage-scsi.sh...
Running 45-serial-console.sh...
Serial console support enabled for ttyS0 at speed 9600
Serial console support enabled for ttyS1 at speed 9600
Serial console support enabled for ttyS2 at speed 9600
Running 55-migrate-network-devices.sh...
No network interface mapping is specified in /etc/rear/mappings/mac
The original network interface em1 00:25:90:d5:50:be is not available
1) eth0 08:00:27:92:4e:ea e1000
2) Skip replacing em1 00:25:90:d5:50:be
Choose replacement for em1 00:25:90:d5:50:be

Boot is incomplete because /etc/scripts/system-setup fails (/dev/ttyS0 is operational on the recovery system but not connected to an actual console)

sysinit.service: Failed at step STDIN spawning /etc/scripts/system-setup: Inappropriate ioctl for device

In my view it would be safer to set USE_SERIAL_CONSOLE='n' in default.conf. Then, if serial console auto-detection was desired, it would have to be stated explicitly (maybe even use an auto term for that instead of the current empty setting). What do you think?

jsmeix commented at 2018-01-25 14:30:

@didacog
this issue is perhaps of interest for you
because you implemented recently something
related to serial console setup in the recovery system
https://github.com/rear/rear/pull/1649

@gdha
perhaps this issue is also of interest for you because of your
https://github.com/rear/rear/issues/1644

jsmeix commented at 2018-01-25 14:35:

@OliverO2
in general regarding debugging issues with the start up scripts
that are run initially in the ReaR recovery system have a look at
https://github.com/rear/rear/issues/1703#issuecomment-360398508

didacog commented at 2018-01-26 09:00:

@OliverO2 what OS versions are you using?
We tested this with Centos/RHEL 7 and Debian/Ubuntu (systemd) booting rescue with and without serial devices attached and/or in use, also with Physical and Virtual systems, and booted as expected after the last changes were applied.

Regards,
Didac

jsmeix commented at 2018-01-26 13:32:

@OliverO2
in general we cannot "just change" a ReaR default behaviour
because that would be perceived as a regression by this or that users
who use ReaR under the assumption that the default behaviour
is a reliable behaviour (i.e. the old problem when some default
of whatever software is actually required by some users).

In particular if we set USE_SERIAL_CONSOLE='n' in default.conf
it would be perceived as a regression by those users who currently use ReaR
under the assumption that ReaR "just reliably supports" their serial console.

Ideally we could avoid this issue here by making the scripts
and the systemd unit files that set up serial console in the
recovery system behave more robust against possible failure.

Simply put:
When only serial console setup fails in the recovery system
this is no reason to let the whole recovery system setup fail.

OliverO2 commented at 2018-01-29 21:09:

@didacog I tested on Ubuntu 16.04.3 LTS server. The issues came up when restoring a rescue system built on a production bare metal server (Supermicro X10SLH-F mainboard with IPMI support). I've tried but was unable to reproduce those on a minimal Ubuntu server running as a VirtualBox guest.

@jsmeix I am aware that addressing this would be a backward-incompatible change. Failing gracefully on startup would be an alternative but might not be that easy to implement. Anyway, if this feature is kept as is, hopefully users could at least consult this issue when they experience seemingly hanging or incomplete boots of their rescue system.

gdha commented at 2018-11-04 09:54:

@OliverO2 Do you experience this also with a recent Ubuntu version?

OliverO2 commented at 2018-11-04 15:35:

@gdha I haven't had the chance to test this as the (only) affected system over here has not been migrated to a newer Ubuntu version yet.

gdha commented at 2019-03-22 08:56:

@OliverO2 any idea if this issue is still relevant to keep it open?

OliverO2 commented at 2019-03-23 16:25:

@gdha I'll probably try this one again in April on Ubuntu 18.04. If I get results, I'll post them here. If you wish, you can as well close it for now, since it seems to depend on rare circumstances, so the issue is mostly for documentation.

OliverO2 commented at 2019-04-30 14:51:

Last day of April, so here we go:

I have tested this again with a current rear snapshot (f3157906) on Ubuntu 18.04.2 LTS:

  • I have used the same bare metal server to create the rescue system on which the problem initially appeared.
  • I have started the rescue system on
    • a bare metal restoration system as well as
    • VirtualBox VMs with and without hardware serial ports configured.
  • Migration mode was always active with network interface questions being asked during startup.
  • USE_SERIAL_CONSOLE was always left at its default setting.

The problem did not reappear in any scenario.

So maybe it would be worth re-trying the other case (#2102) with ReaR 2.5 once that version is released. If that one would be also resoved, the issue has probably been fixed in the mean time.

geksi commented at 2019-05-02 12:21:

This issue appear only with virsh console on kvm. Everytime in migration mode.

OliverO2 commented at 2019-05-02 12:24:

In #2102 you have stated the ReaR version to be 2.4.3. Have you ever tried a current Git snapshot of ReaR?


[Export of Github issue for rear/rear.]