#2186 Issue closed: Alternatively do kexec instead of regular boot to speed up booting

Labels: enhancement, discuss / RFC, no-issue-activity

jsmeix opened issue at 2019-07-16 14:18:

In https://github.com/rear/rear/pull/2142 therein see in particular
https://github.com/rear/rear/pull/2142#issuecomment-495554406
I learned how to boot a new system from within a running system
via kexec basically as follows:

# kexec -l new_kernel --initrd=new_initrd --command-line='root=...'

# kexec -e

This will instantly reboot into the loaded kernel (plus initrd)
without a clean shutdown of the currently running system.

The interesting part here is instantly reboot.

So the advantage of booting via kexec is that
it leaves out the BIOS/UEFI firmware part
which makes booting via kexec much faster.

On the other hand the disadvantage of booting via kexec
is that it leaves out the BIOS/UEFI firmware part
which makes booting via kexec somewhat unreliable
because the hardware is not initialized from the very beginning.
I.e. booting via kexec basically runs a new kernel on an
arbitrary current hardware state as left from the old kernel.

As far as I know it should help when the kernel that does
the kexec is up-to-date because a more up-to-date kernel
should be better prepared against possible kexec failures
when there are still ongoing hardware operations after kexec
(e.g. old DMA operations that still write somewhere in memory)
from the old kernel while the new kernel is already running.
As far as I know current kernels have proper kexec support
for most hardware which means they cancel still ongoing
hardware operations just before the actual kexec is done,
but I am not at all an expert in this area.

So we cannot do kexec instead of regular boot in any case
to always speed up booting but as an alternative for the user
(who must anyway verify that his particular disaster procedure works)
it could be helpful when he can do kexec instead of regular boot.

Cf. the section
"Launching the ReaR recovery system via kexec" at
https://en.opensuse.org/SDB:Disaster_Recovery

Here I also mean to kexec the recreated system after "rear recover".

As far as I know booting on this or that server hardware could take
longer time (up to minutes) while it is in its BIOS/UEFI firmware part.

Assume on such hardware kexec does work reliably
(perhaps kexec works in particular unreliably on those hardware
where booting takes "ages" in its BIOS/UEFI firmware part ;-)
then kexec instead of regular boot would save that time two times:

  1. When booting the ReaR recovery system (provided the replacement hardware is already running a system that supports kexec well).

  2. When booting the recreated system after "rear recover".

This way disaster recovery could be done noticeably faster.

I was told that kexec should work in particular reliably on IBM Z
because there kexec is even part of the regular boot, cf.
https://github.com/rear/rear/issues/2137#issuecomment-490420041

So on SLES12 and SLES15 booting IBM Z basically works this way:

Initially zipl loads a kernel and
that kernel runs GRUB2 and
GRUB2 loads the actual kernel and does a kexec.

but I am not at all an expert in this area.

gozora commented at 2019-07-16 17:30:

I've used kexec boot, while debugging un-bootable OS on HPE Superdome with 16TB of RAM, where one normal reboot took ~1 hour. Since the problem was in initrd, I had to do several initrd rebuilds followed by reboot, to exactly locate the culprit (guess it was systemd-udev).
In this scenario, re-boot with kexec saved me literally hours.
I was later told by vendors HW engineer that it is OK to use kexec for such "debugging" purposes he'd however not recommend to run production on such system just because the state of HW, as already mentioned by @jsmeix.

V.

jsmeix commented at 2019-07-17 08:18:

Wow!

... 16TB of RAM, where one normal reboot took ~1 hour ...

@gozora
thank you so much for that example!

For me this one example is already sufficient reason to implement
a way that the user can deliberately and alternatively do kexec
instead of reboot to boot the recreated system after "rear recover".

My offhanded idea is to enhance our generated reboot script
to support a parameter kexec so that it could be called reboot kexec
or add a separated script called kexecreboot via
build/GNU/Linux/630_simplify_systemd_reboot_halt_poweroff_shutdown.sh
https://github.com/rear/rear/blob/master/usr/share/rear/build/GNU/Linux/630_simplify_systemd_reboot_halt_poweroff_shutdown.sh
where reboot kexec or kexecreboot basically should do
something like

kexec -l /mnt/local/path/to/kernel --initrd=/mnt/local/path/to/initrd --command-line='...'
umount -vfar
echo syncing disks... waiting 3 seconds before kexec
sync
sleep 3
kexec -e

to be reasonably well on the safe side that all disk I/O on the
recreated system disk is completely done before kexec -e.

I asked a colleague: It is perfectly o.k. when arbitrary things happen
between kexec -l and kexec -e. For example the crash kernel
is loaded by kexec -l (somewhere into kernel reserved memory)
during boot and then the system can run for an arbitrary long time
until a crash may happen which triggers kexec -e into the crash kernel.

What I need for such a reboot kexec or kexecreboot is
/mnt/local/path/to/kernel and /mnt/local/path/to/initrd
and the right kernel command line to boot the recreated system.

I think /mnt/local/path/to/kernel should be normally
the same as /mnt/local/$KERNEL_FILE and
the right kernel command line should be normally
the same as /proc/cmdline on the original system during "rear mkrescue"
(KERNEL_CMDLINE is not right because that is for the recovery system)
but /mnt/local/path/to/initrd could be anything
what chroot /mnt/local/ mkinitrd results.

In any case I would first and foremost implement it with
config variables that the user can specify and must specify
if there is no reliably working way to autodetect that values.

jsmeix commented at 2019-07-17 08:26:

FWIW:
I think at least at SUSE when we install a system with YaST
we boot the installed system via kexec and I think we do that
basically the same way as I intend for reboot kexec or kexecreboot:
We shut down the system normally except at the very end where
kexec -e gets called instead of a regular reboot via the
hardware BIOS/UEFI firmware.

jsmeix commented at 2019-07-17 08:48:

My reasoning behind why I think this could be in particular
of interest for disaster recovery of production systems:

Assume your "high noon meals" web sales portal production server
burns out of a sudden at high noon where your hungry customers
use that server most of all.

So - no matter what - you want that server back as fast as possible.

You are not (yet) such a big business that you could afford a
high availability solution via redundantly replicated systems.

But you are sufficiently well prepared and "rear recover"
on your replacement machine is done in less than 5 minutes.

But then you do a regular reboot of your recreated system
and your sales of this day are mostly lost in your vendor's
HW engineer recommendation... ;-)

In contrast if you could do deliberately and alternatively kexec
to fast-boot your recreated system after "rear recover",
you could be back in business in a few minutes.

I think if kexec fails, things won't be really worse because then
you can do the regular boot of your recreated system as fallback.

gozora commented at 2019-07-18 06:59:

Either way, I'd add some kind of warning message that additional full reboot should be done after successful kexec boot.

V.

jsmeix commented at 2019-07-18 09:59:

A first manual test to fast-boot the recreated system after "rear recover"
via kexec seems to have worked well for me:

RESCUE zinfandel-3:~ # rear -D recover
....
Running mkinitrd...
Recreated initrd (/sbin/mkinitrd).
Installing GRUB2 boot loader on PPC64/PPC64LE...
Determining where to install GRUB2 (no GRUB2_INSTALL_DEVICES specified)
Found PPC PReP boot partition /dev/vda1 - installing GRUB2 there
Finished recovering your system. You can explore it under '/mnt/local'.
Exiting rear recover (PID 3419) and its descendant processes ...
Running exit tasks
You should also rm -Rf /tmp/rear.21WgVHysMddS6yX

RESCUE zinfandel-3:~ # grep '/boot/' /boot/grub2/grub.cfg
        linux   /boot/vmlinux-4.12.14-23-default root=UUID=ca5d7bc6-1280-4cf9-9327-a6778f409f61  ${extra_cmdline} sysrq_always_enabled panic=100 ignore_loglevel unknown_nmi_panic console=hvc0 console=ttyS0,57600 splash=silent quiet showopts crashkernel=512M-2G:128M,2G-64G:256M,64G-:512M
        initrd  /boot/initrd-4.12.14-23-default
                linux   /boot/vmlinux-4.12.14-23-default root=UUID=ca5d7bc6-1280-4cf9-9327-a6778f409f61  ${extra_cmdline} sysrq_always_enabled panic=100 ignore_loglevel unknown_nmi_panic console=hvc0 console=ttyS0,57600 splash=silent quiet showopts crashkernel=512M-2G:128M,2G-64G:256M,64G-:512M
                initrd  /boot/initrd-4.12.14-23-default
                linux   /boot/vmlinux-4.12.14-23-default root=UUID=ca5d7bc6-1280-4cf9-9327-a6778f409f61  ${extra_cmdline} 
                initrd  /boot/initrd-4.12.14-23-default

RESCUE zinfandel-3:~ # ls -l /dev/disk/by-uuid/
total 0
lrwxrwxrwx 1 root root 10 Jul 18 11:49 ca5d7bc6-1280-4cf9-9327-a6778f409f61 -> ../../vda5
lrwxrwxrwx 1 root root 10 Jul 18 11:48 d63b327a-4277-42f7-80fd-de3d14c669a9 -> ../../vda3

RESCUE zinfandel-3:~ # kexec -l /mnt/local/boot/vmlinux-4.12.14-23-default --initrd=/mnt/local/boot/initrd-4.12.14-23-default --command-line='root=UUID=ca5d7bc6-1280-4cf9-9327-
a6778f409f61 sysrq_always_enabled panic=100 ignore_loglevel unknown_nmi_panic console=hvc0 console=ttyS0,57600 splash=silent quiet showopts crashkernel=512M-2G:128M,2G-64G:256M
,64G-:512M'
Modified cmdline:root=UUID=ca5d7bc6-1280-4cf9-9327-a6778f409f61 sysrq_always_enabled panic=100 ignore_loglevel unknown_nmi_panic console=hvc0 console=ttyS0,57600 splash=silent quiet showopts crashkernel=512M-2G:128M,2G-64G:256M,64G-:512M

RESCUE zinfandel-3:~ # umount -vfar
/mnt/local/dev           : successfully unmounted
/mnt/local/sys           : ignored
/mnt/local/proc          : ignored
/mnt/local/run           : successfully unmounted
/mnt/local/opt           : successfully unmounted
/mnt/local/.snapshots    : successfully unmounted
/mnt/local/boot/grub2/powerpc-ieee1275: successfully unmounted
/mnt/local/var           : successfully unmounted
/mnt/local/tmp           : successfully unmounted
/mnt/local/root          : successfully unmounted
/mnt/local/srv           : successfully unmounted
/mnt/local/usr/local     : successfully unmounted
/mnt/local/home          : successfully unmounted
/mnt/local               : successfully unmounted
/dev/pts                 : ignored
/sys/fs/cgroup/rdma      : successfully unmounted
/sys/fs/cgroup/cpuset    : successfully unmounted
/sys/fs/cgroup/freezer   : successfully unmounted
/sys/fs/cgroup/hugetlb   : successfully unmounted
/sys/fs/cgroup/blkio     : successfully unmounted
/sys/fs/cgroup/memory    : successfully unmounted
/sys/fs/cgroup/pids      : successfully unmounted
/sys/fs/cgroup/devices   : successfully unmounted
/sys/fs/cgroup/net_cls,net_prio: successfully unmounted
/sys/fs/cgroup/perf_event: successfully unmounted
/sys/fs/cgroup/cpu,cpuacct: successfully unmounted
/sys/fs/pstore           : successfully unmounted
/sys/fs/cgroup/systemd   : successfully unmounted
umount: /sys/fs/cgroup/unified: not mounted.
/sys/fs/cgroup           : successfully unmounted
umount: /run: target is busy.
/dev/pts                 : ignored
/dev/shm                 : successfully unmounted
/sys/kernel/security     : successfully unmounted
/dev                     : successfully unmounted
/proc                    : ignored
/sys                     : ignored
umount: /: not mounted.

RESCUE zinfandel-3:~ # kexec -e

and the recreated system "just boots" and works well
as far as I see for now...

github-actions commented at 2020-06-27 01:33:

Stale issue message


[Export of Github issue for rear/rear.]