#3233 PR merged: On POWER tell the user when the initrd is big

Labels: enhancement, fixed / solved / done

jsmeix opened issue at 2024-05-23 13:19:

  • Type: Enhancement

  • Impact: High
    High impact when needed to avoid endless searching
    for the root cause when booting fails in inexplicable ways
    on POWER when the initrd is too big.

  • Reference to related issue (URL):
    https://github.com/rear/rear/issues/3189

  • How was this pull request tested?

To test on my x86_64 system I used additionally

if test "$ARCH" = "Linux-i386" || ...

plus BACKUP=TSM and got:

# usr/sbin/rear -D mkrescue
...
Running 'pack' stage ======================
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (137288661 bytes) in 9 seconds
On POWER architecture booting may fail when the initrd is too big (initrd size is 130 MiB)
Consider better (but slower) initrd compression with REAR_INITRD_COMPRESSION='lzma'
Consider using COPY_AS_IS_EXCLUDE_TSM to get a slim TSM client in the initrd
Running 'output' stage ======================
...
Saving result files with TSM
ERROR: Could not save result files with dsmc

The error is because I do not have TSM installed.

  • Description of the changes in this pull request:

In pack/GNU/Linux/900_create_initramfs.sh
tell the user on POWER architecture
when the initrd is bigger than 100 MiB
that this may cause a boot failure

pcahyna commented at 2024-05-23 13:32:

@jsmeix do you know what is the panic message (or in general relevant kernel messages) when this happens? In the whole #3189 I can't find it.

jsmeix commented at 2024-05-24 05:25:

@pcahyna
a whole booting log on POWER including kernel messages
up to the point where the kernel panics with

Kernel panic - not syncing: VFS: Unable to mount root fs ...

is available in issue 3189 as its 'boot.log' attachment
https://github.com/rear/rear/files/14837101/boot.log
that is listed in the initial description of issue 3189
https://github.com/rear/rear/issues/3189#issue-2220504820
(cf. my comment in the code of this pull request)

jsmeix commented at 2024-05-24 06:38:

With recent changes my test shows now

# usr/sbin/rear -D mkrescue
...
Running 'pack' stage ======================
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (130 MiB) in 9 seconds
On POWER architecture booting may fail when the initrd is too big (about 120 MiB or even less)
Consider better (but slower) initrd compression with REAR_INITRD_COMPRESSION='lzma'
Consider MODULES=( 'loaded_modules' ) to have less kernel modules in the initrd
Consider using COPY_AS_IS_EXCLUDE_TSM to get a slim TSM client in the initrd
Running 'output' stage ======================
...
ERROR: Could not save result files with dsmc

jsmeix commented at 2024-05-24 07:27:

With latest changes my test shows now

# usr/sbin/rear -D mkrescue
...
Running 'pack' stage ======================
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (482 MiB) in 21 seconds
On POWER architecture booting may fail when the initrd is big (about 120 MiB or even less)
Verify that your ReaR recovery system boots on your replacement hardware.
If it fails to boot consider the following:
REAR_INITRD_COMPRESSION='lzma' for better (but slower) initrd compression
MODULES=('loaded_modules') to include less kernel modules in the initrd
FIRMWARE_FILES=('no') to excludes all firmware files from the initrd
COPY_AS_IS_EXCLUDE_TSM to get a slim TSM client in the initrd
Running 'output' stage ======================
...
ERROR: Could not save result files with dsmc

jsmeix commented at 2024-05-27 08:42:

I could only test on x86_64 architecture
i.e. on a VM on my homeoffice workstation.
There I added

if test "$ARCH" = "Linux-i386" || ...

see my above test reports.
I cannot test on POWER machine - at least not
with efforts that are reasonable in practice for me.
But perhaps @pcahyna could test on a POWER machine?
Perhaps he has direct access to a POWER machine?

schlomo commented at 2024-05-27 16:19:

@jsmeix regardless of the testing I'd appreciate it if we could move the arch-dependent code into separate scripts, if anyhow possible.

OTOH, maybe there is a general limit on the initramfs size, as this blog article suggests. In that case I'd be happy to see a generic implementation of initramfs size checking with a default setting to 50% of RAM, that would be overridden for PPC or ARM to something suitable smaller.

jsmeix commented at 2024-05-29 12:30:

@schlomo @pcahyna
I'm afraid, currently I don't have time for it
because of maintenance updates for SLES,
a.k.a. "customers first" ;-)

jsmeix commented at 2024-06-03 13:27:

I have no good idea how to properly separate
arch-dependent code away into separate scripts
and still keep the code easy to understand, cf.
https://github.com/rear/rear/wiki/Coding-Style

My reasoning:

As far as I understand how separeated
arch-dependent script directories work
the execution logic is basically like

if ARCH = specific_arch
   then source specific_arch-dependent.script
endif

so the code in specific_arch-dependent.script
is run only for one specific architecture.

But what is needed in particular in this case here is

Nested Conditions:

if OTHER_CONDITION
   then if ARCH = specific_arch
           then specific_arch-dependent code
           else generic code
        endif
        other condition code
endif

as in (shortened code excerpts)

case "$REAR_INITRD_COMPRESSION" in
    (lz4)
        if [[ "$ARCH" == "Linux-s390" ]] ; then
            REAR_INITRD_FILENAME=$VM_UID".initrd"
        else
            REAR_INITRD_FILENAME="initrd.lz4"
        fi
        lz4 related code
        ;;
    (lzma)
        if [[ "$ARCH" == "Linux-s390" ]] ; then
            REAR_INITRD_FILENAME=$VM_UID".initrd"
        else
            REAR_INITRD_FILENAME="initrd.xz"
        fi
        xz related code
        ;;

AND

Multiple Archs:

if ARCH = this_arch OR ARCH = that_arch
   then this_or_that_arch-dependent code
endif

as in

if test "$ARCH" = "Linux-ppc64" || test "$ARCH" = "Linux-ppc64le" ; then

To implement "Nested Conditions"
via arch-dependent separated scripts
I would have to duplicate the OTHER_CONDITION test
in each arch-dependent separated script.

To implement "Multiple Archs"
via arch-dependent separated scripts
I would have to implement as many separated scripts
as there are archs or at least use multiple symlinks.

Separating arch-dependent code is against
to "keep together what belongs together" and
it is against "implement one thing only once"
and both would make the code as a whole
harder to read and even harder to understand
because the meaning of the code as a whole
was artificially split up into parts
which makes it needlessly hard to reassemble
its overall meaning from its separated parts.

Additionally there is a generic issue with
"Multiple Conditions"
for example like

if ( CONDITION1 AND NOT CONDITION2 ) OR CONDITION3

when some of those conditions should be separated
out into condition-dependant scripts.

In general I am wondering about:
Why separating certain kind of conditioned code away
into certain-condition-dependant scripts at all?
What is the real advantage of it (for us and our users)?
We have the same with Linux-distribution-dependent code.
Why separating Linux-distribution-dependent code away?
What is the crucial difference that makes an arch condition
or a Linux-distribution condition different compared
to any other condition?
How to properly deal with "Nested Conditions"
and "Multiple Conditions"
via certain-condition-dependent scripts?

gdha commented at 2024-06-11 08:36:

@jsmeix @pcahyna Can we change the milestone to "3.1"? Or, is this a blocker for "3.0"?

jsmeix commented at 2024-06-11 11:25:

The curent implementation works
as far as it is implemented
so it could be merged as is.

What is questioned is if the curent implementation
is the best way how to implement it and
if a more generic test could be implemented
but that is not something I could do
(no time for more advanced things).

gdha commented at 2024-06-11 11:33:

The curent implementation works as far as it is implemented so it could be merged as is.

@jsmeix Then I would merge it as is.

What is questioned is if the current implementation is the best way how to implement it and if a more generic test could be implemented but that is not something I could do (no time for more advanced things).

We could make an additional issue for improvements if required.

jsmeix commented at 2024-06-11 11:44:

@gdha @pcahyna
yes - if I had an approval ... ;-)

jsmeix commented at 2024-06-11 11:56:

@schlomo
thank you for your approval!

schlomo commented at 2024-06-11 12:20:

@jsmeix typically we would have to further modularize the code so that we break it up into smaller pieces, and then some of these smaller pieces go into the arch-dependent subfolders


[Export of Github issue for rear/rear.]