#3212 PR merged: default.conf example for COPY_AS_IS_EXCLUDE_TSM

Labels: enhancement, documentation, fixed / solved / done

jsmeix opened issue at 2024-04-30 14:02:

In default.conf explain how one could reduce
the size of the ReaR recovery system initrd
in particular on POWER architecture
via COPY_AS_IS_TSM and alternatively
via COPY_AS_IS_EXCLUDE_TSM and additionally
via MODULES=( 'loaded_modules' ),
see https://github.com/rear/rear/issues/3189

schlomo commented at 2024-05-02 12:06:

Thanks for the info about IBM Spectrum Protect, I didn't know that.

If there is a general size limit for the initrd on POWER systems then I would expect a check for that, aborting if the initrd is too large (since we know that it won't boot).

We might also add code to automatically enable some space reducing features, like loading less modules and no extra firmware files.

Finally, if we could update the TSM configuration in ReaR to reflect modern TSM agents, that would be of course really great. Maybe @schabrolles can add some knowledge to this?

Otherwise of course the change in comments is fine, although personally I find it a little bit misplaced under the TSM section. Maybe better add a file to the docs directory explaining about ReaR on POWER?

jsmeix commented at 2024-05-03 07:45:

I cannot properly implement a check that errors out
because I am not at all a sufficient POWER expert.

There are several different kind of POWER systems
and several different ways how to boot them.
I assume that vague 128 - 4 = 124 MiB limit
does not apply to all POWER boot methods
so we cannot eror out in any case on POWER
when ReaR's initrd is e.g. bigger than 123 MiB.

Regarding why the initrd size limit is vague:
A colleague told me:

The absolute limit you can push (in our case)
is 128 MB = 131072 KB (in binary).
But in fact IBM always adds some prep boot
(service data, their own "magic") data to that block.
And we were able to fill only 123 MB = 125952 KB.
And that is bad from IBM side as they are one
of the developers of the IEC standard
(because of the punch cards), but you will
never find it black on white on their site.

I.e. it is unknown how big that additional data is
so it is unknown how much less than 128 MiB is left
for the initrd.
Therein note the "in our case":
This means the case where things were tested at SUSE.
By trial and error (via artificially increasing the
initrd) colleagues as SUSE found that 123 MiB limit
on the POWER system where they tested things.
That POWER system reported errors during boot
when the initrd was too big in contrast to
https://github.com/rear/rear/issues/3189
where
https://github.com/rear/rear/files/14837101/boot.log
shows nothing - just a kernel panic at the end that
tells noting about its root cause.

What I may implement is a LogPrint() or LogPrintError()
user information when the inird is bigger than 120 MiB
that tells about the vague 128 - 4 = 124 MiB limit
so the user is at least informed that booting may fail
(depending on his specific case).

Same basic reasoning (i.e. "it depends")
for including a reduced/minimal TSM client:
When the initrd size does not matter in some cases
then the full TSM client should normally be included
to have all TSM client functionality available
during ReaR recovery system runtime.

On POWER by default no firmware files are included, cf.
https://github.com/rear/rear/issues/3189#issuecomment-2076960341
which I now also mention here via my recent
https://github.com/rear/rear/pull/3212/commits/10436d8f0102709291c2c94b1f73bd90d9b6b2cc

Whether or not MODULES=( 'loaded_modules' ) works
depends on whether or not the replacement hardware or VM
is fully compatible with the original system.
I don't know if on POWER the replacement VM
is always fully compatible with the original VM?
As far as I know on POWER there is always a VM
but there are different kind of VMs on POWER.

I also find it somewhat misplaced under the TSM section
but that is all what I can do for now with my rather limited
available time - in particular I will first and foremost
continue with https://github.com/rear/rear/issues/3189
to help our customer - "customers first" ;-)
Now there are multipath issues - again an area where
I have zero personal experience with - so I need to
somehow find my way through that stuff - hopefully as in
https://github.com/rear/rear/issues/3189#issuecomment-2051683257
where my vague intuitiveness revealed the root cause...

jsmeix commented at 2024-05-03 11:49:

I will merge it now.
I ignore the two current CI errors

testing-farm:fedora-40-x86_64 — Error
testing-farm:fedora-rawhide-x86_64 — Error

that both show at
https://artifacts.dev.testing-farm.io/59b815ee-d4a4-46b4-a0db-fe3070972188/
and
https://artifacts.dev.testing-farm.io/1c4a3dd1-8744-4ffb-b8f4-3ee0db996581/
(excerpts)

fail: Failed to pull workdir from the guest.
 This usually means that login as 'root' to the guest does not work.
fail: Command 'yum install -y rsync' returned 255.
...
ssh: connect to host [IP address] port 22: Connection timed out

because the changes here are only comments in default.conf
so the changes here cannot be a cause of those CI failures.

jsmeix commented at 2024-05-03 12:00:

@gdha @schlomo
thank you for review and comments!


[Export of Github issue for rear/rear.]