#2608 PR merged: Fix setting boot path in case of UEFI partition (ESP) on MD RAID

Labels: enhancement, bug, cleanup, fixed / solved / done

pcahyna opened issue at 2021-04-28 18:09:

Pull Request Details:
  • Type: Bug Fix

  • Impact: High

  • Reference to related issue (URL): #2595

  • How was this pull request tested?
    Backup & restore on a RHEL 8 host with UEFI but without RAID and on a CentOS 7 host with UEFI and RAID (ESP on RAID). Examination of logs to check that correct efibootmgr commands were executed.

  • Brief description of the changes in this pull request:

Use dependencies when determining partitions for UEFI

The ESP may be located on a RAID device. In this case, we need to determine the physical RAID components and call efibootmgr on them (after restoring - during the finalize step), because efibootmgr does not know how to tell the firmware to boot from RAID. Moreover, the current code tries to parse a device like /dev/md125 into a disk called /dev/md and partition 125.

Fix by using storage dependencies: RAID components (the actual ESP copies) can be found as dependencies of the ESP block device. For consistency, use dependencies also to locate the ESP block device itself (the old method is preserved as a fallback). Similar strategy is used by the GRUB2 installation code (finalize/Linux-i386/660_install_grub2.sh), which is able to install GRUB2 on the RAID components correctly.

In addition, clean up usr/share/rear/finalize/Linux-i386/670_run_efibootmgr.sh a bit and add more logging to ease analysis when something goes wrong in this area.

TODO: finalize/Linux-i386/610_EFISTUB_run_efibootmgr.sh needs a similar fix.

Fixes #2595

pcahyna commented at 2021-04-29 18:58:

@gdha hello, should it have the enhancement label, when it Fixes #2595, which has the bug label?

pcahyna commented at 2021-04-29 18:59:

@rmetrich hello, please review, as you reported the problem.

rmetrich commented at 2021-04-30 10:51:

Hi @pcahyna I'm all good with your code, thanks a lot for the effort.

jsmeix commented at 2021-04-30 11:25:

I agree that ReaR is expected to restore a software RAID also on UEFI systems
so the issue is a missing expected functionality a.k.a. "bug".

On the other hand there are zillions of other "reasonably expected" functionalitries
that are currently missing in ReaR so such issues are also enhancements
in particular when "reasonably expected" code for that is just not there.

For me a plain "bug" is only when code that is there does not work as intended.

I.e. it is also not a plain "bug" when code that is there does work as intended
but the current intention is not what others "reasonably expect" so such code
is "buggy" and needs some enhancement to behave as "reasonably expected".

jsmeix commented at 2021-04-30 11:29:

@pcahyna
thank you so much for your additional cleanup work like your recent
https://github.com/rear/rear/pull/2608/commits/8228246d16b1374be5cc25fb7a1244c7102b804a

I do very much appreciate it!
It helps so much to get ReaR's code step by step into a better state
where others can easily understand WHY and WHAT and HOW things are done in ReaR
so that others can easily adapt and enhance ReaR in the future.

pcahyna commented at 2021-04-30 12:37:

@jsmeix, glad you like it! You know,
https://despair.com/products/quality

I can also replace the non-standard case identifiers like Dev and ParNr by something that conforms to the style guide, if you think that it makes sense to do it in the same PR.

jsmeix commented at 2021-04-30 13:10:

@pcahyna
yes, feel free to mercilessly replace old non-standard identifiers with better names
in particular if needed with more explanatory names
cf. "Code must be easy to read" in
https://github.com/rear/rear/wiki/Coding-Style

I appreciate it!

jsmeix commented at 2021-04-30 13:31:

I wish you all a relaxed and recovering weekend!

jsmeix commented at 2021-04-30 13:40:

Some "famous words" before weekend regarding
https://despair.com/products/quality

The race for quality has no finish line- so technically,
it's more like a death march.

Yes, we all will be dead at some time.

But if we made code as intended by
https://github.com/rear/rear/wiki/Coding-Style
our code can evolve (it may not evolve but it can evolve)
so from our code future code can be derived (this may not happen but it can happen)
in contrast to code that (amost) nobody understands which will be completely removed
and replaced by other completely new code from scratch
so code that is hard to understand dies out
while code that is easy to understand can have descendants.

jsmeix commented at 2021-05-03 12:48:

It seems there is a dilemma where in the code
the Creating EFI Boot Manager entries... info
should be done.

Current finalize/Linux-i386/670_run_efibootmgr.sh code in this pull request is (excerpts)

LogPrint "Creating EFI Boot Manager entries..."
...
if [ "$esp_mountpoint" = "/" ] ; then
    esp_mountpoint="$TARGET_FS_ROOT/boot/efi"
    Log "Mountpoint of $TARGET_FS_ROOT/$UEFI_BOOTLOADER not found, trying $esp_mountpoint"
fi
...
test -d "$esp_mountpoint" || return 0

so it could tell the user Creating EFI Boot Manager entries...
but then it exits silently and does nothing.

I think when test -d "$esp_mountpoint" fails is must no longer exit silently
so a matching message should be added like

if ! test -d "$esp_mountpoint" ; then
    LogPrint "Skip creating EFI Boot Manager entries (no ESP mountpoint found)"
fi

Or my
https://github.com/rear/rear/pull/2608/files#r622957244
was wrong and nothing should be shown to the user about
Mountpoint of /mnt/local/UEFI_BOOTLOADER not found, trying /mnt/local/boot/efi
and the LogPrint "Creating EFI Boot Manager entries..."
should be moved after the test -d "$esp_mountpoint" || return 0.

pcahyna commented at 2021-05-03 13:35:

I think when test -d "$esp_mountpoint" fails is must no longer exit silently

This part caught my attention as well. Is it even wise to return 0 in this case? At this point we know that a UEFI bootloader should be used, so skipping the rest of the file is almost certainly a serious error.
(Not so much related to the subject of the PR anymore though. For now, I can add the other LogPrint as you suggest.)

pcahyna commented at 2021-05-03 14:07:

For now, I can add the other LogPrint as you suggest

Done (I changed the message a bit.)

jsmeix commented at 2021-05-03 15:07:

@pcahyna
regarding your https://github.com/rear/rear/pull/2608#issuecomment-831263524

At this point we know that a UEFI bootloader should be used,
so skipping the rest of the file is almost certainly a serious error.

Then show it as an error to the user so he is informed something went wrong
(e.g. use wording like failed to ... or cannot ... that tells something went wrong)
and the user knows he may have to fix something manually after "rear recover"
and leave the script with return 1 (the Source function doesn't abort)
but do not error out of "rear recover", cf. finalize/Linux-i386/660_install_grub2.sh
https://github.com/rear/rear/blob/master/usr/share/rear/finalize/Linux-i386/660_install_grub2.sh

pcahyna commented at 2021-05-03 15:26:

Then show it as an error to the user so he is informed something went wrong
(e.g. use wording like failed to ... or cannot ... that tells something went wrong)
and the user knows he may have to fix something manually after "rear recover"
and leave the script with return 1

I can do that, but I feel this deviates a lot from the subject of this PR (the questionable return 0 is inherited from the original code).

pcahyna commented at 2021-05-03 20:57:

@jsmeix I believe that I addressed all your comments except for the return 0 in some places where it had existed before my changes. Please review again. If everything is ok, I will force push to squash some commits together.

pcahyna commented at 2021-05-03 20:59:

@pcahyna
yes, feel free to mercilessly replace old non-standard identifiers with better names
in particular if needed with more explanatory names
cf. "Code must be easy to read" in
https://github.com/rear/rear/wiki/Coding-Style

I appreciate it!

One of the last commits does that now.

jsmeix commented at 2021-05-04 12:01:

@pcahyna
thank you for all your work here - in particular for your additional cleanup work!

jsmeix commented at 2021-05-04 12:02:

@rear/contributors
if there are no objections I would like to merge it tomorrow afternoon.

jsmeix commented at 2021-05-04 12:15:

@pcahyna
the return 0 when there is no ESP mountpoint directory $esp_mountpoint is from me via
https://github.com/rear/rear/commit/cb6fe3be719c1361b22d0484cd940c681752a973

I am sure I could not think about if return 1 could be better for this particular case
because at that time I did not know your reasoning why return 1 is right here in your
https://github.com/rear/rear/pull/2608#issuecomment-831263524

pcahyna commented at 2021-05-04 12:51:

@jsmeix are there any guidelines for return codes from scripts? ReaR does not seem to care that much, actually. There is just Debug "Source function: 'source $source_file' returns $source_return_code" in the Source() function. So it looks like returning 1 when problems are discovered is not enough to get them noticed by the user and one should complain very loudly (LogPrintError, probably).

jsmeix commented at 2021-05-04 13:09:

@pcahyna
yes, what you describe in your
https://github.com/rear/rear/pull/2608#issuecomment-831916709
is exactly the current state in ReaR's code.

See also
https://github.com/rear/rear/pull/1970

So currently there is no difference in user experience between
leaving a script with return 0 versus return 1.

But for those who read the code of the script it makes clear
if things were expected and all is well with leaving the script at that place
versus an error-like issue happened why the script cannot be continued
(which is not a fatal error to Error out of the whole rear <workflow>).

It would take some longer time until all scripts are fixed that currently
exit with non-zero exit code by accident when the last command
in the script exits with non-zero exit code which is the reason why
there is no user alert when a script exits with non-zero exit code.

Currently it is intentionally only a Debug message because
some users inspect the ReaR log and raise issues here
for all what looks like an error to them ;-)

See https://github.com/rear/rear/issues/2416
and as an example the therein mentioned
https://github.com/rear/rear/issues/2473
and
https://github.com/rear/rear/issues/2479

pcahyna commented at 2021-05-04 16:31:

@jsmeix please check if you approve the latest change, esp. the wording of the error messages. If yes, I consider this PR functionally complete and I will rebase and force-push the branch to squash related commits together.

jsmeix commented at 2021-05-05 07:18:

@pcahyna
thank you for your final adaptions and enhancements.
Even if those kind of smaller things might look unimportant or nitpicking
I appreciate them so much because at least for me such smaller things are important
because they can make a bigger difference how others can understand what goes on.

I dared to do some further tiny adaptions and enhancements regarding such smaller things
via my recent commits here - I hope that is OK for you @pcahyna

Now all looks very well to me
and I would like to merge it today afternoon
unless there are objections from other @rear/contributors

jsmeix commented at 2021-05-05 07:35:

@pcahyna
no need to rebase and force-push the branch to squash related commits together.
Just keep the individual development commits as is.

The merge commit will automatically be the one that has all of them in one commit.

I even prefer when all the individual development commits are kept as is.
It preserves the development history in the master branch
i.e. I perfer to not remove information unless there is a serious reason to do so.

In my local git clone https://github.com/rear/rear.git I often use

git log --format="%ae %H %ad%n%s :%n%b%n" --graph | fmt -w 120 -u -t | less

to show the whole git history in the master branch and there
all the individual development commits often help me
to better understand in retrospect why things were changed
without the need to dig out the GitHub issue and pull request
and read through all the details there.

Having all the individual development commits in the master branch history
is often even crucial to understand in retrospect what was actually changed
because the merge commit message is often only rather meaningless, e.g.:

* | | | gratien.dhaese@gmail.com 285c0c7660caca184ea53dbec5525a69af1fdcc9 Tue Jan 26 14:44:39 2021 +0100
|\ \ \ \ Merge pull request #2544 from pcahyna/netbackup-vxpbx_exchanged :
| | | | | Changes for NetBackup (NBU) support
| | | | |
| * | | | pcahyna@redhat.com 9a0fcbfd0dcfecdfce5921e24a89e3918d962467 Wed Dec 16 11:08:14 2020 +0100
| | | | | Changes for NetBackup (NBU) support :
| | | | | Add /usr/openv/tmp directory to the NBU skeleton
| | | | |
| | | | |
| * | | | pcahyna@redhat.com 0c6909ed2c37dfa92f87b0714b62af7cae4720c9 Wed Dec 16 10:40:55 2020 +0100
| |/ / / Changes for NetBackup (NBU) support :
| | | | Copy NetBackup PBX related files to the rescue system and start
| | | | vxpbx_exchanged on boot

The merge commit message Changes for NetBackup (NBU) support alone
tells nothing what was changed.

pcahyna commented at 2021-05-05 08:35:

@jsmeix

your newest commits are ok. But I don't entirely agree with

no need to rebase and force-push the branch to squash related commits together.
Just keep the individual development commits as is.

There are commits marked as squash! and fixup! and those were intended to be squashed with others. My intent was to preserve the history, but not all of the commits, there were commits that merely amend earlier commits and the history would IMO look better with them squashed into earlier commits. The squash! and fixup! log messages will look particularly ugly in the history because they were not intended to be part of the final history.

jsmeix commented at 2021-05-05 08:44:

@pcahyna
feel free to squash your commits as you like
provided the commit messages of the commits you finally keep
still show sufficient information so that others can understand in retrospect
what was changed without the need to dig out the GitHub issue
and pull request and read through all the details there.

In particular we need that understanding in retrospect
when we make the doc/rear-release-notes.txt
https://github.com/rear/rear/blob/master/doc/rear-release-notes.txt
cf. "compile the information for the release notes" in
https://github.com/rear/rear/wiki/Release-process

pcahyna commented at 2021-05-05 11:28:

@jsmeix rebase done.

jsmeix commented at 2021-05-05 12:04:

@pcahyna
thank you.

I will merge it soon today.

pcahyna commented at 2021-05-05 19:08:

Oops. I accidentally made the file executable in 27de4cd95dabe86f531f132108744459b2dbcfd0 @jsmeix can you please revert the mode change, or should I submit a new PR reverting it?

jsmeix commented at 2021-05-07 09:44:

@pcahyna regarding executable finalize/Linux-i386/670_run_efibootmgr.sh
you could simply submit it with mode -rw-r--r-- as "by the way" addon
via your next PR without mentioning it in a commit message
(but you can mention it in a separated PR comment).


[Export of Github issue for rear/rear.]