#2695 Issue closed: RFC: No longer hardcoded enforced Error() exit

Labels: enhancement, discuss / RFC, severe improvement

jsmeix opened issue at 2021-10-12 11:11:

Currently Error() exits the whole running rear program with all its child processes.

This is right in particular during "rear mkrescue/mkbackup",
cf. "better abort than to blindly proceed" in the section
"Try hard to care about possible errors" in
https://github.com/rear/rear/wiki/Coding-Style

But during "rear recover" things could be different, cf.
https://github.com/rear/rear/pull/2691#discussion_r726997737

Currently we do not Error() exit the whole ReaR in case of errors
after the backup was restored (e.g. errors during bootloader installation), cf.
https://github.com/rear/rear/blob/master/usr/share/rear/finalize/Linux-i386/660_install_grub2.sh

# This script does not error out because at this late state of "rear recover"
# (i.e. after the backup was restored) I <jsmeix@suse.de> consider it too hard
# to abort "rear recover" when it failed to install GRUB2 because in this case
# the user gets an explicit WARNING via finalize/default/890_finish_checks.sh
# so that after "rear recover" finished he can manually install the bootloader
# as appropriate for his particular system.

Because restoring the backup could be rather time consuming task,
Error() exit the whole ReaR in case of errors after the backup was restored
makes all what was recreated and restored up to that point useless
and the user is forced to redo all that from the very beginning
provided he had a good idea how to avoid that Error() exit,
otherwise he would be lost in a "retry and Error() exit" loop.

I think it could be basically always wrong to
exit the whole ReaR during "rear recover" because
that results basically always bad UX for the user.
I think exit the whole ReaR during "rear recover" is basically
never helpful for the user - what should he do in that case?
In contrast clear ERROR messages during "rear recover"
with the option to proceed (cf. "Final power to the user!")
may help the user to shomehow finish "rear recover"
(even with some things that are not recreated right).
Then he may fix the issues that lead to the error messages
or deliberately ignore them, reboot, and hope for the best.
When errors are actually fatal deliberately ignoring them
would let the recovery fatally fail but for the user
that is not as bad as being kicked out from recovery
in any case by a hardcoded enforced Error() exit.

So I suggest to enhance the Error() function
to no longer exit the whole ReaR during "rear recover"
but only show the ERROR message and a user dialog
whether to abort or proceed "bona fide" regardless of the error.

pcahyna commented at 2021-10-12 17:11:

I have two remarks:

Because restoring the backup could be rather time consuming task,
Error() exit the whole ReaR in case of errors after the backup was restored
makes all what was recreated and restored up to that point useless

note that while the issue seems to be inspired by the behavior discussed in #2691 (aborting the whole program from inside the generated layout script), the reasoning above does not apply to it, because the layout script runs before restoring the backup, so no time-consuming task has been performed yet.

So I suggest to enhance the Error() function
to no longer exit the whole ReaR during "rear recover"
but only show the ERROR message and a user dialog
whether to abort or proceed "bona fide" regardless of the error.

I think the reasoning does not necessarily apply to the BugError() calls, because they are essentially asserts for conditions that should never happen and if they are encountered, it is serious. This would require separating BugError() from Error(), though.

jsmeix commented at 2021-10-15 11:28:

@pcahyna
I do not see why BugError is actually different from Error.
Both are just errors.
How severe an error is does not depend on
if it is a BugError or an Error.

BugError only additionally indicates the root cause is a bug in ReaR.
Error does not do that.
Its root cause could still be a bug in ReaR (e.g. in some script before)
or a bug in whatever program that is called by ReaR
or a normal error where the user is responsible.

Think about BugError in some non-essential functionality
(non-essential for the user's specific use case)
versus Error during essential functionality.

E.g. think about a BugError when recreating a swap partition
versus an Error when recreating a '/home' partition
versus an Error when recreating the root '/' partition.

A recreated system without swap or /home partition
could be acceptable for the user (better a somewhat running system
without swap or with /home in the root partition than no system at all
and later when there is time for it the user can fix things).

What I have in mind is the following use case or "user story":

When the user runs "rear recover" for an actual real disaster recovery
then all is already lost for his original system he tries to recreate.

His original system is lost.
What could go more wrong for him when he can ignore errors
during "rear recover" and proceed "bona fide"?
His original system won't get any bit more lost
when "rear recover" terribly fails after he ignored errors.
His replacement hardware will be (re-)installed from scratch.
Whatever there was on his replacement hardware is basically lost
as soon as "rear recover" did run diskrestore.sh at least partially.
His replacement hardware shoudn't get damaged
when "rear recover" terribly fails fails after he ignored errors.

What does an enforced error exit during "rear recover"
(enforced by us on the user because we hardcoded the exit "for him")
actually help the user to somehow get his lost system back
(perhaps somewhat incomplete but at least partially usable)
compared to only show the ERROR message and a user dialog
whether to abort or proceed "bona fide" regardless of the error
so we give the user final power to decide what he wants to do
in his particular case in his specific error situation?

jsmeix commented at 2021-10-15 11:35:

Ha!
Simple solution:

Have a new config variable that specifies
if Error (and BugError) should exit
or Error (and BugError) should
only show the ERROR message and a user dialog
whether to abort or proceed "bona fide" regardless of the error.

If this config variable is unset we could have any default we like
e.g. error exit during "rear mkrescue/mkbackup/..."
but error exit dialog during "rear recover" or anything else as we like.

For example when this config variable is empty or unset
we could during "rear recover" error exit until the backup gets restored
and switch to "user dialog abort/proceed" before the backup gets restored
or after the backup was restored as we think what is best by default.
Or this config variable could be an array that could contain those
workflow names and/or stage names where "user dialog abort/proceed"
should happen.

The crucial point is that such a config variable or array
provides final power to the user.

jsmeix commented at 2021-10-22 07:33:

As always there are exceptions:
There is at least one exception where the user must never ever
ignore errors during "rear recover" and proceed "bona fide":
https://github.com/rear/rear/pull/2626
"Stop ReaR from overwriting its own disk and backup drives"

The solution to that is to use the above mentioned
config variable with ternary semantics like:

# Always let Error() exit the whole running rear program:
ERROR_WITHOUT_EXIT=( no )
# Never let Error() exit the whole running rear program:
ERROR_WITHOUT_EXIT=( yes )
# Do not let Error() exit the whole running rear program during
# the whole recover workflow,
# the restore and wrapup stages (used in the restoreonly workflow),
# the backup stage (used in the mkbackup and mkbackuponly workflows):
ERROR_WITHOUT_EXIT=( recover restore wrapup backup )

and set ERROR_WITHOUT_EXIT=( no )
before Error() is called to
"Stop ReaR from overwriting its own disk and backup drives"

github-actions commented at 2021-12-22 02:22:

Stale issue message

github-actions commented at 2022-02-21 02:09:

Stale issue message

jsmeix commented at 2022-02-28 08:38:

Not for the next ReaR version
but perhaps after the next ReaR version was released.

github-actions commented at 2022-04-30 03:06:

Stale issue message

github-actions commented at 2022-07-02 03:30:

Stale issue message


[Export of Github issue for rear/rear.]