#2218 PR merged: Do not keep the build dir when used noninteractively

Labels: cleanup, fixed / solved / done

pcahyna opened issue at 2019-08-22 09:59:

Print helpful messages about keeping/not keeping the build directory in case of errors.

Relax-and-Recover (ReaR) Pull Request Template

Please fill in the following items before submitting a new pull request:

Pull Request Details:
yum -y install rear
echo 'COPY_AS_IS+=( /usr/bin/xterm )' >> /etc/rear/local.conf
yum -y install xterm
rpm  -e --nodeps libXaw.x86_64 # don't do it on a workstation

This makes ReaR complain about broken xterm and abort. To test it interactively:

rear mkrescue

Result:

Broken symlink '/usr/lib/modules/3.10.0-1077.el7.x86_64/build' in recovery system because 'readlink' cannot determine its link target
Broken symlink '/usr/lib/modules/3.10.0-1077.el7.x86_64/source' in recovery system because 'readlink' cannot determine its link target
There are binaries or libraries in the ReaR recovery system that need additional libraries
/bin/xterm requires additional libraries (fatal error)
        libXaw.so.7 => not found
Build area kept for investigation in /tmp/rear.YiON7zQALRvXj6Q, remove it when not needed
ReaR recovery system in '/tmp/rear.YiON7zQALRvXj6Q/rootfs' needs additional libraries, check /root/rear/var/log/rear/rear-ci-vm-10-0-137-193.log for details
ERROR: ReaR recovery system in '/tmp/rear.YiON7zQALRvXj6Q/rootfs' not usable (required libraries are missing)
Some latest log messages since the last called script 990_verify_rootfs.sh:
  stty is /bin/stty
  parted is /bin/parted
  partprobe is /bin/partprobe
  wipefs is /bin/wipefs
  mkfs is /bin/mkfs
  mkfs.xfs is /bin/mkfs.xfs
  xfs_admin is /bin/xfs_admin
  ldconfig is /bin/ldconfig
Aborting due to an error, check /root/rear/var/log/rear/rear-ci-vm-10-0-137-193.log for details
Terminated

To test noninteractively:

crontab -e
#add
#*/5 * * * * /usr/sbin/rear mkrescue

and wait several minutes. Mail from cron will read:

There are binaries or libraries in the ReaR recovery system that need additional
 libraries
Build area /tmp/rear.pHJTicEpfOCMZVf will be removed
To preserve it for investigation set KEEP_BUILD_DIR=1 or run ReaR with -d
/bin/xterm requires additional libraries (fatal error)
        libXaw.so.7 => not found
ERROR: ReaR recovery system in '/tmp/rear.pHJTicEpfOCMZVf/rootfs' not usable
Aborting due to an error, check /var/log/rear/rear-ci-vm-10-0-137-193.log for details
  • Brief description of the changes in this pull request:
    The rear package in Fedora and derived distributions drops a file to /etc/cron.d which runs rear mkrescue.
    When the recovery system is broken (an example is #2144), rear mkrescue will fail and leave the build area behind, without giving a hint that the build area should be removed (because this hit is printed only when running rear with the -v flag).
    This change:
    • does not disable the removal of the build area when used noninteractively - instead it gives a hint how to keep the build area should one really want it
    • when the build area is kept, print a more prominent message, even when run without -v.

pcahyna commented at 2019-08-22 10:11:

The description of KEEP_BUILD_DIR from #2121 should be updated together with this change.

pcahyna commented at 2019-08-28 10:53:

The description of KEEP_BUILD_DIR from #2121 should be updated together with this change.

Done.

pcahyna commented at 2019-08-29 09:45:

according to a review by @xjezda00, I introduced a new variable KEEP_BUILD_DIR_ON_ERRORS which can be used to restore the previous behaviour of keeping the build dir on errors always by setting it to yes.

pcahyna commented at 2019-08-29 10:45:

see https://bugzilla.redhat.com/show_bug.cgi?id=1693608#c4

jsmeix commented at 2019-08-29 11:16:

I am still not in the office and will not be for some more weeks
so I cannot actually work on ReaR.

jsmeix commented at 2019-08-29 11:40:

Offhandedly I think I did it wrong to set KEEP_BUILD_DIR=1 in a script
because this is against what the user specified and I am in general against it
when ReaR works in an automated way against the user because I want
the user to always have the final power.

So if the user wants to keep the build directory for whatever reason
he can run ReaR with the -D option or he can specify KEEP_BUILD_DIR=1
in his etc/rear/local.conf file.

So from my current point of view I think the right, clean, and simple solution is
to remove the KEEP_BUILD_DIR=1 settings from the 990_verify_rootfs.sh script.

Furthermore that I had not added a comment that explains why I thought
it would be right to set KEEP_BUILD_DIR=1 in 990_verify_rootfs.sh
(which is against https://github.com/rear/rear/wiki/Coding-Style)
basically proves that I had done that without a real good reason.

Finally that the current attempts to solve it result more and more
complicated code indicate that the actual root cause is "something else"
(i.e. the actual root cause are my wrong hardcoded KEEP_BUILD_DIR=1
settings in the 990_verify_rootfs.sh script).

pcahyna commented at 2019-08-29 11:53:

Offhandedly I think I did it wrong to set KEEP_BUILD_DIR=1 in a script

I found it strange as well. Just removing it would be definitely a simplification, but it would be less helpful for debugging of errors, as @xjezda00 pointed out in https://bugzilla.redhat.com/show_bug.cgi?id=1693608#c4 . I would recommend keeping the error message part of the PR if this solution is accepted:
https://github.com/rear/rear/pull/2218/files#diff-14e72ffdba2836896118ed562369b421R27

jsmeix commented at 2019-08-29 11:53:

@schlomo
I added you as reviewer here because I would like to get your opinion
about the generic issue that lays behind here:

I had added a hardcoded KEEP_BUILD_DIR=1 in 990_verify_rootfs.sh
when that script detects an error in the ReaR recovery system so that
the user could directy inspect the contents of the recovery system
without the need to re-run "rear mkrescue" with the -D option
or with KEEP_BUILD_DIR=1 explicitly set by the user.

But this way it works in case of an error against what the user had
specified when he runs it without -D or without KEEP_BUILD_DIR=1.

jsmeix commented at 2019-08-29 12:19:

@pcahyna
more clear messages that help the user to better understand what goes on
are much appreciated.

By the way:

Another generic improvement could be to make KEEP_BUILD_DIR a ternary variable
(but that requires changes at all places where KEEP_BUILD_IR is used)
so that the user gets real final power to enforce "yes" or "no" as he wants.

Currently KEEP_BUILD_DIR="no" behaves same as KEEP_BUILD_DIR=1
because of the "non-empty means yes" boolean behaviour in ReaR, cf.
https://github.com/rear/rear/blob/master/usr/share/rear/lib/framework-functions.sh#L124
which is confusing for the user and currently we cannot decide in our code
if the user actually wants "no" when KEEP_BUILD_DIR is empty.
I.e. currently we cannot distinguish between the default value
and an enforced user setting.

pcahyna commented at 2019-08-29 12:23:

@pcahyna
more clear messages that help the user to better understand what goes on
are much appreciated.

By the way:

Another generic improvement could be to make KEEP_BUILD_DIR a ternary variable
(but that requires changes at all places where KEEP_BUILD_IR is used)
so that the user gets real final power to enforce "yes" or "no" as he wants.

Currently KEEP_BUILD_DIR="no" behaves same as KEEP_BUILD_DIR=1
because of the "non-empty means yes" boolean behaviour in ReaR, cf.
https://github.com/rear/rear/blob/master/usr/share/rear/lib/framework-functions.sh#L124
which is confusing for the user and currently we cannot decide in our code
if the user actually wants "no" when KEEP_BUILD_DIR is empty.
I.e. currently we cannot distinguish between the default value
and an enforced user setting.

That was my thinking as well. I wanted to use the empty value as meaning "not set by user", but then I realized that it already means "false". Unfortunately, switching it to ternary means a compatibility break. By the way, your reasoning that "non-empty means yes" is confusing for the user is valid, but applies to most of the variables, because this boolean (not ternary) semantics is very common in ReaR.

jsmeix commented at 2019-08-29 12:37:

@pcahyna
I don't think that switching it to ternary means a compatibility break
as long as "empty" means "no" by default/fallback
(and "yes" if the -D option is used).

But an explicit KEEP_BUILD_DIR="no" would enforce
that BUILD_DIR is never kept.

This way we would be able to set KEEP_BUILD_DIR="yes" in a script
but only if it is not already set to "no".

And then we might even still set it in 990_verify_rootfs.sh as follows
(provided we come to the conclusion it is really a good idea to do that)

if ERROR_CONDITION ; then
    # Keep BUILD_DIR for easier debugging of the error
    # unless the user had explicitly specified he never wants that:
    is_false "$KEEP_BUILD_DIR" || KEEP_BUILD_DIR="yes"
    Error "..."

pcahyna commented at 2019-08-29 14:14:

@jsmeix Indeed, this we do not break compatibility in the usual case, where the user sets KEEP_BUILD_DIR to a string accepted by is_true to indicate yes. (Potential problem is, anything that is not false and is not empty is now treated as true, which would not be compatible with the change. (A particularly silly case is if the user sets KEEP_BUILD_DIR to 0 or "no" to mean "yes".))

Still, your snippet does not address the concern that by default, automated runs of ReaR clutter /tmp on errors. We could do

if ! is_false "$KEEP_BUILD_DIR" && tty -s; then
   KEEP_BUILD_DIR="yes"

but it would be not possible to get the previous behaviour (keep the build dir on error only) - @xjezda00 's concern.

if ! is_false "$KEEP_BUILD_DIR" &&  { tty -s || test "$KEEP_BUILD_DIR" == errors; }; then
   KEEP_BUILD_DIR="yes"

would do it (there would be a special value "errors" which would indicate, keep it only on errors).

pcahyna commented at 2019-08-29 15:30:

What to do with scripts that check the value of KEEP_BUILD_DIR before 990_verify_rootfs.sh modifies it? Such as build/OPALPBA/Linux-i386/391_list_executable_dependencies.sh ( https://github.com/rear/rear/pull/1659/files#diff-94b2089c3ecae157af07001369db4783 ), which is now maybe the only one, but in the future there may be more of them.

jsmeix commented at 2019-08-30 17:15:

@pcahyna
I would assume that users who set KEEP_BUILD_DIR to a strange value
have to expect strange things ;-)
When someone sets it to "no" and now it behaves this way,
I think this change is rather a fix than a regression.
When someone sets it to 'X' that neither is_true nor is_false
he wold get the default behaviour which is a minor regression for him.
Of course we would document such a change in the release notes
so the users are informed.
I think we cannot keep 100.0% backward compatibility in any case
when there is a good reason for a change with minimal regressions.
We did several such changes in the past, cf. the various
"backward incompatible changes" parts in our release notes
https://github.com/rear/rear/blob/master/doc/rear-release-notes.txt

I like variables that have in the end a boolean 'yes/no' meaning
but also support enhanced ways to specify more exactly how things are meant
so I like it when KEEP_BUILD_DIR="errors" means "yes but only in case of errors".

I do not see a problem with scripts that check KEEP_BUILD_DIR.
For example if is_true $KEEP_BUILD_DIR; then in
https://github.com/rear/rear/pull/1659/files#diff-94b2089c3ecae157af07001369db4783
looks fine to me in particular when KEEP_BUILD_DIR gets ternary semantics.

pcahyna commented at 2019-08-30 17:22:

I like variables that have in the end a boolean 'yes/no' meaning
but also support enhanced ways to specify more exactly how things are meant
so I like it when KEEP_BUILD_DIR="errors" means "yes but only in case of errors".

Ok, I will update the PR to introduce this semantics instead of KEEP_BUILD_DIR_ON_ERRORS.

I do not see a problem with scripts that check KEEP_BUILD_DIR.
For example if is_true $KEEP_BUILD_DIR; then in
https://github.com/rear/rear/pull/1659/files#diff-94b2089c3ecae157af07001369db4783
looks fine to me in particular when KEEP_BUILD_DIR gets ternary semantics.

The problem is that this script checks the value, then 990_verify_rootfs.sh changes it to something else (it is sourced after).

jsmeix commented at 2019-08-30 17:52:

@pcahyna
you are right.

I had thought 990_verify_rootfs.sh sets KEEP_BUILD_DIR=1
only when it errors out.

But now I see there are two cases where 990_verify_rootfs.sh
sets KEEP_BUILD_DIR=1 but does not error out later:

if contains_visible_char "$broken_binaries" ; then
    LogPrintError "There are binaries or libraries in the ReaR recovery system that need additional libraries"
    KEEP_BUILD_DIR=1
    ...
                    LogPrintError "$binary requires additional libraries (specified as non-fatal)"
    ...
            LogPrintError "$binary requires additional libraries"

In those two cases fatal_missing_library="yes" is not set.
This needs to be corrected because there is no reason
to automatically keep BUILD_DIR when there is no error.

jsmeix commented at 2019-08-30 18:01:

I mean that in those two cases where fatal_missing_library="yes"
is (correctly) not set, it needs to be corrected that KEEP_BUILD_DIR=1
is set.
E.g. the KEEP_BUILD_DIR=1 at the beginning of the if clause
must be removed and at each place where fatal_missing_library="yes" is set
also KEEP_BUILD_DIR=1 must be set.
Alternatively KEEP_BUILD_DIR=1 can be set directly before the BugError and Error calls
at the end of the script.

pcahyna commented at 2019-09-02 12:27:

@jsmeix I had something else in mind than the problem you have found. The issue is that build/OPALPBA/Linux-i386/391_list_executable_dependencies.sh gets sourced, KEEP_BUILD_DIR is not true, then 990_verify_rootfs.sh gets sourced and can set KEEP_BUILD_DIR to true. But build/OPALPBA/Linux-i386/391_list_executable_dependencies.sh behaved as if it were not true, i.e. the behaviour was inconsistent with the final value.

pcahyna commented at 2019-09-03 10:36:

@jsmeix I pushed a commit to address your last comment.

pcahyna commented at 2019-09-03 12:21:

@jsmeix I pushed a commit to implement what I proposed (changes to default.conf to be done).

jsmeix commented at 2019-09-03 12:31:

As far as I understand the code in
build/OPALPBA/Linux-i386/391_list_executable_dependencies.sh
the intent is to do that when the user had set KEEP_BUILD_DIR.
@OliverO2 may explain his intent behind here in more detail.

In contrast we set KEEP_BUILD_DIR automatically in
990_verify_rootfs.sh only when it errors out because
here the intent is to make debugging easier for the user.

So I think that both intents are independent of each other
so that the default behaviour does not need to be the same
for different intents.

For me a crucial point is "final power to the user" so that an
explicit user setting KEEP_BUILD_DIR="no" must be obeyed
which means we must not automatically set KEEP_BUILD_DIR=1
i.e. ReaR must not work against what its user has specified.

pcahyna commented at 2019-09-03 12:53:

@jsmeix

For me a crucial point is "final power to the user" so that an
explicit user setting KEEP_BUILD_DIR="no" must be obeyed
which means we must not automatically set KEEP_BUILD_DIR=1
i.e. ReaR must not work against what its user has specified.

this should be the effect of the last changes. I rebased the last commit to include updates to the description in default.conf, so it is ready to be reviewed, and, after testing, merged.

jsmeix commented at 2019-09-03 15:10:

@rear/contributors
I am still not in the office and will not be for some more weeks
so I cannot actually test anything in ReaR.
Therefore I would appreciate it when one of you
could also review it and merge it if it is o.k.

gdha commented at 2019-09-13 06:31:

@rear/contributors for me it is ok and for you?

rmetrich commented at 2019-09-13 07:35:

@gdha It's ok for me

jsmeix commented at 2019-09-17 11:15:

@gdha @rmetrich
thank you for your review.


[Export of Github issue for rear/rear.]