#2494 Issue closed: Discussion: What Is The Right Way Of Error Checking in ReaR?

Labels: discuss / RFC, no-issue-activity

OliverO2 opened issue at 2020-09-17 12:29:

It all started with some seemingly innocent argument checking in an issue comment's code:

function opal_device_partprobe() {
    local device="${1:?}"

    [...]
}

Which prompted the observation

this leads to an unfriendly exit without any error message when VAR is null or unset.

Example 1: Parameter Checks

Store this code as usr/share/rear/prep/default/000_test.sh:

function opal_device_partprobe() {
    local device="${1:?}"

    partprobe "$device"
}

opal_device_partprobe  # missing parameter

Running sudo usr/sbin/rear mkrescue prints:

rear mkrescue failed, check [...].log for details

The relevant log excerpt is (timestamps omitted):

Including prep/default/000_test.sh
[...]/usr/share/rear/prep/default/000_test.sh: line 2: 1: parameter null or not set
Exiting rear mkrescue (PID 241592) and its descendant processes ...

Example 2: Extensive Checks

function opal_device_partprobe() {
    local device="$1"
    test -b "$device" -o -c "$device" || BugError "opal_device_partprobe called for '$device' that is neither a block nor a character device"

    partprobe "$device"
}

opal_device_partprobe  # missing parameter

Running sudo usr/sbin/rear mkrescue prints:

ERROR:
====================
BUG in [...]/usr/share/rear/prep/default/000_test.sh line 3:
'opal_device_partprobe called for '' that is neither a block nor a character device'
--------------------
Please report this issue at https://github.com/rear/rear/issues
and include the relevant parts from [...].log
preferably with full debug information via 'rear -D mkrescue'
====================
Some latest log messages since the last called script 000_test.sh:
  2020-09-17 13:21:23.386054312 Including prep/default/000_test.sh
Aborting due to an error, check [...].log for details
Terminated

The relevant log excerpt is (timestamps omitted):

Including prep/default/000_test.sh
ERROR:
====================
BUG in [...]/usr/share/rear/prep/default/000_test.sh line 3:
'opal_device_partprobe called for '' that is neither a block nor a character device'
--------------------
Please report this issue at https://github.com/rear/rear/issues
and include the relevant parts from [...].log
preferably with full debug information via 'rear -D mkrescue'
====================
===== Stack trace =====
Trace 0: usr/sbin/rear:555 main
Trace 1: [...]/usr/share/rear/lib/mkrescue-workflow.sh:12 WORKFLOW_mkrescue
Trace 2: [...]/usr/share/rear/lib/framework-functions.sh:124 SourceStage
Trace 3: [...]/usr/share/rear/lib/framework-functions.sh:56 Source
Trace 4: [...]/usr/share/rear/prep/default/000_test.sh:8 source
Trace 5: [...]/usr/share/rear/prep/default/000_test.sh:3 opal_device_partprobe
Trace 6: [...]/usr/share/rear/lib/_input-output-functions.sh:701 BugError
=== End stack trace ===
Exiting rear mkrescue (PID 242203) and its descendant processes ...

Arguments

(I hope this is representative of the original discussion.)

In Favor Of Some Checks

Not checking at all (like most code in ReaR) would let ReaR either continue with undesirable results or trigger a follow-up error later on which is probably harder to diagnose.

Against Extensive Checks

Putting in explicit error-checking code everywhere (like [[ -z "$foo" ]] || BugError "foo: parameter null or not set" ]]) distracts from the primary functionality and makes the code harder to read.

In Favor Of Extensive Checks

The problem for the user is that he does not get the error message on his terminal.
Let him search the actual error message from the log is not user friendly.
He needs at least some message about what went wrong on his terminal.
Then he can search the log for more details related to that message.

[Parameter checks only] would not catch if opal_device_partprobe /etc/fstab was called.

To Be Discussed

In principle, we can always insert more checks in ReaR code and rely less on checks elsewhere. It is apparent that such checks can make the user's life easier:

  • In the above case, partprobe does almost no error checking on its own: It does not complain when called without arguments or with a regular file argument.

On the other hand, more checking means more code. Checks themselves can contain bugs. More code must be read when contributing to ReaR. Comparing the code within the function body (ignoring leading spaces):

  • Example 1: 43 characters
  • Example 2: 177 characters (4.1 times as much)

So what might be the right amount of error checking in ReaR? Can we even figure out useful guidelines?

jsmeix commented at 2020-09-17 13:14:

@OliverO2
thank you for your explanatory description of the issue
which helps so much to discuss such things in a constructive way!

I dared to change the subject from

What Is The Right Amount Of Error Checking in ReaR?

to

What Is The Right Way Of Error Checking in ReaR?

because I think it is not only about the Amount
but more generally about what the best Way is.

jsmeix commented at 2020-09-17 13:33:

See my reasoning in
https://github.com/rear/rear/issues/2475#issuecomment-671871423
why I am in favor of explicit checks.

I am not in favor of extensive (i.e. a high amount of) checks.

Therefore I think the example like

[[ -z "$device" ]] || BugError "device: parameter null or not set"

is - although better than local device="${1:?}" - not what I mean with an explicit check.

In contrast I think

test -b "$device" -o -c "$device" || BugError "opal_device_partprobe called for '$device' that is neither a block nor a character device"

is an explicit check.
In particular it makes it obvious for a code reader what opal_device_partprobe is about
because the error message provides explicit information in particular for a code reader.

In the end I am in favor of explicit checks because of what I described in
https://github.com/rear/rear/wiki/Coding-Style

jsmeix commented at 2020-09-17 13:35:

So now I stay silent at least until next Monday
to give the other @rear/contributors some time
to share their thougths here.

gdha commented at 2020-09-18 06:46:

I'm in favor for explicit check for disk devices and creation of partition and file systems. Other kind of errors can also always be found back in the log file, which should be a wise thing to do check each time - will try to explicit mention this in the ReaR User Guide (in progress)

github-actions commented at 2020-11-18 01:37:

Stale issue message


[Export of Github issue for rear/rear.]