#2988 PR merged
: Implement non-interactive mode¶
Labels: fixed / solved / done
, sponsored
schlomo opened issue at 2023-05-14 15:24:¶
supersedes #2979 and should solve everything discussed there. Code is based on the work by @codefritzel and adapted by me.
My considerations and decisions:
- the non-interactive mode is marked experimental because it only
changes the way how
UserInput
works and there might be some scenarios where it won't work as advertised. I don't want to put a lengthy explanation about the technical details in the help text, hence experimental to indicate the current status. - Without
--non-interactive
nothing should have changed (but I fixed a few places that seemed like they could cause problems) - With
--non-interactive
theUserInput
timeouts are hard coded to 3 seconds (short, but not too short) and everyUserInput -I FOO
call can happen only once. If it happens again then it willError
out. - I treat an automated response by
USER_INPUT_FOO
the same as using the default response given via-D
- I'd like to keep it this simple till further testing reveals
scenarios where multiple calls to the same
UserInput -I FOO
would be successful, and then I'll adjust the code accordingly (no premature optimisation).
Finally, I'd like to merge this rather soon so please look for reasons why this is a really bad idea and leave further improvements to future iterations.
@rear/contributors please have a look.
PS: The UserInput
function is really hard to understand and to test
(via rear shell
) because it does so much, if somebody wants to
contribute a unit test then this would be my starting point.
jsmeix commented at 2023-05-15 06:27:¶
I am wondering if there a reason why it is called 'non-interactive'
and not 'unattended' which is a term that we already use in ReaR?
Perhaps it is good to use 'non-interactive'
for the usr/sbin/rear command line option
to distinguish it from the already existing
recovery system kernel command line option
'auto_recover/automatic' and 'unattended'
because each one implements a differnt thing so the three
'non-interactive' 'auto_recover/automatic' and 'unattended'
look same but actually they are only similar
and belong to different parts of the same goal?
@schlomo
PS: The UserInput function was really hard to implement
to get it into a state that provides what was needed
at that time when the UserInput was implemented.
Feel free to overhaul it and replace it with something better.
I already overhauled much old code without useful comments
that I could not understand with reasonable effort so
meanwhile I got rather tired to endlessly clean up old code.
pcahyna commented at 2023-05-16 11:19:¶
* I'd like to keep it this simple till further testing reveals scenarios where multiple calls to the same `UserInput -I FOO` would be successful, and then I'll adjust the code accordingly (no premature optimisation).
This could in principle happen with loops that are not infinite loops but iterate over some restricted set of items. I suspected that the layout code might have instances of this, as it may be necessary to enter input multiple times at the same code spot for multiple disks or other storage objects. But I have found this code that maps multiple disks interactively:
which already takes care of it. So, I believe this is either taken care of and if not, code that does not do it could be adapted in a similar way (generating unique identifiers for every iteration).
jsmeix commented at 2023-05-16 12:01:¶
The following is only for the non-interactive mode:
The crucial thing is to abort only if there was no active user input
which means to abort only if the 'read' timeout happened.
The UserInput function returns 1 only if 'read' timed out
(provided there is no syntax error or some lower level error).
Because the current user_input_ID gets only added
to USER_INPUT_SEEN_WITH_TIMEOUT when return_code=1 is set
the current code seems to implement that
(as far as I see from plain looking at the code).
jsmeix commented at 2023-05-16 12:27:¶
Regardless of the needed condition in
https://github.com/rear/rear/pull/2988#issuecomment-1549529195
I think this condition is not sufficient to make things work
in general in non-interactive mode.
I think (as far as I see from plain looking at the code)
the current implementation aborts at a second UserInput
with a user_input_ID that had timed out before.
But in non-interactive mode it is the usual case
that UserInput times out and the default input is used
when the user has not specified a predefined input.
As far as I remember in most cases the default input
is to let ReaR continue so that the normal behaviour is
that ReaR continues when the user does nothing
regardless of normal/interactive or non-interactive mode.
I implemented the various UserInput calls this way
as far as possible to get intentionally a default behaviour
of "automated proceeding" - unless the user must make a decision.
So in normal mode the following code would
automatically proceed after each timeout
...
UserInput -I WAIT_UNTIL_TIMEOUT -p 'Press [Enter] to continue'
...
UserInput -I WAIT_UNTIL_TIMEOUT -p 'Press [Enter] to continue'
...
In contrast in non-interactive mode it would abort at
the second 'UserInput -I WAIT_UNTIL_TIMEOUT' call.
I think (as far as I understand it) the current implementation
enforces that a user_input_ID must never be used a second time.
I think this is what @pcahyna meant with
"iterate over some restricted set of items"
in his
https://github.com/rear/rear/pull/2988#issuecomment-1549471637
pcahyna commented at 2023-05-17 08:56:¶
@jsmeix what I meant by "iterate over some restricted set of items" was code like this:
for d in "${disks[@]}"; do
UserInput -I CONFIRM_DISK -p "confirm the use of disk ${d}"
...
done
This can be fixed by generating a unique ID for each disk that is to be
confirmed. My point was that the code that iterates this way when
mapping disks is already using this idiom and is thus safe.
What is still problematic is code like you have shown:
UserInput -I WAIT_UNTIL_TIMEOUT -p 'Press [Enter] to continue'
DoStuff
UserInput -I WAIT_UNTIL_TIMEOUT -p 'Press [Enter] to continue'
DoOtherStuff
This, too, can be fixed by changing the IDs to be unique, like
UserInput -I CONFIRM_STUFF -p 'Press [Enter] to continue'
DoStuff
UserInput -I CONFIRM_OTHER_STUFF -p 'Press [Enter] to continue'
DoOtherStuff
I don't know whether there actually is code like this (that reuses the same input ID in several places).
schlomo commented at 2023-05-17 09:04:¶
@pcahyna @jsmeix your concerns are valid and I planned to go through our codebase to look at such cases with the goal of defusing potential problems, as you suggested.
schlomo commented at 2023-05-29 12:11:¶
I'm going over all uses of UserInput
to fix them up. Except for the
ones mentioned below, all the User Input calls have a default or won't
run into a perpetual loop.
The following cases are not clear to me and I left them be as they are now:
-
https://github.com/rear/rear/blob/2aaa1967710314c7bc12ccacac67dcf76b10b95a/usr/share/rear/lib/global-functions.sh#L553-L555
This could theoretically be called multiple times, but I'm not sure if this can be called multiple times for ISO -
https://github.com/rear/rear/blob/2aaa1967710314c7bc12ccacac67dcf76b10b95a/usr/share/rear/lib/opal-functions.sh#L333-L337
https://github.com/rear/rear/blob/2aaa1967710314c7bc12ccacac67dcf76b10b95a/usr/share/rear/lib/opal-functions.sh#L372-L375
Those are used for OPAL and set the timeout to0
, so I assume that nobody needs this non-interactive right now. -
https://github.com/rear/rear/blob/2aaa1967710314c7bc12ccacac67dcf76b10b95a/usr/share/rear/restore/NBKDC/default/400_restore_backup.sh#L101-L107
Manual wait for restore to happen, timeout set to0
so that we can't support non-interactive here either -
https://github.com/rear/rear/blob/2aaa1967710314c7bc12ccacac67dcf76b10b95a/usr/share/rear/verify/CDM/default/410_use_replica_cdm_cluster_cert.sh#L15
https://github.com/rear/rear/blob/2aaa1967710314c7bc12ccacac67dcf76b10b95a/usr/share/rear/verify/CDM/default/410_use_replica_cdm_cluster_cert.sh#L56
Manual wait with timeout0
, I guess that we don't support non-interactive for this
schlomo commented at 2023-05-29 16:19:¶
@rear/contributors please have a look, I'd like to merge this also during this week
jsmeix commented at 2023-05-30 10:53:¶
Regarding
https://github.com/rear/rear/pull/2988#issuecomment-1567061655
therein the UserInput for the '(iso)' case
in the mount_url() function in lib/global-functions.sh
When a UserInput is needed here things are already in a
rather exceptional (a.k.a. erroneous) state:
A device node where the ISO is attached to is needed
but ReaR could not determine such a device node
so ReaR must ask the user.
In non-interactive mode ReaR may directly abort here
but I think it is better to let ReaR continue
with the default value (i.e. try with '/dev/cdrom')
and when this is not right things will fail later with
ERROR: Mount command '...' failed.
I.e. things should behave sufficiently OK as is.
schlomo commented at 2023-05-30 10:58:¶
Regarding #2988 (comment) therein the UserInput for the '(iso)' case in the mount_url() function in lib/global-functions.sh
When a UserInput is needed here things are already in a rather exceptional (a.k.a. erroneous) state: A device node where the ISO is is needed but ReaR could not determine such a device node so ReaR must ask the user.
In non-interactive mode ReaR may directly abort here but I think it is better to let ReaR continue with the default value (i.e. try with '/dev/cdrom') and when this is not right things will fail later.
I.e. things should behave sufficiently OK as is.
When I was reading that code I asked myself: How could we ever truly test this???
jsmeix commented at 2023-05-30 11:02:¶
Usually while I am implementing such kind of code
I set up some artificial stuff to simulate
how things behave in the erroneous cases
but of course I cannot test an actual
erroneous case on real hardware
with reasonable effort.
jsmeix commented at 2023-05-30 11:08:¶
In the end for code like
if ! NORMAL_CASE ; then
Log "problematic case"
try something
fi
do something || Error "..."
it does not matter if "try something" works perfectly well.
jsmeix commented at 2023-05-30 13:26:¶
@schlomo
I would like to better understand the reasoning behind why you
"treat an automated response by USER_INPUT_FOO the same
as using the default response given via -D"
I assume the reason is to avoid an endless loop
when the user has specified a USER_INPUT_FOO value
in his etc/rear/local.conf file that leads to a loop?
For example something like
USER_INPUT_LAYOUT_CODE_RUN=0
which matches in layout/recreate/default/200_run_layout_code.sh
choices[0]="Rerun disk recreation script ($LAYOUT_CODE)"
If my assumption is right I wonder if it is OK
to let ReaR behave automatically different than
what the user had explicitly specified because
I think it contradicts how I understand what
"final power to the user" should mean.
I.e. in my example above when the user had specified
something that makes ReaR loop then ReaR must obey.
ReaR must not attempt to do automated "corrections"
of what the user had specified regardless if
what the user had specified makes sense
from our point of view.
But I am not against to
"treat an automated response by USER_INPUT_FOO the same
as using the default response given via -D"
in non-interactive mode because it means ReaR would error out
when the same user_input_ID gets an automated response
(i.e. the same automated response) a second time
which indicates that things do not move forward and
when things do not move forward in non-interactive mode
it is an erroneous state so it is OK to error out then.
I only like to get a sound understanding whether or not
this behaviour is actually right or perhaps wrong.
schlomo commented at 2023-05-31 21:42:¶
@jsmeix I added your two suggestions. About your question in
https://github.com/rear/rear/pull/2988#issuecomment-1568434201
I think that my logic here is very simple: If ReaR starts to run in
circles during non-interactive mode, meaning that the same question is
asked again (same UserInput -I XXX
), then we assume that additional
iterations of the same question will be answered with the same answer -
either the default value or something provided by the user - and
therefore will not lead to a different result. Therefore it is better to
abort with a clear error message so that other code, e.g. that was
running rear recover
via SSH, can know about it and handle it.
For that logic it indeed doesn't matter why the answer to a question
is the same, it only matters that it is the same and we assume that it
is the same answer even though it would be possible to inject code into
ReaR that would go and modify the USER_INPUT_XXX
variable between
calls to UserInput
.
I think we should merge this and see how it behaves in production and improve it where needed. I'm very confident that regular use of ReaR will work "as expected".
jsmeix commented at 2023-06-01 06:26:¶
@schlomo
thank you for your explanatory description in your
https://github.com/rear/rear/pull/2988#issuecomment-1571002063
I fully agree with you.
[Export of Github issue for rear/rear.]