#2979 PR closed: superseded: non-interactive mode for ReaR

Labels: enhancement, won't fix / can't fix / obsolete

codefritzel opened issue at 2023-05-04 11:56:

Relax-and-Recover (ReaR) Pull Request Template

Please fill in the following items before submitting a new pull request:

Pull Request Details:
  • Type: Enhancement

  • Impact: Normal

  • How was this pull request tested?
    manually on Ubu2204 and OL8

  • Brief description of the changes in this pull request:

This PR adds a non-interactive mode in ReaR.
This mode is intended when ReaR is started with an external automation tool like Ansible and there is no possibility to do an user input.
If a user input is needed, ReaR should abort with an error and not get stuck in user input loop.
The UserInput function has been modified so that it returns the dafault value in non-interactive mode otherwise an error is thrown.
In the scripts the mode can be checked and handled with the variable NON_INTERACTIVE.
For example, see usr/share/rear/verify/GALAXY11/default/420_login_to_galaxy_and_setup_environment.sh:

is_true "$NON_INTERACTIVE" && \
            Error "Login is not possible in non-interactive mode. Set variables GALAXY11_USER and GALAXY11_PASSWORD."

The non-interactive mode mode can be activated with the command line option -n or --non-interactive.

jsmeix commented at 2023-05-04 13:14:

Unfortunately in current ReaR code the UserInput function
is not the only way how ReaR may ask for user input.

There are some code places left where 'read' or 'select'
is called directly to get user input, for example
in the ReaR recovery system startup scripts like 'select' in
https://github.com/rear/rear/blob/master/usr/share/rear/skel/default/etc/scripts/system-setup.d/55-migrate-network-devices.sh
but I think also for some third-party backup methods
and at some other code places.

jsmeix commented at 2023-05-04 13:19:

The UserInput function is intentionally made
to be ready for automated usage via the
USER_INPUT_... config variables
so no additional special code should be needed
for UserInput() for automated usage.

I.e. from my current point of view I don't see
why this pull request is needed - or in other words:

@codefritzel
what does not work for you with
the USER_INPUT_... config variables
for your particular automated use case?

jsmeix commented at 2023-05-04 13:30:

The code places where UserInput is called without '-D'
i.e. without a default input value are currently:

# find usr/sbin/rear usr/share/rear -type f | xargs grep 'UserInput ' | grep -v ': *#' | grep -v ' -D

usr/share/rear/format/USB/default/300_format_usb_disk.sh:
        USB_UEFI_PART_SIZE="$( UserInput -I USB_DEVICE_EFI_PARTITION_MIBS -p "Enter size for EFI system partition on $RAW_USB_DEVICE in MiB (default 512 MiB)" )"

usr/share/rear/layout/prepare/GNU/Linux/150_include_drbd_code.sh:
    user_input="$( UserInput -I DRBD_RESOURCE_BECOMES_PRIMARY -p "Type 'yes' if you want DRBD resource $resource to become primary" )"

usr/share/rear/restore/BORG/default/300_load_archives.sh:
    choice="$( UserInput -I BORGBACKUP_ARCHIVE_TO_RECOVER -p "Choose archive to recover from" )"

usr/share/rear/restore/NBKDC/default/400_restore_backup.sh:
    if is_true "$( UserInput -I NBKDC_WAIT_UNTIL_RESTORE_SUCCEEDED -t 0 -p "$user_input_prompt" )" ; then

usr/share/rear/verify/GALAXY11/default/500_select_backupset.sh:
 GALAXY11_BACKUPSET=$( UserInput -I GALAXY11_BACKUPSET -p "Select CommVault backupset to use:" "${backupsets[@]}" )

usr/share/rear/verify/GALAXY11/default/550_request_point_in_time_restore_parameters.sh:
 GALAXY11_PIT_RECOVERY=$( UserInput -I GALAXY11_PIT_RECOVERY -p "Select point-in-time restore to use (ENTER for latest):" "${job_end_times[@]}" )

usr/share/rear/verify/CDM/default/410_use_replica_cdm_cluster_cert.sh:
    CDM_CLUSTER_IP="$( UserInput -I CDM_CLUSTER_IP -r -t 0 -p "$prompt" )"

usr/share/rear/verify/CDM/default/410_use_replica_cdm_cluster_cert.sh:
            CDM_AGENT_URL="$( UserInput -I CDM_AGENT_URL -r -t 0 -p "$prompt" )"

usr/share/rear/verify/NBU/default/390_request_point_in_time_restore_parameters.sh:
    answer=$( UserInput -I NBU_RESTORE_PIT -r -p "Enter date (mm/dd/yyyy) or date and time (mm/dd/yyyy HH:MM:SS) or press ENTER" )

usr/share/rear/verify/TSM/default/390_request_point_in_time_restore_parameters.sh:
    answer=$( UserInput -I TSM_RESTORE_PIT -r -p "Enter date/time (YYYY-MM-DD HH:mm:ss) or press ENTER" )

usr/share/rear/verify/GALAXY10/default/550_request_point_in_time_restore_parameters.sh:
    answer=$( UserInput -I GALAXY10_RESTORE_PIT -r -p "Enter date/time (MM/DD/YYYY HH:mm:ss) or press ENTER" )

usr/share/rear/verify/NSR/default/390_request_point_in_time_restore_parameters.sh:
            answer=$( UserInput -I NSR_RESTORE_PIT -r -p "Enter date (mm/dd/yyyy) or date and time (mm/dd/yyyy HH:MM:SS) or press ENTER" )

usr/share/rear/lib/opal-functions.sh:
        password="$(UserInput -I "$id" -C -r -s -t 0 -p "$prompt")"
...
        result="$(UserInput -I "$id" -t 0 -p "$prompt")"

As far as I see nowhere is UserInput called without '-I'

# find usr/sbin/rear usr/share/rear -type f | xargs grep -h 'UserInput ' | grep -v '^ *#' | grep -v ' -I '

[no output ]

so for all cases where UserInput is called
it is possible to specify a predefined user input value.

jsmeix commented at 2023-05-04 13:38:

I see there are perhaps some cases where
any predefined user input value cannot make sense
e.g. because any predefined value would be wrong
or any predefined value would be a security issue.

So I could enhance the UserInput function to error out
when a special predefined user input value is specified,
for example when that value is 'ERROR_ABORT' e.g. like

USER_INPUT_GALAXY10_RESTORE_PIT="ERROR_ABORT"

then the call

UserInput -I GALAXY10_RESTORE_PIT ...

would just error out.

jsmeix commented at 2023-05-04 13:55:

In general see
https://github.com/rear/rear/wiki/Coding-Style#it-should-be-possible-to-run-rear-unattended

So in theory all code in ReaR should be so that it can run unattended.

In theory, theory and practice are the same. In practice, they are not.

I think it is more helpful in practice to fix those code places
where ReaR cannot run unattended one by one as they become known
than a new generic mode which does not work in general, see
https://github.com/rear/rear/pull/2979#issuecomment-1534762436
because currently a non-interactive mode would be
just one more false promise to the user.

schlomo commented at 2023-05-04 17:05:

@codefritzel there is a lot of wisdom in what @jsmeix says here.

Maybe another aproach would actually provide an easier solution? what happens if you call ReaR like this

yes "" | rear recover -v

In thise way every prompt will be answered with ENTER and I imagine that your login prompt would also quickly lead to a failure, which you could then detect from Ansible.

Or, rather:

bash -c "exec rear recover -v < <(yes '')"

(bash creates the stdin input redirection and then replaces itself with rear)

to easier get the return code of rear.

codefritzel commented at 2023-05-04 20:10:

The UserInput function is intentionally made to be ready for automated usage via the USER_INPUT_... config variables so no additional special code should be needed for UserInput() for automated usage.

I.e. from my current point of view I don't see why this pull request is needed - or in other words:

@codefritzel what does not work for you with the USER_INPUT_... config variables for your particular automated use case?

@jsmeix thanks for the tipp i did not know that at all.

I see there are perhaps some cases where any predefined user input value cannot make sense e.g. because any predefined value would be wrong or any predefined value would be a security issue.

So I could enhance the UserInput function to error out when a special predefined user input value is specified, for example when that value is 'ERROR_ABORT' e.g. like

USER_INPUT_GALAXY10_RESTORE_PIT="ERROR_ABORT"

then the call

UserInput -I GALAXY10_RESTORE_PIT ...

would just error out.

This could be an alternative, but the user must know all possible inputs before.

@codefritzel there is a lot of wisdom in what @jsmeix says here.

Maybe another aproach would actually provide an easier solution? what happens if you call ReaR like this

yes "" | rear recover -v

In thise way every prompt will be answered with ENTER and I imagine that your login prompt would also quickly lead to a failure, which you could then detect from Ansible.

Or, rather:

bash -c "exec rear recover -v < <(yes '')"

(bash creates the stdin input redirection and then replaces itself with rear)

to easier get the return code of rear.

@schlomo in my opinion these are "dirty" workarounds which only work if the inputs are not validated.
Here is an example with GALAXY11_PIT_RECOVERY:
ReaR was called by ansible and there was no feedback for a long time. The reason was that a UserInput was requested for GALAXY11_PIT_RECOVERY and this was validated in a loop. This had created an infinite loop because of this code section:

until IsInArray "$GALAXY11_PIT_RECOVERY" "${job_end_times[@]}"; do
    GALAXY11_PIT_RECOVERY=$( UserInput -I GALAXY11_PIT_RECOVERY -p "Select point-in-time restore to use (ENTER for latest):" "${job_end_times[@]}" )
done

This has already been fixed in 0ebb2c but there is still a concern that this problem will occur somewhere else. In this case the desire for ReaR is to abort in such a case.

I think with the non-interactive mode this would be the cleanest way to do it. If this it implemented in the relevant sections.

jsmeix commented at 2023-05-05 05:32:

@codefritzel
as far as I understand it currently
I think the generic functionality you need
is to monitor a running process ('rear' in this case)
whether or not it is actually doing something
in contrast to only endlessly useless waiting.

I think the problem with that is how to distinguish
between deliberate waiting (for something that will happen)
and endlessly useless waiting.

For a nice example of deliberate waiting see your
recent "nfs_server as new restore method"
https://github.com/rear/rear/pull/2973
via restore/NFS4SERVER/default/400_restore_with_nfs_server.sh
https://github.com/rear/rear/blob/master/usr/share/rear/restore/NFS4SERVER/default/400_restore_with_nfs_server.sh

The good thing of this deliberate waiting is
that is is gentle busy waiting via

while WAIT_CONDITION ; do
    ...
    sleep 5
    ...
done

so the waiting process is not totally idle
i.e. the waiting process does not appear to be dead(locked).

pcahyna commented at 2023-05-05 09:55:

@codefritzel

Here is an example with GALAXY11_PIT_RECOVERY: ReaR was called by ansible and there was no feedback for a long time.

Note that there is the USER_INPUT_TIMEOUT variable, which can be used to shorten waiting for input that never arrives

The reason was that a UserInput was requested for GALAXY11_PIT_RECOVERY and this was validated in a loop. This had created an infinite loop because of this code section:

I think that this infinite loop case warrants some extension of UserInput: either one would be able to specify in the code that user input must be provided, or the function would abort if called some number of times for the same input and timed out (or some combination of both).

until IsInArray "$GALAXY11_PIT_RECOVERY" "${job_end_times[@]}"; do
    GALAXY11_PIT_RECOVERY=$( UserInput -I GALAXY11_PIT_RECOVERY -p "Select point-in-time restore to use (ENTER for latest):" "${job_end_times[@]}" )
done

This has already been fixed in 0ebb2c but there is still a concern that this problem will occur somewhere else. In this case the desire for ReaR is to abort in such a case.

I think with the non-interactive mode this would be the cleanest way to do it. If this it implemented in the relevant sections.

Note that there is already the unattended command line option, which implements a sort of non-interactive mode. Could you check whether it does what you need and if not whether it would make sense to extend it?

schlomo commented at 2023-05-10 08:17:

Here is another idea:

Extend the UserInput function so that it will automatically abort on repeated calls on the same variable. That way interactive prompts can be answered with the default response but loops that keep asking for the same UserInput will then abort automatically on the second try.

What do you think about that?

jsmeix commented at 2023-05-10 08:53:

for something in foo bar baz ; do
    ...
    UserInput -I WAIT_UNTIL_ENTER -t 0 -p 'Press [Enter] to continue'
    ...
done

schlomo commented at 2023-05-10 10:08:

@jsmeix what do you mean? That this example would not work with my suggestion? Yes, of course not.

I'm still trying to find a solution for making ReaR work better in unattended mode, and especially preventing endless UserInput loops. The one during recovery #2984 is maybe one of the worst examples, but there may be others.

I don't mind modifying my suggestion so that calling UserInput 10 times with the same variable will lead to an abort. And we can also make it a configuration option that must be activated via a parameter, e.g. the --non-interactive parameter.

pcahyna commented at 2023-05-10 11:24:

Here is another idea:

Extend the UserInput function so that it will automatically abort on repeated calls on the same variable. That way interactive prompts can be answered with the default response but loops that keep asking for the same UserInput will then abort automatically on the second try.

What do you think about that?

yes, it is similar to what I meant by "or the function would abort if called some number of times for the same input and timed out" - by "same input" I meant "on the same variable" - and I meant to abort on a specific number of calls and not necessarily already at the second one. Also, to abort only if the fucntion timed out multiple times, not when the input has been entered properly. Aborting always would basically prevent using UserInput in a loop, which could also have its would also have its merits and possibly disadvantages.

schlomo commented at 2023-05-10 11:59:

Good idea, we can count the "nobody took care of me for the N-th time" when triggering the timeout in UserInput. That way it really happens only without user interaction

jsmeix commented at 2023-05-10 13:15:

@pcahyna
thank you for your proposal.

@schlomo
thank you for your summary reason why this proposal works.

Now I agree with the proposed method.

Regarding using the same UserInput in a loop:

This is what I meant with
https://github.com/rear/rear/pull/2979#issuecomment-1541609286
where same input '[Enter]' is needed several times in a loop
with same UserInput ID 'WAIT_UNTIL_ENTER'.

A real-world example is similar as
https://github.com/rear/rear/issues/2984
but in interactive mode when diskrestore.sh
needs to be run several times in a loop
with same UserInput 'LAYOUT_CODE_RUN' and
same input "Rerun disk recreation script"
because the user could also fix things
by only calling command line tools via another shell
which he could do on a second terminal (e.g. via ssh)
so that via the terminal where 'rear' is run only the
UserInput 'LAYOUT_CODE_RUN' is run several times in a loop
with several times same input "Rerun disk recreation script".

schlomo commented at 2023-05-10 13:22:

We'll rework this PR to

  1. abort in the UserInput function if the timeout is hit too many times (configurable)
  2. activate non-interactive together with the unattended recovery

Any other wishes to include?

I'm actually wondering if we can have (1) always active but with a higher count, e.g. 10, and then reduce that to 1 together with setting the USER_INPUT_TIMEOUT to 3 seconds as the main action of the --non-interactive flag? I can't imagine where somebody would want to cycle more than 10 times through the same question and this approach makes it simple to manually test ReaR with convenient timeouts and then use ReaR "in production" and non-interactive with short timeouts and no timeout-driven retries, without changing the configuration files.

jsmeix commented at 2023-05-10 14:42:

I think when there is a --non-interactive flag,
then the maximum number of timeouts for the same UserInput
could be even unlimited when --non-interactive is not set.
In contrast when --non-interactive is set,
then the maximum number of timeouts is what the user specified
or as fallback what is specified for it in default.conf.

Or in other words:
Keep the current behaviour when --non-interactive is not set.
Do whatever is needed when --non-interactive is set.

My reason for unlimited number of timeouts
when --non-interactive is not set is that
I think it could be rather unfriendly behaviour
for the user if it all of a sudden aborts ReaR
after a relatively small number of timeouts
(and a big number of timeouts behaves in practice
same as unlimited number of timeouts).

In particular during "rear recover" any automated abort
because of something that is not an actual hard error
would be rather unfriendly behaviour for a user
who is interactively struggling with "rear recover".

Simply put:
In interactive mode only the user should abort ReaR.
(It is a different thing when ReaR must error out.)

schlomo commented at 2023-05-10 16:56:

BTW, I'll keep the Error if non-interactive is used and a UserInput has no default choice, as this affects only very few cases:

git grep '\$(.*UserInput' | grep -v -- -D | grep -v functions.sh
usr/share/rear/format/USB/default/300_format_usb_disk.sh:        USB_UEFI_PART_SIZE="$( UserInput -I USB_DEVICE_EFI_PARTITION_MIBS -p "Enter size for EFI system partition on $RAW_USB_DEVICE in MiB (default 512 MiB)" )"
usr/share/rear/layout/prepare/GNU/Linux/150_include_drbd_code.sh:    user_input="$( UserInput -I DRBD_RESOURCE_BECOMES_PRIMARY -p "Type 'yes' if you want DRBD resource $resource to become primary" )"
usr/share/rear/restore/BORG/default/300_load_archives.sh:    choice="$( UserInput -I BORGBACKUP_ARCHIVE_TO_RECOVER -p "Choose archive to recover from" )"
usr/share/rear/restore/NBKDC/default/400_restore_backup.sh:    if is_true "$( UserInput -I NBKDC_WAIT_UNTIL_RESTORE_SUCCEEDED -t 0 -p "$user_input_prompt" )" ; then
usr/share/rear/verify/CDM/default/410_use_replica_cdm_cluster_cert.sh:    CDM_CLUSTER_IP="$( UserInput -I CDM_CLUSTER_IP -r -t 0 -p "$prompt" )"
usr/share/rear/verify/CDM/default/410_use_replica_cdm_cluster_cert.sh:            CDM_AGENT_URL="$( UserInput -I CDM_AGENT_URL -r -t 0 -p "$prompt" )"
usr/share/rear/verify/GALAXY10/default/550_request_point_in_time_restore_parameters.sh:    answer=$( UserInput -I GALAXY10_RESTORE_PIT -r -p "Enter date/time (MM/DD/YYYY HH:mm:ss) or press ENTER" )
usr/share/rear/verify/GALAXY11/default/500_select_backupset.sh: GALAXY11_BACKUPSET=$( UserInput -I GALAXY11_BACKUPSET -p "Select CommVault backupset to use:" "${backupsets[@]}" )
usr/share/rear/verify/NBU/default/390_request_point_in_time_restore_parameters.sh:    answer=$( UserInput -I NBU_RESTORE_PIT -r -p "Enter date (mm/dd/yyyy) or date and time (mm/dd/yyyy HH:MM:SS) or press ENTER" )
usr/share/rear/verify/NSR/default/390_request_point_in_time_restore_parameters.sh:            answer=$( UserInput -I NSR_RESTORE_PIT -r -p "Enter date (mm/dd/yyyy) or date and time (mm/dd/yyyy HH:MM:SS) or press ENTER" )

I'll have a look at those and add defaults where possible.

schlomo commented at 2023-05-14 15:15:

There are some code places left where 'read' or 'select'
is called directly to get user input, for example
in the ReaR recovery system startup scripts like 'select' in
https://github.com/rear/rear/blob/master/usr/share/rear/skel/default/etc/scripts/system-setup.d/55-migrate-network-devices.sh

@jsmeix I just realized: Not an issue as the stuff in skel runs before rear recover starts, while --non-interactice only works within rear itself.

schlomo commented at 2023-05-14 15:32:

This PR is superseded by #2988

jsmeix commented at 2023-05-15 06:14:

@schlomo
regarding your
https://github.com/rear/rear/pull/2979#issuecomment-1546922558

I think you may have misunderstood.

I think in general a non-interactive / unattended mode for ReaR
is meant to let ReaR either continue or abort automatically
whenever user input is needed.

But in general when 'read' or 'select' is called directly
to get user input the current implementation of those calls
may not yet automatically continue or abort.

Off the top of my head I know at least about
https://github.com/rear/rear/commit/dd0c7111c76f48c21e53f5184ede6aa5f07d6865
so that at this code place it currently continues
automatically only when there is only one choice
https://github.com/rear/rear/blob/master/usr/share/rear/skel/default/etc/scripts/system-setup.d/55-migrate-network-devices.sh#L145
but when there is more than a single choice it calls 'select'
https://github.com/rear/rear/blob/master/usr/share/rear/skel/default/etc/scripts/system-setup.d/55-migrate-network-devices.sh#L177
which waits endlessly for an input, cf.
https://github.com/rear/rear/issues/1366


[Export of Github issue for rear/rear.]