#2566 Issue closed: TSM restore fails because of impossible user input when stdout and stderr are redirected to the log

Labels: enhancement, support / question, needs sponsorship, external tool, no-issue-activity

dcz01 opened issue at 2021-02-09 15:44:

chroot: failed to run command '/bin/bash': No such file or directory

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.6 / 2020-06-17

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):

NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

CentOS Linux release 8.3.2011

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
OUTPUT=ISO
OUTPUT_URL=file:///tmp/rear
#BACKUP=NETFS
BACKUP=TSM
#BACKUP_PROG=tar
#BACKUP_PROG_CRYPT_ENABLED=1
#BACKUP_PROG_CRYPT_KEY=<Verschluesselungskennwort>
#BACKUP_PROG_CRYPT_OPTIONS="/usr/bin/openssl aes-256-cbc -salt -k"
#BACKUP_PROG_DECRYPT_OPTIONS="/usr/bin/openssl aes-256-cbc -d -k"
#BACKUP_URL=nfs://<IP-Adresse oder DNS-Name>/<Freigabepfad>
#BACKUP_URL=cifs://<Server>/<Freigabe>
#BACKUP_OPTIONS="cred=/etc/rear/cifs,sec=ntlmsspi"
#BACKUP_OPTIONS="nfsvers=4,nolock"
#BACKUP_TYPE=incremental
#FULLBACKUPDAY="Sat"
#BACKUP_PROG_EXCLUDE=( '/tmp/*' '/dev/shm/*' $VAR_DIR/output/\* '/opt/tivoli/tsm/rear/*' '/mnt/*' '/media/*' '/var/lib/pgsql/*/data/base/*' '/var/lib/pgsql/*/data/global/*' '/var/lib/pgsql/*/data/pg*/*' )
SSH_ROOT_PASSWORD='$6$CFQoHxuu57fA8oWc$RCLWE/ZiSlKFAjNADp6ob.feRYxy/zk1Hch/QK9awCMhtTwPmEdddza/w5WlpnK85pcCVhh/MJ4evHTg73sl//'
#BOOTLOADER="GRUB2-EFI"
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PoverVM LPAR):
    Virtual Machine on VMware ESXi

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    x86

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    UEFI with GRUB2

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    Local Disk

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT" or "lsblk" as makeshift):

NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  200G  0 disk
├─sda1   8:1    0    1M  0 part
├─sda2   8:2    0  500M  0 part /boot
├─sda3   8:3    0 97,7G  0 part /
├─sda4   8:4    0  7,9G  0 part [SWAP]
├─sda5   8:5    0  200M  0 part /boot/efi
└─sda6   8:6    0 93,8G  0 part /var/lib/pgsql
sr0     11:0    1  683M  0 rom
  • Description of the issue (ideally so that others can reproduce it):
    I don't know why the encryption password isn't saved normally in the TSM keystore like always, so i have to enter it manually during the restore. In the restore process the TSM restore hang because of that input of the user and i had to cancel it with STRG-C.

ReaR-TSM1
ReaR-TSM2
ReaR-TSM3

jsmeix commented at 2021-02-23 13:14:

@dcz01
I do not really understand your issue description
and I am not a TSM user (I use no third party backup tools)
but from what I read in your desctiption (excerpts):

I don't know why the encryption password isn't saved normally
in the TSM keystore like always, so i have to enter it manually
during the restore. In the restore process the TSM restore hang
because of that input of the user
...
Workaround, if any:
Change redirect stdout from logfile to user display.

I assume the actual issue is that
"TSM restore fails because of impossible user input
when stdout and stderr are redirected to the log"

If my assumtion is right what would need to be done is
to not redirect stdout and stderr to the log only for the particular
TSM restore command(s) where user input can happen,
cf. "What to do with stdin, stdout, and stderr" in
https://github.com/rear/rear/wiki/Coding-Style

jsmeix commented at 2021-02-23 13:28:

As far as I see (without actually using TSM) during 'rear recover'
the following TSM specific scripts are run:

verify/TSM/default/389_check_TSM_connection.sh
verify/TSM/default/390_request_point_in_time_restore_parameters.sh
verify/TSM/default/400_verify_tsm.sh
restore/TSM/default/400_restore_with_tsm.sh

which should be checked how TSM commands are called therein.

In particular restore/TSM/default/400_restore_with_tsm.sh
contains (long line shown wrapped here):

    LC_ALL=$LANG_RECOVER \
 dsmc restore "$filespace" "$TARGET_FS_ROOT/$filespace" \
 -subdir=yes -replace=all -tapeprompt=no \
 -errorlogname=\"$backup_restore_log_file\" "${TSM_DSMC_RESTORE_OPTIONS[@]}" \
 0<&6 1>>"$backup_restore_log_file" 2>&1

where stdout and stderr are redirected to $backup_restore_log_file

@dcz01
perhaps it helps
when you replace in that particular command the current redirections

... 0<&6 1>>"$backup_restore_log_file" 2>&1

with the one from https://github.com/rear/rear/wiki/Coding-Style

... 0<&6 1>&7 2>&8

The drawback is that then all dsmc restore output appears on the terminal
but at least as a test whether or not then dsmc restore user input is possible
you should try out how that behaves in your particular case.

jsmeix commented at 2021-02-23 13:40:

@dcz01
is my guess right that what I see in the TSM...restore.log files

ANS8013I Invalid encryption key password

--- User Action is Required ---
File: /mnt/local/... requires an encryption key.

Select an appropriate action
  1. Prompt for encrypt key password
  2. Skip this object from decryption
  3. Skip all objects that are encrypted
  A. Abort this operation

is the TSM user input prompt that you don't see on the terminal?

FYI:
Because of the 0<&6 redirection stdin should be still your keyboard
so you could still blindly type in the needed input when TSM needs it
(provided you could imagine the TSM user input prompts).

dcz01 commented at 2021-02-24 09:03:

@jsmeix Sure, your assuming is right. The only problem here is that the user input from TSM client isn't shown on terminal.
Perhaps it is possible to use the tee tool to redirect the output to log and on the terminal?
And if the whole output from the restore with TSM shouldn't be shown on the terminal and log, the option -quiet can be used with the dsmc program.

https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.11/perf/r_opt_client_quiet.html

jsmeix commented at 2021-02-24 10:05:

@dcz01
modify your ReaR scripts as you need it for your particular case
and then show your exact changes to us here at ReaR upstream
preferably as a GitHub pull request but plain diff -u output would be also ok
so that we could see what actually made things work for your specific case.

I cannot do anything for the TSM code in ReaR on my own
because - as I wrote (see above) - I do not use TSM.

Of course tee can be used (because ReaR is normal bash scripting).
For an example see usr/share/rear/backup/BORG/default/500_make_backup.sh
https://github.com/rear/rear/blob/master/usr/share/rear/backup/BORG/default/500_make_backup.sh
and usr/share/rear/restore/BORG/default/400_restore_backup.sh
https://github.com/rear/rear/blob/master/usr/share/rear/restore/BORG/default/400_restore_backup.sh
which contain explanatory comments that contain a link to
https://github.com/rear/rear/pull/2382
and therein in the comments there is the actual explanation of code like

# Get stdin from where it was when 'rear' was launched = fd6 (usually the user's keyboard) and
# append stdout where it was when 'rear' was launched = fd7 (usually the user's terminal) and
# append stdout also to the log file that is used while 'rear' runs = $RUNTIME_LOGFILE and
# append stderr where it was when 'rear' was launched = fd8 (usually the user's terminal) and
# append stderr also to the log file that is used while 'rear' runs= $RUNTIME_LOGFILE
... 0<&6 1>> >( tee -a "$RUNTIME_LOGFILE" 1>&7 ) 2>> >( tee -a "$RUNTIME_LOGFILE" 1>&8 )

See
https://github.com/rear/rear/pull/2382#discussion_r418018571
that reads

Attention:
I think when using process substitution we must ensure
to append to the filename that is the result of the process substitution
because when we do not append we overwrite existing things in that file.
So I think we must append via '2>> >( COMMAND )'
instead of plain writing via '2> >( COMMAND )'
for example

0<&6 1>&7 2>> >( tee >( cat 1>&2 ) 1>&8 )
0<&6 1>&7 2>> >( tee -a $RUNTIME_LOGFILE 1>&8 )

or for both stdout and stderr

0<&6 1>> >( tee >( cat 1>&2 ) 1>&7 ) 2>> >( tee >( cat 1>&2 ) 1>&8 )
0<&6 1>> >( tee -a $RUNTIME_LOGFILE 1>&7 ) 2>> >( tee -a $RUNTIME_LOGFILE 1>&8 )

cf. the "Start logging" section in usr/sbin/rear
https://github.com/rear/rear/blob/master/usr/sbin/rear#L404

and the subsequent
https://github.com/rear/rear/pull/2382#discussion_r419378558
that reads

Meanwhile I think when using process substitution '>( COMMAND )'
we could use both '2>> >( COMMAND )' and '2> >( COMMAND )'
because '>( COMMAND )' evaluates to a file name where to the
current bash writes and where from COMMAND gets its stdin.
Usually that intermediate file is a named pipe (FIFO).
I think it does not matter if the current bash writes or appends to
that intermediate file because the current bash is a single process
so that both methods result that all is written sequentially and
nothing is overwritten by another process (i.e. the current bash is
the only writer to that intermediate file).
But what matters is that COMMAND appends to the ReaR log file
because when using process substitution COMMAND is run
asynchronously as another process beside the current bash.
Because the current bash has stdout redirected to the log file
now the process substitution '2> >( COMMAND )' also writes
asynchronously to the ReaR log file at the same time while
the current bash stdout appends to the ReaR log file.

With COMMAND = 'tee -a $RUNTIME_LOGFILE'
I am sure it appends to the ReaR log file.

In contrast with COMMAND = 'tee >( cat 1>&2 )'
I am not sure if that actually appends to the ReaR log file.
I think it could actually append to the ReaR log file if the
result of the inner process substitution 'cat 1>&2' is
that the cat stdout file descriptor is a copy of the
current bash's stderr file descriptor which is
redirected to append to the log file so that
also cat actually appends to the log file.
I think the 'tee >( ... )' plain stdout writing (no append)
does not matter as described above because tee is a single
process that writes into a FIFO wherefrom cat gets its stdin.
What matters is whether or not cat stdout plain writes
or actually appends to the ReaR log file.

What I do not know is what excatly it means when a file descriptor
is duplicated as in '1>&2' i.e. what excatly dup does.
Usual documentation often talks about 'copy' of file descriptors
but what I would like to know is what in the kernel's file access
structures gets copied so that it exists as two separated things
in particular when the current file position / file offset value gets
copied when several process are accessing one same file, cf.
https://stackoverflow.com/questions/5256599/what-are-file-descriptors-explained-in-simple-terms

Because appending is right in any case for logging and termial output
I prefer to explicitly append to be on the safe side
instead of blindly relying on obscure things.

dcz01 commented at 2021-02-25 15:05:

@jsmeix Now i found the problem and an bug of the TSM client software...
The problems is that some files are encrypted with an other key than others and because of that TSM prompts for the password.
In the log file i found that information where the saved key is deleted because of an other entered key and then it must be entered another time:

tail -f /var/lib/rear/restore/TSM.\:.774.restore.log
02/25/21   15:49:07 ANS0361I DIAG: A record has been deleted from password file, due to password change or wrong password.
02/25/21   15:49:46 ANS0361I DIAG: A record has been deleted from password file, due to password change or wrong password.
02/25/21   15:50:10 ANS0361I DIAG: A record has been deleted from password file, due to password change or wrong password.
02/25/21   15:50:35 ANS0361I DIAG: A record has been deleted from password file, due to password change or wrong password.
Restoring TSM filespace / completed successfully

tail -f /var/lib/rear/restore/TSM.\:var\:lib\:pgsql.7
74.restore.log
02/25/21   15:54:16 ANS0361I DIAG: A record has been deleted from password file, due to password change or wrong password.
02/25/21   15:54:29 ANS0361I DIAG: A record has been deleted from password file, due to password change or wrong password.
02/25/21   15:54:37 ANS0361I DIAG: A record has been deleted from password file, due to password change or wrong password.
Restoring TSM filespace /var/lib/pgsql/ completed successfully

For the prompt of the password i changed this line in restore/TSM/default/400_restore_with_tsm.sh:

    # Regarding things like '0<&6 1>&7 2>&8' see "What to do with stdin, stdout, and stderr" in https://github.com/rear/rear/wiki/Coding-Style
    # Both stdout and stderr are redirected into the backup restore log file
    # to have all backup restore program messages in one same log file and
    # in the right ordering because with 2>&1 both streams are correctly merged
    # cf. https://github.com/rear/rear/issues/885#issuecomment-310082587
    # which also means that in '-D' debugscript mode some 'set -x' messages of this code here
    # appear in the backup restore log file which is perfectly fine because in the normal log file
    # the above UserOutput tells via "restore progress can be followed with 'tail -f $backup_restore_log_file'"
    # where to look and it is helpful for debugging to also have the related 'set -x' messages in the same log file.
    # To be more on the safe side append to the log file '>>' instead of plain writing to it '>'
    # because when a program (bash in this case) is plain writing to the log file it can overwrite
    # output of a possibly simultaneously running process that likes to append to the log file
    # (e.g. when background processes run that also uses the log file for logging)
    # cf. https://github.com/rear/rear/issues/885#issuecomment-310308763
    LC_ALL=$LANG_RECOVER dsmc restore "$filespace" "$TARGET_FS_ROOT/$filespace" -subdir=yes -replace=all -tapeprompt=no -quiet -errorlogname=\"$backup_restore_log_file\" "${TSM_DSMC_RESTORE_OPTIONS[@]}" 0<&6 1>&7 2>&8
    dsmc_exit_code=$?

jsmeix commented at 2021-02-26 13:19:

@dcz01
thank you for your feedback.
Now I can better understand the whole issue.

As time permits (but no promises) I may have a look at
restore/TSM/default/400_restore_with_tsm.sh
how it could be improved to better work in cases
when dsmc restore needs input from the user.
There is a dilemma between "polluting" the user's terminal
with tons of dsmc restore output in normal operation
versus show dsmc restore output on the user's terminal
when dsmc restore needs input from the user.
If dsmc restore output in normal operation goes to stdout
but the dsmc restore user input prompts appear on stderr
(as the bash builtin read does it with its -p prompt_text output)
we could have stdout only in the log
and stderr on the terminal plus in the log via tee.

Have a relaxed and recovering weekend!

github-actions commented at 2021-04-28 02:30:

Stale issue message

dcz01 commented at 2021-04-28 11:04:

Comment because of GitHub-Bot.
Can someone watch this problem?

github-actions commented at 2021-07-05 02:08:

Stale issue message


[Export of Github issue for rear/rear.]