#2089 Issue closed: Error() from subshell does not exit so that arbitrary further commands are run

Labels: fixed / solved / done, critical / security / legal

jsmeix opened issue at 2019-03-18 12:44:

  • ReaR version ("/usr/sbin/rear -V"):
    current GitHub master code

  • Description of the issue (ideally so that others can reproduce it):

When the Error function is called from code in
layout/save/GNU/Linux/200_partition_layout.sh
it does not error out where the Error function is called
but instead it continues and shows more and more errors, cf.
https://github.com/rear/rear/issues/2087#issuecomment-473884096

I think this is not how it should behave.
I would expect that ReaR always aborts at the first Error function call.

To reproduce it change in usr/share/rear/lib/layout-functions.sh
the get_partition_number() function to

 get_partition_number() {
     local partition=$1
     LogPrint "called get_partition_number with partition=$partition but using partition=mmcblk0boot0"
     partition="mmcblk0boot0"
    ...

and then run usr/sbin/rear -D mkrescue which results output like
(depending on how many storage objects one has):

# usr/sbin/rear -D mkrescue
Relax-and-Recover 2.4 / Git
Running rear mkrescue (PID 22678)
Using log file: /root/rear.github.master/var/log/rear/rear-g243.log
Using UEFI Boot Loader for Linux (USING_UEFI_BOOTLOADER=1)
Using autodetected kernel '/boot/vmlinuz-4.12.14-lp150.12.25-default' as kernel in the recovery system
Creating disk layout
called get_partition_number with partition=sda1 but using partition=mmcblk0boot0
ERROR: Partition number '0' of partition mmcblk0boot0 is not a valid number.
Some latest log messages since the last called script 200_partition_layout.sh:
  2019-03-18 13:34:09.355052399 Including layout/save/GNU/Linux/200_partition_layout.sh
  2019-03-18 13:34:09.356037030 Entering debugscripts mode via 'set -x'.
  2019-03-18 13:34:09.366840669 Saving disk partitions.
  2019-03-18 13:34:09.413910975 called get_partition_number with partition=sda1 but using partition=mmcblk0boot0
called get_partition_number with partition=sda2 but using partition=mmcblk0boot0
ERROR: Partition number '0' of partition mmcblk0boot0 is not a valid number.
Some latest log messages since the last called script 200_partition_layout.sh:
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 3: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 4: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:323 source
  Trace 5: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:74 extract_partitions
  Trace 6: /root/rear.github.master/usr/share/rear/lib/layout-functions.sh:355 get_partition_number
  Trace 7: /root/rear.github.master/usr/share/rear/lib/_input-output-functions.sh:485 StopIfError
  === End stack trace ===
  2019-03-18 13:34:10.438831655 called get_partition_number with partition=sda2 but using partition=mmcblk0boot0
called get_partition_number with partition=sda3 but using partition=mmcblk0boot0
ERROR: Partition number '0' of partition mmcblk0boot0 is not a valid number.
Some latest log messages since the last called script 200_partition_layout.sh:
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 3: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 4: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:323 source
  Trace 5: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:74 extract_partitions
  Trace 6: /root/rear.github.master/usr/share/rear/lib/layout-functions.sh:355 get_partition_number
  Trace 7: /root/rear.github.master/usr/share/rear/lib/_input-output-functions.sh:485 StopIfError
  === End stack trace ===
  2019-03-18 13:34:10.438831655 called get_partition_number with partition=sda2 but using partition=mmcblk0boot0
called get_partition_number with partition=sdb1 but using partition=mmcblk0boot0
ERROR: Partition number '0' of partition mmcblk0boot0 is not a valid number.
Some latest log messages since the last called script 200_partition_layout.sh:
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 3: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 4: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:323 source
  Trace 5: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:74 extract_partitions
  Trace 6: /root/rear.github.master/usr/share/rear/lib/layout-functions.sh:355 get_partition_number
  Trace 7: /root/rear.github.master/usr/share/rear/lib/_input-output-functions.sh:485 StopIfError
  === End stack trace ===
  2019-03-18 13:34:10.438831655 called get_partition_number with partition=sda2 but using partition=mmcblk0boot0
called get_partition_number with partition=sdb2 but using partition=mmcblk0boot0
ERROR: Partition number '0' of partition mmcblk0boot0 is not a valid number.
Some latest log messages since the last called script 200_partition_layout.sh:
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 3: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 4: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:323 source
  Trace 5: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:74 extract_partitions
  Trace 6: /root/rear.github.master/usr/share/rear/lib/layout-functions.sh:355 get_partition_number
  Trace 7: /root/rear.github.master/usr/share/rear/lib/_input-output-functions.sh:485 StopIfError
  === End stack trace ===
  2019-03-18 13:34:10.438831655 called get_partition_number with partition=sda2 but using partition=mmcblk0boot0
called get_partition_number with partition=sdb3 but using partition=mmcblk0boot0
ERROR: Partition number '0' of partition mmcblk0boot0 is not a valid number.
Some latest log messages since the last called script 200_partition_layout.sh:
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 3: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 4: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:323 source
  Trace 5: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:74 extract_partitions
  Trace 6: /root/rear.github.master/usr/share/rear/lib/layout-functions.sh:355 get_partition_number
  Trace 7: /root/rear.github.master/usr/share/rear/lib/_input-output-functions.sh:485 StopIfError
  === End stack trace ===
  2019-03-18 13:34:10.438831655 called get_partition_number with partition=sda2 but using partition=mmcblk0boot0
called get_partition_number with partition=sdb5 but using partition=mmcblk0boot0
ERROR: Partition number '0' of partition mmcblk0boot0 is not a valid number.
Some latest log messages since the last called script 200_partition_layout.sh:
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 3: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 4: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:323 source
  Trace 5: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:74 extract_partitions
  Trace 6: /root/rear.github.master/usr/share/rear/lib/layout-functions.sh:355 get_partition_number
  Trace 7: /root/rear.github.master/usr/share/rear/lib/_input-output-functions.sh:485 StopIfError
  === End stack trace ===
  2019-03-18 13:34:10.438831655 called get_partition_number with partition=sda2 but using partition=mmcblk0boot0
called get_partition_number with partition=sdb6 but using partition=mmcblk0boot0
ERROR: Partition number '0' of partition mmcblk0boot0 is not a valid number.
Some latest log messages since the last called script 200_partition_layout.sh:
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 3: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 4: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:323 source
  Trace 5: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:74 extract_partitions
  Trace 6: /root/rear.github.master/usr/share/rear/lib/layout-functions.sh:355 get_partition_number
  Trace 7: /root/rear.github.master/usr/share/rear/lib/_input-output-functions.sh:485 StopIfError
  === End stack trace ===
  2019-03-18 13:34:10.438831655 called get_partition_number with partition=sda2 but using partition=mmcblk0boot0
called get_partition_number with partition=sdb7 but using partition=mmcblk0boot0
ERROR: Partition number '0' of partition mmcblk0boot0 is not a valid number.
Some latest log messages since the last called script 200_partition_layout.sh:
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 3: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 4: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:323 source
  Trace 5: /root/rear.github.master/usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:74 extract_partitions
  Trace 6: /root/rear.github.master/usr/share/rear/lib/layout-functions.sh:355 get_partition_number
  Trace 7: /root/rear.github.master/usr/share/rear/lib/_input-output-functions.sh:485 StopIfError
  === End stack trace ===
  2019-03-18 13:34:10.438831655 called get_partition_number with partition=sda2 but using partition=mmcblk0boot0
Aborting due to an error, check /root/rear.github.master/var/log/rear/rear-g243.log for details
Exiting rear mkrescue (PID 22678) and its descendant processes
Running exit tasks
You should also rm -Rf /tmp/rear.UwrqXwBgg0MNI9Q
Terminated

In my case I have

# lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT
NAME        KNAME     PKNAME   TRAN   TYPE FSTYPE   SIZE MOUNTPOINT
/dev/sda    /dev/sda           sata   disk        931.5G 
|-/dev/sda1 /dev/sda1 /dev/sda        part vfat     500M /boot/efi
|-/dev/sda2 /dev/sda2 /dev/sda        part ext4   915.5G /
`-/dev/sda3 /dev/sda3 /dev/sda        part swap    15.6G [SWAP]
/dev/sdb    /dev/sdb           sata   disk        238.5G 
|-/dev/sdb1 /dev/sdb1 /dev/sdb        part         1000M 
|-/dev/sdb2 /dev/sdb2 /dev/sdb        part         1000M 
|-/dev/sdb3 /dev/sdb3 /dev/sdb        part            1K 
|-/dev/sdb5 /dev/sdb5 /dev/sdb        part ext2    1000M /sdb5
|-/dev/sdb6 /dev/sdb6 /dev/sdb        part         1000M 
`-/dev/sdb7 /dev/sdb7 /dev/sdb        part         1000M 
/dev/sr0    /dev/sr0           sata   rom          1024M

jsmeix commented at 2019-03-18 13:13:

I can reproduce it in general:

The root cause is when Error is called within a subshell.

At the beginning of init/default/030_update_recovery_system.sh
I added

( Error "Test error 1"
  Error "Test error 2"
  Error "Test error 3"
)
LogPrint "Message after Error"

and now I got

# usr/sbin/rear -D mkrescue
Relax-and-Recover 2.4 / Git
Running rear mkrescue (PID 2810)
Using log file: /root/rear.github.master/var/log/rear/rear-g243.log
ERROR: Test error 1
Some latest log messages since the last called script 030_update_recovery_system.sh:
  2019-03-18 14:11:27.326733922 Including init/default/030_update_recovery_system.sh
  2019-03-18 14:11:27.327588697 Entering debugscripts mode via 'set -x'.
ERROR: Test error 2
Some latest log messages since the last called script 030_update_recovery_system.sh:
  ===== Stack trace =====
                     END { for (i=NR; i>0;) print "Trace "NR-i": "l[i--] }
                   '
  Trace 0: usr/sbin/rear:509 main
  Trace 1: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 3: /root/rear.github.master/usr/share/rear/init/default/030_update_recovery_system.sh:12 source
  === End stack trace ===
ERROR: Test error 3
Some latest log messages since the last called script 030_update_recovery_system.sh:
  ===== Stack trace =====
                     END { for (i=NR; i>0;) print "Trace "NR-i": "l[i--] }
                   '
  Trace 0: usr/sbin/rear:509 main
  Trace 1: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 3: /root/rear.github.master/usr/share/rear/init/default/030_update_recovery_system.sh:12 source
  === End stack trace ===
Aborting due to an error, check /root/rear.github.master/var/log/rear/rear-g243.log for details
Exiting rear mkrescue (PID 2810) and its descendant processes
Running exit tasks
Terminated

Inspecting the log shows that

# cat -n /root/rear.github.master/var/log/rear/rear-g243.log | egrep '+ Error |USR1|EXIT_FAIL_MESSAGE|DoExitTasks'
   137  ++ Error 'Test error 1'
   168  ++ kill -USR1 3865
   170  ++ Error 'Test error 2'
   207  ++ kill -USR1 3865
   209  ++ Error 'Test error 3'
   246  ++ kill -USR1 3865
   248  +++ EXIT_FAIL_MESSAGE=0
   251  +++ DoExitTasks

the actual abort/exit happens only at the end,
I assume after the subshell is finished.
There is only one DoExitTasks (regardless that Error was called three times).

jsmeix commented at 2019-03-18 13:15:

@schlomo @gdha @gozora @schabrolles @rmetrich
can one of you explain to me why it behaves this way?

I run it on openSUSE Leap 15.0 with GNU bash, version 4.4.23

jsmeix commented at 2019-03-18 13:26:

Currently only a blind guess from what the log excerpts in
https://github.com/rear/rear/issues/2089#issuecomment-473904131
indicate:

the actual abort/exit happens only at the end,
I assume after the subshell is finished

When Error is called it does kill -USR1 $MASTER_PID where
that USR1 has a trap (see lib/_input-output-functions.sh)
that does kill $MASTER_PID whicht triggers another trap on EXIT
that calls DoExitTasks()

When Error is called multiple times from within a subshell
the USR1 is sent multiple times to $MASTER_PID (i.e. to usr/sbin/rear)
but nothing happens at $MASTER_PID because it waits until its
child subshell process is finished and after the subshell is finished
$MASTER_PID processes USR1 immediately.

jsmeix commented at 2019-03-18 13:29:

@rear/contributors
do you have an idea how to "wake up" $MASTER_PID
from an Error function call that happens within a subshell?

jsmeix commented at 2019-03-18 13:56:

It seems it helps a bit to do

    ...
    kill -USR1 $MASTER_PID
    if test $BASH_SUBSHELL -gt 0 ; then
        LogPrint "Exiting subshell $BASH_SUBSHELL"
        exit 0
    fi
}

at the end of the Error function.

That works for me for Error not in a subshells and for Error in a first subshell.

But it does not work for Error in a second level subshell as in

( LogPrint "First subshell"
  ( LogPrint "Second subshell"
    Error "Test error a"
    Error "Test error b"
  )
  Error "Test error 1"
  Error "Test error 2"
  LogPrint "Message after second subshell"
)
LogPrint "Message after first subshell"

which results

# usr/sbin/rear -D mkrescue
Relax-and-Recover 2.4 / Git
Running rear mkrescue (PID 12347)
Using log file: /root/rear.github.master/var/log/rear/rear-g243.log
First subshell
Second subshell
ERROR: Test error a
Some latest log messages since the last called script 030_update_recovery_system.sh:
  2019-03-18 15:00:21.617675796 Including init/default/030_update_recovery_system.sh
  2019-03-18 15:00:21.618557211 Entering debugscripts mode via 'set -x'.
  2019-03-18 15:00:21.622298786 First subshell
  2019-03-18 15:00:21.623905894 Second subshell
Exiting subshell 2
ERROR: Test error 1
Some latest log messages since the last called script 030_update_recovery_system.sh:
                     END { for (i=NR; i>0;) print "Trace "NR-i": "l[i--] }
                   '
  Trace 0: usr/sbin/rear:509 main
  Trace 1: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:116 SourceStage
  Trace 2: /root/rear.github.master/usr/share/rear/lib/framework-functions.sh:56 Source
  Trace 3: /root/rear.github.master/usr/share/rear/init/default/030_update_recovery_system.sh:17 source
  === End stack trace ===
  2019-03-18 15:00:21.635646384 Exiting subshell 2
Exiting subshell 1
Aborting due to an error, check /root/rear.github.master/var/log/rear/rear-g243.log for details
Exiting rear mkrescue (PID 12347) and its descendant processes
Running exit tasks
Terminated

i.e. at each subshell level its first error is shown.

jsmeix commented at 2019-03-18 14:20:

The problem with neested subshells is code like that

( LogPrint "First subshell"
  ( LogPrint "Second subshell"
    Error "Test error a"
    Error "Test error b"
  )
  LogPrint "Code in first subshell after second subshell"
  echo "foo" >/tmp/rear_test_file
  LogPrint "$( cat /tmp/rear_test_file )"
  rm -v /tmp/rear_test_file 0<&6 1>&7 2>&8
  LogPrint "End of first subshell"
)
LogPrint "Message after first subshell"

which results

# usr/sbin/rear -D mkrescue
Relax-and-Recover 2.4 / Git
Running rear mkrescue (PID 16123)
Using log file: /root/rear.github.master/var/log/rear/rear-g243.log
First subshell
Second subshell
ERROR: Test error a
Some latest log messages since the last called script 030_update_recovery_system.sh:
  2019-03-18 15:18:44.875866796 Including init/default/030_update_recovery_system.sh
  2019-03-18 15:18:44.876702734 Entering debugscripts mode via 'set -x'.
  2019-03-18 15:18:44.880133608 First subshell
  2019-03-18 15:18:44.881709353 Second subshell
Exiting subshell 2
Code in first subshell after second subshell
foo
removed '/tmp/rear_test_file'
End of first subshell
Aborting due to an error, check /root/rear.github.master/var/log/rear/rear-g243.log for details
Exiting rear mkrescue (PID 16123) and its descendant processes
Running exit tasks
Terminated

i.e. all code in the first subshell is still executed regadless
of the Error before in the second subshell.
It exits only after the first subshell is exited.

jsmeix commented at 2019-03-18 15:10:

I think at the end of the Error function
I have to recursively exit all subshells
because if we are in the Error function within a subshell
we cannot return from the Error function to its caller
because that would execute further code of the caller.

jsmeix commented at 2019-03-18 17:30:

I think in
https://github.com/rear/rear/pull/2088#issuecomment-474020900
things look better now...

jsmeix commented at 2019-03-19 08:17:

I made this a critical bug because the current Error() in ReaR behaves this way
even if currently - as far as I know - there is no actual critical issue because of this.

With the current Error() behaviour the following code in a script
would destroy data

# backup and remove data in subshell
( backup_data || Error "Failed to backup data"
  remove_data
)

because when backup_data failed in the subshell
Error() does not exit immediately so that remove_data is run
and after the subshell finished it exits but then it is too late.

schlomo commented at 2019-03-19 08:37:

@jsmeix indeed a lot of complexity added to make up for the fact that Bash is not a programming language.

First of all congrats on this deep research and the solution that you found!

Do we have such a big need for erroring out of subshells? Wouldn't the code be simpler if everybody know that this doesn't work and they they need to do error handling in the top-level shell process?

Since we don't have unit tests I am afraid that the risk for new bugs in this extremely complex new code (multi processing is always extremely complex) is as large as the risk of the bugs that we have without this complexity.

Reading through this entire issue again I am wondering if maybe we use subshells too much? Maybe a lot of ( ) cases could be handled by { ;} as well? I really would prefer to discuss that question and to work on reducing the overall complexity in our code.

If and once we switch to a real programming language like Python, then all of this will be elegantly handled by language features like exceptions (I am assuming that we won't use Go which believes in explicit error handling via return values :-)). In the mean time we can also think about adopting the Go model of explicit error handling, if we really really need to error out of nested shells.

jsmeix commented at 2019-03-19 09:20:

Yesterday I was too much in "just hacking that issue out of my way" mode
but - as always - sleeping over an issue helps ;-)

Now I have a nice clean reproducer on plain command line
(I use the outermost subshell only to not pollute my current bash):

# ( export MASTERPID=$BASHPID ; trap "echo $MASTERPID got USR1" USR1 ; ( echo begin subshell $BASHPID parent $MASTERPID ; pstree -Aplau $$ ; kill -USR1 $MASTERPID ; echo sent USR1 to $MASTERPID in subshell ; echo other stuff in subshell ; echo subshell done ) ; echo $MASTERPID done )

begin subshell 26109 parent 26108
bash,21908
  `-bash,26108
      `-bash,26109
          `-pstree,26110 -Aplau 21908
sent USR1 to 26108 in subshell
other stuff in subshell
subshell done
26108 got USR1
26108 done

One needs a recent bash version that supports BASHPID
(my GNU bash, version 3.2.57 on SLES11 does not).

It shows that the parent waits until its subshell child has finished
and then the parent processes the signal.
This behaviour matches what "man bash" (also for bash 3.x) reads

SIGNALS
...
If bash is waiting for a command to complete
and receives a signal for which a trap has been set,
the trap will not be executed until the command completes.

What is not 100% clear is that here command
not only means simple command but
also means compound command cf. "man bash"

Compound Commands
...
(list) list is executed in a subshell environment ...

Perhaps somewhere in "man bash" it is defined what command means ;-)

schlomo commented at 2019-03-19 09:37:

Yes, to the best of my knowledge a subshell behaves like a command in the parent.

To summarize: I think that you are trying to work against the design of Bash and therefore your solution is not trivial. You might also try to look into the process group to kill off stuff (not sure, maybe this can help for inspiration) and check for running background jobs.

I really do think that in ReaR we have to respect the boundaries of Bash and rather limit our code constructs to work well within those limits. Otherwise we raise complexity exponentially without any benefit on the bit picture level.

jsmeix commented at 2019-03-19 09:41:

@schlomo
I don't think we use subshells too often but we use
subshells that are much longer than needed as in
layout/save/GNU/Linux/230_filesystem_layout.sh
(here with line numbers)

    76  # Begin of the subshell that appends its stdout to DISKLAYOUT_FILE:
    77  (
...
   163                  StopIfError "Divide by zero detected"
...
   175                  StopIfError "Divide by zero detected"
...
   195                  StopIfError "Failed to save XFS options of $device"
...
   308                          Error "BTRFS_SUBVOLUME_SLES_SETUP is false but $info_message"
...
   509  ) >> $DISKLAYOUT_FILE
   510  # End of the subshell that appends its stdout to DISKLAYOUT_FILE.

in contrast to the subshells in
layout/prepare/GNU/Linux/100_include_partition_code.sh
that have basically the form

  ( echo "..."
    echo "..."
    ....
    echo "..."
  ) >> $LAYOUT_CODE

i.e. only simple echo commands in the subshell
and the subshell is not longer than actually needed.

So all our subshells should be checked but that is difficult
because it is difficult to search for subshells in the code
(searching for ( and ) would not help so much).

I like to have the fundamental functionality in ReaR fail-safe in general
to be safe against possibly unknowingly inappropriate code in scripts
so that users who adapt the ReaR sctipts do not need to know about
special limitations in our fundamental functionality.

And currently just before the ReaR 2.5 release
I cannot check and clean up all our possibly problematic subshells.

schlomo commented at 2019-03-19 10:04:

Good example, here a group command with { ;} is IMHO much much better suited. It will aggregate the output but the error functions should "just work"

jsmeix commented at 2019-03-21 10:43:

With https://github.com/rear/rear/pull/2088 merged
this specific issue should be fixed/solved/done.
I.e. now Error() from subshell exits completely
so that arbitrary further commands are no longer run.

Now that part of ReaR's fundamental functionality should work
reasonably fail-safe even in case of awkward code in scripts, e.g. see
https://github.com/rear/rear/pull/2088#issuecomment-474888371
where "reasonably fail-safe" means we cannot catch
all cases with reasonable effort, e.g. see
https://github.com/rear/rear/pull/2088#issuecomment-474902999

Regarding "awkward code in scripts":
I think even if we clean up and adjust excessive subshell usage
in all our own ReaR scripts cf. https://github.com/rear/rear/issues/2090
it would probably not really help us and our users
to avoid that Error() sometimes fails to actually error out
because ReaR calls arbitrary third-party tools (like backup/restore tools)
where we cannot know what those tools actually do.
Therefore I think ReaR's fundamental functionality should work
reasonably fail-safe in general (as above "reasonably" means
"with reasonable effort").

In this particular case the effort became almost unreasonable for me
in particular yesterday I spent much time on the simple looking task
to "just get my own PID" in bash 3.x that has no $BASHPID, cf.
https://github.com/rear/rear/blob/master/usr/share/rear/lib/_input-output-functions.sh#L212

I would be happy to learn if someone knows a simpler way
how to "get my own PID in bash 3.x" that also works from within
a (possibly nested) subshell (then "get my own subshell PID").

jsmeix commented at 2019-03-22 11:45:

For documentation to be preserved for the future
as an example of developers life in our time ;-)

Interestingly on command line

mypid=$( bash -c 'echo $PPID' )

works:

# ( echo \$\$ : $$ ; echo PPID : $PPID ; \
    ( echo in subshell ; \
      echo \$\$ : $$ ; echo PPID : $PPID ; \
      echo -n "( bash -c 'echo \$PPID' ) : " ; \
        ( : ; bash -c 'echo $PPID' ) ; \
      echo -n "bash -c 'echo \$PPID' : " ; \
        bash -c 'echo $PPID' ; \
      mypid=$( bash -c 'echo $PPID' ) ; \
        echo "mypid : $mypid" ; \
      cat /proc/self/stat >/tmp/mystsat ; \
      mypid=$( cut -d ' ' -f4 /tmp/mystsat ) ; \
      echo "mypid : $mypid" ; \
      pstree -Aplau $$ ) )

$$ : 3951
PPID : 3948
in subshell
$$ : 3951
PPID : 3948
( bash -c 'echo $PPID' ) : 29829
bash -c 'echo $PPID' : 29828
mypid : 29828
mypid : 29828
bash,3951
  `-bash,29827
      `-bash,29828
          `-pstree,29835 -Aplau 3951

But mypid=$( bash -c 'echo $PPID' )
does no longer work when it is in a sourced script:

# cat myscript.sh
( echo \$\$ : $$
  echo PPID : $PPID
  ( echo in subshell
    echo \$\$ : $$
    echo PPID : $PPID
    echo -n "( bash -c 'echo \$PPID' ) : "
    ( : ; bash -c 'echo $PPID' )
    echo -n "bash -c 'echo \$PPID' : "
    bash -c 'echo $PPID'
    mypid=$( bash -c 'echo $PPID' )
    echo "mypid : $mypid"
    cat /proc/self/stat >/tmp/mystsat
    mypid=$( cut -d ' ' -f4 /tmp/mystsat )
    echo "mypid : $mypid"
    pstree -Aplau $$
 )
)

# source myscript.sh
$$ : 3951
PPID : 3948
in subshell
$$ : 3951
PPID : 3948
( bash -c 'echo $PPID' ) : 29793
bash -c 'echo $PPID' : 29792
mypid : 29796
mypid : 29792
bash,3951
  `-bash,29791
      `-bash,29792
          `-pstree,29801 -Aplau 3951

So it seems

cat /proc/self/stat >/tmp/mystsat
mypid=$( cut -d ' ' -f4 /tmp/mystsat )

is the only reliable way to "get my own PID" in bash 3.x
that also works from within a (possibly nested) subshell
and then I want to "get my own subshell PID"
that also works in a sourced script.

jsmeix commented at 2019-03-26 11:44:

A SUSE colleague found the simplest way
to get the current PID without using bsh 4.x $BASHPID:

read current_pid junk </proc/self/stat

This also works even in nested subshells in a sourced script
and it is simpler because read is a shell builtin and
the < stdin redirection does not cause another subshell.

FYI
here again my (slightly enhanced) test script as in my
https://github.com/rear/rear/issues/2089#issuecomment-475590413
and its output results:

# cat myscript.sh

( echo in subshell
  echo \$\$ : $$
  echo PPID : $PPID
  ( echo in sub-subshell
    echo \$\$ : $$
    echo PPID : $PPID
    echo -n "( bash -c 'echo \$PPID' ) : "
    ( : ; bash -c 'echo $PPID' )
    echo -n "bash -c 'echo \$PPID' : "
    bash -c 'echo $PPID'
    mypid=$( bash -c 'echo $PPID' )
    echo "bash -c 'echo $PPID' mypid : $mypid"
    cat /proc/self/stat >/tmp/mystsat
    mypid=$( cut -d ' ' -f4 /tmp/mystsat )
    echo "cat /proc/self/stat mypid : $mypid"
    read mypid junk </proc/self/stat
    echo "read /proc/self/stat mypid : $mypid"
    pstree -Aplau $$
  )
)

# echo $$
3951

# echo $PPID
3948

# source myscript.sh

in subshell
$$ : 3951
PPID : 3948
in sub-subshell
$$ : 3951
PPID : 3948
( bash -c 'echo $PPID' ) : 6732
bash -c 'echo $PPID' : 6731
bash -c 'echo 3948' mypid : 6735
cat /proc/self/stat mypid : 6731
read /proc/self/stat mypid : 6731
bash,3951
  `-bash,6730
      `-bash,6731
          `-pstree,6740 -Aplau 3951

jsmeix commented at 2019-03-26 15:36:

Via https://github.com/rear/rear/pull/2099 the above
https://github.com/rear/rear/issues/2089#issuecomment-476590268
is now implemented.


[Export of Github issue for rear/rear.]