#3386 Issue closed: False "ERROR: Stale NFS mount point" because no timeout command in recovery system

Labels: fixed / solved / done, minor bug

jsmeix opened issue at 2025-01-22 15:32:

During my very first testing of "rear recover" in PORTABLE mode because of
https://github.com/rear/rear/pull/3379#issuecomment-2607222152
I noticed that - at least for me - "rear recover" fails with

RESCUE localhost:~ # rear -D recover
...
ERROR: Stale NFS mount point /nfs detected - please fix it first!
Some latest log messages since the last called script 100_check_stale_nfs_mounts.sh:
  2025-01-22 16:18:30.619467024 Entering debugscript mode via 'set -x'.
  2025-01-22 16:18:30.630197156 Trustworthy sourcing '/usr/share/rear/init/default/100_check_stale_nfs_mounts.sh'
  /usr/share/rear/init/default/100_check_stale_nfs_mounts.sh: line 26: timeout: command not found

I had done before

RESCUE localhost:~ # mount -t nfs -o nolock 192.168.178.66:/nfs/localhost.portable /nfs

to mount my NFS share where my localhost-portable.tar.gz is
to do then

RESCUE localhost:~ # tar -xvf /nfs/localhost-portable.tar.gz

cf.
https://github.com/rear/rear/blob/master/doc/user-guide/17-Portable-Mode.adoc

My workaround to avoid the false 'ERROR: Stale NFS mount point'
because of 'timeout: command not found' was

RESCUE localhost:~ # ln -s /bin/true /bin/timeout

after that "rear recover" in PORTABLE mode
seemed to have "just worked" for me.

So either 'timeout' gets added to REQUIRED_PROGS
or init/default/100_check_stale_nfs_mounts.sh
gets enhanced to skip testing for stale NFS mounts
when there is no 'timeout' command.

I think it is better to skip testing for stale NFS mounts
when there is no 'timeout' command because otherwise
it would also falsely error out when there is
no 'timeout' command on the original system
because the 'init' stage is always run.

gdha commented at 2025-01-23 08:57:

@jsmeix Wow, nice catch as I thought timeout was already part of one of our PROGS variables long time ago. Seems I made a wrong assumption when I found lots of timeout usages, but these seems to be variable timeout.
I prefer that 'timeout' gets added to REQUIRED_PROGS.

jsmeix commented at 2025-01-23 09:39:

Choose what you think is best.

Adding 'timeout' to REQUIRED_PROGS has the advantage
that we do not need workarounds in our code.

But adding 'timeout' to REQUIRED_PROGS alone
results that rear falsely fails with that error here
on systems where no 'timeout' command exists
when some NFS share is mounted

# mv /usr/bin/timeout /usr/bin/timeout.away

# mount -t nfs -o nolock 192.168.178.66:/nfs/localhost.portable /nfs

# usr/sbin/rear -D mkrescue
...
Running 'init' stage ======================
Running workflow mkrescue on the normal/original system
ERROR: Stale NFS mount point /nfs detected - please fix it first!
Some latest log messages since the last called script 100_check_stale_nfs_mounts.sh:
  2025-01-23 10:32:12.152329207 Entering debugscript mode via 'set -x'.
  2025-01-23 10:32:12.157029159 Trustworthy sourcing '/root/rear.github.master-jsmeix-source-wrapper/usr/share/rear/init/default/100_check_stale_nfs_mounts.sh'
  /root/rear.github.master-jsmeix-source-wrapper/usr/share/rear/init/default/100_check_stale_nfs_mounts.sh: line 26: timeout: command not found

where the error message "Stale NFS mount point /nfs detected"
is plain wrong because my NFS mount point /nfs is not stale
so nothing needs to be fixed there.

In contrast where rear should actually fail:

# umount /nfs

# usr/sbin/rear -D mkrescue
...
Running 'init' stage ======================
Running workflow mkrescue on the normal/original system
ERROR: Cannot find required programs: timeout
Some latest log messages since the last called script 950_check_missing_programs.sh:

So adding 'timeout' to REQUIRED_PROGS
additionally requires to run
init/default/100_check_stale_nfs_mounts.sh
after
init/default/950_check_missing_programs.sh
to let rear fail properly when 'timeout' is missing.

gdha commented at 2025-01-23 09:49:

I believe timeout is part of the coreutils package on RH, SLES, Debian and Ubuntu. Therefore, I wondering why was it missing on your distribution? Did timeout moved to another package within SLES somehow?

jsmeix commented at 2025-01-23 11:12:

I think you overlooked my

# mv /usr/bin/timeout /usr/bin/timeout.away

in my above test. I did that intentionally
to test how it behaves if no 'timeout' command exists.

Yes, on my SLES15-SP6 test system 'timeout'
is part of the coreutils package:

# rpm -qf /usr/bin/timeout
coreutils-8.32-150400.9.3.1.x86_64

In my above test my main point is not
whether or not a 'timeout' command exists.
My point is that 100_check_stale_nfs_mounts.sh shows
a wrong error message if no 'timeout' command exists
so adding 'timeout' to REQUIRED_PROGS alone is
not a proper fix.

jsmeix commented at 2025-01-23 11:19:

I checked on SLES11-SP4 (the oldest system I have available)
Also there 'timeout' is part of the coreutils package.

Perhaps it is sufficiently safe to blindly assume
that a 'timeout' command always exists
so adding 'timeout' to REQUIRED_PROGS alone
could be sufficient in practice?

jsmeix commented at 2025-01-27 16:03:

Meanwhile I think it is sufficiently safe
to blindly assume that a 'timeout' command always exists
so adding 'timeout' to REQUIRED_PROGS is sufficient in practice:
https://github.com/rear/rear/pull/3387

jsmeix commented at 2025-01-28 13:19:

This issue should (hopefully) be sufficiently solved now
via https://github.com/rear/rear/pull/3387


[Export of Github issue for rear/rear.]