#2341 Issue closed: RHEL7.6: rpcinfo -p fails when starting NFS

Labels: enhancement, fixed / solved / done, minor bug

hpannenb opened issue at 2020-03-17 15:07:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.5 / 2019-05-10

  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"):
    Red Hat Enterprise Linux Server 7.6 (Maipo)

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):

[...]
export TMPDIR="/var/tmp"

OUTPUT=PXE
OUTPUT_PREFIX_PXE=pxe-iso-$HOSTNAME

BACKUP=NETFS
BACKUP_URL=nfs://<SERVER>/var/backup/

PXE_TFTP_URL=nfs://<SERVER>/var/lib/tftpboot
PXE_CONFIG_URL=nfs://<SERVER>/var/lib/tftpboot/pxelinux.cfg

USE_DHCLIENT=
USE_STATIC_NETWORKING="y"
USE_RESOLV_CONF=n

MODULES=()
[...]
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PoverVM LPAR):
    BareMetal
  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    x86
  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    BIOS
  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    local disk
  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT" or "lsblk" as makeshift):
    ...
  • Description of the issue (ideally so that others can reproduce it):
    The recovery fails because the NFSv4 shared backup directory /var/backup on cannot be accessed. rpcinfo -p always fails in NETFS/default/050_start_required_nfs_daemons.sh
  • Workaround, if any:
    In the RESCUE environment adjust the line 58 in file NETFS/default/050_start_required_nfs_daemons.sh from rpcinfo -p to rpcinfo -p <SERVER>. Afterwards rear recover works flawlessly.
  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):
+ echo '2020-03-17 15:14:49.690880709 Leaving debugscripts mode (back to previous bash flags and options settings).'
2020-03-17 15:14:49.690880709 Leaving debugscripts mode (back to previous bash flags and options settings).
2020-03-17 15:14:49.697364352 Including verify/NETFS/default/050_start_required_nfs_daemons.sh
2020-03-17 15:14:49.699329362 Entering debugscripts mode via 'set -x'.
+ source /usr/share/rear/verify/NETFS/default/050_start_required_nfs_daemons.sh
+++ url_scheme nfs://ts9esesv805/var/backup/
+++ local url=nfs://ts9esesv805/var/backup/
+++ local scheme=nfs
+++ echo nfs
+++ grep -q :
+++ echo nfs
++ local backup_url_scheme=nfs
++ test nfs = nfs
++ local attempt=
++ local portmapper_program=
++ LogPrint 'Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.'
++ Log 'Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.'
++ echo '2020-03-17 15:14:49.709025586 Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if availab
le.'
2020-03-17 15:14:49.709025586 Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.
++ Print 'Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.'
++ has_binary portmap
++ for bin in '$@'
++ type portmap
/usr/share/rear/lib/_input-output-functions.sh: line 457: type: portmap: not found
++ return 1
++ has_binary rpcbind
++ for bin in '$@'
++ type rpcbind
++ return 0
++ portmapper_program=rpcbind
++ rpcinfo -p
++ rpcbind
++ LogPrint 'Started RPC portmapper '\''rpcbind'\''.'
++ Log 'Started RPC portmapper '\''rpcbind'\''.'
++ echo '2020-03-17 15:14:49.717288722 Started RPC portmapper '\''rpcbind'\''.'
2020-03-17 15:14:49.717288722 Started RPC portmapper 'rpcbind'.
++ Print 'Started RPC portmapper '\''rpcbind'\''.'
+++ seq 5
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ for attempt in '$( seq 5 )'
++ rpcinfo -p
++ grep -q portmapper
++ sleep 1
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ test ok = 5
++ Error 'RPC portmapper '\''rpcbind'\'' unavailable.'
++ PrintError 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ PrintError 'Some latest log messages since the last called script 050_start_required_nfs_daemons.sh:'
++ PrintError '  2020-03-17 15:14:49.697364352 Including verify/NETFS/default/050_start_required_nfs_daemons.sh
  2020-03-17 15:14:49.699329362 Entering debugscripts mode via '\''set -x'\''.
  2020-03-17 15:14:49.709025586 Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.
  /usr/share/rear/lib/_input-output-functions.sh: line 457: type: portmap: not found
  2020-03-17 15:14:49.717288722 Started RPC portmapper '\''rpcbind'\''.'
++ Log 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ echo '2020-03-17 15:14:54.761443203 ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
2020-03-17 15:14:54.761443203 ERROR: RPC portmapper 'rpcbind' unavailable.
++ LogToSyslog 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ logger -t rear -i 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ echo '===== Stack trace ====='

Execution of rpcinfo in the RESCUE environment:

RESCUE <CLIENT>:~ # rpcinfo -p
rpcinfo: can't contact portmapper: RPC: Remote system error - Connection refused

RESCUE <CLIENT>:~ # rpcinfo -p <SERVER>
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp  20048  mountd
    100005    1   tcp  20048  mountd
    100024    1   udp  54592  status
    100024    1   tcp  37424  status
    100005    2   udp  20048  mountd
[...]

hpannenb commented at 2020-03-17 16:30:

seems to be related to #2250

pcahyna commented at 2020-03-17 16:51:

@rmetrich @yontalcar can you please have a look?

hpannenb commented at 2020-03-17 17:29:

I checked on https://access.redhat.com/solutions/1132143 and I got a trigger looking into /var/log/messages of the RESCUE environment:

Mar 17 18:15:20 CLIENT rpcbind[12127]: cannot get uid of 'rpc': Success

Adding the user "rpc" to the /etc/passwd

[...]
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
[...]

in the RESCUE environment solves this rpcbind startup failure. So it should be considered to include not only rpcuser but also the rpcuser.

I will give it a try with adding this to local.conf:

CLONE_USERS=( ${CLONE_USERS[@]} rpcuser rpc )

which should be changed in Line 12 of 100_check_nfs_version.sh thereafter.

hpannenb commented at 2020-03-17 18:10:

I changed the mentioned line on my test system, rpcbind is starting and the NFS based recovery is working now.

hpannenb commented at 2020-03-19 11:17:

@jsmeix so in most of the cases it is better safe than sorry to always enabled the parameter with CLONE_ALL_USERS_GROUPS=yes.

Since Your change and mine were merged into current version I will close the issue.

pcahyna commented at 2020-03-19 11:22:

@hpannenb thanks, so the addition of the parameter after -p is not needed anymore?

@xjezda00 I am curious about this, IIUC you have been testing restore from NFS regularly, have you seen this problem? If not, why not?

pcahyna commented at 2020-03-19 11:34:

@hpannenb thanks, so the addition of the parameter after -p is not needed anymore?

Ah I see, it is intentionally called without a parameter according to a comment in the file:

# use plain 'rpcinfo -p' to check if RPC service is available
# instead of using 'rpcinfo -p localhost' because the latter
# does not work on some systems while the former works everywhere
https://github.com/rear/rear/issues/889

nevermind.

jsmeix commented at 2020-03-19 14:50:

With https://github.com/rear/rear/pull/2342 merged
this issue should be fixed.

@hpannenb
thank you for your analysis what the root cause was
and for your fix!

jsmeix commented at 2020-03-19 14:53:

@hpannenb
regarding your https://github.com/rear/rear/issues/2341#issuecomment-601124088

I did not set CLONE_ALL_USERS_GROUPS="yes".
The default is still CLONE_ALL_USERS_GROUPS="no"
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L1412

jsmeix commented at 2020-03-19 15:12:

I did https://github.com/rear/rear/issues/2345
to get the question answered if we should have
CLONE_ALL_USERS_GROUPS="yes" by default.


[Export of Github issue for rear/rear.]