#2341 Issue closed
: RHEL7.6: rpcinfo -p fails when starting NFS¶
Labels: enhancement
, fixed / solved / done
, minor bug
hpannenb opened issue at 2020-03-17 15:07:¶
Relax-and-Recover (ReaR) Issue Template¶
Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):
-
ReaR version ("/usr/sbin/rear -V"):
Relax-and-Recover 2.5 / 2019-05-10 -
OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"):
Red Hat Enterprise Linux Server 7.6 (Maipo) -
ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
[...]
export TMPDIR="/var/tmp"
OUTPUT=PXE
OUTPUT_PREFIX_PXE=pxe-iso-$HOSTNAME
BACKUP=NETFS
BACKUP_URL=nfs://<SERVER>/var/backup/
PXE_TFTP_URL=nfs://<SERVER>/var/lib/tftpboot
PXE_CONFIG_URL=nfs://<SERVER>/var/lib/tftpboot/pxelinux.cfg
USE_DHCLIENT=
USE_STATIC_NETWORKING="y"
USE_RESOLV_CONF=n
MODULES=()
[...]
- Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM
guest or PoverVM LPAR):
BareMetal - System architecture (x86 compatible or PPC64/PPC64LE or what exact
ARM device):
x86 - Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or
ELILO or Petitboot):
BIOS - Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or
multipath (DM or NVMe):
local disk - Storage layout ("lsblk -ipo
NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT" or "lsblk" as
makeshift):
... - Description of the issue (ideally so that others can reproduce
it):
The recovery fails because the NFSv4 shared backup directory/var/backup
oncannot be accessed. rpcinfo -p
always fails inNETFS/default/050_start_required_nfs_daemons.sh
- Workaround, if any:
In the RESCUE environment adjust the line 58 in fileNETFS/default/050_start_required_nfs_daemons.sh
fromrpcinfo -p
torpcinfo -p <SERVER>
. Afterwards rear recover works flawlessly. - Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):
+ echo '2020-03-17 15:14:49.690880709 Leaving debugscripts mode (back to previous bash flags and options settings).'
2020-03-17 15:14:49.690880709 Leaving debugscripts mode (back to previous bash flags and options settings).
2020-03-17 15:14:49.697364352 Including verify/NETFS/default/050_start_required_nfs_daemons.sh
2020-03-17 15:14:49.699329362 Entering debugscripts mode via 'set -x'.
+ source /usr/share/rear/verify/NETFS/default/050_start_required_nfs_daemons.sh
+++ url_scheme nfs://ts9esesv805/var/backup/
+++ local url=nfs://ts9esesv805/var/backup/
+++ local scheme=nfs
+++ echo nfs
+++ grep -q :
+++ echo nfs
++ local backup_url_scheme=nfs
++ test nfs = nfs
++ local attempt=
++ local portmapper_program=
++ LogPrint 'Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.'
++ Log 'Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.'
++ echo '2020-03-17 15:14:49.709025586 Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if availab
le.'
2020-03-17 15:14:49.709025586 Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.
++ Print 'Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.'
++ has_binary portmap
++ for bin in '$@'
++ type portmap
/usr/share/rear/lib/_input-output-functions.sh: line 457: type: portmap: not found
++ return 1
++ has_binary rpcbind
++ for bin in '$@'
++ type rpcbind
++ return 0
++ portmapper_program=rpcbind
++ rpcinfo -p
++ rpcbind
++ LogPrint 'Started RPC portmapper '\''rpcbind'\''.'
++ Log 'Started RPC portmapper '\''rpcbind'\''.'
++ echo '2020-03-17 15:14:49.717288722 Started RPC portmapper '\''rpcbind'\''.'
2020-03-17 15:14:49.717288722 Started RPC portmapper 'rpcbind'.
++ Print 'Started RPC portmapper '\''rpcbind'\''.'
+++ seq 5
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ for attempt in '$( seq 5 )'
++ rpcinfo -p
++ grep -q portmapper
++ sleep 1
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ for attempt in '$( seq 5 )'
++ grep -q portmapper
++ rpcinfo -p
++ sleep 1
++ test ok = 5
++ Error 'RPC portmapper '\''rpcbind'\'' unavailable.'
++ PrintError 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ PrintError 'Some latest log messages since the last called script 050_start_required_nfs_daemons.sh:'
++ PrintError ' 2020-03-17 15:14:49.697364352 Including verify/NETFS/default/050_start_required_nfs_daemons.sh
2020-03-17 15:14:49.699329362 Entering debugscripts mode via '\''set -x'\''.
2020-03-17 15:14:49.709025586 Starting required daemons for NFS: RPC portmapper (portmap or rpcbind) and rpc.statd if available.
/usr/share/rear/lib/_input-output-functions.sh: line 457: type: portmap: not found
2020-03-17 15:14:49.717288722 Started RPC portmapper '\''rpcbind'\''.'
++ Log 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ echo '2020-03-17 15:14:54.761443203 ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
2020-03-17 15:14:54.761443203 ERROR: RPC portmapper 'rpcbind' unavailable.
++ LogToSyslog 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ logger -t rear -i 'ERROR: RPC portmapper '\''rpcbind'\'' unavailable.'
++ echo '===== Stack trace ====='
Execution of rpcinfo in the RESCUE environment:
RESCUE <CLIENT>:~ # rpcinfo -p
rpcinfo: can't contact portmapper: RPC: Remote system error - Connection refused
RESCUE <CLIENT>:~ # rpcinfo -p <SERVER>
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100005 1 udp 20048 mountd
100005 1 tcp 20048 mountd
100024 1 udp 54592 status
100024 1 tcp 37424 status
100005 2 udp 20048 mountd
[...]
hpannenb commented at 2020-03-17 16:30:¶
seems to be related to #2250
pcahyna commented at 2020-03-17 16:51:¶
@rmetrich @yontalcar can you please have a look?
hpannenb commented at 2020-03-17 17:29:¶
I checked on
https://access.redhat.com/solutions/1132143
and I got a trigger looking into /var/log/messages
of the RESCUE
environment:
Mar 17 18:15:20 CLIENT rpcbind[12127]: cannot get uid of 'rpc': Success
Adding the user "rpc" to the /etc/passwd
[...]
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
[...]
in the RESCUE environment solves this rpcbind startup failure. So it
should be considered to include not only rpcuser
but also the
rpc
user.
I will give it a try with adding this to local.conf
:
CLONE_USERS=( ${CLONE_USERS[@]} rpcuser rpc )
which should be changed in Line 12 of 100_check_nfs_version.sh
thereafter.
hpannenb commented at 2020-03-17 18:10:¶
I changed the mentioned line on my test system, rpcbind is starting and the NFS based recovery is working now.
hpannenb commented at 2020-03-19 11:17:¶
@jsmeix so in most of the cases it is better safe than sorry to always
enabled the parameter with CLONE_ALL_USERS_GROUPS=yes
.
Since Your change and mine were merged into current version I will close the issue.
pcahyna commented at 2020-03-19 11:22:¶
@hpannenb thanks, so the addition of the parameter after -p
is not
needed anymore?
@xjezda00 I am curious about this, IIUC you have been testing restore from NFS regularly, have you seen this problem? If not, why not?
pcahyna commented at 2020-03-19 11:34:¶
@hpannenb thanks, so the addition of the parameter after
-p
is not needed anymore?
Ah I see, it is intentionally called without a parameter according to a comment in the file:
# use plain 'rpcinfo -p' to check if RPC service is available
# instead of using 'rpcinfo -p localhost' because the latter
# does not work on some systems while the former works everywhere
https://github.com/rear/rear/issues/889
nevermind.
jsmeix commented at 2020-03-19 14:50:¶
With
https://github.com/rear/rear/pull/2342
merged
this issue should be fixed.
@hpannenb
thank you for your analysis what the root cause was
and for your fix!
jsmeix commented at 2020-03-19 14:53:¶
@hpannenb
regarding your
https://github.com/rear/rear/issues/2341#issuecomment-601124088
I did not set CLONE_ALL_USERS_GROUPS="yes"
.
The default is still CLONE_ALL_USERS_GROUPS="no"
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L1412
jsmeix commented at 2020-03-19 15:12:¶
I did
https://github.com/rear/rear/issues/2345
to get the question answered if we should have
CLONE_ALL_USERS_GROUPS="yes"
by default.
[Export of Github issue for rear/rear.]