#1575 Issue closed: Rescue fails ERROR: Starting RPC portmapper rpcbind failed

Labels: bug, support / question, fixed / solved / done

tjgruber opened issue at 2017-11-14 20:09:

Relax-and-Recover (ReaR) Issue Template

  • rear version (/usr/sbin/rear -V):
    [screenshot]

  • OS version (cat /etc/rear/os.conf or lsb_release -a):
    [screenshot]

  • rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf):
    [screenshot]

  • Are you using legacy BIOS or UEFI boot?
    legacy BIOS

  • Brief description of the issue:
    Everything works fine until I boot the recovery ISO (or any recovery media). Once I'm in and type "rear recover", it fails. I am also unable to start the mentioned daemons manually. Below is a screenshot of what happens:

[screenshot]
Here's the log file at /var/log/rear/rear-serv-build.log:
[screenshot]

  • Work-around, if any: N/A

jsmeix commented at 2017-11-15 14:31:

I am not a Fedora user, so I cannot reproduce
what goes on on your particular system.

Regarding how you could debug the root cause
on your particular system, have a look at the section
"Debugging issues with Relax-and-Recover" in
https://en.opensuse.org/SDB:Disaster_Recovery

In your particular case the script that shows the
"Starting RPC portmapper ... failed" error message is
usr/share/rear/verify/NETFS/default/050_start_required_nfs_daemons.sh,
specifically this code:

    rpcinfo -p &>/dev/null || rpcbind || Error "Starting RPC portmapper '$portmapper_program' failed."

i.e. calling 'rpcbind' in the ReaR recovery system
failed in your particular case with

rpcbind: /run/rpcbind/rpcbind.lock: No such file or directory

which indicates that the directory /run/rpcbind/
is missing in the ReaR recovery system.

As a workaround for now, you may create that directory
in the running ReaR recovery system before
you call "rear recover".

If that helps, you can automate the workaround by adding
the command that creates the missing directory to the
PRE_RECOVERY_SCRIPT config variable,
see usr/share/rear/conf/default.conf.
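
For example, a minimal sketch of such an entry in /etc/rear/local.conf (the mkdir command is the workaround from this issue; verify the exact array syntax against default.conf):

# /etc/rear/local.conf (sketch): create the directory rpcbind needs
# before "rear recover" runs its verify stage
PRE_RECOVERY_SCRIPT=( 'mkdir -p /run/rpcbind' )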

If that makes things work, we can create the missing directory
in an appropriate ReaR script.

tjgruber commented at 2017-11-15 16:27:

Once logged in to the recovery console, I created the /run/rpcbind directory.
That fixed the issue.

However, there is now another issue: no network connectivity for some reason. "ifconfig" shows no IP address.

[screenshot]

gozora commented at 2017-11-15 16:38:

@tjgruber if you are really struggling with the ReaR setup and can't find a solution on your own, I can try to install Fedora and take a look at what might be wrong there; just let me know.

V.

tjgruber commented at 2017-11-15 16:41:

I'm still working on it.
I found this: https://github.com/rear/rear/issues/1177
which may help me. I'm in the process of trying out a few things. If I'm unable to get it working after this I'll let you know.

Thank you!

gozora commented at 2017-11-15 16:45:

@tjgruber for that network issue of yours:
Automatic network setup can be quite a pain in the ..., but you can easily bring your network up as follows:

Either:

ifconfig <dev_name> inet <IP_address> netmask <mask>

or

ip a add <ipaddress>/<mask_suffix> dev <dev_name>
ip link set <dev_name> up
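
For instance, with placeholder values (interface name, address, and gateway are illustrative assumptions; substitute your own):

# example: bring up eth0 with address 192.168.1.50/24
ip a add 192.168.1.50/24 dev eth0
ip link set eth0 up
# optionally add a default gateway if the backup server is on another subnet
ip route add default via 192.168.1.1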

V.

tjgruber commented at 2017-11-15 17:10:

Running the following allowed the recovery process to begin:

mkdir /run/rpcbind
chmod a+x /etc/scripts/system-setup.d/60-network-devices.sh
/etc/scripts/system-setup.d/60-network-devices.sh

However, now things are stuck again.

I just don't think ReaR was meant to run on Fedora.

Edit: here's the log file instead of lots of screenshots:

rear-serv-build.log

jsmeix commented at 2017-11-15 17:14:

FYI regarding commands as in
https://github.com/rear/rear/issues/1575#issuecomment-344653352
I take (again) the opportunity to advertise my glorious
NETWORKING_PREPARATION_COMMANDS
cf. usr/share/rear/conf/default.conf ;-)
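
A minimal sketch of what that could look like in /etc/rear/local.conf (address and interface are placeholders, not taken from this issue):

# /etc/rear/local.conf (sketch): commands ReaR runs in the recovery system
# to prepare networking; see usr/share/rear/conf/default.conf for details
NETWORKING_PREPARATION_COMMANDS=( 'ip addr add 192.168.1.50/24 dev eth0' 'ip link set eth0 up' )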

gozora commented at 2017-11-15 17:21:

@tjgruber don't be upset, as far as I remember you are the first one opening an issue for Fedora ;-), so yes there might be some trouble indeed, but I guess it is nothing we can't overcome ...

It looks like you've maybe hit a bug in my code (XFS filesystem recreation).
Would you mind uploading your rescue ISO somewhere where I can get my dirty hands on it?
That ISO contains configuration files that might help me to find out where the problem is and possibly patch it.

Thanks in advance

V.

tjgruber commented at 2017-11-15 17:23:

I edited my post: https://github.com/rear/rear/issues/1575#issuecomment-344661234
with the actual log file and got rid of the screenshots.

tjgruber commented at 2017-11-15 17:26:

don't be upset, as far as I remember you are the first one opening an issue for Fedora ;-), so yes there might be some trouble indeed, but I guess it is nothing we can't overcome ... It looks like you've maybe hit a bug in my code (XFS filesystem recreation). Would you mind uploading your rescue ISO somewhere where I can get my dirty hands on it? That ISO contains configuration files that might help me to find out where the problem is and possibly patch it. Thanks in advance, V.

I'll see what I can do.
The ISO is 200 MB. Is there anything specific I can get from it instead?

gozora commented at 2017-11-15 17:32:

@tjgruber the quickest way for me to find the problem is to boot your ISO ...
So could you upload it to Dropbox, maybe?
If you have a problem with that for whatever reason, I will accept it, and install my own Fedora instance and test it there ...

V.

tjgruber commented at 2017-11-15 17:38:

My only problem is that I don't know what is in there, and whether there would be anything company-specific or any passwords or anything like that.

I'm checking it out.

tjgruber commented at 2017-11-15 17:45:

@gozora if you do want to install Fedora 26, I used LVM and XFS on all relevant partitions.

Here's a screenshot of the partitions I have:
[screenshot]

gozora commented at 2017-11-15 18:22:

My only problem is that I don't know what is in there, and whether there would be anything company-specific or any passwords or anything like that.

Understood! Better be safe than sorry :-)

Could you at least provide me with the content of the following directory: /var/lib/rear/layout ?

Thx

V.

tjgruber commented at 2017-11-15 18:32:

@gozora

Could you at least provide me with the content of the following directory: /var/lib/rear/layout ?

Is that from the .ISO or from the server?

gozora commented at 2017-11-15 18:32:

From the original server.

tjgruber commented at 2017-11-15 19:38:

No, that last "rear recover" killed the partitions.

gozora commented at 2017-11-15 19:48:

Let's try something.
Once you boot the ReaR rescue ISO, go to /var/lib/rear/layout/xfs and edit every file with the suffix .xfs,
removing the string "sunit=0 blks" from the log section so it looks something like this:

[root@fedora xfs]# diff -uNp fedora-root.xfs.nosunit fedora-root.xfs
--- fedora-root.xfs.nosunit 2017-11-15 20:47:37.591982649 +0100
+++ fedora-root.xfs 2017-11-15 20:45:29.427460249 +0100
@@ -6,5 +6,5 @@ data     =                       bsize=4
          =                       sunit=0      swidth=0 blks
 naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
 log      =internal               bsize=4096   blocks=2560, version=2
-         =                       sectsz=512   lazy-count=1
+         =                       sectsz=512   sunit=0 blks, lazy-count=1
 realtime =none                   extsz=4096   blocks=0, rtextents=0

and run rear recover again.
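
As an alternative to editing each file with vi, an equivalent untested one-liner (assuming the string "sunit=0 blks, " occurs only in the log section, as in the layout files above):

# strip the log-section "sunit=0 blks, " token from all saved XFS layouts
sed -i 's/sunit=0 blks, //' /var/lib/rear/layout/xfs/*.xfs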

V.

gozora commented at 2017-11-15 19:59:

Hello @sandeen,

Hope you are fine these days.
Some time ago you left a comment here, https://github.com/rear/rear/issues/1065#issuecomment-309479672, remember? ;-)

I just hope you don't mind answering a simple question:

I'm looking at two different xfsprogs versions, 4.5.0 and 4.10.0.

When I run the following command on 4.5.0, all is OK:

mkfs.xfs -f -i size=512 -d agcount=4 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=0 -d swidth=0 -l version=2 -l sunit=0 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/loop0

however, the same command on xfsprogs 4.10.0 returns

Illegal value 0 for -l sunit option. value is too small

I'm not sure if this is a bug or a feature :-( Why is 0 suddenly considered illegal?

Thanks

V.

tjgruber commented at 2017-11-15 20:12:

@gozora I'm not sure what you mean when you say:

remove string sunit=0 blks from log section

Here's the contents of /var/lib/rear/layout/xfs:
[screenshot]
Here are the default contents of the "fedora_serv--build-root.xfs" file
(I'm not sure what to change):

[screenshot]

gozora commented at 2017-11-15 20:19:

Open fedora_serv--build-root.xfs with vi, remove "sunit=0 blks", and save the file.
Repeat with sda1.xfs and sdb2.xfs.

Your log section should look something like this:

log      =internal               bsize=4096   blocks=9600, version=2
         =                       sectsz=512   lazy-count=1

After this change rear recover should run just fine.

V.

tjgruber commented at 2017-11-15 20:35:

It failed again with an error; here's the end of the log:
[screenshot]

tjgruber commented at 2017-11-15 20:39:

And here are the file contents:
[screenshot]

gozora commented at 2017-11-15 20:40:

This error has nothing to do with the XFS error from before.
You probably have some remaining LVM structures present from your previous rear recover session.
You need to unmount and remove all existing LVM structures (lvremove, vgremove) and partitions (fdisk) before running rear recover again.
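
For illustration, such a cleanup could look like this (the mount points and names are placeholders; use what lvdisplay and vgdisplay report on your system):

# unmount anything still mounted from the previous recovery attempt
umount /mnt/local/boot /mnt/local
# remove the leftover logical volumes and volume group
lvremove /dev/<vg_name>/<lv_name>
vgremove <vg_name>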

V.

gozora commented at 2017-11-15 20:43:

@tjgruber I'm on https://gitter.im/rear/rear, so just drop me a message if you need help ...

tjgruber commented at 2017-11-15 21:06:

Okay I think I figured it out.

I ran the following two commands to get a list of logical volumes and volume groups:

lvdisplay
vgdisplay

Then I used the following commands to remove them appropriately.

lvremove
vgremove

It looks like things are in process, so I'll let you know in a few minutes what happens.

tjgruber commented at 2017-11-15 21:22:

It looks like a successful recovery to me. Thank you to everyone for the great support! It's very much appreciated and will help everyone else using Fedora, LVM and XFS.

SELinux had to do some relabeling, which it did automatically, and after that, I was able to successfully boot back into the recovered Fedora 26 server.

[screenshot]

To summarize what worked:

# Create missing directory:
mkdir /run/rpcbind

# Manually start networking:
chmod a+x /etc/scripts/system-setup.d/60-network-devices.sh
/etc/scripts/system-setup.d/60-network-devices.sh

# Navigate to and list files in /var/lib/rear/layout/xfs
# Edit each file ending in .xfs with vi and remove "sunit=0 blks" from the "log" section.
# In my case, the following files, then save them:
vi /var/lib/rear/layout/xfs/fedora_serv--build-root.xfs
vi /var/lib/rear/layout/xfs/sda1.xfs
vi /var/lib/rear/layout/xfs/sdb2.xfs

# Run the following commands to get a list of LVs and VGs:
lvdisplay
vgdisplay

# Run the following commands to remove each of the above listed LVs and VGs:
lvremove <vg_name>/<lv_name>
vgremove <vg_name>

# Now run recovery again:
rear recover

gozora commented at 2017-11-15 21:34:

Poking around the xfsprogs sources, this new behavior was introduced in version 4.7.0 ... :-/.
I'm not sure how we should approach this.
My expectation when writing the XFS recreation code was that every option from xfs_info output can be passed as an option to mkfs.xfs, which proved to be wrong! :-(

In my opinion the best and most harmless way would be to simply ignore every option=0 pair from xfs_info, as omitting such a pair seems to be a valid way for mkfs.xfs.
I'll wait a couple of days to see whether we get some input on this from @sandeen, and write a patch afterwards.
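
To make the idea concrete, a rough sketch only (not the actual ReaR patch; note that a naive filter like this would also drop explicitly intended zeros such as crc=0, which is exactly the concern raised below):

# drop every "option=0" pair from an assembled mkfs.xfs option string
opts='-i size=512 -d sunit=0 -d swidth=0 -l version=2 -l sunit=0 -l lazy-count=1'
echo "$opts" | sed -E 's/-[a-z] [[:alnum:]]+=0( |$)//g'
# prints: -i size=512 -l version=2 -l lazy-count=1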

@schlomo, @gdha, @jsmeix, @schabrolles as always, any inputs are more than welcome!

V.

gozora commented at 2017-11-15 21:36:

I'll open a separate issue for the XFS problem tomorrow (or Friday), so we don't mix topics here ...

gozora commented at 2017-11-15 21:37:

It looks like a successful recovery to me.

Glad you've made it!

Thank you to everyone for the great support! It's very much appreciated and will help everyone else using Fedora, LVM and XFS.

Anytime

jsmeix commented at 2017-11-16 11:01:

@gozora
I assume you will work on the XFS related parts but
I do not know what you have planned for me to work on ;-)
I.e. on what parts of this issue should I work
(so that we avoid duplicate work on same parts)?

jsmeix commented at 2017-11-16 11:14:

@gozora
regarding your proposal/question in
https://github.com/rear/rear/issues/1575#issuecomment-344735974
"ignore every option=0":

First and foremost, I am not an XFS user,
so I know basically nothing about XFS and
even less than nothing ;-) about XFS internals.

In general I think "ignore every option=0" is dangerous because
for some options the value 0 could be right and explicitly intended,
as opposed to not specifying the option and using the mkfs.xfs default.

In particular, boolean options usually have '0' as a valid value,
e.g. from "man mkfs.xfs" on my SLES12 system
where /usr/sbin/mkfs.xfs is from xfsprogs-4.3.0

  crc=value
    This is used to create a filesystem which
    maintains and checks CRC information in
    all metadata objects on disk.
    The value is either 0 to disable the feature,
    or 1 to enable the use of CRCs.

gozora commented at 2017-11-16 11:31:

Hello @jsmeix

I assume you will work on the XFS related parts but
I do not know what you have planned for me to work on ;-)
I.e. on what parts of this issue should I work
(so that we avoid duplicate work on same parts)?

I basically assigned you here because you've been commenting on this issue ;-).
This issue can be closed and I'll open a new one for this particular XFS problem.

In general I think "ignore every option=0" is dangerous because
for some options the value 0 could be right and explicitly intended,
as opposed to not specifying the option and using the mkfs.xfs default.

Good point!
Then I'd need to ignore just the non-boolean zero values, I guess ... That would make the code a bit hard to maintain, but unfortunately I don't see an easier way around it.

V.

jsmeix commented at 2017-11-16 12:41:

@gozora
"man mkfs.xfs" on my SLES12 system with xfsprogs-4.3.0
shows this non-boolean option where 0 is a valid value:

  maxpct=value
    This specifies the maximum percentage
    of space in the filesystem that can be
    allocated to inodes. The default value
    is 25% for filesystems under 1TB, 5%
    for filesystems under 50TB and 1% for
    filesystems over 50TB.
    ...
    Setting the value to 0 means that essentially
    all of the filesystem can become inode blocks,
    subject to inode32 restrictions.

which is an example of a setting where
the value 0 has a special meaning, like
"timeout=0" meaning unlimited waiting or
"max_space=0" meaning unlimited space,
and things like that.

gozora commented at 2017-11-16 12:59:

@jsmeix thanks for the info.
It looks like we can't really write a general rule ;-/, "love it!"
I'll try to think about it a bit more ...

V.

gozora commented at 2017-11-16 15:56:

I'm closing this issue as solved, and opening #1579 as a follow-up for the XFS issue that occurred here.

gdha commented at 2017-11-29 15:46:

Update for Fedora 26: the skel/default/run/rpcbind directory will be masked by the newly mounted /run mount-point. Therefore, the problem described in this issue is not yet fixed.

gdha commented at 2017-11-29 16:59:

The solution is to copy the systemd-tmpfiles executable and copy the directory /var/lib/tmpfiles.d as is, as this will generate the /run/rpcbind directory during the start-up of the rpcbind daemon. Will make a PR for this.
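
For background, a sketch of the mechanism (the snippet contents below are typical but an assumption here, not taken from this issue): rpcbind declares its runtime directory via a systemd tmpfiles snippet, and running systemd-tmpfiles applies all such snippets:

# a tmpfiles.d snippet for rpcbind (location varies by distro) contains a line like:
#   d /run/rpcbind 0700 rpcbind rpcbind -
# running this in the recovery system then recreates /run/rpcbind:
systemd-tmpfiles --create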

gdha commented at 2017-12-12 15:12:

The fix proves to work, as I could do a successful recovery with NFS (rpcbind started automatically).
OK to close this issue.


[Export of Github issue for rear/rear.]