#819 Issue closed: Recover does not work if networking is on different interface than eth0

Labels: support / question, fixed / solved / done

phracek opened issue at 2016-04-12 08:18:

Relax-and-Recover (rear) Issue Template

Please fill in the following items before submitting a new issue:

  • rear version (/usr/sbin/rear -V): rear-1.17.2
  • OS version (cat /etc/rear/os.conf or lsb_release -a): RHEL
  • rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf):
OUTPUT=ISO
BACKUP=NETFS
BACKUP_URL="nfs://10.12.0.74/mnt/rear/"
GRUB_RESCUE=1

I created backup and save it to nfs. Recover should start after system reboot. When I called rear recover, rear stopped because it was not possible to mount NFS server with backup.

ERROR: Mount command 'mount -v -t nfs -o rw,noatime 10.12.0.74:/mnt/rear/ /tmp/rear.uRiyOcvlGp6Htgf/outputfs' failed.

This problem appears only if networking is on differ interface than eth0.

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:6c brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:e0:81:31:b3:52 brd ff:ff:ff:ff:ff:ff
    inet 10.16.64.143/21 brd 10.16.71.255 scope global eth1
    inet6 2620:52:0:1040:2e0:81ff:fe31:b352/64 scope global dynamic 
       valid_lft 2591905sec preferred_lft 604705sec
    inet6 fe80::2e0:81ff:fe31:b352/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:53 brd ff:ff:ff:ff:ff:ff

Boot from local disk and select "Relax and Recover"

# rear recover
Relax-and-Recover 1.17.2 / Git
Using log file: /var/log/rear/rear-pogo-cn1100-01.log
ERROR: Mount command 'mount -v -t nfs -o rw,noatime 10.12.0.74:/mnt/rear/ /tmp/rear.uRiyOcvlGp6Htgf/outputfs' failed.
Aborting due to an error, check /var/log/rear/rear-pogo-cn1100-01.log for details
Terminated

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:6c brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:52 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:53 brd ff:ff:ff:ff:ff:ff
  • Work-around, if any
# killall dhclient
# dhclient eth1

# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:6c brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:e0:81:31:b3:52 brd ff:ff:ff:ff:ff:ff
    inet 10.16.64.143/21 brd 10.16.71.255 scope global eth1
    inet6 2620:52:0:1040:2e0:81ff:fe31:b352/64 scope global dynamic 
       valid_lft 2591998sec preferred_lft 604798sec
    inet6 fe80::2e0:81ff:fe31:b352/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:53 brd ff:ff:ff:ff:ff:ff

# rear recover
Relax-and-Recover 1.17.2 / Git
Using log file: /var/log/rear/rear-pogo-cn1100-01.log
Calculating backup archive size
Backup archive size is 868M /tmp/rear.x8ZYqSQHNmV2Pf7/outputfs/pogo-cn1100-01/backup.tar.gz (compressed)
Comparing disks.
Disk configuration is identical, proceeding with restore.
Start system layout restoration.
Creating partitions for disk /dev/sda (msdos)
Creating LVM PV /dev/sda2

gdha commented at 2016-04-14 12:52:

@phracek are you sure the eth0/1 was not working before you started the recovery? Did you try ping? You could check the script defining the network interfaces - /etc/scripts/system-setup.d/60-network-devices.sh

Furthermore, what was the content of /etc/rear/rescue.conf file (in recovery mode only).

tcerna commented at 2016-05-12 09:35:

Hi Gratien,

I tried to reproduce this problem. I run rear mkbackup. Network interface was tested, eht1 was working and it was possible to ping this machine.

# ip address show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:e0:81:31:b3:52 brd ff:ff:ff:ff:ff:ff
    inet 10.16.64.143/21 brd 10.16.71.255 scope global eth1
    inet6 2620:52:0:1040:2e0:81ff:fe31:b352/64 scope global dynamic 
       valid_lft 2591989sec preferred_lft 604789sec
    inet6 fe80::2e0:81ff:fe31:b352/64 scope link 
       valid_lft forever preferred_lft forever

After this system was rebooting. Recovery mode was started and I run "rear recover".

RESCUE p1:~ # rear recover
Relax-and-Recover 1.17.2 / Git
Using log file: /var/log/rear/rear-pogo-cn1100-01.log
ERROR: Mount command 'mount -v -t nfs -o rw,noatime 10.8.174.175:/mnt/rear/ /tmp/rear.VgNEd74W7CFJcVj/outputfs' failed.
Aborting due to an error, check /var/log/rear/rear-pogo-cn1100-01.log for details
Terminated

I saw inside ip address and network interfaces was down.

RESCUE p1:~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:6c brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:52 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:53 brd ff:ff:ff:ff:ff:ff

See /etc/rear/rescue.conf:

RESCUE p1:~ # cat /etc/rear/rescue.conf
# initialize our /etc/rear/rescue.conf file sourced by the rear command in recover mode
# also the configuration is sourced by system-setup script during booting our recovery image

SHARE_DIR="/usr/share/rear"
CONFIG_DIR="/etc/rear"
VAR_DIR="/var/lib/rear"
LOG_DIR="/var/log/rear"

# The following 3 lines were added through 21_include_dhclient.sh
USE_DHCLIENT=y
DHCLIENT_BIN=dhclient
DHCLIENT6_BIN=

NETFS_KEEP_OLD_BACKUP_COPY=""
NETFS_PREFIX="pogo-cn1100-01"
# TMPDIR variable may be defined in local.conf file as prefix dir for mktemp command
# e.g. by defining TMPDIR=/var we would get our BUILD_DIR=/var/tmp/rear.XXXXXXXXXXXX
# However, in rescue we want our BUILD_DIR=/tmp/rear.XXXXXXX as we are not sure that
# the user defined TMPDIR would exist in our rescue image
# by 'unset TMPDIR' we achieve above goal (as rescue.conf is read after local.conf)!
unset TMPDIR

See /etc/scripts/system-setup.d/60-network-devices.sh:

RESCUE p1:~ # cat /etc/scripts/system-setup.d/60-network-devices.sh
# if USE_DHCLIENT=y then use DHCP instead and skip 60-network-devices.sh
[[ ! -z "$USE_DHCLIENT" && -z "$USE_STATIC_NETWORKING" ]] && return
# if IPADDR=1.2.3.4 has been defined at boot time via ip=1.2.3.4 then configure 
if [[ "$IPADDR" ]] && [[ "$NETMASK" ]] ; then
    device=${NETDEV:-eth0}
    ip link set dev "$device" up
    ip addr add "$IPADDR"/"$NETMASK" dev "$device"
    if [[ "$GATEWAY" ]] ; then
        ip route add default via "$GATEWAY"
    fi
    return
fi
ip addr add 10.16.64.143/21 dev eth1
ip addr add 2620:52:0:1040:2e0:81ff:fe31:b352/64 dev eth1
ip link set dev eth1 up
ip link set dev eth1 mtu 1500

Varianble $USE_DHCLIENT was checked and it was empty:

RESCUE p1:~ # echo $USE_DHCLIENT

So script 60-network-devices.sh was run manually and you can see see that network interfaces was set.

RESCUE p1:~ # bash /etc/scripts/system-setup.d/60-network-devices.s
h 
RESCUE p1:~ # echo $?
0
RESCUE p1:~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:6c brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:e0:81:31:b3:52 brd ff:ff:ff:ff:ff:ff
    inet 10.16.64.143/21 scope global eth1
    inet6 2620:52:0:1040:2e0:81ff:fe31:b352/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::2e0:81ff:fe31:b352/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:e0:81:31:b3:53 brd ff:ff:ff:ff:ff:ff

But now it is still not possible to process ping with other machine. Networking is still down.

# ping 10.8.174.175
connect: Network is unreachable

And if I try to run rear recover again, the same problem is visible.

RESCUE p1:~ # rear recover
Relax-and-Recover 1.17.2 / Git
Using log file: /var/log/rear/rear-pogo-cn1100-01.log
ERROR: Mount command 'mount -v -t nfs -o rw,noatime 10.8.174.175:/mnt/rear/ /tmp/rear.MuERThCEMVxihq6/outputfs' failed.
Aborting due to an error, check /var/log/rear/rear-pogo-cn1100-01.log for details
Terminated

gdha commented at 2016-05-12 09:50:

@tcerna the rescue.conf contained the setting USE_DHCLIENT=y which means that the variable was set during the rear run. I guess there was an issue with dhclient - did you see errors in the /var/log/messages file?

tcerna commented at 2016-05-12 10:21:

@gdha There is only one error in /var/log/messages:

RESCUE p1:~ # grep "ERROR" /var/log/messages
May 12 02:17:45 p1 rear[811]: ERROR: Mount command 'mount -v -t nfs -o rw,noatime 10.8.174.175:/mnt/rear/ /tmp/rear.FCQ0I4V7iAHwKYQ/outputfs' failed.

gdha commented at 2016-06-07 15:23:

@tcerna Well I have seen it myself when one interface is using dhcp and the other interface has a static IP address. Then USE_DHCLIENT=y overrules the rest of the script /etc/scripts/system-setup.d/60-network-devices.sh.
What I do is run this script manual again on the shell as:

bash /etc/scripts/system-setup.d/60-network-devices.sh

to re-mediate the static IP address. Or, define USE_STATIC_NETWORKING=1 in /etc/rear/local.conf

phracek commented at 2016-06-10 07:47:

@gdha Let me check with @tcerna what we have configured. I guess, it was only DHCP network interface.

jsmeix commented at 2016-08-12 14:18:

@gdha
with https://github.com/rear/rear/pull/960 merged
the new NETWORKING_PREPARATION_COMMANDS
support should be also helpful to override the 'return'
in 60-network-devices.sh when DHCP was used,
see my examples in
https://github.com/rear/rear/pull/960#issuecomment-239448861
so that one does no longer need to run 60-network-devices.sh
manually again in case of special networking environments.

Of course NETWORKING_PREPARATION_COMMANDS
will not make it "just work" automatically but it is not meant
this way. It is meant to be helpful to get problems with recovery
system networking setup more easily fixed - or at least to get
more easily a quick and dirty workaround in case of
issues with recovery system networking setup.

jsmeix commented at 2016-09-28 13:29:

No further comments for a long time => closing it.
If needed it can be reopenend.


[Export of Github issue for rear/rear.]