#1177 Issue closed: REAR network interface does not start centos 7.1

Labels: support / question, fixed / solved / done

DaveMcBrave opened issue at 2017-01-23 13:51:

Network interface em1 does not start properly in rescue-system (CentOS 7.1)

  • rear version (/usr/sbin/rear -V):
    Relax-and-Recover 1.17.2
  • OS version (cat /etc/rear/os.conf or lsb_release -a):
    CentOS Linux release 7.3.1611 (Core)
    OS_VENDOR=RedHatEnterpriseServer
    OS_VERSION=7
  • rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf):
    OUTPUT=ISO
    BACKUP=NETFS
    BACKUP_URL="nfs://xxx.xxx.xxx.xxx/srv-it/admin/rear"
  • Are you using legacy BIOS of UEFI boot?
    BIOS

I am trying to get rear to work with CentOS Linux release 7.3.1611 (Core) and Relax-and-Recover 1.17.2. It seems to have problems establishing the network connection in the rescue-system, thus failing to connect to the nfs-share where the backup is stored.

There seems to be a problem with the interface or the DHCP-client. When starting the rescue-system I get an error message in the udev-script that says „grep write error“, which does not concern the program much as it executes the rest of the scripts.
The DHCP-script then fails, telling me that a DHCP-client is already running. Rear then continues to execute the rest of the start up scripts, including number 60, which should -as I understand it- configure the network interfaces.

When the rescue-system is booted I can see that there is no network connection. However, I can establish it via:

chmod a+x /etc/scripts/system-setup.d/60-network-devices.sh
/etc/scripts/system-setup.d/60-network-devices.sh

When I start the recovery process now, it runs as expected.

Why does it not establish a connection when the system boots, but when I execute it manually?

I do not get a DHCP request from the server in question while it is booting. Is there maybe a problem with the name of the interface (em1)?

I tried setting the interface address in /etc/rear/mappings/ip_address:
em1 dhcp

or

em1 xxx.xxx.xxx.xxx

Both did not work. Please tell me what further information you need to solve this problem.

Output of recovery when network connection not established manually:

rear -vd recover
cat /var/log/rear-beegfs-sf-08.log
RESCUE beegfs-sf-08:/var/log/rear # cat rear-beegfs-sf-08.log 
2017-01-23 12:11:02.006261949 Relax-and-Recover 1.17.2 / Git
2017-01-23 12:11:02.008326821 Command line options: /bin/rear -vd recover
2017-01-23 12:11:02.010360505 Using log file: /var/log/rear/rear-beegfs-sf-08.log
2017-01-23 12:11:02.014114451 Combining configuration files
2017-01-23 12:11:02.016279976 Including /etc/rear/os.conf
2017-01-23 12:11:02.018397453 Skipping /etc/rear/recover.conf (file not found or empty)
2017-01-23 12:11:02.020763039 Including conf/Linux-i386.conf
2017-01-23 12:11:02.022996044 Including conf/GNU/Linux.conf
2017-01-23 12:11:02.027896080 Skipping /usr/share/rear/conf/Fedora.conf (file not found or empty)
2017-01-23 12:11:02.029854752 Skipping /usr/share/rear/conf/Fedora/i386.conf (file not found or empty)
2017-01-23 12:11:02.031938394 Skipping /usr/share/rear/conf/Fedora/7.conf (file not found or empty)
2017-01-23 12:11:02.033902424 Skipping /usr/share/rear/conf/Fedora/7/i386.conf (file not found or empty)
2017-01-23 12:11:02.036022054 Skipping /usr/share/rear/conf/RedHatEnterpriseServer.conf (file not found or empty)
2017-01-23 12:11:02.038044069 Skipping /usr/share/rear/conf/RedHatEnterpriseServer/i386.conf (file not found or empty)
2017-01-23 12:11:02.040126772 Skipping /usr/share/rear/conf/RedHatEnterpriseServer/7.conf (file not found or empty)
2017-01-23 12:11:02.042122527 Skipping /usr/share/rear/conf/RedHatEnterpriseServer/7/i386.conf (file not found or empty)
2017-01-23 12:11:02.044183311 Skipping /etc/rear/site.conf (file not found or empty)
2017-01-23 12:11:02.046197461 Including /etc/rear/local.conf
2017-01-23 12:11:02.048405944 Including /etc/rear/rescue.conf
2017-01-23 12:11:02.050630189 Running 'init' stage
2017-01-23 12:11:02.056723926 Including init/default/01_set_drlm_env.sh
2017-01-23 12:11:02.058881107 Finished running 'init' stage in 0 seconds
2017-01-23 12:11:02.066372993 Using build area '/tmp/rear.FFZsqlITDz3yDnq'
mkdir: created directory '/tmp/rear.FFZsqlITDz3yDnq/rootfs'
mkdir: created directory '/tmp/rear.FFZsqlITDz3yDnq/tmp'
2017-01-23 12:11:02.071356356 Running recover workflow
2017-01-23 12:11:02.073348726 Running 'setup' stage
2017-01-23 12:11:02.079367283 Including setup/default/01_pre_recovery_script.sh
2017-01-23 12:11:02.081316226 Finished running 'setup' stage in 0 seconds
2017-01-23 12:11:02.083309801 Running 'verify' stage
2017-01-23 12:11:02.089367723 Including verify/default/02_cciss_scsi_engage.sh
2017-01-23 12:11:02.093377395 Including verify/default/02_translate_url.sh
2017-01-23 12:11:02.095731610 Including verify/default/03_translate_tape.sh
2017-01-23 12:11:02.100417176 Including verify/default/04_validate_variables.sh
2017-01-23 12:11:02.104741634 Including verify/NETFS/default/05_check_NETFS_requirements.sh
2017-01-23 12:11:02.110991109 Skipping ping test
2017-01-23 12:11:02.123372063 Including verify/GNU/Linux/05_sane_recovery_check.sh
2017-01-23 12:11:02.125674140 Including verify/NETFS/default/05_start_required_daemons.sh
2017-01-23 12:11:02.223276012 Including verify/NETFS/default/06_mount_NETFS_path.sh
mkdir: created directory '/tmp/rear.FFZsqlITDz3yDnq/outputfs'
2017-01-23 12:11:02.225928778 Added 'rmdir -v /tmp/rear.FFZsqlITDz3yDnq/outputfs >&2' as an exit task
2017-01-23 12:11:02.233090824 Mounting with 'mount -v -t nfs -o rw,noatime /xxx.xxx.xxx.xxx:/srv-it/admin/rear /tmp/rear.FFZsqlITDz3yDnq/outputfs'
mount.nfs: mount(2): Connection timed out
mount.nfs: Connection timed out
mount.nfs: timeout set for Mon Jan 23 12:13:02 2017
mount.nfs: trying text-based options 'vers=4,addr=/xxx.xxx.xxx.xxx,clientaddr=0.0.0.0'
2017-01-23 12:14:02.275066423 ERROR: Mount command 'mount -v -t nfs -o rw,noatime /xxx.xxx.xxx.xxx:/srv-it/admin/rear /tmp/rear.FFZsqlITDz3yDnq/outputfs' failed.
=== Stack trace ===
Trace 0: /bin/rear:251 main
Trace 1: /usr/share/rear/lib/recover-workflow.sh:27 WORKFLOW_recover
Trace 2: /usr/share/rear/lib/framework-functions.sh:70 SourceStage
Trace 3: /usr/share/rear/lib/framework-functions.sh:31 Source
Trace 4: /usr/share/rear/verify/NETFS/default/06_mount_NETFS_path.sh:11 source
Trace 5: /usr/share/rear/lib/global-functions.sh:153 mount_url
Trace 6: /usr/share/rear/lib/_input-output-functions.sh:132 StopIfError
Message: Mount command 'mount -v -t nfs -o rw,noatime /xxx.xxx.xxx.xxx:/srv-it/admin/rear /tmp/rear.FFZsqlITDz3yDnq/outputfs' failed.
===================
2017-01-23 12:14:02.280972524 Running exit tasks.
2017-01-23 12:14:02.283738130 Exit task 'rmdir -v /tmp/rear.FFZsqlITDz3yDnq/outputfs >&2'
rmdir: removing directory, '/tmp/rear.FFZsqlITDz3yDnq/outputfs'
2017-01-23 12:14:02.286597455 Exit task 'cleanup_build_area_and_end_program'
2017-01-23 12:14:02.288475499 Finished in 181 seconds
2017-01-23 12:14:02.289958668 You should also rm -Rf /tmp/rear.FFZsqlITDz3yDnq
2017-01-23 12:14:02.291802597 End of program reached
2017-01-23 12:14:02.293325612 Exit task 'exec 8>&-'
2017-01-23 12:14:02.294910484 Exit task 'exec 7>&-‚

gdha commented at 2017-01-23 17:09:

@DaveMcBrave Did you do a cloning? How does the original interfaces look like (ip a), and what was found on the recover image after booting? Are you using a mix of static and dhcp IPs perhaps? If that is the case try adding USE_STATIC_NETWORKING=y to the local.conf file?

jsmeix commented at 2017-01-24 10:55:

@DaveMcBrave
as a workaround for a special case you may use
NETWORKING_PREPARATION_COMMANDS
so that you can define in /etc/rear/local.conf whatever
commands you need for networking setup in the
ReaR recovery system, see
/usr/share/rear/conf/default.conf
how NETWORKING_PREPARATION_COMMANDS
is meant to work and for some examples how it works see
https://github.com/rear/rear/pull/960#issuecomment-239448861

If you found a general/generic problem
you may adapt and enhance the scrip
usr/share/rear/rescue/GNU/Linux/310_network_devices.sh
to add missing support for your particular use case.
In this case please do also a GitHub pull request
with your changes so that we can see what is needed
to make it work for your particular use case, cf.
http://relax-and-recover.org/development/
and see the section about
"How to contribute to Relax-and-Recover" in
https://en.opensuse.org/SDB:Disaster_Recovery

Regarding "Relax-and-Recover 1.17.2":
The current version is 2.0.
Please try out if it works better with the current version.
In particular any adaptions and enhancements must be
based on the current Relax-and-Recover version.

How to test the currently newest rear GitHub master code
independent of an already installed ReaR software:

Basically "git clone" it into a directory and then
configre and run it from within that directory like:

# git clone https://github.com/rear/rear.git

# cd rear

# vi etc/rear/local.conf

# usr/sbin/rear -d -D mkbackup

(note the relative paths "etc/rear/" and "usr/sbin/").

Regarding debugging issues in the start up scripts
that are run in the ReaR recovery system:

The basic idea behind is to not let those start up scripts
run automatically and mostly silently but manually
one after the other with 'set -x' bash debugging mode.

Add 'debug' to the ReaR kernel command line
when booting the ReaR recovery/rescue system.

In the ReaR recovery/rescue system boot menue select
the topmost enty of the form "Recover HOSTNAME"
and press the [Tab] key to edit the boot command line
and append the word ' debug' at its end and boot that.

You may found yourself stopped at a blank screen.
In this case press [Enter] which runs the very first of the
start up scripts (/etc/scripts/system-setup.d/00functions.sh).
There is some bug that the initial message is not always
printed so you may need to type the very first [Enter] blindly.

The further start up scripts are run one by one
each one after pressing [Enter].

In 'debug' mode the start up scripts are run with 'set -x'
so that this way you can better see what actually goes on
in each of the start up scripts.

DaveMcBrave commented at 2017-01-24 12:19:

@gdha
Thanks for the quick reply! Adding USE_STATIC_NETWORKING=y to local.conf did help. The automatic recovery did work fine. I am using just one network interface and it is configured via DHCP. The only other interface configured is infiniband. You think this can cause problems?
Before and after recovery are the same:

 # ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: em3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 24:6e:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
3: em4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 24:6e:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
4: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 24:6e:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet xxx.xxx.217.18/23 brd 134.107.217.255 scope global dynamic em1
       valid_lft 16314sec preferred_lft 16314sec
    inet6 fe80::266e:96ff:xx:xx:xx:xx/64 scope link 
       valid_lft forever preferred_lft forever
5: em2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 24:6e:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
6: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast state UP qlen 256
    link/infiniband 80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:4e:19:e1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:xx:xx:xx:xx:ff:ff:ff:ff
    inet 172.16.2.8/16 brd 172.16.255.255 scope global ib0
       valid_lft forever preferred_lft forever
    inet6 fe80::e61d:2d03:4e:19e1/64 scope link 
       valid_lft forever preferred_lft forever
7: ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc pfifo_fast state DOWN qlen 256
    link/infiniband 80:00:02:09:xx:xx:xx:xx:00:00:00:00:e4:1d:2d:03:00:4e:19:e2 brd 00:ff:ff:ff:ff:12:40:xx:xx:xx:xx:00:00:00:00:00:ff:ff:ff:ff

@jsmeix
I have another identical server with the newest version of rear cloned from git

# rear --version
Relax-and-Recover 2.00 / Git

where I encounter the same problem. I will check in later when I tried your suggestions.

jsmeix commented at 2017-01-24 13:44:

From doc/rear-release-notes.txt (excerpts)

Release Notes for Relax-and-Recover version 2.00
...
USE_STATIC_NETWORKING=y, will cause
statically configured network settings to be applied
even when USE_DHCLIENT is in effect
...
Version 1.19.0 (October 2016)
...
USE_STATIC_NETWORKING now really
overrides USE_DHCLIENT (issue #964)
...
Version 1.17.0 (March 2015)
...
Add a new config option USE_STATIC_NETWORKING
(issue #488)

From default.conf

# Most variables can be set to an empty value VAR=
# which means that this setting is off
# or set to some automatic mode.
...
# Say "y", "Yes" (or any not empty string)
# to enable the DHCP client protocol
# which lets the rescue/recovery system
# run dhclient to get an IP address
# instead of using the same IP address as the original system:
USE_DHCLIENT=

# Say "y", "Yes" (or any not empty string)
# to enable static networking (overrules USE_DHCLIENT):
USE_STATIC_NETWORKING=

With the empty defaults of USE_DHCLIENT
and USE_STATIC_NETWORKING you get
"some automatic mode" which does not restult
the right behaviour in your particular case
so that you must explicitly specify appropriate values.

But before ReaR 2.0 it did not work fully
correct when you specified explicitly e.g.
USE_STATIC_NETWORKING=y

Since ReaR 2.0 with USE_STATIC_NETWORKING=y
you should not get anything regarding the DHCP client protocol.

And when even USE_STATIC_NETWORKING=y
(or USE_DHCLIENT=y ) does not result what
you need in the ReaR recovery system, you can
enforce exactly only your intended networking setup via
NETWORKING_PREPARATION_COMMANDS

DaveMcBrave commented at 2017-01-24 15:40:

@jsmeix

  • Using Relax-and-Recover 2.00 / Git
  • My local.conf looked like this:
OUTPUT=ISO
BACKUP=NETFS
BACKUP_URL="nfs://xxx.xxx.xxx.xxx/srv-it/admin/rear“

I started the rescue system with ‚debug' mode activated. Somehow script 58 (dhcp) chose interface em3 as a suitable interface to try dhcp with. I can confirm this as I can see the correlating log entries in the booted rescue/recover system:

# cat /var/log/messages
[…]
Jan 24 13:16:29 beegfs-sf-07 kernel: [  148.837627] device-mapper: uevent: version 1.0.3
Jan 24 13:16:29 beegfs-sf-07 kernel: [  148.837734] device-mapper: ioctl: 4.34.0-ioctl (2015-10-28) initialised: dm-devel@redhat.com
Jan 24 13:17:58 beegfs-sf-07 kernel: [  237.677329] IPv6: ADDRCONF(NETDEV_UP): em3: link is not ready
Jan 24 13:17:58 beegfs-sf-07 dhclient[962]: DHCPDISCOVER on em3 to 255.255.255.255 port 67 interval 2 (xid=0x5be9d0ae)
Jan 24 13:18:00 beegfs-sf-07 dhclient[962]: DHCPDISCOVER on em3 to 255.255.255.255 port 67 interval 2 (xid=0x5be9d0ae)
Jan 24 13:18:02 beegfs-sf-07 dhclient[962]: DHCPDISCOVER on em3 to 255.255.255.255 port 67 interval 2 (xid=0x5be9d0ae)
Jan 24 13:18:04 beegfs-sf-07 dhclient[962]: DHCPDISCOVER on em3 to 255.255.255.255 port 67 interval 5 (xid=0x5be9d0ae)
Jan 24 13:18:09 beegfs-sf-07 dhclient[962]: DHCPDISCOVER on em3 to 255.255.255.255 port 67 interval 11 (xid=0x5be9d0ae)
Jan 24 13:18:20 beegfs-sf-07 dhclient[962]: DHCPDISCOVER on em3 to 255.255.255.255 port 67 interval 12 (xid=0x5be9d0ae)
Jan 24 13:18:32 beegfs-sf-07 dhclient[962]: DHCPDISCOVER on em3 to 255.255.255.255 port 67 interval 14 (xid=0x5be9d0ae)
Jan 24 13:18:46 beegfs-sf-07 dhclient[962]: DHCPDISCOVER on em3 to 255.255.255.255 port 67 interval 13 (xid=0x5be9d0ae)
Jan 24 13:18:59 beegfs-sf-07 dhclient[962]: No DHCPOFFERS received.
[…]

screen shot 2017-01-24 at 16 34
55

screen shot 2017-01-24 at 15 29
50

There is no cable plugged…

While executing the other scripts I can see no errors. Maybe I am interpreting the output wrongly? In the running rescue system I execute ifup em1just to see what happens: nothing

Jan 24 13:20:30 beegfs-sf-07 ifup: not implemented (em1)

When I execute script 60 manually I get a connection:

# cat /var/log/messages
[…]
Jan 24 13:21:27 beegfs-sf-07 kernel: [  446.467747] ixgbe 0000:01:00.0: registered PHC device on em1
Jan 24 13:21:27 beegfs-sf-07 kernel: [  446.569646] IPv6: ADDRCONF(NETDEV_UP): em1: link is not ready
Jan 24 13:21:27 beegfs-sf-07 kernel: [  446.630090] ixgbe 0000:01:00.0 em1: detected SFP+: 6
Jan 24 13:21:27 beegfs-sf-07 kernel: [  446.744386] ixgbe 0000:01:00.1: registered PHC device on em2
Jan 24 13:21:27 beegfs-sf-07 kernel: [  446.846582] IPv6: ADDRCONF(NETDEV_UP): em2: link is not ready
Jan 24 13:21:27 beegfs-sf-07 kernel: [  446.872508] ixgbe 0000:01:00.0 em1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jan 24 13:21:27 beegfs-sf-07 kernel: [  446.932690] ixgbe 0000:01:00.1 em2: detected SFP+: 5
Jan 24 13:21:27 beegfs-sf-07 kernel: [  446.956083] IPv6: ADDRCONF(NETDEV_UP): em4: link is not ready
Jan 24 13:21:27 beegfs-sf-07 kernel: [  446.956134] IPv6: ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
[…]

When I set USE_STATIC_NETWORKING=y, I get what I want. There is no output mentioning DHCP and the rescue system is able to pull the backup from the nfs share.

Adding
NETWORKING_PREPARATION_COMMANDS=( 'ip addr add 134.107.217.17/23 dev em1' 'ip link set dev em1 up' 'ip route add default via 134.107.217.254' 'return' )
to the local.conf does also enable networking properly.

Thank yo very much for your advice, but I am still curious, why it chooses em3...

gdha commented at 2017-01-25 15:59:

@DaveMcBrave Were your questions fulfilled?

jsmeix commented at 2017-01-26 16:29:

@DaveMcBrave
I am not at all a sufficient networking expert
to be able to imagine why recovery system network setup
does not automatically work as you need it in your case.
In your
https://github.com/rear/rear/issues/1177#issuecomment-274788881
I see that your network setup looks "not really simple"
so that I assume you must - at least for now - manually
tell ReaR your intended recovery system network setup.

DaveMcBrave commented at 2017-01-27 14:02:

Thanks, to both of you. You really helped me and I consider my Problem solved, as I can recover my servers automatically now.

jsmeix commented at 2017-01-27 14:07:

Thank you for the feedback and
have a nice (and relaxed ;-) weekend.


[Export of Github issue for rear/rear.]