#951 Issue closed: On Dell machines with "biosdevname" there is no support for that in "rear recover"

Labels: enhancement, needs sponsorship, special hardware or VM

grendelson opened issue at 2016-08-03 20:48:

Relax-and-Recover (rear) Issue Template

Please fill in the following items before submitting a new issue:

  • rear version (/usr/sbin/rear -V): Relax-and-Recover 1.17.2 / Git
  • OS version (cat /etc/rear/os.conf or lsb_release -a):
    OS_VENDOR=OracleServer
    OS_VERSION=6
  • rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf):
  OUTPUT=ISO
  OUTPUT_URL=nfs:///home/nfsexport
  BACKUP=NETFS
  BACKUP_URL=nfs:///home/nfsexport
  TMPDIR=/data/backup/tmp
  BACKUP_PROG_EXCLUDE=( '/tmp/*' '/dev/shm/*' '/data/backup/*' $VAR_DIR/output/\* )
  • Brief description of the issue
    When a rescue disk is used to start a restore OR a clone, the devices in our servers (Dell R720s) under RHEL 6 (Oracle Linux 6) were captured by card slot and card port - so a card in slot 4 on the motherboard with port 2 appears as ifcfg-p4p2. This p4p2 device is not present or created when the Live CD is booted. The underlying eth devices are found, but neither the backed-up OS network devices nor any advanced settings (bond0, VLANs, etc.) work, since the expected devices aren't "present". During the backed-up OS boot process dmesg shows, for example, that "eth3 was renamed p4p3" - this renaming does NOT happen in the Live CD OS.
    I've tried both NETFS and local backups, and restores from NBU and from local full backups - the issue appears to be the inability to load and rename the ethernet devices to match what is expected when the network is brought up.
  • Work-around, if any
  • I've been able to use 'ip' and the tools in the Live environment to bring up the ethernet devices (ethtool to find the active devices, enslave the devices into my bond, and then add the VLAN if bringing up a clone with a different IP address). Manually running 'ifenslave' to capture ethernet devices, or manually adding IP addresses and default gateways, has worked so far; once a network is present the restore continues as expected, and upon reboot the original OS works as expected (a rough sketch of these commands follows below).
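
A minimal sketch of that kind of manual setup inside the rescue system (the interface names eth2/eth3, the bonding mode, VLAN ID 100 and the 192.0.2.x addresses are placeholders, not values taken from this report):

# Find out which interfaces actually have link in the rescue system
for nic in /sys/class/net/eth*; do
    echo "${nic##*/}: $(ethtool "${nic##*/}" | grep 'Link detected')"
done

# Load the bonding and VLAN drivers (loading bonding typically creates bond0)
modprobe bonding mode=active-backup miimon=100
modprobe 8021q

# Enslave the physical interfaces and bring the bond up
ip link set eth2 down
ip link set eth3 down
ifenslave bond0 eth2 eth3
ip link set bond0 up

# Add the tagged VLAN on top of the bond, assign address and default gateway
ip link add link bond0 name bond0.100 type vlan id 100
ip addr add 192.0.2.10/24 dev bond0.100
ip link set bond0.100 up
ip route add default via 192.0.2.1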

gdha commented at 2016-08-05 15:38:

Note: This problem also occurs on SLES 11 SP3 with SAP HANA. The problem has to do with the usage of biosdevname names instead of the normal names (like eth). As it also involves SLES, I'll add @jsmeix to the assignees. I remember I did some research a few months ago (when HPE did some SAP HANA test restores for one of their customers; I asked them to find out how it worked but never got a reply, so it was silently dropped) - next week I'll drop my notes into this issue for further investigation...

gdha commented at 2016-08-08 07:52:

These are the notes I talked about before (not much, I must admit):

RHEL 6:

-bash-4.1$ pwd
/usr/share/dracut/modules.d/97biosdevname
-bash-4.1$ more install
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh
dracut_install biosdevname
inst_rules 71-biosdevname.rules
inst_hook pre-trigger 30 "$moddir/parse-biosdevname.sh"

# set the default state according to the config
if [[ -e /etc/sysconfig/network ]]; then
    . /etc/sysconfig/network
fi

if [[ "$BIOSDEVNAME" = "yes" ]]; then
    echo "biosdevname=1" >> ${initdir}/etc/cmdline
fi

TIP: by defining BIOSDEVNAME=yes in /etc/sysconfig/network we activate the BIOS device naming style!
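
As a quick sanity check (a sketch, not part of the original notes; 'eth0' is just an example name), one can ask the biosdevname tool directly what it would call a given kernel interface and see whether the feature was requested at boot:

# Prints the BIOS-given name (e.g. em1 or p4p2) on supported Dell hardware,
# prints nothing when the BIOS provides no such naming information:
/sbin/biosdevname -i eth0

# Shows whether biosdevname was explicitly switched on or off at boot time:
grep -o 'biosdevname=[01]' /proc/cmdline

# RHEL 6: shows whether the sysconfig switch read by the dracut module above is set:
grep '^BIOSDEVNAME=' /etc/sysconfig/network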


RHEL 7 & Fedora 23: How do we define the name_assign_type value?

/usr/lib/dracut/modules.d/40network/net-lib.sh:

is_persistent_ethernet_name() {
    local _netif="$1"
    local _name_assign_type="0"

    [ -f "/sys/class/net/$_netif/name_assign_type" ] \
        && _name_assign_type=$(cat "/sys/class/net/$_netif/name_assign_type")

    # NET_NAME_ENUM 1
    [ "$_name_assign_type" = "1" ] && return 1

    # NET_NAME_PREDICTABLE 2
    [ "$_name_assign_type" = "2" ] && return 0
    # NET_NAME_USER 3
    # NET_NAME_RENAMED 4
    case "$_netif" in
        # udev persistent interface names
        eno[0-9]|eno[0-9][0-9]|eno[0-9][0-9][0-9]*)
            ;;
        ens[0-9]|ens[0-9][0-9]|ens[0-9][0-9][0-9]*)
            ;;
        enp[0-9]s[0-9]*|enp[0-9][0-9]s[0-9]*|enp[0-9][0-9][0-9]*s[0-9]*)
            ;;
        enP*p[0-9]s[0-9]*|enP*p[0-9][0-9]s[0-9]*|enP*p[0-9][0-9][0-9]*s[0-9]*)
            ;;
        # biosdevname
        em[0-9]|em[0-9][0-9]|em[0-9][0-9][0-9]*)
            ;;
        p[0-9]p[0-9]*|p[0-9][0-9]p[0-9]*|p[0-9][0-9][0-9]*p[0-9]*)
            ;;
        *)
            return 1
    esac
    return 0
}
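
To see which naming type the running kernel actually assigned to a given interface, one can read that sysfs file directly (a quick sketch, not part of the dracut code above; the interface name eth0 is just an example):

# Prints the NET_NAME_* value listed in the comments above
# (1 = enumerated, 2 = predictable, 3 = user-assigned, 4 = renamed);
# the file may be missing or unreadable on older kernels:
cat /sys/class/net/eth0/name_assign_type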


SLES 10: not found....

SLES 11: not found either...

/etc/udev/rules.d/77-network.rules
SUBSYSTEM=="net", ENV{INTERFACE}=="lo*|dummy*|vif*.*|br*|bond*|vlan*|gre*|sit*|tap*|tun*|ipip*|ip6tnl*|ipsec*|ppp*|ippp*|isdn*|modem*|dsl*|plip*|irda*", GOTO="skip_ifup"
SUBSYSTEM=="net", ACTION=="add", RUN+="/sbin/ifup $env{INTERFACE} -o hotplug"
SUBSYSTEM=="net", ACTION=="remove", RUN+="/sbin/ifdown %k -o hotplug"
LABEL="skip_ifup"

/etc/udev/rules.d/70-persistent-net.rules
# PCI device 0x19a2:0x0700 (be2net)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="68:b5:99:73:25:b0", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
...

jsmeix commented at 2016-08-09 10:44:

@grendelson
only a side note:

I always use the following in etc/rear/local.conf on my test systems:

# Let the rear recovery system run dhclient to get an IP address
# instead of using the same IP address as the original system:
USE_DHCLIENT="yes"

which avoids networking issues in the recovery system for me.

The reasoning behind it is that in the recovery system
it should not matter which IP address it gets or how it
gets its networking config, as long as the recovery system
can access the backup.tar.gz.

After "rear recover" when the recovered system is booted
it does its network setup according to the restored
config files from the backup.

@grendelson
I cannot know whether in your case USE_DHCLIENT="yes"
would also result in a recovery system that can access the backup
(it may not work for complicated network setups with
things like VLANs), but perhaps you could nevertheless
try out whether USE_DHCLIENT="yes" is an easier workaround
for you compared to manual network setup.

grendelson commented at 2016-08-09 18:46:

@jsmeix
Unfortunately, in our environment we do not have DHCP-assigned IP addresses and we MUST use VLAN tagging.
On a restore of the original server I expected the original network settings to be available and work since the hardware has not changed.

On a CLONE I expect the live CD to fail to get a working network since the devices do not match - but I should only have to rename the ifcfg device names, update an IP address and restart networking. Manually adding devices to the bond works, but it is lengthy, fraught with peril, and unexpected when doing a restore to the original hardware.

Thanks for the USE_DHCLIENT advice, however - in a different environment (or perhaps if we create a VLAN that DOES use DHCP - a "restore network") this would work IF the underlying devices (the biosdevname devices) were being detected correctly, I think?

gdha commented at 2016-08-10 06:31:

@jsmeix Could you ask internally within SUSE how the biosdevname names are activated?
@phracek Could you do me a favor and try to find out how biosdevname names are defined and how they work on RHEL 7?

gozora commented at 2016-08-10 06:35:

@gdha AFAIK it is by setting kernel parameter biosdevname=1 at boot time (on my SLES11 SP3 at least ...)

jsmeix commented at 2016-08-10 14:54:

@gdha
I will try to find out something about how network device/interface
naming happens on SUSE, but do not expect too much from me
because basically I am a networking noob (guess why I am so
happy that USE_DHCLIENT="yes" moves networking stuff
out of sight for me ;-).

@grendelson
I would very much appreciate it if you could do some
experiments to get it working for your case.
I think the rear source file where you should try to
make adaptions and enhancements is primarily
usr/share/rear/rescue/GNU/Linux/31_network_devices.sh

A side note for the fun of it:

Unfortunately, if you run "git blame" for that file it shows
mostly me, as if I were the expert who made all of it,
but https://github.com/rear/rear/issues/758#issuecomment-182930790
explains how that happened ;-)

Fortunately "git log -p" exists that shows the
whole true history for that file.

grendelson commented at 2016-08-10 18:50:

I won't be able to do anything while I'm away next week - I haven't abandoned it, I just won't be able to test anything for about 10 days. I'm also out of test hardware :( I may have to break one of my existing machines to run rear restore on over and over.
When you say test it - do you mean adding some lines to try to rename devices or to find the current eth devices? I know how to find the currently active devices with ethtool and then manually add them to the bond and create a VLAN - are we trying to add more automation for that, or trying to get the biosdevname names to work? (For that part I have NO idea where to start.)

jsmeix commented at 2016-08-11 16:12:

In https://github.com/rear/rear/pull/960
I implemented some fixes and add-ons
for recovery system networking setup.

In particular, the new NETWORKING_SETUP_COMMANDS
functionality is intended for issues like this one, where
@grendelson knows the commands to set up
networking manually in the recovery system.

@grendelson
regarding your https://github.com/rear/rear/issues/951#issuecomment-238966006

Regarding "out of test hardware":
I recommend using virtual machines for testing;
see the section "First steps with Relax-and-Recover"
in https://en.opensuse.org/SDB:Disaster_Recovery

Regarding the rest:
I am afraid my networking knowledge is not sufficient
to tell you what to do if the automated networking setup
in the recovery system does not work for complicated
network setups.

In general the automated networking setup in the
recovery system is mainly done by running the script
/etc/scripts/system-setup.d/60-network-devices.sh
while booting the recovery system.

That 60-network-devices.sh script is generated
during "rear mkbackup" (or "rear mkrescue")
in the original system by the script
usr/share/rear/rescue/GNU/Linux/31_network_devices.sh

To make the automated networking setup work in
your case, you would have to find out what goes wrong,
cf. "Debugging issues with Relax-and-Recover"
in https://en.opensuse.org/SDB:Disaster_Recovery

Then (usually in several trial-and-error steps)
you can adapt and enhance
usr/share/rear/rescue/GNU/Linux/31_network_devices.sh
so that it finally generates a 60-network-devices.sh script
that works for your case.

Until the automatism is sufficiently adapted and enhanced,
one needs a more direct method to set up networking
in the recovery system that can be predefined, so that
one does not need to do the networking setup manually
directly in the recovery system.

Therefore I implemented in https://github.com/rear/rear/pull/960
support for manual recovery system networking setup
via NETWORKING_SETUP_COMMANDS, see default.conf.

This functionality is intended to be helpful as some kind of
"last resort" when the automated network devices setup
does not work as it happens here.

jsmeix commented at 2016-08-12 14:22:

With https://github.com/rear/rear/pull/960 merged
there is now the new support for
NETWORKING_PREPARATION_COMMANDS
which replaces the above mentioned support for
NETWORKING_SETUP_COMMANDS.

See default.conf how NETWORKING_PREPARATION_COMMANDS
is meant to be used and for some simple examples see
https://github.com/rear/rear/pull/960#issuecomment-239448861

@grendelson
with NETWORKING_PREPARATION_COMMANDS you
should (hopefully) be able to specify those commands
that made the recovery system networking setup
work for you, so that you no longer need to type in
those commands manually each time in the recovery system.

Of course NETWORKING_PREPARATION_COMMANDS
will not make it "just work" automatically, but it is not meant
that way. It is meant to help get problems with recovery
system networking setup fixed more easily - or at least to
more easily get a quick and dirty workaround in case of
issues with recovery system networking setup.
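
For example (only a sketch: the interface names, bonding mode, VLAN ID and addresses below are placeholders and would have to be replaced by whatever worked manually in the recovery system), the workaround commands could be predefined in etc/rear/local.conf roughly like this:

NETWORKING_PREPARATION_COMMANDS=( 'modprobe bonding mode=active-backup miimon=100'
                                  'modprobe 8021q'
                                  'ifenslave bond0 eth2 eth3'
                                  'ip link set bond0 up'
                                  'ip link add link bond0 name bond0.100 type vlan id 100'
                                  'ip addr add 192.0.2.10/24 dev bond0.100'
                                  'ip link set bond0.100 up'
                                  'ip route add default via 192.0.2.1' )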

In general regarding how to test the currently
newest rear GitHub master code:

Basically "git clone" it into a directory and
then run rear from within that directory.

# git clone https://github.com/rear/rear.git
# cd rear
# vi etc/rear/local.conf
# usr/sbin/rear -d -D mkbackup

jsmeix commented at 2016-08-12 14:24:

Next week I will try to find out about biosdevnames
versus "normal" network device names (like 'eth0')...

jsmeix commented at 2016-08-12 15:50:

Whoops!
Accidentally I closed this one (hit the wrong button).

jsmeix commented at 2016-08-15 15:41:

I think there is nothing I can do here
because I cannot reproduce it.

I installed a SLES12 system with the kernel command line
parameter "biosdevname=1" but nevertheless I only get 'eth0'.

Furthermore the command "/sbin/biosdevname eth0"
returns nothing in my case. Regarding that tool see
https://build.opensuse.org/package/show/SUSE:SLE-12-SP1:GA/biosdevname

I think this matches what is described in the section
"6.5.2 Mapping Network Interface Names to Names
Written on the Chassis (biosdevname)" in
https://www.suse.com/releasenotes/x86_64/SUSE-SLES/11-SP3/#fate-311333

... for Dell hardware, which has the corresponding BIOS support ...
The usage of biosdevname can be enforced on every hardware
with "biosdevname=1". If the BIOS has no support, no network
interface names are renamed. 

Accordingly I assume my test system has no support for
biosdevname because I do not have such a Dell machine
so that I cannot reproduce anything here.

Furthermore, my assumption is confirmed by what I read in
/usr/share/doc/packages/biosdevname/README
of the installed biosdevname RPM package:

biosdevname
Copyright (c) 2006, 2007 Dell, Inc.  
Licensed under the GNU General Public License, Version 2.
biosdevname in its simplest form takes a kernel device name as an
argument, and returns the BIOS-given name it "should" be.  This is
necessary on systems where the BIOS name for a given device (e.g. the
label on the chassis is "Gb1") doesn't map directly and obviously to
the kernel name (e.g. eth0).
The distro-patches/sles10/ directory contains a patch needed to
integrate biosdevname into the SLES10 udev ethernet naming rules.
This also works as a straight udev rule.  On RHEL4, that looks like:
KERNEL=="eth*", ACTION=="add", PROGRAM="/sbin/biosdevname -i %k", NAME="%c"
This makes use of various BIOS-provided tables:
PCI Configuration Space
PCI IRQ Routing Table ($PIR)
PCMCIA Card Information Structure
SMBIOS 2.6 Type 9, Type 41, and HP OEM-specific types
therefore it's likely that this will only work well on architectures
that provide such information in their BIOS.

This means this issue is hardware specific so that only those
who sit in front of such Dell machines can adapt and enhance
Relax-and-Recover to support that particular hardware
out of the box.

@grendelson
perhaps, when you do not only have this kind of Dell hardware
and there is no good reason why you must use its
special naming, you could simplify things and use the
kernel command line parameter "biosdevname=0" on all your machines,
to enforce that the usual naming is also used on this kind of
Dell hardware, so that all your machines behave the same and you do not need
to make special hacks for special hardware in Relax-and-Recover?
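
A sketch of how that kernel parameter could be made persistent (only an assumption about typical bootloader setups; nothing here comes from this issue):

# On grub2-based systems (e.g. SLES 12, RHEL 7): add biosdevname=0 to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then regenerate the config:
grub2-mkconfig -o /boot/grub2/grub.cfg

# On RHEL 6 / Oracle Linux 6 (grub legacy): append biosdevname=0 to the
# kernel line(s) in /boot/grub/grub.conf and reboot.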

On the other hand:
@grendelson
if you can provide a GitHub pull request that adapts and
enhances Relax-and-Recover so that it even "just works"
for this kind of Dell hardware, please do so,
cf. "Dirty hacks welcome" in
https://github.com/rear/rear/wiki/Coding-Style

I remove the label "bug" from this issue because it is never ever
a bug in Relax-and-Recover when it does not "just work" on
special hardware.

jsmeix commented at 2016-08-15 15:44:

@gdha
I think we should close this issue as "looking for sponsorship".

jsmeix commented at 2016-08-15 16:05:

FYI, an addendum to
https://github.com/rear/rear/issues/951#issuecomment-238165367
on my above-mentioned SLES 12 test system that I installed
with the kernel command line parameter "biosdevname=1":

# find /etc | xargs grep -i biosdevname
/etc/default/grub:GRUB_CMDLINE_LINUX_DEFAULT="biosdevname=1 resume=/dev/sda1 splash=silent quiet showopts"
/etc/wicked/server.xml:    <builtin name="biosdevname" library="libwicked-biosname.so" symbol="biosdevname_ns"/>

# find /usr/lib/udev/rules.d | xargs grep -il biosdevname
/usr/lib/udev/rules.d/71-biosdevname.rules

# rpm -qf /usr/lib/udev/rules.d/71-biosdevname.rules
biosdevname-0.7.2-4.3.x86_64

# cat /usr/lib/udev/rules.d/71-biosdevname.rules
SUBSYSTEM!="net", GOTO="netdevicename_end"
ACTION!="add",    GOTO="netdevicename_end"
NAME=="?*",       GOTO="netdevicename_end"
ATTR{type}!="1",  GOTO="netdevicename_end"
ENV{DEVTYPE}=="?*", GOTO="netdevicename_end"
# whitelist all Dell systems
ATTR{[dmi/id]sys_vendor}=="Dell*", ENV{UDEV_BIOSDEVNAME}="1"
# kernel command line "biosdevname={0|1}" can turn off/on biosdevname
IMPORT{cmdline}="biosdevname"
ENV{biosdevname}=="?*", ENV{UDEV_BIOSDEVNAME}="$env{biosdevname}"
# ENV{UDEV_BIOSDEVNAME} can be used for blacklist/whitelist
# but will be overwritten by the kernel command line argument
ENV{UDEV_BIOSDEVNAME}=="0", GOTO="netdevicename_end"
ENV{UDEV_BIOSDEVNAME}=="1", GOTO="netdevicename_start"
# comment the next line for biosdevname to be on by default
GOTO="netdevicename_end"
LABEL="netdevicename_start"
# using NAME= instead of setting INTERFACE_NAME, so that persistent
# names aren't generated for these devices, they are "named" on each boot.
SUBSYSTEMS=="pci", PROGRAM="/sbin/biosdevname --policy physical --smbios 2.6 --nopirq -i %k", NAME="%c",  OPTIONS+="string_escape=replace"
LABEL="netdevicename_end"

grendelson commented at 2016-08-29 14:29:

I just got back to this and saw the updates. I'll try to find some hacks - but honestly I like biosdevname=0 as a way to STOP biosdevname. I'm going to try that first, since it makes the HW work the way we expect and we can restore to NON-Dell HW if we need to. These machines are being used for prod rebuilds, so I don't have them for much more testing - once a machine becomes available for testing I'll try to work on this more, but that might be months.


[Export of Github issue for rear/rear.]