#1789 Issue closed: IBM Power8 (PPC) network bond issue with non-sequential bond names

Labels: support / question, fixed / solved / done

tpatel80 opened issue at 2018-04-28 04:58:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.3 / 2017-12-20

  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"):
    OS_VENDOR=RedHatEnterpriseServer
    OS_VERSION=7.1

  • ReaR configuration files ("cat /etc/rear/site.conf" or "cat /etc/rear/local.conf"):

OUTPUT=ISO
BACKUP=NETFS
BACKUP_PROG=tar
BACKUP_TYPE=incremental
FULLBACKUPDAY="Mon"
BACKUP_OPTIONS="nfsvers=4,nolock"
NETFS_URL=nfs:////
USE_STATIC_NETWORKING=y
# PPC64 specifics
# Not sure if all variables are required. Most important is INITRD_COMPRESSION.
REAR_INITRD_COMPRESSION="lzma"
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" parted sfdisk )
PROGS=( "${PROGS[@]}" partprobe fdisk cfdisk mkofboot ofpath ybin yabootconfig bootlist pseries_platform nvram ofpathname bc agetty )
test "${FIRMWARE_FILES[*]}" || FIRMWARE_FILES=( 'no' )
  • System architecture (x86 compatible or POWER and/or what kind of virtual machine):
    PPC64

  • Are you using BIOS or UEFI or another way to boot?
    Petitboot (IBM Power8 system)

  • Brief description of the issue:
    I was unable to boot rescue image on PPC64 system without adding
    REAR_INITRD_COMPRESSION="lzma".
    Once ISO was ready I simply created a bootable DVD
    which worked without any problem and
    was also able to boot from USB by simply
    'dd if=rear-rescue.iso of=/dev/sdz bs=512k'.
    Using unetbootin to create bootable USB work just fine
    for x86 systems as well as citrix xen VMs.
    After booting following issues were noticed:

Issue 1:
This system is configure to use bonded interface
however the bond numbering is not sequential,
Only bond3 is present, bond0 to bond2 are not created.

Issue 2 (minor):
MAC addresses for slave interfaces stored in
/etc/mac-addresses same as in output of 'ip addr show'
however these values are not consistent with the values
present in /proc/net/bonding/bond3.
Remapping two devices with identical MAC
doesn't seem to cause any issues.

[root ~]# ip a show | egrep -A 1 enP6
6: **enP6p112s0**:  mtu 1500 qdisc mq master bond3 state UP qlen 1000
    link/ether **f4:52:14:58:90:e0** brd ff:ff:ff:ff:ff:ff
7: **enP6p112s0d1**:  mtu 1500 qdisc mq master bond3 state UP qlen 1000
    link/ether **f4:52:14:58:90:e0** brd ff:ff:ff:ff:ff:ff

[root ~]# cat /proc/net/bonding/bond3
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 1
        Partner Key: 29024
        Partner Mac Address: 16:ce:a7:48:16:5b

Slave Interface: **enP6p112s0**
MII Status: up
Speed: 56000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: **f4:52:14:58:90:e0**
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: **enP6p112s0d1**
MII Status: up
Speed: 56000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: **f4:52:14:58:90:e1**
Aggregator ID: 1
Slave queue ID: 0
  • Work-around, if any:

Issue 1:
Replaced following line in
/usr/share/rear/rescue/GNU/Linux/310_network_devices

echo "modprobe bonding **max_bonds=${#bonding_interfaces[@]}** miimon=100 mode=$bonding_mode use_carrier=0" >>$network_devices_setup_script

with

echo "modprobe bonding **max_bonds=4** miimon=100 mode=$bonding_mode use_carrier=0" >>$network_devices_setup_script

It seems that 310_network_devices should possibly
be updated to account for non-sequential bond names.

Issue 2:
Ran 'rear -v -d -S mkrescue' and modified /etc/mac-addresses
before creating iso. Tested without updating /etc/mac-addresses
and remapping two interfaces with identical MACs doesn't cause any problem.

jsmeix commented at 2018-05-02 09:36:

The usr/share/rear/rescue/GNU/Linux/310_network_devices.sh
in the ReaR release 2.3 from December 2017
was meanwhile (i.e. in our ReaR upstream GitHub master code here)
completely overhauled by @rmetrich

See his initial
https://github.com/rear/rear/pull/1574
and
https://github.com/rear/rear/commit/02937854b6ea2060a978d69355581ba93adb9e7e
plus some more subsequent fixes and enhancements like
https://github.com/rear/rear/pull/1719

I recommend to try out our current ReaR upstream GitHub master code
because that is the only place where we at ReaR upstream fix bugs.
Bugs in released ReaR versions are not fixed by us (by ReaR upstream).
Bugs in released ReaR versions that got fixed in current ReaR upstream
GitHub master code might be fixed (if the fix can be backported with
reasonable effort) by the Linux distributor wherefrom you got ReaR.

To use our current ReaR upstream GitHub master code
do the following:

Basically "git clone" it into a separated directory and then
configure and run ReaR from within that directory like:

# git clone https://github.com/rear/rear.git

# mv rear rear.github.master

# cd rear.github.master

# vi etc/rear/local.conf

# usr/sbin/rear -D mkbackup

Note the relative paths "etc/rear/" and "usr/sbin/".

If the issue also happens with current ReaR upstream GitHub master code
please provide us a complete ReaR debug log file of "rear -D mkrescue"
so that we can have a look how it behaves in your particular environment
cf. "Debugging issues with Relax-and-Recover" at
https://en.opensuse.org/SDB:Disaster_Recovery

If it works with current ReaR upstream GitHub master code
we would really appreciate an explicit positive feedback.
It helps us a lot to have an explicit feedback
when even "special things" actually work with ReaR.

Regarding how to get debug information from the setup scripts
that run during ReaR recovery system startup phase
in this particular case how to get debug information from
/etc/scripts/system-setup.d/60-network-devices.sh

In general regarding debugging issues with the startup scripts
that are run initially in the ReaR recovery system:

The basic idea behind is to not let those startup scripts
run automatically and mostly silently but manually
one after the other with 'set -x' bash debugging mode.

Add 'debug' to the ReaR kernel command line
when booting the ReaR recovery/rescue system.

In the ReaR recovery/rescue system boot menue select
the topmost enty of the form "Recover HOSTNAME"
and press the [Tab] key to edit the boot command line
and append the word ' debug' at its end and boot that.

You may found yourself stopped at a blank screen.
In this case press [Enter] which runs the very first of the
startup scripts (/etc/scripts/system-setup.d/00functions.sh).
There is some bug that the initial message is not always
printed so you may need to type the very first [Enter] blindly.

The further startup scripts are run one by one
each one after pressing [Enter].

In 'debug' mode the startup scripts are run with 'set -x'
so that this way you can better see what actually goes on
in each of the startup scripts.

Cf.
https://github.com/rear/rear/issues/1177#issuecomment-274771296

tpatel80 commented at 2018-05-03 17:58:

Tested with upstream code and issue is fixed.

Thank you!

jsmeix commented at 2018-05-04 08:28:

@tpatel80
thank you for your prompt feedback!

@rmetrich
thanks again for you greatly overhauled networking setup scripts in ReaR!

rmetrich commented at 2018-05-04 09:28:

@jsmeix You are all welcome!
Sorry, I don't have time to do more stuff for now for ReaR...

jsmeix commented at 2018-05-04 09:52:

@rmetrich
no worries - your current stuff "just works".
Have a nice (and hopefully relaxed) weekend!


[Export of Github issue for rear/rear.]