#1274 Issue closed: Fail to acquire DHCP IP in 58-start-dhclient.sh

Labels: enhancement, support / question, fixed / solved / done

styerd opened issue at 2017-03-30 14:22:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • rear version (/usr/sbin/rear -V): Relax-and-Recover 1.17.2
  • OS version (cat /etc/rear/os.conf or lsb_release -a): RHEL 7.2
  • rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf):
    OUTPUT=PXE
    OUTPUT_URL=nfs://172.17.5.2/var/lib/tftpboot/rear
    BACKUP=NETFS
    BACKUP_URL=nfs://172.17.5.2/var/lib/tftpboot/rear
    BACKUP_PROG_EXCLUDE=("${BACKUP_PROG_EXCLUDE[@]}" '/var/tmp' '/var/crash'
    '/erambr.log*' '/backup_restore.ksh.log*'
    '/etc/sysconfig/network-scripts/ifcfg-et*'
    '/etc/sysconfig/network-scripts/ifcfg-en*')
  • Are you using legacy BIOS or UEFI boot? UEFI
  • Brief description of the issue: Processor fails to acquire DHCP IP address
  • Work-around, if any: mods to "58-start-dhclient.sh"

On occasion our HP Proliant ML30 Gen9 reorders the interfaces as shown by "ip addr show" and the NFS mount fails with "network unreachable". In that case no network interfaces are configured which prompted a closer look. In "58-start-dhclient.sh" the following behavior was noted:

Press ENTER to run 58-start-dhclient.sh

  • source /etc/scripts/system-setup.d/58-start-dhclient.sh
    ++ dhclient -lf /var/lib/dhclient/dhclient.leases.eno1 -pf /var/run/dhclient.pid -cf /etc/dhclient.conf eno1
    ++ dhclient -lf /var/lib/dhclient/dhclient.leases.eno1 -pf /var/run/dhclient.pid -cf /etc/dhclient.conf eno1
    dhclient(519) is already running - exiting.
    ++ dhclient -lf /var/lib/dhclient/dhclient.leases.eno1 -pf /var/run/dhclient.pid -cf /etc/dhclient.conf eno1
    dhclient(519) is already running - exiting.
    ++ dhclient -lf /var/lib/dhclient/dhclient.leases.eno1 -pf /var/run/dhclient.pid -cf /etc/dhclient.conf eno1
    dhclient(519) is already running - exiting.
    ++ dhclient -lf /var/lib/dhclient/dhclient.leases.eno1 -pf /var/run/dhclient.pid -cf /etc/dhclient.conf eno1
    dhclient(519) is already running - exiting.
    ++ dhclient -lf /var/lib/dhclient/dhclient.leases.eno1 -pf /var/run/dhclient.pid -cf /etc/dhclient.conf eno1
    dhclient(519) is already running - exiting.

This actually is the functioning case since eno1 did in fact acquire an IP address and that interface is up. In the failure case ens2f0 is listed first and so replaces all instances of "eno1" above but there is no server available there and no cable attached. It appears that perhaps this code was intended to start dhclient on each interface which likely would work if the device changed for each invocation and also the '-pf' parameter was changed to avoid the "already running" complaint. As coded '$DEVICE' never changes.

A simple work around was to replace all this with a simple "dhclient" invocation which without parameters tries to acquire IPs on all interfaces. Perhaps we should specify "USE_DHCIENT=y" but the existing configuration was working for us most of the time.

jsmeix commented at 2017-03-31 07:50:

@styerd
only a hint what you can use in newer ReaR versions,
see NETWORKING_PREPARATION_COMMANDS
in the current usr/share/rear/conf/default.conf
https://raw.githubusercontent.com/rear/rear/master/usr/share/rear/conf/default.conf

There have been some adaptions and enhancements
regarding network devices setup in the rescue/recovery system
since Relax-and-Recover 1.17.2 see the current release notes
https://raw.githubusercontent.com/rear/rear/master/doc/rear-release-notes.txt

Regarding specifically ReaR 1.17.2 on RHEL 7.2
you may better ask the Red Hat support people.

styerd commented at 2017-03-31 12:27:

We had briefly tried with "USE_DHCLIENT" but that didn't work either and we haven't debugged in that path. In this path we find that clearing "DEVICE, DEVICETYPE, and REALDEVICE inside the "for dev in get_device_by_hwaddr" and making the pid file parameter unique in 58-start-dhclient.sh results in starting dhclient on each interface which seems to be the intention of this code. Timeouts on the uncabled ports costs about 5 minutes so our simplest/fastest solution is to simply replace the "58-start-dhclient.sh" code with a single "dhclient" command with no device parameter which at least on RHEL7 tries DHCP on each network interface and works equally as well as starting "dhclient" on each interface.

gdha commented at 2017-04-06 13:26:

@styerd Could you explain a bit more in detail what you mean with clearing "DEVICE, DEVICETYPE, and REALDEVICE inside the "for dev in get_device_by_hwaddr" and making the pid file parameter unique in 58-start-dhclient.sh results in starting dhclient on each interface which seems to be the intention of this code. I see what you mean with the pid file...
The time-outs on the un-cabled interface - we could introduce a timeout call to reduce the amount of time

kyle-walker commented at 2017-04-10 14:55:

@gdha
I believe the code snippet that @styerd is referring to is:

/usr/share/rear/skel/default/etc/scripts/system-setup.d/58-start-dhclient.sh

    # IPv4 DHCP clients
    case $DHCLIENT_BIN in
            (dhclient)
                    dhclient -lf /var/lib/dhclient/dhclient.leases.${DEVICE} -pf /var/run/dhclient.pid -cf /etc/dhclient.conf ${DEVICE}
            ;;
            (dhcpcd)
                    dhcpcd ${DEVICE}
            ;;
    esac

The needed change to keep dhclient from failing is:

    # IPv4 DHCP clients
    case $DHCLIENT_BIN in
            (dhclient)
                    dhclient -lf /var/lib/dhclient/dhclient.leases.${DEVICE} -pf /var/run/dhclient.${DEVICE}.pid -cf /etc/dhclient.conf ${DEVICE}
            ;;
            (dhcpcd)
                    dhcpcd ${DEVICE}
            ;;
    esac

@styerd Is that correct?

styerd commented at 2017-04-10 15:05:

This is only part of the problem. As coded “$DEVICE” never changes in the loop.
I wonder if invoking “dhclient” with no DEVICE parameter wouldn’t do what is intended which seems to be to initiate DHCP on every interface.
That in fact is our current work around, to remove the loop.

jsmeix commented at 2017-12-01 13:42:

Probably nothing that needs to be done for ReaR 2.3
so that I set the milestone to ReaR 2.4.

gdha commented at 2018-05-10 08:35:

Changed label 'minor bug' to 'enhancement' as it is not really a bug (it doesn't break the rescue system). Not urgent to fix so change milestone '2.4' to 'future'

gerhard-tinned commented at 2019-02-13 13:36:

Having the same issue when trying to perform a restore from a from a machine with multiple interfaces (all dhcp). The first call to "dhclient" seems to work and the interface is receiving an IP. All following calls to dlclient fail. I believe the missing "${DEVICE}" in the pid-file name is causing the dhclient to exit with the "already running" error.

Ensuring a seperat pid-file for each dhclinet instance should fix this issue.

gdha commented at 2019-02-26 10:33:

With the committed PRs this issue can be closed


[Export of Github issue for rear/rear.]