#2016 Issue closed: Unable to clone Power Lpar from rear backup of SLES12 SP3 SAP

Labels: support / question, fixed / solved / done

dewagner1 opened issue at 2019-01-14 14:42:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    /usr/sbin/rear -V
    Relax-and-Recover 2.4-git.0.6ec9075.unknown / 2018-12-05

  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"):
    cat /etc/os-release

NAME="SLES"
VERSION="12-SP3"
VERSION_ID="12.3"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles_sap:12:sp3"
  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
    cat /etc/rear/site.conf
    cat: /etc/rear/site.conf: No such file or directory

cat /etc/rear/local.conf

# Default is to create Relax-and-Recover rescue media as ISO image
# set OUTPUT to change that
# set BACKUP to activate an automated (backup and) restore of your data
# Possible configuration values can be found in /usr/share/rear/conf/default.conf
#
# This file (local.conf) is intended for manual configuration. For configuration
# through packages and other automated means we recommend creating a new
# file named site.conf next to this file and to leave the local.conf as it is.
# Our packages will never ship with a site.conf.
AUTOEXCLUDE_MULTIPATH=n
BOOT_OVER_SAN=y
REAR_INITRD_COMPRESSION=lzma
OUTPUT=ISO
ISO_MAX_SIZE=4000
BACKUP=NETFS
BACKUP_URL=iso:///iso_fs/REAR_BACKUP
ISO_DIR=/iso_fs/REAR_ISO
TMPDIR=/iso_fs/REAR_TEMP
OUTPUT_URL=null
BOOT_FROM_SAN=y
EXCLUDE_MOUNTPOINTS=( /iso_fs /opt/IBM/ITM /cv /opt/splunkforwarder /opt/teamquest /var/opt/BESClient /usr/sap /hana/data /hana/log /hana/shared /usr/sap/basis /usr/sap/srm /PA_backup )
EXCLUDE_COMPONENTS=( /dev/mapper/20017380034880282 /dev/mapper/20017380034880283 /dev/mapper/2001738003488028a /dev/mapper/2001738003488028b /dev/mapper/2001738003488028c /dev/mapper/2001738003488028d /dev/mapper/2001738003488028e /dev/mapper/2001738003488028f /dev/mapper/20017380034880290 /dev/mapper/20017380034880291 /dev/mapper/20017380034880292 /dev/mapper/20017380034880293 )
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR):
    PowerVM LPAR

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    PPC64LE

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    GRUB

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    SAN FC

  • Description of the issue (ideally so that others can reproduce it):
    Unable to clone the LPAR. ReaR doesn't appear to be discovering the disks properly.

  • Workaround, if any:
    I made three attempts at restoring the ReaR image and ran the third attempt with the debug options. You can search the attached txt file for "1st Attempt", "2nd Attempt" or "3rd Attempt". The debug log file is appended at the end of the txt file; search for "Debug Log File".

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):
    eniesdbp102p.txt

jsmeix commented at 2019-01-15 09:49:

@dewagner1
your ReaR version and system info and your /etc/rear/local.conf contents
match the ones in https://github.com/rear/rear/issues/2002
and your attached eniesdbp102p.txt contains
for your "2nd Attempt" and "3rd Attempt"

Current disk mapping table (source => target):
  /dev/mapper/20017380034880281 => /dev/sda

which matches the "Description of the issue" in
https://github.com/rear/rear/issues/2002#issue-389454923

Therefore I assume that this issue here could be the same
or at least similar as https://github.com/rear/rear/issues/2002

In https://github.com/rear/rear/issues/2002 see in particular
https://github.com/rear/rear/issues/2002#issuecomment-446343401
and
https://github.com/rear/rear/issues/2002#issuecomment-446635663

schabrolles commented at 2019-01-15 09:57:

@dewagner1,

I notice that a device like /dev/mapper/SIBM_XXXX was detected during the first attempt.

As described in this previous issue (https://github.com/rear/rear/issues/2002#issuecomment-446209344), this usually indicates a SAN configuration problem. (This could explain why no multipath devices were found during the 2nd and 3rd attempts.)

Issue #2002 was solved by checking the SAN and XIV Storage configuration with the SAN admin (https://github.com/rear/rear/issues/2002#issuecomment-446343401).

Could you have a look?

jsmeix commented at 2019-01-15 10:00:

@dewagner1
https://github.com/rear/rear/issues/2002#issuecomment-446635663
means that at

Current disk mapping table (source => target):
  /dev/mapper/20017380034880281 => /dev/sda

UserInput -I LAYOUT_MIGRATION_CONFIRM_MAPPINGS needed in /usr/share/rear/layout/prepare/default/300_map_disks.sh line 275
Confirm or edit the disk mapping
1) Confirm disk mapping and continue 'rear recover'
2) n/a
3) Edit disk mapping (/var/lib/rear/layout/disk_mappings)
4) Use Relax-and-Recover shell and return back to here
5) Abort 'rear recover'

you can choose "Use Relax-and-Recover shell and return back to here"
to interrupt "rear recover", and in the Relax-and-Recover shell you can
run "multipath -r" to reload multipath.

Alternatively, it might help to manually run "multipath -r" to reload
multipath before you call "rear recover".
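
For reference, option 3 in the menu above edits a plain text mapping file.
Assuming the usual one "source target" pair per line format of that file,
for the mapping shown above it would contain something like this
(hypothetical contents):

  /dev/mapper/20017380034880281 /dev/sda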

FYI:
I am neither a SAN nor a multipath user so that I cannot provide real
educated help but I try nevertheless - @schabrolles is our ReaR upstream
expert for IBM POWER in general and for SAN and multipath in particular.

jsmeix commented at 2019-01-15 10:15:

@dewagner1
I was wrong in my https://github.com/rear/rear/issues/2016#issuecomment-454334115

It won't help to choose "Use Relax-and-Recover shell and return back to here"
and run "multipath -r", because reloading multipath at this point
won't let ReaR create a new disk mapping table, see the code in
https://raw.githubusercontent.com/rear/rear/master/usr/share/rear/layout/prepare/default/300_map_disks.sh

@schabrolles
could it be that things happen too fast during "rear recover" in
usr/share/rear/layout/prepare/GNU/Linux/210_load_multipath.sh
because there is no waiting time between

multipathd >&2 && LogPrint "multipathd started" ...

and

LogPrint "Listing multipath device found"
LogPrint "$(multipath -l ...

Perhaps a freshly started multipathd needs some time
before multipath -l can reliably report multipath devices?

Is it perhaps even an error when multipathd needs to be started
but then no multipath devices can be found?
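
If a short settle wait turns out to be needed there, a minimal sketch of
what it could look like (purely illustrative, not ReaR's actual code; the
retry count and sleep interval are made-up values):

  # Hypothetical settle loop between starting multipathd and listing
  # the multipath devices: retry 'multipath -l' a few times until it
  # reports at least one device.
  for attempt in 1 2 3 4 5 ; do
      # 'multipath -l' prints nothing as long as no multipath maps exist
      test -n "$( multipath -l 2>/dev/null )" && break
      sleep 1
  done
  LogPrint "Listing multipath device found"
  LogPrint "$( multipath -l )"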

schabrolles commented at 2019-01-15 10:33:

@jsmeix,

The log shows a device named /dev/mapper/SIBM_2810XIV_780348802B1 ... This one looks strange to me and usually means something is wrong in the SAN zoning configuration.

@dewagner1,

  • You can exit to the shell and send the output of multipath -l.
  • If possible, double check the SAN zoning configuration with your SAN admin.

dewagner1 commented at 2019-01-16 12:21:

Thank you for your quick response. I went back to our SAN team and they discovered an issue in their zoning configuration.

jsmeix commented at 2019-01-16 13:22:

@dewagner1
thanks for your feedback.
Such explicit feedback helps us because then we know the root cause
was outside of ReaR and not something (possibly subtle) within ReaR.

@schabrolles
do you think there is a sufficiently simple and sufficiently reliable test
that ReaR could do to autodetect when the SAN configuration seems
to be broken?
(I assume grep for '/dev/mapper/SIBM' is not sufficiently reliable.)
If there is such a test ReaR could show at least a warning message
(or even error out) instead of blindly proceeding and erroring out later
with various weird errors, cf. "Try to care about possible errors" in
https://github.com/rear/rear/wiki/Coding-Style
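
Purely as an illustration of such a heuristic (the SIBM pattern is IBM XIV
specific and, as said above, probably not sufficiently reliable for other
storage vendors):

  # Hypothetical warning-only check: in this thread an 'SIBM_...'
  # multipath map name indicated broken SAN zoning on IBM XIV storage.
  if multipath -l 2>/dev/null | grep -q '^SIBM' ; then
      LogPrint "Warning: suspicious multipath device name SIBM* found - check the SAN zoning"
  fi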

schabrolles commented at 2019-01-16 13:32:

@jsmeix,
I don't see an easy way to detect this kind of SAN issue. The fact that /dev/mapper/SIBM*** means there is something strange with the SAN zoning is just based on my personal experience with Linux and IBM SAN storage... I can't really tell what strange multipath names other storage vendors would use.
I usually look at the multipath -ll output and check the number of available paths to verify the SAN zoning (you only need to do this the first time a system is booted). I think the problem here is that the ReaR backup is not restored to the original system but onto a fresh, newly built system (sometimes with some SAN misconfiguration).
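
For example, a quick first-boot check of the path count could look like
this (illustrative only; the exact multipath -ll output format varies
between multipath-tools versions):

  # List all multipath maps together with their paths ...
  multipath -ll
  # ... and count the usable paths: on SLES 12 each usable path line
  # typically contains 'active ready running'. A lower count than the
  # SAN design expects (e.g. 4 paths per LUN) hints at a zoning problem.
  multipath -ll | grep -c 'active ready running'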

jsmeix commented at 2019-01-16 16:46:

@schabrolles
thank you for the explanation.
It is much appreciated!


[Export of Github issue for rear/rear.]