#2094 Issue closed: Rear Recover finishes but system will not boot. SLES12 SP3 with BRTFS/ppc64le

Labels: support / question, no-issue-activity

kkoehle opened issue at 2019-03-21 17:52:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version: Relax-and-Recover 2.4 / Git

*OS version: SUSE Linux Enterprise Server 12 SP3

  • ReaR configuration files:
OUTPUT=ISO
BACKUP=NETFS
BOOT_OVER_SAN=y
AUTOEXCLUDE_MULTIPATH=n
BACKUP_URL=nfs://9.5.110.188/nfs
REAR_INITRD_COMPRESSION=lzma
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" resize snapper chattr lsattr )
COPY_AS_IS=( "${COPY_AS_IS[@]}" /usr/lib/snapper/installation-helper /etc/snapper/config-templates/default )
BACKUP_PROG_INCLUDE=(/srv /boot/grub2/powerpc-ieee1275 /var/log /var/opt /tmp /var/tmp /var/cache /opt /usr/local /var/spool /var/lib/libvirt/images /home)
EXCLUDE_MOUNTPOINTS=( /hana/data /hana/log /hana/shared)
  • Hardware: IBM PoverVM LPAR s824

  • System architecture: PPC64LE

  • Firmware: BIOS and GRUB2

  • Storage: SAN NPIV IBM v7000

  • Description of the issue:
    Rear recover finishes without error, but LPAR will not boot.

  • Workaround, if any: none.

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

rear.txt

schabrolles commented at 2019-03-21 20:38:

@kkoehle,

I tried rear from the git repo on a sles12 without any issue (on KVM not PowerVM)

I notice that you don't have POST_RECOVERY_SCRIPT in your rear local.conf file.

Here is the part I usually add to my SLES12 servers.

## SLES12
#BACKUP_OPTIONS="nfsvers=4,nolock"
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" snapper chattr lsattr )
COPY_AS_IS=( "${COPY_AS_IS[@]}" /usr/lib/snapper/installation-helper /etc/snapper/config-templates/default )

for subvol in $(findmnt -n -r -t btrfs | cut -d ' ' -f 1 | grep -v '^/$' | egrep -v 'snapshots|crash') ; do
        BACKUP_PROG_INCLUDE=( "${BACKUP_PROG_INCLUDE[@]}" "$subvol" )
done

POST_RECOVERY_SCRIPT=( 'if snapper --no-dbus -r $TARGET_FS_ROOT get-config | grep -q "^QGROUP.*[0-9]/[0-9]" ; then snapper --no-dbus -r $TARGET_FS_ROOT set-config QGROUP= ; snapper --no-dbus -r $TARGET_FS_ROOT setup-quota && echo snapper setup-quota done || echo snapper setup-quota failed ; else echo snapper setup-quota not used ; fi' )

kkoehle commented at 2019-03-21 22:08:

@schabrolles

I added the POST_RECOVERY_SCRIPT and no change. I don't imagine it is a KVM vs PowerVM thing that determines if a disk shows bootable. Does your SLES12 install use LVM and BTRFS for the boot disk?:

 PowerPC Firmware
 Version FW860.11 ### (SV860_063)
 SMS (c) Copyright IBM Corp. 2000,2016 All rights reserved.
-------------------------------------------------------------------------------
 Current Boot Sequence
 1.   Device is not bootable or removed.
 2.    None
 3.    None
 4.    None
 5.    None

What can I do to make PowerVM think it is bootable? Why is there no error message saying the bootloader didn't work? Is there anything I should check for?

schabrolles commented at 2019-03-22 08:09:

@kkoehle,

agree, I don't think my KVM test could explain the difference. I think there is something wrong with grub2 but I never had this on my rear on Power test.

you can try to run rear -d recover to have a debug log and check if something wrong happens during the grub installation.
usr/share/rear/finalize/Linux-ppc64le/620_install_grub2.sh

I won't be available from today to end of next week.

@jsmeix do you think it could be related to #2093

jsmeix commented at 2019-03-22 08:40:

@kkoehle
use rear -d -D recover (i.e. with -D) to get a full debug log,
cf. "Debugging issues with Relax-and-Recover" at
https://en.opensuse.org/SDB:Disaster_Recovery

jsmeix commented at 2019-03-22 09:12:

@schabrolles
https://github.com/rear/rear/issues/2093 looks totally unrelated
because there 1:1 restore for all fs and disks works fine.

But what happens in this issue here is not a recovery
on 100% compatible hardware because in
https://github.com/rear/rear/files/2993248/rear.txt
there is (excerpts)

Switching to manual disk layout configuration
Original disk /dev/mapper/36005076802818158a400000000000495 does not exist (with same size) in the target system
...
User confirmed disk layout file
Partition primary on /dev/mapper/36005076802818158a4000000000004c0: size reduced to fit on disk.
Doing SLES12-SP1 (and later) btrfs subvolumes setup because the default subvolume path contains '@/.snapshots/'
No code has been generated to recreate pv:/dev/mapper/36005076802818158a4000000000004a2 (lvmdev).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
Manually add code that recreates pv:/dev/mapper/36005076802818158a4000000000004a2 (lvmdev)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 300 seconds)
1
No code has been generated to recreate pv:/dev/mapper/36005076802818158a40000000000049f (lvmdev).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
Manually add code that recreates pv:/dev/mapper/36005076802818158a40000000000049f (lvmdev)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 300 seconds)
4
No code has been generated to recreate pv:/dev/mapper/36005076802818158a4000000000004a3 (lvmdev).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
Manually add code that recreates pv:/dev/mapper/36005076802818158a4000000000004a3 (lvmdev)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 300 seconds)
4
Confirm or edit the disk recreation script
1) Confirm disk recreation script and continue 'rear recover'
2) Edit disk recreation script (/var/lib/rear/layout/diskrestore.sh)
3) View disk recreation script (/var/lib/rear/layout/diskrestore.sh)
4) View original disk space usage (/var/lib/rear/layout/config/df.txt)
5) Use Relax-and-Recover shell and return back to here
6) Abort 'rear recover'
(default '1' timeout 300 seconds)
2
***** At this point I change the line: 
    lvm lvcreate -L 64424509440b -n root system <<<y
****** to: 
    lvm lvcreate -L 55g -n root system <<<y
***** I do this because I am restoring a disk that was 100 GB to a 60 GB disk, and the 64424509440b is too big (especially since swap still needs 2 GB)

So the recovery hapens on a smaller disk and that is a migration
that will for sure not just work.

I was wondering why that

Partition primary on /dev/mapper/36005076802818158a4000000000004c0: size reduced to fit on disk.

happened because that is a far too late "desperate" mesage from
the create_partitions() function when it works aready with
wrong data in disklayout.conf

I would have expected to see messages from
layout/prepare/default/420_autoresize_last_partitions.sh
because of

Device mapper!36005076802818158a400000000000495 does not exist (manual configuration needed)
Switching to manual disk layout configuration

we are in MIGRATION_MODE where in particular scripts like
layout/prepare/default/420_autoresize_last_partitions.sh
should run (regardless that this script alone does not help here
because in case of LVM manual adaptions are needed anyway,
cf. the Resizing partitions in MIGRATION_MODE during "rear recover"
section in default.conf:
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L370

kkoehle commented at 2019-03-22 23:19:

Here is the log file: I couldn't see anything that stood out.
rear-hana-n3.log

jsmeix commented at 2019-03-25 12:22:

I think I know why layout/prepare/default/420_autoresize_last_partitions.sh
doesn't actually do something in this particular case:
https://github.com/rear/rear/files/2998736/rear-hana-n3.log contains

+ source /usr/share/rear/layout/prepare/default/420_autoresize_last_partitions.sh
...
++ cp /var/lib/rear/layout/disklayout.conf /var/lib/rear/layout/disklayout.conf.resized_last_partition
...
+++ grep '^disk ' /var/lib/rear/layout/disklayout.conf
++ mv /var/lib/rear/layout/disklayout.conf.resized_last_partition /var/lib/rear/layout/disklayout.conf

i.e. grep '^disk ' /var/lib/rear/layout/disklayout.conf does not find anything.

@kkoehle
we also need at least your var/lib/rear/layout/disklayout.conf
plus some more additional files in the /var/lib/rear/ directory
and in its sub-directories in the ReaR recovery system,
cf. "Debugging issues with Relax-and-Recover" at
https://en.opensuse.org/SDB:Disaster_Recovery

Be careful when attaching files here to not make
possibly confidential internal information public here.
I.e. you may have to obfuscate some values in those files
before you upload those files here.
On the other hand you should not obfuscate too much
so that it would become impossible for us to see
what actually goes on on your particular system.

Additionally describe as exact as you can how
your replacement system where you run "rear recover" differs from
your original system where you had run "rear mkbackup/mkrescue".

kkoehle commented at 2019-04-26 20:07:

@schabrolles @jsmeix, I found the problem: the boot flag never gets set on the correct partition:

hana-n4:~ # parted /dev/mapper/36005076802818158a4000000000004c0
(parted) print                                                            
Number  Start   End     Size    Type     File system  Flags
 1      1049kB  8389kB  7340kB  primary               prep, type=41
 2      8389kB  64.4GB  64.4GB  primary               lvm, type=8e
(parted) set 1 boot on
 1      1049kB  8389kB  7340kB  primary               boot, prep, type=41
 2      8389kB  64.4GB  64.4GB  primary               lvm, type=8e

jsmeix commented at 2019-04-29 10:00:

According to https://github.com/rear/rear/issues/2094#issuecomment-487184999
this is a bootloader issue in MIGRATION_MODE so I think the general issue is
https://github.com/rear/rear/issues/1437

github-actions commented at 2020-06-28 01:33:

Stale issue message


[Export of Github issue for rear/rear.]