#2243 Issue closed: IBM Power8 BareMetal rescue USB boot kernel panic: VFS Unable to mount root fs on unknown-block(1,0)

Labels: enhancement, support / question, no-issue-activity

leonardottl opened issue at 2019-09-30 23:27:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.5-git.3428.dbdfb5f.master.changed / 2019-09-17

  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"):
    Distributor ID: Ubuntu
    Description: Ubuntu 16.04.5 LTS
    Release: 16.04
    Codename: xenial

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
    OUTPUT=USB
    AUTOEXCLUDE_MULTIPATH=n
    USB_DEVICE_PARTED_LABEL=mbrar'
    BACKUP=NETFS
    BACKUP_URL=usb:///dev/disk/by-label/REAR-000

  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PoverVM LPAR):
    IBM Power 8, Bear Metal: IBM Power System S822LC (8335-GTB)

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    ppc64el

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    OP820.30 - PNOR OP8_v1.12_2.96 / BMC 2.13.104548 and Petitboot

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    local Disk SSD/Docker

  • Description of the issue (ideally so that others can reproduce it):
    After building the mkrescue on a USB, booting from the USB, the boot up process crashes with the following information, which states a kernel panic error VFS Unable to mount root fs on unknown-block(1,0):

image

  • Workaround, if any: None

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

here are the output from the mkrescue :

Relax-and-Recover 2.5-git.3428.dbdfb5f.master.changed / 2019-09-17
Running rear mkrescue (PID 131354)
Using log file: /var/log/rear/rear-koza.log
Using autodetected kernel '/boot/vmlinux-4.4.0-137-generic' as kernel in the recovery system
Creating disk layout
Docker is running, skipping filesystems mounted below Docker Root Dir /var/lib/docker
Using guessed bootloader 'EFI' (found in first bytes on /dev/sda)
Verifying that the entries in /var/lib/rear/layout/disklayout.conf are correct ...
Creating root filesystem layout
Skipping 'docker0': not bound to any physical interface.
Cannot include default keyboard mapping (no KEYMAPS_DEFAULT_DIRECTORY specified)
Cannot include keyboard mappings (neither KEYMAPS_DEFAULT_DIRECTORY nor KEYMAPS_DIRECTORIES specified)
Copying logfile /var/log/rear/rear-koza.log into initramfs as '/tmp/rear-koza-partial-2019-09-30T15:14:15-07:00.log'
Copying files and directories
Copying binaries and libraries
Copying all kernel modules in /lib/modules/4.4.0-137-generic (MODULES contains 'all_modules')
Copying all files in /lib*/firmware/
Symlink '/lib/modules/4.4.0-137-generic/build' -> '/usr/src/linux-headers-4.4.0-137-generic' refers to a non-existing directory on the recovery system.
It will not be copied by default. You can include '/usr/src/linux-headers-4.4.0-137-generic' via the 'COPY_AS_IS' configuration variable.
Symlink '/etc/pki/nssdb' -> '/var/lib/nssdb' refers to a non-existing directory on the recovery system.
It will not be copied by default. You can include '/var/lib/nssdb' via the 'COPY_AS_IS' configuration variable.
Symlink '/usr/share/misc/magic' -> '/usr/share/file/magic' refers to a non-existing directory on the recovery system.
It will not be copied by default. You can include '/usr/share/file/magic' via the 'COPY_AS_IS' configuration variable.
Testing that the recovery system in /tmp/rear.Xhun18bUFlRMlIR/rootfs contains a usable system
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (238003711 bytes) in 28 seconds
Exiting rear mkrescue (PID 131354) and its descendant processes ...
Running exit tasks

Here is the log file:
rear-koza.log

jsmeix commented at 2019-10-01 15:33:

@leonardottl
I am not a IBM POWER user so that I cannot help with POWER specific issues.
@schabrolles is our IBM POWER expert
but currently he is busy with client on-site requests
cf. https://github.com/rear/rear/pull/2232#issuecomment-531280327

I notice that in your
https://github.com/rear/rear/files/3673761/rear-koza.log
there is (near the end) only

2019-09-30 15:15:01.120713440 Including output/default/950_copy_result_files.sh
2019-09-30 15:15:01.123360621 Including output/default/950_email_result_files.sh

without the expected log messsage

Copying resulting files to usb location

that should normally come from usr/share/rear/output/default/950_copy_result_files.sh
https://github.com/rear/rear/blob/master/usr/share/rear/output/default/950_copy_result_files.sh
unless things have gone wrong before so that this script does nothing.

I am also missing

... Including output/USB/Linux-i386/850_make_USB_bootable.sh
... Writing MBR of type ...

lines in your log file when running
https://github.com/rear/rear/blob/master/usr/share/rear/output/USB/Linux-i386/850_make_USB_bootable.sh

So I was wondering if before you did run rear mkrescue,
you had formatted your USB disk especially for use with ReaR, cf.
http://relax-and-recover.org/documentation/getting-started

But the log messages about REAR-000 like

2019-09-30 15:14:10.319559019 Including prep/NETFS/default/060_mount_NETFS_path.sh
mkdir: created directory '/tmp/rear.Xhun18bUFlRMlIR/outputfs'
2019-09-30 15:14:10.326117923 Mounting with 'mount -v -o rw,noatime /dev/disk/by-label/REAR-000 /tmp/rear.Xhun18bUFlRMlIR/outputfs'
mount: /dev/sdk1 mounted on /tmp/rear.Xhun18bUFlRMlIR/outputfs.

indicate that you had formatted your USB device for use with ReaR.

Therefore I think the root cause is that scripts like

usr/share/rear/output/USB/Linux-i386/850_make_USB_bootable.sh

have the system architecture Linux-i386 directory in their path
so that such scripts are not run on POWER architecture
(POWER architecture specific scripts would have to have
a Linux-ppc64 or Linux-ppc64le directory in their path)
which means in the end that current ReaR seems to
not support OUTPUT=USB on POWER architecture.

@schabrolles
could you please have a look here and verify whether or not
current ReaR supports OUTPUT=USB on POWER architecture?

schabrolles commented at 2019-10-01 15:50:

@jsmeix I currently never done any changes or test on OUTPUT=USB on Power. SO I think you point where the problem is ;-)
Unfortunately, as you said, it is difficult for me to work on this point for the moment.

leonardottl commented at 2019-10-01 16:42:

Thank you both, @jsmeix and @schabrolles , I think you have identified the issue, thanks. Would you recommend I try to build the ISO output and try with that?

leonardottl commented at 2019-10-01 23:54:

Let me update that using the ISO OUTPUT allowed us to make bootable media and we are now performing the backup, I will update if all goes well. So til now it seems that the issue was the fact that for Power USB OUTPUT seems to not be supported.

schabrolles commented at 2019-10-02 06:24:

@leonardottl, just for a test, if you really want to boot on USB, you can try to copy the iso into a USB device with dd

dd if=<path_to_rear.iso> of=/dev/<USB_block_device> bs=1M

WARNING, it will destroy all the data you have on the USB device.

Tell le if it works

jsmeix commented at 2019-10-02 10:38:

@leonardottl

in general regarding OUTPUT=USB versus OUTPUT=ISO
and BACKUP_URL=usb:... versus BACKUP_URL=nfs:...:

The USB specific parts in ReaR behave somewhat different
compared to the usual way via ISO and backup on NFS.

In particular when you have your backup on USB things are different
compared to backup on NFS, see in particular USB_SUFFIX in default.conf
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L754

For some examples of USB specific issues that I found right now see
https://github.com/rear/rear/issues/2171#issuecomment-509143476
and
https://github.com/rear/rear/issues/1738#issuecomment-370394149
and
https://github.com/rear/rear/issues/1571#issuecomment-343461088
and
https://github.com/rear/rear/issues/1158
and several more...

But the most dangerous generic issue when using a USB disk
originated in
https://github.com/rear/rear/issues/1271
which should nowadays be mostly avoided, see
https://github.com/rear/rear/issues/1271#issuecomment-346633365
but that is not yet fully sufficient because there is still
a generic problem when a ReaR recovery disk is used, see
https://github.com/rear/rear/issues/1854

By the way:
Regarding "Creating bootable USB from ReaR ISO" you may have a look at
https://github.com/rear/rear/issues/2210

If you like to try out if OUTPUT=USB could be made
working on POWER with relatively little effort:

  1. Create a new usr/share/rear/output/USB/Linux-ppc64le/ directory.
    For ppc64le versus ppc64el see
    https://github.com/rear/rear/pull/2195#issuecomment-517648246

  2. Copy the scripts in usr/share/rear/output/USB/Linux-i386/
    into the new usr/share/rear/output/USB/Linux-ppc64le/ directory.

  3. Run rear -D mkrescue in full debug mode and inspect the log file
    if things look o.k. when running those scripts.
    If not you have to adapt those scripts as needed for POWER.
    If things look o.k. try to boot the ReaR recovery system
    and if it starts up well try if rear -D recover works.
    If not see in particular "Debugging issues with Relax-and-Recover" in
    https://en.opensuse.org/SDB:Disaster_Recovery

leonardottl commented at 2019-10-02 20:28:

Thank you @jsmeix for the detailed response, I think making the USB directly may be doable then. However, we decided to generate the ISO and burn it on the USB directly using dd as you also suggested, this worked fine!

We have now made the backup successfully, but have one last question. Is it normal for mkbackup to not consider some files or directories, for instance we noticed that the home directory was not included in the tar backup file, is this correct? can we add a flag to also include it and others? Thank you for your time and help.

jsmeix commented at 2019-10-04 06:59:

ReaR runs the tar backup with the --one-file-system option, see
https://github.com/rear/rear/blob/master/usr/share/rear/backup/NETFS/default/500_make_backup.sh#L136

When you use btrfs subvolumes they behave like separated filesystems, see
https://github.com/rear/rear/blob/master/usr/share/rear/conf/examples/SLE12-SP2-btrfs-example.conf#L54

To specify explicitly what and only what is included in the backup you may use
BACKUP_ONLY_INCLUDE and BACKUP_ONLY_EXCLUDE, see
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L996

Another hint regarding architecture specific scripts:
Run e.g. rear -s mkrescue (-s is simulation mode) to see what scripts
would be sourced and grep for architecture specific scripts.
For example on my Ubuntu x86 system with OUTPUT=USB I get

# usr/sbin/rear -s mkrescue | grep 'i386'

Source conf/Linux-i386.conf
Source prep/USB/Linux-i386/340_find_mbr_bin.sh
Source prep/USB/Linux-i386/350_check_usb_disk.sh
Source prep/USB/Linux-i386/350_find_syslinux_modules.sh
Source prep/USB/Linux-i386/400_check_extlinux.sh
Source build/Debian/i386/600_fix_debian_stuff.sh
Source output/USB/Linux-i386/100_create_efiboot.sh
Source output/USB/Linux-i386/300_create_extlinux.sh
Source output/USB/Linux-i386/830_copy_kernel_initrd.sh
Source output/USB/Linux-i386/850_make_USB_bootable.sh

For OUTPUT=USB on POWER you need equivalent /USB/ scripts.

leonardottl commented at 2019-10-10 01:26:

Dear @jsmeix and @schabrolles , we are still unable to get rear to backup the entire system, if you can comment on our issue it would be most appreciated, believe us that we are working on our side to solve it but just cannot find what we are doing wrong.

To define which directories to include we are using:
BACKUP_PROG_INCLUDE=( '/*' )
BACKUP_ONLY_INCLUDE=y

should these options force rear to backup the entire file system, or is something else required?

leonardottl commented at 2019-10-10 01:29:

Here is one example where we are using the EXCLUDE option instead, but still not getting all the directories, such as /home and others.

I attach the local.conf and the log file from rear -Dv mkbackup
rear-koza.log
rear-koza.log

OUTPUT=ISO
BACKUP=NETFS
BACKUP_URL="file:///media/externo"
#BACKUP_PROG_INCLUDE=( '/' )
BACKUP_PROG_EXCLUDE=( '/media' '/var/lib/rear' '/tmp' '/proc' '/sys/' '/dev' 'lost+found')
#BACKUP_ONLY_INCLUDE="true"
BACKUP_ONLY_EXCLUDE=y

jsmeix commented at 2019-10-10 09:33:

@leonardottl
before I try to dig around in your logs (as time permits)
I would like to know what output you get for the following commands:

lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT

and

findmnt -a

to get an overview of your block devices and filesystems structure
plus your disklayout.conf file that "rear mkrescue/mkbackup" created.

In general when you use BACKUP_ONLY_INCLUDE=y
you need to explicitly specify in the BACKUP_PROG_INCLUDE array
all filesystem-like thingies (usually the mountpoints of filesystems
and - if you have - btrfs subvolumes) you want to get in your backup.

leonardottl commented at 2019-10-10 16:50:

First, thanks again or the continued support. Here are the outputs and the disklayout file.

findmnt.txt
lsblk.txt

leonardottl commented at 2019-10-10 16:53:

Sorry, disklayout was missing

# Disk /dev/sda
# Format: disk <devname> <size(bytes)> <partition label type>
disk /dev/sda 960197124096 gpt
# Partitions on /dev/sda
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/sda 7340032 1048576 rear-noname prep /dev/sda1
part /dev/sda 255852544 8388608 rear-noname none /dev/sda2
part /dev/sda 959932530688 264241152 rear-noname lvm /dev/sda3
# Disk /dev/sdb
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdb 960197124096 msdos
# Partitions on /dev/sdb
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
#part /dev/sdb 960196075520 1048576 extended none /dev/sdb1
#part /dev/sdb 960195026944 2097152 logical none /dev/sdb5
# Disk /dev/sdc
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdc 0 
# Partitions on /dev/sdc
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdd
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdd 0 
# Partitions on /dev/sdd
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sde
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sde 0 
# Partitions on /dev/sde
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdf
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdf 0 
# Partitions on /dev/sdf
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdg
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdg 0 
# Partitions on /dev/sdg
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdh
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdh 0 
# Partitions on /dev/sdh
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdi
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdi 0 
# Partitions on /dev/sdi
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdj
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdj 0 
# Partitions on /dev/sdj
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
# Disk /dev/sdk
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdk 2000398934016 msdos
# Partitions on /dev/sdk
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
#part /dev/sdk 2000397795328 1048576 primary none /dev/sdk1
# Format for LVM PVs
# lvmdev <volume_group> <device> [<uuid>] [<size(bytes)>]
lvmdev /dev/koza-vg /dev/sda3 Jrj4gZ-jB4Q-MptE-WqNU-Hb6p-bFbX-WukoVT 1874868224
# Format for LVM VGs
# lvmgrp <volume_group> <extentsize> [<size(extents)>] [<size(bytes)>]
lvmgrp /dev/koza-vg 4096 228865 937431040
# Format for LVM LVs
# lvmvol <volume_group> <name> <size(bytes)> <layout> [key:value ...]
lvmvol /dev/koza-vg root 956825600000b linear 
lvmvol /dev/koza-vg swap_1 3070230528b linear 
# Filesystems (only ext2,ext3,ext4,vfat,xfs,reiserfs,btrfs are supported).
# Format: fs <device> <mountpoint> <fstype> [uuid=<uuid>] [label=<label>] [<attributes>]
fs /dev/mapper/koza--vg-root / ext4 uuid=ba64eb10-3b19-4d13-ad53-038f08c8d6cf label= blocksize=4096 reserved_blocks=5% max_mounts=-1 check_interval=0d bytes_per_inode=16383 default_mount_options=user_xattr,acl options=rw,relatime,errors=remount-ro,data=ordered
fs /dev/sda2 /boot ext2 uuid=4680ee82-6a53-486c-a7ce-c1989b4986c5 label= blocksize=1024 reserved_blocks=4% max_mounts=-1 check_interval=0d bytes_per_inode=4093 default_mount_options=user_xattr,acl options=rw,relatime,block_validity,barrier,user_xattr,acl
#fs /dev/sdk1 /media/externo ext4 uuid=24945c87-1ef0-4ba9-8a0d-e0b199c85416 label=externo blocksize=4096 reserved_blocks=4% max_mounts=-1 check_interval=0d bytes_per_inode=16383 default_mount_options=user_xattr,acl options=rw,relatime,data=ordered
# Swap partitions or swap files
# Format: swap <filename> uuid=<uuid> label=<label>
swap /dev/mapper/koza--vg-swap_1 uuid=61858b20-22b9-43e7-a7c0-06ac2bea4614 label=

jsmeix commented at 2019-10-11 10:23:

Ugh - looks somewhat complicated what is mounted there,
in particular I know nothing at all about container stuff
like kubernetes and kubelet or docker.

I see nothing about home at all so it seems the /home/
directory belongs to the root filesystem so that it should normally
just be automatically included in the tar backup.

Currently I have no idea why it is not included in the backup
in this particular case here.

Is perhaps what appears as home not a regular directory
of the root filesystem but some kind of indirectly provided stuff
via some container thingy like kubernetes, docker and whatnot?

leonardottl commented at 2019-10-11 14:42:

In fact home is just an example of a directory that is missing, in fact home is basically empty in our setup, because most of the work is done on containers, and stored on a data disk, so home is basically configuration stuff for each user, not much else. But other directories are also missing.
So yes, home is in the root (which it should not be we know), but it is not being included in the bakup, in general the backup is only a fraction of the size of the whole system .

Some other points to consider:

  • the backup is bien made from the booted OS of the server, not from the ISO (is this a problem?)
  • We estimate that the size of the backup should be 100GB, it is only like 18GB
  • within the tar we see there are missing directories (such as home but others as well)
  • when we make the backup we turn off all the services that are writing to disk, such as docker and kubernets

jsmeix commented at 2020-03-27 12:51:

@leonardottl
I think it will not provide a real solution for your issue
so mainly FYI you may have a look at a similar issue
https://github.com/rear/rear/issues/2344

Perhaps it might even help you a bit nevertheless.

Regarding huge files on a ISO i.e. with OUTPUT=ISO on PPC64le
(in particular having a backup.tar.gz that is bigger than 4GiB in the ISO)
see in particular
https://github.com/rear/rear/issues/2344#issuecomment-602507147
and subsequent comments and the links therein
that are about using the -iso-level 3 option also on PPC64le, cf.
https://github.com/rear/rear/issues/2344#issuecomment-601655379

With current ReaR GitHub master code
the -iso-level 3 option is now also used on PPC64le, cf.
https://github.com/rear/rear/pull/2346

If you like to try that out see the section
"Testing current ReaR upstream GitHub master code"
https://en.opensuse.org/SDB:Disaster_Recovery

jsmeix commented at 2020-03-27 13:51:

@leonardottl
right now I did
https://github.com/rear/rear/issues/2348

Please have a look there if I described the issue correctly.
I am not a PPC user so I may misunderstand things.

leonardottl commented at 2020-03-28 02:47:

@jsmeix
Thanks for all the tips. I did not update, but our issue was resolved, we updated all of our system, including the OS and everything else. REAR is now working as it should, and backs up all of our docker system. Let me get back to you with the currently installed versions and other details.

pcahyna commented at 2020-03-28 18:29:

It is strange though because according to #2344 a PreP partition is missing, so the system should not even start booting, while here it starts booting and can not mount root.

schabrolles commented at 2020-03-28 19:18:

@pcahyna, the reason why the system can boot is because it is using petitboot (https://github.com/open-power/petitboot) used on Power BareMetal system (like LC models).
In a nuteshell petitboot is micro-linux in firmware which load and scan disk and network to find grub configuration and aggregate then into a single menu (for disk, SAN, network, dvd, usb). ... So, in that case, PreP partition is not needed.

pcahyna commented at 2020-03-28 22:10:

@schabrolles thanks for the explanation. So it looks that this and #2344 are two different issues.

schabrolles commented at 2020-03-30 05:20:

@pcahyna, yes you are right... #2344 is PowerVM LPAR, in that case, Prep is needed.

github-actions commented at 2020-06-26 01:39:

Stale issue message


[Export of Github issue for rear/rear.]