#2616 Issue `closed`: diskrestore.sh failed with "mdadm: layout -unknown- not understood for raid0"

Labels: bug, fixed / solved / done

cvijayvinoth opened issue at 2021-05-18 05:43:

ReaR version ("/usr/sbin/rear -V"):
Relax-and-Recover 2.6 / Git
OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):

NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):

OUTPUT=ISO
BACKUP=RSYNC
RSYNC_PREFIX="diskimage_${HOSTNAME}"
BACKUP_PROG="/var/www/html/imageBackup/rsync"
OUTPUT_URL=rsync://diskimage@192.168.1.123::rsync_backup
BACKUP_URL=rsync://diskimage@192.168.1.123::rsync_backup
BACKUP_RSYNC_OPTIONS+=(-z --progress --password-file=/var/www/html/imageBackup/user_profile/diskimage/rsync_pass)
BACKUP_PROG_EXCLUDE+=( "$(</etc/rear/path.txt)/imageBackup/iso/*" "$(</etc/rear/path.txt)/imageBackup/user_profile/*" "$(</etc/rear/path.txt)/imageBackup/data/rsync_pass" )
ISO_DIR="/var/www/html/imageBackup/iso/$HOSTNAME"
MESSAGE_PREFIX="$$: "
PROGRESS_MODE="plain"
AUTOEXCLUDE_PATH=( /tmp )
PROGRESS_WAIT_SECONDS="1"
export TMPDIR="/var/www/html/imageBackup/iso/"
PXE_RECOVER_MODE=automatic
ISO_FILES=("/var/www/html/imageBackup/rsync")
ISO_PREFIX="${HOSTNAME}"
ISO_DEFAULT="automatic"

Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR):
virtual machine
System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
x86
Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
BIOS and GRUB
Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
local disk
Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT" or "lsblk" as makeshift):

NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
loop0         7:0    0    55M  1 loop  /snap/core18/1880
loop1         7:1    0  29.9M  1 loop  /snap/snapd/8542
loop2         7:2    0  71.3M  1 loop  /snap/lxd/16099
sda           8:0    0   500G  0 disk
├─sda1        8:1    0     1M  0 part
└─sda2        8:2    0   500G  0 part
  └─md0       9:0    0 999.8G  0 raid0
    ├─md0p1 259:0    0   300G  0 part  /
    ├─md0p2 259:1    0    50G  0 part  /boot
    ├─md0p3 259:2    0   100G  0 part  /home
    └─md0p4 259:3    0    15G  0 part  [SWAP]
sdb           8:16   0   500G  0 disk
├─sdb1        8:17   0     1M  0 part
└─sdb2        8:18   0   500G  0 part
  └─md0       9:0    0 999.8G  0 raid0
    ├─md0p1 259:0    0   300G  0 part  /
    ├─md0p2 259:1    0    50G  0 part  /boot
    ├─md0p3 259:2    0   100G  0 part  /home
    └─md0p4 259:3    0    15G  0 part  [SWAP]

Description of the issue (ideally so that others can reproduce it):
used RAID 0.
Workaround, if any:
Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files): rear -D recover
rear-vijayyyyyyyyyyy1.log

jsmeix commented at 2021-05-18 12:05:

@cvijayvinoth
your https://github.com/rear/rear/files/6498661/rear-vijayyyyyyyyyyy1.log
is not a '-D' debugscript log, so we cannot see what actually goes on.
It contains (excerpt):

2022: 2021-05-18 05:36:17.435825730 Start system layout restoration.
2022: 2021-05-18 05:36:17.516015546 Stop mdadm
2022: 2021-05-18 05:36:17.521266345 Erasing MBR of disk /dev/sda
2022: 2021-05-18 05:36:17.530102618 Disk '/dev/sda': creating 'gpt' partition table
2022: 2021-05-18 05:36:17.837457944 Disk '/dev/sda': creating partition number 1 with name ''sda1''
2022: 2021-05-18 05:36:18.365539190 Disk '/dev/sda': creating partition number 2 with name ''sda2''
2022: 2021-05-18 05:36:20.268499349 Stop mdadm
2022: 2021-05-18 05:36:20.272854214 Erasing MBR of disk /dev/sdb
2022: 2021-05-18 05:36:20.282599710 Disk '/dev/sdb': creating 'gpt' partition table
2022: 2021-05-18 05:36:20.525363389 Disk '/dev/sdb': creating partition number 1 with name ''sdb1''
2022: 2021-05-18 05:36:20.957689644 Disk '/dev/sdb': creating partition number 2 with name ''sdb2''
2022: 2021-05-18 05:36:22.956187543 Creating software RAID /dev/md0
2022: 2021-05-18 05:36:22.961024143 UserInput: called in /usr/share/rear/layout/recreate/default/200_run_layout_code.sh line 127
2022: 2021-05-18 05:36:22.964361086 UserInput: Default input in choices - using choice number 1 as default input
2022: 2021-05-18 05:36:22.965551677 The disk layout recreation script failed

but that tells us neither what actually failed nor why it failed.

See the part
"To analyze and debug a "rear recover" failure the following information is mandatory" in
https://en.opensuse.org/SDB:Disaster_Recovery#Debugging_issues_with_Relax-and-Recover
for a complete list of what information we may need to analyze a "rear recover" failure.

cvijayvinoth commented at 2021-05-18 14:18:

rear-vijayyyyyyyyyyy1.log
@jsmeix : I have uploaded the correct log here for your reference.

jsmeix commented at 2021-05-18 14:25:

OK - now we can see the actually failing command in
https://github.com/rear/rear/files/6501998/rear-vijayyyyyyyyyyy1.log

+++ Print 'Creating software RAID /dev/md0'
+++ test -b /dev/md0
+++ mdadm --create /dev/md0 --force --metadata=1.2 --level=raid0 --raid-devices=2 --uuid=65bb9239:f0273dfd:7fc22ff5:1319e0fb --layout=-unknown- --chunk=512 /dev/sda2 /dev/sdb2
mdadm: layout -unknown- not understood for raid0.

jsmeix commented at 2021-05-18 14:29:

There is no 'unknown' in layout/prepare/GNU/Linux/120_include_raid_code.sh
so I assume it comes from something in your disklayout.conf file.
We need that file too to be able to do the next step in analyzing what went wrong.

cvijayvinoth commented at 2021-05-19 07:12:

I have attached the disklayout.conf file here.

disklayout.conf.txt

pcahyna commented at 2021-05-19 08:03:

Here is the culprit:

raid /dev/md0 metadata=1.2 level=raid0 raid-devices=2 uuid=65bb9239:f0273dfd:7fc22ff5:1319e0fb layout=-unknown- chunk=512 devices=/dev/sda2,/dev/sdb2

Can you please attach the debug log from rear -D mkrescue?

cvijayvinoth commented at 2021-05-19 09:16:

rear-vijayyyyyyyyyyy1.log

I have attached the rear -D mkrescue log file here.

pcahyna commented at 2021-05-19 09:56:

So the culprit is

+++ grep Layout /var/www/html/imageBackup/iso/rear.sLsjObhTRCeYDd8/tmp/mdraid
++ layout=-unknown-

which gets printed by
mdadm --misc --detail /dev/md0
Looking at the source, mdadm can indeed print -unknown- for a RAID layout. This was recently added for RAID0 (it already existed for RAID5, RAID6 and RAID10): https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/Detail.c?id=329dfc28debb58ffe7bd1967cea00fc583139aca
I suspect we don't support RAID levels other than RAID1 very well...
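
The extraction step visible in the log excerpt can be sketched roughly as follows. This is a minimal sketch, not the actual ReaR code: the function name `extract_layout` is hypothetical, and both the sample line and the `Layout : <value>` format are assumptions made only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the value of the "Layout" line out of text
# that was saved from "mdadm --misc --detail /dev/mdX".
# Assumption: the line has the form "Layout : <value>".
extract_layout() {
    # First line containing "Layout", keep the part after the colon, drop spaces.
    grep Layout "$1" | head -n 1 | cut -d ':' -f 2 | tr -d ' '
}

# Hypothetical one-line sample (not real mdadm output):
sample="$(mktemp)"
printf '%s\n' '        Layout : -unknown-' > "$sample"
layout="$(extract_layout "$sample")"
echo "layout=$layout"
rm -f "$sample"
```

Whatever string follows the colon is taken at face value, which is how a placeholder such as -unknown- can end up in disklayout.conf.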

jsmeix commented at 2021-05-19 10:04:

It is also my gut feeling that (software) RAID0 is not well supported in ReaR,
because I assume RAID0 is not often used by system admins on their servers,
in contrast to (software) RAID1 which is used more often.
I even assume that software RAID in general is not often used by system admins on their servers,
in contrast to real hardware RAID or SAN storage and things like that,
but I am really not an expert in advanced storage technologies.

By the way:
I wonder why software RAID0 and not just LVM "as usual"?

cvijayvinoth commented at 2021-05-21 07:28:

@jsmeix : Currently some of our machines are set up with RAID 0, so I tried to check it out.

jsmeix commented at 2021-05-21 09:55:

As far as I can see from just looking at the code, the relevant code parts are

during "rear mkrescue" in layout/save/GNU/Linux/210_raid_layout.sh

if [ -n "$layout" ] ; then
    layout=" layout=$layout"
else
    layout=""
fi
...
echo "raid ${device}${metadata}${level}${ndevices}${uuid}${name}${sparedevices}${layout}${chunksize}${devices}"

during "rear recover" in layout/prepare/GNU/Linux/120_include_raid_code.sh

for option in $options ; do
    case "$option" in
...
        (*)
            mdadmcmd="$mdadmcmd --$option"
            ;;
    esac
done
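
How the catch-all branch turns the saved options into a failing command can be sketched like this. It is a simplified illustration under stated assumptions: `build_mdadm_cmd` is a hypothetical name, only the `(*)` branch comes from the excerpt above, and the `devices=` branch is a made-up stand-in for the elided cases.

```shell
#!/usr/bin/env bash
# Simplified sketch: turn "key=value" options from a disklayout.conf
# "raid" line into mdadm flags.
build_mdadm_cmd() {
    local device="$1"
    shift
    local mdadmcmd="mdadm --create $device --force"
    local devices=""
    local option
    for option in "$@" ; do
        case "$option" in
            (devices=*)
                # Hypothetical branch: member devices become positional arguments.
                devices="$(echo "${option#devices=}" | tr ',' ' ')"
                ;;
            (*)
                # Every other option is passed through verbatim as --key=value,
                # so layout=-unknown- becomes --layout=-unknown-.
                mdadmcmd="$mdadmcmd --$option"
                ;;
        esac
    done
    echo "$mdadmcmd $devices"
}

build_mdadm_cmd /dev/md0 metadata=1.2 level=raid0 raid-devices=2 \
    layout=-unknown- chunk=512 devices=/dev/sda2,/dev/sdb2
```

Because the value is never validated on the recover side, filtering it out when the layout is saved is the less invasive place for a fix.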

So I assume a possible fix in layout/save/GNU/Linux/210_raid_layout.sh
is to treat layout=-unknown- the same as if $layout was empty, i.e.

--- usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh.original    2021-05-18 13:20:14.270440485 +0200
+++ usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh     2021-05-21 11:53:25.389723396 +0200
@@ -67,9 +67,16 @@
             else
                 sparedevices=""
             fi

-            if [ -n "$layout" ] ; then
+            # mdadm can print '-unknown-' for a RAID layout
+            # which got recently (2019-12-02) added to RAID0 (it existed before for RAID5 and RAID6 and RAID10) see
+            # https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/Detail.c?id=329dfc28debb58ffe7bd1967cea00fc583139aca
+            # so we treat '-unknown-' same as an empty value to avoid that layout/prepare/GNU/Linux/120_include_raid_code.sh
+            # will create a 'mdadm' command in diskrestore.sh like "mdadm ... --layout=-unknown- ..." which would fail
+            # during "rear recover" with something like "mdadm: layout -unknown- not understood for raid0"
+            # see https://github.com/rear/rear/issues/2616
+            if test "$layout" -a '-unknown-' != "$layout" ; then
                 layout=" layout=$layout"
             else
                 layout=""
             fi
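
The intent of the changed condition can be checked in isolation with a small sketch. The function name `layout_option` is hypothetical and the condition is written with two separate tests instead of `-a`; it is meant to behave the same as the patched code, not to replace it.

```shell
#!/usr/bin/env bash
# Hypothetical stand-alone version of the patched condition:
# emit " layout=<value>" only for a non-empty value other than '-unknown-'.
layout_option() {
    local layout="$1"
    if [ -n "$layout" ] && [ "$layout" != "-unknown-" ] ; then
        printf '%s\n' " layout=$layout"
    else
        printf '%s\n' ""
    fi
}

# The brackets make the leading space (or the empty result) visible.
for value in "" "-unknown-" "left-symmetric" ; do
    echo "input '$value' gives [$(layout_option "$value")]"
done
```

With this behaviour the "raid" line in disklayout.conf simply has no layout= entry for the RAID0 array, so no --layout option is passed to mdadm during "rear recover".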

jsmeix commented at 2021-05-21 10:04:

@cvijayvinoth
please test if the change in
https://github.com/rear/rear/pull/2618
makes things work for you.

cvijayvinoth commented at 2021-06-09 04:24:

Sure, let me check and update you.

cvijayvinoth commented at 2021-06-10 10:02:

I am still facing the same issue. I have attached the rear -D recover log file here.
rear-vijay.log

jsmeix commented at 2021-06-18 09:48:

@cvijayvinoth
as far as I see your https://github.com/rear/rear/files/6630326/rear-vijay.log
is only a "rear -D recover" log but my proposed fix in
https://github.com/rear/rear/pull/2618/files
is in usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh
and the layout/save stage is run during "rear mkrescue/mkbackup".

So to test if https://github.com/rear/rear/pull/2618 makes things work for you
you need to test the whole thing, i.e.
first run "rear -D mkrescue" or "rear -D mkbackup" and attach its log here,
then boot that newly created ReaR recovery system on replacement test hardware
and run "rear -D recover" there and also attach its log plus disklayout.conf here.

After the "rear -D mkrescue" or "rear -D mkbackup"
check whether the new disklayout.conf still contains layout=-unknown-.
With my changes in https://github.com/rear/rear/pull/2618
the new disklayout.conf should no longer contain layout=-unknown-.
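
The check described above could be scripted along these lines. This is a sketch only: `has_unknown_layout` is a hypothetical helper, the demo runs on a temporary file, and the location of the real file (commonly /var/lib/rear/layout/disklayout.conf) is an assumption to verify on the system in question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if any "raid" line in the given
# disklayout.conf still carries layout=-unknown-.
has_unknown_layout() {
    grep -q '^raid .*layout=-unknown-' "$1"
}

# Demo on a temporary file that mimics a "raid" line without a layout entry.
demo="$(mktemp)"
echo 'raid /dev/md0 metadata=1.2 level=raid0 raid-devices=2 chunk=512 devices=/dev/sda2,/dev/sdb2' > "$demo"
if has_unknown_layout "$demo" ; then
    echo "layout=-unknown- still present, the fix is not active"
else
    echo "no layout=-unknown- found"
fi
rm -f "$demo"
```

If the helper still reports a match after a fresh "rear -D mkrescue", the patched 210_raid_layout.sh was most likely not the script that ran.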