#2990 Issue open: rear recover failed for Ubuntu Server 20.04 mdadm XFS RAID-1 disk config

Labels: support / question

danboid opened issue at 2023-05-19 14:06:

  • ReaR version ("/usr/sbin/rear -V"):

2.7 - installed using Ubuntu 23.04 .deb

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):

Ubuntu Server 20.04.6 amd64

  • ReaR configuration files - /etc/rear/local.conf

### write the rescue initramfs to USB and update the USB bootloader
OUTPUT=USB

### create a backup using the internal NETFS method, using 'tar'
BACKUP=NETFS

### write both rescue image and backup to the device labeled REAR-000
BACKUP_URL=usb:///dev/disk/by-label/REAR-000

  • Hardware vendor/product (PC or PowerNV BareMetal or ARM) or VM (KVM guest or PowerVM LPAR):

Viglen i7 desktop

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):

amd64

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):

UEFI

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):

Local USB 3.0 disk

  • Description of the issue (ideally so that others can reproduce it):

I want to use rear to back up and recover Ubuntu Server machines installed with mdadm software RAID (RAID-10 in production, RAID-1 here), but I have not been able to get rear recover to work on my mdadm RAID-1 test box: it errors out with "The disk layout recreation script failed".

My test machine has 2x SATA 3.0 disks. The slightly odd thing about this test box is that the disks are different sizes - one is 500 GB and the other is roughly 250 GB - so I created a 222 GB partition on each for mdadm to use for the RAID-1 array, which I format with XFS. I wouldn't expect the mismatched disk sizes to be an issue for rear, since Ubuntu has no problem with this setup.

Note that I also install Ubuntu Server so that it uses multiple UEFI boot partitions. rear supports backing up and recovering multiple UEFI partitions, I presume? I think this option was introduced in the Ubuntu Server 20.04 installer - a very useful feature!

In this test I installed Ubuntu Server 20.04.6, installed rear 2.7, did a backup, then ran rm -rf --no-preserve-root / so that most or all of the files were deleted but the partition layout / mdadm config remained, before booting off the rear USB disk and running rear recover. I am not using LVM or LUKS disk encryption. The recover log is attached.

rear-testa.log
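
For reference, the disk layout described above could be recreated manually with something along these lines (a sketch only - the device names /dev/sda and /dev/sdb, the partition numbers and the ESP size are assumptions based on the description, not taken from the actual installer run):

# one EFI system partition plus a 222 GB RAID member on each disk
sgdisk --zap-all /dev/sda
sgdisk -n 1:0:+512M -t 1:ef00 -n 2:0:+222G -t 2:fd00 /dev/sda
sgdisk --zap-all /dev/sdb
sgdisk -n 1:0:+512M -t 1:ef00 -n 2:0:+222G -t 2:fd00 /dev/sdb
mkfs.vfat /dev/sda1
mkfs.vfat /dev/sdb1

# assemble the two 222 GB partitions into the RAID-1 array and format it with XFS
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mkfs.xfs /dev/md0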

danboid commented at 2023-05-19 14:36:

I have also tried manually wiping both disks with sgdisk --zap-all before running rear recover, but it fails at the same point, just after it tries to create /dev/md0, with "The disk layout recreation script failed".

danboid commented at 2023-05-19 14:45:

rear spat out a few extra errors after I quit it, which I don't think made it into that log:

rear-md-recovery-errors

jsmeix commented at 2023-05-22 11:38:

I didn't look at the details but because you wrote

most/all of the files were deleted
but the partition layout / mdadm config remained

see the section
"Prepare replacement hardware for disaster recovery" in
https://en.opensuse.org/SDB:Disaster_Recovery
therein in particular the part about

When your replacement storage is not pristine
new storage (i.e. when it had been ever used before),
you must completely zero out your replacement storage.
Otherwise ...

See also 'DISKS_TO_BE_WIPED'
in usr/share/rear/conf/default.conf
or online for ReaR 2.7 at
https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/conf/default.conf#L440

danboid commented at 2023-05-22 11:41:

@jsmeix

It seems rear is supposed to support mdadm. Has nobody tested it with multiple UEFI partitions?

I have noticed that if I choose "View original disk space usage" in rear recover (after the disk layout recreation error) it lists /dev/md0, /dev/sda1 as /boot/efi (so only one EFI partition - only one would have been active at the time the backup was created, of course) and then 4 loop mounts for snap packages. Does that sound right?

In response to your reply: I have tried using sgdisk to wipe both disks so that all partitioning/mdadm info is removed, yet it still fails at the same point. I initially tried to restore after just deleting files from md0, which is a "more likely" scenario.

danboid commented at 2023-05-22 11:48:

I didn't set 'DISKS_TO_BE_WIPED' when I was creating the backup.

How do I enable this when doing a recover? I'm not expecting it to fix my issue because, as I said, I've already tried restoring after manually wiping both disks with sgdisk --zap-all.
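
For what it's worth, DISKS_TO_BE_WIPED is an ordinary ReaR configuration variable, so it can be set in /etc/rear/local.conf before the rescue image is created (so it gets baked into the rescue system) or edited in /etc/rear/local.conf inside the booted rescue system before running rear recover. A hedged sketch, assuming the two target disks are /dev/sda and /dev/sdb - the exact accepted values and the default behaviour are described in the default.conf section linked above:

### wipe these target disks before the disk layout is recreated
### (the device names here are an assumption for this test box)
DISKS_TO_BE_WIPED='/dev/sda /dev/sdb'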

danboid commented at 2023-05-22 15:16:

wipefs seems to be causing problems.

I have noticed that even if both of the SATA disks I wish to restore to are already completely wiped before I run rear recover, rear still insists on running wipefs on both.

If I try running wipefs -a /dev/sda2 from the rear command line I get the error:

wipefs: error: /dev/sda2: probing initialization failed: Device or resource busy

Surely there is no need to run wipefs on the partitions if they're newly created? And why not use sgdisk instead of wipefs when wiping partitions is required?
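
For what it's worth, "Device or resource busy" on a RAID member partition inside the rescue system often just means the kernel has already auto-assembled a stale md array from the old superblocks, so the partition is held open by an md device. A hedged way to check and release it from the rescue shell (the array name /dev/md127 is only an example, not taken from the log):

# check whether a stale array was auto-assembled from the old superblocks
cat /proc/mdstat
lsblk /dev/sda2 /dev/sdb2

# if so, stop it and clear the member superblocks, then wipefs should
# no longer see the partitions as busy
mdadm --stop /dev/md127
mdadm --zero-superblock /dev/sda2 /dev/sdb2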

danboid commented at 2023-05-23 19:00:

I'll try this again tomorrow but using

wipefs -af /dev/sda2

rear 2.7 doesn't currently seem to use the -f (force) option with wipefs; in my case, force may be necessary for it to work at all.

https://askubuntu.com/questions/926698/wipefs-device-or-resource-busy

jsmeix commented at 2023-05-24 05:16:

Ah! Now I see it:
In usr/share/rear/layout/prepare/GNU/Linux/131_include_filesystem_code.sh
we have

cleanup_command="wipefs --all --force $device || wipefs --all $device || dd if=/dev/zero of=$device bs=512 count=1 || true"

https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/layout/prepare/GNU/Linux/131_include_filesystem_code.sh#L35
BUT
in usr/share/rear/layout/prepare/GNU/Linux/120_include_raid_code.sh
we have only

wipefs -a \$component_device

https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/layout/prepare/GNU/Linux/120_include_raid_code.sh#L93

@danboid
try whether it works better for you when you replace, in your
usr/share/rear/layout/prepare/GNU/Linux/120_include_raid_code.sh,
the line

wipefs -a \$component_device

with

wipefs -af \$component_device

but '-f' may not work, because "man wipefs" says

-f, --force
Force erasure, even if the filesystem is mounted.
This is required in order to erase
a partition-table signature on a block device.

and I think in your case it is not yet mounted,
so '-f' may not improve things for you.
Nevertheless, please try it out and see if it helps.
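
A hedged sketch of what aligning the RAID code with the filesystem code might look like - not the actual upstream fix, just the same fallback chain quoted above applied to each \$component_device as it is emitted into the generated recreation script:

wipefs --all --force \$component_device || wipefs --all \$component_device || dd if=/dev/zero of=\$component_device bs=512 count=1 || true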

danboid commented at 2023-05-24 13:16:

I replaced

wipefs -a \$component_device

in usr/share/rear/layout/prepare/GNU/Linux/120_include_raid_code.sh

with

wipefs -af \$component_device

but rear recover still fails at the same step (disk layout recreation), now with a different error. When I quit out of rear it spews:

mdadm: super1.x cannot open /dev/sda2: Device or resource busy
mdadm: /dev/sda2 is not suitable for this array.
mdadm: super1.x cannot open /dev/sdb2: Device or resource busy
mdadm: /dev/sdb2 is not suitable for this array.
mdadm: create aborted

So yes, I think -af is needed, but that's not all that needs to change.

danboid commented at 2023-05-24 13:24:

Woohoo!

It works!

I didn't change anything else - I just re-ran rear recover after my failed attempt and it worked on the second try. Maybe it's just a case of adding a short sleep after running wipefs?
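
If the timing theory is right, the idea would be to let udev finish processing the freshly wiped members before mdadm --create runs. A minimal standalone sketch of that idea (not the actual generated recreation script; device and array names are taken from this test box):

for component_device in /dev/sda2 /dev/sdb2 ; do
    wipefs -af $component_device
    udevadm settle   # wait until udev has processed the wipe
    sleep 1          # the short pause suggested above
done
mdadm --create /dev/md0 --level=raid1 --raid-devices=2 /dev/sda2 /dev/sdb2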

pcahyna commented at 2023-05-24 13:27:

Ah! Now I see it: In usr/share/rear/layout/prepare/GNU/Linux/131_include_filesystem_code.sh we have

cleanup_command="wipefs --all --force $device || wipefs --all $device || dd if=/dev/zero of=$device bs=512 count=1 || true"

https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/layout/prepare/GNU/Linux/131_include_filesystem_code.sh#L35 BUT in usr/share/rear/layout/prepare/GNU/Linux/120_include_raid_code.sh we have only

wipefs -a \$component_device

https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/layout/prepare/GNU/Linux/120_include_raid_code.sh#L93

Not particularly related to the issue here, but to me it seems that wiping a filesystem that is currently mounted is a bad idea.

danboid commented at 2023-05-24 14:12:

I made a very similar comment, @pcahyna - it seems unnecessary to me too.

I was able to successfully boot the restored system from either disk after unplugging the other, so yes, multiple UEFI partitions do seem to work with rear and mdadm.

jsmeix commented at 2023-05-25 08:16:

The wipefs '--force' option originated in
https://github.com/rear/rear/issues/1327

+++ wipefs -a /dev/sda1
wipefs: /dev/sda1: ignoring nested "dos" partition table on non-whole disk device
wipefs: Use the --force option to force erase.
+++ mkfs -t ext3 -b 4096 -i 16384 -U 5dc25119-fc7c-4d93-93fb-2b26a6916036 /dev/sda1
mke2fs 1.42.11 (09-Jul-2014)
Found a dos partition table in /dev/sda1
Proceed anyway? (y,n)

at
https://github.com/rear/rear/issues/1327#issuecomment-296662350
and it was confirmed in
https://github.com/rear/rear/issues/1327#issuecomment-297002650
that it works this way,
so it was implemented via
https://github.com/rear/rear/commit/bcf7c1d2f528efd9e558a9c76f4fa632c2010cf1
which shows that before that change it had been

wipefs_command="wipefs -a $device"

which I had initially implemented via
https://github.com/rear/rear/commit/997d4b345bae9ec5f2c81c2df09bf4a87bc456a9

jsmeix commented at 2023-05-25 08:23:

I don't like to spend any more of my time in this snakepit.
I appreciate all valuable contributions to ReaR
which make this stuff actually work preferably
in all cases for all kinds of users with all their
different use cases ideally simply just automatically.

danboid commented at 2023-05-25 11:25:

@jsmeix

So you're saying that you don't want to add the wipefs force option? I can commit the change if you want. I think you're referring to whether or not we carry on using wipefs as it is, right?

I'm going to do some more testing with Ubuntu Server, mdadm and rear, maybe tomorrow if not soon. I'm going to test that md RAID-10 restores with 4 UEFI partitions work. I presume they will now that I've got md RAID-1 working, so if there's anything you'd like me to test while doing that, let me know.

jsmeix commented at 2023-05-25 12:03:

The wipefs force option got added because of a user's use case,
but now this is considered unnecessary and a bad idea,
so I leave it to others to implement something better
that behaves properly in all (possibly conflicting) cases.

jsmeix commented at 2023-06-02 09:55:

From my (limited) experience with several issues like this
that we have had here in the past, the common root cause is:
the replacement storage does not behave the same as pristine
new storage, i.e. the replacement disks had been used before
but were not completely zeroed out before "rear recover".

All those 'wipefs' and 'dd ... if=/dev/zero' things
during "rear recover" can only be best-effort attempts
to mitigate bad effects when "rear recover" runs on hardware
(or virtual machines) with used but not cleaned target disks.
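
In practice, "completely zero out" for previously used RAID members usually needs more than zapping the partition table, because mdadm superblocks and filesystem signatures live inside the old partitions. A hedged sketch of a more thorough manual cleanup from the rescue shell, assuming the target disks are /dev/sda and /dev/sdb (fully zeroing the whole disks with dd is the brute-force alternative, it just takes much longer):

# stop any auto-assembled arrays first so the disks are not busy
mdadm --stop --scan

# remove filesystem / RAID signatures from the old partitions,
# then wipe the signatures and partition tables on the whole disks
wipefs -a /dev/sda2 /dev/sdb2
wipefs -a /dev/sda /dev/sdb
sgdisk --zap-all /dev/sda
sgdisk --zap-all /dev/sdb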

danboid commented at 2023-06-07 12:51:

I have now tested using rear to recover an Ubuntu Server 20.04.6 mdadm RAID-10 install and it works fine, provided I choose option #2 (Confirm identical disk mapping and proceed without manual configuration) after running rear recover.

pcahyna commented at 2023-06-07 13:13:

@danboid if you have a setup that exhibited the problem, could you please try to investigate what the root cause was? See https://github.com/rear/rear/pull/2996#issuecomment-1572076921 .

danboid commented at 2023-06-07 19:44:

I'll do that on Friday. It seems I have another bug to report in rear 2.7, and I also have a suggestion or two regarding rear recover.


[Export of GitHub issue for rear/rear.]