#1722 Issue closed: Cannot restore disk built on multipath + md

Labels: enhancement, fixed / solved / done

rmetrich opened issue at 2018-02-01 14:57:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • rear version (/usr/sbin/rear -V):

rear 2.3

  • OS version (cat /etc/rear/os.conf or lsb_release -a):

RHEL 6 or 7

  • rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf):

AUTOEXCLUDE_MULTIPATH=n

  • Brief description of the issue:

When the system disk consists of a multipath disk combined with software RAID, recovering the system fails if the disk is already partitioned.

This is due to the software RAID array being assembled at boot time, when the disks are discovered. On RHEL, at least, this is done by a udev rule.
Because the array is already assembled, the multipath map cannot be created (the devices are busy).

  • Work-around, if any:

Stop the Software Arrays, and disable the udev rule.
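
For illustration only, a rough sketch of this work-around, done manually in the booted ReaR recovery system (assuming the auto-assembled array is /dev/md0 and the RHEL rule file name), could be:

# mdadm --stop /dev/md0
# ln -s /dev/null /etc/udev/rules.d/65-md-incremental.rules
# udevadm control --reload-rules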

rmetrich commented at 2018-02-01 15:03:

Some more details below.

  • rear 2.0 would fail trying to partition the multipath device /dev/mpathX

  • rear 2.3 proposes a disk mapping from /dev/mpathX to one of the real disks (e.g. /dev/sda), because it believes the multipath disk doesn't exist. Later, when parted runs, it fails with the following error:

+++ parted -s /dev/sda mklabel msdos
Error: Partition(s) 2 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.

The error is due to the use of /dev/sda by the MD array.

On RHEL, disabling the udev rule /lib/udev/rules.d/65-md-incremental.rules does the trick, because arrays are then no longer assembled automatically.
I don't know about other Linux variants ...

jsmeix commented at 2018-02-06 10:17:

@rmetrich
regarding "disabling the udev rule /lib/udev/rules.d/65-md-incremental.rules"
cf. what we already try to do in
usr/share/rear/layout/prepare/GNU/Linux/100_include_partition_code.sh
and see
https://github.com/rear/rear/commit/ff1bb730d38eb42e2abf866736cf188bef0b8b9b
and https://github.com/rear/rear/issues/533
for how doing such things unconditionally could have
bad consequences on other Linux distributions.

rmetrich commented at 2018-02-06 10:32:

What is tried in #533 doesn't work for multipath devices, because multipath devices are currently not "discovered" at boot: the software RAID arrays get assembled first, and multipath, once loaded, then fails to map the devices.

I do see two solutions:

  1. either disable software RAID assembly at boot completely (using my proposed udev rule overloading, sketched after this list)
  2. or include multipath support in the initramfs, as is currently done in the standard initramfs
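
For illustration, option 1 could be implemented in a ReaR build script by masking the rule inside the rescue system. This is only a hypothetical sketch (ROOTFS_DIR is the rescue filesystem tree that ReaR build scripts populate), not the actual change:

# hypothetical build-stage sketch: a file of the same name under /etc/udev/rules.d
# overrides /lib/udev/rules.d, and a symlink to /dev/null disables the rule,
# so no MD arrays are assembled automatically in the recovery system
ln -sf /dev/null "$ROOTFS_DIR/etc/udev/rules.d/65-md-incremental.rules"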

jsmeix commented at 2018-02-06 10:36:

@rmetrich
because I have zero experience with multipath
I agree with anything that you do for multipath.

Perhaps @schabrolles could have a look (as time permits)
because he did lots of improvements for multipath in ReaR.

jsmeix commented at 2018-02-06 13:14:

@schabrolles
I dared to assign this issue to you - even though you may not have time right now.
I will still have a look but as a multipath-noob I cannot actually help here
or do a meaningful review of a pull request.

jsmeix commented at 2018-06-05 08:57:

Because the initial description
https://github.com/rear/rear/issues/1722#issue-293560693
reads (excerpts):

... recovering the system if the disk is already partitioned will fail ...
... at boot time ... [something] ... is done using a udev rule ...
... devices are busy 

the actual root cause of this issue is old (meta)-data on the disk
which is known to cause arbitrary kind of weird failures for "rear recover"
and that ReaR still has no generic "cleanupdisk" script, see
https://github.com/rear/rear/issues/799

@rmetrich
could you check in the ReaR recovery system before you run "rear recover"
what disk device nodes (like /dev/sda) and - even more important - what
partition device nodes (like /dev/sda1, /dev/sda2, ...) already exist.

Could you test if it helps to run wipefs -a -f for all of them in reverse order,
i.e. first wipefs the partitions starting with the last one
and finally wipefs the disk itself, e.g. like

wipefs -a -f /dev/sda2
wipefs -a -f /dev/sda1
wipefs -a -f /dev/sda

cf. https://github.com/rear/rear/issues/799#issue-141001306

Probably to make the old partition device nodes go away
an additional parted /dev/sda mklabel command is needed
cf. https://github.com/rear/rear/issues/799#issuecomment-197286229
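
For a disk with an arbitrary number of partitions, a small sketch of that reverse-order wipe (illustration only, /dev/sda assumed, and assuming no stacked devices like MD arrays are active on top of the partitions) could be:

# wipe all partitions of the disk in reverse order, then the disk itself,
# and finally create a new empty msdos disk label
disk=/dev/sda
for part in $( lsblk -lnpo NAME "$disk" | tail -n +2 | sort -r ) ; do
    wipefs -a -f "$part"
done
wipefs -a -f "$disk"
parted -s "$disk" mklabel msdos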

rmetrich commented at 2018-06-05 15:10:

@jsmeix The issue is not solvable using wipefs.
Indeed, at boot, ReaR deliberately doesn't load the multipath driver, so the disks do not get mapped by device-mapper multipath.
However, the MD discovery mechanism still runs automatically (through the udev rule), so the MD arrays get reassembled automatically, based on the content of the /etc/mdadm.conf file.
For example, you will have this on SLES12sp3:

# cat /etc/mdadm.conf 
DEVICE containers partitions
ARRAY /dev/md0 UUID=2e09978c:ab423fe6:33d5210b:66a86e34

This causes the /dev/md0 array to be assembled from disks matching that UUID, as shown below:

# blkid
/dev/sda1: UUID="2e09978c-ab42-3fe6-33d5-210b66a86e34" UUID_SUB="d588af1a-de50-199f-bd5d-caac8f70ea16" LABEL="any:0" TYPE="linux_raid_member" PARTUUID="000d64bb-01"
/dev/sdb1: UUID="2e09978c-ab42-3fe6-33d5-210b66a86e34" UUID_SUB="5cfbaa1a-a277-2979-15c5-e2abb5b49dbc" LABEL="any:0" TYPE="linux_raid_member" PARTUUID="0006447c-01"
/dev/sdc1: UUID="2e09978c-ab42-3fe6-33d5-210b66a86e34" UUID_SUB="d588af1a-de50-199f-bd5d-caac8f70ea16" LABEL="any:0" TYPE="linux_raid_member" PARTUUID="000d64bb-01"
/dev/sdd1: UUID="2e09978c-ab42-3fe6-33d5-210b66a86e34" UUID_SUB="5cfbaa1a-a277-2979-15c5-e2abb5b49dbc" LABEL="any:0" TYPE="linux_raid_member" PARTUUID="0006447c-01"
/dev/md0: UUID="603f8753-f80a-4799-b862-ef2f5968bdb6" TYPE="ext4"
/dev/sr0: UUID="2018-06-05-14-43-15-00" LABEL="RELAXRECOVER" TYPE="iso9660"

In my case, it selected paths sdd1 and sdc1:

# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdd1[1] sdc1[0]
      20970368 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

Then, upon running "rear recover", multipath fails to create the multipath mapping (e.g. /dev/mpatha) because the disks are busy due to the MD array having been assembled.

jsmeix commented at 2018-06-06 09:11:

@rmetrich
thank you so much for your explanatory description of
how things behave differently in the multipath case.
It helps a lot!

If you find time to play around even more with it,
I would like to know if you think there is a generic way
to clean up a used disk so that such multipath issues
no longer happen.

As far as I can see, the UUID is the crucial point here,
because I assume that on an unused disk there is no
partition UUID that matches the UUID in /etc/mdadm.conf,
so that no MD array is automatically assembled.

If my assumption is right, it might be possible to somehow
clean all partition UUIDs from a disk as a generic solution
to make an already used disk look as if it were a new disk.

My basic idea behind such a generic "cleanupdisk" script
is to have a single place in the code (i.e. one script) that
does everything needed to make an already used disk
behave the same for "rear recover" as a new disk,
instead of fixing each particular issue with used disks
at various different places in the code.

But such a generic "cleanupdisk" script is for a future ReaR release.
At least for now your https://github.com/rear/rear/pull/1819
is perfectly fine.

jsmeix commented at 2018-06-06 09:32:

@rmetrich
can you explain why this issue is not solvable using wipefs in your case?

In my non-multipath case I can clean all disk partition UUIDs
from the disks by using wipefs in the recovery system:

RESCUE f121:~ # blkid
/dev/sdb1: UUID="e1996af3-bc9d-46fc-a5fc-a3aac09e63d9" TYPE="ext4" PARTUUID="00070810-01"
/dev/sda1: UUID="b754c8dc-cd7d-42fa-8edc-a69c02c54f68" TYPE="swap" PARTUUID="00046bb9-01"
/dev/sda2: UUID="83a3ba1c-7c87-4249-a0cd-a60c3e41ac8a" UUID_SUB="0e0ab27c-9995-4005-a533-1013dd616675" TYPE="btrfs" PARTUUID="00046bb9-02"
/dev/sr0: UUID="2018-05-09-16-17-41-00" LABEL="RELAXRECOVER" TYPE="iso9660"

RESCUE f121:~ # parted -s /dev/sda unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sda: 20480MiB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 
Number  Start    End       Size      Type     File system     Flags
 1      1.00MiB  1490MiB   1489MiB   primary  linux-swap(v1)  type=83
 2      1490MiB  20480MiB  18990MiB  primary  btrfs           boot, type=83

RESCUE f121:~ # parted -s /dev/sdb unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2048MiB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 
Number  Start    End      Size     Type     File system  Flags
 1      1.00MiB  2047MiB  2046MiB  primary  ext4         type=83

RESCUE f121:~ # ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Jun  6 09:16 /dev/sda
brw-rw---- 1 root disk 8,  1 Jun  6 09:16 /dev/sda1
brw-rw---- 1 root disk 8,  2 Jun  6 09:16 /dev/sda2
brw-rw---- 1 root disk 8, 16 Jun  6 09:16 /dev/sdb
brw-rw---- 1 root disk 8, 17 Jun  6 09:16 /dev/sdb1

RESCUE f121:~ # wipefs -a -f /dev/sdb1
/dev/sdb1: 2 bytes were erased at offset 0x00000438 (ext4): 53 ef

RESCUE f121:~ # wipefs -a -f /dev/sdb 
/dev/sdb: 2 bytes were erased at offset 0x000001fe (dos): 55 aa

RESCUE f121:~ # wipefs -a -f /dev/sda2
/dev/sda2: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d

RESCUE f121:~ # wipefs -a -f /dev/sda1
/dev/sda1: 10 bytes were erased at offset 0x00000ff6 (swap): 53 57 41 50 53 50 41 43 45 32

RESCUE f121:~ # wipefs -a -f /dev/sda 
/dev/sda: 2 bytes were erased at offset 0x000001fe (dos): 55 aa

RESCUE f121:~ # blkid
/dev/sr0: UUID="2018-05-09-16-17-41-00" LABEL="RELAXRECOVER" TYPE="iso9660"

RESCUE f121:~ # parted -s /dev/sda unit MiB print
Error: /dev/sda: unrecognised disk label
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sda: 20480MiB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags: 

RESCUE f121:~ # parted -s /dev/sdb unit MiB print
Error: /dev/sdb: unrecognised disk label
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2048MiB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags: 

RESCUE f121:~ # ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Jun  6 09:21 /dev/sda
brw-rw---- 1 root disk 8,  2 Jun  6 09:21 /dev/sda2
brw-rw---- 1 root disk 8, 16 Jun  6 09:21 /dev/sdb

RESCUE f121:~ # parted -s /dev/sda mklabel msdos

RESCUE f121:~ # parted -s /dev/sdb mklabel msdos

RESCUE f121:~ # parted -s /dev/sda unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sda: 20480MiB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 
Number  Start  End  Size  Type  File system  Flags

RESCUE f121:~ # parted -s /dev/sdb unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2048MiB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 
Number  Start  End  Size  Type  File system  Flags

RESCUE f121:~ # ls -l /dev/sd*
brw-rw---- 1 root disk 8,  0 Jun  6 09:26 /dev/sda
brw-rw---- 1 root disk 8, 16 Jun  6 09:27 /dev/sdb

RESCUE f121:~ # blkid
/dev/sr0: UUID="2018-05-09-16-17-41-00" LABEL="RELAXRECOVER" TYPE="iso9660"
/dev/sdb: PTUUID="000aaa99" PTTYPE="dos"
/dev/sda: PTUUID="00079970" PTTYPE="dos"

Now those already used disks look to me as if they were new disks.

But perhaps something subtle is still left somewhere, which would explain
why wipefs (in reverse order) plus parted mklabel msdos
is not yet sufficient in the multipath case.

rmetrich commented at 2018-06-06 14:33:

For MD devices, you first need to stop the array:

# mdadm --stop /dev/md0

Then wiping the partitions and disks effectively works: when "rear recover" executes afterwards, the multipath driver is brought up and the devices are mapped properly.
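
For example, the manual sequence that worked here could look like this (a sketch based on the layout above, where /dev/sda1 through /dev/sdd1 are the linux_raid_member paths):

# mdadm --stop /dev/md0
# wipefs -a -f /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# rear recover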

However, how do you want to do this automatic wipe? How will you select the devices to wipe? You cannot just wipe everything reported by blkid, otherwise USB keys connected to the host would be wiped as well.
Will you wipe based on /var/lib/rear/layout/disklayout.conf?

Even with wiping, this PR can still be useful in case the admin doesn't want a real recovery but only wants to fix some things in the system. I don't know whether that is a ReaR use case, however.

jsmeix commented at 2018-06-06 14:54:

Currently I do not know how to do an automatic wipe.

Currently I am trying to find out if it is possible at all
to generically wipe a disk at least for the common use cases.

I would try to wipe each active (i.e. non-commented) disk in disklayout.conf,
i.e. wipe each disk where diskrestore.sh would do partitioning.
Partitioning destroys all existing content on a disk anyway
(current "rear recover" cannot leave some old partitions unchanged
on a disk, cf. https://github.com/rear/rear/issues/1681),
so things cannot get worse when such a disk is wiped beforehand.
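
A rough sketch of that idea (not actual ReaR code; the "disk" line format of disklayout.conf and the lsblk-based partition listing are assumptions here) could look like:

# stop any auto-assembled MD arrays, then wipe every disk that
# diskrestore.sh would repartition, taken from the "disk" lines in
# /var/lib/rear/layout/disklayout.conf (e.g. "disk /dev/sda 21474836480 msdos")
mdadm --stop --scan
while read -r keyword device junk ; do
    test "$keyword" = "disk" || continue
    # wipe child partitions in reverse order, then the disk itself
    for part in $( lsblk -lnpo NAME "$device" | tail -n +2 | sort -r ) ; do
        wipefs -a -f "$part"
    done
    wipefs -a -f "$device"
done < /var/lib/rear/layout/disklayout.conf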

jsmeix commented at 2018-06-12 07:07:

With https://github.com/rear/rear/pull/1819 merged
this issue should be fixed.


[Export of Github issue for rear/rear.]