#3329 PR merged
: Fix partition naming on RHEL when migrating devices¶
Labels: bug
, fixed / solved / done
rmetrich opened issue at 2024-10-10 12:26:¶
Relax-and-Recover (ReaR) Pull Request Template¶
Please fill in the following items before submitting a new pull request:
Pull Request Details:¶
-
Type: Bug Fix
-
Impact: Normal
-
Reference to related issue (URL):
-
How was this pull request tested? Code tested in the following cases.
With QEMU SCSI disk ID "0000a":
rear> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 20G 0 disk
|-sda1 8:1 0 1G 0 part
|-sda2 8:2 0 19G 0 part
`-0QEMU_QEMU_HARDDISK_0000a 253:0 0 20G 0 mpath
|-0QEMU_QEMU_HARDDISK_0000a1 253:1 0 1G 0 part
`-0QEMU_QEMU_HARDDISK_0000a2 253:2 0 19G 0 part
sdb 8:16 0 20G 0 disk
|-sdb1 8:17 0 1G 0 part
|-sdb2 8:18 0 19G 0 part
`-0QEMU_QEMU_HARDDISK_0000a 253:0 0 20G 0 mpath
|-0QEMU_QEMU_HARDDISK_0000a1 253:1 0 1G 0 part
`-0QEMU_QEMU_HARDDISK_0000a2 253:2 0 19G 0 part
sr0 11:0 1 614.7M 0 rom
With QEMU SCSI disk ID "0000":
rear> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 20G 0 disk
|-sda1 8:1 0 1G 0 part
|-sda2 8:2 0 19G 0 part
`-0QEMU_QEMU_HARDDISK_0000 253:0 0 20G 0 mpath
|-0QEMU_QEMU_HARDDISK_0000p1 253:1 0 1G 0 part
`-0QEMU_QEMU_HARDDISK_0000p2 253:2 0 19G 0 part
sdb 8:16 0 20G 0 disk
|-sdb1 8:17 0 1G 0 part
|-sdb2 8:18 0 19G 0 part
`-0QEMU_QEMU_HARDDISK_0000 253:0 0 20G 0 mpath
|-0QEMU_QEMU_HARDDISK_0000p1 253:1 0 1G 0 part
`-0QEMU_QEMU_HARDDISK_0000p2 253:2 0 19G 0 part
sr0 11:0 1 614.7M 0 rom
With default friendly name ("mpatha"):
rear> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 20G 0 disk
|-sda1 8:1 0 1G 0 part
|-sda2 8:2 0 19G 0 part
`-mpatha 253:0 0 20G 0 mpath
|-mpatha1 253:1 0 1G 0 part
`-mpatha2 253:2 0 19G 0 part
sdb 8:16 0 20G 0 disk
|-sdb1 8:17 0 1G 0 part
|-sdb2 8:18 0 19G 0 part
`-mpatha 253:0 0 20G 0 mpath
|-mpatha1 253:1 0 1G 0 part
`-mpatha2 253:2 0 19G 0 part
sr0 11:0 1 614.7M 0 rom
With friendly name ending with a digit ("disk0"):
rear> lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 20G 0 disk
|-sda1 8:1 0 1G 0 part
|-sda2 8:2 0 19G 0 part
`-disk0 253:0 0 20G 0 mpath
|-disk0p1 253:1 0 1G 0 part
`-disk0p2 253:2 0 19G 0 part
sdb 8:16 0 20G 0 disk
|-sdb1 8:17 0 1G 0 part
|-sdb2 8:18 0 19G 0 part
`-disk0 253:0 0 20G 0 mpath
|-disk0p1 253:1 0 1G 0 part
`-disk0p2 253:2 0 19G 0 part
sr0 11:0 1 614.7M 0 rom
- Description of the changes in this pull request:
The previous code was appending a "p" to the device name
unconditionally, which ended up having partition names such as 'wwidp0'
instead of 'wwid1', when the device name (e.g. 'wwid') ended with a
letter and not a digit.
The new code applies the proper naming, which is 'wwid1' when device
doesn't end with a digit (e.g. 'wwid'), and 'wwid0000p1' when the device
(e.g. 'wwid0000') ends with a digit.
See also When WWID does not end in digit, no p
is added to the
partition name of a multipath device in
RHEL7
jsmeix commented at 2024-10-11 07:29:¶
@pcahyna
please have a look here and merge it as you like
jsmeix commented at 2024-10-11 07:36:¶
@rmetrich
it seems you could simulate and test multipath things via QEMU.
If yes, is there an explanatory description how to do that?
I.e. how to setup a VM to simulate a multipath system
on a usual workstation that has no multipath hardware.
I do not setup KVM/QEMU VMs via command line
but I use only the graphical UI of virt-manager.
Is it also possible to setup a VM with virt-manager
that simulates a multipath system?
rmetrich commented at 2024-10-11 07:47:¶
Yes you can, it's easy, at least on RHEL :-)
When you create the VM, specify to use a RAW disk instead of QCOW2 and
make sure SCSI controller is used instead of VirtIO. Also make sure to
not use Cache mode, set a Serial number for the disk and select
Shareable.
Finally add an additional disk, reuse the previously created disk and
set the cache mode to None and set Serial number to same value, and you
get your Multipath.
Example of XML:
<disk type="file" device="disk">
<driver name="qemu" type="raw" cache="none" discard="unmap"/>
<source file="/var/lib/libvirt/images/multipath9-recovery.img"/>
<target dev="sda" bus="scsi"/>
<shareable/>
<serial>0000</serial>
<address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>
<disk type="file" device="disk">
<driver name="qemu" type="raw" cache="none" discard="unmap"/>
<source file="/var/lib/libvirt/images/multipath9-recovery.img"/>
<target dev="sdb" bus="scsi"/>
<shareable/>
<serial>0000</serial>
<address type="drive" controller="0" bus="0" target="0" unit="1"/>
</disk>
jsmeix commented at 2024-10-11 13:58:¶
@rmetrich
thank you so much for your description!
I will try it out as soon as possible
i.e. when time permits - which may take some time
because currently I am messed up with overhyped
security excitement about cups-browsed when some
fools run it accessible from untrusted networks
like the public Internet, cf.
https://en.opensuse.org/SDB:CUPS_and_SANE_Firewall_settings#Automated_print_queue_setup_via_cups-browsed
pcahyna commented at 2024-10-16 16:42:¶
I think it is ok as a quick fix, but in longer term I would very much prefer to get rid of fragile text manipulations like this (note how many code branches there are and how many conditions they depend on). I believe the format of the partition device name can be determined from the layout, e.g. the layout contains
part /dev/mapper/362cea7f0a24d79002c16131a09f376a1 2199022206976 1048576 primary none /dev/mapper/362cea7f0a24d79002c16131a09f376a1p1
so we know that the partition prefix is
/dev/mapper/362cea7f0a24d79002c16131a09f376a1p
, if we know what
suffix to substract, and this can be done quite easily by adding the
partition number (1
) to the layout as an additional field.
jsmeix commented at 2024-10-17 08:26:¶
Regarding a future generic cleanup to get rid of fragile code:
@rmetrich @pcahyna
could you please attach your disklayout.conf files
(or excerpts of the relevant data in disklayout.conf)
for each of the different cases that you mentioned here
because I fail to imagine how your disklayout.conf
exactly looks like for all those different cases.
I like to understand to what extent we already know
in disklayout.conf for each child (partition) device
what its parent (disk/mpath) device is.
Because if we knew them like
parent (disk/mpath) device child (partition) device
0QEMU_QEMU_HARDDISK_0000a 0QEMU_QEMU_HARDDISK_0000a1
0QEMU_QEMU_HARDDISK_0000a 0QEMU_QEMU_HARDDISK_0000a2
0QEMU_QEMU_HARDDISK_0000 0QEMU_QEMU_HARDDISK_0000p1
0QEMU_QEMU_HARDDISK_0000 0QEMU_QEMU_HARDDISK_0000p2
mpatha mpatha1
mpatha mpatha2
disk0 disk0p1
disk0 disk0p2
362cea7f0a24d79002c16131a09f376a1 362cea7f0a24d79002c16131a09f376a1p1
we could generically determine what the used
"partition separator string" is (which could be empty)
by subtracting from the child (partition) device
all trailing numbers (i.e. the partition number) and
also subtracting the leading parent (disk/mpath) device.
When we know what "partition separator string" is used
(the "partition separator string" could be empty)
we can append it in any case to parent (disk/mpath) devices.
rmetrich commented at 2024-10-17 09:03:¶
rmetrich commented at 2024-10-17 09:04:¶
All is based on this disklayout, then I boot a different the recovery system in migration mode.
jsmeix commented at 2024-10-17 10:33:¶
Ah!
So it seems the problem appears when the ReaR recovery system
with its disklayout.conf of the original system
is booted on different replacement hardware and then
it can happen that on different replacement hardware
a different "partition separator string" is used
than what was used on the original system
which is in this case here
multipath /dev/mapper/0QEMU_QEMU_HARDDISK_0000c 21474836480 msdos /dev/sda,/dev/sdb
part /dev/mapper/0QEMU_QEMU_HARDDISK_0000c 1073741824 1048576 primary boot /dev/mapper/0QEMU_QEMU_HARDDISK_0000c1
part /dev/mapper/0QEMU_QEMU_HARDDISK_0000c 20400046080 1074790400 primary lvm /dev/mapper/0QEMU_QEMU_HARDDISK_0000c2
so the "partition separator string" on the original system
was empty in this case but on the replacement hardware
a different "partition separator string" gets used
like 'p' or '-part' or '_part' or whatever.
Do I understand it correctly?
If I understand it correctly I wonder
what the reason is why on different replacement hardware
a different "partition separator string" may get used?
I.e. what conditions or settings or automatisms
determine what "partition separator string" gets used?
I understand that a "partition separator string"
needs to be inserted when the parent (disk/mpath) device
ends with a number to separate the partition number but
I do not yet understand how on different replacement hardware
the parent (disk/mpath) device naming scheme can be different
(i.e. with or without a number at its end)
compared to the original system?
jsmeix commented at 2024-10-21 07:17:¶
@pcahyna
is there a reason why you don't merge it, cf. my
https://github.com/rear/rear/pull/3329#issuecomment-2406723276
or should I "just merge" it soon?
@rear/contributors
unless there are objections
I would like to merge it tomorrow afternoon.
pcahyna commented at 2024-10-21 10:29:¶
@jsmeix
is there a reason why you don't merge it
It's just that I was on a short holiday - but after that I found some things that are unclear with the code.
rmetrich commented at 2024-10-21 11:20:¶
It's done on line 1050, line 1049 is a condition, it will match either OS < RHEL7 (unsupported now) and cases where device name ends with a digit.
pcahyna commented at 2024-10-21 11:54:¶
@rmetrich and what happens if neither condition matches? I don't see any
else
there.
rmetrich commented at 2024-10-21 12:11:¶
Then it uses the default initialization, which is on line 1002: no device name change
jsmeix commented at 2024-10-21 12:27:¶
A side note for the fun of it:
What a pleasure for me to see how unclear code can be
for others who did not make a specific piece of code
(even when they are experienced with a project's code)
without comprehensive and explanatory comments ;-)
So the basic reasoning in
https://github.com/rear/rear/wiki/Coding-Style#code-must-be-easy-to-understand-answer-the-why
Even if all is totally obvious for you, others ...
may understand nothing at all about your code ...
is (at least to some extent) true.
pcahyna commented at 2024-10-21 12:44:¶
Then it uses the default initialization, which is on line 1002: no device name change
Ah ok, thanks. Line 1085 (former 1071) is quite confusing then.
pcahyna commented at 2024-10-21 12:47:¶
@jsmeix
A side note for the fun of it:
What a pleasure for me to see how unclear code can be for others who did not make a specific piece of code (even when they are experienced with a project's code) without comprehensive and explanatory comments ;-)
It is not a question of comments in this case, the problem is that the
structure of the code - the other branches and especially line
https://github.com/rear/rear/blob/ba835cd47f6abb97ff08814e24f6eb8fa016c1eb/usr/share/rear/lib/layout-functions.sh#L1071
indicates that all cases need to be covered (there is always an else
or a default case) and the new code breaks this pattern.
rmetrich commented at 2024-10-21 12:48:¶
Then it uses the default initialization, which is on line 1002: no device name change
Ah ok, thanks. Line 1085 (former 1071) is quite confusing then.
It's debian code, I didn't change this code because i have no debian.
pcahyna commented at 2024-10-21 12:50:¶
Then it uses the default initialization, which is on line 1002: no device name change
Ah ok, thanks. Line 1085 (former 1071) is quite confusing then.
It's debian code, I didn't change this code because i have no debian.
Sorry, I meant line 1071 (former 1085) - this is not for Debian, it is the default case.
jsmeix commented at 2024-10-21 13:14:¶
Yes, I see,
it is not a question of comments in this case
but in general a question of the basic ideas in
https://github.com/rear/rear/wiki/Coding-Style
I think in this case it is (at least also) a question
of basically meaningless variable names like
part_name="$device_name"
where that line of code on its own tells
(at least to me) nothing what it is meant to do.
So one must reverse engineer major code parts
to build up some idea what it may actually do.
But this is not something @rmetrich introduced
so he should not be made responsible here
to clean up and overhaul old code.
[Export of Github issue for rear/rear.]