#2750 PR merged: multipath: fix exclusion of still wanted devices

Labels: bug, fixed / solved / done

rmetrich opened issue at 2022-02-01 14:21:

Relax-and-Recover (ReaR) Pull Request Template

Please fill in the following items before submitting a new pull request:

Pull Request Details:
  • Type: Bug Fix

  • Impact: High

  • How was this pull request tested? On RHEL8 with Upstream ReaR

  • Brief description of the changes in this pull request:

The current code excluding multipath devices is broken when a device being excluded matches other devices.
This leads to excluding wanted devices.
This happens when having custom alias for multipath devices or there are more than 26 multipath devices and 'mpatha' is getting excluded, which leads to excluding all 'mpathaX' devices are well.

See example below:

  • have /boot on a dedicated multipath device 'mpathba'
  • have / on a dedicated multipath device 'mpatha' and LVM vg 'rhel'
  • have a dedicated multipath device 'mpathb' and LVM vg 'data'
  • tell ReaR to only include 'rhel' VG (and not 'data' VG)
  • tell ReaR to not auto-exclude multipath devices (otherwise everything
    gets commented out)
$ lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda               8:0    0    5G  0 disk
└─mpathba       253:1    0    5G  0 mpath
  └─mpathba1    253:2    0    1G  0 part  /boot
sdb               8:16   0    5G  0 disk
└─mpathba       253:1    0    5G  0 mpath
  └─mpathba1    253:2    0    1G  0 part  /boot
sdc               8:32   0   20G  0 disk
└─mpatha        253:0    0   20G  0 mpath
  ├─mpatha1     253:3    0    1G  0 part
  └─mpatha2     253:4    0   19G  0 part
    ├─rhel-root 253:5    0   10G  0 lvm   /
    └─rhel-swap 253:6    0    2G  0 lvm   [SWAP]
sdd               8:48   0   20G  0 disk
└─mpatha        253:0    0   20G  0 mpath
  ├─mpatha1     253:3    0    1G  0 part
  └─mpatha2     253:4    0   19G  0 part
    ├─rhel-root 253:5    0   10G  0 lvm   /
    └─rhel-swap 253:6    0    2G  0 lvm   [SWAP]
sde               8:64   0    2G  0 disk
└─mpathb        253:7    0    2G  0 mpath
sdf               8:80   0    2G  0 disk
└─mpathb        253:7    0    2G  0 mpath
sr0              11:0    1 1024M  0 rom

$ vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  data   1   0   0 wz--n-  <2.00g <2.00g
  rhel   1   2   0 wz--n- <19.00g <7.00g

$ cat /etc/rear/local.conf
ONLY_INCLUDE_VG=("rhel")
AUTOEXCLUDE_MULTIPATH=n

With original code:

$ rear mkrescue
$ grep -v ^# /var/lib/rear/layout/disklayout.conf | grep "mpathba"
--> device excluded even though it hosts /boot

With the fix:

$ rear mkrescue
$ grep -v ^# /var/lib/rear/layout/disklayout.conf | grep "mpathba"
fs /dev/mapper/mpathba1 /boot xfs ...
multipath /dev/mapper/mpathba 5368709120 msdos /dev/sda,/dev/sdb
part /dev/mapper/mpathba 1073741824 1048576 primary none /dev/mapper/mpathba1

The root cause behind this is ReaR excludes 'mpathb' (hosting the 'data'
VG) through using a non-word matching grep command, which causes
'mpathba' (hosting /boot) to be excluded as well.

pcahyna commented at 2022-02-01 15:30:

looks good, grep -w is already being used at other places, so it should be safe. I am actually wondering how many other places do not use -w when they should, given that grep is fairly widespread in layout code.

jsmeix commented at 2022-02-02 07:22:

@pcahyna
personally I do not use grep -w when I grep in disklayout.conf
because the word separator characters in grep -w are different
than the word separator characters in bash (i.e. $IFS)
according to how I understand "man grep"

-w, --word-regexp
  Select only those lines containing matches that form whole words.
  The test is that the matching substring must
  either be at the beginning of the line,
  or preceded by a non-word constituent character.
  Similarly, it must be either at the end of the line
  or followed  by a non-word constituent character.
  Word-constituent characters are letters, digits, and the underscore.

therein in particular the last line so word separator characters in grep -w
are all characters except letters, digits, and the underscore.

To grep for a keyword of an active line in disklayout.conf I use

grep '^keyword '

with a trailing space character.
To grep for a value in the middle of a line in disklayout.conf I use

grep ' value '

with a leading and a trailing space character.
Preferably I get one whole line into a bash array
and pick the intended array element like

# disk_device="/dev/sda"
# disk_entry=( $( grep "^disk $disk_device " $LAYOUT_FILE ) )
# disk_size=${disk_entry[2]}

because this works same as

# disk_device="/dev/sda"
# read keyword disk_device disk_size disk_label < <( grep "^disk $disk_device " $LAYOUT_FILE )

which is the usual method when we iterate over several entries in disklayout.conf

Accordingly there are some more issues in
layout/save/default/335_remove_excluded_multipath_vgs.sh

done < <(grep "^#lvmdev" $LAYOUT_FILE)
...
done < <(grep "^multipath" $LAYOUT_FILE)

should be

done < <(grep "^#lvmdev " $LAYOUT_FILE)
...
done < <(grep "^multipath " $LAYOUT_FILE)

i.e. each with a trailing space.

In

    # ... - an entry looks like:
    # #lvmdev /dev/h50l050vg00 /dev/mapper/360060e8007e2e3000030e2e30000449f2 Nn3ew5-Wkve-FpSY-mgng-3T0l-rSz1-EEvPrE 502288384
    # We need to 'cut -c1-45' (third arg) to grab the full multipath device and not only a partition
    # Remember, multipath devices from a volume group that is "excluded" should be 'commented out'
    device=$(echo $mpdev | cut -c1-45)

there is no check that $mpdev has the form /dev/mapper/...
so any <device> of any commented #lvmdev line in disklayout.conf
will be used regardless if that <device> has the expected form /dev/mapper/...
cf. "Physical Volumes" in "Disk layout file syntax" in
https://github.com/rear/rear/blob/master/doc/user-guide/06-layout-configuration.adoc

FYI:
layout/save/default/335_remove_excluded_multipath_vgs.sh
was added because of
https://github.com/rear/rear/issues/1211

jsmeix commented at 2022-02-02 07:29:

@pcahyna
I dared to assign it to you.
Feel free to merge it when it is OK for you.

jsmeix commented at 2022-02-02 07:32:

@rmetrich
thank you for your continuous contributions to ReaR!

pcahyna commented at 2022-02-02 08:21:

@jsmeix

... are all characters except letters, digits, and the underscore.

thanks for noticing that, indeed that does not sound good. It is easy to imagine that someone would use a dash in a device name, for example.

To grep for a keyword of an active line in disklayout.conf I use

grep '^keyword '

with a trailing space character.
Accordingly there are some more issues in layout/save/default/335_remove_excluded_multipath_vgs.sh

done < <(grep "^#lvmdev" $LAYOUT_FILE)
...
done < <(grep "^multipath" $LAYOUT_FILE)

should be

done < <(grep "^#lvmdev " $LAYOUT_FILE)
...
done < <(grep "^multipath " $LAYOUT_FILE)

i.e. each with a trailing space.

I think that keywords are fairly safe, because they come from a well-known set of possibilities, unlike values, which can be virtually anything. Unless someone introduces two keywords where one is a substring of the other, like "lvm" and "lvmdev" or "multipath" and "multipathcomponent", hmmm.

rmetrich commented at 2022-02-02 08:38:

@jsmeix

... are all characters except letters, digits, and the underscore.

thanks for noticing that, indeed that does not sound good. It is easy to imagine that someone would use a dash in a device name, for example.

Right, not good enough then. Let me confirm this doesn't work with custom aliases even with the fix.

jsmeix commented at 2022-02-02 08:47:

I think usually grep -w "$value" is good enough because:

echo "beginfoo foo-bar foo_baz foo1.2 fooend" | grep -w 'foo'
beginfoo foo-bar foo_baz foo1.2 fooend
         ^^^

finds only the "foo" in "foo-bar"
but usually one does not search for a substring of a value but for a whole value so

# for v in beginfoo foo-bar foo_baz foo1.2 fooend ; do echo "beginfoo foo-bar foo_baz foo1.2 fooend" | grep -w "$v" ; done
beginfoo foo-bar foo_baz foo1.2 fooend
^^^^^^^^
beginfoo foo-bar foo_baz foo1.2 fooend
         ^^^^^^^
beginfoo foo-bar foo_baz foo1.2 fooend
                 ^^^^^^^
beginfoo foo-bar foo_baz foo1.2 fooend
                         ^^^^^^
beginfoo foo-bar foo_baz foo1.2 fooend
                                ^^^^^^

finds each specific value at its exact place.

jsmeix commented at 2022-02-02 08:55:

Regarding grep for keyword without trailing space:

Currently we have disk and opaldisk officially documented in
https://github.com/rear/rear/blob/master/doc/user-guide/06-layout-configuration.adoc

But since my
https://github.com/rear/rear/commit/7a944cf059135a044940408bbdc9c0c77c299322
we also have raid and the new raiddisk which is not yet officially documented
because I am not yet finished with the RAID autoresize code so I like to be able
to rename the new raiddisk if I need.
So grep ^raid $LAYOUT_FILE would find raid and the new raiddisk.
I checked the RAID code that it is safe.
I cannot rename raid to raidarray because the user could use his
specific $CONFIG_DIR/disklayout.conf see
layout/prepare/default/010_prepare_files.sh
so we must keep things backward compatible in disklayout.conf
On the other hand we did change things in disklayout.conf
so we should document that $CONFIG_DIR/disklayout.conf
must be adapted by the user when he updates/upgraded ReaR
and then I could rename raid to raidarray.

rmetrich commented at 2022-02-02 09:35:

grep -w isn't safe indeed, tested with /boot on /dev/mapper/custom-boot device and data VG on /dev/mapper/custom device: /boot was getting excluded as well.

jsmeix commented at 2022-02-02 09:44:

Ah! Yes - it fails when the whole value1 in grep -w value1
is a substring of another value2
where value2 contains value1 as a word for grep -w

jsmeix commented at 2022-02-03 14:13:

To make it officially documented what the de-facto syntax in disklayout.conf is
I added in doc/user-guide/06-layout-configuration.adoc
in the "Disk layout file syntax" section that in disklayout.conf
keyword and the values are separated by single space characters via
https://github.com/rear/rear/commit/b1ccb948bdfd83dc904937a74ae4d3d13ecb8963

This is needed because we depend on it at many places in the code
that keyword and values are separated by space characters.

pcahyna commented at 2022-02-09 22:37:

I prefer spaces over \s, for three reasons:

  • \s is hard to read, as @jsmeix has pointed out
  • spaces are consistent with the current style (and if we weant to change the style and eventually change the current occurrences, the code will become even harder to read)
  • \s is unneeded: the format of the file is under our control, we make sure that there spaces and not tabs.

rmetrich commented at 2022-02-10 08:15:

OK fixed, thanks for confirming.
I'd go with stashing the commits, no need for complete history here :-)

pcahyna commented at 2022-02-13 22:59:

ITYM squashing, not stashing, and I agree.

pcahyna commented at 2022-02-14 10:49:

I tested the original change (fails) and the latest change (succeeds) with / being on mpatha-a and excluded LV being on mpatha.
Reproducer:

# lsblk
NAME          MAJ:MIN RM SIZE RO TYPE  MOUNTPOINT
sda             8:0    0  40G  0 disk  
└─mpatha-a    253:0    0  40G  0 mpath 
  ├─mpatha-a1 253:2    0   4M  0 part  
  ├─mpatha-a2 253:3    0   1G  0 part  /boot
  ├─mpatha-a3 253:4    0   4G  0 part  [SWAP]
  ├─mpatha-a4 253:5    0   1K  0 part  
  └─mpatha-a5 253:6    0  35G  0 part  /
sdb             8:16   0   5G  0 disk  
└─mpatha-b    253:1    0   5G  0 mpath 
sdc             8:32   0   5G  0 disk  
└─mpatha-c    253:7    0   5G  0 mpath 
sdd             8:48   0   5G  0 disk  
└─mpatha-d    253:8    0   5G  0 mpath 
sde             8:64   0   5G  0 disk  
└─mpatha-e    253:9    0   5G  0 mpath 
# multipath -f mpatha-c
# sed -i.orig 's/mpatha-c /mpatha /' /etc/multipath/bindings
# systemctl restart multipathd
# lsblk
NAME          MAJ:MIN RM SIZE RO TYPE  MOUNTPOINT
sda             8:0    0  40G  0 disk  
└─mpatha-a    253:0    0  40G  0 mpath 
  ├─mpatha-a1 253:2    0   4M  0 part  
  ├─mpatha-a2 253:3    0   1G  0 part  /boot
  ├─mpatha-a3 253:4    0   4G  0 part  [SWAP]
  ├─mpatha-a4 253:5    0   1K  0 part  
  └─mpatha-a5 253:6    0  35G  0 part  /
sdb             8:16   0   5G  0 disk  
└─mpatha-b    253:1    0   5G  0 mpath 
sdc             8:32   0   5G  0 disk  
└─mpatha      253:7    0   5G  0 mpath 
sdd             8:48   0   5G  0 disk  
└─mpatha-d    253:8    0   5G  0 mpath 
sde             8:64   0   5G  0 disk  
└─mpatha-e    253:9    0   5G  0 mpath 
# yum -y install lvm2
# echo AUTOEXCLUDE_MULTIPATH=n >> /etc/rear/local.conf
# pvcreate /dev/mapper/mpatha
# vgcreate unused /dev/mapper/mpatha
# echo 'EXCLUDE_VG=("unused")' >> /etc/rear/local.conf
# rear savelayout
# grep -v '^#' /var/lib/rear/layout/disklayout.conf

The system was installed with mpatha- as the original multipath prefix.
This can be achieved using alias_prefix mpatha- in /etc/multipath.conf when installing
(use

sed -i.orig -e '/user_friendly_names/a\\
alias_prefix mpatha-' /etc/multipath.conf
systemctl restart multipathd

in kickstart %pre)

pcahyna commented at 2022-02-14 11:11:

Note that the code still looks fragile. Is it guaranteed that /dev/mapper/... can appear only in a disklayout field where we expect it, and not in some other field? And will cut -c1-45 always work? But let's worry about it later.

pcahyna commented at 2022-02-14 12:07:

@rmetrich is the PR ready to be merged? Looks so now.

rmetrich commented at 2022-02-14 12:27:

It is, thanks!

jsmeix commented at 2022-02-14 12:35:

@rmetrich @pcahyna
thank you for the fix and your verification tests!


[Export of Github issue for rear/rear.]