#2597 PR closed: New ONLY_INCLUDE_DISKS variable

Labels: enhancement, discuss / RFC, no-pr-activity

rmetrich opened issue at 2021-04-07 08:15:

Relax-and-Recover (ReaR) Pull Request Template

Please fill in the following items before submitting a new pull request:

Pull Request Details:
  • Type: Enhancement

  • Impact: Normal

  • Reference to related issue (URL): None

  • How was this pull request tested?

Tested on multipath hardware from a customer. Saved 40 minutes creating the Rescue ISO.

  • Brief description of the changes in this pull request:

This array variable makes it possible to process only the specified disks when building the disklayout.conf file.
This is particularly useful when there are a lot of multipath devices which are slow to scan but excluded in the end anyway.

This is for experts only, since it may end up creating an inconsistent disklayout.conf file.
In particular the EXCLUDE_MOUNTPOINTS variable is not populated automatically because doing so would likely break things.

Usage example:

ONLY_INCLUDE_DISKS=( sda sdb )

or

ONLY_INCLUDE_DISKS=( /sys/block/sda )

pcahyna commented at 2021-04-07 08:34:

While saving 40 minutes in mkrescue sounds attractive, I am not sure whether adding yet another knob, especially one that "is for experts only, since it may end up creating an inconsistent disklayout.conf file", is the way to go.

Would it be possible to do the optimization automatically by scanning only the disks actually referenced by the objects (filesystems, VGs) that one wants to include in the backup and avoid even looking at those that are not referenced?

gdha commented at 2021-04-07 11:35:

@rmetrich could you not exclude all other block devices via the ONLY_INCLUDE_VG variable? I'm afraid that the new ONLY_INCLUDE_DISKS variable will lead to more confusion for end users. And what happens when both are used and a disk gets removed that should not be? We must be careful with exclude and include rules - less is better IMHO

rmetrich commented at 2021-04-07 11:39:

@gdha yes, it's not perfect for sure. The issue with ONLY_INCLUDE_VG is that it doesn't prevent scanning the devices, because all the code flows from the bottom up: first scan the devices, then exclude stuff.
Whereas here we would need to do the opposite: include the VG, determine how the VG is built, then go down to selecting the devices.

jsmeix commented at 2021-04-07 13:02:

Caution!
Long and elaborate text about generic things:

As far as I can see the whole include/exclude area
has grown into a "hairy" mess over time, cf.
https://github.com/rear/rear/issues/2229#issuecomment-531264805

Usually (i.e. in commonly used and tested cases) things work well.
But in special and/or exceptional cases things fall apart, and what is worst:
Currently (as far as I can see) there is no simple, clear and consistently working way
for the user to specify which disk layout components he wants to get recreated
and which mountpoints or directories he wants to get included in the backup.

Including/excluding components like disks/partitions/filesystems
is different from what gets included/excluded in the backup.
There are some relationships but in general both are different things.

See
https://github.com/rear/rear/issues/2586#issuecomment-805601063
and the therein linked comments in
https://github.com/rear/rear/issues/2229
and also
https://github.com/rear/rear/issues/2236#issuecomment-531204474
which is also a special case for multipath.

For ReaR's internal BACKUP method I added some time ago
BACKUP_ONLY_INCLUDE and BACKUP_ONLY_EXCLUDE
cf. https://github.com/rear/rear/pull/1091
so that the user could specify what he wants to get in his backup
by keeping ReaR's automatisms out of his way.

Currently there is no counterpart for the disk layout
where the user could specify what he wants to get into his disklayout.conf
while keeping ReaR's automatisms out of his way.

The problem with including/excluding disk layout components
is their interdependencies.

I think the following comment in default.conf
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L2749

# How to exclude something ----- EXCLUDES -------
#
# You cannot exclude a device (e.g. /dev/sdg) directly.
# Instead you have to exclude everything ON that device
# and then the dependency tracker will automatically exclude
# the device from the recovery (because there won't be any
# recovery information for that "unnecessary" device).
#
# Furthermore, you have to exclude MD devices and LVM2
# volume groups separately as there is no automatic detection
# of these dependencies.

points to the root cause of why the include/exclude functionality
for disk layout components in ReaR behaves in such a "hairy" way:

It is ReaR's automatisms that
on the one hand make things work well in commonly used and tested cases
BUT
on the other hand let things fall apart in special and/or exceptional cases
(because no automatism can behave perfectly for all use cases).

I think I need to sleep on it before a reasonable opinion appears
on whether or not I like a new ONLY_INCLUDE_DISKS config variable.

In general I want "final power to the user" so in general
I like when the user can switch off ReaR's automatisms
and command ReaR what it must do (where the user takes
full responsibility for what he commands - of course).

jsmeix commented at 2021-04-07 13:09:

@rmetrich
could you provide the output of

# find /sys/block -ls

# lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT

on such a machine with a lot of multipath devices?
I really have no idea what "a lot of multipath devices" actually means.
Are there tens or hundreds or thousands?

rmetrich commented at 2021-04-07 13:18:

First the dm-XX nodes:

lrwxrwxrwx. 1 root root 0  6. Apr 09:24 dm-0 -> ../devices/virtual/block/dm-0
lrwxrwxrwx. 1 root root 0  6. Apr 09:24 dm-1 -> ../devices/virtual/block/dm-1
lrwxrwxrwx. 1 root root 0  6. Apr 16:53 dm-10 -> ../devices/virtual/block/dm-10
lrwxrwxrwx. 1 root root 0  6. Apr 16:54 dm-100 -> ../devices/virtual/block/dm-100
lrwxrwxrwx. 1 root root 0  6. Apr 17:00 dm-101 -> ../devices/virtual/block/dm-101
lrwxrwxrwx. 1 root root 0  6. Apr 16:54 dm-102 -> ../devices/virtual/block/dm-102
lrwxrwxrwx. 1 root root 0  6. Apr 17:00 dm-103 -> ../devices/virtual/block/dm-103
lrwxrwxrwx. 1 root root 0  6. Apr 16:54 dm-104 -> ../devices/virtual/block/dm-104
...

Then physical devices:

lrwxrwxrwx. 1 root root 0  6. Apr 09:21 sdzq -> ../devices/pci0000:17/0000:17:00.0/0000:18:00.1/host18/rport-18:0-0/target18:0:0/18:0:0:28/block/sdzq
lrwxrwxrwx. 1 root root 0  6. Apr 09:21 sdzr -> ../devices/pci0000:17/0000:17:00.0/0000:18:00.1/host18/rport-18:0-0/target18:0:0/18:0:0:29/block/sdzr
lrwxrwxrwx. 1 root root 0  6. Apr 09:21 sdzs -> ../devices/pci0000:17/0000:17:00.0/0000:18:00.1/host18/rport-18:0-0/target18:0:0/18:0:0:30/block/sdzs
...

--> 2653 "/devices/pci..." devices there.
All scanned.

jsmeix commented at 2021-04-07 13:37:

@rmetrich
does the
lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT
output perhaps indicate how the actually interesting block devices
could be distinguished from the uninteresting block devices?

What I have in mind is:

When looking at that code part in layout/save/GNU/Linux/200_partition_layout.sh

    for disk in /sys/block/* ; do
        blockd=${disk#/sys/block/}
        if [[ $blockd = hd* || $blockd = sd* || $blockd = cciss* || $blockd = vd* || $blockd = xvd* || $blockd = dasd* || $blockd = nvme* || $blockd = mmcblk* ]] ; then
...
            if is_multipath_path ${blockd} ; then
                Log "Ignoring $blockd: it is a path of a multipath device"
            elif [[ ! ($blockd = *rpmb || $blockd = *[0-9]boot[0-9]) ]]; then # Silently skip Replay Protected Memory Blocks and others  
                devname=$(get_device_name $disk)
                devsize=$(get_disk_size ${disk#/sys/block/})
                disktype=$(parted -s $devname print | grep -E "Partition Table|Disk label" | cut -d ":" -f "2" | tr -d " ")

                echo "# Disk $devname"
                echo "# Format: disk <devname> <size(bytes)> <partition label type>"
                echo "disk $devname $devsize $disktype"

                echo "# Partitions on $devname"
                echo "# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>"
                extract_partitions "$devname"
            fi
        fi
    done

the interesting block devices are those where
commands like parted -s $devname print make sense
i.e. the interesting block devices are what is commonly known
as "real disks" with partitions on it.

I wonder if there isn't a more direct way to get the intended "real disks"
than to first list all block devices and then sort out the unintended ones.

I know basically nothing about multipath so the following could be plain wrong:
I think the problem with multipath is that one "real disk behind multipath" appears
as several duplicates under normal /dev/sd[a-z]* disk device nodes.
Because normal /dev/sd[a-z]* disk device nodes must normally be considered,
it means for multipath that all those multipath duplicates must also be considered
and sorted out via is_multipath_path ${blockd}, which costs time :-(

jsmeix commented at 2021-04-07 13:57:

@pcahyna
perhaps in the function is_multipath_path in lib/layout-functions.sh
the most time-consuming part is not the device-specific call

multipath -c /dev/$1

but the listing of the whole multipath topology via

multipath -l | grep -q '[[:alnum:]]' || return 1

see my comment in the code why I added that call
https://github.com/rear/rear/blob/master/usr/share/rear/lib/layout-functions.sh#L718
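
For reference, a simplified sketch (not the exact upstream code) of how those two calls
relate inside is_multipath_path:

    function is_multipath_path () {
        # Return early when there is no multipath topology at all, so that
        # 'multipath -c' is not called needlessly on each plain disk:
        multipath -l | grep -q '[[:alnum:]]' || return 1
        # Device-specific check whether $1 is a path of a multipath device:
        multipath -c /dev/$1 &>/dev/null
    }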

jsmeix commented at 2021-04-07 14:02:

@rmetrich
could you try out how long it runs when you only comment out the line

multipath -l | grep -q '[[:alnum:]]' || return 1

in the function is_multipath_path in lib/layout-functions.sh ?

rmetrich commented at 2021-04-07 14:14:

I cannot, since it's a customer that has the issue, but I can paste the log taken in Debug mode here:

2021-04-02 11:40:15.927764930 No partitions found on /dev/sdaht.
2021-04-02 11:40:17.589797772 No partitions found on /dev/sdaig.
2021-04-02 11:40:17.615676066 No partitions found on /dev/sdaih.
...
2021-04-02 11:47:34.067632684 No partitions found on /dev/sdzx.
2021-04-02 11:47:34.106361186 No partitions found on /dev/sdzy.
2021-04-02 11:47:34.139715422 No partitions found on /dev/sdzz.

then

2021-04-02 11:49:19.249785504 Disk /dev/sdaa is not used by any mounted filesystem. Excluding.
2021-04-02 11:49:19.251896277 Marking /dev/sdaa as done.
2021-04-02 11:49:19.263201014 Marking opaldisk:/dev/sdaa as done.
2021-04-02 11:49:19.342139884 Marking /dev/sdaa1 as done (dependent on /dev/sdaa)
2021-04-02 11:49:19.344697017 Marking /dev/sdaa1 as done.
...

2021-04-02 11:53:10.810005723 Marking opaldisk:/dev/sdzy as done.
2021-04-02 11:53:10.844155949 Disk /dev/sdzz is not used by any mounted filesystem. Excluding.
2021-04-02 11:53:10.846765299 Marking /dev/sdzz as done.
2021-04-02 11:53:10.864604452 Marking opaldisk:/dev/sdzz as done.

So that's 10 minutes on that system (but the customer has larger ones).
Then there is the GRUB 2 probe running for 20 minutes:

2021-04-02 12:02:37.511409752 Including output/default/940_grub2_rescue.sh
2021-04-02 12:02:37.512429156 Entering debugscripts mode via 'set -x'.
2021-04-02 12:02:37.532636160 Setting up GRUB_RESCUE: Adding Relax-and-Recover rescue system to the local GRUB 2 configuration.
...
2021-04-02 12:23:50.177669407 Finished GRUB_RESCUE setup: Added 'Relax-and-Recover' GRUB 2 menu entry.

jsmeix commented at 2021-04-07 14:23:

@rmetrich
the customer could run once

# time multipath -l 2>&1 | wc -l

(the | wc -l is there to suppress the actual output but get an idea how much it is)
so we could get an idea how long it takes when multipath -l is run 2653 times.

pcahyna commented at 2021-04-07 16:10:

does perhaps the
lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT
output indicate something how the actually interesting block devices
could be distinguished from the non-interesting block devices?

lsblk --nodeps -o fstype /dev/sd... shows mpath_member for multipath paths. This information seems to be eventually obtained from udev.

I suspect that the multipath -c command actually does not look at the kernel multipath device tables, but checks whether the device should be a multipath path, instead of checking whether the device actually is a multipath path. It is even hinted at in the manual page ("-c Check if a block device should be a path in a multipath device."). This could be the reason both for this problem and for #2298.
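
For illustration, a check along these lines (just a sketch, assuming the mpath_member
FSTYPE reported by lsblk/udev is reliable) would avoid calling multipath entirely:

    # Sketch: detect a multipath path via lsblk/udev instead of 'multipath -c'
    is_mpath_member() {
        test "$( lsblk --nodeps --noheadings --output FSTYPE "/dev/$1" )" = "mpath_member"
    }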

pcahyna commented at 2021-04-07 16:44:

@rmetrich
the customer could run once

# time multipath -l 2>&1 | wc -l

(the | wc -l is there to suppress the actual output but get an idea how much it is)
so we could get an idea how long it takes when multipath -l is run 2653 times.

I don't think multipath -l is the problem here - the customer is probably using ReaR 2.4 from RHEL 8 which does not have this command yet (#2299 was not yet merged).

pcahyna commented at 2021-04-07 16:54:

Actually... @rmetrich maybe the problem lies not in any slowness of multipath -c, but in it returning wrong information. Can you please check whether multipath -c returns the correct information (return code 0) for all those /dev/sd* devices? You would see messages like "Ignoring $blockd: it is a path of a multipath device" in the log if paths are detected correctly (even without a debug log).

jsmeix commented at 2021-04-08 13:49:

@rmetrich @pcahyna
I have a general understanding question:

I do not understand the reason why a system
needs to have so many disk-like devices accessible.

Because
https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-multipath.html#sec-multipath-mpiotools
reads (excerpt)

17.4.1 Device Mapper Multipath Module
...
Configurations of up to eight paths to each device are supported.

it seems eight paths to a device is a reasonable maximum
that is used "out there in practice".

So when according to
https://github.com/rear/rear/pull/2597#issuecomment-814908043
there are 2653 sd[a-z]* device paths it seems
there are 379 dm-[0-9]* multipath devices
(because 7 is the only proper divisor <= 8 i.e. 2653 = 379 * 7)
or there is one normal system disk /dev/sda (non-multipath)
so there are 2652 sd[a-z]* multipath device paths with
either 1326 = 2652 / 2 dm-[0-9]* multipath devices
or 663 = 2652 / 4 dm-[0-9]* multipath devices
or 442 = 2652 / 6 dm-[0-9]* multipath devices.

Or do I misunderstand something here?
If not:
I wonder why there are hundreds of multipath devices accessible on a system.

Are those hundreds of disk-like devices perhaps
first combined into RAID arrays to get RAID devices
that are then perhaps used as LVM PVs to be
further combined into VGs and LVs so that
from hundreds of low level disk-like storage objects
only a few high level storage objects get created
that are finally mounted?

Or is there perhaps some lower level configuration mistake
(e.g. related to the SAN setup, perhaps no or insufficient SAN zoning?)
when a system has hundreds of disk-like devices accessible
but most of them are perhaps not meant to be used by that system?

The

lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT

output of such a system could really help to better understand its storage structure.

rmetrich commented at 2021-04-08 14:53:

It's not rare for Red Hat customers to have this many multipath devices on their systems.
Usually they host large databases and stuff like that.
So yes, here there is a single disk for the OS and the rest are "raw devices" we shouldn't even look at.
You can have similar devices with VxVM for example.

jsmeix commented at 2021-04-09 08:17:

@rmetrich
thank you for the info on what that system looks like.
That helped.
Now I understand.

they host large databases and stuff like that
...
there is a single disk for the OS and
the rest are "raw devices" we shouldn't even look at

This is a special case.
Even if that use case is relatively normal in enterprise server environments
usually each particular enterprise environment is an individual special case.
Special cases need special setup which needs special config variables
to also provide "final power to special enterprise server admins"
who are experienced users who know how to set special config variables
properly to make things work as they need in their enterprise environments.

Cf. the meaning behind of the "Dirty hacks welcome" section in
https://github.com/rear/rear/wiki/Coding-Style

So I agree with a new ONLY_INCLUDE_DISKS config variable.

jsmeix commented at 2021-04-09 08:38:

I am wondering if we should use
in default.conf

ONLY_INCLUDE_DISKS=()

and in layout/save/GNU/Linux/200_partition_layout.sh

for disk in ${ONLY_INCLUDE_DISKS[@]:-/sys/block/*} ; do

OR
if we should perhaps rather use
in default.conf

CONSIDER_DISKS=( /sys/block/* )

and in layout/save/GNU/Linux/200_partition_layout.sh

for disk in ${CONSIDER_DISKS[@]} ; do

which looks simpler and more straightforward to me i.e. more KISS and
less RFC 1925 item 6a "It is always possible to add another level of indirection".

Now I see that the config variable name ONLY_INCLUDE_DISKS is wrong.
It should be ONLY_CONSIDER_DISKS (or something else that conveys its actual meaning)
because what is specified in that array is not what will be (only) included,
since certain disks will still get automatically excluded by the hardcoded
conditions in layout/save/GNU/Linux/200_partition_layout.sh

pcahyna commented at 2021-04-09 08:40:

@rmetrich
thank you for the info on what that system looks like.
That helped.
Now I understand.

they host large databases and stuff like that
...
there is a single disk for the OS and
the rest are "raw devices" we shouldn't even look at

This is a special case.
Even if that use case is relatively normal in enterprise server environments
usually each particular enterprise environment is an individual special case.
Special cases need special setup which needs special config variables
to also provide "final power to special enterprise server admins"
who are experienced users who know how to set special config variables
properly to make things work as they need in their enterprise environments.

Cf. the meaning behind of the "Dirty hacks welcome" section in
https://github.com/rear/rear/wiki/Coding-Style

So I agree with a new ONLY_INCLUDE_DISKS config variable.

I will note that this special case should already be adequately covered by the current code that skips multipath paths, so I would like to determine why it does not work as expected first. Otherwise you may just introduce a dirty hack which does not in fact enable some new special case, but is merely a workaround for another part of the code not working properly.

jsmeix commented at 2021-04-09 08:50:

@pcahyna
I agree that we need to find out why it is so slow for thousands of
multipath path devices in /sys/block/*

Perhaps we find out that we cannot get automated exclusion of
thousands of multipath path devices sufficiently fast,
cf. https://github.com/rear/rear/pull/2597#issuecomment-814950019
"there is Grub2 probe running for 20 minutes"
(we could not make the GRUB 2 probe run faster).

My general intent is to provide "final power to all users" over
what layout components get included and what gets excluded.

Perhaps my above CONSIDER_DISKS=( /sys/block/* ) proposal
points in the right direction.

Perhaps we could have INCLUDE_DISKS=() as the default, so that currently
everything in /sys/block/* is considered and automatically excluded by the hardcoded
conditions in layout/save/GNU/Linux/200_partition_layout.sh as before,
while a user-specified INCLUDE_DISKS=( sda sdc ) will include (only) those disks
and skip all exclude conditions in layout/save/GNU/Linux/200_partition_layout.sh
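
A rough sketch (purely hypothetical, just to illustrate the idea, not an actual
implementation) of how that could look in 200_partition_layout.sh:

    if test ${#INCLUDE_DISKS[@]} -gt 0 ; then
        # "Final power to the user": scan exactly the specified disks
        # (accepting both 'sda' and '/sys/block/sda' forms)
        # and skip the automatic exclude conditions below:
        disks=()
        for d in "${INCLUDE_DISKS[@]}" ; do
            disks+=( "/sys/block/${d#/sys/block/}" )
        done
        skip_exclude_conditions="yes"
    else
        # Default behaviour as before: consider everything in /sys/block/*
        # and let the hardcoded exclude conditions sort things out:
        disks=( /sys/block/* )
        skip_exclude_conditions="no"
    fi
    for disk in "${disks[@]}" ; do
        # ... existing per-disk processing continues here ...
        :
    done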

pcahyna commented at 2021-04-09 09:00:

Perhaps we find out that we cannot get automated exclusion of
thousands of multipath path devices sufficiently fast.

Perhaps. And perhaps not. It would be good to know whether there is really such a fundamental problem before attempting to "solve" it using yet another configuration variable.

My general intent is to provide "final power to all users" over
what layout components get included and what gets excluded.

I am afraid that in this case you would provide merely the power to all users to shoot themselves in the foot without providing the complementary power of aiming properly. Aren't the SCSI device names allocated in an essentially unpredictable order? Is it then wise to use them as stable identifiers in a config file? I think it is much more preferable to let the users describe includes and excludes in terms of some stable identifiers (like filesystem mount points, volume groups or logical volumes).

Also, the discussion here is not about what to exclude (multipath components are excluded anyway), but about a performance optimization which makes it possible to avoid looking at them before excluding them.
EDIT: This is not entirely correct, the proposed change does indeed introduce a new way to define what is included/excluded, but the primary motivation was an optimization, not having yet another way of configuring inclusion/exclusion.

pcahyna commented at 2021-04-09 10:25:

@rmetrich

So yes, here there is a single disk for the OS and the rest are "raw devices" we shouldn't even look at.

I understand that it would be desirable to have a generic way to tell what devices to avoid looking at. Even if we manage to fix it for the multipath case, what if a customer uses non-multipath devices for such purposes? Given the problems with configuring this kind of exclusion using explicit device names, would it be possible to ask the kernel whether there is anything interesting on the drives? After all, the kernel looks at disks when they are configured to search for partition tables etc. Would it work to automatically exclude disks that have no partitions known to the kernel, no mounted filesystems and no holders (/sys/block/*/holders)?
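
A sketch of that idea (untested, just to make the proposed heuristic concrete): a disk
would be a candidate for automatic exclusion when the kernel knows of no partitions on it,
nothing on it is mounted, and it has no holders:

    is_uninteresting_disk() {
        local sysfsdisk=$1                      # e.g. /sys/block/sdzq
        local name=${sysfsdisk#/sys/block/}
        # partitions known to the kernel appear as /sys/block/<disk>/<disk>N
        compgen -G "$sysfsdisk/$name*[0-9]" >/dev/null && return 1
        # holders (multipath maps, LVM, MD, ...) appear under holders/
        compgen -G "$sysfsdisk/holders/*" >/dev/null && return 1
        # anything mounted on the disk or on one of its partitions?
        lsblk --noheadings --output MOUNTPOINT "/dev/$name" | grep -q '[^[:space:]]' && return 1
        return 0
    }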

pcahyna commented at 2021-04-09 10:28:

FYI multipath component devices have a /sys/block/*/holders/* symlink that points to the multipath device, so it is possible to recognize them this way (if one does not want to use lsblk).
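
For example, a sysfs-only check along these lines might work (a sketch, not verified;
it assumes that multipath maps expose a device-mapper uuid with the usual "mpath-" prefix):

    is_mpath_path_via_sysfs() {
        local holder
        for holder in /sys/block/$1/holders/* ; do
            # dm devices expose their uuid in sysfs; multipath maps use "mpath-<wwid>"
            grep -q '^mpath-' "$holder/dm/uuid" 2>/dev/null && return 0
        done
        return 1
    }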

jsmeix commented at 2021-04-09 11:20:

@pcahyna
perhaps I misunderstand what you wrote

I am afraid that in this case you would provide merely the
power to all users to shoot themselves in the foot
without providing the complementary power of aiming properly

but for me provide final power to all users essentially means
to provide the power to all users to even shoot themselves in the foot.

ReaR is meant to be used by a user who is root and such users
have unlimited power to do what they want on their systems
where "aiming properly" is completely their own responsibility
just as it is for a real shooter - the gun provides him power
but "aiming properly" is completely the shooter's responsibility.

In contrast to a weapon ReaR has reasonably safe default behaviour
(at least for the commonly known and well tested use cases).

By default ReaR should avoid (as far as possible with reasonable effort)
that a ReaR user shoots himself in the foot (but there are still many places
where ReaR does not protect the user from shooting himself in the foot).

The crucial part - and that is what I have to basically always explain - is
that for me "provide final power to all users" means

  • first and foremost a reasonable default behaviour

plus

  • the user has unlimited power to change the defaults as he likes

When the user changes something then things could fail in arbitrary ways.
Of course we would try to error out cleanly when user settings cannot work,
cf. "Try hard to care about possible errors" in
https://github.com/rear/rear/wiki/Coding-Style
but we cannot ensure that we catch all bad cases so things might even go
terribly wrong when the user changes something in unexpected ways.

Simply put:
In "final power to the user" the word 'final' is crucial
so "final power to the user" does not mean the program does nothing to avoid bad outcome
instead it means the program does not hinder the user to explicitly let it also do bad things.

Or in other words:
"Final power to the user" means
the programmer gives his power away to the user of his program
and not the programmer keeps the power himself via hardcoded things in his program.

The perfect example for "final power to the user" is nowadays "# rm -rf /":

# rm -rf /
rm: it is dangerous to operate recursively on '/'
rm: use --no-preserve-root to override this failsafe

# rm -rf --no-preserve-root /
[all is gone now]

jsmeix commented at 2021-04-09 11:31:

Regarding

what if a customer uses non-multipath devices for such purposes
...
the proposed change does indeed introduce a new way to define what is included/excluded,
but the primary motivation was an optimization

Exactly.

My proposed change intentionally introduces a generic way
to define what disks are included (and all others are excluded)
because I like to avoid optimization "hacks" for specific cases.

I think my proposed change would also help a user who has
a few disks for the OS (the only ones of interest for ReaR) and
hundreds of (non-multipath) "raw devices" to host large databases.

I think @rmetrich's current ONLY_INCLUDE_DISKS implementation
would do the same - only the variable name is misleading as long as
the exclude conditions in layout/save/GNU/Linux/200_partition_layout.sh
are not skipped when ONLY_INCLUDE_DISKS is specified.

pcahyna commented at 2021-04-09 12:10:

@jsmeix

perhaps I misunderstand what you wrote

I am afraid that in this case you would provide merely the
power to all users to shoot themselves in the foot
without providing the complementary power of aiming properly

but for me provide final power to all users essentially means
to provide the power to all users to even shoot themselves in the foot.

ReaR is meant to be used by a user who is root and such users
have unlimited power to do what they want on their systems
where "aiming properly" is completely their own responsibility
just as it is for a real shooter - the gun provides him power
but "aiming properly" is completely the shooter's responsibility.

I think you indeed misunderstood. My point was that you make the user responsible for "aiming properly", but in this case you would not provide them with proper means for aiming. The only way the proposed variable aims at block devices is by using their assigned names, which are, I am afraid, not reliable - /sys/block/sdxyz can become /sys/block/sdqux on the next boot. (IIRC I even saw an issue about similar problems from you, but I can't find it now.) So a user might determine that including /sys/block/sdxyz works for them, but after a reboot it becomes /sys/block/sdqux and gets excluded from the backup.

Moreover, generally speaking, while places where "users can shoot themselves in the foot" are acceptable in a tool like this, I believe one should be cautious when adding a new one and first ask whether the same desired outcome can be achieved in a safer way.

jsmeix commented at 2021-04-09 12:10:

Regarding

Aren't the SCSI device names allocated in an essentially unpredictable order?
Is it then wise to use them as stable identifiers in a config file?

Yes, cf. https://github.com/rear/rear/issues/2254

But layout/save/GNU/Linux/200_partition_layout.sh
is run on the original system when it is up and running
and ReaR config files are sourced as scripts, cf.
https://github.com/rear/rear/blob/master/etc/rear/local.conf

So the user could implement what is needed in his special case to specify
the currently right kernel device names while "rear mkrescue/mkbackup" is
running i.e. while layout/save/GNU/Linux/200_partition_layout.sh is running.
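
For example, something along these lines could go into /etc/rear/local.conf (a hypothetical
sketch: the by-id name is just a placeholder, and it assumes an ONLY_INCLUDE_DISKS-like
array as proposed in this pull request):

    # Resolve a stable identifier to whatever kernel device name it has right now,
    # at "rear mkrescue" time, because config files are sourced as scripts:
    ONLY_INCLUDE_DISKS=()
    for id in /dev/disk/by-id/wwn-0xEXAMPLE ; do
        dev=$( readlink -e "$id" ) && ONLY_INCLUDE_DISKS+=( "${dev##*/}" )
    done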

Those kernel device names at the time when "rear mkrescue/mkbackup"
is running get stored in disklayout.conf.

When on the replacement hardware other kernel device names are used
ReaR should normally go automatically into MIGRATION_MODE
(cf. its description in default.conf).
In MIGRATION_MODE the user must during "rear recover"
manually specify the right device name mapping
(via some simple user dialog).

This is a nice example where current ReaR
does not protect the user from shooting himself in the foot.
When he accepts a wrong device mapping
ReaR will "just install" on a wrong device, cf.
https://github.com/rear/rear/issues/1271

pcahyna commented at 2021-04-09 12:24:

Regarding

Aren't the SCSI device names allocated in an essentially unpredictable order?
Is it then wise to use them as stable identifiers in a config file?

Yes, cf. #2254

That's what I meant, thanks.

But layout/save/GNU/Linux/200_partition_layout.sh
is run on the original system when it is up and running
and ReaR config files are sourced as scripts, cf.
https://github.com/rear/rear/blob/master/etc/rear/local.conf

So the user could implement what is needed in his special case to specify
the currently right kernel device names while "rear mkrescue/mkbackup" is
running i.e. while layout/save/GNU/Linux/200_partition_layout.sh is running.

For the case that motivated this PR, the problem is, how will the user implement this detection (of unwanted multipath devices)? Either they implement it in the same way as it is done now in ReaR, but then they will save nothing, because if the method is the same, it will lead to the same 40-minute delay.
Or they implement something better, but then why not implement this "better thing" in ReaR directly?
You mentioned that "perhaps we find out that we cannot get automated exclusion of thousands of multipath path devices sufficiently fast". This is certainly a possibility, but in this case the power given to the user is illusory. It will not allow them to make any actual improvement in their local configuration, lest they hardcode the list of device names in the config file and be surprised when the devices rename on some future reboot.

Of course, I know that you want to make the variable more generic and not tied to this specific case, which certainly goes in the right direction and the argument above does not apply. But...

Those kernel device names at the time when "rear mkrescue/mkbackup"
is running get stored in disklayout.conf.

When on the replacement hardware other kernel device names are used
ReaR should normally go automatically into MIGRATION_MODE
(cf. its description in default.conf).
In MIGRATION_MODE the user must during "rear recover"
manually specify the right device name mapping
(via some simple user dialog).

This is a nice example where current ReaR
does not protect the user from shooting himself in the foot.
When he accepts a wrong device mapping
ReaR will "just install" on a wrong device, cf.
#1271

I am sure that relying on (unpredictable) device names is already a problem in the current code. I am only saying that it would be better to avoid introducing another place which relies on them and thus amplifying the problem. Especially among configuration variables, as those are the documented user interface that one should not change (because of compatibility).

jsmeix commented at 2021-04-09 12:24:

@pcahyna
yes of course we should be cautious and careful with what we implement.
I found too many places where the "code blindly proceeds in case of errors"
cf. "Try hard to care about possible errors" in https://github.com/rear/rear/wiki/Coding-Style
and I know there are still many such places left that would need to be fixed...

I appreciate your research to find a more reliably (and hopefully faster)
working method to detect when a block device is a multipath path device.
The current is_multipath_path function implementation had hit us too often
so a better method would help so much.
This task belongs to the above mentioned

  • first and foremost a reasonable default behaviour

My long term intent to introduce a generic way
to let the user define what disks are included
is a separate task that was triggered by this issue here.
This task belongs to the above mentioned

  • the user has unlimited power to change the defaults as he likes

The "first and foremost" makes clear what should be done first and foremost ;-)

jsmeix commented at 2021-04-09 12:34:

@pcahyna
very interesting conclusion in your
https://github.com/rear/rear/pull/2597#issuecomment-816645289

how will the user implement this detection (of unwanted multipath devices)?
Either they implement it in the same way as it is done now in ReaR,
but then they will save nothing...
Or they implement something better, but then why not implement
this "better thing" in ReaR directly?

You are right.

jsmeix commented at 2021-04-09 12:56:

@pcahyna
but isn't it the point of @rmetrich's pull request here
that with a new config variable to specify the wanted disks
there is no need for detection of unwanted disks
so any detection issues can be avoided?

With a new config variable to specify the wanted disks
the user only needs to detect the wanted disks
where it does not matter when it is slow for only a few wanted disks
and he does not need to detect any unwanted disks
which could delay things when it is slow for very many unwanted disks.

So if we cannot implement a reliable and sufficiently fast default behaviour
right now (or perhaps ever),
then
a generic "final power to the user" method would help the user.

Both parts
first and foremost a reasonable default behaviour
and
the user has unlimited power to change the defaults as he likes
do not contradict or compete but complement each other
and both help the user in the end.

jsmeix commented at 2021-04-09 13:02:

The sun is shining, the weekend is here, and
I wish you all a relaxed and recovering weekend!

pcahyna commented at 2021-04-09 13:11:

@jsmeix

but isn't it the point of @rmetrich's pull request here
that with a new config variable to specify the wanted disks
there is no need for detection of unwanted disks
so any detection issues can be avoided?

With a new config variable to specify the wanted disks
the user only needs to detect the wanted disks
where it does not matter when it is slow for only a few wanted disks
and he does not need to detect any unwanted disks
which could delay things when it is slow for very many unwanted disks.

Depends. The question is, is it possible to implement a reliable way of detecting the wanted disks without looking at all the disks and examining whether they are wanted or not? It seems that in this case the condition is "the disk is wanted if it is not multipath". In this case one has to look at all the disks to determine whether they are multipath and my reasoning applies. But perhaps there would be a better way. (I would say that one possible better way would be to start at the objects that the user is interested in - esp. filesystems, maybe LVs, VGs, RAIDs, and recursively determine their underlying devices until one reaches the leaves, instead of starting from the leaves and building the entire stack from the bottom up.) Still, this better way could then be worth implementing in ReaR in a generic manner, but perhaps it is too difficult.
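
A sketch of that "top down" idea (untested, just to illustrate it): start from the devices
that are actually mounted and let lsblk walk down to the underlying whole disks, instead of
scanning every block device and excluding most of them afterwards:

    used_disks=$(
        findmnt --noheadings --real --output SOURCE |
        grep '^/dev/' | sort -u |
        xargs -r lsblk --inverse --list --noheadings --output NAME,TYPE |
        awk '$2 == "disk" { print $1 }' | sort -u
    )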

jsmeix commented at 2021-04-19 11:05:

@pcahyna
regarding your https://github.com/rear/rear/pull/2597#issuecomment-816671377

The commonly used point of view is that in the tree-like structure of storage objects
(it is no tree because one higher level storage object could have more than one parent
e.g. a non-degraded software RAID array "md" device has at least two parent devices)
the actual disks are the root devices and on them certain levels of higher and higher
level storage objects are built until the leaves are usually some "mountable thingies"
(like filesystems or btrfs subvolumes).

This commonly used point of view is what lsblk shows
and what admins usually have in mind when they imagine
their particular tree-like structure of storage objects on their machines.

Cf. what Stefan Hundhammer wrote in
https://lists.opensuse.org/archives/list/yast-devel@lists.opensuse.org/thread/YK6MJ4WYT5TS672FMOKK2BNULCNLFIE6/
(excerpt):

Users know they have disks in their computers; so we should represent 
them, visualize them. This is the primary handle for users because it 
corresponds to something from the real world; something they can touch, 
something they can buy in a shop. You can't buy a partition or an LVM, 
but you CAN buy a disk. Don't take away from users the few things from 
the real world that they can make sense of in this context. ;-)

Same applies for virtual disks.
Virtual disks are what admins have first and foremost on virtual machines.

So in ReaR we also should let the user specify his things
according to how he is used to imagine his things,
i.e. first and foremost disks.

So much for the usability point of view.

Now from a "what ReaR is meant to do" point of view:

ReaR's main task is to recreate a system as much as possible as it was before
on bare metal replacement hardware (or on bare virtual machines).

It is not ReaR's main task to initially create a system where it might make sense
to let the user specify the final storage objects (usually "mountable thingies")
and let some (semi)-automatism create the needed lower and lower level
storage objects until finally some disks are used (those where the intended
final storage objects fit best).

ReaR's main task is to use the "same" disks as on the original system
and recreate on them the same structure of higher and higher
level storage objects.

Finally from a technical point of view:

To create a structure of storage objects on bare metal replacement hardware
(or on a bare virtual machine) one must start with disks, create partitions on them,
and so on.

Therefore ReaR uses "whole" disks when recreating
and its first thing is parted mklabel, then parted mkpart and so on.

There is no code in ReaR to "partially" recreate storage objects on a disk.

Summary (from my point of view):

It would help the user a lot if he could specify with final power
what disks ReaR has to use (and then all other disks must not be used).

When a higher level storage object (e.g. a LV in a VG that is spread over several disks)
uses a disk that is not specified to be used (if the user has specified what to use)
ReaR (i.e. "rear mkrescue") must error out so the user would see the conflict.

jsmeix commented at 2021-04-19 11:41:

@rmetrich
I do not want to work against you and/or what you intend here
so I would like to know if it would be OK for you if I implement
some generic INCLUDE_DISKS functionality as I described in
https://github.com/rear/rear/pull/2597#issuecomment-816529397
and subsequent comments or if you disagree with me?

@rear/contributors
I could do an initial separate pull request so you could have a look
and we could then further discuss things based on real code proposals?

rmetrich commented at 2021-04-19 12:15:

Hi @jsmeix no worries, my hack is just a proposal; if you believe we can do better, I'll be happy to reject this :)
Unfortunately I'm busy with other stuff and cannot contribute to rear yet.

jsmeix commented at 2021-04-19 12:35:

Just an offhanded thought FYI:

https://github.com/rear/rear/issues/1271#issuecomment-822425412
may indicate we may also need something like
an EXCLUDE_DISKS config variable (that is empty by default)
so that the user could (if needed) specify his sacrosanct disks.

Perhaps ideally not only as kernel device nodes (with or without path) like

EXCLUDE_DISKS=( /dev/sdc sdc )

but also via any device node symlink (with or without path) like

EXCLUDE_DISKS=( /dev/disk/by-label/REAR-000 REAR-000 )
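
A hypothetical helper (just a sketch of how such entries could be normalized, whether they
are kernel names, full device nodes or by-label/by-id symlinks):

    normalize_disk_entry() {
        # bare names like "sdc" or "REAR-000" are tried under /dev and /dev/disk/by-*
        local entry=$1
        test -e "$entry" || entry=$( ls -d /dev/"$1" /dev/disk/by-*/"$1" 2>/dev/null | head -n1 )
        # resolve symlinks to the canonical kernel block device node, e.g. /dev/sdc
        readlink -e "$entry"
    }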

pcahyna commented at 2021-04-19 12:54:

@jsmeix

regarding your #2597 (comment)

The commonly used point of view is that in the tree-like structure of storage objects
(it is no tree because one higher level storage object could have more than one parent
e.g. a non-degraded software RAID array "md" device has at least two parent devices)
the actual disks are the root devices and on them certain levels of higher and higher
level storage objects are built until the leaves are usually some "mountable thingies"
(like filesystems or btrfs subvolumes).

Yes, when I wrote that comment I had an opposite view in mind: the "mountable thingies" as roots and the actual disks as leaves. Of course since the tree is not actually a tree but a DAG, both viewpoints (with actual disks as roots vs. with mountable thingies as roots) can be considered equally valid (or invalid), but my comment should be interpreted with the "mountable thingies as roots" view in mind.

This commonly used point of view is what lsblk shows
and what admins usually have in mind when they imagine
their particular tree-like structure of storage objects on their machines.

I had in mind the output of lsblk --inverse

Cf. what Stefan Hundhammer wrote in
https://lists.opensuse.org/archives/list/yast-devel@lists.opensuse.org/thread/YK6MJ4WYT5TS672FMOKK2BNULCNLFIE6/
(excerpt):

Users know they have disks in their computers; so we should represent 
them, visualize them. This is the primary handle for users because it 
corresponds to something from the real world; something they can touch, 
something they can buy in a shop. You can't buy a partition or an LVM, 
but you CAN buy a disk. Don't take away from users the few things from 
the real world that they can make sense of in this context. ;-)

Same applies for virtual disks.
Virtual disks are what admins have first and foremost on virtual machines.

So in ReaR we also should let the user specify his things
according to how he is used to imagine his things,
i.e. first and foremost disks.

So much for the usability point of view.

I am not sure that I agree with this view. Users buy disks to put the "mountable thingies" and eventually data on them, so they may buy a disk, but ultimately they are interested in the data, not disks.
The discussion you reference is, AFAICT, about the installer, which can have different usability requirements than a backup/recovery tool. (When you install a system, you don't have any data on it yet.) If anything, the discussion is more relevant to the recover phase, not to the backup phase (indeed, in the recover phase, you certainly want to have detailed control over what disks are going to be used, since you don't want to overwrite disks that still have valuable data. Similar concerns apply when installing, so here the installer is a more relevant analogy to the recover phase than to the backup/mkrestore part.)

Now from a "what ReaR is meant to do" point of view:

ReaR's main task is to recreate a system as much as possible as it was before
on bare metal replacement hardware (or on bare virtual machines).

It is not ReaR's main task to initially create a system where it might make sense
to let the user specify the final storage objects (usually "mountable thingies")
and let some (semi)-automatism create the needed lower and lower level
storage objects until finally some disks are used (those where the intended
final storage objects fit best).

ReaR's main task is to use the "same" disks as on the original system
and recreate on them the same structure of higher and higher
level storage objects.

It seems that what you say goes a bit against the current philosophy of ReaR though, because the current include/exclude rules are often formulated in terms of the final (or close to final) storage objects: ONLY_INCLUDE_VG, EXCLUDE_MOUNTPOINTS, EXCLUDE_VG, EXCLUDE_RECREATE - even AUTOEXCLUDE_DISKS excludes disks that are not used by mounted filesystems.

(I admit I am not familiar enough with the current include/exclude rules.)

Thinking in terms of "mountable thingies" has also the advantage of naturally mapping to backups, because usually those are the only objects that one knows how to backup (unless one uses a method like BLOCKCLONE, but I suppose it is not very typical).

Finally from a technical point of view:

To create a structure of storage objects on bare metal replacement hardware
(or on a bare virtual machine) one must start with disks, create partitions on them,
and so on.

Therefore ReaR uses "whole" disks when recreating
and its first thing is parted mklabel, then parted mkpart and so on.

There is no code in ReaR to "partially" recreate storage objects on a disk.

Well, I suspect that if you exclude a VG that has a PV on a partition of a disk, it is allowed to recreate the partition without recreating the PV and VG that resides in it. The disk will then indeed become "partially recreated". (I haven't tried it though.)

Summary (from my point of view):

It would help the user a lot if he could specify with final power
what disks ReaR has to use (and then all other disks must not be used).

If you want to go this route, please let the user specify the disks in terms of some stable identifier (/dev/disk/by-*) instead of using block device names.

When a higher level storage object (e.g. a LV in a VG that is spread over several disks)
uses a disk that is not specified to be used (if the user has specified what to use)
ReaR (i.e. "rear mkrescue") must error out so the user would see the conflict.

Sure.

jsmeix commented at 2021-04-19 14:02:

I admit I never really understood the current include/exclude rules.

I can "understand" a particular case by trial and error plus reading the code
but I don't have an overall understanding in my mind to be able to tell
off the top of my head what config variables must be set with which values
to get a desired outcome.

From my failed attempts to understand things my conclusion is
that I personally think the whole include/exclude area
has grown into a "hairy" mess over time, cf.
https://github.com/rear/rear/pull/2597#issuecomment-814895323

My vague basic understanding is that during "rear mkrescue"
first all storage objects ReaR knows about are included in disklayout.conf
(except some automated hardcoded ignored ones, cf. this issue here)
and then some "not needed" entries in disklayout.conf get disabled
by various subsequent exclude scripts which is the main "hairy" part.

Cf. what rear -s mkrescue | grep layout/save shows (excerpts)

Source layout/save/GNU/Linux/200_partition_layout.sh
...
Source layout/save/GNU/Linux/220_lvm_layout.sh
Source layout/save/GNU/Linux/230_filesystem_layout.sh
...
Source layout/save/default/310_include_exclude.sh
Source layout/save/default/320_autoexclude.sh
Source layout/save/default/330_remove_exclusions.sh

A bad consequence is that when ReaR does not support something
it gets silently ignored during "rear mkrescue" and then the user has to
learn the hard way later (when it is too late) when "rear recover" fails
with unfriendly messages that something is not supported by ReaR, cf.
https://github.com/rear/rear/issues/2560

E.g. as far as I know off the top of my head
a mounted filesystem that is not supported by ReaR is silently ignored and
an unsupported storage object is also silently ignored during "rear mkrescue"
for the latter see https://github.com/rear/rear/issues/2560

What is missing is the final power where the user could specify
what he wants and "rear mkrescue" fails if it cannot fulfill that
and where the user could specify what is sacrosanct
for ReaR and "rear mkrescue" fails if it needs that.

Ideally ReaR would have sufficient automated checks
that find out when something unsupported is actually in use
so "rear mkrescue" could error out to have the user informed in time
when he can adapt and enhance things to make "rear recover" work.

Let me experiment a bit with some generic INCLUDE_DISKS
and EXCLUDE_DISKS functionality to find out how that works
in practice, at least for me, and then let's see if that way looks promising.

github-actions commented at 2021-06-21 02:25:

Stale pull request message

jsmeix commented at 2021-06-23 10:57:

This issue is not forgotten - only too much other stuff got in the way.
Hopefully I find time next week...

gdha commented at 2021-07-02 07:16:

@rmetrich @pcahyna @jsmeix Is there no way to investigate why multipath is so terribly slow? What @rmetrich is trying to implement is yet another work-around (which is sometimes the only way), but it makes ReaR more complicated for the end user, who cannot find his way through all the hundreds of variable settings. The main goal is still Relax, not Stress.

jsmeix commented at 2021-07-02 10:14:

@gdha
I think you may misunderstand two things here:

(1) This particular issue is a special case:

This particular issue is not about a "normal" system.

This issue is about a system with thousands of disks, see
https://github.com/rear/rear/pull/2597#issuecomment-814908043

... physical devices
...
2653 "/devices/pci..." devices there.
All scanned.

and
https://github.com/rear/rear/pull/2597#issuecomment-814957059

... get an idea how long it takes when 'multipath -l' is run 2653 times.

So multipath itself is not terribly slow.
But running multipath thousands of times gets terribly slow.

Systems with thousands of disks mean stress in general.

Not only ReaR but basically all tools that deal with disks
may get terribly slow with thousands of disks, down to low level things
like possibly udev event storms and things like that
(kernel developers could tell such stories ;-)

So we are thinking about some additional generic INCLUDE_DISKS
and EXCLUDE_DISKS functionality to make it possible for admins
of systems with thousands of disks to have less stress - at least in ReaR.

I am totally selfish here:
If a SUSE customer with a system with thousands of disks
files an urgent support case at SUSE then I will get stress.
But I do not like stress - actually I really really hate stress.
So to get rid of such customer-introduced "real-time stress"
(likely amplified by some well-intended advice by managers)
I would implement whatever quick and dirty hack to somehow
make that stuff work for the particular customer
to get that annoyance out of my way as quickly as possible.

(2) More and more config variables don't mean stress:

The crucial point with config variables is that normally the user
should not need to care about their values because normally
config variables have a default setting where things "just work".

Only if things do not "just work" the user would have to care
about that specific case where it does not "just work" for him
with his particular system in his particular environment.

In such a case the user would have to read default.conf to find out
what the right config variables and values are that he must
specifically set to make his special case work.

When there are config variables for basically all and everything
the user has relatively easy means to make his special case work.

When there is no config variable for a particular kind of thing
the user would have to adapt the ReaR scripts that deal
with that particular kind of thing which is less easy compared to
reading default.conf and setting some appropriate config variables.

But the worst of all when config variables are missing is
that then most users won't adapt the ReaR scripts.

Instead the users file issues here and then we get their stress,
and the users are right to do so, because
"simply change the code - that's just what free software is about"
is only right in theory; in practice users don't appreciate such habits.

In particular SUSE customers usually won't adapt ReaR scripts
because they want to use "the software" (i.e. the ReaR scripts)
as is from a rear* RPM package, the same on all their systems,
and specify system-dependent things only in config files
on each system as needed.

So having config variables for basically all and everything
means less stress for all of us compared to not having
config variables for basically all and everything.

Addendum:
It is a separate and different topic
whether we should have all config variables in a single default.conf
or whether we should split them up into commonly used config variables
and config variables for special cases.

github-actions commented at 2021-09-01 02:12:

Stale pull request message

pcahyna commented at 2021-09-15 18:19:

@rmetrich @pcahyna @jsmeix Is there no way to investigate why multipath is so terribly slow? What @rmetrich is trying to implement is yet another work-around (which is sometimes the only way), but it makes ReaR more complicated for the end user, who cannot find his way through all the hundreds of variable settings. The main goal is still Relax, not Stress.

@gdha I fully agree with your view in general, and indeed I spent some time researching why is multipath slow. It turns out, however, that in the original issue (https://bugzilla.redhat.com/show_bug.cgi?id=1946911) the culprit was not multipath devices, but just lots of unmounted SAN devices, so optimizing multipath would not have helped here.

OTOH, it also turns out that there is indeed a multipath performance problem that was fixed in PR #2034 but reintroduced in ReaR 2.6. As written by @jsmeix above:

@pcahyna
perhaps in the function is_multipath_path in lib/layout-functions.sh
the most time consuming part is not the device specific call

multipath -c /dev/$1

but the listing of the whole multipath topology via

multipath -l | grep -q '[[:alnum:]]' || return 1

see my comment in the code why I added that call
https://github.com/rear/rear/blob/master/usr/share/rear/lib/layout-functions.sh#L718

The problematic change is here: https://github.com/rear/rear/pull/2299/files#diff-2916f533f1aa01a8a571a7062e43957bdd1523e70ce94a5107541ff466eef22cR705 - for each device, it issues multipath -l. And since multipath -l scans all devices and the time it takes is proportional to their number, the total time spent on all this is quadratic in the number of devices. For 1600 devices it used to take 300 s, now it takes 4445 s. I will try to propose a fix for that (using the strategy mentioned in https://github.com/rear/rear/pull/2597#issuecomment-815038677 or https://github.com/rear/rear/pull/2597#issuecomment-816587171 ).
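
A sketch of that strategy (not the merged code, just to illustrate the idea): gather the set
of multipath path devices once, so the per-device test becomes a cheap lookup instead of
another full device scan per call:

    declare -A mpath_member=()
    while read -r name fstype ; do
        test "$fstype" = "mpath_member" && mpath_member[$name]=1
    done < <( lsblk --noheadings --list --paths --output NAME,FSTYPE )

    is_multipath_path() {
        # cheap O(1) lookup instead of running multipath/lsblk again per device
        test "${mpath_member[/dev/$1]}"
    }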

jsmeix commented at 2021-11-11 11:50:

@pcahyna @rmetrich @rear/contributors
please have a look at https://github.com/rear/rear/pull/2708

jsmeix commented at 2021-11-15 12:31:

With https://github.com/rear/rear/pull/2708 merged
at least the long computing time in layout/save/GNU/Linux/200_partition_layout.sh
should be avoided.

I don't know if the "Grub2 probe running for 20 minutes" issue
in output/default/940_grub2_rescue.sh can be avoided, see
https://github.com/rear/rear/pull/2597#issuecomment-814950019
When it hangs inside a command call (e.g. when GRUB is slow)
there is perhaps not much that ReaR could do.
At least that code is only run when GRUB_RESCUE is wanted by the user.

github-actions commented at 2022-01-15 02:23:

Stale pull request message


[Export of Github issue for rear/rear.]