#3495 Issue open: checklayout should not highlight a disc swap

Labels: enhancement, discuss / RFC, support / question

bwelterl opened issue at 2025-07-10 10:38:

When a cron job is configured to check for layout changes and a disk swap happens during boot (common with VMs), the layout file changes and a new rescue ISO is created, which could be avoided.
Disk names such as sdX/vdX are no longer stable on Linux systems, so ReaR should be able to handle a name swap.

Describe the solution you'd like

The idea is to exclude the device name from the comparison of the disk layout files; only the concrete internal disk layout (partition size, LV size/name) must be stable.
Thank you !

jsmeix commented at 2025-07-10 11:33:

@bwelterl
what exactly do you mean by "a swap of disk"
(why singular "disk" and not "disks"?)
and/or "name swap" in your specific case?
Could you provide a "real world" example
from your particular environment?
E.g. on your specific system the output of

# lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT[S]

before and after the "swap of disk".

I ask because "rear recover" is first and foremost
intended to work on pristine new replacement hardware
("bare metal" disaster recovery is the primary use case)
and all that reliably exists on pristine new hardware
are only plain kernel device nodes like
/dev/sda and /dev/sdb.

In particular higher level disk device names like
everything in /dev/disk/by-* are different
on replacement hardware like UUID based names
or could be different like by-path names when
things are not wired 100% the same on replacement hardware.

Therefore disklayout.conf is based on kernel device nodes
and any change in kernel device nodes is considered to be
a "migration" in ReaR, see the MIGRATION_MODE description
in usr/share/rear/conf/default.conf
https://github.com/rear/rear/blob/rear-2.9/usr/share/rear/conf/default.conf#L454

Therefore - to be on the safe side - it is correct
that any change in kernel disk device nodes indicates
that a new ReaR recovery system should be made.

It is not and cannot be ReaR's task to make a decision
what a change in kernel disk device nodes actually means,
in particular not to let ReaR make an automated decision
if such a change could be ignored (at least not until
someone adds AI to ReaR - and then good luck with the
"intelligence" of that - have fun when AI hallucinates
disk layout thingies for new replacement hadware ;-)

Additionally you may have a look at the section
"Relax-and-Recover versus backup and restore"
https://en.opensuse.org/SDB:Disaster_Recovery#Relax-and-Recover_versus_backup_and_restore
which reads (excerpt):

It is your task to ensure your backup is consistent.
...
Therefore after each change of the basic system
(in particular after a change of the disk layout)
"rear mkbackup" needs to be run to create
a new ReaR recovery system together with a
matching new backup of the files
(or when third party backup software is used
"rear mkrescue" needs to be run to create
a new ReaR recovery system and additionally
a matching new backup of the files must be created).

bwelterl commented at 2025-07-10 12:08:

Hi Johannes,

Thanks for your answer.

For your information, the layout is standard, just 2 disks:

# lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT
NAME                            KNAME     PKNAME    TRAN   TYPE FSTYPE      LABEL   SIZE MOUNTPOINT
/dev/sr0                        /dev/sr0            sata   rom                     1024M 
/dev/vda                        /dev/vda                   disk                      20G 
|-/dev/vda1                     /dev/vda1 /dev/vda         part vfat                600M /boot/efi
|-/dev/vda2                     /dev/vda2 /dev/vda         part xfs                   1G /boot
`-/dev/vda3                     /dev/vda3 /dev/vda         part LVM2_member        18.4G 
  |-/dev/mapper/rhel_rhel8-swap /dev/dm-0 /dev/vda3        lvm  swap                  2G [SWAP]
  `-/dev/mapper/rhel_rhel8-root /dev/dm-1 /dev/vda3        lvm  xfs                16.4G /
/dev/vdb                        /dev/vdb                   disk xfs                  10G /var/www
/dev/vdc                        /dev/vdc                   disk LVM2_member       102.4M

But with recent Linux kernels, even with scsi_mod.scan=sync, the detection of devices at boot time is not deterministic.

So a reboot might change vda to vdb while the rest of the layout remains the same.

Thus, the cron job that runs checklayout rebuilds a rescue ISO that is not necessary.

As you said, everything is rebuilt during "rear recover" with a strict mapping to disk names, and even in this case the migration mode is able to recover. But the issue here is not the recovery, it is the needless rebuild of the rescue ISO.

The comparison of layout might exclude the names.
This is of course an enhancement of current behavior and needs some discussion (without AI :).

  • The layout file might include a generic name for the disk
  • or the diff somehow does not take the disk name into account.

Thanks!

Benoit

jsmeix commented at 2025-07-10 13:21:

My totally untested offhanded spontaneous idea
what might be doable with reasonable effort in ReaR and
what does not try to let ReaR make automated decisions
is the following:

Let the user specify in his etc/rear/local.conf
those kernel disk device nodes in his specific environment
where differences of those kernel disk device nodes
in disklayout.conf do not matter for him.

E.g. when the user specified in his local.conf something like

INTERCHANGEABLE_VALUES=( '/dev/sda /dev/sdb' '/dev/vda /dev/vdb' )

then ReaR could "mindlessly" replace
/dev/sda and /dev/sdb by one same unique dummy value XXX and
/dev/vda and /dev/vdb by another same unique dummy value YYY
in the layout files before they get compared by "rear checklayout"
(cf. layout/compare/default/500_compare_layout.sh)
so differences in those specified kernel device nodes
would no longer be noticed.
But a change from /dev/sda to /dev/vda would be noticed
in this example - to get all of them interchangeable

INTERCHANGEABLE_VALUES=( '/dev/sda /dev/sdb /dev/vda /dev/vdb' )

would do it.

My point is that this way we don't need to implement
"intelligence" in ReaR (which will fail in some cases
so it could result in an ultimate disaster for the user
instead of a recovery from a disaster, cf. the
MIGRATION_MODE description in default.conf) and
we keep the user in full control over what ReaR does for him.
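
A minimal sketch of the idea above, assuming that a plain sed replacement is good enough. INTERCHANGEABLE_VALUES is the variable proposed in this comment; normalize_layout, layouts_equivalent and the _GROUPn_ tokens are hypothetical names that do not exist in ReaR. Because a swap may also change the order of the entries in the layout file, the sketch sorts the lines before comparing, which is an assumption that goes beyond what the comment proposes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not ReaR code.
INTERCHANGEABLE_VALUES=( '/dev/sda /dev/sdb' '/dev/vda /dev/vdb' )

# Print layout file $1 with every device node of group n replaced by _GROUPn_.
normalize_layout() {
    local layout_file="$1" group device index=0
    # Drop comment lines (assumption: they are irrelevant for the comparison).
    local -a sed_args=( -e '/^#/d' )
    for group in "${INTERCHANGEABLE_VALUES[@]}" ; do
        # Unquoted on purpose: split the group into its device nodes.
        for device in $group ; do
            # '|' as delimiter because device paths contain '/'.
            sed_args+=( -e "s|${device}|_GROUP${index}_|g" )
        done
        index=$(( index + 1 ))
    done
    # Sort so that a changed order of entries does not count as a difference.
    sed "${sed_args[@]}" "$layout_file" | LC_ALL=C sort
}

# Return 0 if both layout files are equal after normalization.
layouts_equivalent() {
    [ "$(normalize_layout "$1")" = "$(normalize_layout "$2")" ]
}
```

With this, layouts_equivalent old.conf new.conf would succeed when only /dev/vda and /dev/vdb traded places. Mapping both disks of a group to the same token means the comparison can no longer tell which of them a partition belongs to; that is the price of this simple approach.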

pcahyna commented at 2025-07-10 14:29:

I understand the motivation for the request, but I am afraid that "to exclude the name of the device in the comparison of disk layout file, only the concrete internal disk layout must be stable (partition size, LV size/name)" will be hard to work out in practice (regardless of any rules/configuration variables to determine what can safely swap with what). The problem is that you also need to preserve the relation between partitions (and other objects) and disks. So supposing that you have this layout:

# Disk /dev/vda
# Format: disk <devname> <size(bytes)> <partition label type>
disk /dev/vda 10737418240 msdos
# Partitions on /dev/vda
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/vda 1073741824 1048576 primary boot /dev/vda1

one sees that there is a disk and a partition, but that's not all, it also specifies that the partition is on that disk. If we strip all the /dev/vda parts, we will have something like disk 10737418240 msdos and part 1073741824 1048576 primary boot 1, but how do we know that the two belong together?
We could use the ordering in the file, but now we are imposing an additional constraint on the format.

Generally speaking, disk device reordering throws a monkey wrench into many of the layout concepts used by ReaR and I would say that the impact on checklayout is one of the lesser concerns.
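
The lost relation can be illustrated with a small sketch. It assumes that names are stripped with sed and that the remaining lines are compared independent of their order; strip_device_names is a made-up helper, not part of ReaR, and the second disk is invented for the illustration.

```shell
#!/usr/bin/env bash
# Illustration only, not ReaR code.

# Drop every /dev/<letters> name from the layout read on stdin, then sort the
# lines so that the result does not depend on the order of the entries.
strip_device_names() {
    sed -E 's|/dev/[a-z]+ ?||g' | LC_ALL=C sort
}

# Layout A: the partition is on the 10 GiB disk.
layout_a='disk /dev/vda 10737418240 msdos
part /dev/vda 1073741824 1048576 primary boot /dev/vda1
disk /dev/vdb 21474836480 msdos'

# Layout B: the same partition is on the 20 GiB disk instead.
layout_b='disk /dev/vda 10737418240 msdos
disk /dev/vdb 21474836480 msdos
part /dev/vdb 1073741824 1048576 primary boot /dev/vdb1'

stripped_a="$(printf '%s\n' "$layout_a" | strip_device_names)"
stripped_b="$(printf '%s\n' "$layout_b" | strip_device_names)"
if [ "$stripped_a" = "$stripped_b" ] ; then
    echo "stripped layouts compare equal although the partition is on another disk"
fi
```

Without the sort the two results would differ, but only because of the position of the part line, which is the ordering constraint mentioned above.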

jsmeix commented at 2025-07-10 14:54:

@pcahyna
did you perhaps misunderstand that my proposal in
https://github.com/rear/rear/issues/3495#issuecomment-3057439000
is only meant to help the user with "rear checklayout"
but it is not meant for "rear recover"?

For "rear recover" the current MIGRATION_MODE implementation
works sufficiently at least for the sufficiently simple
use case of @bwelterl where the automated disk mapping
based on disk size should work sufficiently in practice
in particular when each disk size is unique and when on
the replacement hardware the disks have exact same size
as on the original system (which is easily possible
with virtual disks).

Yes, the current MIGRATION_MODE implementation
has known severe insufficiencies in certain cases,
for example https://github.com/rear/rear/issues/3477
and https://github.com/rear/rear/issues/3473

pcahyna commented at 2025-07-10 14:57:

@jsmeix I don't think I misunderstood anything here - my reply was only about the "checklayout" part and was not meant to imply anything about "recover" at all, except for the general remark at the end, which was unrelated to your proposal.

pcahyna commented at 2025-07-10 14:59:

Actually, my whole reply is unrelated to your proposal.

gdha commented at 2025-07-11 06:45:

Why not remove the checklayout job in cron? A weekly mkbackup job is more than sufficient.

pcahyna commented at 2025-07-15 14:01:

@gdha exactly my thought.

The problem apparently occurs when using the crontab that used to be part of the package and was removed here: 89a8f18ec402b439caf4800421644f5bf5d174e5 and contained this:
/usr/sbin/rear checklayout || /usr/sbin/rear mkrescue.
We used to have it in the RHEL package until RHEL 8 and we also removed it in RHEL 9. I believe that having this crontab was a bad idea given that it does not perform any backup by itself, so having it as the default merely gives a false sense of security (ReaR is being scheduled periodically, but it does not perform an actual backup).
I think the proper solution for the affected user should be to remove the crontab or to adapt it so that it does what they need, which should in any case include generating a rescue image whenever the backup gets produced - not more often, not less. Generating the rescue image without producing a backup at the same time risks running into consistency issues.
And indeed mkbackup is the way to do it, if an internal backup method (like NETFS) is being used.

jsmeix commented at 2025-07-23 13:35:

I think this issue is sufficiently discussed
and a proper way forward was found, which is:

No longer use the deprecated (and upstream-removed)

30 1 * * * /usr/sbin/rear checklayout || /usr/sbin/rear mkrescue

daily cron job.

Instead do the following:
Run "rear mkbackup" when needed
in particular after each change of the basic system
(for example after a change of the disk layout)
or
when a system backup is made by third party backup software
then also run "rear mkrescue".

bwelterl commented at 2025-07-23 13:47:

Thanks for all answers.

"Why not remove the checklayout job in cron? A weekly mkbackup job is more than sufficient."

We can't make a general statement about the needs; it depends on the use.

There are 2 subjects here:

  • the cron job, which I understand is deprecated
  • the use of checklayout command.

The second point is the subject here. Is checklayout still useful and supported if the cron job is not to be used?
And the question is how to adapt it to avoid an unnecessary rescue/backup run when only a swap between 2 disks happened.

The initial improvement remains the same, with or without cron job, once per week or per day does not matter.

Maybe checklayout could write its output to a customized path, and then an external tool could compare and possibly alert the user: a warning if only a disk name changed.

Thanks !

Benoit

jsmeix commented at 2025-07-23 14:08:

@bwelterl
I understand your request.

I could try to implement a proof of concept of my
https://github.com/rear/rear/issues/3495#issuecomment-3057439000
to see whether or not it could work out in practice
BUT ONLY IF the other
@rear/contributors
support my proposal in general
because I do not want to waste my time
for something that would not be accepted by them.

bwelterl commented at 2025-07-23 14:33:

Thanks Johannes for your reply, and sorry for the delay of my answers.

Your proposal is certainly the easiest: generate a temporary layout file with interchangeable values (defined in the var) changed to a generic name, so we can keep the same diff logic.
We can even add 2 levels:

  • if a diff is seen in the original layout file:
    • if INTERCHANGEABLE_VALUES is not defined, return 1
    • else generate a temporary layout file with a generic value instead of the INTERCHANGEABLE_VALUES entries
      • if a diff is seen, return 1
      • else print a warning

I might try to implement that.

Thank you !

Benoit
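
The two levels described in this comment could be sketched roughly as follows. This is a hypothetical illustration, not ReaR code: check_layout_two_levels and the _GENERIC_ token are made-up names, all configured names are treated as one single group for brevity, the lines are sorted before the second comparison, and returning 0 in the warning case is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two-level check, not ReaR code.
# INTERCHANGEABLE_VALUES follows the proposal in this thread,
# e.g. INTERCHANGEABLE_VALUES=( '/dev/vda /dev/vdb' )

# Return 0 if the layouts match (with a warning if they only match after the
# interchangeable names were made generic), 1 otherwise.
check_layout_two_levels() {
    local old_layout="$1" new_layout="$2" device old_generic new_generic
    local -a sed_args=()

    # Level 1: the plain comparison of the two files.
    cmp -s "$old_layout" "$new_layout" && return 0

    # Nothing configured as interchangeable: every difference counts.
    (( ${#INTERCHANGEABLE_VALUES[@]} > 0 )) || return 1

    # Level 2: compare again with the configured names made generic.
    # Unquoted on purpose: split all groups into single device nodes.
    for device in ${INTERCHANGEABLE_VALUES[*]} ; do
        sed_args+=( -e "s|${device}|_GENERIC_|g" )
    done
    old_generic="$(sed "${sed_args[@]}" "$old_layout" | LC_ALL=C sort)"
    new_generic="$(sed "${sed_args[@]}" "$new_layout" | LC_ALL=C sort)"
    [ "$old_generic" = "$new_generic" ] || return 1

    echo "WARNING: only interchangeable disk names changed" >&2
    return 0
}
```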

jsmeix commented at 2025-07-23 14:50:

Yes, something like that is also my idea.

As a possible template see the code in the function
apply_layout_mappings() in lib/layout-functions.sh
how one could replace words by unique generic words.
I think the code in apply_layout_mappings() is more
complicated than what is needed for this issue here
but it may help as a possible template how to do it.
I mean in particular to have a look at the sed commands
which may provide hints what to have in mind when replacing
in particular higher level disk device node values.

So for e.g.

INTERCHANGEABLE_VALUES=( '/dev/sda /dev/sdb' '/dev/vda /dev/vdb' )

both words /dev/sda and /dev/sdb would be replaced
by the same unique generic word _REAR0_
and both words /dev/vda and /dev/vdb would be replaced
by the same unique generic word _REAR1_
in the two disklayout files before they get compared.
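
One thing to have in mind with such replacements, shown as a small sketch: a plain substring replacement of /dev/sda also hits longer names that merely start with it, such as /dev/sdaa. The helpers replace_naive and replace_bounded are made up for this illustration and are not taken from apply_layout_mappings(); the bounded variant assumes that partitions are named by appending digits.

```shell
#!/usr/bin/env bash
# Illustration only, not ReaR code.

# Naive replacement: also changes names that only start with /dev/sda.
replace_naive() {
    sed -e 's|/dev/sda|_REAR0_|g'
}

# Bounded replacement: /dev/sda only when followed by optional partition
# digits and then whitespace or the end of the line.
replace_bounded() {
    sed -E -e 's#/dev/sda([0-9]*)([[:space:]]|$)#_REAR0_\1\2#g'
}

line='part /dev/sdaa 1073741824 1048576 primary boot /dev/sdaa1'
printf '%s\n' "$line" | replace_naive     # /dev/sdaa wrongly becomes _REAR0_a
printf '%s\n' "$line" | replace_bounded   # the line is left unchanged
```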


[Export of Github issue for rear/rear.]