#2787 Issue open: Rescue ISO is dependent on files in the backup, for example /etc/fstab. The cron.d job will break the coherency

Labels: enhancement

bwelterl opened issue at 2022-04-07 13:38:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    rear 2.6

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):
    RHEL8

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):

  • Hardware vendor/product (PC or PowerNV BareMetal or ARM) or VM (KVM guest or PowerVM LPAR):

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT"):

  • Description of the issue (ideally so that others can reproduce it):

It's related to pull request https://github.com/rear/rear/pull/2786

To reproduce the issue, change the UUID of the UEFI partition:

disklayout.conf before the change:

fs /dev/vda1 /boot/efi vfat uuid=ACC0-624F label= options=rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro

fstab before the change:

UUID=ACC0-624F          /boot/efi               vfat    umask=0077,shortname=winnt 0 2

Then change the UUID (the vfat volume serial number):

mlabel -N 12345678 -i /dev/vda1 ::
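
(As a quick check: blkid reports the vfat volume serial number as the filesystem UUID, so assuming /dev/vda1 is the ESP, the following prints ACC0-624F before the mlabel call and 1234-5678 afterwards:)

# prints the current vfat filesystem UUID of the ESP
blkid -s UUID -o value /dev/vda1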

Then the cron.d job (or a new systemd job) will recreate the rescue ISO because the layout changed:

30 1 * * * root test -f /var/lib/rear/layout/disklayout.conf && /usr/sbin/rear checklayout || /usr/sbin/rear mkrescue

New disklayout.conf:

fs /dev/vda1 /boot/efi vfat uuid=1234-5678 label= options=rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro

But the /etc/fstab file that will be used during the recovery, and that should be patched, is still the one in the backup, i.e. the old one with the old UUID.
So the patch will fail ... (without a warning :D).
And the system will not boot.

  • Workaround, if any:

A rescue ISO should not be built if the required data in the backup has not been updated as well.

The recovery of the system layout should rely on the data in the rescue ISO and not on the backup.
Thus all the files that need to be patched to update the UUID should be embedded in the rescue image (see the sketch below).
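
To illustrate, a minimal sketch of the kind of pre-flight check meant here (a hypothetical wrapper script, not existing ReaR functionality): refuse to rebuild the rescue ISO while /etc/fstab still references a filesystem UUID that no device carries:

#!/bin/bash
# Hypothetical guard around "rear mkrescue": abort when /etc/fstab
# references a filesystem UUID that no device carries anymore,
# because then the backup and the disk layout are out of sync.
set -eu
while read -r fs_spec _ ; do
    case $fs_spec in
        (UUID=*)
            uuid=${fs_spec#UUID=}
            # blkid -U exits non-zero when no device has this UUID
            if ! blkid -U "$uuid" >/dev/null ; then
                echo "fstab references UUID $uuid which no device has" >&2
                exit 1
            fi
            ;;
    esac
done < <(grep -Ev '^[[:space:]]*(#|$)' /etc/fstab)
/usr/sbin/rear mkrescue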

Thanks !

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

jsmeix commented at 2022-04-07 14:03:

@bwelterl
thank you for your enlightening issue report!

I have to think a bit deeper about it - it looks complicated.

jsmeix commented at 2022-04-07 14:11:

Problem summary as far as I see it:

What should ReaR do when UUIDs in restored files like /etc/fstab
do not exist on the recreated system?

Cf.
https://github.com/rear/rear/issues/2785#issuecomment-1091773765
(excerpt):

When the filesystems get recreated with the UUIDs from disklayout.conf
old and new UUIDs are the same in FS_UUID_MAP (which is the usual case)
so finalize/GNU/Linux/280_migrate_uuid_tags.sh won't change anything.

So the recreated UUIDs are normally consistent with what is in disklayout.conf
but they can be inconsistent with what is in restored config files.

So comparing UUIDs in disklayout.conf with the actually recreated UUIDs
(e.g. via FS_UUID_MAP) won't help to detect that inconsistency.

My immediate idea is a simple test as a start:
When an actually recreated UUID does not appear in a config file,
it could indicate that something is fishy with this particular UUID,
or all is well because that kind of UUID is never used in a config file.
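
A minimal sketch of that test (illustrative only; it assumes FS_UUID_MAP lines of the form "old_UUID new_UUID device", that the backup was restored below /mnt/local, and a hypothetical list of config files; LogPrintError is ReaR's usual error-reporting helper):

# After the backup restore, warn when an actually recreated UUID
# does not appear in any of the usual restored config files:
restored_files=( /mnt/local/etc/fstab /mnt/local/etc/default/grub )
while read -r old_uuid new_uuid device ; do
    # grep -q succeeds when the UUID is found in at least one file
    grep -q "$new_uuid" "${restored_files[@]}" 2>/dev/null && continue
    LogPrintError "Recreated UUID $new_uuid on $device not found in restored config files"
done < "$FS_UUID_MAP"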

pcahyna commented at 2022-04-07 14:16:

The ReaR development version got rid of the cron job some time ago already, see #2139

bwelterl commented at 2022-04-07 14:20:

@pcahyna Yes (still in RHEL8 :D), but the issue remains with the systemd job!

pcahyna commented at 2022-04-07 14:25:

yeah, I know that the cron job is in RHEL 8.

Concerning the systemd job, I don't see any in the sources right now, and this issue shows that it may be a bad idea.

pcahyna commented at 2022-04-07 14:26:

Fedora ships the cron job and systemd job examples as documentation.

jsmeix commented at 2022-04-07 15:00:

Whether or not there is a cron job doesn't actually matter here,
i.e. it doesn't matter how "rear mkrescue" got launched.

The issue is that there can be inconsistencies between
what gets recreated according to disklayout.conf
and what gets restored from the backup,
and currently ReaR cannot detect that, because the current
FS_UUID_MAP stuff only detects inconsistencies
between what is in disklayout.conf and what was actually recreated
(i.e. a different UUID recreated than what is in disklayout.conf).
So currently ReaR cannot migrate files restored from the backup
in most cases, because nowadays filesystem UUIDs are normally
successfully recreated with the UUIDs in disklayout.conf.

pcahyna commented at 2022-04-07 15:22:

If the backup is taken after mkrescue, one can detect the problem by running checklayout before taking the backup.
But if the backup is taken before mkrescue (and that's the case with the cron job that triggered the problem here), I don't see how to detect it easily.
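
For the first case the advice boils down to something like this (a sketch; run_external_backup is a placeholder for whatever external backup tool is in use):

# When the layout changed since the last mkrescue, rebuild the rescue
# image first, then take the external backup, so both stay consistent:
/usr/sbin/rear checklayout || /usr/sbin/rear mkrescue
run_external_backup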

jsmeix commented at 2022-04-08 13:20:

I got distracted by other things and now the weekend is almost here...

My primary intent is not to let ReaR automatically fix such inconsistencies
but only to detect them, which is what @bwelterl intends with his
https://github.com/rear/rear/pull/2786

For internal backup methods we may run what checklayout does
during mkbackuponly, to avoid an inconsistent backup being taken
with mkbackuponly after mkrescue.

When an inconsistent backup is taken with an external backup method
after mkrescue there is nothing we can do to detect or avoid that.

When an inconsistent backup was taken before mkrescue
there is nothing we can do during mkrescue because we cannot know
whether a backup was taken before and what is in the backup.

So all we have during rear recover is what is in disklayout.conf
and what is actually recreated versus what is in the restored files.

So all we can do after the backup was restored during rear recover
is to compare what is in disklayout.conf and what was actually recreated
versus what is in the restored files, cf. my above
https://github.com/rear/rear/issues/2787#issuecomment-1091793474

These are currently only offhand ideas in my head.
I will have to try something out and play around with it
to get a better understanding of how things really behave.

@pcahyna @bwelterl @rear/contributors
I wish you all a relaxed and recovering weekend!

jsmeix commented at 2022-04-08 13:32:

I think it is not a bug in ReaR that things in the recovery system
must be consistent with what is in the backup.

In some cases we may avoid inconsistencies,
e.g. avoid an inconsistent backup being taken with mkbackuponly after mkrescue,
cf. https://github.com/rear/rear/issues/2787#issuecomment-1092853451,
but I think in most cases we cannot do anything in ReaR to avoid
inconsistencies between the recovery system and what is in the backup,
so it is mostly up to the user to keep recovery system and backup in sync.

pcahyna commented at 2022-04-11 12:27:

I have been thinking about how to help the user to avoid such inconsistencies.

For internal backup methods we may run what checklayout does
during mkbackuponly, to avoid an inconsistent backup being taken
with mkbackuponly after mkrescue.

When an inconsistent backup is taken with an external backup method
after mkrescue there is nothing we can do to detect or avoid that.

We can advise the user to run checklayout before (and possibly after) their external backup procedure to imitate what the internal backup would do.

When an inconsistent backup was taken before mkrescue
there is nothing we can do during mkrescue because we cannot know
whether a backup was taken before and what is in the backup.

This case is a more difficult one. It could be solved by saving the layout at the time mkbackuponly is executed to some place under /var/lib/rear, and having a special variant of checklayout that would check the current layout not against the layout saved by mkrescue (under /var/lib/rear/layout) but against the layout saved by mkbackuponly. In essence, have two saved layouts, one by mkrescue, the other by mkbackuponly; when either of the two commands detects a difference from the layout saved by the other command, it is an inconsistency, thus abort.
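
A rough sketch of that scheme (all file names hypothetical; $current_layout stands for the freshly generated layout; comment lines are ignored because they only carry timestamps):

# Hypothetical scheme with two saved layouts, one per workflow:
mkrescue_layout=$VAR_DIR/layout/disklayout.conf     # saved by "rear mkrescue" (existing)
mkbackup_layout=$VAR_DIR/layout/backuplayout.conf   # saved by "rear mkbackuponly" (hypothetical)

# "rear mkbackuponly" would abort when the current layout differs from mkrescue's copy:
cmp -s <( grep -v '^#' "$mkrescue_layout" ) <( grep -v '^#' "$current_layout" ) ||
    Error "Backup would be inconsistent with the rescue image - run 'rear mkrescue' first"
cp "$current_layout" "$mkbackup_layout"

# and "rear checklayout" / "rear mkrescue" would compare against
# $mkbackup_layout the same way to detect the opposite inconsistency.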

pcahyna commented at 2022-04-11 12:31:

Another option would be to embed some files in the rescue image:

ReaR is checking some files to update the UUID during the recovery; these files should be embedded in the rescue ISO.

and restore them from the rescue image instead of from the backup. This of course opens a potential can of worms: one could then gradually embed the whole system in the rescue image, and we would not have a separate backup anymore (that's the approach we don't want to take in general, I suppose). But embedding only the files that need to be edited in the recovered system could be an acceptable and well-defined subset.
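
A rough sketch of that direction (entirely hypothetical, not existing ReaR behavior; ROOTFS_DIR and TARGET_FS_ROOT are ReaR's rescue build directory and the mounted recreated system):

# Hypothetical: at "rear mkrescue" time, stash the files that the
# finalize stage patches inside the rescue file system:
embed_files=( /etc/fstab /etc/default/grub )
mkdir -p "$ROOTFS_DIR/var/lib/rear/embedded"
cp --parents -a "${embed_files[@]}" "$ROOTFS_DIR/var/lib/rear/embedded/"

# Hypothetical: during "rear recover", after the backup restore,
# prefer the stashed copies over the possibly stale restored ones:
cp -a /var/lib/rear/embedded/. "$TARGET_FS_ROOT/"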

jsmeix commented at 2022-04-11 13:08:

I am currently working on an initial simple attempt to at least be able
to detect when something went wrong or looks wrong with UUIDs.
An initial pull request will come soon so you can have a look...

jsmeix commented at 2022-04-11 13:59:

@pcahyna
yes, we must make it very clear to the user that he is responsible
for keeping his backup in sync with his ReaR recovery system and
that he can and should use "rear checklayout" to check it.

In general he should make an up-to-date backup
whenever he has made a new ReaR recovery system,
or better, make the backup and the ReaR recovery system
at the same time in one single run, to be on the safe side.

Simply put,
he should use "rear mkbackup" to be on the safe side,
or the equivalent for external backup tools, i.e.
"rear mkrescue && make a new external backup".

pcahyna commented at 2022-04-11 14:09:

When an inconsistent backup is taken with an external backup method
after mkrescue there is nothing we can do to detect or avoid that.

Some external backup methods are executed from ReaR: looking at usr/share/rear/backup, that's at least BORG, DUPLICITY, TSM. Although it depends on the choice of terminology: maybe you consider backups that are triggered from inside ReaR to be internal and leave the term "external" only for backups that are taken completely outside of ReaR.

jsmeix commented at 2022-04-12 12:40:

I used the term "external backup" in a sloppy way when I basically
meant that the backup is made outside of ReaR.

Regarding TSM backup via "rear mkbackup":
I guess this is not often used by TSM users.
I think most TSM users use a native TSM method to make their backup.
I assume most TSM users have TSM integrated in whatever bigger stuff
so they do their backups via that bigger stuff (e.g. whatever backup
management solution or something like that).
So I think that in most cases a TSM backup is made outside of ReaR.
But I am not a TSM user so this is only a guess and I could be wrong.

bwelterl commented at 2022-04-12 12:45:

I think in customer solutions, TSM is used to back up data, and ReaR the system (which is what is expected), but then ReaR also uses TSM to save the ReaR backup.
I think there is no solution in TSM to restore an OS like ReaR does.

pcahyna commented at 2022-04-13 16:19:

TSM is used to back up data, and ReaR the system (which is what is expected)

Sure, but the question was, is the TSM backup of system files made by invoking rear mkbackup(only), or outside of ReaR's control?

then ReaR also uses TSM to save the ReaR backup

Now I don't understand what you mean: is TSM used to save the recovery media (ISO image)? Or what is "the ReaR backup" in this context, do you mean the backup of system files?

I guess this is not often used by TSM users.
I think most TSM users use a native TSM method to make their backup.
I assume most TSM users have TSM integrated in whatever bigger stuff
so they do their backups via that bigger stuff (e.g. whatever backup
management solution or something like that).
So I think that in most cases a TSM backup is made outside of ReaR.
But I am not a TSM user so this is only a guess and I could be wrong.

I am not a TSM user either, so I am interested in @bwelterl's (or @rmetrich's) opinion, in case they know some representative sample of ReaR + TSM users. Is the above true? As a counterexample, issue #2762 is from a user who found that he needed to modify usr/share/rear/backup/TSM/default/500_make_TSM_backup.sh, so presumably he is using ReaR to take TSM backups.

@xslima00 is my understanding correct that you take TSM backups of the system by running rear mkbackup or rear mkbackuponly?

xslima00 commented at 2022-04-13 20:10:

@pcahyna

Hello,

TSM backups can be triggered by ReaR or outside of ReaR's control. In both cases the backup is incremental. So if it's triggered outside and rear mkbackup runs in the next hour (we use it like this), it will simply build the backup ISO and run another incremental backup, which finishes within seconds (just the new ISO and very few files are uploaded to the TSM server). The problem with a standalone TSM backup is that you cannot boot from it. That's why we use ReaR. In case of a crash, the TSM admin restores the ISO to some location, I boot from it and run the ReaR restore (via the TSM client).

Issue #2762: in a standard situation there is no need to modify scripts. But we want to back up the TSM server itself to another TSM server. That's why it was necessary to perform the workaround I described, to force the TSM client to back up to the correct server.

jsmeix commented at 2022-05-05 11:02:

With https://github.com/rear/rear/pull/2795 merged
we have code that might also be used to solve this issue here.

While https://github.com/rear/rear/pull/2795
detects when restored basic system files
do not match the recreated system during "rear recover",
the ultimate goal of this issue here is to detect inconsistencies between
basic system files in the backup and what is stored in the recovery system,
i.e. to detect inconsistencies early, between "rear mkrescue" and "rear mkbackuponly",
and not (possibly too) late during "rear recover".

pcahyna commented at 2022-05-05 13:20:

I believe we should leave this issue open. #2795 introduces detection of inconsistency at recovery time, when it is a bit too late. The proper fix would be to avoid inconsistency before it bites, by detecting it during mkrescue & mkbackuponly and ideally introducing a savelayout-like command that one could use for consistency enforcement when an external backup method is used instead of mkbackuponly.
EDIT: which is basically what you wrote.

jsmeix commented at 2022-05-05 13:27:

Yes, this issue will (of course) not be done for the (already delayed) ReaR 2.7.
After the ReaR 2.7 release I intend to have a look - as time permits.

github-actions commented at 2022-07-05 03:26:

Stale issue message

pcahyna commented at 2022-07-13 07:41:

Well, this does not look "completed", reopening.

jsmeix commented at 2022-07-19 11:40:

Regarding how to do a basic check that a backup via "rear mkbackuponly"
is consistent with a "rear mkrescue" that had been done before:

I noticed right now during "rear mkbackuponly"

# usr/sbin/rear -D mkbackuponly
...
Running 'layout/save' stage ======================
Creating disk layout
Disabling excluded components in /var/tmp/rear.sbMwPY4tMjELAQ0/tmp/backuplayout.conf
Using sysconfig bootloader 'grub2' for 'rear recover'
Verifying that the entries in /var/tmp/rear.sbMwPY4tMjELAQ0/tmp/backuplayout.conf are correct
Created disk layout (check the results in /var/tmp/rear.sbMwPY4tMjELAQ0/tmp/backuplayout.conf)
Running 'backup' stage ======================
...

so it creates a current disk layout in .../tmp/backuplayout.conf
because of

DISKLAYOUT_FILE=$TMP_DIR/backuplayout.conf

in lib/mkbackuponly-workflow.sh
but as far as I can see at first glance it does not compare
the current disk layout in .../tmp/backuplayout.conf
with an existing disk layout in var/lib/rear/layout/disklayout.conf

I have no idea why "rear mkbackuponly" creates a disk layout at all,
but when it creates one anew that does not match an existing one
in var/lib/rear/layout/disklayout.conf, that indicates that something
is wrong, so "rear mkbackuponly" may error out to be on the safe side.

In my case both are the same except for comments with dates:

# diff -U0 var/lib/rear/layout/disklayout.conf /var/tmp/rear.sbMwPY4tMjELAQ0/tmp/backuplayout.conf
--- var/lib/rear/layout/disklayout.conf 2022-07-19 13:22:12.957330441 +0200
+++ /var/tmp/rear.sbMwPY4tMjELAQ0/tmp/backuplayout.conf 2022-07-19 13:23:35.749330441 +0200
@@ -1 +1 @@
-# Disk layout dated 20220719132209 (YYYYmmddHHMMSS)
+# Disk layout dated 20220719132330 (YYYYmmddHHMMSS)
@@ -37 +37 @@
-#        Update Time : Tue Jul 19 13:21:43 2022
+#        Update Time : Tue Jul 19 13:23:34 2022

(the "Update Time" is from a mdadm output comment for a RAID array)

pcahyna commented at 2022-07-25 11:01:

I have no idea why "rear mkbackuponly" creates a disk layout at all

I believe this is needed to determine what gets included/excluded in the backup.

but when it creates one anew that does not match an existing one
in var/lib/rear/layout/disklayout.conf, that indicates that something
is wrong, so "rear mkbackuponly" may error out to be on the safe side.

yes, this is a good idea and probably easy to do. Doing the opposite (detecting at mkrescue time whether we are consistent with what we saved during mkbackuponly) is more difficult. I outlined a possible approach above:

It could be solved by saving the layout at the time mkbackuponly is executed to some place under /var/lib/rear, and having a special variant of checklayout that would check the current layout not against the layout saved by mkrescue (under /var/lib/rear/layout) but against the layout saved by mkbackuponly. In essence, have two saved layouts, one by mkrescue, the other by mkbackuponly; when either of the two commands detects a difference from the layout saved by the other command, it is an inconsistency, thus abort.

The fact that mkbackuponly saves the layout under $TMP_DIR makes it easier (we need to let it save to a persistent location and extend it).
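
The change could be as small as redirecting that file (hypothetical file name; checklayout would then have to learn about it):

# in lib/mkbackuponly-workflow.sh: save the layout persistently instead
# of under $TMP_DIR, so a later run can compare against it
DISKLAYOUT_FILE=$VAR_DIR/layout/backuplayout.conf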


[Export of Github issue for rear/rear.]