#2795 PR merged: Verify file hashes at the end of recover after file restore from backup

Labels: enhancement, fixed / solved / done

pcahyna opened issue at 2022-04-26 16:24:

Pull Request Details:
  • Type: Enhancement

  • Impact: Normal

  • Reference to related issue (URL): #2785

  • How was this pull request tested?
    On RHEL 8.5: rear mkbackuponly, change of UUID of /boot and remount, rear mkrescue and recovery.
    After recovery, the system does not boot, as it can not mount /boot. After manual fixing of /etc/fstab, checked the recovery log - it contains

2022-04-26 10:17:49.073321433 Including finalize/default/060_compare_files.sh
2022-04-26 10:17:49.082885638 Some configuration files in the restored system do not match the saved layout!
  • Brief description of the changes in this pull request:

Configuration files can get out-of-sync with the recreated layout. This leads to inconsistent configuration, e.g. filesystem not mountable at boot, if they are mounted by UUIDs, and failures to migrate UUIDs in files (if needed).

To detect this problem, use the saved md5sums of configuration files (originally used for "checklayout") to verify the restored files and inform the user in the case of a mismatch.

No attempt is made to repair the problem; the user is informed, but they must fix the problem manually.

If a file may need a migration of UUIDs, then it is very likely prone to get out-of-sync with the layout if it gets modified. Add all such files to the list of files whose hashes are recorded in the saved layout to detect such problems.

Introduce a variable called FILES_TO_PATCH_PATTERNS to hold the list of these files (shell glob patterns, actually), as it is now needed in multiple places.

Change finalize/GNU/Linux/250_migrate_disk_devices_layout.sh to use FILES_TO_PATCH_PATTERNS as well.

Supersedes #2788 , #2786

TODO: the log says that files have changed, but it does not say which files. The md5sum output contains this information, but it does not appear in the log. What is the best way to show the output in the log?

pcahyna commented at 2022-04-26 16:26:

Suggestions for alternative wording of the message (should it be made more prominent, e.g. by prefixing it with "WARNING!"?) and an alternative name for FILES_TO_PATCH_PATTERNS are welcome.

jsmeix commented at 2022-04-27 09:48:

In general regarding non fatal errors and LogPrintError usage:

In case of non fatal errors they should be prefixed with "Error: "
in the LogPrintError message, e.g. see the commit message in my
https://github.com/rear/rear/pull/2788/commits/cf96b605420a24514a96bf543977a1ba70f56343

decide for the user that ... is an error so show it as "Error" to the user, cf.
https://schlomo.schapiro.org/2015/04/warning-is-waste-of-my-time.html

By the way:
LogPrintError does not automatically add an "Error: " prefix
because LogPrintError is intended to be also used for
"error-like" messages when the user must decide if that means
a real error in his case, see the comment at the LogPrintError function
in rear/lib/_input-output-functions.sh which is currently online at
https://github.com/rear/rear/blob/master/usr/share/rear/lib/_input-output-functions.sh#L492

jsmeix commented at 2022-04-27 10:09:

In general how to show command stdout and stderr output
on the users's terminal and also have it in the log file,
see backup/BORG/default/500_make_backup.sh
where the general form is
(including stdin redirection from the users's keyboard)

COMMAND 0<&6 1>> >( tee -a "$RUNTIME_LOGFILE" 1>&7 ) 2>> >( tee -a "$RUNTIME_LOGFILE" 1>&8 )

For the reason why this is used as general form
see the lengthy comments at
https://github.com/rear/rear/pull/2382#pullrequestreview-403320309

Perhaps this could be done in a simpler and more straightforward way
but I don't have the needed expert knowledge in this area
to fully understand what actually goes on with such redirections.

pcahyna commented at 2022-04-27 10:39:

In general regarding non fatal errors and LogPrintError usage:

In case of non fatal errors they should be prefixed with "Error: " in the LogPrintError message, e.g. see the commit message in my cf96b60

decide for the user that ... is an error so show it as "Error" to the user, cf.
https://schlomo.schapiro.org/2015/04/warning-is-waste-of-my-time.html

By the way: LogPrintError does not automatically add an "Error: " prefix because LogPrintError is intended to be also used for "error-like" messages when the user must decide if that means a real error in his case, see the comment at the LogPrintError function in rear/lib/_input-output-functions.sh which is currently online at https://github.com/rear/rear/blob/master/usr/share/rear/lib/_input-output-functions.sh#L492

I agree with displaying it as Error: because while not fatal, the condition is likely serious and and often needs manual intervention.

jsmeix commented at 2022-04-27 10:43:

In general regarding what to do in case of a FAILED md5sum
cf. skel/default/etc/scripts/system-setup

# In case of a FAILED md5sum inform the user:
echo "Possibly corrupted Relax-and-Recover rescue system"

Here plain echo is used because during recovery system startup
we do not have ReaR functions (rear is not running).

When something went wrong during "rear recover" we at least
show the issue to the user (which is done in this pull request)
and sometimes we let the user confirm it via an appropriate UserInput call
or we may force the user to make decisions via an appropriate user dialog.

When restored config files are inconsistent with the recreated system
I think we should not continue "rear recover" but instead force
the user to make decisions via an appropriate user dialog.
Usually one user dialog option is to
"Use Relax-and-Recover shell and return back to here"
so the user could manually fix things before proceeding.

This is not needed in this pull request here.
Better enhance it step by step with separated pull requests.

pcahyna commented at 2022-04-27 16:12:

In general how to show command stdout and stderr output on the users's terminal and also have it in the log file, see backup/BORG/default/500_make_backup.sh where the general form is (including stdin redirection from the users's keyboard)

COMMAND 0<&6 1>> >( tee -a "$RUNTIME_LOGFILE" 1>&7 ) 2>> >( tee -a "$RUNTIME_LOGFILE" 1>&8 )

For the reason why this is used as general form see the lengthy comments at #2382 (review)

Perhaps this could be done in a simpler and more straightforward way but I don't have the needed expert knowledge in this area to fully understand what actually goes on with such redirections.

What I don't understand: https://github.com/rear/rear/wiki/Coding-Style#what-to-do-with-stdin-stdout-and-stderr says "Because stdout and stderr are redirected into ReaR's log file (...)", but I don't see the output from md5sum in the log file anywhere (with my current code).

And according to the linked discussion, one should be able to use 2> instead of 2>>, am I right?

jsmeix commented at 2022-04-28 07:59:

In general regarding 1> and 2> versus 1>> and 2>>:

See usr/sbin/rear

# To be on the safe side append to the log file '>>' instead of plain writing to it '>'
# because when a program (bash in this case) is plain writing to the log file it can overwrite
# output of a possibly simultaneously running process that likes to append to the log file
# (e.g. when a background process runs that also uses the ReaR log file for logging).

In general all log messages must be always appended
because previous log messages must never be overwritten
(it is the whole point of a log that existing entries are sacrosanct).

For better readability here a quotation what I had written in
https://github.com/rear/rear/pull/2382#pullrequestreview-403320309
(excerpts):

I think when using process substitution we must ensure
to append to the filename that is the result of the process substitution
because when we do not append we overwrite existing things in that file.
So I think we must append via
2>> >( COMMAND )
instead of plain writing via
2> >( COMMAND )
...
Meanwhile I think when using process substitution >( COMMAND )
we could use both 2>> >( COMMAND ) and 2> >( COMMAND )
because >( COMMAND ) evaluates to a file name where to the
current bash writes and where from COMMAND gets its stdin.
Usually that intermediate file is a named pipe (FIFO).
I think it does not matter if the current bash writes or appends to
that intermediate file because the current bash is a single process
so that both methods result that all is written sequentially and
nothing is overwritten by another process (i.e. the current bash is
the only writer to that intermediate file).
But what matters is that COMMAND appends to the ReaR log file
because when using process substitution COMMAND is run
asynchronously as another process beside the current bash.
Because the current bash has stdout redirected to the log file
now the process substitution 2> >( COMMAND ) also writes
asynchronously to the ReaR log file at the same time while
the current bash stdout appends to the ReaR log file.
With
COMMAND = tee -a $RUNTIME_LOGFILE
I am sure it appends to the ReaR log file.

So

2>> >( tee -a $RUNTIME_LOGFILE )

is simple (just use append mode everywhere in case of log messages)
while in contrast

2> >( tee -a $RUNTIME_LOGFILE )

has a complicated reasoning behind why that could also work.

Because I don't see a reason why 2> is better here than 2>>
I would prefer to
"just use append mode everywhere in case of log messages".

Or is there a reason why 2> is better here than 2>> ?

jsmeix commented at 2022-04-28 08:36:

@pcahyna
regarding your "I don't see the output from md5sum in the log file":

See usr/sbin/rear

# In debug modes stdout and stderr are redirected to the log
# cf. https://github.com/rear/rear/issues/2416
# In non-debug modes (in particular also in verbose mode)
# stdout and stderr are redirected to a temporary file if possible
# i.e. when TMP_DIR exists - it does not exist for the 'help' workflow
...
# In non-debug modes the log cannot contain possibly false alarm messages
# cf. https://github.com/rear/rear/issues/2416
# but in non-debug modes stdout and stderr of all programs is still available
# for the Error function to extract some latest messages
# cf. https://github.com/rear/rear/issues/2623

I updated
https://github.com/rear/rear/wiki/Coding-Style#what-to-do-with-stdin-stdout-and-stderr

Thank you for pointing out the outdated info in
https://github.com/rear/rear/wiki/Coding-Style

pcahyna commented at 2022-04-28 13:41:

I changed the error message in de9a211b557f97ef08a2ecc2029f3f543a821a57.

pcahyna commented at 2022-04-28 13:47:

@jsmeix I think I addressed all your review comments except for adding crypttab to the list of file patterns.

pcahyna commented at 2022-04-28 13:50:

@bwelterl can you please have a look if you are interested? This should address your comment https://github.com/rear/rear/pull/2788#discussion_r850477309

pcahyna commented at 2022-04-28 15:41:

I repeated the test with the updated code. The log now has:

2022-04-28 11:26:08.986818408 Including finalize/default/060_compare_files.sh
/etc/rear/local.conf: FAILED
/etc/mtab: FAILED open or read
/etc/fstab: FAILED
md5sum: /etc/mtab: No such file or directory
md5sum: WARNING: 1 listed file could not be read
md5sum: WARNING: 2 computed checksums did NOT match
2022-04-28 11:26:09.002365987 Error: configuration files shown above in the restored system do not match the saved layout!

In the test I changed local.conf between mkbackuponly and mkrescue. It is a question if the check should fail because of this mismatch, but if not, probably /etc/rear/ should not be in CHECK_CONFIG_FILES at all.
The mtab error is more serious, I am afraid it will appear every time, making the check practically useless - do we really want to patch UUIDs in mtab ? ( On this system, /etc/mtab is a symlink to ../proc/self/mounts`. )

jsmeix commented at 2022-04-29 09:27:

Regarding https://github.com/rear/rear/pull/2795#issuecomment-1112363240

Regarding
FILES_TO_PATCH_PATTERNS=" ... [e]tc/mtab ..."

I have
on SLES 10 SP4

# file /etc/mtab 
/etc/mtab: ASCII text

on SLES11 SP4

# file /etc/mtab
/etc/mtab: ASCII text

on SLES12 SP5

# file /etc/mtab
/etc/mtab: symbolic link to ../proc/self/mounts

on SLES15 SP3

# file /etc/mtab
/etc/mtab: symbolic link to ../proc/self/mounts

In curent ReaR master code

# find usr/share/rear -type f | xargs grep 'mtab'

shows several scripts where we have special mtab handling
in particular the finalize scripts

finalize/GNU/Linux/250_migrate_disk_devices_layout.sh
finalize/GNU/Linux/250_migrate_lun_wwid.sh
finalize/GNU/Linux/320_migrate_network_configuration_files.sh
finalize/GNU/Linux/260_rename_diskbyid.sh
finalize/GNU/Linux/280_migrate_uuid_tags.sh

mention the symlink target of etc/mtab is /mnt/local/proc/12345/mounts

We have special symlink handling in those finalize scripts
that originated in particular for mtab at
https://github.com/rear/rear/pull/2048
"Do not patch $TARGET_FS_ROOT/etc/mtab if symbolic link"
which means on the other hand that
we do patch $TARGET_FS_ROOT/etc/mtab if it is not a symbolic link.

Because current ReaR master code supports when etc/mtab is not a symbolic link
I think we should not abandon this support now with this pull request.

So I think special symlink handling is needed for the files
where FILES_TO_PATCH_PATTERNS evaluates to.

Offhandedly I think special handling of symlinks and
other special files (e.g. files in /proc/ /sys/ /dev/ or /run/)
is needed in layout/save/default/600_snapshot_files.sh
that matches the special handling of symlink and other special files
in the above mentioned finalize scripts (perhaps not an exact copy
of what is done in those finalize scripts but something similar).

Simply put:
Offhandedly I think usually (i.e. except possible exceptions)
files that will be skipped when patching in the finalize stage
should also be skipped by this md5sum test here
because this md5sum test here is not and
cannot be meant to verify if all restored config files are OK,
instead this test is only meant to check if certain restored files
of specific interest for "rear recover" are OK.

jsmeix commented at 2022-04-29 09:51:

Regarding https://github.com/rear/rear/pull/2795#issuecomment-1112363240

CHECK_CONFIG_FILES=( ... '/etc/rear/' ... )

# git log --follow -p usr/share/rear/conf/default.conf
shows that this originated at
https://github.com/rear/rear/commit/929279d9cc8a19e683680b900ee2b21f14177808

As usual the old commits do not have any explanatory comment
or a link to an issue that explanains things or something like that
so we cannot know WHY /etc/rear/ was added to CHECK_CONFIG_FILES
which gives us the freedom to do what we like from our current point of view.

Currently I don't know whether or not
/etc/rear/ should be in CHECK_CONFIG_FILES.

Offhandedly my gut feeling is that a changed local.conf
between mkbackuponly and mkrescue calls for possible
problems later during "rear recover" so it should be shown
to the user during "rear recover" so that the user is made aware
that there could be problems because of his changed local.conf

Cf. my meanwhile enhanced part
"It is your task to ensure your backup is consistent"
in the section
"Relax-and-Recover versus backup and restore" in
https://en.opensuse.org/SDB:Disaster_Recovery
that reads in particular:

... after each change of the basic system
... "rear mkbackup" needs to be run to create
a new ReaR recovery system together
with a matching new backup of the files

A changed local.conf indicates a change of the basic system
or the changed local.conf could be itself considered as such a change
of the basic system because ReaR belongs to the basic system
when one considers disaster recovery as a basic system thing.

pcahyna commented at 2022-05-02 11:48:

Here's what the recovery log shows:

/etc/rear/local.conf: FAILED
/etc/fstab: FAILED
md5sum: WARNING: 2 computed checksums did NOT match
2022-05-02 06:17:37.663028084 Error: configuration files shown above in the restored system do not match the saved layout!

(there is now a change to the last log message as you proposed.)
And here are the messages from the log from rear -d mkrescue

2022-05-02 06:11:24.985098966 Including layout/save/default/600_snapshot_files.sh
2022-05-02 06:11:24.995463992 Adding symlink /etc/sysconfig/grub with target /etc/default/grub to CHECK_CONFIG_FILES
2022-05-02 06:11:25.004853338 Skip adding symlink /etc/mtab target /proc/22743/mounts on /proc/ /sys/ /dev/ or /run/ to CHECK_CONFIG_FILES

pcahyna commented at 2022-05-02 12:02:

I believe all the comments are addressed now and I consider the branch ready to be merged (with squashing of commits marked as fixup! in order to not overcomplicate history).

jsmeix commented at 2022-05-02 12:09:

@pcahyna
thank you so much for all your work here!
It became really complicated to get it done correctly.
Feel free to merge it when you like.

pcahyna commented at 2022-05-02 13:43:

@jsmeix yes, it is complicated. When thinking about the symlink support, I realized that there is still a gap in the present approach: what if another file that matches CHECK_CONFIG_FILES or FILES_TO_PATCH_PATTERNS gets added (when it was missing previously)? This won't be detected by md5sum -c, but may represent a change relevant to layout. This may explain why md5sum -c is not used but the complete list of checksums is regenerated and compared in layout/compare/default/510_compare_files.sh : https://github.com/rear/rear/pull/2788#issuecomment-1105249183

jsmeix commented at 2022-05-03 11:51:

@pcahyna
yes, you are right - thank you for your sharp and thorough analysis!

The simple high level goal is to detect all changes in files where
CHECK_CONFIG_FILES and FILES_TO_PATCH_PATTERNS
evaluate to.

When CHECK_CONFIG_FILES and FILES_TO_PATCH_PATTERNS
evaluate to different files this is a change that must be detected.

This means md5sum -c cannot be used here but instead the complete
list of files must be regenerated and their checksums must be regenerated
and compared with the previous list of files and checksums.

This means the same checksum generating code as currently in your
layout/save/default/600_snapshot_files.sh here in this pull request
must be also run to regenerate those checksums in your new
finalize/default/060_compare_files.sh here in this pull request
and also in
layout/compare/default/510_compare_files.sh in master code

jsmeix commented at 2022-05-03 12:58:

@pcahyna
I did a quick https://github.com/rear/rear/pull/2796
which is only meant as a proposal so you could have a look
how things might be done according to
https://github.com/rear/rear/pull/2795#issuecomment-1116010676

I did not plain copy the your checksum generating code in
layout/save/default/600_snapshot_files.sh here in this pull request
but I modified it somewhat - in particular I do not add things
to the user config variable CHECK_CONFIG_FILES
but instead I use a local files_for_md5sum array
because I feel uncertain changing user config variables.

pcahyna commented at 2022-05-04 09:16:

This means the same checksum generating code as currently in your
layout/save/default/600_snapshot_files.sh here in this pull request
must be also run to regenerate those checksums in your new
finalize/default/060_compare_files.sh here in this pull request
and also in
layout/compare/default/510_compare_files.sh in master code

I don't think that any modification of layout/compare/default/510_compare_files.sh (except for replacing the diff -u by a better comparison like cmp and using sort to avoid false warnings when the set of files is unchanged, but the order is changed) is needed. layout/save/default/600_snapshot_files.sh appends to the variable CHECK_CONFIG_FILES, so layout/compare/default/510_compare_files.sh will inherit it ( layout/save is executed before layout/compare ). I now see that it is highly non-obvious, so I should add a comment explaining it.
You are right that if I move to something else than md5sum -c in finalize, the same code needs to be run there (finalize/default/060_compare_files.sh) to regenerate CHECK_CONFIG_FILES.

pcahyna commented at 2022-05-04 09:25:

in particular I do not add things
to the user config variable CHECK_CONFIG_FILES
but instead I use a local files_for_md5sum array
because I feel uncertain changing user config variables.

I understand your feeling, but if we are to use an internal variable for this, I think it should be global, not local (do we have a standard prefix for global variables not exposed to the user?) and set only once in layout/save/default/600_snapshot_files.sh to avoid duplication.

jsmeix commented at 2022-05-04 11:30:

Meanwhile I think I am overcautious because
we often modify config variables when needed
(e.g. there are many COPY_AS_IS+= in the scripts)
so it is OK to append to CHECK_CONFIG_FILES.

We do not have a special syntax for global nopn-config variables.
In https://github.com/rear/rear/wiki/Coding-Style there is only

All variables that are used in more than a single script must be all-caps

jsmeix commented at 2022-05-04 11:42:

I used layout/compare/default/510_compare_files.sh for my proposal in
https://github.com/rear/rear/pull/2796
to not cause conflicts with your
layout/save/default/600_snapshot_files.sh in this pull request here.

pcahyna commented at 2022-05-04 12:06:

I used layout/compare/default/510_compare_files.sh for my proposal in #2796 to not cause conflicts with your layout/save/default/600_snapshot_files.sh in this pull request here.

Ah ok, that's good. I propose to do one adjustment here: split layout/save/default/600_snapshot_files.sh into two files: one would adjust CHECK_CONFIG_FILES according to FILES_TO_PATCH_PATTERNS (my new code) and the other would actually perform the hash calculation (the original code). This would enable us to symlink the former from other places and thus share the code, without having to always execute the latter. Then I would merge it and do other improvements (like the issue I found above, and #2796) separately.

jsmeix commented at 2022-05-04 12:16:

My https://github.com/rear/rear/pull/2795#issuecomment-1114774428
still holds. i.e.:
Feel free to merge it when you like.

pcahyna commented at 2022-05-04 14:02:

@jsmeix thanks a lot for all your helpful review - I now added layout/save/default/490_check_files_to_patch.sh with the new stuff and retested, so if you are fine with the name, I am going to squash the commits and merge.

jsmeix commented at 2022-05-05 07:06:

@pcahyna
thank you for your patience with this complicated and laborious thing!

pcahyna commented at 2022-05-05 13:39:

Regarding

When CHECK_CONFIG_FILES and FILES_TO_PATCH_PATTERNS evaluate to different files this is a change that must be detected.

This means md5sum -c cannot be used here but instead the complete list of files must be regenerated and their checksums must be regenerated and compared with the previous list of files and checksums.

and

I think [e]tc/crypttab should be added here because it contains UUIDs.

these issues remain.

The approach to solve the first one is discussed in https://github.com/rear/rear/pull/2796#issuecomment-1118387393 .
But since it likely is a rare issue, I believe that properly solving #2787 will bring more benefit to users, if we want to continue improving this stuff and need to choose what we will dedicate our time to.

jsmeix commented at 2022-05-05 13:55:

Via
https://github.com/rear/rear/commit/58ddf3e094456df3c41f0b2089ee962d34836da6
I dared to "just add right now" [e]tc/crypttab
to FILES_TO_PATCH_PATTERNS in default.conf
because I cannot imagine a reason why it should not be checked
and patched if a UUID changed therein - i.e. I cannot imagine
what could go wrong or could be worse when it is checked
(and patched if needed) compared to ignoring it (as it was before).


[Export of Github issue for rear/rear.]