#3404 Issue closed: Monitor "rear mkbackup" result (when run as cron job)

Labels: support / question

LordSpectre opened issue at 2025-02-16 08:05:

Requesting support or just a question

Hello, is there a way to monitor the backup result and execute some commands (e.g. send mail) if the backup fails?

Platform

Linux x64

Output

Additional information

As per the short info above, my ReaR installation runs on RHEL7 via crontab every night. What I would like to do is implement a check of the backup status and, in case something has gone wrong, execute some commands (e.g. send an email). Is there any native configuration for that? e.g.
POST_BACKUP_SCRIPT=/etc/rear/post_backup_success.sh
POST_FAILURE_SCRIPT=/etc/rear/post_backup_failure.sh

Or maybe I have to change some scripts?
Thank you.

jsmeix commented at 2025-02-17 07:43:

@LordSpectre
what backup method do you use?
I.e. what BACKUP=... do you have specified
in your etc/rear/local.conf?

Yes, using POST_BACKUP_SCRIPT would be the generic way
to do that inside ReaR.
There is no such thing as POST_FAILURE_SCRIPT in ReaR.
See usr/share/rear/conf/default.conf what config variables
are supported by ReaR.

But you may also call whatever you need outside of ReaR
after "rear mkbackup/mkbackuponly" had finished
to check and verify the results.

LordSpectre commented at 2025-02-17 07:58:

@jsmeix

I'm using NETFS as BACKUP type.
However, POST_BACKUP_SCRIPT will be called ONLY when the backup has finished. If Rear stops with any error (e.g. no space in /boot to write boot files) the POST_BACKUP_SCRIPT will not be executed.

I notice that for any error in any stage, Rear will write this in its log:
Exit task '(( EXIT_FAIL_MESSAGE )) && echo 'rear mkbackup failed, check /var/log/rear/rear-xxxxxxxxx.log for details' 1>&8'

So maybe I can just check for "EXIT_FAIL_MESSAGE" and/or "rear mkbackup failed" strings after Rear backup task.

jsmeix commented at 2025-02-17 09:22:

@LordSpectre
perhaps there is a misunderstanding here?

Because you wrote initially "monitoring the backup result"
I was thinking you mean the result of only what belongs
to making a backup of the system's files?

Cf. the section
"Relax-and-Recover versus backup and restore"
https://en.opensuse.org/SDB:Disaster_Recovery#Relax-and-Recover_versus_backup_and_restore

But because now you wrote

If Rear stops with any error
(e.g. no space in /boot to write boot files)
the POST_BACKUP_SCRIPT will not be executed.

I am now wondering if you perhaps actually mean
"monitoring the whole ReaR result"?

For the latter the exit code of sbin/rear should help
because for "rear mkbackup/mkbackuponly" the exit code
should be zero only when "rear mkbackup/mkbackuponly"
finished without error.

But when "rear mkbackup/mkbackuponly" finished without error
it does not mean all is 100% OK for what you need because
there could be issues which are an error for your use case
but which were not (yet?) considered as an error by ReaR.

In contrast to "rear mkbackup/mkbackuponly" exit code
"rear checklayout" is special because it finishes
with exit code 1 when the disk layout has changed
or when configuration files have changed
but this exit code 1 is no error but meant
as a signal that things have changed.
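
A minimal sketch of such an exit code check, written as a small wrapper for a cron script. The function names run_and_check and notify_failure are hypothetical and not part of ReaR; notify_failure only prints to stderr here and would need to be replaced by whatever mail command is available on the system.

```shell
#!/bin/bash
# Hypothetical helpers for a cron script; none of these names come from ReaR.

notify_failure() {
    # $1 = description of what was run, $2 = its exit code.
    # Placeholder: replace this echo with your own mail command.
    echo "ALERT: $1 failed with exit code $2" >&2
}

run_and_check() {
    # Run the given command with its output discarded and
    # call notify_failure when it ends with a non-zero exit code.
    local rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -ne 0 ]; then
        notify_failure "$*" "$rc"
    fi
    return "$rc"
}
```

In the cron script this could be called as "run_and_check /usr/sbin/rear mkbackup". It is meant for "rear mkbackup/mkbackuponly" only, because for "rear checklayout" exit code 1 is a signal and not an error.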

LordSpectre commented at 2025-02-17 10:28:

@jsmeix

I'm sorry, perhaps I wasn't clear enough in my request. What I need is to run the backup at night using a script in crontab like this:
rear mkbackup

Then, I want to check that everything went well without having to manually check the ReaR log under /var/log/rear every day.

As I mentioned earlier, it seems that for every error where ReaR fails to complete the backup, it writes the following error:
(( EXIT_FAIL_MESSAGE )) && echo 'rear mkbackup failed'
It doesn't matter what the cause is, I always see this message.

Perhaps I can add a check to the script in crontab and perform a grep on the latest ReaR log. If this error is present, it means something went wrong, and I can send an email. Afterward, I could analyze the log to figure out what went wrong.
Any other advice is welcome :)

gdha commented at 2025-02-17 12:33:

@LordSpectre You could pimp your cron entry as follows:

/usr/sbin/rear mkbackup >/dev/null 2>&1
echo exit code $? >> /var/log/rear/rear-$(hostname -s).log

Then you only have to check the last line in the log file. I do it all the time with the help of Ansible
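
A possible way to check that last line from a script, as a sketch only. It assumes the log file ends with a line of the form "exit code N" appended by the cron entry above; the function name last_run_succeeded is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper; assumes the last line of the log file was
# appended by the cron entry as "exit code N".

last_run_succeeded() {
    # $1 = log file to inspect
    local last_line
    # Return 2 when the log file cannot be read at all.
    last_line=$(tail -n 1 "$1" 2>/dev/null) || return 2
    [ "$last_line" = "exit code 0" ]
}
```

For example: last_run_succeeded "/var/log/rear/rear-$(hostname -s).log" || echo "ReaR backup failed"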

jsmeix commented at 2025-02-17 13:53:

@LordSpectre
if you like to 'grep' for ReaR messages:
I would recommend to 'grep' for success to be on the safe side
instead of trying to 'grep' for whatever error messages,
e.g. something may send SIGKILL to ReaR,
in which case ReaR cannot show any error message about that abort.

See at the end of usr/sbin/rear what the log messages are
which are sent to "syslog" (via 'logger')
https://github.com/rear/rear/blob/rear-2.9/usr/sbin/rear#L876
and see the LogToSyslog function implementation
https://github.com/rear/rear/blob/rear-2.9/usr/share/rear/lib/_input-output-functions.sh#L544

For example on my openSUSE Leap 15.6 workstation:

# usr/sbin/rear -s mkbackup
Relax-and-Recover 2.9 / 2025-01-31
Running rear mkbackup (PID 9409 date 2025-02-17 14:37:44)
...

# usr/sbin/rear -s mkbackuponly
Relax-and-Recover 2.9 / 2025-01-31
Running rear mkbackuponly (PID 11450 date 2025-02-17 14:44:05)
...

# usr/sbin/rear -s mkrescue & sleep 1 ; kill -9 $!
[1] 12752
Relax-and-Recover 2.9 / 2025-01-31
Running rear mkrescue (PID 12752 date 2025-02-17 14:50:15)
...
[1]+  Killed                  usr/sbin/rear -s mkrescue

# tail /var/log/messages
...
2025-02-17T14:37:48.015782+01:00 localhost rear[10856]: rear mkbackup finished with zero exit code
2025-02-17T14:44:07.075626+01:00 localhost rear[12328]: rear mkbackuponly finished with zero exit code

i.e. no syslog message about the killed "rear mkrescue".
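
A sketch of how to 'grep' for success in syslog, based on the message text shown above. The function name rear_success_logged is hypothetical, and the date filter assumes that syslog lines start with an ISO date as in this example, which depends on the syslog configuration.

```shell
#!/bin/bash
# Hypothetical helper; the message text is taken from the syslog excerpt above.
# Assumption: each syslog line starts with an ISO date like 2025-02-17.

rear_success_logged() {
    # $1 = workflow (e.g. mkbackup)
    # $2 = syslog file (e.g. /var/log/messages)
    # $3 = day to look for (e.g. 2025-02-17), empty matches any day
    local workflow=$1 logfile=$2 day=$3
    grep -q "^${day}.*rear ${workflow} finished with zero exit code" "$logfile"
}
```

For example: rear_success_logged mkbackup /var/log/messages "$(date +%F)" || echo "no success message for today"
When ReaR was killed, no such message exists, so the check fails as intended.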

jsmeix commented at 2025-02-17 14:08:

@gdha
do you perhaps know the reason why
we do not send the final LogToSyslog messages
also to the ReaR logfile?

E.g. at the end of sbin/rear something like

    if test $EXIT_CODE -eq 0 ; then
        LogToSyslog "$PROGRAM $WORKFLOW finished with zero exit code"
        echo "${MESSAGE_PREFIX}$( date +"%Y-%m-%d %H:%M:%S.%N " )$PROGRAM $WORKFLOW finished with zero exit code" >>"$LOGFILE"
    else

The ${MESSAGE_PREFIX}$( date +"%Y-%m-%d %H:%M:%S.%N " ) part is from
https://github.com/rear/rear/blob/rear-2.9/usr/share/rear/lib/_input-output-functions.sh#L414

gdha commented at 2025-02-17 14:16:

@jsmeix you will see in the usr/sbin/rear code at the end:

# There should be no syslog message for the help workflow:
if test "$WORKFLOW" != "help" ; then
    if test $EXIT_CODE -eq 0 ; then
        LogToSyslog "$PROGRAM $WORKFLOW finished with zero exit code"
    else
        if test "checklayout" = "$WORKFLOW" -a $EXIT_CODE -eq 1 ; then
            # The checklayout workflow is special because it sets EXIT_CODE=1
            # when the disk layout has changed or when configuration files have changed
            # but that exit code 1 is no error but meant as a signal that things have changed
            # which require a new ISO image so that users can run "rear checklayout || rear mkrescue"
            # see https://github.com/rear/rear/issues/564
            LogToSyslog "$PROGRAM checklayout finished with exit code 1 (layout or config changed)"
        else
            LogToSyslog "$PROGRAM $WORKFLOW failed with exit code $EXIT_CODE"
        fi
    fi
fi

exit $EXIT_CODE

Something should be logged in /var/log/messages after each rear run.

jsmeix commented at 2025-02-17 14:38:

@gdha
yes, I know it, cf. 'git blame usr/sbin/rear' ;-)
and that's what I wrote in
https://github.com/rear/rear/issues/3404#issuecomment-2663204576

But I asked about a reason why we do not send those
final LogToSyslog messages also to the ReaR logfile?

I.e. why only in syslog (e.g. in /var/log/messages)
but not also in the ReaR logfile?

gdha commented at 2025-02-17 14:50:

@jsmeix Sorry, misunderstood the question - my head is not clear today ;-( To be honest I have no idea why.

jsmeix commented at 2025-02-17 15:17:

@gdha
thank you!

So it seems there is no reason why those messages
could not be also output to the logfile which means
(as time permits) I will make an enhancement pull request
to also output our final LogToSyslog messages to the logfile.

jsmeix commented at 2025-02-17 15:45:

Hmmm...

I think I can simplify LogToSyslog things
to make LogToSyslog always also output to the log file
because I don't see a reason why LogToSyslog messages
should not always also be sent to the log file.

My current reasoning:

# find usr/sbin/rear usr/share/rear/ -type f | xargs grep 'LogToSyslog' | grep -v ': *#'

usr/sbin/rear:
        LogToSyslog "$PROGRAM $WORKFLOW finished with zero exit code"
            LogToSyslog "$PROGRAM checklayout finished with exit code 1 (layout or config changed)"
            LogToSyslog "$PROGRAM $WORKFLOW failed with exit code $EXIT_CODE"

usr/share/rear/lib/_input-output-functions.sh:
function LogToSyslog () {
    LogToSyslog "ERROR: $*"

So LogToSyslog is only called
at the end of sbin/rear for our final messages and
in the Error() function in lib/_input-output-functions.sh as

Log "ERROR: $*"
LogToSyslog "ERROR: $*"

where the message is already also output to the log file.

When I move in sbin/rear the
"Saving $RUNTIME_LOGFILE as $LOGFILE"
part after the final LogToSyslog calls
then I could call Log in LogToSyslog to make
LogToSyslog always also output to the log file.

This would avoid needlessly overcomplicated duplicated code like

LogToSyslog "$PROGRAM $WORKFLOW finished with zero exit code"
echo "${MESSAGE_PREFIX}$( date +"%Y-%m-%d %H:%M:%S.%N " )$PROGRAM $WORKFLOW finished with zero exit code" >>"$LOGFILE"

as in my
https://github.com/rear/rear/issues/3404#issuecomment-2663242237
which is needed when the "Saving $RUNTIME_LOGFILE as $LOGFILE"
part happens before the final LogToSyslog calls.

LordSpectre commented at 2025-02-17 16:44:

@LordSpectre You could pimp your cron entry as follows:

/usr/sbin/rear mkbackup >/dev/null 2>&1
echo exit code $? >> /var/log/rear/rear-$(hostname -s).log

Then you only have to check the last line in the log file. I do it all the time with the help of Ansible

Thank you, this is an interesting approach. Here is a test I did:

# echo exit code $?
exit code 143

Is there a list of exit codes? I would like to base my script on that code, but from what I understand, if files are modified during the backup, the exit code will be 1.
So, which codes should I check?
0 and 1 = OK
Others = KO

gdha commented at 2025-02-18 07:34:

@LordSpectre Articles https://komodor.com/learn/exit-codes-in-containers-and-kubernetes-the-complete-guide/ and https://www.cyberciti.biz/faq/linux-bash-exit-status-set-exit-statusin-bash/ provide a nice overview of exit codes.
In your case 143 means SIGTERM exit (128+15).
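
The 128+N arithmetic can be sketched in a small helper. This follows the usual shell convention that an exit code above 128 stands for termination by signal N; a program can also return such a value on its own, so the result is a hint and not a proof. The function name describe_exit_code is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper that applies the shell convention "128 + signal number".

describe_exit_code() {
    # $1 = exit code of the finished command
    local rc=$1 sig name
    if [ "$rc" -eq 0 ]; then
        echo "success"
    elif [ "$rc" -gt 128 ]; then
        sig=$((rc - 128))
        # 'kill -l N' prints the signal name for number N (e.g. TERM for 15).
        name=$(kill -l "$sig" 2>/dev/null) || name="unknown"
        echo "terminated by signal $sig ($name)"
    else
        echo "failed with exit code $rc"
    fi
}
```

For example, "describe_exit_code 143" reports signal 15, which matches the SIGTERM explanation above.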

LordSpectre commented at 2025-02-18 08:21:

@gdha thank you very much. I tried to simulate several errors, e.g. a read-only NFS mount, Ctrl-C during backup, a full /boot partition on the host, etc. The exit code is always 143. Plus I noticed that with the common WARNING below, the exit code is still 0.
That's good, because it means that if ReaR completes the backup correctly, I always get ZERO. So I can just check whether the exit code is ZERO or not, and send mail if it is not.

WARNING: tar ended with return code 1 and below output:
  ---snip---
  tar: /var/spool/postfix/public/qmgr: socket ignored
  tar: /var/spool/postfix/public/flush: socket ignored
  tar: /var/spool/postfix/public/showq: socket ignored
  ----------
This means that files have been modified during the archiving
process. As a result the backup may not be completely consistent
or may not be a perfect copy of the system. Relax-and-Recover
will continue, however it is highly advisable to verify the
backup in order to be sure to safely recover this system.

jsmeix commented at 2025-02-18 09:04:

That "tar ended with return code 1" issue
is a good example for what I meant with

when "rear mkbackup/mkbackuponly" finished without error
it does not mean all is 100% OK for what you need because
there could be issues which are an error for your use case

in my above
https://github.com/rear/rear/issues/3404#issuecomment-2662514125

For example the following 'tar' output

WARNING: tar ended with return code 1 and below output:
  ---snip---
  tar: /etc/fstab: file changed as we read it

could mean that the backup is really not good enough
to restore the system in a clean and usable state.

But it depends on the 'tar' version what its message
"file changed as we read it" actually means, see
https://stackoverflow.com/questions/20318852/tar-file-changed-as-we-read-it
excerpts

tar 1.34 or before
These versions of tar use ... "ctime"
...
tar 1.35+
In these versions, tar uses "mtime"

Disaster recovery is not "easy".
Disaster recovery is not at all "simple".
There is no such thing as a disaster recovery solution that "just works".
https://en.opensuse.org/SDB:Disaster_Recovery#Inappropriate_expectations
So:
No disaster recovery without testing and continuous validation.
https://en.opensuse.org/SDB:Disaster_Recovery#No_disaster_recovery_without_testing_and_continuous_validation
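
As one small piece of such validation, a script could list the 'tar' messages of a run even when the exit code is zero. The following is a sketch under assumptions: it assumes that the WARNING text shown above also ends up in the file that is inspected (this thread does not confirm where it is written), and skipping "socket ignored" lines is only an example of a site-specific decision. The function name tar_messages_of_concern is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper; assumes the 'tar' messages appear in the given file
# in the form shown in the WARNING excerpts above.

tar_messages_of_concern() {
    # $1 = file to inspect (e.g. the ReaR log file)
    # Prints lines that start with "tar: " except "socket ignored" lines.
    # Returns non-zero when nothing is left to report.
    grep -E '^[[:space:]]*tar: ' "$1" | grep -v 'socket ignored$'
}
```

For example: tar_messages_of_concern /var/log/rear/rear-$(hostname -s).log && echo "tar reported messages, verify the backup"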

LordSpectre commented at 2025-02-18 18:16:

@jsmeix This is clear, I don't want to check the backup consistency, but only the result of the whole operation. But I already know that exit code = 0 doesn't mean that I'm 100% safe :)


[Export of Github issue for rear/rear.]