#1088 Issue closed: Support for "Multiple Backup Methods"

Labels: enhancement, fixed / solved / done

jsmeix opened issue at 2016-11-25 15:30:

Implement support for "Multiple Backup Methods".

Cf.
https://github.com/rear/rear/issues/769#issuecomment-183310560
and
https://github.com/rear/rear/issues/1078#issuecomment-262458799
and
https://github.com/rear/rear/issues/1078#issuecomment-262921898
and
https://github.com/rear/rear/issues/1085#issuecomment-262977318

jsmeix commented at 2016-11-25 15:40:

Only a totally untested idea off the top of my head
how to get support for "Multiple Backup Methods"
implemented as some kind of pure add-on so that
all the existing code for a single backup method
could still work as before (i.e. I really want to avoid
doing major changes in the existing code):

The idea is based on the assumption that
a "rear mkbackup" run can be split into
a "rear mkrescue" run plus a "rear mkbackuponly" run
and that the result is still the same.
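
In command form, the assumption reads:

rear mkbackup
# assumed to behave the same as:
rear mkrescue && rear mkbackuponly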

If that is true (or can be made true with reasonable effort)
support for "Multiple Backup Methods" could simply be
an initial "rear mkrescue" run plus a sequence of
subsequent "rear mkbackuponly BACKUP_METHOD" runs
where the BACKUP_METHOD option specifies what
backup method has to be run.

The configuration values for "Multiple Backup Methods" could
then simply be additional /etc/rear/BACKUP_METHOD.conf files
that are read for each "rear mkbackuponly BACKUP_METHOD" run
after the usual config files have been read, so that the values in a
/etc/rear/BACKUP_METHOD.conf file override the backup-related
values in a global config file like /etc/rear/local.conf.
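
A minimal sketch of those override semantics (the file contents here are illustrative, not actual ReaR defaults): because bash simply sources the config files in order, the last assignment of a variable wins.

# /etc/rear/local.conf (sourced first)
BACKUP=NETFS
BACKUP_URL=nfs://server/systembackup

# /etc/rear/BACKUP_METHOD.conf (sourced afterwards)
BACKUP_URL=nfs://otherserver/otherbackup
# the later assignment wins, so BACKUP_URL now points at otherserver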

This way support for "Multiple Different Backup Methods" should
be possible, and I can even imagine that it is hopefully not too
hard to also support "Multiple Same Backup Methods" in an
analogous way.

But all that depends on whether or not "rear mkbackup"
can be replaced by "rear mkrescue" plus "rear mkbackuponly".

gozora commented at 2016-11-25 15:48:

Reading rear --help I've found the option -c DIR.
If I got your idea right, the same would be achieved by having multiple of:

rear -c /etc/rear/BACKUP_METHOD1 mkbackup
rear -c /etc/rear/BACKUP_METHOD2 mkbackup
rear -c /etc/rear/BACKUP_METHOD3 mkbackup

in crontab, right?

jsmeix commented at 2016-11-25 19:21:

Wih "-c DIR" one gets an "alternative config directory; instead of /etc/rear"
which should also work but then one needs to manually maintain multiple
whole local.conf files in each of the alternative config dirs.

My proposal means that all can use one same base /etc/rear/local.conf
but on top of that an additional /etc/rear/FOO.conf is sourced so that
the values in FOO.conf could overwrite (if needed) values in local.conf.

Basically I mean a different command line option, something like

  -a file
    additionally source file after the usual config files have been sourced
    but before any script is sourced
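
A rough sketch of where such an option would hook into the config sourcing order (hypothetical code, not the actual usr/sbin/rear logic; ADDITIONAL_CONFIG_FILE is a made-up name):

# usual config files, sourced in this order:
source /usr/share/rear/conf/default.conf
source /etc/rear/site.conf
source /etc/rear/local.conf
# the additional file given via '-a file' would be sourced here:
source "$ADDITIONAL_CONFIG_FILE"
# only afterwards are the workflow scripts sourced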

jsmeix commented at 2016-11-25 19:28:

By the way:

Another possible advantage of "Multiple Backup Methods"
when it can be done as described above is that then
multiple backups could run in parallel, i.e. basically like

rear mkrescue ; rear -a backup1 mkbackuponly & rear -a backup2 mkbackuponly & ...

For example think about a multiple-core machine
with tons of main memory and several disks
e.g. sda for the system and sdb for /home and sdc for a database
where multiple independent backups could run in parallel.

jsmeix commented at 2016-11-25 19:31:

Of course multiple backups require
the counterpart for the restore, cf.
https://github.com/rear/rear/issues/987

gozora commented at 2016-11-25 19:38:

I got your idea now. Although I don't like the idea of overriding config files (it can get quite messy), it is a good start.

gozora commented at 2016-11-25 19:42:

For example think about a multiple-core machine with tons of main memory and several disks

Sounds like HANA :-)

jsmeix commented at 2016-11-25 19:52:

I need overriding config files because of backward compatibility
and to keep the code simple (there is no support for arrays
of arrays in bash, and even if that were supported, I would
try to avoid it because it would be a nightmare for the user
to specify such config structures correctly).
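
For example, a hypothetical nested config structure like the following is simply not valid bash:

# NOT valid bash - there is no such thing as an array of arrays:
# BACKUPS=( ( NETFS nfs://server1/systembackup ) ( BORG ssh://server2/home ) )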

With overriding values by a subsequent config file
one can have e.g. in /etc/rear/backup1

BACKUP=NETFS
BACKUP_URL=nfs://server1/systembackup

and in /etc/rear/backup2

BACKUP=NETFS
BACKUP_URL=nfs://server2/homebackup

and in /etc/rear/backup3

BACKUP=NETFS
BACKUP_URL=nfs://server3/databasebackup

and - if my above assumptions are right - then

rear mkrescue ; rear -a backup1 mkbackuponly & rear -a backup2 mkbackuponly & rear -a backup3 mkbackuponly

should (hopefully) "just do the right thing".

Simply put:
"Multiple Backup Methods" could be "Backups on Steroids" ;-)

Seriously:
If "Multiple Parallel Backups and Restores"
were possible, it could be a major boost for ReaR.
ReaR is known to be fast, but then it could become
basically unlimited in speed, in theory - in practice
only limited by what the hardware can do.

jsmeix commented at 2016-11-25 20:00:

The possible advantages are so compelling
that I will try to get it implemented for ReaR 2.0
at least in an initially basically working way
(or I will learn that it is not possible with reasonable effort).

gozora commented at 2016-11-25 20:05:

Definitely! Then we can have NTFS with Windows deployment in ReaR 3.0 :-).

jsmeix commented at 2016-11-25 20:23:

Of course you meant
"Parallel Deployments of Multiple Systems"
...
hmmmm...
...
perhaps not a really bad idea.
Think about hundreds of virtual machines on one host.
Way cool to restore all the virtual disk images in parallel...

gozora commented at 2016-11-25 20:27:

I have a feeling that this brainstorming is going too far :-)!

jsmeix commented at 2016-11-25 20:30:

Not at this time and not with my second beer!
;-)

gozora commented at 2016-11-25 20:32:

Cheers from Slovakian pub! :-)

jsmeix commented at 2016-11-25 20:33:

Cheers!

jsmeix commented at 2016-11-28 12:53:

In https://github.com/rear/rear/pull/1090
I use the new option '-C' to source additional
config files because according to
https://github.com/rear/rear/issues/1088#issuecomment-263013427
those additional files are sourced after the usual config files
have been sourced but before any script is sourced, which
means those files behave as additional config files, so that
the option name '-C' better matches the main purpose
(regardless that in ReaR any file is a bash script).

didacog commented at 2016-11-29 14:15:

Hi,

Sorry, but I've been quite confused these last days seeing lots of messages about adding multiple backup methods, NTFS, ... to ReaR.

In a previous discussion #769 we were talking about moving external backup methods out of ReaR to maintain stability and using a separate package to provide those external backup methods, and now this seems like a 180° turn.

Are those thoughts gone? Because it seems more and more extra stuff is being added to ReaR instead of being moved out of the main package, which is what was discussed previously.

I just want to clarify my thoughts :P

Regards,

jsmeix commented at 2016-11-29 15:59:

@didacog
extracting the backup functionality from ReaR
into a separate open source project is a different
(strictly speaking unrelated) issue, cf. in particular
https://github.com/rear/rear/issues/769#issuecomment-183331359

As long as extracting the backup functionality is not decided
the scripts that provide the backup functionality stay in ReaR.
I assume for ReaR 2.0 it will not be extracted because
we do not have a clear and stable interface between
ReaR and the backup software, cf.
https://github.com/rear/rear/issues/769#issuecomment-183370695

Currently I am implementing support for multiple backups
in ReaR and right now I did my very first simple
successful recovery with two backups using the same
backup method (NETFS and 'tar').

As expected there are several unexpected things,
but up to now I could solve all of them relatively easily,
so that currently it looks really promising.

jsmeix commented at 2016-11-29 17:09:

For a very basic and initial way how recovery with two backups
using the same backup method (NETFS and 'tar') worked for me, see
https://github.com/rear/rear/pull/1091#issuecomment-263621714

For initial usage with different backup methods
one needs to get all the binaries and other needed files
of all used backup methods into the recovery system
during the initial "rear mkbackup/mkrescue" run.
I think for now one can do that manually via
REQUIRED_PROGS and COPY_AS_IS
in /etc/rear/local.conf - later this should happen
somehow automatically...

jsmeix commented at 2016-11-30 12:05:

With
https://github.com/rear/rear/pull/1091
together with
https://github.com/rear/rear/pull/1090
there is now a first initial (and actually usable) implementation
of multiple backups of the same backup type.

E.g. several 'tar' backups via NETFS,
which is currently the only kind that I have used.

See
https://github.com/rear/rear/pull/1091#issuecomment-263621714
for an example how one could do it.

jsmeix commented at 2016-11-30 12:25:

@gozora
I count on you to try it out with two different backup methods like
a 'tar' backup via NETFS for the basic system plus an additional
backup via BACKUP=BORG e.g. for things like /home/*

In particular I would be interested regarding what I wrote above in
https://github.com/rear/rear/issues/1088#issuecomment-263633271

For initial usage with different backup methods
one needs to get all the binaries and other needed files
of all used backup methods into the recovery system
during the initial "rear mkbackup/mkrescue" run.
I think for now one can do that manually via
REQUIRED_PROGS and COPY_AS_IS
in /etc/rear/local.conf - later this should happen
somehow automatically...

For Borg I would assume that something like

REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" borg locale )

is needed in the config file that is used while
"rear mkbackup" or "rear mkrescue" is run to get all
what Borg needs for restore into the recovery system.

gozora commented at 2016-11-30 12:37:

Hello @jsmeix,

I'm definitely going to test it! (not sure if I can make it today ...)

Copying the base progs needed for Borg is done in usr/share/rear/prep/BORG/default/200_prep_borg.sh

PROGS=( "${PROGS[@]}" "${PROGS_BORG[@]}" borg locale )

So to my understanding no explicit action is necessary.

jsmeix commented at 2016-11-30 13:44:

I know about prep/BORG/default/200_prep_borg.sh
(that is where I got my REQUIRED_PROGS proposal from)
but I think you misunderstand, because you cannot have
both BACKUP=NETFS and BACKUP=BORG in the
config file that is used for "mkbackup/mkrescue"
(i.e. when the recovery system ISO gets created).

E.g. for a 'tar' backup via NETFS for the basic system plus
an additional backup via BACKUP=BORG e.g. for /home/*
you would use BACKUP=NETFS and all what the
'tar' backup via NETFS needs in one config file
that is used for "mkbackup/mkrescue"
(i.e. when the recovery system ISO gets created)
and you would use BACKUP=BORG and all what the
Borg backup needs in another config file
that is used for "mkbackuponly"
(i.e. when only the additional backup gets created), cf. in
https://github.com/rear/rear/pull/1091#issuecomment-263621714
my local.conf plus base.conf versus rest.conf and how I used them

rear -C base -d -D mkbackup
rear -C rest -d -D mkbackuponly
...
rear -C base -d -D recover
rear -C rest -d -D restoreonly

In your case I think you need in base.conf (whatever you call it)

REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" borg locale )

to get all what Borg needs for restore into the recovery system.

gozora commented at 2016-11-30 13:49:

Of course I've misunderstood, it is my trademark :-)!
Now it's crystal clear, thanks @jsmeix!

jsmeix commented at 2016-11-30 14:31:

FYI:

My next step is to automate the rear calls as in

rear -C base -d -D mkbackup
rear -C rest -d -D mkbackuponly
...
rear -C base -d -D recover
rear -C rest -d -D restoreonly

with a new config array variable in local.conf like

MULTIPLE_BACKUPS=( base rest )

so that one single call of

rear -d -D mkbackup

actually runs the equivalent of

rear -C base -d -D mkbackup && rear -C rest -d -D mkbackuponly

and one single call of

rear -d -D recover

actually runs the equivalent of

rear -C base -d -D recover && rear -C rest -d -D restoreonly

My private top secret future plan:
Something like

MULTIPLE_BACKUPS=( first second& third& )

should run first

rear -C first mkbackup

and afterwards the two mkbackuponly in parallel like

rear -C second mkbackuponly &
rear -C third mkbackuponly &

and analogous behaviour for "rear recover".
I.e. backup and restore on speed ;-)
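
A minimal, hypothetical sketch of such a dispatcher (note that a literal '&' inside a bash array element must be quoted; the loop logic below is made up for illustration, it is not actual ReaR code):

MULTIPLE_BACKUPS=( first 'second&' 'third&' )
# the first entry gets the full mkbackup run:
rear -C "${MULTIPLE_BACKUPS[0]}" mkbackup
pids=()
for entry in "${MULTIPLE_BACKUPS[@]:1}" ; do
    name="${entry%&}"
    if test "$name" != "$entry" ; then
        # a trailing '&' requests a parallel run:
        rear -C "$name" mkbackuponly & pids+=( $! )
    else
        rear -C "$name" mkbackuponly
    fi
done
wait "${pids[@]}"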

gozora commented at 2016-11-30 14:41:

@jsmeix This is the stage where Micro$oft uses the Advanced button :-).
So maybe we should consider a documentation section with the title Advanced ReaR options ...

jsmeix commented at 2016-11-30 15:01:

I will do proper documentation when
multiple backups work sufficiently well
and when there are no serious regressions,
i.e. when it can be considered to be somewhat stable
so that it is unlikely that I need to do incompatible changes
how multiple backups work.

Currently there is intentionally nothing documented,
so that nothing is officially announced to the users
and we can freely change how multiple
backups work, as we like, until it works sufficiently o.k. for us.

I am playing around with the current implementation
and I have already had some interesting "effects" - i.e. I am
learning how it actually behaves in practice - which is
probably helpful for writing usable documentation ;-)

jsmeix commented at 2016-11-30 15:40:

My very first experiments with manually running
two "rear mkbackuponly" and two "rear restoreonly"
in parallel worked in principle.
I made only these changes:
In usr/sbin/rear I added a test
for REAR_ALLOW_RUNNING_IN_PARALLEL

# LOCKLESS_WORKFLOWS can run simultaneously with another instance by using a LOGFILE.lockless:
if IsInArray "$WORKFLOW" "${LOCKLESS_WORKFLOWS[@]}" ; then
    LOGFILE="$LOGFILE.lockless"
else
  if ! test "$REAR_ALLOW_RUNNING_IN_PARALLEL" ; then
    # When this currently running instance is not one of the LOCKLESS_WORKFLOWS
    # then it cannot run simultaneously with another instance
    # in this case pidof is needed to test what running instances there are:
    if ! has_binary pidof ; then
        echo "ERROR: Required program 'pidof' missing, please check your PATH" >&2
        exit 1
    fi
    # For unknown reasons '-o %PPID' does not work for pidof at least in SLES11
    # so that a manual test is done to find out if another pid != $$ is running:
    for pid in $( pidof -x "$SCRIPT_FILE" ) ; do
        if test "$pid" != $$ ; then
            echo "ERROR: $PROGRAM is already running, not starting again" >&2
            exit 1
        fi
    done
  fi
fi

and in usr/share/rear/conf/default.conf
I added current PID ($$) to the LOGFILE name

LOGFILE="$LOG_DIR/rear-$HOSTNAME-$$.log"

and then it works

# usr/sbin/rear -C base -d -D mkbackup
# export REAR_ALLOW_RUNNING_IN_PARALLEL="yes"
# usr/sbin/rear -C rootbackup -d -D mkbackuponly &
# usr/sbin/rear -C homebackup -d -D mkbackuponly &

But it seems not to make a real difference because
on a virtual machine with even 2 CPUs and 2 GiB memory
the two backup and the two restore 'tar' processes seem to run
not simultaneously but one after the other :-(

gozora commented at 2016-11-30 22:02:

Just finished "Multi-backup" backup/restore with NETFS + BORG, with overall result: SUCCESS!

Must admit that configuration was a bit tricky, but as sysadmin I really want to scarify my comfort for this.
When talking about "tricky" part, all I have on my mind is that you must know what configuration option to put in which config (as discussed in https://github.com/rear/rear/issues/1088#issuecomment-263861320 ), but I believe that this will get better over time.

In my humble opinion, this feature adds another, very useful dimension to ReaR possibilities.

Good job @jsmeix !

If there is someone curious, here are sample configs I've used for NETFS + BORG backup/restore:

root@mate:/etc/rear# cat base_os.conf
OUTPUT=ISO
OUTPUT_URL=nfs://beta.virtual.sk/mnt/rear/iso
GRUB_RESCUE=n
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" borg locale )
COPY_AS_IS=( "${COPY_AS_IS[@]}" "/root/borg_keys" )

BACKUP=NETFS
BACKUP_URL=nfs://beta.virtual.sk/mnt/rear
BACKUP_PROG_EXCLUDE=( ${BACKUP_PROG_EXCLUDE[@]} '/media/*' '/home' )

root@mate:/etc/rear# cat home.conf 
MANUAL_INCLUDE=YES
BACKUP_PROG_INCLUDE=( '/home' )

### Borg stuff ###
BACKUP=BORG
BORGBACKUP_HOST="beta.virtual.sk"
BORGBACKUP_USERNAME="root"
BORGBACKUP_REPO="/mnt/rear/borg/${HOSTNAME}"
BORGBACKUP_PRUNE_HOURLY=5
BORGBACKUP_PRUNE_WEEKLY=2
BORGBACKUP_COMPRESSION="zlib,9"

## Borg repository encryption
BORGBACKUP_ENC_TYPE="keyfile"
export BORG_KEYS_DIR="/root/borg_keys"
export BORG_PASSPHRASE="S3cr37_P455w0rD"

## Borg behavior
export BORG_RELOCATED_REPO_ACCESS_IS_OK="yes"
export BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK="yes"
export BORG_REMOTE_PATH="/usr/local/bin/borg"

Then for backup:

root@mate:/# rear -C base_os mkbackup
root@mate:/# rear -C home mkbackuponly

And restore:

RESCUE mate:~ # rear -C base_os recover
RESCUE mate:~ # rear -C home restoreonly

gdha commented at 2016-12-01 07:30:

@gozora @jsmeix indeed an excellent job done by you both! Now that you have the experience in configuration, why not write it down on our relax-and-recover web pages? The web pages are also in GitHub, so that would be easy to extend, if you wish of course. Otherwise it will be dead functionality if nobody knows about it (or when it is too complex to get it right).

gozora commented at 2016-12-01 09:06:

I didn't do anything, all kudos goes to @jsmeix.
Of course I can help with documentation, but I guess @jsmeix wants to wait with this step (see https://github.com/rear/rear/issues/1088#issuecomment-263894804)

gdha commented at 2016-12-01 09:19:

@gozora @jsmeix from experience I know that the longer you wait to write something down, the more difficult it becomes, as we tend to forget details (at least I do, but that could be the age of course)

jsmeix commented at 2016-12-01 09:19:

@gozora
many thanks for your testing
and your enlightening feedback.

In particular I learned that MANUAL_INCLUDE=YES
works when used with "mkbackuponly" because in this case
it does not matter that MANUAL_INCLUDE=YES
disables all entries in disklayout.conf (at least for me it does)
so that one cannot (at least I cannot) use MANUAL_INCLUDE=YES
with "mkbackup" (or "mkrescue"), cf.
https://github.com/rear/rear/issues/1019#issuecomment-251348729

Therefore I invented BACKUP_ONLY_INCLUDE
so that I do not have to use MANUAL_INCLUDE=YES
and - by the way - I also invented its logical counterpart
BACKUP_ONLY_EXCLUDE, which could be useful to get things
into the backup that are not intended to be restored (i.e. to have
the content of EXCLUDE_MOUNTPOINTS in the backup),
because I think in general a backup should be complete
(i.e. basically it should contain all regular files - except tmp files)
regardless of what exactly actually needs to be restored,
see also BACKUP_RESTORE_MOVE_AWAY_FILES
in default.conf.
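
An illustrative config sketch of that intent (the values are made up and the exact semantics should be verified against default.conf):

BACKUP_ONLY_EXCLUDE="yes"
# only these explicit excludes reduce the backup:
BACKUP_PROG_EXCLUDE=( '/tmp/*' )
# content below EXCLUDE_MOUNTPOINTS would then still end up in the
# backup even though those mountpoints are excluded from the restore:
EXCLUDE_MOUNTPOINTS=( /media )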

jsmeix commented at 2016-12-01 09:25:

@gdha
don't worry regarding documentation
but also don't hurry ;-)

I will properly document multiple backups.

I think as long as I am actively working on it I will not
forget details.

But as long as multiple backups is under development
I do intentionally not document it as I described in
https://github.com/rear/rear/issues/1088#issuecomment-263894804

Multiple backups will be documented in time
before the ReaR 2.0 release.

jsmeix commented at 2016-12-01 10:15:

@gozora
I think for Borg you could use BACKUP_ONLY_INCLUDE="yes"
instead of MANUAL_INCLUDE=YES, see
https://github.com/rear/rear/issues/1019#issuecomment-264132715

jsmeix commented at 2016-12-01 11:10:

@gdha
regarding "dead functionality if nobody knows about it":
If multiple backups work sufficiently well in practice
we should at least talk about it at FOSDEM 2017,
perhaps even present it (I can do that ;-)

gozora commented at 2016-12-01 11:28:

@jsmeix

I think for Borg you could use BACKUP_ONLY_INCLUDE="yes"
instead of MANUAL_INCLUDE=YES

Will test today evening.

jsmeix commented at 2016-12-01 13:28:

I have even more good news today:

Regarding
https://github.com/rear/rear/issues/1088#issuecomment-263905911

... manually running two "rear mkbackuponly" ... in parallel ...
... seems not to make a real difference ...
... 'tar' processes seem to run not simultaneously ...

I only misinterpreted ReaR's output messages,
which is the result of ReaR's current logging not being
prepared for multiple simultaneous rear runs,
together with mixed-up progress subsystem messages,
as the progress subsystem is also not prepared for multiple simultaneous rear runs.

Actually things do run well in parallel.

With some quick-and-dirty experimental logging enhancements
and progress subsystem changes (mainly showing messages
with a leading PID and letting the progress subsystem output each
message on a separate line, which effectively disables the intent
of the progress subsystem, cf. lib/progresssubsystem.nosh)
one can nicely see how things "just work" even in parallel:

# usr/sbin/rear -C rootbackup -d -D mkbackuponly & usr/sbin/rear -C homebackup -d -D mkbackuponly & wait ; echo done
[1] 23905
[2] 23906
23906 Relax-and-Recover 1.19 / Git
23905 Relax-and-Recover 1.19 / Git
23906 Using log file: /root/rear/var/log/rear/rear-d25-23906.log
23905 Using log file: /root/rear/var/log/rear/rear-d25-23905.log
23906 Sourcing additional configuration file '/root/rear/etc/rear/homebackup.conf'
23905 Sourcing additional configuration file '/root/rear/etc/rear/rootbackup.conf'
23906 Using backup archive 'backup-homebackup.tar.gz'
23905 Using backup archive 'backup-rootbackup.tar.gz'
23906 Creating disk layout
23905 Creating disk layout
23906 Creating tar archive '/tmp/rear.jFUrPHh4h8TQYdv/outputfs/d25/backup-homebackup.tar.gz'
23906 ProgressStart: Preparing archive operation
23905 Creating tar archive '/tmp/rear.6zpcz6nQp1GR5nC/outputfs/d25/backup-rootbackup.tar.gz'
23905 ProgressStart: Preparing archive operation
23906 ProgressInfo: Archived 22 MiB [avg 11648 KiB/sec] 
23905 ProgressInfo: Archived 9 MiB [avg 4816 KiB/sec] 
23906 ProgressInfo: Archived 35 MiB [avg 12021 KiB/sec] 
23905 ProgressInfo: Archived 16 MiB [avg 4328 KiB/sec] 
...
23906 ProgressInfo: Archived 154 MiB [avg 12199 KiB/sec] 
23905 ProgressInfo: Archived 145 MiB [avg 10648 KiB/sec] 
23906 ProgressStop:  OK
23905 ProgressInfo: Archived 166 MiB [avg 11356 KiB/sec] 
23906 Archived 154 MiB in 15 seconds [avg 10572 KiB/sec]
23905 ProgressInfo: Archived 190 MiB [avg 12190 KiB/sec] 
23906 Saving /root/rear/var/log/rear/rear-d25-23906.log as /root/rear/var/log/rear/rear-d25-mkbackuponly-homebackup.log
23905 ProgressInfo: Archived 222 MiB [avg 13406 KiB/sec] 
23905 ProgressInfo: Archived 253 MiB [avg 13680 KiB/sec] 
23905 ProgressInfo: Archived 284 MiB [avg 14550 KiB/sec] 
23905 ProgressInfo: Archived 314 MiB [avg 15340 KiB/sec] 
23905 ProgressStop:  OK
23905 Archived 314 MiB in 22 seconds [avg 14642 KiB/sec]
23905 Saving /root/rear/var/log/rear/rear-d25-23905.log as /root/rear/var/log/rear/rear-d25-mkbackuponly-rootbackup.log
[1]-  Done  usr/sbin/rear -C rootbackup -d -D mkbackuponly
[2]+  Done  usr/sbin/rear -C homebackup -d -D mkbackuponly
done

jsmeix commented at 2016-12-01 13:45:

And for "restoreonly" in parallel the only thing
that gets in the way is when
usr/share/rear/restore/default/995_remount_sync.sh
is executed by the first "restoreonly" process
because remounting 'sync' makes it unusable slow
for the still running other 'tar' restore process.
Quick and dirty disabling
usr/share/rear/restore/default/995_remount_sync.sh
makes running two "restoreonly" in parallel "just work".
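
One possible way to do that quick-and-dirty disabling in the rescue system (an assumption on my part: ReaR only runs stage scripts whose names match *.sh):

# rename the script so its name no longer matches *.sh:
mv /usr/share/rear/restore/default/995_remount_sync.sh \
   /usr/share/rear/restore/default/995_remount_sync.sh.disabled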

jsmeix commented at 2016-12-01 16:46:

@gozora
for the fun of it:
I am now running those parallel things on a virtual machine
with 4 CPUs and 4 GB main memory... :-)

gozora commented at 2016-12-01 16:49:

:-)
I was just thinking: what is the fastest DR solution in the world?

jsmeix commented at 2016-12-01 16:54:

I don't know what it is now
but I have a dim feeling what it will be in the future ;-)

FYI:
One cannot run "recover" in parallel with "restoreonly"
because first the basic system with all the mountpoints
below /mnt/local must be there.
Afterwards one can run multiple "restoreonly" in parallel.

Tomorrow I will test with bigger additional backups
to get some timing data on how much faster running
multiple "restoreonly" in parallel actually is
(on sufficiently powerful hardware).

gozora commented at 2016-12-01 17:33:

Hello @jsmeix

I think for Borg you could use BACKUP_ONLY_INCLUDE="yes"
instead of MANUAL_INCLUDE=YES, see

I've adapted my config:

root@mate:~# diff /etc/rear/home.conf.old /etc/rear/home.conf
1c1
< MANUAL_INCLUDE=YES
---
> BACKUP_ONLY_INCLUDE=YES

And I can confirm that BACKUP_ONLY_INCLUDE works perfectly fine with Borg.

didacog commented at 2016-12-01 18:04:

@jsmeix
for sure, extracting the backup functionality is not decided yet. Maybe it never happens that way, but since we started that discussion it could be possible, and these seem to be features that could break backwards compatibility and could make it harder to "extract the backup functionality" if that ever happens at some point in the future.
Those were just some thoughts about my impressions of the features appearing these last days in relation to #769. Nothing else.

Regards,

jsmeix commented at 2016-12-02 08:42:

@didacog
if "extracting the backup functionality" ever happens
my current understanding what that means is basically
a simple move of the scripts that implement the actual
backup and restore into a separated project but I would
insist on that the underlying framework that ReaR can
call external tools for backup and restore must be kept
in ReaR and what I implemented in
https://github.com/rear/rear/pull/1090
and
https://github.com/rear/rear/pull/1091
are primarily enhancements of that framework.
The only real backup scripts that I enhanced
was to make NETFS 'tar' support the new
BACKUP_ONLY_INCLUDE and
BACKUP_ONLY_EXCLUDE
flags.

@didacog
what I would like to see from you is that you test the
current multiple-backup implementation in ReaR
with your preferred backup method and that you enhance
your preferred backup method as needed so it works well for
multiple backups.
In particular it must be possible for the user to specify
exactly what he wants to have in the backup and what not
for your backup method (e.g. via support for things like
BACKUP_ONLY_INCLUDE and BACKUP_ONLY_EXCLUDE
or something similar as appropriate for your backup method).

jsmeix commented at 2016-12-02 08:46:

@gozora
the above results (Borg works perfectly fine with multiple backups)
indicate that you implemented support for Borg perfectly well!

didacog commented at 2016-12-02 10:25:

@jsmeix
Of course we will test multiple backups as soon as we can find the time after completing the DRLM roadmap for the 2.1.0 version.

I may have some thoughts about using it, but maybe it is better to talk about this after testing and knowing better how it works. Maybe at FOSDEM'17 we'll have time to talk about it. ;)

Please do not misunderstand me when I talk about my doubts on new features, the idea was just to clarify my thoughts, nothing else.
In writing it is harder to convey the tone of the words than when talking about them face to face. :P

Regards,

jsmeix commented at 2016-12-02 13:04:

@didacog
I look forward to seeing you at FOSDEM 2017 and
I do appreciate your thoughts and your doubts,
because if everybody just blindly applauded "hooray" for
anything I do, I would do a whole lot of nonsense.

jsmeix commented at 2016-12-02 13:25:

Regarding
https://github.com/rear/rear/issues/1088#issuecomment-264228023
"get some time data how much running multiple restoreonly
in parallel actually makes it faster":

On my virtual machine with 4 CPUs and 4GB main memory:

RESCUE d25:~ # date +%s ; rear -C rootbackup -d -D restoreonly & rear -C homebackup -d -D restoreonly & wait ; date +%s
1480684655
...
1480684660

RESCUE d25:~ # date +%s ; rear -C rootbackup -d -D restoreonly ; rear -C homebackup -d -D restoreonly ; date +%s
1480684671
...
1480684680

5 seconds (1480684660-1480684655) for parallel
versus
9 seconds (1480684680-1480684671) for sequential
with small backups
156M backup-homebackup.tar.gz and
316M backup-rootbackup.tar.gz

jsmeix commented at 2016-12-02 14:16:

On a virtual machine with 1 CPU and 1GB main memory,
with bigger backups: 1 GB of urandom.data in /root/ and in /home/

# cat /dev/urandom | head -c 1G >/home/urandom.data
# cat /dev/urandom | head -c 1G >/root/urandom.data

# ls -lh /root/urandom.data /home/urandom.data
-rw-r--r-- 1 root root 1.0G Dec  2 14:51 /home/urandom.data
-rw-r--r-- 1 root root 1.0G Dec  2 14:29 /root/urandom.data

which results in
1.2G backup-homebackup.tar.gz
and
1.4G backup-rootbackup.tar.gz
(how evil of me to torture 'tar' with random data ;-)

My timing results for two sequential mkbackuponly
versus running them in parallel:

# date +%s ; usr/sbin/rear -C rootbackup -d -D mkbackuponly ; usr/sbin/rear -C homebackup -d -D mkbackuponly ; date +%s
1480687026
...
1480687139

# date +%s ; usr/sbin/rear -C rootbackup -d -D mkbackuponly & usr/sbin/rear -C homebackup -d -D mkbackuponly & wait ; date +%s
1480687162
...
1480687265

113 seconds for two sequential mkbackuponly
versus
103 seconds for two mkbackuponly in parallel

This is an unexpectedly good result because during
a single mkbackuponly run my "virt-manager" shows
permanent 100% CPU usage when only one 'tar' runs,
so that I would not have expected any speed gain
from running two of them in parallel.
But even on that weak machine running in parallel
gets the whole job done about 10% faster!

FYI:
I ran those tests several times to be safe against
random results - i.e. I can reproduce my timing results.
I even have more extreme examples: 115 seconds for sequential
versus 101 seconds for parallel.

jsmeix commented at 2016-12-02 14:30:

Now restore of that on my virtual machine
with 4 CPUs and 4GB main memory:

RESCUE d25:~ # date +%s ; rear -C rootbackup -d -D restoreonly ; rear -C homebackup -d -D restoreonly ; date +%s
1480688682
...
1480688712

RESCUE d25:~ # date +%s ; rear -C rootbackup -d -D restoreonly & rear -C homebackup -d -D restoreonly & wait ; date +%s
1480688726
...
1480688746

30 seconds (1480688712-1480688682) for sequential
versus
20 seconds (1480688746-1480688726) for parallel.

On that machine my "virt-manager" shows
about 40% CPU usage when only one 'tar' runs versus
about 80% CPU usage when two 'tar' run in parallel.

jsmeix commented at 2016-12-02 15:44:

Now backup of that on that virtual machine
with 4 CPUs and 4GB main memory:

# date +%s ; usr/sbin/rear -C rootbackup -d -D mkbackuponly ; usr/sbin/rear -C homebackup -d -D mkbackuponly ; date +%s
1480690968
...
1480691073

# date +%s ; usr/sbin/rear -C rootbackup -d -D mkbackuponly & usr/sbin/rear -C homebackup -d -D mkbackuponly & wait ; date +%s
1480691760
...
1480691822

105 seconds (1480691073-1480690968) for sequential
versus
62 seconds (1480691822-1480691760) for parallel
with
about 40% CPU usage for sequential
about 80% CPU usage for parallel.

jsmeix commented at 2016-12-05 11:31:

With https://github.com/rear/rear/issues/1096
fixed by https://github.com/rear/rear/pull/1101
the last missing step for
"Multiple simultaneous backups and/or restores"
is https://github.com/rear/rear/issues/1102

For ReaR 2.0 I would like to have manual setup of
"Multiple simultaneous backups and/or restores"
implemented and documented so that
experienced users can use it.

Any kind of "nice to use simple automatisms"
would come later - if at all - based on helpful user feedback
(e.g. "I need it all simply just work" is no helpful feedback ;-)

jsmeix commented at 2016-12-05 13:49:

With https://github.com/rear/rear/issues/1102
fixed by https://github.com/rear/rear/pull/1103
"Multiple simultaneous backups and/or restores"
work well for me.

Example:

# grep -v '^#' etc/rear/local.conf
OUTPUT=ISO
SSH_ROOT_PASSWORD="rear"
USE_DHCLIENT="yes"
KEEP_BUILD_DIR=""
this_file_name=$( basename ${BASH_SOURCE[0]} )
LOGFILE="$LOG_DIR/rear-$HOSTNAME-$WORKFLOW-${this_file_name%.*}.log"

# grep -v '^#' etc/rear/base.conf

this_file_name=$( basename ${BASH_SOURCE[0]} )
LOGFILE="$LOG_DIR/rear-$HOSTNAME-$WORKFLOW-${this_file_name%.*}.log"
BACKUP_PROG_EXCLUDE=( /root/* /home/* )
BACKUP_PROG_ARCHIVE="backup-${this_file_name%.*}"
BACKUP=NETFS
BACKUP_OPTIONS="nfsvers=3,nolock"
BACKUP_URL=nfs://10.160.4.244/nfs

# grep -v '^#' etc/rear/rootbackup.conf 

MESSAGE_PREFIX="PID$$: "
PROGRESS_MODE="plain"
PROGRESS_WAIT_SECONDS="3"
this_file_name=$( basename ${BASH_SOURCE[0]} )
LOGFILE="$LOG_DIR/rear-$HOSTNAME-$WORKFLOW-${this_file_name%.*}.log"
BACKUP_ONLY_INCLUDE="yes"
BACKUP_PROG_INCLUDE=( /root/* )
BACKUP_PROG_ARCHIVE="backup-${this_file_name%.*}"
BACKUP=NETFS
BACKUP_OPTIONS="nfsvers=3,nolock"
BACKUP_URL=nfs://10.160.4.244/nfs

# grep -v '^#' etc/rear/homebackup.conf 

MESSAGE_PREFIX="PID$$: "
PROGRESS_MODE="plain"
PROGRESS_WAIT_SECONDS="2"
this_file_name=$( basename ${BASH_SOURCE[0]} )
LOGFILE="$LOG_DIR/rear-$HOSTNAME-$WORKFLOW-${this_file_name%.*}.log"
BACKUP_ONLY_INCLUDE="yes"
BACKUP_PROG_INCLUDE=( /home/* )
BACKUP_PROG_ARCHIVE="backup-${this_file_name%.*}"
BACKUP=NETFS
BACKUP_OPTIONS="nfsvers=3,nolock"
BACKUP_URL=nfs://10.160.4.244/nfs

# usr/sbin/rear -C base -d -D mkbackup

# usr/sbin/rear -C rootbackup -d -D mkbackuponly & usr/sbin/rear -C homebackup -d -D mkbackuponly & wait ; echo done

Recovery:

RESCUE f79:~ # rear -C base -d -D recover

RESCUE f79:~ # rear -C rootbackup -d -D restoreonly & rear -C homebackup -d -D restoreonly & wait ; echo done

jsmeix commented at 2016-12-05 13:53:

The final step from my current point of view is
https://github.com/rear/rear/issues/1104

jsmeix commented at 2016-12-07 11:14:

After the final step
https://github.com/rear/rear/issues/1104
is merged, there is one more "very final" step:
https://github.com/rear/rear/issues/1110

jsmeix commented at 2016-12-07 13:28:

@gozora
I have a request:

In my current documentation for multiple backups
https://github.com/rear/rear/blob/master/doc/user-guide/11-multiple-backups.adoc
I describe in particular
"Relax-and-Recover Setup for Different Backup Methods"
using NETFS/'tar' and Borg which is based on your above
https://github.com/rear/rear/issues/1088#issuecomment-264009984

To be on the safe side I would very much appreciate it
if you could test whether what I describe there actually works.

Optionally - if possible - even together with
"Running Multiple Backups and Restores in Parallel"

gozora commented at 2016-12-07 13:35:

@jsmeix
Sure, will try that later today ...

gozora commented at 2016-12-07 17:08:

@jsmeix
Remarks from my quick tests:

  • BORGBACKUP_ARCHIVE_PREFIX="backup-${this_file_name%.*}" can be used only if it expands to purely alphanumeric values, hence config filenames like backup_home.conf will not be accepted.
  • if ReaR fails it reports: "Aborting due to an error, check /var/log/rear/rear-mate.log"; this looks wrong since e.g. "rear-mate.1084.log" was actually in use.

Unfortunately I don't have more time for tests today :-(. I'll try to continue tomorrow.

V.

jsmeix commented at 2016-12-08 15:47:

With
https://github.com/rear/rear/pull/1115
merged, now when rear errors out
it shows the right logfile name.

jsmeix commented at 2016-12-08 16:23:

With
https://github.com/rear/rear/commit/e5cbf1f46cded901ec1df505f28cf6b24562efa3
I use in doc/user-guide/11-multiple-backups.adoc a simple

BORGBACKUP_ARCHIVE_PREFIX="rear"

to avoid the first item in the list in the above
https://github.com/rear/rear/issues/1088#issuecomment-265507663

gozora commented at 2016-12-08 16:26:

@jsmeix actually you can remove it altogether.
BORGBACKUP_ARCHIVE_PREFIX="rear" is already defined in default.conf

jsmeix commented at 2016-12-08 16:42:

Guess where I got that value from ;-)

I like to mention BORGBACKUP_ARCHIVE_PREFIX
because it matches the BACKUP_PROG_ARCHIVE
that I use for NETFS/tar, since for multiple NETFS/tar
backups one must explicitly specify different
BACKUP_PROG_ARCHIVE values in the
different config files.

I assume when one does multiple Borg backups
it is also required to explicitly specify different
BORGBACKUP_ARCHIVE_PREFIX values
in the different config files?

Perhaps you could try out whether multiple Borg backups work?

gozora commented at 2016-12-08 17:01:

As far as I remember, Borg does not support multiple writes to a single archive. (It was not even possible to mount an archive while a backup was running.)
I could however try writing to two separate archives ... In any case BORGBACKUP_ARCHIVE_PREFIX can be omitted (I think ;-) )

gozora commented at 2016-12-08 17:46:

Hello @jsmeix
One other remark regarding the docu. There is a slight problem with:

rear -C basic_system mkbackup
rear -C home_backup mkbackuponly

The thing is that if you run rear -C basic_system mkbackup for the very first time, /borg/keys will be empty (as Borg creates the encryption keys during repository initialization). This means that the ReaR rescue/recovery system will have no encryption keys available.
This can be corrected by subsequently executing rear -C basic_system mkrescue, or by running rear -C home_backup mkbackuponly before rear -C basic_system mkbackup

So I'd correct the docu as follows:

  • Option 1
rear -C home_backup mkbackuponly
rear -C basic_system mkbackup
  • Option 2
rear -C basic_system mkbackup
rear -C home_backup mkbackuponly
rear -C basic_system mkrescue

V.

jsmeix commented at 2016-12-12 11:16:

@gozora
many thanks for your valuable feedback!
I fixed the issue in your
https://github.com/rear/rear/issues/1088#issuecomment-265805547
via https://github.com/rear/rear/pull/1118

gozora commented at 2016-12-12 11:25:

Hello @jsmeix ,

Maybe one more thing, similar to:

if ReaR fails it reports: "Aborting due to an error, check /var/log/rear/rear-mate.log", this looks to be wrong since e.g. "rear-mate.1084.log" was in use.

Once I was booted into the ReaR restore/recover media, ran rear -C base_os recover and encountered a failure, I had that menu with choices displayed (sorry, I don't know what it is called or how exactly it looks ;-), but I'm sure you will know what I'm talking about)

rear failed to recover 
1) edit something
2) edit that thing
3) ....
4) show log /var/log/rear/...
5) Abort
...

Once I chose 4, I got an error that /var/log/rear/... does not exist ...

If you can't locate where the problem could be, I can try to reproduce it this evening ...

V.

jsmeix commented at 2016-12-12 12:53:

I found "View Relax-and-Recover log" in
usr/share/rear/layout/prepare/default/200_recreate_hpraid.sh
and
usr/share/rear/layout/recreate/default/200_run_script.sh

gozora commented at 2016-12-12 12:57:

I think it is this one which is failing.

jsmeix commented at 2016-12-12 13:05:

Yes - one of many such cases.

The whole logging in ReaR is currently an overcomplicated mess.
At some later time I will clean that up.

But for the upcoming ReaR 2.0 I will try to deal with the current mess.

I think I need to replace basically everywhere
$LOGFILE by $REAR_LOGFILE

$REAR_LOGFILE is the one that is used while rear runs,
while $LOGFILE can be set differently by the user
to be used as the final name for the log file.

I think I will rename the meaningless REAR_LOGFILE
(what a surprise: that thingy belongs to ReaR! ;-)
to RUNTIME_LOGFILE to make its name tell
what it actually is.
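
A hypothetical illustration of that split (the mv at the end mirrors the "Saving ... as ..." messages in the log excerpt above; none of this is actual ReaR code):

RUNTIME_LOGFILE="/var/log/rear/rear-$HOSTNAME-$$.log"   # unique log used while rear runs
LOGFILE="/var/log/rear/rear-$HOSTNAME.log"              # user-configurable final log name
# at the end of the run the runtime log would be saved under the final name:
mv "$RUNTIME_LOGFILE" "$LOGFILE"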

jsmeix commented at 2016-12-12 13:11:

I submitted
https://github.com/rear/rear/issues/1119
to fix that.

jsmeix commented at 2016-12-13 13:33:

I have not yet tested it, but I have the dim feeling that
currently multiple backups will fail horribly with
BACKUP_TYPE=incremental or BACKUP_TYPE=differential
because those use hardcoded fixed backup archive names,
cf. prep/NETFS/default/070_set_backup_archive.sh

jsmeix commented at 2016-12-13 13:51:

https://github.com/rear/rear/commit/032203812ce85fed8f8745a902824811662163b7
documents that multiple backups are in general not supported for
BACKUP_TYPE=incremental or BACKUP_TYPE=differential

I think one can use incremental or differential backups
for one single part of the multiple backups (e.g. for the
backup of the basic system) but what will fail is
using multiple incremental or differential backups.
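
An illustrative (untested) config split along those lines; the file names and values are made up:

# etc/rear/base.conf - only this single part uses incremental backups:
BACKUP=NETFS
BACKUP_TYPE=incremental

# etc/rear/homebackup.conf - the additional backup stays a plain full backup
# (so it does not set BACKUP_TYPE):
BACKUP=NETFS
BACKUP_PROG_ARCHIVE="backup-homebackup"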

jsmeix commented at 2016-12-13 16:01:

Regarding
https://github.com/rear/rear/issues/1088#issuecomment-266404883
where "rear recover" fails with some kind of menue that contains
"View Relax-and-Recover log":
I have tested it with current GitHub master code
and now it works for me.

For the fun of it:
My main testing problem was to make "rear recover" fail in the
right way so that I get that "View Relax-and-Recover log".
I.e. for me "rear recover" works too good ;-)

gozora commented at 2016-12-13 16:18:

Thanks, I'll check that during my next restore failure.
For me it is not that hard to make ReaR fail during restore :-)

jsmeix commented at 2016-12-15 15:38:

With https://github.com/rear/rear/issues/1116
fixed by https://github.com/rear/rear/pull/1129
there are currently no issues left regarding multiple backups,
according to how multiple backups are currently documented
to work in doc/user-guide/11-multiple-backups.adoc,
so I am closing this issue now because
from my current point of view multiple backups
should now work sufficiently well to be released in ReaR 2.0.

For further issues with multiple backups,
please report them as separate GitHub issues.

@gozora
again many thanks for all your valuable help and work!

jsmeix commented at 2016-12-15 16:10:

Regarding
https://github.com/rear/rear/issues/1088#issuecomment-263887450
I submitted
https://github.com/rear/rear/issues/1131


[Export of Github issue for rear/rear.]