#2831 PR merged: Refactor rsync URL support, fixes rsync OUTPUT_URL

Labels: enhancement, cleanup, fixed / solved / done

pcahyna opened issue at 2022-06-29 10:04:

Pull Request Details:
  • Type: Bug Fix

  • Impact: Normal

  • Reference to related issue (URL): closes #2781

  • How was this pull request tested?

    • recovery using OUTPUT=USB BACKUP=RSYNC with a rsync://...:/... (rsync over ssh) and rsync://...::/... (rsyncd) BACKUP_URL
    • recovery using OUTPUT=ISO BACKUP=RSYNC with a rsync://...:/... (rsync over ssh) BACKUP_URL and three values of OUTPUT_URL: rsync://...:/... (rsync over ssh) but different from BACKUP_URL, rsync://...::/... (rsyncd), and undefined (to inherit BACKUP_URL). Verified that the output ISO is found at the right places on the rsync server.
    • mkrescue and mkbackuponly to localhost with a rsync://...:/... (rsync over ssh) BACKUP_URL and OUTPUT_URL, the latter pointing to a different directory than the former. Verified that after mkrescue the ISO is under OUTPUT_URL and not BACKUP_URL.
  • Brief description of the changes in this pull request:

The code to parse rsync:// URLs was BACKUP_URL specific.
If one specified BACKUP=RSYNC and an OUTPUT_URL different from BACKUP_URL,
the OUTPUT_URL was ignored and the output files went to BACKUP_URL.

Fix by introducing generic functions for rsync URL parsing and
use them for both BACKUP_URL and OUTPUT_URL, as appropriate.
Replace all uses of global RSYNC_* variables derived
from BACKUP_URL by those functions.

jsmeix commented at 2022-06-29 11:59:

@pcahyna
thank you very much for that major cleanup!

Could you please provide me a summary of the advantages
but also the possibly backward incompatible changes
that could happen because of this cleanup?

I would like to add that summary to the
"New features, bigger enhancements, and possibly backward incompatible changes"
section in the ReaR 2.7 release notes which are currently at
https://github.com/rear/rear.github.com/blob/jsmeix-release-notes-2-7/documentation/release-notes-2-7.md

pcahyna commented at 2022-06-29 15:12:

During the work on this PR I have found many other anomalies in the rsync code, most of them are left untouched.

pcahyna commented at 2022-06-29 19:16:

Could you please provide me a summary of the advantages
but also the possibly backward incompatible changes
that could happen because of this cleanup?

I suppose you are interested in changes since 2.6, that is this change and the change that triggered #2781 combined. Ok, I will provide a summary.

pcahyna commented at 2022-06-29 19:55:

@jsmeix "Rsync OUTPUT_URLs are now properly supported with BACKUP=RSYNC. Previously the output went to the location specified by BACKUP_URL and OUTPUT_URL was ignored. One exception was OUTPUT=PXE, where the output was uploaded to OUTPUT_URL in addition to BACKUP_URL, but RSYNC_PREFIX was not respected and the interpretation of the URL was different - an URL of the form rsync://[USER@]HOST[:PORT]/PATH was interpreted as using the rsync protocol, while in all other cases such URL would be interpreted as using rsync over ssh. This special handling is now removed - a rsync OUTPUT_URL with OUTPUT=PXE now creates the RSYNC_PREFIX directory at the destination and the URL is interpreted as in all other cases. "

Does it make sense? I know it is complicated. Too many special cases, fortunately it is about removing some of them...

jsmeix commented at 2022-06-30 07:56:

@pcahyna
thank you for your summary!
Yes, it does make sense.

didacog commented at 2022-07-05 08:16:

Hello, seems that you are avoiding OUTPUT_PREFIX_PXE and is not being created when set in the config.

jsmeix commented at 2022-07-05 09:02:

@didacog
I will act for now here for @pcahyna
until he has again time to look here
but I am neither a PXE user nor a rsync user
so I can not at all imagine what you have
in your etc/rear/local.conf

Therefore please provide here sufficient information
that I have at least a chance to understand your issue
in particular your complete etc/rear/local.conf
and your complete etc/rear/site.conf if you use one.
I need the actual ReaR config files because
I am not a DRLM user so DRLM config files are "foreign" for me.
Additionally I need a complete ReaR debugscript log file
to have a chance to deduce what happens on your system.

Only FYI:
I want to release ReaR 2.7 on the date specified on
https://github.com/rear/rear/milestones
because I already postponed it rather often and
I can't wait endlessly until every issue is solved
because I am under increasing pressure to release
a new ReaR version (regardless if it is perfect or not).
For SUSE I could fix missing things relatively easy
by SUSE maintenance updates for our SLE-HA customers
and I assume it is same for Red Hat for RHEL customers.
Other users could use our latest master code.

jsmeix commented at 2022-07-05 09:07:

Regarding OUTPUT_PREFIX_PXE I see no difference between
the changed code here and the master code:

localhost:~/rear.pcahyna.rsync-url-fix-refactor.pullrequest2831 # find usr/sbin/rear usr/share/rear -type f | xargs grep 'OUTPUT_PREFIX_PXE' >/tmp/OUTPUT_PREFIX_PXE.pcahyna.rsync-url-fix-refactor

localhost:~/rear.github.master # find usr/sbin/rear usr/share/rear -type f | xargs grep 'OUTPUT_PREFIX_PXE' >/tmp/OUTPUT_PREFIX_PXE.rear.github.master

# diff -s /tmp/OUTPUT_PREFIX_PXE.rear.github.master /tmp/OUTPUT_PREFIX_PXE.pcahyna.rsync-url-fix-refactor
Files /tmp/OUTPUT_PREFIX_PXE.rear.github.master and /tmp/OUTPUT_PREFIX_PXE.pcahyna.rsync-url-fix-refactor are identical

so it seems nothing was changed regarding OUTPUT_PREFIX_PXE here.

didacog commented at 2022-07-05 09:27:

Regarding OUTPUT_PREFIX_PXE I see no difference between the changed code here and the master code:

localhost:~/rear.pcahyna.rsync-url-fix-refactor.pullrequest2831 # find usr/sbin/rear usr/share/rear -type f | xargs grep 'OUTPUT_PREFIX_PXE' >/tmp/OUTPUT_PREFIX_PXE.pcahyna.rsync-url-fix-refactor

localhost:~/rear.github.master # find usr/sbin/rear usr/share/rear -type f | xargs grep 'OUTPUT_PREFIX_PXE' >/tmp/OUTPUT_PREFIX_PXE.rear.github.master

# diff -s /tmp/OUTPUT_PREFIX_PXE.rear.github.master /tmp/OUTPUT_PREFIX_PXE.pcahyna.rsync-url-fix-refactor
Files /tmp/OUTPUT_PREFIX_PXE.rear.github.master and /tmp/OUTPUT_PREFIX_PXE.pcahyna.rsync-url-fix-refactor are identical

so it seems nothing was changed regarding OUTPUT_PREFIX_PXE here.

@jsmeix
I just figured it out because of this comment:

This special handling is now removed - a rsync OUTPUT_URL with OUTPUT=PXE now creates the RSYNC_PREFIX directory at the destination and the URL is interpreted as in all other cases. "

didacog commented at 2022-07-05 09:32:

@didacog I will act for now here for @pcahyna until he has again time to look here but I am neither a PXE user nor a rsync user so I can not at all imagine what you have in your etc/rear/local.conf

Therefore please provide here sufficient information that I have at least a chance to understand your issue in particular your complete etc/rear/local.conf and your complete etc/rear/site.conf if you use one. I need the actual ReaR config files because I am not a DRLM user so DRLM config files are "foreign" for me. Additionally I need a complete ReaR debugscript log file to have a chance to deduce what happens on your system.

Only FYI: I want to release ReaR 2.7 on the date specified on https://github.com/rear/rear/milestones because I already postponed it rather often and I can't wait endlessly until every issue is solved because I am under increasing pressure to release a new ReaR version (regardless if it is perfect or not). For SUSE I could fix missing things relatively easy by SUSE maintenance updates for our SLE-HA customers and I assume it is same for Red Hat for RHEL customers. Other users could use our latest master code.


@jsmeix
I understand, but I think this regression should be fixed before releasing, or at least an agreement on how to solve it. Before any change ReaR always created the PXE subfolder in the URL path, and is important for security reasons. I'm sure nobody wants to share data over tftp because backup and pxe outputs are in the same place ...
Hopefully we can solve this before July 12th.

Regards,
Didac

pcahyna commented at 2022-07-05 15:32:

@jsmeix still here... I don't think that OUTPUT_PREFIX_PXE is the problem (it had no special handling in the old code indeed, so it should not need one now), but there is something else going on.

jsmeix commented at 2022-07-06 07:43:

@pcahyna
I appreciate it that you still participate here
but you don't need if you don't like and when you have
currently better things "to do" with "higher priority" ;-)

jsmeix commented at 2022-07-06 07:50:

I have a plan how to proceed with this issue:

I would like to merge it tomorrow afternoon
unless there are objections from @pcahyna or @rear/contributors

My reasoning:
In its current state it is another good step forward
towards cleaned up rsync code in ReaR.
And when it is merged it is simpler and easier to get tested
by all users who use our current GitHub master code.

Remaining issues because of our rsync code cleanup in ReaR
can and will be fixed via separated pull requests.

jsmeix commented at 2022-07-06 08:15:

In general regarding BACKUP=... versus OUTPUT=... :

The final goal should be that consistently and in general
for all combinations of internal BACKUP methods and OUTPUT methods
that are specified in etc/rear/local.conf like

OUTPUT=output_method
OUTPUT_URL=output_scheme://output/destination
BACKUP=backup_method
BACKUP_URL=backup_scheme://backup/destination

the RESULT_FILES (i.e. the ReaR recovery system files)
get sent to the output/destination
via the output_scheme protocol
and
the backup files (e.g. backup.tar.gz and backup.log)
get sent to the backup/destination
via the backup_scheme protocol.

Optionally (but not strictly required)
at most one of OUTPUT_URL or BACKUP_URL could be left out
and then the one that is specified determines both.

For example the usual way like

OUTPUT=output_method
BACKUP=backup_method
BACKUP_URL=backup_scheme://backup/destination

the RESULT_FILES (i.e. the ReaR recovery system files)
get sent to the backup/destination
via the backup_scheme protocol
and
the backup files (e.g. backup.tar.gz and backup.log)
get sent to the backup/destination
via the backup_scheme protocol.

But also the other way round should work like

OUTPUT=output_method
OUTPUT_URL=output_scheme://output/destination
BACKUP=backup_method

the RESULT_FILES (i.e. the ReaR recovery system files)
get sent to the output/destination
via the output_scheme protocol
and
the backup files (e.g. backup.tar.gz and backup.log)
get sent to the output/destination
via the output_scheme protocol.

External BACKUP methods can behave as they like
(i.e. whatever is reasonable for each particular external BACKUP method).

pcahyna commented at 2022-07-07 09:51:

@jsmeix yes, I agree, the problem is that the current code has special output URL handling for BACKUP=RSYNC, which is inconsistent, since output should be independent on backup. See

https://github.com/pcahyna/rear/blob/5cb10407d08e0d923219c9b013d420a4b632843f/usr/share/rear/output/default/950_copy_result_files.sh#L132

This pull request makes the code respect OUTPUT_URL if it is not identical to BACKUP_URL, but does not remove this special handling. Should it be removed entirely before 2.7 ? Would be good for consistency, but it would mean more changes and more testing (and also behavior changes).

pcahyna commented at 2022-07-07 09:58:

the problem is (...)

note that this is not the problem reported by @didacog, it is a separate problem (and not a regression, it has been always there).

jsmeix commented at 2022-07-07 10:07:

@pcahyna
I would prefer to not add new bigger changes for ReaR 2.7
so we could finally release it "as is" - even when 'rsync'
is currently not yet well cleaned up - but we also have some
more areas where current things in ReaR are not yet clean
(e.g. OUTPUT=USB and ReaR recovery system bootloader setup)
but nevertheless the current state should be sufficiently OK
to be released.

jsmeix commented at 2022-07-07 10:14:

By the way only as a side note FYI see my
https://github.com/rear/rear/pull/2577#issuecomment-789696867
and the subsequent comment there
which looks like an inconsistency to me.

I think there should not be two subtle different methods
in ReaR to do the backup via rsync and I still do not understand
the reason behind why there are two different implementations
in ReaR to do the backup via rsync - this all is too confusing.

pcahyna commented at 2022-07-07 10:14:

@jsmeix I understand, the only advantage would be to make all related behavior changes at the same time, so that users would not have to adapt their workflows/configs multiple times.

jsmeix commented at 2022-07-07 10:22:

Normally users would not have to adapt their workflows/configs multiple times
because when a user has a working disaster recovery procedure with ReaR,
then he should not update ReaR ("never change a working system").
In contrast when things do not work sufficiently with an older ReaR version,
then the user should try out if a newer ReaR version works better.
Cf. the section "Version upgrades with Relax-and-Recover" in
https://en.opensuse.org/SDB:Disaster_Recovery

So we can move forward little step by little step
together with users who test each little step
which is the only way we can move forward
with our manpower at ReaR upstream.

pcahyna commented at 2022-07-07 10:25:

By the way only as a side note FYI see my #2577 (comment) and the subsequent comment there which looks like an inconsistency to me.

I think there should not be two subtle different methods in ReaR to do the backup via rsync and I still do not understand the reason behind why there are two different implementations in ReaR to do the backup via rsync - this all is too confusing.

Indeed, that has been very confusing to me as well, but it is yet a different kind of confusion than the one here. (And I have thought that the BACKUP=NETFS and BACKUP_PROG=rsync does not use rsync over ssh, but local rsync to a mounted filesystem, which makes more sense: the rsync invocation reads

$BACKUP_PROG --verbose "${BACKUP_RSYNC_OPTIONS[@]}" --one-file-system --delete \
            --exclude-from=$TMP_DIR/backup-exclude.txt --delete-excluded \
            $(cat $TMP_DIR/backup-include.txt) "$backuparchive" >&2

and $backuparchive does not contain anything ssh-transport-related. )

jsmeix commented at 2022-07-07 10:34:

@pcahyna
for the fun of it:
It's a great relief to me to see that also you and @gdha seem to be confused
what BACKUP=NETFS with BACKUP_PROG=rsync versus BACKUP=RSYNC actually does
so I feel less lonely with my own confusion :-)

jsmeix commented at 2022-07-07 10:40:

@rear/contributors
I would like to merge it in a few hours
(at about 15:00 CEST) unless there are objections

didacog commented at 2022-07-07 12:52:

@jsmeix please see
https://github.com/rear/rear/issues/2781#issuecomment-1177558312
before merging!

jsmeix commented at 2022-07-07 13:01:

@didacog
thank you for debugging and for your fix!

Of course I will wait now until @pcahyna had a look.

So I will not merge it this week.
Perhaps next week if all goes well
(I am not in the office tomorrow).

jsmeix commented at 2022-07-11 11:08:

@pcahyna
THANK YOU SO MUCH!

I think I can feel the "URI/URL parsing hell" a bit ;-)
I was in there when I cleaned up the url_* functions :-)
(cf. https://github.com/rear/rear/issues/856 and others)

jsmeix commented at 2022-07-11 11:11:

According to
https://github.com/rear/rear/issues/2781#issuecomment-1180233302
the issue https://github.com/rear/rear/issues/2781
is fixed with the latest commit here
https://github.com/rear/rear/pull/2831/commits/e7556a13eb95f21b5fb749c089facaf8678f3cd4
so I would like to merge this pull request tomorrow afternoon
unless there are objections by @rear/contributors

pcahyna commented at 2022-07-12 18:25:

By the way, I tested the code after the last commit again by backing up/recovering to/from a rsync:// BACKUP_URL (only rsync over ssh was tested though) and several possibilities for OUTPUT_URL. All the tests pass.


[Export of Github issue for rear/rear.]