#2625 PR merged: Fix backup removal in exit task and cleanup handling of URL mountpoints

Labels: cleanup, fixed / solved / done, critical / security / legal

pcahyna opened issue at 2021-06-04 18:08:

Pull Request Details:
  • Type: Bug Fix / Cleanup

  • Impact: Critical

  • Reference to related issue (URL): #2611 #782 #465

  • How was this pull request tested?
    On CentOS 7 or CentOS 8:

    • backup and restore with backup on ISO and ISO stored on OUTPUT_URL=sftp://
    • backup to NFS with several reproducers for #2611 :
      • overriding umount by a function that does nothing and inserting a script that kills ReaR before unmounting URLs, thus triggering the exit tasks
        • both in backup and output stages (bug was present only in the output stage)
      • inserting a script that blocks umount by spawning a long running process with its cwd inside the temporary mountpoint $BUILD_DIR/outputfs
        • both in backup and output stages (bug was present only in the output stage)
      • overriding umount by a function that does nothing and returns an error
        • only in the output stage
    • backup to NFS without any bug reproducer
    • complete backup and restore with rsync over ssh (BACKUP=RSYNC)
    • backup to sshfs (BACKUP=NETFS BACKUP_URL=sshfs://...)
    • mkrescue with OUTPUT=PXE and several reproducers
      PXE_TFTP_URL: nfs://..., PXE_CONFIG_URL: nfs://. Reproducers consist of overriding umount by a function that does nothing and returns an error
      • before 800_copy_to_tftp.sh
      • before 810_create_pxelinux_cfg.sh
    • mkrescue with OUTPUT=PXE and no bug reproducer
      PXE_TFTP_URL: nfs://..., PXE_CONFIG_URL: nfs://

In the case of reproducers I chceked that other directories under the mountpoint do not get removed. In the case of no reproducers I checked that the proces completes successfully, and also that the build dir is correctly removed and that ReaR does not create and remove useless $BUILD_DIR/outputfs with URL schemes that do not need it (rsync://, sftp://).

  • Brief description of the changes in this pull request:

Cleanup of temporary mount point handling, particularly for output. Unification of mount point umount and cleanup - move to the mount_url() and umount_url() functions, callers do not need to worry about it, resulting in huge deduplication (demultiplication), this concerns both the normal path and the error path (exit tasks). Replaced the various rm -rf of the mountpoint by rmdir (this fixes #2611) and added lazy umount in case normal umount does not succeed. If build dir is kept, propose a safe way to remove it to the user (rm -Rf --one-file-system instead of just rm -Rf where the user would risk to remove everything in their mounted filesystem if still mounted.)

Stop creating the empty mountpoint ($BUILD_DIR/outputfs) for schemes that do not need to mount anything (rsync:// or anything "ftp-like".)

Fixes also some other bugs noted in the process: filesystem-specific umount command not called (https://github.com/rear/rear/commit/20359a987662cc0c3fcfa52d62d1feac7cd55850#r51319634) , unknown schemes considered invalid (noted in discussion under #932).

Identical scripts under DUPLICITY and YUM were replaced by symlinks to increase code sharing.

Reverts #782 (the change that introduced bug #2611) and #578 - it is not clear how .lockfile can exist in the unmounted filesystem, and if it does, it is a bug.

pcahyna commented at 2021-06-07 07:49:

@N3WWN hello, please review the part that touches YUM (the second, third and fourth commit in the series).

pcahyna commented at 2021-06-07 10:38:

During the work on this branch I have found many other issues that are not fixed in this PR (to keep the amount changes manageable). They are:

  1. Unexpected use of local backup scheme in RBME: 5d85fc250232921bf314643e22b210b71996884f
  2. Probably wrong symlink in BLOCKCLONE: PR #1172 , commit a4ab7406e653406c010c58a232f898d1ea85e592 @gozora
  3. Duplicate script in PXE: d850c4094238a03c9b926b88d7e1582ecd28af52
  4. UEFI seems to have broken OBDR support: 41efc97eb7141c30455df45a871b98cd08e09fa7
  5. OBDR probably got broken on ppc64le: 4ef0f30156f0afea4a02d12f40c2c9d18cbe5e43, PR #1383
  6. OUTPUT_MOUNTCMD got broken by commit 6cd218ae30601eb56098b22581345f6ea8fe0aa2, see https://github.com/rear/rear/commit/20359a987662cc0c3fcfa52d62d1feac7cd55850#r51375233
  7. OUTPUT_UMOUNTCMD is not called in exit tasks on error
  8. Possible problems with .lockfile: PR #578 - what is .lockfile actually used for? What if $OUTPUT_URL = $BACKUP_URL but $OUTPUT_PREFIX != $NETFS_PREFIX ? What if NETFS_KEEP_OLD_BACKUP_COPY is set and KEEP_OLD_OUTPUT_COPY is unset? Also, it seems that opath.old created by a run that aborts will then be removed entirely by a subsequent run, because the lockfile will get removed by 150_save_copy_of_prefix_dir.sh, resulting in no working backup. backup/NETFS/default/250_create_lock.sh claims that it "creates a lockfile in $NETFS_PREFIX to avoid that mkrescue overwrites ISO/LOGFILE made by a previous mkbackup run when the variable NETFS_KEEP_OLD_BACKUP_COPY has been set", but it is not clear how this gets achieved in the code.
  9. backup log gets copied to ${opath} even if ${opath} is an empty string (can happen when backing up to a tape), resulting in dropping the log to the root directory: https://github.com/rear/rear/blob/2433c72f614af5b22547d6bd873ff4d224901a2c/usr/share/rear/backup/NETFS/default/500_make_backup.sh#L357
  10. FDRUPSTREAM seems to copy a file to the output location when it is not mounted yet: #2142 @mutable-dan

jsmeix commented at 2021-06-08 11:31:

@pcahyna
thank you so much for that huge cleanup and severe bugfix work!

I will have a look as time permits but possibly it gets delayed until next week.

@gdha @rmetrich and all @rear/contributors
I would much appreciate it if you find some time and have a look.
Perhaps you spot this or that issue by plain looking at the code?

Regarding "Impact: Urgent (Is Urgent higher than Critical?)"
Personally I think Urgent issues have higher timing priority than Critical issues
because Urgent means actual time pressure right now while Critical does not
necessarily imply time pressure but Critical could imply time pressure indirectly.
In this particular case I think the issue is Critical (because possible loss of data)
but not actually Urgent (because no current end-user case who has the issue).

N3WWN commented at 2021-06-08 13:00:

@N3WWN hello, please review the part that touches YUM (the second, third and fourth commit in the series).

@pcahyna LGTM! I recall explicitly copying the files instead of using symlinks in case the files were still in a state of flux (the ZIPPER method was pretty new at the time) and I didn't want to force others to test the YUM method. Now that these methods have been shown to be out of active change/development, I think it's safe to convert the files to symlinks. Thanks!

pcahyna commented at 2021-06-08 13:14:

@N3WWN thanks for the review. Note though that the symlinked files are not those copied from ZYPPER (those usually have some modifications, al least s/ZYPPER/YUM/), but from NETFS. It would be nice to share files between ZYPPER and YUM, but it would take more work.

N3WWN commented at 2021-06-08 13:33:

@N3WWN thanks for the review. Note though that the symlinked files are not those copied from ZYPPER (those usually have some modifications, al least s/ZYPPER/YUM/), but from NETFS. It would be nice to share files between ZYPPER and YUM, but it would take more work.

Yes, you're correct. It's been a minute since I've really looked at the code, but still LGTM even after my mistaken reference ;)

gdha commented at 2021-06-11 07:07:

@pcahyna concerning OBDR - I cannot recall anyone ever used this (besides myself when I implemented this >10yrs ago and since then never used nor tested). I would even vote to remove this from ReaR (why maintaining if not used)?
Lots of changes - so we will see how these behave after a while ;-)
The problem with ReaR is/was it tries to do good for too many people without the needed backup/help from the community and then small mistakes creep in... It is good to clean up the code.
And for that a big kudo for @pcahyna and @jsmeix

pcahyna commented at 2021-06-11 08:01:

@gdha thank you for your review. Could you please have a look at the problems reported above, especially item 3. (apparently redundant script under PXE) and item 8. (intent of .lockfile)?

pcahyna commented at 2021-06-18 10:21:

I am going to push some commits to address what you have found and rebase. I will revert d850c4094238a03c9b926b88d7e1582ecd28af52 as well.

jsmeix commented at 2021-06-18 10:42:

@pcahyna
if you like you may do as a "by the way" addon in this pull request
https://github.com/rear/rear/pull/2608#issuecomment-832939131
https://github.com/rear/rear/pull/2608#issuecomment-834219708

pcahyna commented at 2021-06-18 11:17:

@jsmeix all the updates pushed. If it looks ok, I will rebase one more time to squash a fixup commit into its ancestor and then you can merge the PR.

pcahyna commented at 2021-06-18 11:20:

concerning OBDR - I cannot recall anyone ever used this (besides myself when I implemented this >10yrs ago and since then never used nor tested). I would even vote to remove this from ReaR (why maintaining if not used)?

@gdha Do new tape drives support OBDR, or is it something that HP dropped a long time ago? If it got dropped, I am for removing it. If it is still supported on reasonably modern hardware, I could try resurrecting it.

jsmeix commented at 2021-06-18 11:32:

From plain looking at the changes all looks good to me
so @pcahyna you can rebase and squash as you need.

pcahyna commented at 2021-06-18 11:57:

rebase finished.

jsmeix commented at 2021-06-18 11:57:

FYI regarding OBDR:
At SUSE we got in July 2018 a customer issue report about

ReaR OUTPUT=OBDR support missing for PPC64LE (needs to be implemented)

on SLES12 SP2 with SAP on PPC64LE using ReaR 1.18 which resulted
https://github.com/rear/rear/issues/1868

In our SUSE internal issue report I found this piece of information:

Tape drive is a IBM 7226 multimedia enclosure
It is not clear at this time if it supports OBDR ...
since 7226 multimedia enclosure seems
no function of OBDR(not sure)

cf. https://www.ibm.com/products/7226-multimedia-enclosure
and https://www.ibm.com/downloads/cas/XQ1DVZJP
where none of them contain OBDR so it seems
a IBM 7226 multimedia enclosure does not support OBDR.

There was no further info whether or not that drive actually supports OBDR
or any further info how far OBDR actually worked for them, cf.
https://github.com/rear/rear/issues/1868#issuecomment-406592866

pcahyna commented at 2021-06-18 11:59:

rebased again to fix a typo in commit message

pcahyna commented at 2021-06-18 12:00:

I thought that OBDR was HP-specific, but maybe it got implemented by other vendors as well.

jsmeix commented at 2021-06-18 12:03:

If there are no objections right now
I would like to merge it in about half an hour.

jsmeix commented at 2021-06-18 12:05:

When OBDR is/was HP-specific it explains why IBM documents don't mention it.

pcahyna commented at 2021-06-18 12:13:

I pushed again to fix another tiny commit message mistake, no functional difference.

gdha commented at 2021-06-18 12:27:

@pcahyna Yes, OBDR is HPE specific - see the included PDF file.

I thought that OBDR was HP-specific, but maybe it got implemented by other vendors as well.

Compatibility_Matrices.pdf

That being said we never got a complaint it wasn't working (and I'm pretty sure it is broken in the meantime) - IMHO I'm in favor of removing it altogether. @dagwieers Do you mind if it was gone as you were the only one ever used by my knowledge?

jsmeix commented at 2021-06-18 13:05:

@pcahyna
again many thanks for your huge bugfix and cleanup work!

I wish you and all ReaR contributors and users
a relaxed and recovering weekend!


[Export of Github issue for rear/rear.]