#2269 PR merged: Issue 2247: draft implementation of 'mountonly' workflow.

Labels: enhancement, documentation, fixed / solved / done

petroniusniger opened issue at 2019-10-29 13:47:

First, a rescue image plus a backup archive were created for a VM (openSUSE Leap 15.0) using the mkbackup workflow. The VM was then booted off the rescue ISO and the new repair workflow was executed, to verify that all local filesystems ended up mounted below /mnt/local (these included LVM logical volumes and a LUKS-encrypted filesystem), as well as the most important virtual ones (sys, proc, dev). I then checked that it was possible to chroot into the system and run YaST. I also verified that, once I exited the chroot environment and shut down the VM, the shutdown process cleanly unmounted all target filesystems.
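
For reference, the chroot check mentioned above looks roughly like this once the workflow has mounted everything (including proc, sys and dev) under /mnt/local; this is only an illustrative sketch, not a transcript of the actual test session:

    # the repair workflow has already mounted the target system
    # and its virtual filesystems under /mnt/local
    chroot /mnt/local /bin/bash
    # inside the chroot: run YaST or any other administration tool
    yast
    # leave the chroot again; the filesystems stay mounted until shutdown
    exit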

In a second test, I also verified that I was able to do the same even when no backup archive was present (i.e. having only run the mkrescue workflow during the preparation phase).

  • Brief description of the changes in this pull request:

The philosophy I followed was to take advantage of the existing disklayout.conf (generated during the rescue workflow) and to reuse as much as possible of the filesystem recreation mechanism while implementing an alternative version of it that would only mount but not recreate anything (but still based on the same dependency analysis and the auto-generation of an intermediate script).
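
For readers who have not looked at it before: disklayout.conf describes each layout component on a single line (disk, partition, LVM object, LUKS mapping, filesystem, ...), and those lines drive both the recreation mechanism and, here, the mount-only code generation. The excerpt below is only illustrative; the real file contains more fields per line and the exact fields vary between ReaR versions:

    # illustrative excerpt only -- not a literal disklayout.conf
    disk /dev/sda 21474836480 msdos
    part /dev/sda 524288000 1048576 primary boot /dev/sda1
    lvmvol /dev/system root ...
    crypt /dev/mapper/cr_home /dev/system/home type=luks ...
    fs /dev/system/root / ext4 uuid=... ...
    fs /dev/mapper/cr_home /home xfs uuid=... ...
    swap /dev/sda1 uuid=...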

  • More detailed overview of the changes:

    • a new repair workflow has been created, described as "use ReaR as live
      media to repair the system" in the online help message

    • the repair workflow is composed of the following stages: setup,
      check, layout/prep-for-mount, layout/do-mount and wrapup
      (setup and wrapup are reused as they are -- the other three stages
      are new)

    • layout/prep-for-mount is mostly a clone of layout/prepare (through
      symlinks), but has a few files of its own (540_generate_device_code.sh,
      550_finalize_script.sh and 600_show_unprocessed.sh) that are modified
      versions of the original ones

    • layout/do-mount is a clone of layout/recreate (through symlinks), but
      with only three action scripts kept (100_confirm_layout_code.sh,
      200_run_layout_code.sh and 250_verify_mount.sh -- no local
      modification as of yet)

    • check is a minimalist clone of verify (through symlinks) that ignores
      all the supported backup engines. This also makes it possible for the
      repair workflow to be used even when no backup is available

    • in layout/prepare/GNU/Linux/160_include_luks_code.sh (sourced through
      its symlink layout/prep-for-mount/GNU/Linux/160_include_luks_code.sh),
      I've added a new function open_crypt() (cloned from create_crypt())
      that only performs the cryptsetup luksOpen operation (see the sketch
      after this list)

    • in lib/layout-functions.sh, a new function do_mount_device()
      (cloned from create_device()) deals with opening (as in the case of
      LUKS-encrypted filesystems) and then mounting the filesystems. To do
      this, it may call the new function open_crypt() or existing mount_*()
      functions (both helpers are sketched after this list)

    • a new action script 540_generate_device_code.sh, specific to the
      prep-for-mount stage (cloned from the one that comes with the
      prepare stage), will process the disk layout and generate a custom
      script (still called diskrestore.sh) to perform the actual mount of
      all the needed filesystems (including the virtual ones -- that part is
      currently hardcoded). That script will be called later by
      layout/do-mount/default/200_run_layout_code.sh (a hand-written
      approximation of such a generated script is shown below, after the
      TBD note)

    • a new example scenario has been added to chapter 4 of the User Guide,
      describing how to use the new repair workflow and why you would
      want to do so
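
To make the two new helpers more tangible, here is a rough, self-contained sketch of what they do. It is not the code from the pull request: the real functions follow the calling conventions of create_crypt()/create_device() and emit their commands into the generated diskrestore.sh rather than executing them directly, so names and arguments below are simplifications.

    # Simplified sketch only -- see the caveats above.

    # Open an existing LUKS container without reformatting it.
    open_crypt() {
        local source_device="$1"        # e.g. /dev/sda3
        local mapping_name="$2"         # e.g. cr_home
        cryptsetup luksOpen "$source_device" "$mapping_name"
    }

    # Mount a device below the given mountpoint, opening it first
    # if it turns out to be a LUKS container.
    do_mount_device() {
        local device="$1"               # block device to mount
        local mountpoint="$2"           # e.g. /mnt/local/home
        local fstype="$3"               # e.g. ext4, xfs
        if cryptsetup isLuks "$device" 2>/dev/null ; then
            local name="cr_$( basename "$device" )"
            open_crypt "$device" "$name"
            device="/dev/mapper/$name"
        fi
        mkdir -p "$mountpoint"
        mount -t "$fstype" "$device" "$mountpoint"
    }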

Still TBD: once the repair workflow has been run, disallow calling rear recover without rebooting first, to make sure we always start from a known state, as suggested by @schlomo (in an email dated March 14, 2019).
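
For illustration, a mount-only diskrestore.sh generated by layout/prep-for-mount/default/540_generate_device_code.sh could look roughly like the following. This is a hand-written approximation, not actual output: the device names and mountpoints are made up, and only TARGET_FS_ROOT (/mnt/local) is a real ReaR variable.

    # hand-written approximation of a generated, mount-only diskrestore.sh
    set -e
    TARGET_FS_ROOT=/mnt/local

    # open LUKS containers first (ordering comes from the dependency analysis)
    cryptsetup luksOpen /dev/system/home cr_home

    # mount the local filesystems in dependency order
    mkdir -p "$TARGET_FS_ROOT"
    mount -t ext4 /dev/system/root "$TARGET_FS_ROOT"
    mkdir -p "$TARGET_FS_ROOT/home"
    mount -t xfs /dev/mapper/cr_home "$TARGET_FS_ROOT/home"

    # mount the most important virtual filesystems (currently hardcoded)
    mount -t proc  proc  "$TARGET_FS_ROOT/proc"
    mount -t sysfs sysfs "$TARGET_FS_ROOT/sys"
    mount -o bind  /dev  "$TARGET_FS_ROOT/dev"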

jsmeix commented at 2019-11-05 09:18:

@rear/contributors
I would like to get at least one more review from another ReaR maintainer
to be more on the safe side that merging this pull request
will not cause regressions for existing use cases.

petroniusniger commented at 2019-11-05 09:55:

@jsmeix
Many thanks for your review -- I'll update the comments accordingly.

As to your plan to merge the PR tomorrow: I'm currently working on a small improvement initially suggested by @schlomo to prevent running the recover workflow straight after repair and instead request a reboot first, in order to start from a pristine state. I already have a working implementation of that improvement, but would prefer to run some more tests before committing it.

Knowing this, do you prefer waiting for my commit (which I'm sure will warrant further review, though more limited in scope), or should I rather focus on fixing the current review comments to let you merge and then submit the aforementioned improvement as a separate branch?

jsmeix commented at 2019-11-05 10:35:

@petroniusniger
of course I will wait until you tell me that you consider your pull request
to be sufficiently complete to provide a useful piece of functionality.

But there is no need to wait until all is done for the repair workflow.

I would prefer to implement new things step by step
via several pull requests as long as each pull request
implements a consistent self contained state.

What I mean in general is:

"Submit early, submit often"
(i.e. do GitHub pull requests early and often),
cf. Wikipedia: Release early, release often
http://www.wikipedia.org/wiki/Release_early,_release_often

Do not mix up several independent, separate issues in one big all-in-one
GitHub pull request, because a pull request cannot be partially accepted.

On the other hand, do not split several changes that belong together
into several pull requests.

What I mean is mainly that a single piece of functionality like
"initial very basic implementation of the repair workflow"
should have one pull request.

The final full featured repair workflow may consist of many single
pieces of functionality so many pull requests could be needed.

petroniusniger commented at 2019-11-05 10:58:

@jsmeix
Thanks -- understood. I don't intend to further develop the new workflow at this stage, but I would prefer for the requested interlock mechanism to be present from the start.

I'm integrating your review comments as we speak.

jsmeix commented at 2019-11-06 09:34:

@petroniusniger
it seems your recent changes do not yet appear here in this pull request,
cf. https://github.com/rear/rear/pull/2269/files
for the current content of the files of this pull request.

Did you perhaps forget to do a git push to "upload" your
recent changes from your local git repository to this pull request?

Cf. the "Contributing" section in
http://relax-and-recover.org/development/
and
https://2buntu.com/articles/1459/keeping-your-forked-repo-synced-with-the-upstream-source/
that is referenced as 'this article' in the above "Contributing" section in
"see also this article for an illustrated guide to keep your forked repo in sync"

petroniusniger commented at 2019-11-12 12:23:

@jsmeix : I've not forgotten the push. I've not pushed yet because I'm not finished implementing all your review comments (I still need to rename the workflow). What's more, I don't want to push any code that I've not tested.

I hope I'll be able to finish this week (including the eventual push), but that also depends on any emergencies that may surface at the office.

jsmeix commented at 2019-11-12 13:44:

@petroniusniger
thank you for your explanation.
All is perfectly fine - no rush - take your time.

Seems you also suffer from all kinds of weird
"emergencies at the office" that appear all of a sudden ;-)

petroniusniger commented at 2019-11-15 15:20:

"Interlock" between workflows: Implementation details
As suggested by @schlomo, I have put in place a new mechanism to determine which workflows can be chained in the same session and which cannot (if the user tries to launch a second workflow that we think requires a clean initial state, they will be asked to reboot first).

The current implementation only takes care of allowing certain workflows (restoreonly and finalizeonly) after mountonly while prohibiting all others. But I tried to design it in such a way that it could easily be extended to cover other cases as well.

Here is how this works:

  • a new script in the setup phase (called 002_clean_start.sh) defines a "breadcrumb" file ($VAR_DIR/last_run_workflow)
  • if the file exists, it reads its content: the name of the last-run workflow
  • it then goes through a case matrix (last workflow vs. current one) to decide whether to abort or to proceed (a simplified sketch of this check follows this list)
  • if the file doesn't exist, the current workflow proceeds
  • generation of the "breadcrumb" file should ideally happen in the first script that actually alters the state of the recovery environment. In the case of the mountonly workflow, I've made it the first task inside diskrestore.sh (generated by layout/prep-for-mount/default/540_generate_device_code.sh)
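
A condensed sketch of that check follows. The breadcrumb file, the allowed combinations and the variables VAR_DIR and WORKFLOW come from the description above and from ReaR itself; the exact case patterns and the error text are only approximations of what 002_clean_start.sh may contain:

    # simplified sketch of the interlock check in the new 002_clean_start.sh
    last_run_workflow_file="$VAR_DIR/last_run_workflow"

    if test -r "$last_run_workflow_file" ; then
        last_workflow="$( cat "$last_run_workflow_file" )"
        case "$last_workflow/$WORKFLOW" in
            # after mountonly only restoreonly and finalizeonly may follow
            (mountonly/restoreonly|mountonly/finalizeonly)
                ;;
            (mountonly/*)
                Error "Workflow '$last_workflow' was already run -- reboot before running '$WORKFLOW'"
                ;;
        esac
    fi
    # the breadcrumb itself is written later, by the first script that
    # actually modifies the recovery environment (for mountonly: diskrestore.sh)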

petroniusniger commented at 2019-11-15 16:29:

Tests run with the modified code:

  • generated fresh 2-step backup (mkbackup + mkbackuponly) of test VM
  • booted VM on recovery media and ran the new mountonly workflow:
    • all local filesystems correctly mounted (including LUKS-encrypted and virtual ones)
    • chrooted inside target system
    • ran YaST, loaded "bootloader" configuration module (then quit without changes)
  • tried running recover workflow:
    • was denied with the expected error message
  • tried running restoreonly workflow:
    • proceeded and spawned a sub-shell asking to start the restore process on the backup host
    • left the sub-shell without restoring anything
  • rebooted the VM, still to the recovery media
  • tried again running the recover workflow:
    • this time, started as expected
    • aborted by answering 'no' to the first confirmation prompt

jsmeix commented at 2019-11-18 15:05:

@rear/contributors
I would appreciate a second review from another ReaR maintainer
to be more on the safe side against possible regressions.

petroniusniger commented at 2019-11-21 16:11:

@jsmeix
From my point of view, this branch is now ready for merge -- unless someone else offers to review it, of course.

petroniusniger commented at 2019-11-21 16:12:

Updated PR title to reflect the renaming of the workflow.

jsmeix commented at 2019-11-22 13:19:

I would like to wait until next Monday
for a second review from another ReaR maintainer.

If there is no objection, I will merge it next Monday afternoon.

jsmeix commented at 2019-11-26 08:22:

@petroniusniger
thank you for your valuable contribution to ReaR
in particular because it perfectly complements and
seamlessly integrates with existing functionality in ReaR.
That is much appreciated!


[Export of GitHub issue for rear/rear.]