#1806 PR merged: Issue #1380 - ReaR recovery fails when the OS contains a Thin Pool/Volume

Labels: enhancement, fixed / solved / done

rmetrich opened issue at 2018-05-15 15:42:

Using 'vgcfgrestore' to restore Thin LVs doesn't work, because the thin
metadata is not in the configuration file.
To be able to use Thin LVs, upon failure of 'vgcfgrestore' (or during
migration), traditional 'vgcreate/lvcreate' commands are used instead.

This approach is sub-optimal because it requires knowing all the
possible options to 'lvcreate', which is hard or even impossible to do.
Nonetheless, this is better than nothing, since the issue with 'lvcreate'
options already existed in the migration case.

The code using 'lvcreate' only runs upon 'vgcfgrestore' failure, which
typically happens only with Thin LVs (IMHO). Hence, on most setups, the
optimized code will still run, preventing any regression.

Signed-off-by: Renaud Métrich rmetrich@redhat.com

PLEASE TEST MORE.

I tested as follows:

  • regular system with 1 VG, no thin pool
  • system with 1 VG, and root filesystem on thin pool + some special LVs (mirror, raid1, raid5, etc).

I plan to test tomorrow on a system with 1 VG, no thin pool but some special LVs (mirror, raid, etc) to verify that vgcfgrestore also works with special LVs.

rmetrich commented at 2018-05-16 10:50:

The proposed code does the following:

During backup:

  1. Collect additional LV properties:

    origin: originating LV (for caches and snapshots)
    lv_layout: type of LV (mirror, raid, thin, etc.)
    pool_lv: thin pool hosting the LV
    chunk_size: size of the chunk, for various volume types
    stripes: number of stripes, for Mirror and Striped volumes
    stripe_size: size of a stripe, for Raid volumes

  2. Skip caches and snapshots (not supported)

During restore (condensed in the sketch after this list):

  1. If in Migration mode (e.g. different disks but same size), go through the vgcreate/lvcreate code (Legacy Method), printing warnings because the initial layout may not be preserved (we do not save all the attributes needed for re-creating the LVM volumes)

  2. Otherwise, try "vgcfgrestore"

  3. If it fails:

    • Try "vgcfgrestore --force"
    • If that also fails, use vgcreate/lvcreate (Legacy Method)
    • Otherwise, remove the Thin pools (which are broken due to the --force flag)
    • Re-create the Thin pools using the Legacy Method (but do not re-create the other LVs, which have been successfully restored)
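
For illustration, here is a condensed sketch of the restore-side decision flow above, in ReaR-style bash (is_true and MIGRATION_MODE exist in ReaR; the helper names create_vg_legacy, remove_thin_pools, create_thin_pools_legacy and the backup file path are illustrative assumptions, not the actual merged code):

    # Restore one VG: prefer a verbatim metadata restore via vgcfgrestore,
    # fall back to recreating volumes with vgcreate/lvcreate (Legacy Method).
    restore_vg() {
        local vg="$1"
        local backup_file="/var/lib/rear/layout/lvm/${vg}.cfg"    # assumed path

        if is_true "$MIGRATION_MODE" ; then
            # Disks changed: the saved metadata cannot be restored verbatim,
            # so recreate everything with vgcreate/lvcreate and warn that the
            # original layout may not be fully preserved.
            create_vg_legacy "$vg"
            return
        fi

        # A plain restore works for setups without thin pools:
        lvm vgcfgrestore -f "$backup_file" "$vg" && return

        # Thin pools make plain vgcfgrestore fail; retry with --force,
        # then replace the thin pools, which --force leaves broken:
        if lvm vgcfgrestore --force -f "$backup_file" "$vg" ; then
            remove_thin_pools "$vg"
            create_thin_pools_legacy "$vg"    # other LVs were restored fine
        else
            create_vg_legacy "$vg"            # full Legacy Method fallback
        fi
    }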

jsmeix commented at 2018-05-17 11:24:

@gdha
I am currently too busy with other stuff so that I cannot have a look here right now.
Is this enhancement useful for ReaR 2.4 or can it be postponed to ReaR 2.5?

rmetrich commented at 2018-05-17 11:29:

Thin pools are widely used along with docker.
The code has been implemented to prevent any regression when not using thin pools (vgcfgrestore should just work). When in migration mode, the code should improve the recovery, but this requires some testing (I tested with Thin pools, Linear volumes and Raid/Mirror volumes).

gozora commented at 2018-05-17 11:45:

Hello @rmetrich, at first glance it looks good to me. I'm however not that good at reading plain code :-/ ...
But I can run a couple of tests this weekend if needed ...

V.

gdha commented at 2018-05-17 12:49:

@rmetrich At the end of next week I can do some automated tests - this weekend is occupied (my daughter's wedding and other work duties, unfortunately). At the beginning of next week I am taking a few days off to really relax (in Germany and Luxembourg).

Additional note: ReaR 2.4 is OK for me, as my main customer is also struggling with this kind of stuff (with docker containers). And I believe Red Hat had plans to replace their 2.0 version with the latest 2.4, if I'm not mistaken.

rmetrich commented at 2018-05-17 12:54:

👍 Perfect!
We do indeed have plans to rebase to 2.4 ASAP (RHEL 7.6), but it's not committed.

jsmeix commented at 2018-05-17 13:07:

@rmetrich
only a side note FYI in general
regarding what you wrote in your initial comment

This approach is sub-optimal because it requires knowing all the
possible options to 'lvcreate', which is hard or even impossible to do.

You may have a look at my general opinion about such cases
e.g. in
https://github.com/rear/rear/pull/1513#issuecomment-332123766
and
https://github.com/rear/rear/pull/1204#issuecomment-282252947
and
https://github.com/rear/rear/issues/1242#issuecomment-288349895
and in the latter follow the links to the other issue comments.

But I know basically nothing at all about Thin Pool/Volume or even docker,
so I don't know whether it could make sense in this particular case
to implement config variables where the user could - if needed - specify
his particular settings for his particular Thin Pool/Volume setup.
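
If such config variables were implemented, they could hypothetically look like the sketch below; the variable name LVCREATE_EXTRA_OPTIONS and the lookup loop are invented for illustration and do not exist in ReaR:

    # Hypothetical entry in /etc/rear/local.conf: per-LV extra options to be
    # appended to 'lvcreate', keyed as "volume_group/logical_volume=options".
    LVCREATE_EXTRA_OPTIONS=( "rootvg/pool00=--poolmetadatasize 1g --chunksize 128k" )

    # Hypothetical lookup in the LV recreation code ($vg, $lv and $lv_size
    # are assumed to be set by the surrounding diskrestore.sh code):
    lv_extra_opts=""
    for entry in "${LVCREATE_EXTRA_OPTIONS[@]}" ; do
        test "${entry%%=*}" = "$vg/$lv" && lv_extra_opts="${entry#*=}"
    done
    lvm lvcreate $lv_extra_opts -L "${lv_size}b" -n "$lv" "$vg"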

jsmeix commented at 2018-05-17 13:15:

@rmetrich
another side note FYI in general regarding "when in migration mode":

When in migration mode it is perfectly fine to ask the user
via the UserInput function for needed values and/or let him
check and confirm something before it actually gets set up.

Only when not in migration mode should "rear recover" normally
run unattended - but even if UserInput is also needed in that case,
the user can predefine UserInput values to make it run unattended,
cf. the USER_INPUT_... variables in default.conf.
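
For example, a minimal sketch of such a predefined value (the USER_INPUT_... naming follows default.conf as mentioned above; mapping this particular dialog ID to answer '4' is inferred from the recovery dialog quoted later in this thread):

    # In /etc/rear/local.conf: predefine the answer for one specific UserInput
    # dialog so that "rear recover" proceeds unattended.
    # '4' corresponds to the "Continue 'rear recover'" choice of the dialog
    # with input ID ADD_CODE_TO_RECREATE_MISSING_FSFS.
    USER_INPUT_ADD_CODE_TO_RECREATE_MISSING_FSFS="4"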

schabrolles commented at 2018-05-19 16:52:

@rmetrich,
I tried your patch on a SLES 11 SP4 system (POWER arch). I got the following error during rear recover:

No code has been generated to recreate fs:/ (fs).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
UserInput -I ADD_CODE_TO_RECREATE_MISSING_FSFS needed in /usr/share/rear/layout/prepare/default/600_show_unprocessed.sh line 33
Manually add code that recreates fs:/ (fs)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 10 seconds)

FYI, the layout is the default SLES 11 LVM layout:
/boot on a partition
/ in LVM

I don't have time to look into the reason why ... but just to be sure, I've tested several times. I also confirm that I don't get any problem with the current master branch on this system.

I attach the log file of rear -D recover in debug mode:
rear-rear-sles11-144.log

I'm going to test on other systems (SLES 12 / RHEL 7 / RHEL 6 and Ubuntu).

rmetrich commented at 2018-05-19 18:30:

I'll have a look on Tuesday. Thanks for reporting.

Renaud.


rmetrich commented at 2018-05-22 08:06:

@schabrolles From the debug log, I do not see any create_lvmvol() call, which is what causes the issue.
Can you provide the debug log of the rescue image creation? The issue most likely lies there.

schabrolles commented at 2018-05-22 08:25:

@rmetrich,

What is strange is that the /boot fs is present in the disklayout file (I checked in the rescue image before running rear recover):

# Disk /dev/vda
# Format: disk <devname> <size(bytes)> <partition label type>
disk /dev/vda 53687091200 msdos
# Partitions on /dev/vda
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/vda 413696 1048576 primary boot,prep /dev/vda1
part /dev/vda 1077936128 213909504 primary none /dev/vda2
part /dev/vda 52395245568 1291845632 primary lvm /dev/vda3
# Format for LVM PVs
# lvmdev <volume_group> <device> [<uuid>] [<size(bytes)>]
lvmdev /dev/rootvg /dev/vda3 9cavrA-gSqy-ROyO-7GWa-PXVI-Ker3-cBtZld 102334464
# Format for LVM VGs
# lvmgrp <volume_group> <extentsize> [<size(extents)>] [<size(bytes)>]
lvmgrp /dev/rootvg 4096 12491 51163136
# Filesystems (only ext2,ext3,ext4,vfat,xfs,reiserfs,btrfs are supported).
# Format: fs <device> <mountpoint> <fstype> [uuid=<uuid>] [label=<label>] [<attributes>]
fs /dev/mapper/rootvg-lvroot / ext3 uuid=487f413a-b41a-4347-89cd-e4249207ab47 label= blocksize=4096 reserved_blocks=0% max_mounts=27 check_interval=180d bytes_per_inode=13207 options=rw,acl,user_xattr
fs /dev/vda2 /boot ext3 uuid=189efaae-c4e6-4df3-af91-5150b91001f1 label= blocksize=4096 reserved_blocks=0% max_mounts=36 check_interval=180d bytes_per_inode=16380 options=rw,acl,user_xattr
# Swap partitions or swap files
# Format: swap <filename> uuid=<uuid> label=<label>
swap /dev/mapper/rootvg-lv_swap uuid=89296247-5f68-43c6-884f-862dc506ee26 label=

Here is the log during rear -D mkrescue:
rear-rear-sles11-144.log

rmetrich commented at 2018-05-22 08:43:

The issue is with /dev/mapper/rootvg-lvroot, which has no lvmvol description.
This is due to my new code (me--), which seems not to work with your LVM version (the help text is printed instead):

 2845 ++ lvm lvs --separator=: --noheadings --units b --nosuffix -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size
 2846   Logical Volume Fields
 2847   ---------------------
 2848     lv_all               - All fields in this section.
 2849     lv_uuid              - Unique identifier.
...
 2957   Unrecognised field: lv_layout

What is your LVM version?

schabrolles commented at 2018-05-22 09:19:

@rmetrich

LVM version from sles11sp4

lvm> version
  LVM version:     2.02.98(2) (2012-10-15)
  Library version: 1.03.01 (2011-10-15)
  Driver version:  4.25.0

rmetrich commented at 2018-05-22 09:52:

OK, I can reproduce on RHEL 6.4, which has a similar LVM version.

rmetrich commented at 2018-05-22 12:56:

@schabrolles That should do it now. I had to add quite a lot of code for older LVM versions.
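
The gist of such compatibility code is probing whether the running LVM knows the newer report fields before requesting them. A minimal sketch, assuming lv_attr as the fallback source for the layout information (the merged code may differ):

    # Probe for the 'lv_layout' field: old LVM versions (e.g. 2.02.98 on
    # SLES 11 SP4 / RHEL 6) reject it with "Unrecognised field" and a
    # non-zero exit code, printing the field help instead.
    if lvm lvs --noheadings -o lv_layout >/dev/null 2>&1 ; then
        fields="origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size"
    else
        # Fall back to 'lv_attr', whose first character encodes the volume
        # type (m=mirror, r=raid, t=thin pool, V=thin volume, ...).
        fields="origin,lv_name,vg_name,lv_size,lv_attr,pool_lv,chunk_size,stripes,stripe_size"
    fi
    lvm lvs --separator=':' --noheadings --units b --nosuffix -o "$fields"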

schabrolles commented at 2018-05-22 16:50:

@rmetrich this is working now ... I'm planning to test it tonight on all the Linux distributions I've got
(RHEL 6, RHEL 7, Ubuntu, SLES 11, SLES 12) and to test migration tomorrow morning.

rmetrich commented at 2018-05-23 12:03:

Tested (RHEL7 + RHEL6) and pushed. Would you prefer squashed commits?

jsmeix commented at 2018-05-23 13:00:

I prefer the default behaviour with non-squashed commits
(i.e. keeping the individual commits in a pull request as they happened,
to preserve the history of how something was implemented),
because the final merge commit that merges a whole pull request
into the ReaR upstream master code is already such a squashed commit.

For example see in
https://github.com/rear/rear/pull/1805
the individual commits "inside" the pull request
https://github.com/rear/rear/pull/1805/commits/65d0fc32e0814545e1fa28c44586e7400a5859e9
https://github.com/rear/rear/pull/1805/commits/bc1fa9878dab19ed4c5a0392d98c873c1f7845a7
https://github.com/rear/rear/pull/1805/commits/0ebeeac6d71cf3dba4e7e0085e56e066cf5f9db2
versus the final merge commit
https://github.com/rear/rear/commit/51d5f949fdd17a0fb205d865f9a4d5704a6650fe
which results in the following git log entries
in the ReaR upstream master branch:

|
* s.chabrolles@fr.ibm.com 51d5f949fdd17a0fb205d865f9a4d5704a6650fe Fri May 18 09:24:23 2018 +0200
|\ Merge pull request #1805 from schabrolles/get_only_partition_from_holders :
| | Verify if dm-X is a partition before adding to sysfs_paths
| |
| * s.chabrolles@fr.ibm.com 0ebeeac6d71cf3dba4e7e0085e56e066cf5f9db2 Tue May 15 07:23:51 2018 +0200
| | Update comment :
| |
| * s.chabrolles@fr.ibm.com bc1fa9878dab19ed4c5a0392d98c873c1f7845a7 Mon May 14 22:04:42 2018 +0200
| | add comments :
| |
| * s.chabrolles@fr.ibm.com 65d0fc32e0814545e1fa28c44586e7400a5859e9 Mon May 14 19:04:47 2018 +0200
|/ Verify if dm-X is a partition before adding to sysfs_paths :
|

rmetrich commented at 2018-05-23 13:18:

Alright, not touching anything then. I also prefer having the history.

rmetrich commented at 2018-05-28 09:04:

Thanks to all of you for accepting this.

razorfish-sl commented at 2018-06-03 04:51:

I noticed today that REAR was not backing up physical-device thin pools...
Nothing was lost; I just noticed on recovery that the old data was still in the thin pool.

However.....
I was just thinking of a case from recently (this week).

I had a "thinpool" that was mapped to a BLOCK FILE, then remounted on loop, and REAR obviously dealt with it correctly because it was a physical file on /dev/sda, in the directory tree.
REAR is capable of handling this case since it sees a file in a directory; the config for LVM is something separate.

So we may possibly need to be careful, since in this case LVM "sees" a thin pool, but that is actually a "lie": it is really a raw image mounted on loop.

We may get into a situation where REAR backs it up correctly as a general file, BUT then backs it up AGAIN as an LVM thin pool. In this situation, how would it re-create the physical device
and not get confused that it is actually a file mounted as a "block device"?
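
A hedged sketch of how such a case could at least be detected (purely illustrative, not existing ReaR code; assumes a util-linux losetup that supports --list/--output): flag PVs that sit on loop devices, since their backing file is already covered by the file-level backup:

    # Walk all LVM PVs and report those backed by a loop device; the backing
    # file of such a PV lives in a regular filesystem and would otherwise be
    # saved twice (once as a file, once as LVM content).
    for pv in $(lvm pvs --noheadings -o pv_name) ; do
        case "$pv" in
            (/dev/loop*)
                backing_file="$(losetup --list --noheadings --output BACK-FILE "$pv")"
                echo "PV $pv is loop-backed by file: $backing_file"
                ;;
        esac
    done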


[Export of GitHub issue for rear/rear.]