#2259 Issue closed: Recover fails to recreate LVM (lvmvol entries missing in disklayout.conf with older LVM that does not support 'pool_lv' et al.)

Labels: enhancement, bug, documentation, cleanup, fixed / solved / done

abbbi opened issue at 2019-10-21 14:42:

Trying to recover a SLES12-based system with MDADM RAID1 and an identical disk layout results in basically no MDADM RAID or any partition/filesystem being created. I tried with identical disk sizes and with slightly bigger disks.

I'm not quite sure where the issue is here; the recovery fails with:

No code has been generated to recreate fs:/usr (fs)

but it does not attempt to create any of the partitions or the MDADM RAID setup. I tried with ReaR 2.4, ReaR 2.5, and a recent Git checkout; is there anything I'm missing?

The complete recover log and layout are available at:

http://grinser.de/~abi/layout.tar.gz

UserInput: Valid choice number result 'Confirm disk mapping and continue 'rear recover''
User confirmed disk mapping
Applied disk layout mappings to /var/lib/rear/layout/disklayout.conf
Applied disk layout mappings to /var/lib/rear/layout/config/df.txt
Applied disk layout mappings to /etc/rear/rescue.conf
Examining msdos disk /dev/sda to automatically resize its last active partition
Checking /dev/sda1 if it is the last partition on /dev/sda
Checking /dev/sda2 if it is the last partition on /dev/sda
Checking /dev/sda3 if it is the last partition on /dev/sda
Found 'primary' partition /dev/sda3 as last partition on /dev/sda
Determining if last partition /dev/sda3 is resizeable
Determining new size for last partition /dev/sda3
Determining if last partition /dev/sda3 actually needs to be increased or shrinked
New /dev/sda is 111384567808 bytes bigger than old disk
Skip increasing last partition /dev/sda3 (new disk less than 10% bigger)
Examining msdos disk /dev/sdb to automatically resize its last active partition
Checking /dev/sdb1 if it is the last partition on /dev/sdb
Checking /dev/sdb2 if it is the last partition on /dev/sdb
Checking /dev/sdb3 if it is the last partition on /dev/sdb
Found 'primary' partition /dev/sdb3 as last partition on /dev/sdb
Determining if last partition /dev/sdb3 is resizeable
Determining new size for last partition /dev/sdb3
Determining if last partition /dev/sdb3 actually needs to be increased or shrinked
New /dev/sdb is 111384567808 bytes bigger than old disk
Skip increasing last partition /dev/sdb3 (new disk less than 10% bigger)
UserInput -I LAYOUT_FILE_CONFIRMATION needed in /usr/share/rear/layout/prepare/default/500_confirm_layout_file.sh line 26
Confirm or edit the disk layout file

  1. Confirm disk layout and continue 'rear recover'
  2. Edit disk layout (/var/lib/rear/layout/disklayout.conf)
  3. View disk layout (/var/lib/rear/layout/disklayout.conf)
  4. View original disk space usage (/var/lib/rear/layout/config/df.txt)
  5. Use Relax-and-Recover shell and return back to here
  6. Abort 'rear recover'
    (default '1' timeout 300 seconds)
    1
    UserInput: Valid choice number result 'Confirm disk layout and continue 'rear recover''
    User confirmed disk layout file
    No code has been generated to recreate fs:/usr (fs).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
    UserInput -I ADD_CODE_TO_RECREATE_MISSING_FSUSRFS needed in /usr/share/rear/layout/prepare/default/600_show_unprocessed.sh line 33
    Manually add code that recreates fs:/usr (fs)
  7. View /var/lib/rear/layout/diskrestore.sh
  8. Edit /var/lib/rear/layout/diskrestore.sh
  9. Go to Relax-and-Recover shell
  10. Continue 'rear recover'
  11. Abort 'rear recover'
    (default '4' timeout 300 seconds)

abbbi commented at 2019-10-21 14:57:

As a side note, I also tried with

export MIGRATION_MODE='true'

which results in the same situation.
As I see it, the root filesystem is on /dev/md1, and the rest of the system's filesystems are
placed on LVM volumes on another RAID device.

It seems to me the recovery script stops after the first filesystem, as the generated diskrestore.sh only contains code to create that single filesystem. All others are missing from diskrestore.sh.

From the log it can be seen that create_fs is only called once during recovery, for the / file system:

Begin create_fs ( fs:/ )

all others are, for some reason, skipped:

grep "create_fs" recover.log
++ type -t create_fs
++ create_fs fs:/
++ Log 'Begin create_fs( fs:/ )'
++ echo '2019-10-21 16:39:09.488672395 Begin create_fs( fs:/ )'
2019-10-21 16:39:09.488672395 Begin create_fs( fs:/ )
++ Log 'End create_fs( fs:/ )'
++ echo '2019-10-21 16:39:09.498287615 End create_fs( fs:/ )'
2019-10-21 16:39:09.498287615 End create_fs( fs:/ )
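
A quick way to cross-check this is to compare the filesystems recorded in the layout file with the ones the generated restore script would actually create. This is only a sketch, assuming the default ReaR paths inside the rescue system:

# filesystems recorded in the layout file ...
awk '$1 == "fs" { print $2, $3 }' /var/lib/rear/layout/disklayout.conf
# ... versus the filesystems the generated restore script creates:
grep 'Creating filesystem' /var/lib/rear/layout/diskrestore.sh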

jsmeix commented at 2019-10-22 08:01:

@abbbi
without your complete rear -D recover debug log file
(ideally together with your diskrestore.sh file)
there is basically nothing I can tell right now.

I would need much more time to reproduce it
because I would need to set up a test system
with a disk layout similar to yours.

I can hardly imagine, just from looking at your disklayout.conf file,
how things would work during rear recover in your particular case,
because I am not a regular RAID and/or LVM user.

What looks unexpected to me (as neither a RAID nor an LVM expert)
in your disklayout.conf file is that it does not contain lvmvol entries,
according to what I read in the section "Disk layout file syntax" in
https://github.com/rear/rear/blob/master/doc/user-guide/06-layout-configuration.adoc
which describes three kinds of LVM-related entries in disklayout.conf:

Physical Volumes
lvmdev ...

Volume Groups
lvmgrp ...

Logical Volumes
lvmvol ...

so that I would normally expect lvmvol entries in the disklayout.conf file
as shown in the disklayout.conf file example in the section
"Layout information gathered during rescue image creation" in
https://github.com/rear/rear/blob/master/doc/user-guide/06-layout-configuration.adoc
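
For illustration, an lvmvol entry looks roughly like this (a made-up example, following the '# lvmvol <volume_group> <name> <size(bytes)> <layout> [key:value ...]' format comment that ReaR writes into disklayout.conf):

lvmvol /dev/vg00 usr 4294967296b linear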

But in your disklayout.conf file there are only lvmdev
and lvmgrp entries but no lvmvol entries, so I would expect that
no logical volumes can be re-created, which looks like the reason
why subsequently no filesystems on LVM can be created.

If my above analysis is right, I would also need
your complete rear -D mkrescue/mkbackup debug log file
to have a chance to find the root cause (i.e. the reason why
there are no lvmvol entries in your disklayout.conf file).

See the item

Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files)

in https://github.com/rear/rear/blob/master/.github/ISSUE_TEMPLATE.md
and see "Debugging issues with Relax-and-Recover" in
https://en.opensuse.org/SDB:Disaster_Recovery

abbbi commented at 2019-10-22 13:27:

Ah, I think the problem is with the lvs parameters. It can be seen that during mkrescue the following command is called, which produces the help output instead of the correct information needed to create the disk layout:

++ lvm lvs --separator=: --noheadings --units b --nosuffix -o origin,lv_name,vg_name,lv_size,modules,pool_lv,chunk_size,stripes
++ read line
  Logical Volume Fields
  ---------------------
    lv_all               - All fields in this section.
    lv_uuid              - Unique identifier.
    lv_name              - Name.  LVs created for internal use are enclosed in brackets.
    lv_path              - Full pathname for LV.

Error is:

Unrecognised field: pool_lv
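
A quick way to see which fields the local lvs actually knows about is its field help output. This is only a sketch, not ReaR code:

# list the supported lvs field names and look for pool_lv:
lvm lvs -o help 2>&1 | grep -w 'pool_lv' || echo "pool_lv is not supported by this LVM version"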

I can confirm that after changing

/usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh

and removing the pool_lv option, my generated disklayout.conf does include all the needed LVM volumes.
But it seems that the missing pool_lv value may result in other issues, as disklayout.conf now references /dev/mapper/* devices for the filesystems, which are then not correctly mapped during recovery (No code has been generated to recreate /dev/mapper/vg00-home (lvmvol).)

lvmvol /dev/vg00 home 4294967296b linear thinpool:0 chunksize:1b stripesize:4294967296b
lvmvol /dev/vg00 srv 268435456000b linear thinpool:0 chunksize:1b stripesize:4294967296b
 [..]
fs /dev/mapper/vg00-usr /usr ext4 uuid=8a2c2ab9-89f4-4796-8cb5-d4395c7d4494 label=usr blocksize=4096 reserved_blocks=4% max_mounts=34 check_interval=150d bytes_per_inode=16384 options=rw,relatime,user_xattr,acl,barrier=1,data=ordered

Log with -D (without any changes made to the ReaR Git checkout):

http://www.grinser.de/~abi/mkrescue.log

jsmeix commented at 2019-10-22 14:10:

@abbbi
thank you for your debugging and your prompt response.

According to

# find usr/share/rear -name '*.sh' | xargs grep -l 'pool_lv'

usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh

the only script that contains pool_lv is
usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh
and according to the output of

# git log -p --follow usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh

the first time pool_lv appears in that output is (excerpts):

commit b184194f37dd22a7e55655ff388579297239e73c
...
+    lvm 8>&- 7>&- lvs --separator=":" --noheadings --units b --nosuffix -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes | while read line ; do

which is
https://github.com/rear/rear/commit/b184194f37dd22a7e55655ff388579297239e73c
that points to
https://github.com/rear/rear/issues/1380
so it seems this issue here is just one more chapter
in the never-ending LVM Thin Pool/Volume story...
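
As a side note, git's pickaxe search can find the commit that introduced the string directly, as an alternative to reading the whole git log -p output:

# git log -S 'pool_lv' -- usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh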

jsmeix commented at 2019-10-22 14:13:

@rmetrich @pcahyna
could you please have a look here because recently
you had worked on the LVM related code in ReaR.
Many thanks in advance!

abbbi commented at 2019-10-22 14:18:

hi!

thanks for YOUR response in the first place. I went back to commit e4eb0e1786d0346adea0383f02fdea18e7c799a9 and re-created an ISO image, which then results in ReaR doing a correct recovery:

Start system layout restoration.
Creating partitions for disk /dev/sda (msdos)
Creating partitions for disk /dev/sdb (msdos)
Creating software RAID /dev/md1
Creating software RAID /dev/md3
Creating LVM PV /dev/md3
Creating LVM VG vg00
Creating LVM volume vg00/usr
Creating LVM volume vg00/var
Creating LVM volume vg00/home
Creating LVM volume vg00/srv
Creating filesystem of type ext3 with mount point / on /dev/md1.
Mounting filesystem /
Creating filesystem of type ext4 with mount point /usr on /dev/mapper/vg00-usr.
Mounting filesystem /usr
Creating filesystem of type ext4 with mount point /home on /dev/mapper/vg00-home.
Mounting filesystem /home
Creating filesystem of type ext4 with mount point /srv on /dev/mapper/vg00-srv.

jsmeix commented at 2019-10-22 14:25:

When I use the git blame view of layout/save/GNU/Linux/220_lvm_layout.sh
https://github.com/rear/rear/blame/b184194f37dd22a7e55655ff388579297239e73c/usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh
that matches the commit b184194f37dd22a7e55655ff388579297239e73c
where pool_lv was introduced, and from there go back
one commit by clicking on the small symbol next to the top line
of the code block that contains the change where pool_lv was added
(that symbol shows the mouse-over text "View blame prior to this change"),
I get
https://github.com/rear/rear/blame/e4eb0e1786d0346adea0383f02fdea18e7c799a9/usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh
which matches the commit that @abbbi also found.

I guess in layout/save/GNU/Linux/220_lvm_layout.sh
the code in the fallback case for older LVM versions

    else
        # Compatibility with older LVM versions (e.g. <= 2.02.98)
        # No support for 'lv_layout', too bad, do our best!

needs to be adapted.

jsmeix commented at 2019-10-22 14:47:

@rmetrich @pcahyna
while you are at it:

I wonder about the strange-looking way in which lvm is called:

lvm 8>&- 7>&- ...

that goes back to
https://github.com/rear/rear/commit/b383dcda0e37f85e45a57c43d7fa92f79ce8910e
see the "use file descriptor 7 for real STDOUT so th.." changes in
https://github.com/rear/rear/blame/e4eb0e1786d0346adea0383f02fdea18e7c799a9/usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh

I wonder why fd7 and fd8 need to be closed for the lvm commands
because in a usual pipe like COMMAND1 | COMMAND2
stdin of COMMAND2 is set to stdout of COMMAND1
and other fds like fd7 and fd8 should not matter at all.

So, if you agree, I would like to get rid of that 8>&- 7>&-
and just use a plain pipe:

   lvm ... | while read line ; do
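
Applied to the actual lvs invocation, the proposed form would look roughly like this (just a sketch of the idea, not a tested change):

lvm lvs --separator=":" --noheadings --units b --nosuffix \
    -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size | while read line ; do
    # one LV record per line, fields separated by ':'
    echo "lvmvol record: $line"
done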

jsmeix commented at 2019-11-15 13:03:

@abbbi
there are SLE12 systems where the lvm command supports pool_lv,
so your particular case seems to be an exception.

A colleague at SUSE tried to reproduce the issue
but on his SLES12-SP2 system things worked.

Here is what he reports:

LVM commands during recover:

> # grep '^+* lvm ' /var/log/rear/recover/rear-linux-4qvu.log 
> +++ lvm version
> +++ lvm vgchange -a n
> +++ lvm vgchange -a n system
> +++ lvm pvcreate -ff --yes -v --uuid X9rQQ7-rnfM-i1zl-rnFm-Amji-Vo6w-2J2aKs --restorefile /var/lib/rear/layout/lvm/system.cfg /dev/mapper/0QEMU_QEMU_HARDDISK_0001-part2
> +++ lvm vgcfgrestore -f /var/lib/rear/layout/lvm/system.cfg system
> +++ lvm vgchange --available y system

LVM commands during mkbackup:

# grep '^+* lvm ' /var/log/rear/rear-linux-4qvu.log
++ lvm pvdisplay -c
++ lvm vgdisplay -c
++ lvm lvs -o lv_layout
++ lvm lvs --separator=: --noheadings --units b --nosuffix -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size
++ lvm vgcfgbackup -f /var/lib/rear/layout/lvm/%s.cfg

The mkbackup operation invoked an lvm command with "pool_lv"
but it was successful. The mkbackup log shows:

> ++ lvm lvs --separator=: --noheadings --units b --nosuffix -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size
> ++ read line
> ++ '[' 0 -eq 0 ']'
> ++ echo '# Format for LVM LVs'
> ++ echo '# lvmvol <volume_group> <name> <size(bytes)> <layout> [key:value ...]'
> ++ header_printed=1
> +++ awk -F : '{ print $1 }'
> +++ echo :root:system:17003708416:linear::0:1:0

Running the command outside of ReaR (directly in a terminal):

> # lvm lvs --separator=: --noheadings --units b --nosuffix -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size
>   :root:system:17003708416:linear::0:1:0

FYI,
these are the LVM versions we provide in SLE12 and its various service packs:

SLE-12 GA: LVM2.2.02.98

SLE-12 SP1 SP2 SP3: LVM2.2.02.120

SLE-12 SP4: LVM2.2.02.180

@abbbi
perhaps you do not use an up-to-date LVM version?

abbbi commented at 2019-11-19 17:51:

hi,
sorry, apparently this is actually an openSUSE system:

:~ # lsb_release -a
LSB Version:    n/a
Distributor ID: SUSE LINUX
Description:    openSUSE 12.1 (x86_64)
Release:        12.1
Codename:       Asparagus

:~ # rpm -qa | grep lvm
lvm2-2.02.84-19.8.1.x86_64

jsmeix commented at 2019-11-21 09:56:

I am not at all an LVM expert, but
I found a command that collects the supported LVM lvs fields:

lvs_supported_fields=( $( lvs -o help 2>&1 | cut -s -d '-' -f 1 ) )

which works for me both on SLES11-SP4 with lvm2-2.02.98
and on openSUSE Leap 15.0 with lvm2-2.02.177.

Perhaps it is possible with reasonable effort to enhance the code
so that only those LVM fields that are actually supported are used.

jsmeix commented at 2019-11-21 10:24:

The following seems to work for me:

# All logical volumes as VG/LV pairs (e.g. system/root):
logical_volumes="$( lvs --separator=/ --noheadings --nosuffix -o vg_name,lv_name )"

# The lvs fields that this LVM version actually supports:
lvs_supported_fields=( $( lvs -o help 2>&1 | cut -s -d '-' -f 1 ) )
# The lvs fields we would like to use:
lvs_fields="origin lv_size lv_layout pool_lv chunk_size stripes stripe_size"

for lvs_field in $lvs_fields ; do
    for lvs_supported_field in "${lvs_supported_fields[@]}" ; do
        # Skip fields that are not supported by this LVM version:
        test "$lvs_supported_field" = "$lvs_field" || continue
        lvs_variable_name="lvs_$lvs_field"
        for logical_volume in $logical_volumes ; do
            declare $lvs_variable_name="$( echo $( lvs --noheadings --units b --nosuffix -o $lvs_field $logical_volume ) )"
            echo "$lvs_field for $logical_volume has value: '${!lvs_variable_name}'"
        done
    done
done

The crazy-looking $( echo $( lvs ... is there to get rid of
the various whitespace characters that lvs has in its output.
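
An equivalent way to get rid of that whitespace would be, for example (only a sketch, using the system/root LV from the output below):

lv_size="$( lvs --noheadings --units b --nosuffix -o lv_size system/root | tr -d '[:space:]' )"
echo "lv_size is '$lv_size'"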

On SLE-12 SP4 with lvm2-2.02.180 it results in:

origin for system/root has value: ''
origin for system/swap has value: ''
lv_size for system/root has value: '19927138304'
lv_size for system/swap has value: '1535115264'
lv_layout for system/root has value: 'linear'
lv_layout for system/swap has value: 'linear'
pool_lv for system/root has value: ''
pool_lv for system/swap has value: ''
chunk_size for system/root has value: '0'
chunk_size for system/swap has value: '0'
stripes for system/root has value: '1'
stripes for system/swap has value: '1'
stripe_size for system/root has value: '0'
stripe_size for system/swap has value: '0'

jsmeix commented at 2019-12-13 12:54:

With https://github.com/rear/rear/pull/2291 merged
this issue is now avoided: ReaR now errors out
in layout/save/GNU/Linux/220_lvm_layout.sh
during "rear mkrescue" in case of insufficient LVM tools,
and in layout/save/default/950_verify_disklayout_file.sh
a simple test was added to verify that the 'lvm...' entries
in disklayout.conf look syntactically correct.
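
Such a syntax check could look roughly like this (only a sketch to illustrate the idea, not the actual code in 950_verify_disklayout_file.sh):

# every 'lvmvol' entry needs at least volume group, name, size and layout:
awk '$1 == "lvmvol" && NF < 5 { print "suspicious lvmvol entry: " $0 ; error=1 } END { exit error }' /var/lib/rear/layout/disklayout.conf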

On the other hand this means that ReaR no longer
supports Linux distributions whose LVM tools are too old
to support the needed options of the "lvm ..." calls
in layout/save/GNU/Linux/220_lvm_layout.sh.

According to
https://github.com/rear/rear/blob/master/doc/rear-release-notes.txt

ReaR-2.5 is supported on the following Linux based
operating systems:

  o Fedora 28 and 29
  o RHEL 5, 6 and 7
  o CentOS 5, 6 and 7
  o ScientificLinux 6 and 7
  o SLES 11, 12 and 15
  o openSUSE Leap and openSUSE Tumbleweed
  o Debian 6, 7, 8 and 9
  o Ubuntu 12, 14, 16, 17 and 18

ReaR-2.5 dropped official support for the following
Linux based operating systems:

  o Fedora < 28
  o RHEL 3 and 4
  o SLES 9 and 10
  o openSUSE <= 13
  o Debian < 6
  o Ubuntu < 12

openSUSE 12.1 is already documented to be no longer supported.


[Export of GitHub issue for rear/rear.]