#2596 Issue closed: Unused RHEL Physical Volume aborts mkrescue

Labels: enhancement, support / question, fixed / solved / done

rmccrack opened issue at 2021-04-06 17:54:

  • ReaR version ("/usr/sbin/rear -V"): Relax-and-Recover 2.6 / Git

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):
    OS_VENDOR=RedHatEnterpriseServer
    OS_VERSION=6.10

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
    site.conf:

BACKUP=CDM
OUTPUT=ISO
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib64/python2.7/site-packages/:/usr/lib64/bind9-export/"

local.conf:

AUTOEXCLUDE_DISKS=( "/dev/sdc" "/dev/sdc1" )
EXCLUDE_COMPONENTS=( "${EXCLUDE_COMPONENTS[@]}" "pv:/dev/sdc1" )
export TMPDIR=/dev/shm
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR):

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device): x86

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    Some multipath, some not - the behavior is the same regardless.

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT" or "lsblk" as makeshift):

  • Description of the issue (ideally so that others can reproduce it):
    I have a number of systems (both physical and virtual) that have one or more disks known to LVM as physical volumes, but that have not been allocated to a volume group or logical volume. As soon as the 220_lvm_layout.sh script encounters one of these, it aborts the mkrescue process with ERROR: LVM 'lvmdev' entry in /var/lib/rear/layout/disklayout.conf where volume_group or device is empty or more than one word. In this example, /dev/sdc1 is the problem disk.

I think these are being used as raw disks by applications, hence the odd configuration, and I need a way to either exclude these from the ReaR backup or allow the backup to continue in spite of them (see the reproduction sketch after the disklayout.conf entries below).

  • Workaround, if any:

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

# pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/sda2  vg00 lvm2 a--u  19.87g  11.37g
  /dev/sdb1  vg01 lvm2 a--u  20.00g 768.00m
  /dev/sdc1       lvm2 ---- 109.00g 109.00g
  /dev/sdd1  vg01 lvm2 a--u 220.99g  33.55g

# pvdisplay
 "/dev/sdc1" is a new physical volume of "109.00 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/sdc1
  VG Name
  PV Size               109.00 GiB
  Allocatable           NO
  PE Size               0
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               q6dPsv-gGZu-SSug-Oyjd-3WSN-c3IM-60TWnP

Entries in the disklayout.conf file:
# lvmdev <volume_group> <device> [<uuid>] [<size(bytes)>]
lvmdev /dev/vg01 /dev/sdb1 joZClk-U4yz-dRmL-ERff-HKMR-y8GR-Exbg6y 41940992
lvmdev /dev/vg01 /dev/sdd1 yd2ZlP-Cpys-HStw-1zeK-LAdY-z0yI-mM0Nf9 463459122
lvmdev /dev/vg00 /dev/sda2 n0y20c-I6ke-ScDY-tEVs-oYyD-pI9S-pevNwx 41678848
lvmdev /dev/ /dev/sdc1 q6dPsv-gGZu-SSug-Oyjd-3WSN-c3IM-60TWnP 228588822
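
For reference, the unused-PV state shown above can be reproduced on a test system by initializing a partition as an LVM physical volume without ever assigning it to a volume group (a minimal sketch; the device name is only an example):

# Initialize the partition as a PV ...
pvcreate /dev/sdc1
# ... but never run vgcreate/vgextend for it, so the VG column of "pvs" stays empty.
pvs /dev/sdc1
# This is the state that makes 220_lvm_layout.sh write an "lvmdev /dev/ /dev/sdc1 ..."
# entry with an empty <volume_group> field and then abort.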

gdha commented at 2021-04-07 07:38:

@rmccrack Could you also show us the output of lvs? And, if possible, the debug log-file of rear -vdD savelayout?

rmccrack commented at 2021-04-07 18:58:

Per request, lvs output below, log file attached.

lvs

LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv_root vg00 -wi-ao---- 1.00g
lv_swap vg00 -wi-ao---- 2.00g
lv_tmp vg00 -wi-ao---- 512.00m
lv_usr vg00 -wi-ao---- 2.00g
lv_var vg00 -wi-ao---- 3.00g
lv_db2_data1 vg01 -wi-ao---- 10.00g
lv_db2_data2 vg01 -wi-ao---- 10.00g
lv_db2_data3 vg01 -wi-ao---- 10.00g
lv_db2_data4 vg01 -wi-ao---- 10.00g
lv_db2_dbdir vg01 -wi-ao---- 6.00g
lv_db2_dbtmp vg01 -wi-ao---- 15.00g
lv_db2_dump vg01 -wi-ao---- 6.00g
lv_db2_indx1 vg01 -wi-ao---- 10.00g
lv_db2_indx2 vg01 -wi-ao---- 10.00g
lv_db2_indx3 vg01 -wi-ao---- 10.00g
lv_db2_indx4 vg01 -wi-ao---- 10.00g
lv_db2_inst vg01 -wi-ao---- 10.00g
lv_db2_lob1 vg01 -wi-ao---- 10.00g
lv_db2_lob2 vg01 -wi-ao---- 10.00g
lv_db2_log vg01 -wi-ao---- 15.00g
lv_db2_rep vg01 -wi-ao---- 5.00g
lv_db2_src vg01 -wi-ao---- 6.00g
lv_db2_tmp vg01 -wi-ao---- 6.00g
lv_db2_tools vg01 -wi-ao---- 20.00g
lv_db_db2 vg01 -wi-ao---- 2.00g
lv_hd vg01 -wi-ao---- 128.00m
lv_hd_aa vg01 -wi-ao---- 128.00m
lv_hd_db vg01 -wi-ao---- 256.00m
lv_hd_oy vg01 -wi-ao---- 128.00m
lv_hd_sa vg01 -wi-ao---- 1.00g
lv_hd_sc vg01 -wi-ao---- 128.00m
lv_hd_su vg01 -wi-ao---- 128.00m
lv_home vg01 -wi-ao---- 128.00m
lv_isv vg01 -wi-ao---- 1.50g
lv_isv_ILMT vg01 -wi-ao---- 508.00m
lv_isv_ems vg01 -wi-ao---- 384.00m
lv_microsoft vg01 -wi-ao---- 128.00m
lv_opt vg01 -wi-ao---- 768.00m
lv_opt_puppet vg01 -wi-ao---- 768.00m
lv_opt_tivoli vg01 -wi-ao---- 1.69g
lv_oy_var vg01 -wi-ao---- 256.00m
lv_sccm vg01 -wi-ao---- 512.00m
lv_splunk_uf vg01 -wi-ao---- 256.00m
lv_su_staging vg01 -wi-ao---- 1.00g
lv_tanium vg01 -wi-ao---- 6.00g

gdha commented at 2021-04-08 09:27:

@rmccrack Couldn't find the log file...

rmccrack commented at 2021-04-08 13:26:

debug_log.txt.gz

gdha commented at 2021-04-08 14:56:

The issue is in /usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh, where the vgrp variable ends up empty (we probably have to skip these entries):

+++ echo /dev/sdc1::228588822:-1:0:0:-1:0:0:0:0:q6dPsv-gGZu-SSug-Oyjd-3WSN-c3IM-60TWnP
+++ cut -d : -f 1
++ pdev=/dev/sdc1
++ test dev/sdc1 = /dev/sdc1
++ is_false yes
++ case "$1" in
++ return 1
+++ echo /dev/sdc1::228588822:-1:0:0:-1:0:0:0:0:q6dPsv-gGZu-SSug-Oyjd-3WSN-c3IM-60TWnP
+++ cut -d : -f 2
++ vgrp=
+++ echo /dev/sdc1::228588822:-1:0:0:-1:0:0:0:0:q6dPsv-gGZu-SSug-Oyjd-3WSN-c3IM-60TWnP
+++ cut -d : -f 3
++ size=228588822
+++ echo /dev/sdc1::228588822:-1:0:0:-1:0:0:0:0:q6dPsv-gGZu-SSug-Oyjd-3WSN-c3IM-60TWnP
+++ cut -d : -f 12
++ uuid=q6dPsv-gGZu-SSug-Oyjd-3WSN-c3IM-60TWnP
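
The trace above corresponds to the parsing of the colon-separated "lvm pvdisplay -c" output in 220_lvm_layout.sh. A minimal sketch (field numbers as in the trace) shows why a PV that belongs to no VG yields an empty vgrp and hence a broken 'lvmdev' entry:

# pvdisplay -c record for a PV that is not part of any VG (field 2 is empty):
line='/dev/sdc1::228588822:-1:0:0:-1:0:0:0:0:q6dPsv-gGZu-SSug-Oyjd-3WSN-c3IM-60TWnP'
pdev=$( echo $line | cut -d ":" -f "1" )   # /dev/sdc1
vgrp=$( echo $line | cut -d ":" -f "2" )   # empty - this field is the VG name
size=$( echo $line | cut -d ":" -f "3" )   # 228588822
uuid=$( echo $line | cut -d ":" -f "12" )  # q6dPsv-...
# The 'lvmdev' entry is then written (roughly) like this,
# which becomes "lvmdev /dev/ /dev/sdc1 ..." when vgrp is empty:
echo "lvmdev /dev/$vgrp $pdev $uuid $size"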

rmccrack commented at 2021-04-19 19:40:

Is there still information you require from me? Asking because of the "waiting for info" tag.

gdha commented at 2021-04-20 07:53:

@rmccrack No - don't worry, I still need to test this out myself when I find some time for it

gdha commented at 2021-04-21 08:58:

Tested it out with 3 disks (sda being the root/boot device and sdb and sdc being empty).
As long as the 2 empty disks are not in use (neither partitioned nor LVM'ed) all is fine with savelayout.
Once the 2 disks (sdb and sdc) are partitioned and turned into PVs (pvcreate), we get the following error message from rear:

# /home/gdha/projects/rear/usr/sbin/rear -v savelayout
Relax-and-Recover 2.6 / Git
Running rear savelayout (PID 2500 date 2021-04-21 08:49:38)
Using log file: /home/gdha/projects/rear/var/log/rear/rear-ubuntu20.log
Running workflow savelayout on the normal/original system
Creating disk layout
Overwriting existing disk layout file /home/gdha/projects/rear/var/lib/rear/layout/disklayout.conf
ERROR: LVM 'lvmdev' entry in /home/gdha/projects/rear/var/lib/rear/layout/disklayout.conf where volume_group or device is empty or more than one word
Some latest log messages since the last called script 220_lvm_layout.sh:
  2021-04-21 08:49:40.256765507 Including layout/save/GNU/Linux/220_lvm_layout.sh
  2021-04-21 08:49:40.265442543 Begin saving LVM layout ...
Error exit of rear savelayout (PID 2500) and its descendant processes
Exiting subshell 1 (where the actual error happened)
Aborting due to an error, check /home/gdha/projects/rear/var/log/rear/rear-ubuntu20.log for details
Exiting rear savelayout (PID 2500) and its descendant processes ...
Running exit tasks
Terminated

However, the disklayout.conf file is still written as:

# grep -v \# disklayout.conf
disk /dev/sda 10737418240 gpt
part /dev/sda 536870912 1048576 sda1 boot,esp /dev/sda1
part /dev/sda 1073741824 537919488 sda2 none /dev/sda2
part /dev/sda 9124708352 1611661312 sda3 none /dev/sda3
disk /dev/sdb 135069696 gpt
part /dev/sdb 135035392 17408 primary none /dev/sdb1
disk /dev/sdc 224121856 gpt
part /dev/sdc 224087552 17408 primary none /dev/sdc1
lvmdev /dev/ubuntu-vg /dev/sda3 LtE3U0-P4QW-0sqd-zN07-0PcG-hn9z-zodth8 17821696
lvmdev /dev/ /dev/sdb1 u4rIXd-eKXI-DYB1-Zscz-oAsv-ekZO-5F3EXe 263741

That is not OK IMHO, as we can skip these devices if the intent was to have PVs created but not yet used - ReaR should not bail out because of this (otherwise an unnoticed error leaves a false sense of coverage).

jsmeix commented at 2021-04-21 11:55:

My - as usual long and elaborated - comment
in layout/save/GNU/Linux/220_lvm_layout.sh
at that error exit explains the reason behind it:

# After the 'lvmdev' line was written to disklayout.conf so that the user can inspect it
# check that the required positional parameters in the 'lvmdev' line are non-empty
# because an empty positional parameter would result an invalid 'lvmdev' line
# which would cause invalid parameters are 'read' as input during "rear recover"
# cf. "Verifying ... 'lvm...' entries" in layout/save/default/950_verify_disklayout_file.sh
# The variables are not quoted because plain 'test' without argument results non-zero exit code
# and 'test foo bar' fails with "bash: test: foo: unary operator expected"
# so that this also checks that the variables do not contain blanks or more than one word
# because blanks (actually $IFS characters) are used as field separators in disklayout.conf
# which means the positional parameter values must be exactly one non-empty word.
# Two separated simple 'test $vgrp && test $pdev' commands are used here because
# 'test $vgrp -a $pdev' does not work when $vgrp is empty or only blanks
# because '-a' has two different meanings: "EXPR1 -a EXPR2" and "-a FILE" (see "help test")
# so that when $vgrp is empty 'test $vgrp -a $pdev' tests if file $pdev exists
# which is usually true because $pdev is usually a partition device node (e.g. /dev/sda1)
# so that when $vgrp is empty 'test $vgrp -a $pdev' would falsely succeed:
test $vgrp && test $pdev || Error "LVM 'lvmdev' entry in $DISKLAYOUT_FILE where volume_group or device is empty or more than one word"

and the

lvmdev /dev/ /dev/sdb1 ...

line does look wrong, see "Disk layout file syntax" in
https://github.com/rear/rear/blob/master/doc/user-guide/06-layout-configuration.adoc

lvmdev /dev/<volume_group> <device> [<uuid>] [<size(bytes)>]

so the <volume_group> is really missing.
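
To illustrate the 'test' pitfall that the quoted comment describes, here is a minimal shell demonstration (the device node is only an example and must exist for the effect to show):

vgrp=''            # PV that belongs to no VG
pdev=/dev/sda1     # existing partition device node

# Falsely succeeds: with $vgrp empty and unquoted this expands to 'test -a /dev/sda1',
# i.e. the unary "file exists" test, which is true for an existing device node:
test $vgrp -a $pdev && echo "would wrongly pass"

# Behaves as intended: plain 'test' without arguments returns non-zero,
# so the check fails when vgrp is empty (or contains more than one word):
test $vgrp && test $pdev || echo "correctly detected as invalid"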

@gdha
you could try out how far "rear recover" would work
when you disable that error exit during "rear mkrescue".
I guess "rear recover" would fail in a rather user unfriendly way
with an incomplete lvmdev /dev/ /dev/sdb1 ... entry
so I think currently that error exit during "rear mkrescue" is right
cf. "Try hard to care about possible errors" in
https://github.com/rear/rear/wiki/Coding-Style

It is a different issue if we could enhance the LVM code
to be more robust against partially set up LVM things
or more generally if we could enhance the disk layout code
to also support any kind of "partially" set up storage things like

  • disks without partitions i.e. raw disk support
  • disks with partitions but no filesystems ("raw partition" support?)
  • disks with partitions and filesystems which are not mounted
  • other higher level storage objects where nothing is mounted (e.g. "raw LUKS volumes")

It is another different issue how to backup data on
raw or unmounted storage objects, e.g. by backup with dd, cf.
https://github.com/rear/rear/blob/master/doc/user-guide/12-BLOCKCLONE.adoc

jsmeix commented at 2021-04-21 11:59:

@rmccrack
in your case the lvmdev /dev/ /dev/sdc1 line looks wrong
so that "rear recover" would fail if disklayout.conf is used with such a line.

jsmeix commented at 2021-04-21 12:02:

As far as I see the issue is not a bug in the current code.
I even think the current code works perfectly right here because
it avoids the situation where later, when it is too late, "rear recover" cannot work.

As far as I see the issue is an enhancement request
to let ReaR support certain kinds of "raw" storage objects.

gdha commented at 2021-04-21 12:05:

@jsmeix I do not have to perform a recover to know it will fail, as the VG is badly defined - only a PV was created, but not used. That is my point here - there is nothing wrong with creating a PV that is not yet used (sometimes it takes a couple of days in big firms with different teams to complete the creation of a VG). ReaR should not fail over it - it should not even show an error, as it is not an error as such. We should skip these devices, and I have already tested that with success.

        # With the above example vgrp=system
        vgrp=$( echo $line | cut -d ":" -f "2" )

        # When vgrp is empty then the PV is not part of any VG - skip this device? #2596
        [[ -z "$vgrp" ]] && continue

jsmeix commented at 2021-04-21 12:30:

@gdha
thank you for the explanation.

From the code in 220_lvm_layout.sh I think you mean to skip a PV.
I am not an LVM expert but in general I don't like to "just skip" things.
Perhaps it is OK to skip a PV that is not part of a VG.
But then there is no info in disklayout.conf about that PV.
So on the original system there was a PV (pvdisplay has shown it)
that is not part of a VG but "rear recover" would not recreate it
as a PV that is not part of a VG but "just skip" it.
So "rear recover" would "just silently" not recreate
the original system as much as possible as it was before.

When it "takes a couple of days ... to complete the creation of a VG"
I wonder why then "rear mkrescue" is run while system setup is in the works?
If "rear mkrescue" is run while system setup is in the works and it errors out
because the current system setup is incomplete or inconsistent
then it is good that "rear mkrescue" errors out so the admin knows
that something is wrong.

In general I do not like when "rear mkrescue"
just silently skips things that ReaR does not support, see
https://github.com/rear/rear/pull/2597#issuecomment-822491263
(excerpt):

A bad consequence is that when ReaR does not support something
it gets silently ignored during "rear mkrescue" and then the user has to
learn the hard way later (when it is too late) when "rear recover" fails
with unfriendly messages that something is not supported by ReaR, cf.
https://github.com/rear/rear/issues/2560

The current code is at least somewhat consistent.
It errors out when it detects things it cannot cope with.

Just silently skipping things that ReaR does not support is what
I fixed in many places (still too many places are left to be fixed)
because that is the bad old usability experience of ReaR:

  • "rear mkrescue" basically always "just works" on a first attempt
  • "rear recover" basically always "just fails" on a first attempt

I think we either implement proper support for PVs that are not part of a VG
or we implement proper support to let the user explicitly specify what to skip
cf. the related issue https://github.com/rear/rear/pull/2597
which is about disks but the basic ideas behind it are the same.

jsmeix commented at 2021-04-21 12:43:

@rmccrack
try out if "just skipping" PVs that are not part of a VG as in
https://github.com/rear/rear/issues/2596#issuecomment-824008112
makes both "rear mkrescue" and "rear recover"
(try out "rear recover" only on separated test hardware)
work sufficient for your particular use case
(you won't get the skipped PVs recreated as PVs).

gdha commented at 2021-04-21 12:47:

@jsmeix @rmccrack Yeah - ReaR is not perfect, however, end-users are even worse ;-) They don't look at the logs and are unaware of errors - just as a reminder, ReaR runs most of the time via cron or another kind of scheduler. Are the logs then analyzed for errors? 99% are not checked at all.
Disks which are present but not used are shown in disklayout.conf as comments; perhaps we can do the same for PVs? I tend to say an empty PV should not be recreated in a DR environment, as it does not bring any added value to the restored system.
If they didn't need it before, then in a DR situation it is even less important to have it.

jsmeix commented at 2021-04-21 12:49:

In default.conf we have EXCLUDE_VG
but neither EXCLUDE_PV nor EXCLUDE_LV
and layout/save/GNU/Linux/220_lvm_layout.sh
contains nothing at all about EXCLUDE_ only
layout/save/default/310_include_exclude.sh and
layout/save/default/335_remove_excluded_multipath_vgs.sh
contain EXCLUDE_VG and I think the reason is how exclusion is implemented, cf.
https://github.com/rear/rear/pull/2597#issuecomment-822491263
(excerpt)

My vague basic understanding is that during "rear mkrescue"
first all storage objects ReaR knows about are included in disklayout.conf
(except some automated hardcoded ignored ones ...)
and then some "not needed" entries in disklayout.conf get disabled
by various subsequent exclude scripts

gdha commented at 2021-04-21 12:57:

@jsmeix Perhaps this would be sufficient? It is shown and nothing breaks:

# PV /dev/sdb1 is not part of any known VG (yet) - skipping device:
# lvmdev /dev/ /dev/sdb1 u4rIXd-eKXI-DYB1-Zscz-oAsv-ekZO-5F3EXe 263741
# PV /dev/sdc1 is not part of any known VG (yet) - skipping device:
# lvmdev /dev/ /dev/sdc1 A3oUft-RXK0-ATkg-jcv2-hl1j-naAX-JvowMu 437671
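
A rough sketch of how 220_lvm_layout.sh could write such commented-out entries instead of erroring out (message text as proposed above; this is only an illustration, not necessarily the code that was merged later):

# Inside the 'lvm pvdisplay -c | while read line' loop, after pdev, vgrp,
# size and uuid have been extracted from the colon-separated record:
if test -z "$vgrp" ; then
    # PV that belongs to no VG: document it as a comment for the user
    # instead of writing an invalid active 'lvmdev' entry:
    echo "# PV $pdev is not part of any known VG (yet) - skipping device:"
    echo "# lvmdev /dev/$vgrp $pdev $uuid $size"
    continue
fi
echo "lvmdev /dev/$vgrp $pdev $uuid $size"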

jsmeix commented at 2021-04-21 13:02:

@gdha
yes, all storage objects that ReaR knows about
should be shown in disklayout.conf

  • either as "active" entries
  • or commented out (as documentation for the user)

Since the lsblk output is shown as a header comment in disklayout.conf
we might even completely skip storage object entries in disklayout.conf
when the user has explicitly specified to have them skipped, e.g.
via EXCLUDE_VG and new EXCLUDE_PV and EXCLUDE_LV,
because this could make our code much simpler, like

    lvm pvdisplay -c | while read line ; do

        # With the above example pdev=/dev/sda1
        # (the "echo $line" makes the leading blanks disappear)
        pdev=$( echo $line | cut -d ":" -f "1" )

        # Skip PVs where their device node is explicitly specified to be skipped:
        IsInArray $pdev ${EXCLUDE_PV[@]} && continue

instead of complicated code that must work fail-safe even for problematic cases
to still write half-broken entries to disklayout.conf that later get commented out.
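
For completeness, the matching user setting would look like this in local.conf (note that EXCLUDE_PV does not exist in default.conf at this point; the name and format are only what such a setting could look like):

# /etc/rear/local.conf - hypothetical setting, not yet implemented:
EXCLUDE_PV=( "/dev/sdc1" )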

jsmeix commented at 2021-04-21 13:04:

@gdha
do you already have code that makes entries in disklayout.conf
as shown in your
https://github.com/rear/rear/issues/2596#issuecomment-824038297
?

If yes, please do a pull request.

jsmeix commented at 2021-05-04 11:10:

With https://github.com/rear/rear/pull/2603 merged
this issue should be sufficiently solved
(as far as possible with reasonable effort).

gdha commented at 2021-05-04 12:04:

@jsmeix Confirmation of success:

# usr/sbin/rear -v savelayout
Relax-and-Recover 2.6 / Git
Running rear savelayout (PID 1262 date 2021-05-04 11:49:13)
Using log file: /home/gdha/projects/rear/var/log/rear/rear-ubuntu20.log
Running workflow savelayout on the normal/original system
Creating disk layout
Overwriting existing disk layout file /home/gdha/projects/rear/var/lib/rear/layout/disklayout.conf
Disabling component 'disk /dev/sdb' in /home/gdha/projects/rear/var/lib/rear/layout/disklayout.conf
Disabling component 'part /dev/sdb' in /home/gdha/projects/rear/var/lib/rear/layout/disklayout.conf
Disabling component 'disk /dev/sdc' in /home/gdha/projects/rear/var/lib/rear/layout/disklayout.conf
Disabling component 'part /dev/sdc' in /home/gdha/projects/rear/var/lib/rear/layout/disklayout.conf
Using guessed bootloader 'EFI' (found in first bytes on /dev/sda)
Verifying that the entries in /home/gdha/projects/rear/var/lib/rear/layout/disklayout.conf are correct
Created disk layout (check the results in /home/gdha/projects/rear/var/lib/rear/layout/disklayout.conf)
Exiting rear savelayout (PID 1262) and its descendant processes ...
Running exit tasks

jsmeix commented at 2021-05-04 12:26:

@gdha
thank you so much for your confirmation that things work well for you!


[Export of Github issue for rear/rear.]