#2044 Issue closed: WARNING: Failed to connect to lvmetad. Falling back to device scanning

Labels: enhancement, fixed / solved / done

gdha opened issue at 2019-02-16 12:43:

  • ReaR version ("/usr/sbin/rear -V"): any

  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"): centos/rhel

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device): x86_64

  • Description of the issue (ideally so that others can reproduce it): when using LVM we see lots of these "WARNING: Failed to connect to lvmetad. Falling back to device scanning" messages in the log file

  • Workaround, if any: see https://unix.stackexchange.com/questions/332556/arch-linux-installation-grub-problem - edit your /etc/lvm/lvm.conf and set use_lvmetad = 0 - this still needs to be verified (a sketch of the edit is shown below)
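
A minimal sketch of that workaround, assuming lvm.conf already has an explicit "use_lvmetad = 1" line (the sed expression is illustrative, not an existing ReaR script):

# switch LVM from lvmetad to direct device scanning
sed -i -e 's/^\([[:space:]]*\)use_lvmetad[[:space:]]*=[[:space:]]*1/\1use_lvmetad = 0/' /etc/lvm/lvm.conf
# confirm the new setting
grep 'use_lvmetad' /etc/lvm/lvm.conf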

gozora commented at 2019-02-16 15:11:

Might this be related to https://github.com/rear/rear/issues/2035 ?

V.

jsmeix commented at 2019-02-18 08:52:

@gdha
I also noticed those warnings and got worried about what might be wrong
but those warnings are nothing but a perfect example of a useless warning,
cf. "WARNING is a waste of my time"
https://blog.schlomo.schapiro.org/2015/04/warning-is-waste-of-my-time.html

At least for me (using SLES) LVM "just worked" without lvmetad.
I would appreciate it if we could silence these useless warnings.

@gozora
I think @gdha means here etc/lvm/lvm.conf in the recovery system.

In contrast - as far as I understand it - https://github.com/rear/rear/issues/2035
is about LVM in the recreated system because there things fail (hang up)
within a chroot $TARGET_FS_ROOT so that one would have to edit
$TARGET_FS_ROOT/etc/lvm/lvm.conf to make a difference there.
But we cannot "just change" files in the recreated system because
in general the restored files from the user's backup are sacrosanct.

gozora commented at 2019-02-18 09:02:

@jsmeix
My https://github.com/rear/rear/issues/2044#issuecomment-464354380 was a reaction to https://unix.stackexchange.com/questions/332556/arch-linux-installation-grub-problem (provided by @gdha) which says (excerpt):

...
/run/lvm/lvmetad.socket: connect failed: No such file or directory
or
WARNING: failed to connect to lvmetad: No such file or directory. Falling back to internal scanning.

This is because /run is not available inside the chroot. These warnings will not prevent the system from booting, provided that everything has been done correctly, so you may continue with the installation.

So at the end of the day, we might not need to modify anything in the recovered system ($TARGET_FS_ROOT/etc/lvm/lvm.conf); simply mounting /run might help. I guess it is at least worth trying ...

V.

jsmeix commented at 2019-02-18 09:30:

Yes,
if LVM can no longer work without lvmetad or certain things in /run
we should bind-mount /run at $TARGET_FS_ROOT/run.
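
A minimal sketch of such a bind mount (assuming $TARGET_FS_ROOT is already mounted, as it is during "rear recover"; the explicit umount is illustrative):

# make /run (and thus the lvmetad socket) visible inside the chroot
mkdir -p "$TARGET_FS_ROOT/run"
mount --bind /run "$TARGET_FS_ROOT/run"
# ... chrooted LVM / bootloader commands would run here ...
umount "$TARGET_FS_ROOT/run"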

My point was that for me (on SLES) LVM "just worked"
without lvmetad or certain things in /run - same as in
https://unix.stackexchange.com/questions/332556/arch-linux-installation-grub-problem
These warnings will not prevent the system from booting
so that these warnings are useless and can be silenced.

But the behaviour in issue https://github.com/rear/rear/issues/2035
is different because there things do not work and then
the LVM programs should error out (instead of endless waiting)
when they can no longer work without lvmetad or certain things in /run.

gozora commented at 2019-02-18 09:41:

@jsmeix
Things might "just work" for you because of an older/different version of LVM ;-).

But the behaviour in issue #2035
is different because there things do not work and then
the LVM programs should error out (instead of endless waiting)
when they can no longer work without lvmetad or certain things in /run.

I don't think there is some kind of endless waiting. There are just too many block devices that LVM would like to scan, and timing out after 10 seconds on each of them might just appear to be endless ...

@gdha, @jsmeix just for comparison, can you post here which versions of LVM you are using on your particular systems?

V.

jsmeix commented at 2019-02-18 11:27:

SLES15 and openSUSE Leap 15.0:

# lvm  version
  LVM version:     2.02.177(2) (2017-12-18)
  Library version: 1.03.01 (2017-12-18)
  Driver version:  4.37.0

SLES12-SP4

# lvm  version
  LVM version:     2.02.180(2) (2018-07-19)
  Library version: 1.03.01 (2018-07-19)
  Driver version:  4.37.0

SLES11-SP4

# lvm  version
  LVM version:     2.02.98(2) (2012-10-15)
  Library version: 1.03.01 (2011-10-15)
  Driver version:  4.25.0

gozora commented at 2019-02-18 11:29:

Wow, SLES12-SP4 runs a newer version of LVM than SLES15 ...

V.

jsmeix commented at 2019-02-18 11:32:

Seems so - I saw it and double-checked it - that is what "lvm version" outputs
(which does not take possible SUSE-specific patches into account)...

jsmeix commented at 2019-02-22 11:33:

With https://github.com/rear/rear/pull/2047 merged
/proc/ /sys/ /dev/ and /run/ are only bind-mounted into TARGET_FS_ROOT
at the beginning of the finalize stage during "rear recover",
so that this won't help with anything that happens during "rear recover"
before its finalize stage - in particular https://github.com/rear/rear/pull/2047
cannot help when the disk layout gets recreated during "rear recover".

gdha commented at 2019-03-27 13:11:

on Ubuntu 18.04:

RESCUE client:~ # lvm version
  LVM version:     2.02.176(2) (2017-11-03)
  Library version: 1.02.145 (2017-11-03)
  Driver version:  4.37.0
  Configuration:   ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --exec-prefix= --bindir=/bin --libdir=/lib/x86_64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-clvmd=corosync --with-cluster=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-cmirrord --enable-dmeventd --enable-dbus-service --enable-lvmetad --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-readline --enable-udev_rules --enable-udev_sync

And I can confirm that by setting the following in /etc/lvm/lvm.conf: use_lvmetad = 0, the problem disappears.
We could create a script /usr/share/rear/build/GNU/Linux/640_verify_lvm_conf.sh to modify this setting (see the sketch below).
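
A rough sketch of what such a build-stage script might look like, assuming the recovery system root is available as ROOTFS_DIR during the build stage (the exact sed expression is illustrative, not the final implementation):

# usr/share/rear/build/GNU/Linux/640_verify_lvm_conf.sh (sketch)
lvm_conf="$ROOTFS_DIR/etc/lvm/lvm.conf"
# nothing to do when the recovery system has no lvm.conf
if test -f "$lvm_conf" ; then
    # lvmetad does not run in the recovery system, so tell the LVM tools
    # to fall back to direct device scanning without warning about it
    sed -i -e 's/^\([[:space:]]*\)use_lvmetad[[:space:]]*=[[:space:]]*1/\1use_lvmetad = 0/' "$lvm_conf"
fi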

jsmeix commented at 2019-03-28 10:14:

@gdha
I would appreciate such a
usr/share/rear/build/GNU/Linux/640_verify_lvm_conf.sh
script.

On SLES10 and SLES11 lvm version does not show its Configuration:
in contrast to SLES12:

# lvm version
  LVM version:     2.02.180(2) (2018-07-19)
  Library version: 1.03.01 (2018-07-19)
  Driver version:  4.37.0
  Configuration:   ./configure --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --prefix=/ --libdir=/lib64 --with-usrlibdir=/usr/lib64 --with-usrsbindir=/usr/sbin --sbindir=/sbin --enable-dmeventd --enable-udev_sync --enable-udev_rules --enable-cmdlib --enable-applib --enable-dmeventd --enable-realtime --enable-pkgconfig --enable-selinux --with-clvmd=corosync --with-cluster=internal --datarootdir=/usr/share --with-default-locking-dir=/run/lock/lvm --enable-cmirrord --enable-lvmetad --with-default-pid-dir=/run --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-tmpfilesdir=/usr/lib/tmpfiles.d --with-thin=internal --with-device-gid=6 --with-device-mode=0640 --with-device-uid=0 --with-dmeventd-path=/sbin/dmeventd --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --with-udev-prefix=/usr/ --enable-blkid-wiping --enable-lvmpolld

and openSUSE Leap 15.0:

# lvm version
  LVM version:     2.02.177(2) (2017-12-18)
  Library version: 1.03.01 (2017-12-18)
  Driver version:  4.37.0
  Configuration:   ./configure --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --enable-dmeventd --enable-cmdlib --enable-udev_rules --enable-udev_sync --with-udev-prefix=/usr/ --enable-selinux --enable-pkgconfig --with-usrlibdir=/usr/lib64 --with-usrsbindir=/usr/sbin --with-default-dm-run-dir=/run --with-tmpfilesdir=/usr/lib/tmpfiles.d --with-thin=internal --with-device-gid=6 --with-device-mode=0640 --with-device-uid=0 --with-dmeventd-path=/usr/sbin/dmeventd --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-lvmetad --enable-lvmpolld --enable-realtime --with-cache=internal --with-default-locking-dir=/run/lock/lvm --with-default-pid-dir=/run --with-default-run-dir=/run/lvm

both of which contain "Configuration: ... --enable-lvmetad ...".

On SLES10 and SLES11 /etc/lvm/lvm.conf does not contain lvmetad
in contrast to SLES12 and openSUSE Leap 15.0, which contain:

# find /etc/lvm/ | xargs grep -i lvmetad
/etc/lvm/lvm.conf:      debug_classes = [ "memory", "devices", "activation", "allocation", "lvmetad", "metadata", "cache", "locking", "lvmpolld", "dbus" ]
...
/etc/lvm/lvm.conf:      use_lvmetad = 1

On SLES10 and SLES11 there is no lvmetad process running
in contrast to SLES12 and openSUSE Leap 15.0:

# ps auxw | grep lvmetad
root  ... /usr/sbin/lvmetad -f

viper1986 commented at 2019-04-03 10:26:

Hello,
I have a similar error during my "rear recover".
I perform a restore to another LPAR on PowerVM, OS SLES 12 PPC64LE.
I tried to modify /etc/lvm/lvm.conf and set use_lvmetad = 0 but it is not working.

lvmetad.socket.txt
use_lvmetad.txt

gdha commented at 2019-04-03 10:34:

@viper1986 the issue has nothing to do with the lvmetad warning - a duplicate UUID for a PV was detected:

+++ lvm pvcreate -ff --yes -v --uuid 2qVfcO-mTQp-NzPW-Xphj-ezrY-HMNy-OJN7Fi --norestorefile /dev/mapper/mpathp
  Found duplicate PV 7DGexv5mGb83xwSTdIaKDHzahPeHZ87u: using /dev/disk/by-id/dm-name-mpathp-part2 not /dev/disk/by-id/dm-name-mpathi-part2
  Using duplicate PV /dev/disk/by-id/dm-name-mpathp-part2 which is last seen, replacing /dev/disk/by-id/dm-name-mpathi-part2
  Device /dev/mapper/mpathp not found (or ignored by filtering). Please run with -vvv option for more details

viper1986 commented at 2019-04-03 11:29:

I deleted all VGs, LVs and PVs from the rescue system.
But I got:

Device /dev/mapper/mpathp not found (or ignored by filtering). Please run with -vvv option for more details

Running fdisk shows that some partitions still exist:

RESCUE dwrdev01:~ # fdisk /dev/mapper/mpathp
Welcome to fdisk (util-linux 2.28).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p
Disk /dev/mapper/mpathp: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 32768 bytes / 32768 bytes
Disklabel type: dos
Disk identifier: 0x5bc10d00

Device                   Boot  Start      End  Sectors  Size Id Type
/dev/mapper/mpathp-part1 *      4096   419839   415744  203M  6 FAT16
/dev/mapper/mpathp-part2      419848 41943039 41523192 19.8G 8e Linux LVM

Partition 2 does not start on physical sector boundary.

Command (m for help): d
Partition number (1,2, default 2): 1

Partition 1 has been deleted.

Command (m for help): d
Selected partition 2
Partition 2 has been deleted.

Command (m for help): p

Disk /dev/mapper/mpathp: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 32768 bytes / 32768 bytes
Disklabel type: dos
Disk identifier: 0x5bc10d00

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Invalid argument

The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).

RESCUE dwrdev01:~ # fdisk -l /dev/mapper/mpathp
Disk /dev/mapper/mpathp: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 32768 bytes / 32768 bytes
Disklabel type: dos
Disk identifier: 0x5bc10d00

Now recover works.

But my question is: why does ReaR not delete existing partitions on disks?

jsmeix commented at 2019-04-03 14:34:

@viper1986
according to
https://github.com/rear/rear/issues/2044#issuecomment-479434057
your issue has nothing to do with this issue here
so that you would need to report your issue as a new separate issue
via the [New issue] button at https://github.com/rear/rear/issues

When you restore to another LPAR on PowerVM
that other LPAR must have fully clean storage and be fully
compatible with the LPAR of your original system, cf. the sections
"Fully compatible replacement hardware is needed" and
"Prepare replacement hardware for disaster recovery" in
https://en.opensuse.org/SDB:Disaster_Recovery

Because of
https://github.com/rear/rear/issues/2044#issuecomment-479449884
it seems what was missing in this case was to
"Prepare replacement hardware for disaster recovery".

ReaR does delete existing partitions on disks via "parted mklabel",
but that only deletes the plain partitioning data and not any other
kind of remainder data on an already used disk.
Removing such remainders is what wipefs is supposed to do
when it is run in reverse ordering on each storage object
(listed by lsblk as KNAME), starting from higher-level storage objects
(like partitions) down to lower-level storage objects
(like whole disks) - except exceptions where only dd helps (see below).
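
A hedged sketch of such a reverse-ordered wipefs run (the disk device is illustrative; in practice the list of storage objects would come from the disk layout that gets recreated):

# lsblk lists a whole disk before its partitions,
# so reverse the ordering to wipe higher-level objects first
for kname in $( lsblk -n -o KNAME /dev/sda | tac ) ; do
    # remove filesystem, RAID and LVM signatures on each storage object
    wipefs --all --force "/dev/$kname"
done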

As long as we do not have a "cleanupdisk" script in ReaR,
cf. https://github.com/rear/rear/issues/799
you should clear out your target disk if it has ever been used before
to be in general on the safe side against unexpected weird issues
because of whatever kind of remainder data on an already used disk.

Cf.
https://github.com/rear/rear/issues/2019#issuecomment-476598723
and the subsequent comments therein.
The interesting result in that case was that the only reliably
working way was to completely zero out the replacement storage
via a "dumb brute force" command like

dd if=/dev/zero of=/dev/whole_disk

i.e. wipefs alone was not sufficient - and only deleting plain partitions
is in general not at all sufficient to remove any kind of remainder data
on an already used disk (for example remainders of RAID
or partition-table signatures and other kinds of "magic strings"
like LVM metadata and whatever else)...

jsmeix commented at 2019-04-05 12:46:

With https://github.com/rear/rear/pull/2107 merged
this issue here as described in
https://github.com/rear/rear/issues/2044#issue-411069484
should be fixed.


[Export of Github issue for rear/rear.]