#3085 Issue closed: write_protected_candidate_device called for '/sys/block/nvme0c0n1' but '/dev/nvme0c0n1' is no block device

Labels: enhancement, bug

hitrikrtek opened issue at 2023-11-17 08:12:

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.7 / 2022-07-13

  • If your ReaR version is not the current version, explain why you can't upgrade:
    Latest version with distribution

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):

NAME="SLES"
VERSION="15-SP5"
VERSION_ID="15.5"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP5"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp5"
  • ReaR configuration files ("cat /etc/rear/local.conf"):
BACKUP=NETFS
OUTPUT=ISO
BACKUP_URL=nfs://172.17.3.1/rear
BACKUP_OPTIONS=
NETFS_KEEP_OLD_BACKUP_COPY=yes
USE_DHCLIENT=
MODULES_LOAD=( )
#BACKUP_PROG_INCLUDE=("{BACKUP_PROG_INCLUDE[@]}" '/boot/grub2/i386-pc' '/boot/grub2/x86_64-efi' '/home' '/opt' '/root' '/srv' '/hana/shared' '/usr' '/var')
BACKUP_PROG_EXCLUDE=("{BACKUP_PROG_EXCLUDE[@]}" '/hana/backup' '/hana/data' '/hana/log' '/hana/shared/HSP/HDB05/backup' '/hana/shared/NSP/HDB50/backup' '/opt/dpsapps/agentsvc/dbs' '/opt/dpsapps/agentsvc/logs' '/tmp')
POST_RECOVERY_SCRIPT=('if snapper --no-dbus -r $TARGET_FS_ROOT get-config | grep -q "^QGROUP.*[0-9]/[0-9]" ; then snapper --no-dbus -r $TARGET_FS_ROOT set-config QGROUP= ; snapper --no-dbus -r $TARGET_FS_ROOT setup-quota && echo snapper setup-quota done || echo snapper setup-quota failed ; else echo snapper setup-quota not used ; fi')
REQUIRED_PROGS=(snapper chattr lsattr ${REQUIRED_PROGS[@]})
COPY_AS_IS=(/usr/lib/snapper/installation-helper /etc/snapper/config-templates/default ${COPY_AS_IS[@]})
EXCLUDE_MOUNTPOINTS=("/mnt")
ISO_MKISOFS_BIN=/usr/bin/ebiso
  • Hardware vendor/product (PC or PowerNV BareMetal or ARM) or VM (KVM guest or PowerVM LPAR):
    DELL PowerEdge R860

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    x86_64

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    BIOS Version 1.5.6
    grub2-x86_64-efi-2.06-150500.29.8.1

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    NVMe in RAID configuration

Backplane 1 on Connector 0 of RAID Controller in SL 1
DeviceDescription   Backplane 1 on Connector 0 of RAID Controller in SL 1
DeviceType  PCIeSSDBackPlane
FirmwareVersion 7.10
FQDD    Enclosure.Internal.0-1:RAID.SL.1-1
InstanceID  Enclosure.Internal.0-1:RAID.SL.1-1
MediaType   Solid State Drive
PCIExpressGeneration    Gen 4
ProductName BP_PSV 0:1
RollupStatus    OK
SlotCount   8
WiredOrder  1

Backplane 2 on Connector 0 of RAID Controller in SL 4
DeviceDescription   Backplane 2 on Connector 0 of RAID Controller in SL 4
DeviceType  PCIeSSDBackPlane
FirmwareVersion 7.10
FQDD    Enclosure.Internal.0-2:RAID.SL.4-1
InstanceID  Enclosure.Internal.0-2:RAID.SL.4-1
MediaType   Solid State Drive
PCIExpressGeneration    Gen 4
ProductName BP_PSV 0:2
RollupStatus    OK
SlotCount   8
WiredOrder  2
  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT"):
NAME                          KNAME          PKNAME         TRAN   TYPE FSTYPE      LABEL   SIZE MOUNTPOINT
/dev/sda                      /dev/sda                             disk LVM2_member        23.3T
|-/dev/mapper/vghana-lvusrsap /dev/dm-2      /dev/sda              lvm  xfs                 150G /usr/sap
|-/dev/mapper/vghana-lvdata   /dev/dm-3      /dev/sda              lvm  xfs                   6T /hana/data
|-/dev/mapper/vghana-lvshared /dev/dm-4      /dev/sda              lvm  xfs                   2T /hana/shared
|-/dev/mapper/vghana-lvlog    /dev/dm-5      /dev/sda              lvm  xfs                   4T /hana/log
'-/dev/mapper/vghana-lvbackup /dev/dm-6      /dev/sda              lvm  xfs                   2T /hana/backup
/dev/sdb                      /dev/sdb                      iscsi  disk                     100M
/dev/sdc                      /dev/sdc                      iscsi  disk                     100M
/dev/sdd                      /dev/sdd                      iscsi  disk                       1G
/dev/nvme0n1                  /dev/nvme0n1                  nvme   disk                   447.1G
|-/dev/nvme0n1p1              /dev/nvme0n1p1 /dev/nvme0n1   nvme   part vfat        ESP     250M /boot/efi
|-/dev/nvme0n1p2              /dev/nvme0n1p2 /dev/nvme0n1   nvme   part vfat        OS        2G /boot
'-/dev/nvme0n1p3              /dev/nvme0n1p3 /dev/nvme0n1   nvme   part LVM2_member       444.8G
  |-/dev/mapper/vgroot-root   /dev/dm-0      /dev/nvme0n1p3        lvm  btrfs             442.8G /var
  '-/dev/mapper/vgroot-swap   /dev/dm-1      /dev/nvme0n1p3        lvm  swap                  2G [SWAP]
  • Description of the issue (ideally so that others can reproduce it):

When running rear recover we get this error:
image

  • Workaround, if any:
    None

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):
    rear-srv-lx2013.log
    See attached debug log: rear-srv-lx2013.log

jsmeix commented at 2023-11-23 08:22:

@hitrikrtek
what is the output of each of the commands

# echo /sys/block/*

# ls -l /dev/nvme0c0n1

# ls -l /sys/block/nvme0c0n1

# ls -l $( readlink -e /sys/block/nvme0c0n1 )

?

Details:

The following excerpt from your
https://github.com/rear/rear/files/13389016/rear-srv-lx2013.log
shows how it fails:

++ for current_device_path in /sys/block/*
++ current_disk_name=nvme0c0n1
++ is_multipath_path nvme0c0n1
++ test nvme0c0n1
++ is_multipath_used
++ type multipath
++ return 1
++ return 1
++ test -d /sys/block/nvme0c0n1/queue
++ test 0 = 1
++ is_write_protected /sys/block/nvme0c0n1
+++ write_protected_candidate_device /sys/block/nvme0c0n1
+++ local device=/sys/block/nvme0c0n1
+++ [[ /sys/block/nvme0c0n1 == /sys/block/* ]]
++++ get_device_name /sys/block/nvme0c0n1
++++ local name=/sys/block/nvme0c0n1
++++ name=nvme0c0n1
++++ contains_visible_char nvme0c0n1
+++++ tr -d -c '[:graph:]'
++++ test nvme0c0n1
++++ [[ nvme0c0n1 =~ ^mapper/ ]]
++++ [[ -L /dev/nvme0c0n1 ]]
++++ [[ nvme0c0n1 =~ ^dm- ]]
++++ name=nvme0c0n1
++++ echo /dev/nvme0c0n1
++++ [[ -r /dev/nvme0c0n1 ]]
++++ return 1
+++ device=/dev/nvme0c0n1
+++ test -b /dev/nvme0c0n1
+++ BugError 'write_protected_candidate_device called for '\''/sys/block/nvme0c0n1'\'' but '\''/dev/nvme0c0n1'\'' is no block device'

This looks strange because it seems in your case
there is "nvme0c0n1" in /sys/block/ where

test -b /dev/nvme0c0n1

fails so it seems "nvme0c0n1" is a block device
(because "nvme0c0n1" is in /sys/block/) that
either has no matching /dev/nvme0c0n1 device node
or it has a matching /dev/nvme0c0n1 device node
but that /dev/nvme0c0n1 device node is no block device.

By "googling" for 'nvme0c0n1' I found
https://github.com/google/cadvisor/issues/3340

Not all /sys/block devices will have a "dev" file

which seems to explain the root cause.

For comparison:
On my laptop with a NVMe disk I have /sys/block/nvme0n1
and I have a matching device node /dev/nvme0n1
which is a block device.

The matching code is in
usr/share/rear/layout/prepare/default/250_compare_disks.sh
which is online for ReaR 2.7 starting at
https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/layout/prepare/default/250_compare_disks.sh#L94

The is_write_protected and write_protected_candidate_device
functions are in
usr/share/rear/lib/write-protect-functions.sh
which is online for ReaR 2.7
https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/lib/write-protect-functions.sh

So it seems we have to enhance the
write_protected_candidate_device function
to ignore it when it is called with a device
where no matching /dev/... device node exists.

hitrikrtek commented at 2023-11-23 08:28:

Hi @jsmeix
First of all thank you for your time looking into this.
I've ran the commands you requested and here's the output:

# echo /sys/block/*
/sys/block/dm-0 /sys/block/dm-1 /sys/block/dm-2 /sys/block/dm-3 /sys/block/dm-4 /sys/block/dm-5 /sys/block/dm-6 /sys/block/nvme0c0n1 /sys/block/nvme0n1 /sys/block/sda /sys/block/sdb /sys/block/sdc /sys/block/sdd

# ls -l /dev/nvme0c0n1
ls: cannot access '/dev/nvme0c0n1': No such file or directory

# ls -l /sys/block/nvme0c0n1
lrwxrwxrwx 1 root root 0 Nov 21 13:38 /sys/block/nvme0c0n1 -> ../devices/pci0000:00/0000:00:0a.0/0000:01:00.0/nvme/nvme0/nvme0c0n1

# ls -l $( readlink -e /sys/block/nvme0c0n1 )
total 0
-r--r--r-- 1 root root 4096 Nov 23 09:26 alignment_offset
-r--r--r-- 1 root root 4096 Nov 23 09:26 capability
lrwxrwxrwx 1 root root    0 Nov 21 13:38 device -> ../../nvme0
-r--r--r-- 1 root root 4096 Nov 23 09:26 discard_alignment
-r--r--r-- 1 root root 4096 Nov 23 09:26 diskseq
-r--r--r-- 1 root root 4096 Nov 23 09:26 eui
-r--r--r-- 1 root root 4096 Nov 23 09:26 events
-r--r--r-- 1 root root 4096 Nov 23 09:26 events_async
-rw-r--r-- 1 root root 4096 Nov 23 09:26 events_poll_msecs
-r--r--r-- 1 root root 4096 Nov 23 09:26 ext_range
-r--r--r-- 1 root root 4096 Nov 23 09:26 hidden
drwxr-xr-x 2 root root    0 Nov 23 09:26 holders
-r--r--r-- 1 root root 4096 Nov 23 09:26 inflight
drwxr-xr-x 2 root root    0 Nov 23 09:26 integrity
-rw-r--r-- 1 root root 4096 Nov 23 09:26 make-it-fail
drwxr-xr-x 5 root root    0 Nov 23 09:26 mq
-r--r--r-- 1 root root 4096 Nov 23 09:26 nsid
drwxr-xr-x 2 root root    0 Nov 23 09:26 power
drwxr-xr-x 2 root root    0 Nov 23 09:26 queue
-r--r--r-- 1 root root 4096 Nov 23 09:26 range
-r--r--r-- 1 root root 4096 Nov 23 09:26 removable
-r--r--r-- 1 root root 4096 Nov 23 09:26 ro
-r--r--r-- 1 root root 4096 Nov 23 09:26 size
drwxr-xr-x 2 root root    0 Nov 23 09:26 slaves
-r--r--r-- 1 root root 4096 Nov 23 09:26 stat
lrwxrwxrwx 1 root root    0 Nov 21 13:38 subsystem -> ../../../../../../../class/block
drwxr-xr-x 2 root root    0 Nov 23 09:26 trace
-rw-r--r-- 1 root root 4096 Nov 21 13:38 uevent
-r--r--r-- 1 root root 4096 Nov 23 09:26 wwid

jsmeix commented at 2023-11-23 08:31:

# echo /sys/block/*
... /sys/block/nvme0c0n1 /sys/block/nvme0n1 ...

# ls -l /dev/nvme0c0n1
ls: cannot access '/dev/nvme0c0n1': No such file or directory

proves that the root cause is that

Not all /sys/block devices will have a "dev" file

so we have to enhance the
write_protected_candidate_device function
to ignore it when it is called with a device
where no matching /dev/... device node exists.

jsmeix commented at 2023-11-23 10:08:

@hitrikrtek

I cannot reproduce your issue
because I don't have a system with /sys/block/nvme0c0n1
or something similar - i.e. where a /sys/block/device
does not have a matching /dev/device.

So as an offhanded untested workaround for now
you may try out how things behave when you replace in
usr/share/rear/lib/write-protect-functions.sh

test -b "$device" || BugError "write_protected_candidate_device called for '$1' but '$device' is no block device"

with

test -b "$device" || DebugPrint "write_protected_candidate_device called for '$1' but '$device' is no block device"

Perhaps with that you get other errors in other functions
in usr/share/rear/lib/write-protect-functions.sh

Alternatively (and preferred by me if you can) you could test
if my overhauled usr/share/rear/lib/write-protect-functions.sh
https://raw.githubusercontent.com/rear/rear/a403b5fe1c2c58420ba1b77db52283c041e4f7d4/usr/share/rear/lib/write-protect-functions.sh
in
https://github.com/rear/rear/pull/3091
makes things work well for your case.

hitrikrtek commented at 2023-11-23 10:16:

Alright! I've grabbed your updated function and am creating a backup now. Will report back the result.
Thank you

jsmeix commented at 2023-11-23 12:16:

@hitrikrtek
with

OUTPUT=ISO

you should get nothing automatically set for
WRITE_PROTECTED_IDS and/or WRITE_PROTECTED_FS_LABEL_PATTERNS
after "rear mkrescue/mkbackup" in your
/var/tmp/rear.XXX/rootfs/etc/rear/rescue.conf
When you also have
neither WRITE_PROTECTED_IDS
nor WRITE_PROTECTED_FS_LABEL_PATTERNS
specified in your etc/rear/local.conf file
the all is_write_protected function calls
would always result that nothing is write-protected
so the is_write_protected function call could be
simplified and made behave more robust against errors
when it checks if WRITE_PROTECTED_IDS
and WRITE_PROTECTED_FS_LABEL_PATTERNS are empty
and in this case directly return 1 (i.e. "not write-protected").

hitrikrtek commented at 2023-11-23 12:49:

Regarding the updated function - this seems to work now, we were able to do restore - but now we're experiencing some other issues after restore our network broke somehow... all disks and configs seem just fine, we're investigating but I don't think it's related to this device. I'll post my finding here, if we figure it out.

jsmeix commented at 2023-11-23 12:52:

@hitrikrtek
please do not post other things that do not belong to this issue
in this issue here but instead create new separated issues
for each separated issue that you have.
Otherwise all would get messed up.


[Export of Github issue for rear/rear.]