#2715 Issue closed: 150_wipe_disks.sh does not work with RAID devices

Labels: enhancement, fixed / solved / done, minor bug

jsmeix opened issue at 2021-11-19 14:10:

Current rear master code
with the changes in https://github.com/rear/rear/pull/2714
(but those changes are not related to layout/recreate/default/150_wipe_disks.sh).

I use the same original system and replacement system as in
https://github.com/rear/rear/pull/2714#issuecomment-970279152
i.e. both the original system and the replacement system are QEMU/KVM VMs
with two 12 GiB disks /dev/sda and /dev/sdb for the RAID1 array
and a 1 GiB disk /dev/sdc for swap.

Disk layout of the original system

# lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT,UUID
NAME                      KNAME        PKNAME       TRAN TYPE  FSTYPE             SIZE MOUNTPOINT UUID
/dev/sda                  /dev/sda                  ata  disk  linux_raid_member   12G            8d05eb84-2de8-31d1-dfed-54b2ad592118
`-/dev/md127              /dev/md127   /dev/sda          raid1                     12G            
  |-/dev/md127p1          /dev/md127p1 /dev/md127        part                      10M            
  `-/dev/md127p2          /dev/md127p2 /dev/md127        part  crypto_LUKS       11.9G            d0446c00-9e79-4872-abaa-2d464fd71c99
    `-/dev/mapper/cr_root /dev/dm-0    /dev/md127p2      crypt btrfs             11.9G /          85406026-0559-4b0d-8f67-ec19d3b556f5
/dev/sdb                  /dev/sdb                  ata  disk  linux_raid_member   12G            8d05eb84-2de8-31d1-dfed-54b2ad592118
`-/dev/md127              /dev/md127   /dev/sdb          raid1                     12G            
  |-/dev/md127p1          /dev/md127p1 /dev/md127        part                      10M            
  `-/dev/md127p2          /dev/md127p2 /dev/md127        part  crypto_LUKS       11.9G            d0446c00-9e79-4872-abaa-2d464fd71c99
    `-/dev/mapper/cr_root /dev/dm-0    /dev/md127p2      crypt btrfs             11.9G /          85406026-0559-4b0d-8f67-ec19d3b556f5
/dev/sdc                  /dev/sdc                  ata  disk                       1G            
`-/dev/sdc1               /dev/sdc1    /dev/sdc          part  swap              1023M [SWAP]     9c606f48-92cd-4f98-be22-0f8a75358bed

My etc/rear/local.conf

OUTPUT=ISO
BACKUP=NETFS
BACKUP_OPTIONS="nfsvers=3,nolock"
BACKUP_URL=nfs://192.168.122.1/nfs
REQUIRED_PROGS+=( snapper chattr )
PROGS+=( lsattr )
COPY_AS_IS+=( /usr/lib/snapper/installation-helper /etc/snapper/config-templates/default )
BACKUP_PROG_INCLUDE=( /boot/grub2/x86_64-efi /home /boot/grub2/i386-pc /root /srv /opt /tmp /usr/local /var )
POST_RECOVERY_SCRIPT=( 'if snapper --no-dbus -r $TARGET_FS_ROOT get-config | grep -q "^QGROUP.*[0-9]/[0-9]" ; then snapper --no-dbus -r $TARGET_FS_ROOT set-config QGROUP= ; snapper --no-dbus -r $TARGET_FS_ROOT setup-quota && echo snapper setup-quota done || echo snapper setup-quota failed ; else echo snapper setup-quota not used ; fi' )
SSH_ROOT_PASSWORD='rear'
USE_DHCLIENT="yes"
FIRMWARE_FILES=( 'no' )
MODULES=( 'loaded_modules' )
PROGRESS_MODE="plain"
PROGRESS_WAIT_SECONDS="3"
DISKS_TO_BE_WIPED=''
LUKS_CRYPTSETUP_OPTIONS+=" --force-password"
GRUB2_INSTALL_DEVICES="/dev/sda /dev/sdb"

I run "rear recover" on the replacement system
where another same "rear recover" had been run before
i.e. there already exists the same disk layout on the replacement system
as it is on the original system so those things should be wiped:

/dev/sda
`-/dev/md127
  |-/dev/md127p1
  `-/dev/md127p2
    `-/dev/mapper/cr_root /dev/dm-0
/dev/sdb
`-/dev/md127
  |-/dev/md127p1
  `-/dev/md127p2
    `-/dev/mapper/cr_root /dev/dm-0
/dev/sdc
`-/dev/sdc1

but only

/dev/sdc
`-/dev/sdc1

actually gets wiped.

During "rear recover" I get

Running 'layout/recreate' stage ======================
Enter the password to open LUKS device /dev/sda2 temporarily as luks-sda2 (or skip with [Ctrl]+[C])
Enter passphrase for /dev/sda2: 
LUKS device /dev/sda2 temporarily opened as luks-sda2
Enter the password to open LUKS device /dev/sdb2 temporarily as luks-sdb2 (or skip with [Ctrl]+[C])
Enter passphrase for /dev/sdb2: 
LUKS device /dev/sdb2 temporarily opened as luks-sdb2
Closed LUKS device /dev/sda2 (was temporarily opened as luks-sda2)
Closed LUKS device /dev/sdb2 (was temporarily opened as luks-sdb2)
Skip wiping /dev/md127 (no output for 'lsblk -nlpo KNAME /dev/md127' or failed)
Wiping child devices of /dev/sdc in reverse ordering: /dev/sdc1 /dev/sdc 
Wiped first 16777216 bytes of /dev/sdc1
Wiped last 16777216 bytes of /dev/sdc1
Wiped first 16777216 bytes of /dev/sdc
Wiped last 16777216 bytes of /dev/sdc
Start system layout restoration.
Disk '/dev/sdc': creating 'gpt' partition table
Disk '/dev/sdc': creating partition number 1 with name ''sdc1''
Creating software RAID /dev/md127
Disk '/dev/md127': creating 'gpt' partition table
Disk '/dev/md127': creating partition number 1 with name ''md127p1''
Disk '/dev/md127': creating partition number 2 with name ''md127p2''
Creating swap on /dev/sdc1
Creating LUKS volume cr_root on /dev/md127p2
Set the password for LUKS volume cr_root (for 'cryptsetup luksFormat' on /dev/md127p2):
Enter passphrase for /dev/md127p2: 
Enter the password for LUKS volume cr_root (for 'cryptsetup luksOpen' on /dev/md127p2):
Enter passphrase for /dev/md127p2: 
Creating filesystem of type btrfs with mount point / on /dev/mapper/cr_root.
Mounting filesystem /
Disk layout created.

The part where 150_wipe_disks.sh does not work is

Skip wiping /dev/md127 (no output for 'lsblk -nlpo KNAME /dev/md127' or failed)
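
That message comes from the child device enumeration in 150_wipe_disks.sh: 'lsblk -nlpo KNAME' lists the child device kernel names only for a block device node that actually exists, and /dev/md127 does not exist in the booted recovery system (see the lsblk output in the comments below). On a nonexistent device node lsblk fails with an error on stderr, something like (the exact wording may vary with the util-linux version):

# lsblk -nlpo KNAME /dev/md127
lsblk: /dev/md127: not a block device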

jsmeix commented at 2021-11-24 13:29:

In enforced migration mode things are clearer:

# export MIGRATION_MODE=yes

# rear -D recover
...
UserInput -I WIPE_DISKS_CONFIRMATION needed in /usr/share/rear/layout/recreate/default/120_confirm_wipedisk_disks.sh line 36
Disks to be overwritten: /dev/md127 /dev/sdc 
1) Confirm disks to be completely overwritten and continue 'rear recover'
2) Use Relax-and-Recover shell and return back to here
3) Abort 'rear recover'

so it tries to wipe the RAID device /dev/md127,
but it should actually wipe the RAID's component devices /dev/sda and /dev/sdb,
which are listed in disklayout.conf:

# grep '^raid ' var/lib/rear/layout/disklayout.conf
raid /dev/md127 level=raid1 raid-devices=2 devices=/dev/sda,/dev/sdb name=raid1sdab metadata=1.0 uuid=8d05eb84:2de831d1:dfed54b2:ad592118
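
So the RAID component devices are already known from the 'devices=...' value in disklayout.conf. A minimal bash sketch of extracting them from such a 'raid' line (only to illustrate the idea, not the actual rear code):

while read -r keyword raiddevice options ; do
    for option in $options ; do
        case "$option" in
            (devices=*)
                # e.g. devices=/dev/sda,/dev/sdb -> /dev/sda /dev/sdb
                components="${option#devices=}"
                echo "$raiddevice component devices: ${components//,/ }"
                ;;
        esac
    done
done < <( grep '^raid ' var/lib/rear/layout/disklayout.conf )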

jsmeix commented at 2021-11-25 13:56:

How the already used replacement system disks look
when the recovery system is booted:

# lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT,UUID
NAME        KNAME     PKNAME   TRAN TYPE FSTYPE             SIZE MOUNTPOINT UUID
/dev/sda    /dev/sda           ata  disk linux_raid_member   12G            8d05eb84-2de8-31d1-dfed-54b2ad592118
|-/dev/sda1 /dev/sda1 /dev/sda      part linux_raid_member   10M            8d05eb84-2de8-31d1-dfed-54b2ad592118
`-/dev/sda2 /dev/sda2 /dev/sda      part crypto_LUKS       11.9G            d0446c00-9e79-4872-abaa-2d464fd71c99
/dev/sdb    /dev/sdb           ata  disk linux_raid_member   12G            8d05eb84-2de8-31d1-dfed-54b2ad592118
|-/dev/sdb1 /dev/sdb1 /dev/sdb      part linux_raid_member   10M            8d05eb84-2de8-31d1-dfed-54b2ad592118
`-/dev/sdb2 /dev/sdb2 /dev/sdb      part crypto_LUKS       11.9G            d0446c00-9e79-4872-abaa-2d464fd71c99
/dev/sdc    /dev/sdc           ata  disk                      1G            
`-/dev/sdc1 /dev/sdc1 /dev/sdc      part swap              1023M            9c606f48-92cd-4f98-be22-0f8a75358bed

I.e. there is no /dev/md127 RAID device in the booted recovery system.
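
The array is apparently not auto-assembled in the recovery system. For comparison, it could be assembled there manually with the standard mdadm call (not something rear does at this point):

# mdadm --assemble --scan

but for wiping it is the component devices /dev/sda and /dev/sdb that are visible.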

jsmeix commented at 2021-11-25 14:12:

After "rear recover" the /dev/md127 RAID device is there

# lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT,UUID
NAME                      KNAME        PKNAME       TRAN TYPE  FSTYPE             SIZE MOUNTPOINT UUID
/dev/sda                  /dev/sda                  ata  disk  linux_raid_member   12G            8d05eb84-2de8-31d1-dfed-54b2ad592118
|-/dev/sda1               /dev/sda1    /dev/sda          part  linux_raid_member   10M            8d05eb84-2de8-31d1-dfed-54b2ad592118
|-/dev/sda2               /dev/sda2    /dev/sda          part  crypto_LUKS       11.9G            d0446c00-9e79-4872-abaa-2d464fd71c99
`-/dev/md127              /dev/md127   /dev/sda          raid1                     12G            
  |-/dev/md127p1          /dev/md127p1 /dev/md127        part                      10M            
  `-/dev/md127p2          /dev/md127p2 /dev/md127        part  crypto_LUKS       11.9G            d0446c00-9e79-4872-abaa-2d464fd71c99
    `-/dev/mapper/cr_root /dev/dm-0    /dev/md127p2      crypt btrfs             11.9G /mnt/local 85406026-0559-4b0d-8f67-ec19d3b556f5
/dev/sdb                  /dev/sdb                  ata  disk  linux_raid_member   12G            8d05eb84-2de8-31d1-dfed-54b2ad592118
|-/dev/sdb1               /dev/sdb1    /dev/sdb          part  linux_raid_member   10M            8d05eb84-2de8-31d1-dfed-54b2ad592118
|-/dev/sdb2               /dev/sdb2    /dev/sdb          part  crypto_LUKS       11.9G            d0446c00-9e79-4872-abaa-2d464fd71c99
`-/dev/md127              /dev/md127   /dev/sdb          raid1                     12G            
  |-/dev/md127p1          /dev/md127p1 /dev/md127        part                      10M            
  `-/dev/md127p2          /dev/md127p2 /dev/md127        part  crypto_LUKS       11.9G            d0446c00-9e79-4872-abaa-2d464fd71c99
    `-/dev/mapper/cr_root /dev/dm-0    /dev/md127p2      crypt btrfs             11.9G /mnt/local 85406026-0559-4b0d-8f67-ec19d3b556f5
/dev/sdc                  /dev/sdc                  ata  disk                       1G            
`-/dev/sdc1               /dev/sdc1    /dev/sdc          part  swap              1023M            9c606f48-92cd-4f98-be22-0f8a75358bed

and after reboot it even looks normal (i.e. the same as on the original system):

# lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT,UUID
NAME                      KNAME        PKNAME       TRAN TYPE  FSTYPE             SIZE MOUNTPOINT UUID
/dev/sda                  /dev/sda                  ata  disk  linux_raid_member   12G            8d05eb84-2de8-31d1-dfed-54b2ad592118
`-/dev/md127              /dev/md127   /dev/sda          raid1                     12G            
  |-/dev/md127p1          /dev/md127p1 /dev/md127        part                      10M            
  `-/dev/md127p2          /dev/md127p2 /dev/md127        part  crypto_LUKS       11.9G            d0446c00-9e79-4872-abaa-2d464fd71c99
    `-/dev/mapper/cr_root /dev/dm-0    /dev/md127p2      crypt btrfs             11.9G /          85406026-0559-4b0d-8f67-ec19d3b556f5
/dev/sdb                  /dev/sdb                  ata  disk  linux_raid_member   12G            8d05eb84-2de8-31d1-dfed-54b2ad592118
`-/dev/md127              /dev/md127   /dev/sdb          raid1                     12G            
  |-/dev/md127p1          /dev/md127p1 /dev/md127        part                      10M            
  `-/dev/md127p2          /dev/md127p2 /dev/md127        part  crypto_LUKS       11.9G            d0446c00-9e79-4872-abaa-2d464fd71c99
    `-/dev/mapper/cr_root /dev/dm-0    /dev/md127p2      crypt btrfs             11.9G /          85406026-0559-4b0d-8f67-ec19d3b556f5
/dev/sdc                  /dev/sdc                  ata  disk                       1G            
`-/dev/sdc1               /dev/sdc1    /dev/sdc          part  swap              1023M [SWAP]     9c606f48-92cd-4f98-be22-0f8a75358bed

jsmeix commented at 2021-11-25 14:27:

With the recent commits, in particular
https://github.com/rear/rear/pull/2714/commits/19bd86decadb88b785782adc922cd2a20a660e0c
the user can now specify DISKS_TO_BE_WIPED (i.e. final power to the user)
to mitigate this issue.

For me it works with

DISKS_TO_BE_WIPED="/dev/sda /dev/sdb"

in etc/rear/local.conf

# rear -D recover
...
Wiping child devices of /dev/sda in reverse ordering: /dev/sda2 /dev/sda1 /dev/sda 
Wiped first 16777216 bytes of /dev/sda2
Wiped last 16777216 bytes of /dev/sda2
Wiped first 10485760 bytes of /dev/sda1
Skip wiping at the end of /dev/sda1 (device size 10485760 not greater than the bytes that were wiped)
Wiped first 16777216 bytes of /dev/sda
Wiped last 16777216 bytes of /dev/sda
Wiping child devices of /dev/sdb in reverse ordering: /dev/sdb2 /dev/sdb1 /dev/sdb 
Wiped first 16777216 bytes of /dev/sdb2
Wiped last 16777216 bytes of /dev/sdb2
Wiped first 10485760 bytes of /dev/sdb1
Skip wiping at the end of /dev/sdb1 (device size 10485760 not greater than the bytes that were wiped)
Wiped first 16777216 bytes of /dev/sdb
Wiped last 16777216 bytes of /dev/sdb
Start system layout restoration.

For this test I intentionally did not specify /dev/sdc, and it was not wiped, as it should be.
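
The 16777216 bytes in those messages are the 16 MiB that get wiped at the beginning and at the end of each child device (for small devices like /dev/sda1 the wiped amount is capped at the device size, cf. "Wiped first 10485760 bytes"). Conceptually that corresponds to something like this dd sketch (only to illustrate the idea on an example device, not rear's actual implementation):

disk=/dev/sdb1
wipe_bytes=$(( 16 * 1024 * 1024 ))
disk_size=$( blockdev --getsize64 "$disk" )
# Wipe the first 16 MiB (on smaller devices dd stops at the device end):
dd if=/dev/zero of="$disk" bs=1M count=16
# Wipe the last 16 MiB only when the device is bigger than that
# (cf. the "Skip wiping at the end" messages above):
if test "$disk_size" -gt "$wipe_bytes" ; then
    dd if=/dev/zero of="$disk" bs=1M count=16 seek=$(( disk_size / 1024 / 1024 - 16 ))
fi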

jsmeix commented at 2021-12-01 16:26:

https://github.com/rear/rear/pull/2721
should fix the bug part of this issue,
because it skips disks to be wiped that do not exist
on the bare hardware or VM where the recovery system runs.
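
Such a skip essentially amounts to an existence test of each disk device node before it gets wiped, in bash something like (a sketch of the idea, not the exact code in the pull request):

for disk in $DISKS_TO_BE_WIPED ; do
    # Skip disks that do not exist as block devices in the recovery system:
    test -b "$disk" || continue
    echo "wiping $disk"
done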

jsmeix commented at 2021-12-03 14:32:

With the latest changes up to now in
https://github.com/rear/rear/pull/2721
wiping RAID array devices should work automatically,
at least for RAID1 arrays - the only RAID level I have tested so far.
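
To verify that a recreated array is complete after recovery one can use the usual generic tools (plain Linux/mdadm commands, nothing rear-specific):

# cat /proc/mdstat
# mdadm --detail /dev/md127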

jsmeix commented at 2021-12-07 15:25:

With https://github.com/rear/rear/pull/2721 merged
this issue should be sufficiently fixed - at least for now.


[Export of Github issue for rear/rear.]