#2996 PR merged: Add -f (force) option to wipefs command in 120_include_raid_code.sh

Labels: enhancement, discuss / RFC, fixed / solved / done

danboid opened issue at 2023-05-26 10:44:

Pull Request Details:
  • Type: Bug Fix

  • Impact: Normal

  • Reference to related issue (URL):

https://github.com/rear/rear/issues/2990

  • How was this pull request tested?

On a Ubuntu desktop machine using mdadm RAID 1.

  • Brief description of the changes in this pull request:

rear recover fails during disk layout recreation on my machine unless wipefs is called with the -f ( or --force) option.

rear may currently be a bit over-zealous with its use of wipefs when recreating md RAID arrays but if we are going to use wipefs then we should be consistent because wipefs uses the force option elsewhere in rear and recover failed to work at all for me without this extra -f.

schlomo commented at 2023-05-29 10:29:

Caution, CentOS 6 still has ancient wipefs that doesn't support --force which is why we have code like this in other places (e.g. in usr/share/rear/layout/prep-for-mount/GNU/Linux/131_include_filesystem_code.sh:

wipefs --all --force $device || wipefs --all $device || dd if=/dev/zero of=$device bs=512 count=1 || true

Also, in usr/share/rear/layout/save/GNU/Linux/230_filesystem_layout.sh I see that we include wipefs only optionally, so that apparently all the code should also handle the case that it actually doesn't exist.

Can you please adjust accordingly, at least with the retry in case --force doesn't work.

schlomo commented at 2023-05-29 10:31:

FYI, I found this out like this:

$ tools/run-in-docker -- 'wipefs --help 2>&1 | grep -E -- "--(all|force)"'
********** ubuntu:18.04                             **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/ubuntu-18-04
********** ubuntu:20.04                             **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/ubuntu-20-04
********** ubuntu:22.04                             **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/ubuntu-22-04
********** ubuntu:23.04                             **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/ubuntu-23-04
********** ubuntu:devel                             **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/ubuntu-devel
********** opensuse/leap:42                         **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/opensuse-leap-42
********** opensuse/leap:15                         **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/opensuse-leap-15
********** registry.suse.com/suse/sle15             **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/registry-suse-com-suse-sle15
********** centos:6                                 **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
********** Copying dist to dist-all/centos-6
********** centos:7                                 **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/centos-7
********** centos:8                                 **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/centos-8
********** sl:7                                     **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/sl-7
********** quay.io/centos/centos:stream8            **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/quay-io-centos-centos-stream8
********** quay.io/centos/centos:stream9            **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/quay-io-centos-centos-stream9
********** fedora:29                                **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/fedora-29
********** fedora:31                                **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/fedora-31
********** fedora:34                                **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/fedora-34
********** fedora:37                                **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/fedora-37
********** fedora:38                                **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/fedora-38
********** fedora:rawhide                           **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/fedora-rawhide
********** archlinux                                **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/archlinux
********** manjarolinux/base                        **********
 -a, --all           wipe all magic strings (BE CAREFUL!)
 -f, --force         force erasure
********** Copying dist to dist-all/manjarolinux-base
** SCRIPT RUN TIME 15 SECONDS **

And here I can see that all distros 1) have wipefs and 2) support --force except CentOS 6

danboid commented at 2023-05-31 21:34:

I've added a fallback wipefs command without the -f for CentOS 6.

jsmeix commented at 2023-06-01 06:10:

@pcahyna
could you please have a look here because
I need your opinion because in
https://github.com/rear/rear/issues/2990#issuecomment-1561154428
you wrote

to me it seems that wiping a filesystem that is currently mounted
is a bad idea

and at least to me it seemed in
https://github.com/rear/rear/issues/2990#issuecomment-1561239894
@danboid
agreed with you.

But this pull request would contrdict that
so I am wondering what the common opinion is
(if there is a common opinion here)
whether or not calling 'wipefs -af' is a good or bad
at the specific places where it is called in ReaR.

I think
https://github.com/rear/rear/issues/2990#issuecomment-1562481526
shows that the wipefs man page is wrong, cf.
https://github.com/rear/rear/issues/2990#issuecomment-1560470135
because --force is not only meant when the filesystem is mounted
but also in other cases like

wipefs: /dev/sda1: ignoring nested "dos" partition table on non-whole disk device
wipefs: Use the --force option to force erase.

I think that during "rear recover" those block devices
where wipefs is called in ReaR will get overwritten anyway
(e.g. by creating new filesystems from scratch on them)
so at the code places where wipefs is called it does not matter
to worry about whether or not those block devices are mounted
or otherwise "in use".

In this specific case it may even happen that a block device
gets automatically mounted (or otherwise get "in use")
by whatever lower level stuff (e.g. udev or whatever else)
because of whatever kind of remainder data on an already used disk
(for example remainders of RAID or partition-table signatures and
other kind of "magic strings" like LVM metadata and whatever else).

So calling 'wipefs' with '--force' helps to let "rear recover"
succeed on block devices that are not completely zeroed out, cf.
"Prepare replacement hardware for disaster recovery" in
https://en.opensuse.org/SDB:Disaster_Recovery

It is a separated issue that during "rear recover"
ReaR should care about which block devices get overwritten
which is what WRITE_PROTECTED_IDS is meant for.

pcahyna commented at 2023-06-01 13:24:

@jsmeix , looking ...

pcahyna commented at 2023-06-01 13:41:

I think
#2990 (comment)
shows that the wipefs man page is wrong, cf.
#2990 (comment)
because --force is not only meant when the filesystem is mounted

This seems to be a valid reason to use --force.

I think that during "rear recover" those block devices
where wipefs is called in ReaR will get overwritten anyway
(e.g. by creating new filesystems from scratch on them)
so at the code places where wipefs is called it does not matter
to worry about whether or not those block devices are mounted
or otherwise "in use".

I would worry about this, because we are effectively pulling the rug out from under the rescue system kernel and things will be in an inconsistent state. What will happen if we overwrite a RAID partition that is assembled into an array, for example - won't the kernel try to resynchronize data from the other copy, or detect some inconsistency or abort?

In this specific case it may even happen that a block device
gets automatically mounted (or otherwise get "in use")
by whatever lower level stuff (e.g. udev or whatever else)
because of whatever kind of remainder data on an already used disk
(for example remainders of RAID or partition-table signatures and
other kind of "magic strings" like LVM metadata and whatever else).

But then we should make sure that the stuff stops "being in use" before manipulating it. For example, I see lots of vgchange -an and umount -f before pvremove and dd in the procedure described in README.wipe_disks, and the goal is exactly that. Maybe such calls need to be added to the disk wiping code?

I am curious what is the underlying problem here. I see that the partitions were reported to be busy - @danboid what are they used by? To check whether there are active RAID arrays, one can use

mdadm --detail --scan --verbose

-- can you please try that before running ReaR ?

pcahyna commented at 2023-06-01 13:43:

In general I am not against adding --force there - it is already in another place and as mentioned, there are other reasons besides device in use that may require using it (tha. I am afraid though that we may be hiding another problem in this particular case.

schlomo commented at 2023-06-02 08:38:

Given that the main goal of rear recover is to recreate the system from scratch, our implementation is in places rather "brute force", like if it refuses to cooperate we just try hitting it harder.

I see two somewhat separate issues here:

  1. if we use wipefs -a then better to use wipefs -af if supported
  2. instead of "blindly" running wipefs we might want to try to see if it is a good time to do so, e.g. if the resources to be wiped are actually in a good state for wiping.

This PR here is about problem 1) and therefore I'd like to merge it (and it helps a user with their problem). Problem 2) will remain as long as we don't change the current brute force approach, we can try to add more "disable mdadm/LVM/..." calls before but this would also be just more brute force attempts to get the system into a clean state. The "proper" solution of analysing the state and taking appropriate action is probably too difficult to implement.

As a DR tool ReaR as one additional advantage: If it doesn't work the user can always reboot and try again and maybe then it will work, especially if the first attempt managed to break existing mdadm/LVM/... structures so that the second attempt can then successfully wipe the disks and recreate the system.

Therefore, let's merge this and move on.

pcahyna commented at 2023-06-02 08:56:

anyway, let's continue this discussion in issue #2990


[Export of Github issue for rear/rear.]