#1327 Issue closed: Stalled recovery due to mkfs interactive prompt¶
Labels: enhancement, bug, cleanup, fixed / solved / done
deacomm opened issue at 2017-04-24 12:51:¶
During a restore, depending on what condition the target disk is in, wipefs will sometimes not wipe the existing filesystem from the block device, and the subsequent mkfs then stalls the whole process by interactively asking for confirmation:
2017-04-20 14:36:14 Creating filesystem of type ext3 with mount point /boot on /dev/sda1.
+++ Print 'Creating filesystem of type ext3 with mount point /boot on /dev/sda1.'
+++ test 1
+++ echo -e 'Creating filesystem of type ext3 with mount point /boot on /dev/sda1.'
+++ wipefs -a /dev/sda1
wipefs: /dev/sda1: ignoring nested "dos" partition table on non-whole disk device
wipefs: Use the --force option to force erase.
+++ mkfs -t ext3 -b 4096 -i 16384 -U 5dc25119-fc7c-4d93-93fb-2b26a6916036 /dev/sda1
mke2fs 1.42.11 (09-Jul-2014)
Found a dos partition table in /dev/sda1
Proceed anyway? (y,n)
This prompt is visible in the log file, but not on the system's console; there is just an empty line with a blinking cursor. After I hit 'y' on the console, the restore process continues.
This was on SLES12 SP1, with the rear 2.00 package from OBS:
Name : rear Relocations: (not relocatable)
Version : 2.00 Vendor: obs://build.opensuse.org/Archiving
Release : 1 Build Date: Fri Jan 6 12:24:20 2017
Install Date: (not installed) Build Host: lamb10
Group : Applications/File Source RPM: rear-2.00-1.src.rpm
Size : 1184748 License: GPLv3
Signature : DSA/SHA1, Fri Jan 6 12:24:31 2017, Key ID 6b7485db725a0c43
schlomo commented at 2017-04-24 13:10:¶
We could add --force to wipefs. Just need to find out if the older
distros support that parameter.
We already use --force with many other tools, so this would be a
good thing for wipefs too.
gozora commented at 2017-04-24 13:15:¶
I'm afraid that will not be universal enough :-(
On my Ubuntu Trusty:
# wipefs --force
wipefs: unrecognized option '--force'
Maybe --force is available only on newer versions?
V.
schlomo commented at 2017-04-24 13:16:¶
Or the traditional wipefs .... <<<y :-)
gozora commented at 2017-04-24 13:18:¶
You just don't like pipes, do you? :-)
jsmeix commented at 2017-04-24 13:21:¶
In general FYI:
The whole current 'wipefs' implementation is likely insufficient,
see
https://github.com/rear/rear/issues/799
schlomo commented at 2017-04-24 13:21:¶
I find y | wipefs ... indeed less readable. And it spawns another
shell for literally nothing. I prefer to use Bash with all that it can
do for us.
jsmeix commented at 2017-04-24 13:24:¶
@schlomo @gozora
I think the issue is not about that wipefs needs user input,
the issue is that mkfs stops and needs user input
because wipefs did not fully clean up the disk before.
jsmeix commented at 2017-04-24 13:29:¶
@deacomm FYI:
wipefs does not wipe whole existing filesystems.
All that wipefs does is wipe some well-known areas
of partitions (like /dev/sda2) or disks (like /dev/sda).
Depending on what condition the target disk is in,
wipefs may fail to wipe this or that particular remainder.
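As a minimal sketch of that behaviour (the device is only an example):
# report the known signature areas (offset, type) without erasing anything:
wipefs /dev/sda1
# erase exactly those signature areas but nothing else on the device:
wipefs --all /dev/sda1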
jsmeix commented at 2017-04-24 13:31:¶
Regarding
"prompt is visible in the log file, but not on system's console"
see
https://github.com/rear/rear/issues/885
which is about the other way round.
jsmeix commented at 2017-04-24 13:42:¶
I know this mke2fs behaviour
Found a dos partition table in /dev/sda1
Proceed anyway? (y,n)
very well from my experiments with
"Generic system installation with the plain SUSE installation system"
cf.
https://en.opensuse.org/SDB:Disaster_Recovery
I do not remember well but I think something like
wipefs -a -f /dev/sda1
wipefs -a -f /dev/sda
helped.
@gozora
regarding whether or not wipefs supports '-f'
remember how we implemented things like 'mkfs -U' in
layout/prepare/GNU/Linux/130_include_filesystem_code.sh
just try the preferred modern way and if that fails fall back
to the traditional way:
wipefs -a -f /dev/sda1 || wipefs -a /dev/sda1
wipefs -a -f /dev/sda || wipefs -a /dev/sda
On my SLES12 system:
# man wipefs
...
-f, --force
       Force erasure, even if the filesystem is mounted. This is required in order to erase a partition-table signature on a block device.
versus on my SLES11 system, where "man wipefs"
does not mention '-f, --force'.
deacomm commented at 2017-04-24 13:43:¶
@jsmeix Either wipefs has to reliably prepare the ground for the subsequent mkfs, or the script calling mkfs has to account for the possibility of wipefs not doing its job, and act accordingly. As it is now, between the wipefs and mkfs calls the system is in an indeterminate state.
One partial workaround would be to print the mkfs output to the console, so that the user at least knows why the recovery is stalled, and knows that they can just press 'y' to continue.
jsmeix commented at 2017-04-24 13:49:¶
Only FYI
here are some excerpts from my newest generic installation script
(my quick and dirty hack to install a system with LVM)
where I use wipefs as planned
in
https://github.com/rear/rear/issues/799
plus proper waiting for block device nodes according
to
https://github.com/rear/rear/issues/791
# Partitioning:
harddisk_disklabel="msdos"
harddisk_devices="/dev/sda /dev/sdb /dev/sdc"
partition1_begin_percentage="0"
partition1_end_percentage="40"
partition2_begin_percentage="$partition1_end_percentage"
partition2_end_percentage="100"
# LVM:
lvm_PVs="/dev/sda1 /dev/sda2 /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2"
lvm_VG="system"
lvm_LV_swap_size="2g"
lvm_LV_root_size="4g"
lvm_LV_home_size="3g"
# Filesystems:
LV_swap_prepare_command="mkswap -f"
LV_filesystem="ext4"
LV_make_filesystem_command="mkfs.$LV_filesystem -F"
...
# First of all clean up possibly already existing partitions:
for partition in $lvm_PVs
do test -b $partition && wipefs -a -f $partition || true
done
for harddisk_device in $harddisk_devices
do test -b $harddisk_device && wipefs -a -f $harddisk_device || true
done
# Make partitions on the harddisk devices:
for harddisk_device in $harddisk_devices
do # Wait until the harddisk device node exists:
   until test -b $harddisk_device
   do echo "Waiting until $harddisk_device exists (retrying in 1 second)"
      sleep 1
   done
   # Erase filesystem, raid or partition-table signatures (magic strings) to clean up a used disk before making filesystems:
   wipefs -a -f $harddisk_device
   # Create new disk label. The new disk label will have no partitions:
   parted -s $harddisk_device mklabel $harddisk_disklabel
   # Make partitions on a harddisk device:
   # Use hardcoded parted fs-type "ext2" as dummy for now regardless what filesystem will be actually created there later:
   parted -s --align=optimal $harddisk_device unit % mkpart primary ext2 $partition1_begin_percentage $partition1_end_percentage
   parted -s $harddisk_device set 1 lvm on type 0x8e
   parted -s --align=optimal $harddisk_device unit % mkpart primary ext2 $partition2_begin_percentage $partition2_end_percentage
   parted -s $harddisk_device set 2 lvm on type 0x8e
done
# Enable boot flag on /dev/sda1:
parted -s /dev/sda set 1 boot on
# Report what is actually set up by parted:
for harddisk_device in $harddisk_devices
do parted -s $harddisk_device unit GiB print
done
# Wait until all partition device nodes exist:
for partition_device in $lvm_PVs
do until test -b $partition_device
   do echo "Waiting until $partition_device exists (retrying in 1 second)"
      sleep 1
   done
done
# LVM setup:
# LVM metadata cannot be backed up (to /etc/lvm) because /etc/lvm is read-only in the SUSE installation system
# and any backups that are stored inside the SUSE installation system are useless
# so that automated metadata backup is disabled with '--autobackup n'.
# Do not use lvmetad in the SUSE installation system to avoid many confusing warning messages from LVM tools.
# Because /etc/lvm is read-only it is moved away and a writable copy is created:
mv /etc/lvm /etc/lvm.orig
mkdir /etc/lvm
set +f
cp -a /etc/lvm.orig/* /etc/lvm
set -f
grep 'use_lvmetad = 1' /etc/lvm/lvm.conf && sed -i -e 's/use_lvmetad = 1/use_lvmetad = 0/' /etc/lvm/lvm.conf
# Initialize all partitions for use by LVM as PVs:
pvcreate --verbose --yes -ff $lvm_PVs
# Create a single volume group of all PVs (i.e. of all partitions):
vgcreate --verbose --yes --autobackup n $lvm_VG $lvm_PVs
# Create logical volumes in the existing volume group:
lvcreate --verbose --yes --autobackup n --size $lvm_LV_swap_size --name swap $lvm_VG
swap_LV_device="/dev/$lvm_VG/swap"
lvcreate --verbose --yes --autobackup n --size $lvm_LV_root_size --name root $lvm_VG
root_LV_device="/dev/$lvm_VG/root"
lvcreate --verbose --yes --autobackup n --size $lvm_LV_home_size --name home $lvm_VG
home_LV_device="/dev/$lvm_VG/home"
# Set up swap:
# Wait until the swap LV device node exists:
until test -b $swap_LV_device
do echo "Waiting until $swap_LV_device exists (retrying in 1 second)"
   sleep 1
done
# Erase filesystem, raid or partition-table signatures (magic strings) to clean up a used disk before making filesystems:
wipefs -a -f $swap_LV_device
# Prepare swap LV:
$LV_swap_prepare_command $swap_LV_device
# Use the swap LV:
swapon --fixpgsz $swap_LV_device
# Set up root filesystem:
# Wait until the root LV device node exists:
until test -b $root_LV_device
do echo "Waiting until $root_LV_device exists (retrying in 1 second)"
   sleep 1
done
# Erase filesystem, raid or partition-table signatures (magic strings) to clean up a used disk before making filesystems:
wipefs -a -f $root_LV_device
# Make root LV filesystem:
$LV_make_filesystem_command $root_LV_device
# Set up home filesystem:
# Wait until the home LV device node exists:
until test -b $home_LV_device
do echo "Waiting until $home_LV_device exists (retrying in 1 second)"
   sleep 1
done
# Erase filesystem, raid or partition-table signatures (magic strings) to clean up a used disk before making filesystems:
wipefs -a -f $home_LV_device
# Make home LV filesystem:
$LV_make_filesystem_command $home_LV_device
# Probably useless but to be on the safe side wait until udev has finished:
udevadm settle --timeout=20
jsmeix commented at 2017-04-24 13:56:¶
I fixed my
https://github.com/rear/rear/issues/1327#issuecomment-296672094
On an already used disk with partitions on it
one must first 'wipefs' each partition block device
and finally one can 'wipefs' the whole disk, cf.
https://github.com/rear/rear/issues/799#issue-141001306
The other way round, after a 'wipefs' of the whole disk
there are probably no longer any partition block devices,
so one cannot 'wipefs' them, which may still leave
this or that unwanted remainder on the disk that may
re-appear after the same partitions have been recreated.
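I.e. for a disk like /dev/sda with two partitions the working order is (sketch):
wipefs -a -f /dev/sda1
wipefs -a -f /dev/sda2
wipefs -a -f /dev/sda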
jsmeix commented at 2017-04-24 14:03:¶
@deacomm
I think on SLE12 the easiest way is to just adapt wipefs_command
or perhaps even simply run
yes | rear ...
gdha commented at 2017-04-24 14:26:¶
@jsmeix is it ok we assign this to you as SME?
jsmeix commented at 2017-04-24 14:43:¶
One cannot run
yes | rear recover
because the recovery system does not have 'yes'
but one can run
echo -e 'y\ny\ny\ny\ny\ny\n' | rear recover
;-)
jsmeix commented at 2017-04-24 14:50:¶
@deacomm
when you replace in
usr/share/rear/layout/prepare/GNU/Linux/130_include_filesystem_code.sh
wipefs_command="wipefs -a $device"
with
wipefs_command="wipefs -a -f $device || wipefs -a $device"
does it then also work for you?
(For me it works on SLE12.)
schlomo commented at 2017-04-24 18:06:¶
Careful. Trying one and then the other might also fail for other reasons. Then it is better to auto-detect, e.g. like this:
wipefs_command="wipefs --all $(wipefs --help 2>&1 | grep --quiet force && echo --force) $device"
(I prefer to spell out arguments for better readability)
jsmeix commented at 2017-04-25 09:51:¶
@schlomo
I do not understand what could go wrong with things like
try_something || try_something_else || do_fallback
Could you explain and provide an example
why this kind of code is bad?
Of course in
try_something || try_something_else || do_fallback
each one could fail for its own specific and different reasons,
but compared to what we do currently, which is only a plain
try_something
without any failure or error handling,
try_something || try_something_else || do_fallback
should be an improvement.
deacomm commented at 2017-04-25 11:38:¶
Schlomo's wipefs_command works, wipefs is always called with --all --force, and the recovery process finishes.
Thinking about it, I prefer the simplicity of try_something || try_something_else, especially since you can't tell what the wipefs --help text will say in the future. Naively checking for the presence of the string "force" might not be the safest, although it is very improbable to cause issues in this particular case.
It's a matter of taste, really.
schlomo commented at 2017-04-25 12:28:¶
@jsmeix:
- command1 || command2 || failed will always fall through to command2 if anything goes wrong with command1. You won't know why it failed and the log file will be spammed with error messages that we consciously ignore. We cannot hide the STDERR of the first wipe because we might need its errors in case it fails for another reason (not the missing support for force).
- How long does command1 take? If the first variant takes a long time to fail then the user has to wait a long time for nothing. Of course in this case of wipefs this is less likely.
- Being more explicit is always better for people who read the code later. Failing the wipefs command because of missing parameter support and then trying without the problematic parameter means that we don't explicitly test for the --force support but implicitly fail if it is not there (or anything else goes wrong).
@deacomm IMHO checking the help output for whether --force
is mentioned is a very safe interface to decide that we can use the --force
option. The only case where this would fail is if the developers of wipefs
remove that option and leave a text like "--force is no longer supported".
Also that I find extremely unlikely for wipefs.
BTW, several years ago we had a similar discussion in relation to the
different syslinux
versions found on different distros and there in
the end @dagwieers introduced the concept of feature flags (see
set_syslinux_features()).
The rationale there was also to favor explicit checks over implicit
failing.
jsmeix commented at 2017-04-25 13:01:¶
Anyone who likes may do an appropriate GitHub pull request.
jsmeix commented at 2017-04-26 08:10:¶
I hit the wrong button. This issue should stay open
until an appropriate GitHub pull request has been merged.
jsmeix commented at 2017-04-28 10:26:¶
@schlomo because of your
https://github.com/rear/rear/issues/700#issuecomment-297960017
I like to mention that
wipefs_command="wipefs --all $(wipefs --help 2>&1 | grep --quiet force && echo --force) $device"
would let ReaR fail on SLE11 if 'set -eu' was set in
layout/prepare/GNU/Linux/130_include_filesystem_code.sh
What happens on my SLE11 system
(where wipefs does not support '--force'):
# ( set -e ; device=/dev/sdXn ; wipefs_command="wipefs --all $(wipefs --help 2>&1 | grep --quiet force && echo --force) $device" ; echo $wipefs_command )
[no output]
This is because here 'grep' returns a non-zero exit code.
An additional side note:
wipefs_command="wipefs --all $(wipefs --help 2>&1 | grep --quiet force && echo --force) $device"
would call 'wipefs --all --force' when any kind of 'force' (sub)string
appears in 'wipefs --help' (not necessarily the word '--force').
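A sketch of how both points could be avoided (only illustrative; the wipefs_force_option variable name is made up):
# be 'set -e' safe and test for the literal option string '--force':
wipefs_force_option=""
wipefs --help 2>&1 | grep -q -- '--force' && wipefs_force_option="--force" || true
wipefs_command="wipefs --all $wipefs_force_option $device"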
jsmeix commented at 2017-04-28 14:10:¶
In
https://github.com/rear/rear/pull/1336
I implemented basically my above proposal in
https://github.com/rear/rear/issues/1327#issuecomment-296672094
and additionally now dd is used as generic fallback in any case,
see my comments in the new code.
It seems to work o.k. for me on SLE12 with and without wipefs.
Further enhancements welcome from whoever likes
to maintain his enhancements also in the future ;-)
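Only to illustrate the general idea of such a dd fallback (this is not the actual code from the pull request):
# last-resort cleanup when wipefs is not available or fails,
# e.g. zero out the first MiBs of the block device:
dd if=/dev/zero of=$device bs=1M count=16 || true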
jsmeix commented at 2017-04-28 14:45:¶
With
https://github.com/rear/rear/pull/1336
merged
this issue should no longer happen in practice,
even though
https://github.com/rear/rear/pull/1336
is not yet the ultimate solution, cf.
https://github.com/rear/rear/pull/1336#issuecomment-298011438
jsmeix commented at 2017-05-04 08:15:¶
With
https://github.com/rear/rear/pull/1341
merged
the wipefs code is now cleaned up as described in
https://github.com/rear/rear/pull/1336#issuecomment-298016104
It is still not yet the ultimate solution, i.e.
https://github.com/rear/rear/pull/1336#issuecomment-298011438
still applies.
schlomo commented at 2017-06-19 09:05:¶
I just realized that the title was somewhat misleading. We started from
mkfs
needing a confirmation and ended up with a better implementation
of wipefs
which removes most mkfs
confirmation requests.
However, mkfs
will still ask for confirmation on whole disks without
partition tables and this fix does not solve that use case.
Do you see any problem with adding the appropriate force option to
the various mkfs
calls?
jsmeix commented at 2017-06-19 11:28:¶
In general I think that "rear recover" should enforce
recreating filesystems, because when the admin
runs "rear recover" he knows and wants to get
all existing stuff on that machine replaced.
FYI some details:
Currently we use '-f' in case of xfs and btrfs cf.
layout/prepare/GNU/Linux/130_include_filesystem_code.sh
But as far as I see we do not use '-f' for reiserfs.
For ext2/ext3/ext4 we do not use '-F' as far as I see.
For vfat it seems there is no force option
according to "man mkfs.vfat" (except '-I', which also
seems to be for whole disks without a partition table).
For the fallback case we cannot use a force option
because the various mkfs.filesystem commands use
different option characters (like '-f' versus '-F')
or do not provide a force option at all.
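I.e. a per-filesystem mapping would be needed, roughly like this sketch (only illustrative, not the actual ReaR code):
case $fstype in
    (xfs|btrfs)
        # xfs and btrfs use '-f' to force filesystem creation:
        mkfs_force_option="-f" ;;
    (ext2|ext3|ext4)
        # the ext family uses '-F' instead:
        mkfs_force_option="-F" ;;
    (*)
        # e.g. vfat and the generic fallback provide no common force option:
        mkfs_force_option="" ;;
esac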
deacomm commented at 2017-06-26 10:09:¶
FWIW, the title was correct. The interactive prompt came from mkfs. Wipefs was involved because it failed to wipe the previous filesystem signature before the mkfs call. I'm not sure whether wipefs even has an interactive prompt of any kind.
This is just splitting hairs, of course. :)
jsmeix commented at 2017-06-26 11:29:¶
@deacomm
thanks for mentioning it.
I think it is not just splitting hairs to have a correct title,
because otherwise things are misleading and might
get fixed in the wrong way.
Actually this issue is two issues:
The one that I noticed matches your
https://github.com/rear/rear/issues/1327#issuecomment-311018185
but that alone is not sufficient as @schlomo found out in his
https://github.com/rear/rear/issues/1327#issuecomment-309381866
so that the full fix for issues like that is:
Better clean up disks to avoid most mkfs confirmation requests
(that was what I did)
plus
call mkfs with a 'force' option as far as possible for each
particular filesystem (what @schlomo does).
[Export of Github issue for rear/rear.]