#540 Issue closed: Implement a generic "cleanupdisk" function.

Labels: enhancement, bug, fixed / solved / done

jsmeix opened issue at 2015-01-27 12:55:

I think rear needs a generic "cleanupdisk" function that basically makes an already used hard disk behave as if it were a new hard disk.

When recovery is done on an already used hard disk, the still existing old data on the disk causes various kinds of unexpected weird failures, where each kind is difficult to reproduce (because it depends on what exact old data there is).

For some examples see https://github.com/rear/rear/issues/533 (therein the issue that on RHEL6 mdadm interferes with parted on a used disk) and https://github.com/rear/rear/issues/415 ("mkfs -t btrfs" needs the option "-f" to enforce making a btrfs on a disk where there is already a btrfs).

Implementing it might become complicated in practice. (I have new hard disks in mind that might come ex factory with a somewhat hidden special partition which must not be wiped, or something like this - isn't there something like this for UEFI boot? See http://en.wikipedia.org/wiki/EFI_System_partition .)

A starting point could be "wipefs", a tool that wipes filesystem signatures from a device, see
http://karelzak.blogspot.de/2009/11/wipefs8.html
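
As far as I know its basic usage is (only from a quick look, /dev/sdX is a placeholder):

wipefs /dev/sdX       # list the filesystem/RAID signatures found on the device
wipefs -a /dev/sdX    # erase all of them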

Regardless of how complicated it is in practice, one dedicated function that cleans up the disk before anything else is done makes it much clearer how rear works (instead of various workarounds here and there as needed).

tbsky commented at 2015-01-28 03:08:

@jsmeix

In my opinion, it is much easier to use the specific right tools to do the job. It is very hard to have a generic cleanupdisk function, because Linux supports so many different storage objects, and these objects can depend on each other. If the base object is missing, you cannot reach the upper objects easily without scanning the whole disk. Take one machine in my environment for example:

it has two hard disks: sda, sdb
partitions: sda1, sda2, sdb1, sdb2
software RAID above the partitions: md0 (sda1+sdb1), md1 (sda2+sdb2)
LVM above the software RAID: rootvg -> my-lv1, my-lv2 (above md1)
DRBD above LVM: drbd1 (above my-lv1)
filesystem above DRBD: xfs (above drbd1)
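
For illustration only, such a stack could be created roughly like this (just a sketch, not the exact commands from my machine; the sizes, the RAID level and the DRBD resource configuration in /etc/drbd.d are assumptions of this example):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
pvcreate /dev/md1
vgcreate rootvg /dev/md1
lvcreate -n my-lv1 -L 10G rootvg
lvcreate -n my-lv2 -L 10G rootvg
# assumes a DRBD resource "drbd1" is configured with /dev/rootvg/my-lv1 as its backing device
drbdadm create-md drbd1
drbdadm up drbd1
mkfs.xfs /dev/drbd1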

Each storage object has its own metadata structure, at the beginning/end/middle (LVM LV) of the object.
If the hard disk partition table is corrupted, all the upper layer information is lost, so a disk cleanup tool cannot reach these layers to do its cleanup job.

And there are several metadata revisions of each storage object. It is hard for a general tool to track all the changes, but the job is easy for the specific right tools (mdadm, lvcreate, mkfs.xfs, ...), and all these tools have parameters or workarounds to overwrite existing metadata.

So maybe we just need to make sure the specific tools that deal with the storage objects can overwrite existing data correctly; then we are safe.

Or we can fully erase all the disks, but as you said, that would take too many hours/days.

jsmeix commented at 2015-01-28 10:24:

@tbsky,
again many thanks for your valuable descriptive information!

For me it is perfectly o.k. to have a separate generic cleanup function for each kind of storage object.

In particular because I do very much prefer to Keep Separated Stuff Separated ( "KSSS" ;-) cf. what I wrote "on 11 Dec 2014" in https://github.com/rear/rear/issues/497

tbsky commented at 2015-01-28 10:38:

@jsmeix

I know I will benefit from your hard work when RHEL officially supports btrfs someday. I hope it can be there this year :)

jsmeix commented at 2015-01-28 10:49:

@tbsky

I forgot to reply to your "maybe we just need to make sure the specific tools that deal with the storage objects can overwrite existing data correctly":

From what I learned during https://github.com/rear/rear/issues/533 it is not always possible for the specific tools to overwrite the existing data, because when the specific tool runs it can already be too late.

In https://github.com/rear/rear/issues/533 it failed on RHEL6 at the partitioning level because of old data from the MD level, so the MD tool would have to be run before partitioning to clean up the old MD data.
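
For example (only a rough sketch of the idea; /dev/md127, /dev/sda1 and /dev/sdb1 are made-up names for an old auto-assembled array), before partitioning one would have to run something like

mdadm --stop /dev/md127
mdadm --zero-superblock /dev/sda1
mdadm --zero-superblock /dev/sdb1

so that the old MD metadata no longer interferes with parted.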

This is exactly the reason why I think we need a generic way to clean up old data.

Perhaps your information may lead to the conclusion that such a generic way is not possible.

This would also be a perfectly valid result, because then no installer could reliably install on an already used disk with arbitrary old data on it.

In this case the solution is to document it: it is then left to the user to make sure in advance that his disks are sufficiently clean, and if anything goes wrong because of old data on the disk, it is no longer an issue for an installer (in particular rear).

jsmeix commented at 2015-01-28 10:56:

Only for fun:

I predict that when RHEL officially supports btrfs, they will devise another special way to set up btrfs where my current implementation fails ;-)

@tbsky
I rely on you to report issues with btrfs on RHEL early!

jsmeix commented at 2015-01-28 11:34:

Regarding cleaning up DRBD I found in
https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_drbd_overview.html

DRBD uses the last 128 MB of the raw device for metadata

This means that what @schlomo wrote in https://github.com/rear/rear/issues/533 ("maybe just delete the first and last couple of MB on each previously existing partition?") does not work - specifically, "just a couple of MB" is not sufficient (since 128 MB is clearly more than "just a couple of MB").

schlomo commented at 2015-01-28 12:14:

I update my suggestion: wipe the first 256 MB, the last 256 MB and the middle 256 MB of each disk / device :-)
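
Something like this could do it (only a rough untested sketch, /dev/sdX is a placeholder for each disk):

disk=/dev/sdX
size=$( blockdev --getsize64 $disk )   # device size in bytes
# first 256 MiB
dd if=/dev/zero of=$disk bs=1M count=256
# last 256 MiB
dd if=/dev/zero of=$disk bs=1M count=256 seek=$(( size / 1048576 - 256 ))
# 256 MiB around the middle
dd if=/dev/zero of=$disk bs=1M count=256 seek=$(( size / 2097152 - 128 ))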

tbsky commented at 2015-01-28 13:31:

@jsmeix
The SUSE DRBD document seems not quite correct. DRBD with fixed-size external metadata has a 128 MB limit; with internal metadata it may exceed 128 MB. Please check below if you are interested :)
http://lists.linbit.com/pipermail/drbd-user/2008-June/009628.html

The external metadata may make the cleanup even crazier, but I think it still fits schlomo's 256 MB plan :-D

jsmeix commented at 2015-11-06 12:50:

Regarding "wipefs" see also https://github.com/rear/rear/issues/649

In particular see therein
https://github.com/rear/rear/issues/649#issuecomment-148725865
for how wipefs could be used by default in rear - even though wipefs
did not help in that particular case, it could nevertheless be used
by default in rear to avoid possible issues.

gdha commented at 2015-11-17 13:56:

@jsmeix a wipefs function could be useful indeed. However, we should foresee a fallback for when wipefs is not available, e.g. use dd instead.
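
For example, the fallback logic could look roughly like this (only a sketch, not actual rear code; the 16 MiB wipe size is an arbitrary assumption):

device=/dev/sdX2    # placeholder for the device to be cleaned
if type -p wipefs >/dev/null ; then
    wipefs -a $device
else
    # fallback: zero out the area where most filesystem signatures live
    dd if=/dev/zero of=$device bs=1M count=16
fi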

jsmeix commented at 2015-11-19 13:39:

As a first step, in https://github.com/rear/rear/pull/704
I implemented using wipefs when it is available.

Currently "when available" means that one has to manually add it
to the rear recovery system in /etc/rear/local.conf via

REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" wipefs )

cf. https://github.com/rear/rear/issues/649#issuecomment-148725865

Making wipefs automatically available to the rear recovery system
when it is available in the original system is the next step that I will
implement.

Using another program (e.g. dd) as fallback if wipefs is not available
is something for the future.

jsmeix commented at 2015-12-02 15:54:

With https://github.com/rear/rear/pull/728 I think this issue is sufficiently fixed.

Using another program (e.g. dd) as fallback if wipefs is not available
is something for the future - perhaps best via a separate follow-up issue - provided such a fallback is really needed on today's systems: wipefs is available on SLE11 and SLE12 (wipefs was added on Apr 13 2011 to SUSE's util-linux RPM, therefore wipefs is not available on SLE10-SP4 where the util-linux RPM changelog ends on Feb 03 2011), and I guess wipefs is also available on recent Red Hat systems.

jsmeix commented at 2015-12-03 09:47:

With #728 this issue is sufficiently fixed.

jsmeix commented at 2015-12-09 13:48:

Right now (during https://github.com/rear/rear/issues/732) I found that in some cases one must use

wipefs -a -f /dev/sdXn

because without '-f' (--force) wipefs will not erase a partition table
on a block device that is a partition (e.g. /dev/sda1).

I had the strange case that wipefs detected a DOS partition table
on the partition /dev/sda2 (note that this is not the whole disk /dev/sda,
where a partition table must exist, but a partition table
at the beginning of a partition).

With the DOS partition table at the beginning of /dev/sda2
the subsequent "mkfs -t ext4 /dev/sda2" stopped with
a yes/no question whether or not to proceed.

Probably it is even a bug when "rear recover" hangs
because of such an issue?

Perhaps an alternative to "wipefs -a -f /dev/sdXn" is

mkfs.ext4 -F /dev/sdXn

I will think a bit about which is better.

Currently I prefer the generic way via "wipefs -a -f ..."
over adding specific "force" options to each mkfs call.

I only need to ensure to never do "wipefs -a -f /dev/sda"
because that would erase the partition table on the disk
that was just before created by parted (see diskrestore.sh).
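
For example, such a safeguard could look roughly like this (only a sketch, not the actual diskrestore.sh code; it assumes a lsblk that knows the PKNAME column):

device=/dev/sda2    # placeholder for a device as written into diskrestore.sh
# PKNAME (the parent kernel device name) is empty for a whole disk
# and non-empty for a partition, so only partitions get wiped here
if test -n "$( lsblk -no PKNAME $device )" ; then
    wipefs -a -f $device
fi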

tbsky commented at 2015-12-09 23:51:

hi jsmeix:

I don't use ext4, so I am surprised about your finding. As I said, I don't think wipefs can detect/clean complicated storage objects (like stacking of mdadm/drbd/lvm), so the correct behavior of the specific storage tools is the last rescue. All the tools I need in rear now have the correct behavior, like below:

mkfs.xfs -f
mdadm --create --force
lvm lvcreate <<<y
drbdadm -- --force create-md

So I think, no matter what wipefs can do, "mkfs.ext4 -F" must be the last rescue.

tbsky commented at 2015-12-10 03:06:

@jsmeix

It's maybe off topic. As you mentioned, the original problem wipefs is meant to solve is like issue #533.
The root cause of these issues is the "automatic behavior" of the Linux environment. If Linux didn't have this "automatic behavior", it would be much easier for rear to do its job.

in my case I found two kinds of "automatic behavior":

The first is from boot-time module loading, which causes issues like #480 and #626. We can blacklist useless modules, or maybe whitelist useful modules after booting. I don't know if there is a general way to do this for every distribution. Maybe hide the modules at the beginning and unhide them after booting?

The second comes from udev events, which cause issues like #518. At first I thought solving the issue via udevadm was a great idea, but unfortunately SUSE also tried to solve the same problem, so the two patches conflict, hence issue #533.

I don't know if hiding/unhiding the modules will help stop this kind of "automatic behavior". But as there will be more storage modules in the future, if we can find a general way to handle the "automatic behavior", it will be much easier for rear to re-create these storage objects.
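
For example, the udev part could perhaps be tamed by pausing the udev event queue around the critical disk layout re-creation (only a sketch of the idea, not what rear or the SUSE patch actually do):

# keep udev from acting on block device change events for a while
udevadm control --stop-exec-queue
# ... re-create partitions / RAID / LVM / filesystems here ...
udevadm control --start-exec-queue
udevadm settle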

jsmeix commented at 2016-03-15 15:20:

I close this one because it is "somewhat done" for version 1.18, and probably it will become "mostly obsoleted" by https://github.com/rear/rear/issues/799, which might even (hopefully) really fix those issues.


[Export of Github issue for rear/rear.]