#1681 Issue closed: Make ReaR more fail-safe in case of sparse partition schemes

Labels: enhancement, documentation, fixed / solved / done

jsmeix opened issue at 2018-01-05 16:59:

See the "problem with sparse partition-names" mail thread
on the rear-users@lists.relax-and-recover.org mailing list
in particular
http://lists.relax-and-recover.org/pipermail/rear-users/2018-January/003506.html

Summary:

In case of non-consecutive partition device nodes for a disk
(e.g. /dev/sda1 and /dev/sda3 exist but there is no /dev/sda2)
"rear mkrescue/mkbackup" produces a right disklayout.conf
but "rear recover" recreates something different
(i.e. /dev/sda1 and /dev/sda2 but no /dev/sda3)
and that causes various issues.

The workaround for the user is to manually add
on his original system placeholder partitions
(e.g. a tiny /dev/sda2 placeholder partition)
to ensure consecutive partition device nodes for the disk.

There should be at least tests in ReaR that warn or error out
in case of non-consecutive partition device nodes for a disk.

In case of traditional MBR partitioning with an extended partition
that contains logical partitions like

/dev/sda1 is a boot partition
/dev/sda2 is an extended partition
/dev/sda5 is the first logical partition
/dev/sda6 is the second logical partition

the "missing" /dev/sda3 and /dev/sda4 are not an issue
but things like

/dev/sda1 is a boot partition
/dev/sda3 is an extended partition
/dev/sda5 is the first logical partition
/dev/sda6 is the second logical partition

(/dev/sda2 is missing)
or

/dev/sda1 is a boot partition
/dev/sda2 is an extended partition
/dev/sda5 is the first logical partition
/dev/sda7 is the second logical partition

(/dev/sda6 is missing)
would be an issue.

Details:

In case of non-consecutive partition device nodes for a disk
i.e. when partition table entries are empty on the disk
for example when in case of traditional MBR partitioning
the two first partition table entries are empty
so that there is no /dev/sda1 or /dev/sda2 like this

# fdisk -l
Disk /dev/sda: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00050095
Device     Boot    Start      End  Sectors Size Id Type
/dev/sda3  *     6299664 12597263  6297600   3G 83 Linux
/dev/sda4       12597272 41943039 29345768  14G  f W95 Ext'd (LBA)
/dev/sda5       12597280 37769247 25171968  12G 83 Linux
/dev/sda6       37769256 41941031  4171776   2G 83 Linux

# mount | grep sda
/dev/sda5 on / type ext4 (rw,relatime,data=ordered)
/dev/sda3 on /boot type ext4 (rw,relatime,data=ordered)

"rear mkrescue" produces correct entries in disklayout.conf like

# Disk /dev/sda
# Format: disk   
disk /dev/sda 21474836480 msdos
# Partitions on /dev/sda
# Format: part      /dev/
part /dev/sda 3224371200 3225427968 primary boot /dev/sda3
part /dev/sda 1024 6449803264 extended lba /dev/sda4
part /dev/sda 12888047616 6449807360 logical none /dev/sda5
part /dev/sda 2135949312 19337859072 logical none /dev/sda6
# Filesystems (only ext2,ext3,ext4,vfat,xfs,reiserfs,btrfs are supported).
# Format: fs    [uuid=] [label=

but during "rear recover" it fails because diskrestore.sh
looks like this (excerpts)

parted -s /dev/sda mklabel msdos >&2
parted -s /dev/sda mkpart 'primary' 2097152B 3226468351B >&2
parted -s /dev/sda set 3 boot on >&2
parted -s /dev/sda mkpart 'extended' 3226472448B 21474836479B >&2
parted -s /dev/sda set 4 lba on >&2
parted -s /dev/sda mkpart 'logical' 3226476544B 16114524159B >&2
parted -s /dev/sda mkpart 'logical' 16114528256B 18250477567B >&2

which fails at "parted -s /dev/sda set 3 boot on" with

+++ parted -s /dev/sda set 3 boot on
Error: Partition doesn't exist.

because there is no partition 3 because what was
the boot partition /dev/sda3 on the original system
gets recreated as first primary partition /dev/sda1
and the extended partition /dev/sda4 on the original system
will be recreated as second partition /dev/sda2.

Because the disklayout.conf saves the right partition device nodes
of the original system it should be possible (with reasonable effort)
to use this for tests preferably during "rear mkrescue/mkbackup"
and/or at least during "rear recover" when partition device nodes
that get or will be created on the replacement hardware
will or do not match the ones on the original system.

An addedum:
As far as I see the current MIGRATION_MODE in ReaR
only has some basic support to migrate whole disks
but currently there is no support in ReaR to migrate
partitions on a disk as it would be needed here
but migrating partitions is not considered here.
Migrating partitions on a disk could become a future
enhancement - provided there is actual need for it.

jsmeix commented at 2018-01-05 17:15:

My offhanded idea is to add a new
layout/save/default/950_verify_disklayout_conf.sh
(same idea behind as for build/default/980_verify_rootfs.sh)
that tests if there are non-consecutive partition device nodes
for a disk in the disklayout.conf file.

jsmeix commented at 2018-05-03 13:21:

Not doable in the currently planned time until the ReaR 2.4 release,
cf. https://github.com/rear/rear/issues/1790#issue-319663141
so that I postpone it for the next ReaR 2.5 release.

GreenBlood commented at 2018-10-29 15:17:

That would be a drastic change, but maybe using disk UUIDs in disklayout.conf instead of names, bypassing completely the devices names. The resulting partition scheme would not be 100% equal but it would be simpler for the recovery process imho

I'm not familiar with rear workflow so I may be heading in the wrong direction though.

jsmeix commented at 2018-10-29 15:43:

@GreenBlood
can you explain in more detail how you think using disk identifiers
(instead of plain kernel device nodes) could help?

A side note:
In general what is called a UUID in this area is a filesystem UUID
but on empty disks on replacement hardware there are none.

I think on new replacement hardware whatever hardware disk identifiers
would usually not match what there was on the original system, cf.
https://github.com/rear/rear/issues/1057#issuecomment-258832970

Furthermore I think that any kind of other naming for disk devices
would not help because in the end there is a sequence of parted calls
that create all partitions on the replacement disk from scratch
where the first parted call is parted mklabel which creates
a new partition table from scratch and then all partitions are
created one by one via subsequent parted mkpart calls,
see what during rear recover a 'diskrestore.sh' contains
or my initial description above which contains excerpts
of a 'diskrestore.sh' on my test system.

I think the root cause is that parted mkpart has no support
to specify the intended partition number - at least I do not see
how one could do that in the parted manual at
https://www.gnu.org/software/parted/manual/parted.html
except by creating dummy placeholder partitions in between
that get later removed via parted rm as workaround.

jsmeix commented at 2019-02-22 15:18:

A disklayout.conf syntax check as proposed in
https://github.com/rear/rear/issues/2006#issuecomment-460646685
is much related to other checks as the above proposed
https://github.com/rear/rear/issues/1681#issuecomment-355611173
so that all such checks should be in one same script.

jsmeix commented at 2019-02-22 15:20:

Not doable in the currently planned time until the ReaR 2.5 release
so that I postpone it for the next ReaR 2.6 release, cf.
https://github.com/rear/rear/milestones

jsmeix commented at 2019-02-27 16:55:

I should really implement something for the next ReaR 2.6 release,
see the mail thread with subject

ReaR 2.4/RHEL7u3 x86_64 disk layout recreation script failed
Error: Partition doesn't exist (smart array RAID0)

on the rear-users mailing list at
http://lists.relax-and-recover.org/pipermail/rear-users/2019-February/thread.html
in particular see
http://lists.relax-and-recover.org/pipermail/rear-users/2019-February/003665.html
where "rear recover" fails when there is no /dev/sda1 on the original system like:

       Start       End   Size Type            Name
sda2  526336   1050623   256M EFI System      EFI System Partition
sda3 1050624   2099199   512M Microsoft basic
sda4 2099200 585871359 278.4G Linux LVM

jsmeix commented at 2019-03-05 10:53:

With https://github.com/rear/rear/pull/2060 merged
I consider this issue to be done.

The new verify script layout/save/default/950_verify_disklayout_file.sh
tests disklayout.conf that was created by "rear mkrescue/mkbackup"
for non-consecutive partitions and lets "rear mkrescue/mkbackup"
error out if non-consecutive partitions are found because currently
"rear recover" fails when there are non-consecutive partitions and
it is better to errors out early during "rear mkrescue/mkbackup"
when things cannot work during "rear recover", cf.
https://github.com/rear/rear/wiki/Developers-Guide

jsmeix commented at 2019-03-07 13:17:

It is now documented in the

New features, bigger enhancements, and possibly backward incompatible changes

section of the ReaR 2.5 release notes
https://github.com/rear/rear.github.com/blob/master/documentation/release-notes-2-5.md


[Export of Github issue for rear/rear.]