#1731 Issue closed: Possibly destructive 400_autoresize_disks.sh must be completely overhauled

Labels: enhancement, bug, cleanup, fixed / solved / done

jsmeix opened issue at 2018-02-14 12:08:

The 400_autoresize_disks.sh script results possibly destructive data
and therefore that script needs to be completely overhauled.

Related issues are
https://github.com/rear/rear/issues/102
see in particular
https://github.com/rear/rear/issues/102#issuecomment-365243334
and
https://github.com/rear/rear/issues/1718

Details:

On my two QEMU/KVM virtual machines I added a sdb for the following test
whether or not 400_autoresize_disks.sh works fail safe - and it does not.

On the original system:

# parted -s /dev/sdb unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2048MiB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start    End      Size    File system     Name   Flags
 1      8.00MiB  808MiB   800MiB  ext2            data1
 2      808MiB   1208MiB  400MiB  linux-swap(v1)  swap1
 3      1208MiB  1408MiB  200MiB  ext2            data2
 4      1408MiB  2008MiB  600MiB  linux-swap(v1)  swap2

# parted -s /dev/sdb unit B print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2147483648B
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start        End          Size        File system     Name   Flags
 1      8388608B     847249407B   838860800B  ext2            data1
 2      847249408B   1266679807B  419430400B  linux-swap(v1)  swap1
 3      1266679808B  1476395007B  209715200B  ext2            data2
 4      1476395008B  2105540607B  629145600B  linux-swap(v1)  swap2

# usr/sbin/rear -D mkrescue
...

# cat var/lib/rear/layout/disklayout.conf
...
# Disk /dev/sdb
# Format: disk <devname> <size(bytes)> <partition label type>
#disk /dev/sdb 2147483648 gpt
# Partitions on /dev/sdb
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
#part /dev/sdb 838860800 8388608 data1 none /dev/sdb1
#part /dev/sdb 419430400 847249408 swap1 none /dev/sdb2
#part /dev/sdb 209715200 1266679808 data2 none /dev/sdb3
#part /dev/sdb 629145600 1476395008 swap2 none /dev/sdb4
...

On the replacement system (same kind of virtual machine)
I use a sdb with double size (i.e. now 4 GiB).

In the running ReaR recovery system before "rear recover"
I edited disklayout.conf and enabled the sdb disk and its partitions
and added testing swap entries for my 'swap1' and 'swap2'
to see what 400_autoresize_disks.sh will make of it:

...
# Disk /dev/sdb
# Format: disk <devname> <size(bytes)> <partition label type>
disk /dev/sdb 2147483648 gpt
# Partitions on /dev/sdb
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/sdb 838860800 8388608 data1 none /dev/sdb1
part /dev/sdb 419430400 847249408 swap1 none /dev/sdb2
part /dev/sdb 209715200 1266679808 data2 none /dev/sdb3
part /dev/sdb 629145600 1476395008 swap2 none /dev/sdb4
...
# Swap partitions or swap files
# Format: swap <filename> uuid=<uuid> label=<label>
...
swap /dev/sdb2 uuid=12345 label=swap1
swap /dev/sdb4 uuid=23456 label=swap2

This is what 400_autoresize_disks.sh made of the sdb partitions:

# Partitions on /dev/sdb
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/sdb 2597113036 8388608 data1 none /dev/sdb1
part /dev/sdb 419430400 847249408 swap1 none /dev/sdb2
part /dev/sdb 649278259 1266679808 data2 none /dev/sdb3
part /dev/sdb 629145600 1476395008 swap2 none /dev/sdb4

There are two kind of possibly destructive false values now:

1.)
Because swap partition entries are not changed but others are enlarged
now the enlagred data1 partition overlaps with the subsequent swap1 partition.
The enlagred data1 partition ends at byte 2597113036 + 8388608 = 2605501644
and the unchanged swap1 partition starts at byte 847249408
but 847249408 - 2605501644 = -1758252236
i.e. the new enlagred data1 partition overlaps with the unchanged swap1 partition
by 1758252236 bytes (which is exactly the amount of enlagred bytes).

2.)
Because the beginning of the other enlarged data2 partition is not changed
(only its size is changed) now also the data2 partition overlaps with the
enlarged data1 partition because the data1 partition ends at byte 2605501644
and the data2 partition still starts at byte 1266679808
but 1266679808 - 2605501644 = -1338821836
i.e. the new enlagred data1 partition overlaps with the new enlagred data2 partition
by 1338821836 bytes.

Fortunately subsequent scripts that generate diskrestore.sh somehow fix that
so that the actual parted calls in diskrestore.sh do not have overlapping
partition data

parted -s /dev/sdb mklabel gpt >&2
parted -s /dev/sdb mkpart "'data1'" 2097152B 2599210187B >&2
parted -s /dev/sdb mkpart "'swap1'" 2599211008B 3018641407B >&2
parted -s /dev/sdb mkpart "'data2'" 3018645504B 3667923762B >&2
parted -s /dev/sdb mkpart "'swap2'" 3667927040B 4294950399B >&2

so that the actual partitioning result is not wrong

# parted -s /dev/sdb unit B print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 4294967296B
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start        End          Size         File system  Name   Flags
 1      2097152B     2599210495B  2597113344B               data1
 2      2599211008B  3018641407B  419430400B                swap1
 3      3018645504B  3667923967B  649278464B                data2
 4      3667927040B  4294950399B  627023360B                swap2

# parted -s /dev/sdb unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 4096MiB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start    End      Size     File system  Name   Flags
 1      2.00MiB  2479MiB  2477MiB               data1
 2      2479MiB  2879MiB  400MiB                swap1
 3      2879MiB  3498MiB  619MiB                data2
 4      3498MiB  4096MiB  598MiB                swap2

But it is more or less luck or an accident what the actual partitioning result is
because the actual partitioning result is not what is in disklayout.conf.

In particular the swap2 partition was shrinked from 600MiB to 598MiB
(not much in practice but "plain wrong" from a logical point of view)
regardless that the new disk has double size.

Or in other words - simply put:
The 400_autoresize_disks.sh script can result plain wrong data in disklayout.conf.

gdha commented at 2018-02-15 12:41:

@jsmeix Personally I think bug is not completely correct in this case. It does not break ReaR, therefore, I propose to change the label from bug to minor bug. Albeit, a difficult nut to crack IMHO to get it right.
It is the hardest piece of code to write (within ReaR).

jsmeix commented at 2018-02-15 15:25:

@gdha
personally I think when "rear recover" may result partitioning
that is worse than what it was on the original system
it can be considered as a real bug in ReaR even if the
recreated system also works with the worse partitioning, cf.
https://github.com/rear/rear/issues/102#issuecomment-150703886

I am currently working on it and I think meanwhile I have something
that ensures the recreated partitioning will not be worse.

If things go sufficiently well for me I will do a pull request tomorrow
so that you can have a look...

jsmeix commented at 2018-02-16 13:35:

@gdha
I did an "submit early" pull request https://github.com/rear/rear/pull/1733
so that you (and others) can have an early look what I am doing...

jsmeix commented at 2018-02-23 14:08:

There is a severe bug in current ReaR master code
at least regarding the size of an extended partition:
https://github.com/rear/rear/pull/1733#issuecomment-367680598

Now it seems at least in migration mode also
layout/prepare/GNU/Linux/100_include_partition_code.sh
works "possibly destructive" because it can result an
arbitrarily changed partitioning compared to what the user
had confirmed in disklayout.conf that should be set up.

From my point of view it does not actually mitigate the bug
that there is another subsequent user confirmation dialog
where the user has to also confirm the diskrestore.sh script.

If what the user confirmed in disklayout.conf that should be set up
cannot be set up (e.g. because the disk is too small for it)
layout/prepare/GNU/Linux/100_include_partition_code.sh
has to error out (at least in migration mode) so that the user
can adapt disklayout.conf but 100_include_partition_code.sh
must not arbitrarily change the user's confirmed partitioning
to whatever that script "thinks" is best.

jsmeix commented at 2018-03-01 13:44:

With https://github.com/rear/rear/pull/1733 merged
this issue should be (hopefully) fixed, cf.
https://github.com/rear/rear/pull/1733#issuecomment-369514406

jsmeix commented at 2018-03-08 10:17:

Only an addedum FYI:
https://github.com/rear/rear/issues/1718
proves what I suspected above in
https://github.com/rear/rear/issues/1731#issue-297072881
that it is more or less luck or an accident when the actual
partitioning result is not wrong because in case of
https://github.com/rear/rear/issues/1718
things actually went wrong.


[Export of Github issue for rear/rear.]