#102 Issue closed: Investigate "not properly aligned for best performance"

Labels: enhancement, bug, fixed / solved / done

dagwieers opened issue at 2012-06-07 09:55:

When doing a rear restore, parted complains that partitions are not properly aligned. This might have a performance impact on some hardware, and therefore we need to check whether parted is correct in its claims and whether we should improve our logic (in those cases where that's possible, depending on the parted version).

In my case I have seen this:

 +++ echo -e 'Creating partitions for disk /dev/sda (msdos)'
 +++ parted -s /dev/sda mklabel msdos
 +++ parted -s /dev/sda mkpart primary 32768B 542836735B
 Warning: The resulting partition is not properly aligned for best performance.
 +++ parted -s /dev/sda set 1 boot on
 +++ parted -s /dev/sda mkpart primary 542838784B 240057409535B
 Warning: The resulting partition is not properly aligned for best performance.
 +++ parted -s /dev/sda set 2 lvm on

Looking at the offsets, the main question is whether the offsets start at 0 or 1. Once we have determined what the correct start offset is, we can make sure the offsets used are aligned on 4k boundaries. Here is an example for the above boundaries:

[root@moria ~]# echo alignment: $((32768%4096)) $((542836735%4096))
alignment: 0 2047
[root@moria ~]# echo alignment: $((542838784%4096)) $((240057409535%4096))
alignment: 0 4095

So in the above case, either the start offset or the stop offset is off by one. In essence, we should only be concerned with the start offset, but we could do the right thing anyway.
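A minimal sketch of that check, assuming parted's byte offsets are zero-based and the end offset is inclusive (which is consistent with the sizes parted prints for these partitions, where size = end - start + 1). Under that assumption the boundary to test is end+1, not end. The helper name `check_partition` is hypothetical and not part of rear.

```shell
# Hypothetical helper: print the remainders of the start offset and of the
# first byte after the partition (end+1) relative to a block size in bytes.
# Arguments: start, inclusive end, optional block size (default 4096).
check_partition() {
    start=$1
    end=$2
    block=${3:-4096}
    echo "start remainder: $(( start % block )), end+1 remainder: $(( (end + 1) % block ))"
}

check_partition 32768 542836735
check_partition 542838784 240057409535
```

With these numbers both start offsets are 4k-aligned; only the boundary after the first partition is not.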

dagwieers commented at 2012-06-09 00:35:

Looking into this a bit closer, it's hard to know what properly aligned means. According to some fora, if the EBS (Erase Block Size) is unknown, using 512*1024 is a safe size to use. Looking at my offset:

[root@moria rear]# parted /dev/sda unit b print
Model: ATA Corsair Force GT (scsi)
Disk /dev/sda: 240057409536B
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start       End            Size           Type     File system  Flags
 1      32768B      542836735B     542803968B     primary  ext4         boot
 2      542838784B  240057409535B  239514570752B  primary               lvm

The first partition does not start at an erase block, but the second partition does:

[root@moria rear]# echo $((542838784%512*1024))
0

Still, parted complained?
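A note on the arithmetic above: in shell arithmetic `%` and `*` have the same precedence and associate left to right, so `542838784%512*1024` is evaluated as `(542838784%512)*1024`, which only tests 512-byte alignment. The sketch below repeats the check with the block size parenthesized, still assuming the 512 KiB erase block size mentioned above. The variable name `ebs` is only illustrative.

```shell
# Assumed erase block size of 512 KiB, as suggested in the comment above.
ebs=$(( 512 * 1024 ))

# As originally written: evaluated as (542838784 % 512) * 1024.
echo $(( 542838784 % 512 * 1024 ))

# With the block size parenthesized: remainder relative to 512 KiB.
echo $(( 542838784 % ebs ))
echo $(( 32768 % ebs ))
```

The last two remainders are non-zero, so under the 512 KiB assumption neither partition starts on an erase block boundary.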

kpieth commented at 2015-10-17 11:43:

This is an important feature, because after a rear recovery you always have unaligned partitions and slow systems.
Currently I always have to enter the starting offsets manually before recovery. On my systems (HP ProLiant, VMware, SSD laptop) the first partition always has to start at sector 2048 to get proper alignment. A good explanation in German can be found here:
https://www.thomas-krenn.com/de/wiki/Partition_Alignment

schlomo commented at 2015-10-17 17:58:

@kpieth yes you are very right. Unfortunately nobody figured out an automated way of doing this so far. Any ideas?

jsmeix commented at 2015-10-19 08:16:

Only at a very quick first glance:

In the rear 1.17.2 sources I do not find any usage of "parted ... --align=optimal"

It seems that although a FEATURE_PARTED_ALIGNMENT exists in usr/share/rear/layout/prepare/GNU/Linux/10_include_partition_code.sh, parted alignment is not used anywhere.

I am not at all a sufficient parted expert to understand if "parted ... --align=optimal" is actually the right solution here. The wording makes it look as if it does "the right thing" but I do not (yet) know if it actually does the right thing because http://www.gnu.org/software/parted/manual/parted.html is not really explanatory what actually happens for each of parted's alignment types none, cylinder, minimal and optimal.

At least when YaST installs an openSUSE or SLE system it calls "parted ... --align=optimal", see also my script at "Generic disaster recovery with the plain SUSE installation system" in https://en.opensuse.org/SDB:Disaster_Recovery how I call parted there (basically I copied it from how YaST calls parted, see the comments in my script).

jsmeix commented at 2015-10-19 08:27:

A general question:

Because at least low-end flash devices (a.k.a. USB sticks) require alignment at 4 MiB or even 8 MiB for not-too-bad performance (cf. my last comment at https://hackweek.suse.com/12/projects/23 ) I wonder if rear should by default align at 4 MiB or even 8 MiB - at least when the partition size is much bigger than 4 MiB or 8 MiB (e.g. for partitions bigger than 100 MB).

This way a badly aligned original system could become even faster after recovery. Is such a difference allowed or must rear recovery slavishly produce a byte-by-byte identical copy whenever possible?

schlomo commented at 2015-10-19 09:17:

Wild guess: I am not sure if --align=optimal will have any effect as long as we give parted exact byte ranges as input. After all, it would have to change our precise input to do the alignment. Maybe part of the problem is also to convert the math to work in MB so that parted will have some leeway in how to interpret the MB numbers.

jsmeix commented at 2015-10-19 10:21:

Yes, whether passing exact byte ranges to parted is really reasonable (at least by default) is what I question in my above comment about using 4 MiB or even 8 MiB alignment by default (for sufficiently big partitions).

In other words: I wonder if rear should by default round exact byte values to 4MiB or even 8 MiB chunks?

An obvious direct drawback is that a partition could become several MiB smaller than it was before.

As an extreme example assume a 2020 MiB disk consists of 10 partitions of 202 MiB each. Each is rounded up to a multiple of 4 MiB, so the first 9 partitions become 204 MiB. As a consequence only 184 MiB = 2020 MiB - ( 9 * 204 MiB ) of disk space remain for the last partition, so that after recovery the 10th partition is 18 MiB = 9 * 2 MiB smaller than it was originally (i.e. about 10% smaller in this example).

With GPT partitioning and 110 such 202 MiB partitions on a 22220 MiB disk it is left to the reader to calculate that there would be no space left on the device for the last partition.
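The two examples above can be recomputed with a small sketch. It only does the rounding arithmetic from the text and ignores any space needed for the partition table itself. The helper names `round_up` and `space_for_last` are hypothetical.

```shell
# Hypothetical helper: round size $1 up to the next multiple of $2
# (both in MiB).
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Space in MiB left for the last of $2 equal partitions of $3 MiB on a
# disk of $1 MiB when all earlier partitions are rounded up to a multiple
# of $4 MiB. A result of zero or less means the last partition no longer fits.
space_for_last() {
    disk=$1
    count=$2
    size=$3
    align=$4
    rounded=$(round_up "$size" "$align")
    echo $(( disk - (count - 1) * rounded ))
}

space_for_last 2020 10 202 4     # the 10-partition example: 184
space_for_last 22220 110 202 4   # the 110-partition example: -16
```

In the second case the result is negative, i.e. the rounded earlier partitions alone already need more space than the disk provides.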

jsmeix commented at 2015-10-19 10:50:

@kpieth regarding
https://www.thomas-krenn.com/de/wiki/Partition_Alignment
that reads (excerpt)

4 KiB / 8 KiB Pages der SSDs

I do not agree from my current point of understanding.

For flash storage the physical block size for read/write operations is usually something like 4 MiB or 8 MiB (mega bytes not kilo bytes!) cf. my last comment in https://hackweek.suse.com/12/projects/23

I think SSDs usually have sufficiently intelligent controllers with sufficiently huge caches built in so that small reads or writes could be sufficiently well buffered - in particular consecutive small reads or writes are combined by the controller into big reads or writes that match the physical block size of the actual storage hardware.

I think in the end for a sufficiently well made SSD it should not matter too much how it is accessed or aligned.

In contrast I think for cheap flash-based storage it does matter in what chunks and with what alignment it is accessed.

But I am not at all a sufficient expert in this area to make really authoritative statements.

I only think that 4 KiB alignment which should be the right one for spinning traditional harddisks could be totally insufficient for cheap flash based storage where 4 MiB or even 8 MiB would have to be used.

This is the reason why I am thinking about using 4 MiB or even 8 MiB alignment by default in rear for sufficiently big partitions (i.e. where rounding differences should not matter in reasonable scenarios).

jsmeix commented at 2015-10-20 11:58:

Wait!
Going back to square one:

I verified that with "rear recover" (using rear 1.17.2)
one gets a byte-by-byte identical copy of the partitioning
of the original disk:

My original system disk:

# parted -s /dev/sda unit B print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sda: 26843545600B
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 
Number  Start         End           Size          Type     File system     Flags
 1      1048576B      1570766847B   1569718272B   primary  linux-swap(v1)  type=82
 2      1570766848B   14459863039B  12889096192B  primary  btrfs           boot, type=83
 3      14459863040B  26843545599B  12383682560B  primary  xfs             type=83

My recovered system disk on an identical second machine:

# parted -s /dev/sda unit B print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sda: 26843545600B
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 
Number  Start         End           Size          Type     File system     Flags
 1      1048576B      1570766847B   1569718272B   primary  linux-swap(v1)  type=83
 2      1570766848B   14459863039B  12889096192B  primary  btrfs           boot, type=83
 3      14459863040B  26843545599B  12383682560B  primary  xfs             type=83

This means when after "rear recover" the partitions are badly aligned,
they must have been already badly aligned on the original system.

I think it does not belong to rear to somehow try to fix badly aligned
partitions.

Or do I misunderstand something so that it can happen
that well aligned partitions on the original system become
badly aligned partitions after "rear recover"?

@kpieth
please provide the output of

# parted -s /dev/sdX unit B print

(replace 'X' with what is appropriate for your harddisk)
both on your original system and after "rear recover".

jhoekx commented at 2015-10-20 12:29:

When I was working on the partitioning code, the goal was to reuse the exact same offsets if no resizing had to be done. As soon as we enter migration mode, the code tried to align on boundaries.

It would be interesting to know if we are in migration mode or not?

Maybe the choice of boundaries is not optimal?

kpieth commented at 2015-10-23 22:05:

Yes, we are talking about migration mode. A well-aligned partition becomes badly aligned after rear recovery. If no resizing is done, everything is as it was. We use rear for installing new machines (VM and hardware). My machines usually run virtualized and so have several layers of storage underneath. Alignment really has a performance impact.

gdha commented at 2016-09-07 14:49:

added it to the sponsor list - close it

jsmeix commented at 2018-02-13 11:45:

Right now the following is mainly only an offhanded guess
but perhaps I may have found the root cause of this issue:

With bigger replacement disk size one gets partitions recreated
with some automatically resized partitions via
usr/share/rear/layout/prepare/default/400_autoresize_disks.sh
which evenly distribute the size changes on byte values so that
the automatically resized partitions can be arbitrarily badly aligned.

Later some partition alignment happens in
usr/share/rear/layout/prepare/GNU/Linux/100_include_partition_code.sh
where my offhanded guess is the alignment is something like 1 MiB
because of the '/ 1024 / 1024' in the code
but perhaps one may even get only 1 Byte as alignment
in case of FEATURE_PARTED_ANYUNIT which may finally
explain the root cause behind this issue.

Summary from my current point of view:

With same replacement disk size one gets partitions recreated
at the exact same byte values as they have been on the original system
which is the right behaviour because ReaR is first and foremost meant
to recreate a system as much as possible exactly as it was before.

In contrast with bigger replacement disk size one may get
automatically resized partitions that can be arbitrarily badly aligned.

I think even with bigger replacement disk size the default should also be
to get partitions recreated at the exact same byte values
as they have been on the original system so that by default one gets
the system recreated as much as possible exactly as it was before.

Optionally - only after a user confirmation dialog - with bigger replacement disk size
one could get automatically resized partitions but then with a specific
alignment value PARTITION_ALIGN_BLOCK_SIZE that is by default
the same as USB_PARTITION_ALIGN_BLOCK_SIZE i.e. 8 MiB
to create things reasonably right out of the box also on SSDs, cf.
https://github.com/rear/rear/issues/1201

jsmeix commented at 2018-02-16 14:02:

In my current https://github.com/rear/rear/pull/1733
I do not implement any PARTITION_ALIGN_BLOCK_SIZE support
because I think this is not needed when only the end value
of the last partition on each disk may get changed
(I use a hardcoded 1 MiB alignment for the partition end values)
for details see https://github.com/rear/rear/pull/1733
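For illustration only, one way a 1 MiB alignment of a partition end value could be computed. This is a sketch under the assumption that end values are inclusive byte offsets, so that the first byte after the partition (end+1) is what should land on a MiB boundary. It is not taken from the pull request, and the helper name `align_end_down` is hypothetical.

```shell
MIB=$(( 1024 * 1024 ))

# Hypothetical helper: given an inclusive end byte $1, print the largest
# inclusive end byte not above it for which end+1 is a multiple of 1 MiB.
align_end_down() {
    echo $(( ($1 + 1) / MIB * MIB - 1 ))
}

# End of the last partition of the QEMU disk above: already aligned, unchanged.
align_end_down 26843545599

# End of the first partition from the original report: moved down.
align_end_down 542836735
```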

jsmeix commented at 2018-02-21 12:30:

Only FYI:

Right now it happened to me that I got such a parted
"Warning: The resulting partition is not properly aligned for best performance.".

In my case it happened this way:

# parted -s /dev/sdb unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2048MiB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start    End      Size    Type      File system     Flags
 1      8.00MiB  808MiB   800MiB  primary   ext2            type=83
 2      808MiB   1208MiB  400MiB  primary   linux-swap(v1)  type=82
 3      1208MiB  1708MiB  500MiB  extended                  lba, type=0f

# parted -s -a optimal /dev/sdb unit MiB mkpart logical 1208 1408
Warning: The resulting partition is not properly aligned for best performance.

# parted -s /dev/sdb unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2048MiB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start    End      Size    Type      File system     Flags
 1      8.00MiB  808MiB   800MiB  primary   ext2            type=83
 2      808MiB   1208MiB  400MiB  primary   linux-swap(v1)  type=82
 3      1208MiB  1708MiB  500MiB  extended                  lba, type=0f
 5      1208MiB  1408MiB  200MiB  logical                   type=83

# parted -s /dev/sdb unit B print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2147483648B
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start        End          Size        Type      File system     Flags
 1      8388608B     847249407B   838860800B  primary   ext2            type=83
 2      847249408B   1266679807B  419430400B  primary   linux-swap(v1)  type=82
 3      1266679808B  1790967807B  524288000B  extended                  lba, type=0f
 5      1266680320B  1476395519B  209715200B  logical                   type=83

# echo '1208 * 1024 * 1024' | bc -l
1266679808

# echo '1266680320 / 1024 / 1024' | bc -l
1208.00048828125000000000

# echo '1266680320 - 1266679808' | bc -l
512

I had specified the logical partition start with 1 MiB unit
at the same point where the extended partition starts
but parted knows that the first sector (i.e. the first 512 bytes)
of the extended partition cannot be used by the logical partition
so that parted automatically moved the actual logical partition start
by only 512 bytes which results in an actual logical partition start point
that is "not properly aligned for best performance"
(whatever that exactly means in parted's opinion - see next paragraph)
and - as far as I know - there should be a gap of at least 63 sectors
(i.e. at least 63 * 512 bytes = 32256 bytes) between the
extended partition start and the first logical partition start.

I think when parted can make such a "Warning" message
cf. http://blog.schlomo.schapiro.org/2015/04/warning-is-waste-of-my-time.html
parted must have some built-in "knowledge" what would be
"properly aligned for best performance" and then I wonder why
parted does not automatically align it "properly for best performance"
regardless that I had explicitly called parted with -a optimal
i.e. what the heck is parted's alignment type optimal meant for?

When I manually specify the logical partition start with 1 MiB unit
at one MiB more from where the extended partition starts it works:

# parted -s /dev/sdb rm 5

# parted -s /dev/sdb unit MiB mkpart logical 1209 1408

# parted -s /dev/sdb unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2048MiB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start    End      Size    Type      File system     Flags
 1      8.00MiB  808MiB   800MiB  primary   ext2            type=83
 2      808MiB   1208MiB  400MiB  primary   linux-swap(v1)  type=82
 3      1208MiB  1708MiB  500MiB  extended                  lba, type=0f
 5      1209MiB  1408MiB  199MiB  logical                   type=83

# parted -s /dev/sdb unit B print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2147483648B
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start        End          Size        Type      File system     Flags
 1      8388608B     847249407B   838860800B  primary   ext2            type=83
 2      847249408B   1266679807B  419430400B  primary   linux-swap(v1)  type=82
 3      1266679808B  1790967807B  524288000B  extended                  lba, type=0f
 5      1267728384B  1476395007B  208666624B  logical                   type=83

# echo '1267728384 / 1024 / 1024' | bc -l
1209.00000000000000000000

Lesson learned:

Do not rely on parted's automated alignment
(in particular do not rely on parted's optimal alignment)
but calculate the right values manually and
use the right values in parted calls.
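A minimal sketch of such a manual calculation for the case above: the first 1 MiB boundary strictly after the extended partition start is used as the logical partition start. The helper name `next_mib_boundary` is hypothetical, and the parted call is only shown as a comment.

```shell
MIB=$(( 1024 * 1024 ))

# Hypothetical helper: print the first 1 MiB boundary strictly after byte
# offset $1, e.g. for a logical partition that cannot start in the first
# sector of its extended partition.
next_mib_boundary() {
    echo $(( ($1 / MIB + 1) * MIB ))
}

ext_start=1266679808
log_start=$(next_mib_boundary "$ext_start")
echo "${log_start}B"

# The value could then be passed to parted as an explicit byte value, e.g.
# parted -s /dev/sdb mkpart logical "${log_start}B" 1476395007B
```

For the extended partition start at 1208 MiB this yields 1267728384 bytes, i.e. the 1209 MiB start that worked without a warning above.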

jsmeix commented at 2018-02-23 14:50:

I think
https://github.com/rear/rear/pull/1733#issuecomment-367680598
and
https://github.com/rear/rear/pull/1733#issuecomment-368028494
prove that "rear recover" at least in migration mode
can result in a changed and badly aligned partitioning.

jsmeix commented at 2018-03-01 13:47:

With https://github.com/rear/rear/pull/1733 merged
this issue should be (hopefully) fixed, cf.
https://github.com/rear/rear/pull/1733#issuecomment-369514406

But this means there is now a changed default behaviour
how ReaR behaves in migration mode when partitions can or must be
resized to fit on replacement disks with different size, cf. the merge commit comment
https://github.com/rear/rear/commit/6414936ba30d6c13020eee8313e93a4e29debc54


[Export of Github issue for rear/rear.]