#102 Issue closed
: Investigate "not properly aligned for best performance"¶
Labels: enhancement
, bug
, fixed / solved / done
dagwieers opened issue at 2012-06-07 09:55:¶
When doing a rear restore
parted complains about the fact that
partitions are not properly aligned. This might have a performance
impact on some hardware and therefor we need to make sure if parted is
correct in its claims and whether we should improve our logic (in those
cases where that's possible depending on parted version).
In my case I have seen this:
+++ echo -e 'Creating partitions for disk /dev/sda (msdos)'
+++ parted -s /dev/sda mklabel msdos
+++ parted -s /dev/sda mkpart primary 32768B 542836735B
Warning: The resulting partition is not properly aligned for best performance.
+++ parted -s /dev/sda set 1 boot on
+++ parted -s /dev/sda mkpart primary 542838784B 240057409535B
Warning: The resulting partition is not properly aligned for best performance.
+++ parted -s /dev/sda set 2 lvm on
Looking at the offsets, the main question is whether the offsets starts at 0 or 1. Once we have determined what the correct start offset is, we can make sure the offsets used are aligned on 4k boundaries. Here is an example for the above boundaries:
[root@moria ~]# echo alignment: $((32768%4096)) $((542836735%4096))
alignment: 0 2047
[root@moria ~]# echo alignment: $((542838784%4096)) $((240057409535%4096))
alignment: 0 4095
So in the above case, either the start offset, or the stop offset are off-by-one. In essence, we should only be concerned by the start offset, but we could do the right thing anyway.
dagwieers commented at 2012-06-09 00:35:¶
Looking into this a bit closer, it's hard to know what properly aligned means. According to some fora, if the EBS (Erase Block Size) is unknown, using 512*1024 is a safe size to use. Looking at my offset:
[root@moria rear]# parted /dev/sda unit b print
Model: ATA Corsair Force GT (scsi)
Disk /dev/sda: 240057409536B
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number Start End Size Type File system Flags
1 32768B 542836735B 542803968B primary ext4 boot
2 542838784B 240057409535B 239514570752B primary lvm
The first partition does not start at an erase block, but the second partition does:
[root@moria rear]# echo $((542838784%512*1024))
0
Still parted complained ?
kpieth commented at 2015-10-17 11:43:¶
This is an important feature, because after a rear recovery you always
have unaligned Partitions and slow systems.
Currently I always have to manually enter Starting Offsets before
recovery. The starting Position of the first partition is on my
Systems(HP Proliant, VMware, SSD Laptop) always Sector 2048 to get
proper alignment. A good explanation in german can be found here:
https://www.thomas-krenn.com/de/wiki/Partition_Alignment
schlomo commented at 2015-10-17 17:58:¶
@kpieth yes you are very right. Unfortunately nobody figured out an automated way of doing this so far. Any ideas?
jsmeix commented at 2015-10-19 08:16:¶
Only at a very quick first glance:
In the rear 1.17.2 sources I do not find any usage of "parted ... --align=optimal"
It seems regardless that in usr/share/rear/layout/prepare/GNU/Linux/10_include_partition_code.sh a FEATURE_PARTED_ALIGNMENT exists, parted alignment is nowhere used.
I am not at all a sufficient parted expert to understand if "parted ... --align=optimal" is actually the right solution here. The wording makes it look as if it does "the right thing" but I do not (yet) know if it actually does the right thing because http://www.gnu.org/software/parted/manual/parted.html is not really explanatory what actually happens for each of parted's alignment types none, cylinder, minimal and optimal.
At least when YaST installs an openSUSE or SLE system it calls "parted ... --align=optimal", see also my script at "Generic disaster recovery with the plain SUSE installation system" in https://en.opensuse.org/SDB:Disaster_Recovery how I call parted there (basicaly I copied it from how YaST calls parted, see the comments in my script).
jsmeix commented at 2015-10-19 08:27:¶
A general question:
Because at least low-end flash devices (a.k.a. USB sticks) require alignment at 4MiB or even 8 MiB for not-too-bad performance (cf. my last comment at https://hackweek.suse.com/12/projects/23 ) I wonder if rear should by default align at 4MiB or even 8 MiB - at least when the partition size is much bigger than 4MiB or 8 MiB (e.g. for partitions bigger that 100 MB).
This way a badly aligned original system could become even faster after recovery. Is such a difference allowed or must rear recovery slavishly produce a byte-by-byte identical copy whenever possible?
schlomo commented at 2015-10-19 09:17:¶
Wild guess: I am not sure if --align=optimal
will have any effect as
long as we give parted
exact Byte ranges as input. After all it would
have to change our precise input to make the alignment. Maybe part of
the problem is also to convert the math to work in MB so that
parted
will have some leeway how to interpret the MB numbers.
jsmeix commented at 2015-10-19 10:21:¶
Yes, that exact byte ranges as input for parted is what I question if that is really reasonable (at least reasonable by default) to be done by rear in my above comment regarding using 4MiB or even 8 MiB alignment by default (for for sufficiently big partitions).
In other words: I wonder if rear should by default round exact byte values to 4MiB or even 8 MiB chunks?
An obvious direct drawback is that a partition could become several MiB smaller than it was before.
As an extreme example assume a 2020 MiB disk consists of 10 partitions each with 202 MiB. Each is rounded to 4MiB resulting the first 9 partitions rounded to 204 MiB. As a consequence for the last partition only 184 MiB = 2020 MiB - ( 9 * 204 MiB ) are available space on the disk so that after recovery the 10. partition becomes 18 MiB = 9 * 2 MiB smaller than it was originally (i.e. about 10% smaller in this example).
With GPT partitioning and 110 such 202 MiB partitions on a 22220 MiB disk it is left to the reader to calculate that there would be no space left on the device for the last partition.
jsmeix commented at 2015-10-19 10:50:¶
@kpieth regarding
https://www.thomas-krenn.com/de/wiki/Partition_Alignment
that reads (excerpt)
4 KiB / 8 KiB Pages der SSDs
I do not agree from my current point of understanding.
For flash strorage the physical block size for read/write operations is usually something like 4MiB or 8MiB (mega bytes not kilo bytes!) cf. my last comment in https://hackweek.suse.com/12/projects/23
I think SSDs usually have sufficiently intelligent controllers with sufficiently huge caches built in so that small reads or writes could be sufficiently well buffered - in particular consecutive small reads or writes are combined by the controller into big reads or writes that mach the physical block size of the actual storage hardware.
I think in the end for a sufficiently well made SSD it should not matter too much how it is accessed or aligned.
In contrast I think for cheap flash-based storagae it does matter in what chunks and with what alignment it is accessed.
But I am not at all a sufficient expert in this area to make really authoritative statements.
I only think that 4 KiB alignment which should be the right one for spinning traditional harddisks could be totally insufficient for cheap flash based storage where 4 MiB or even 8 MiB would have to be used.
This is the reason why I am thinking about using 4 MiB or even 8 MiB alingnment by default in rear for sufficiently big partitions (i.e. where rounding differences should not matter in reasonable scenarios).
jsmeix commented at 2015-10-20 11:58:¶
Wait!
Going back to square one:
I verified that with "rear recover" (using rear 1.17.2)
one gets a byte-by-byte identical copy of the partitioning
of the original disk:
My original system disk:
# parted -s /dev/sda unit B print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sda: 26843545600B Sector size (logical/physical): 512B/512B Partition Table: msdos Disk Flags: Number Start End Size Type File system Flags 1 1048576B 1570766847B 1569718272B primary linux-swap(v1) type=82 2 1570766848B 14459863039B 12889096192B primary btrfs boot, type=83 3 14459863040B 26843545599B 12383682560B primary xfs type=83
My recovered system disk on a identical second machine:
# parted -s /dev/sda unit B print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sda: 26843545600B Sector size (logical/physical): 512B/512B Partition Table: msdos Disk Flags: Number Start End Size Type File system Flags 1 1048576B 1570766847B 1569718272B primary linux-swap(v1) type=83 2 1570766848B 14459863039B 12889096192B primary btrfs boot, type=83 3 14459863040B 26843545599B 12383682560B primary xfs type=83
This means when after "rear recover" the partitions are badly aligned,
they must have been already badly aligned on the original system.
I think it does not belong to rear to somehow try to fix badly aligned
partitions.
Or do I misunderstand something so that it can happen
that well aligned partitions on the original system become
badly aligned partitions after "rear recover"?
@kpieth
please provide the output of
# parted -s /dev/sdX unit B print
(replaxe 'X' with what is appropriate for your harddisk)
both on your original system and after "rear recover".
jhoekx commented at 2015-10-20 12:29:¶
When I was working on the partitioning code, the goal was to reuse the exact same offsets if no resizing had to be done. As soon as we enter migration mode, the code tried to align on boundaries.
It would be interesting to know if we are in migration mode or not?
Maybe the choice of boundaries is not optimal?
kpieth commented at 2015-10-23 22:05:¶
Yes we are talking about migration mode. A well aligned partition gets bad aligned after rear recovery. If no resizing is done, everything is at it was. We use rear for installing new machines(VM and Hardware). My machines usually run virtualized and so have several layers of storage under it. Alignment really has a performance impact.
gdha commented at 2016-09-07 14:49:¶
added it to the sponsor list - close it
jsmeix commented at 2018-02-13 11:45:¶
Right now the following is mainly only an offhanded guess
but perhaps I may have found the root cause of this issue:
With bigger replacement disk size one gets partitions recreated
with some automatically resized partitions via
usr/share/rear/layout/prepare/default/400_autoresize_disks.sh
which evenly distribute the size changes on byte values so that
the automatically resized partitions can be arbitrarily badly aligned.
Later some partition alignment happens in
usr/share/rear/layout/prepare/GNU/Linux/100_include_partition_code.sh
where my offhanded guess is the alignment is something like 1 MiB
because of the '/ 1024 / 1024' in the code
but perhaps one may even get only 1 Byte as alignment
in case of FEATURE_PARTED_ANYUNIT which may finally
explain the root cause behind this issue.
Summary from my current point of view:
With same replacement disk size one gets partitions recreated
at the exact same byte values as they have been on the original system
which is the right behaviour because ReaR is first and foremost meant
to recreate a system as much as possible exactly as it was before.
In contrast with bigger replacement disk size one may get
automatically resized partitions that can be arbitrarily badly aligned.
I think even with bigger replacement disk size the default should also
be
to get partitions recreated at the exact same byte values
as they have been on the original system so that by default one gets
the system recreated as much as possible exactly as it was before.
Optionally - only after a user confirmation dialog - with bigger
replacement disk size
one could get automatically resized partitions but then with a
specific
alignment value PARTITION_ALIGN_BLOCK_SIZE that is by default
the same as USB_PARTITION_ALIGN_BLOCK_SIZE i.e. 8 MiB
to create things reasonably right out of the box also on SSDs, cf.
https://github.com/rear/rear/issues/1201
jsmeix commented at 2018-02-16 14:02:¶
In my current
https://github.com/rear/rear/pull/1733
I do not implement any PARTITION_ALIGN_BLOCK_SIZE support
because I think this is not needed when only the end value
of the last partition on each disk may get changed
(I use a hardcoded 1 MiB alignment for the partition end values)
for details see
https://github.com/rear/rear/pull/1733
jsmeix commented at 2018-02-21 12:30:¶
Only FYI:
Right now it happened to me that I got such a parted
"Warning: The resulting partition is not properly aligned for best
performance.".
In my case it happened this way:
# parted -s /dev/sdb unit MiB print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sdb: 2048MiB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 8.00MiB 808MiB 800MiB primary ext2 type=83 2 808MiB 1208MiB 400MiB primary linux-swap(v1) type=82 3 1208MiB 1708MiB 500MiB extended lba, type=0f # parted -s -a optimal /dev/sdb unit MiB mkpart logical 1208 1408 Warning: The resulting partition is not properly aligned for best performance. # parted -s /dev/sdb unit MiB print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sdb: 2048MiB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 8.00MiB 808MiB 800MiB primary ext2 type=83 2 808MiB 1208MiB 400MiB primary linux-swap(v1) type=82 3 1208MiB 1708MiB 500MiB extended lba, type=0f 5 1208MiB 1408MiB 200MiB logical type=83 # parted -s /dev/sdb unit B print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sdb: 2147483648B Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 8388608B 847249407B 838860800B primary ext2 type=83 2 847249408B 1266679807B 419430400B primary linux-swap(v1) type=82 3 1266679808B 1790967807B 524288000B extended lba, type=0f 5 1266680320B 1476395519B 209715200B logical type=83 # echo '1208 * 1024 * 1024' | bc -l 1266679808 # echo '1266680320 / 1024 / 1024' | bc -l 1208.00048828125000000000 # echo '1266680320 - 1266679808' | bc -l 512
I had specified the logical partition start with 1 MiB unit
at the same point where the extended partition starts
but parted knows that the first sector (i.e. the first 512 bytes)
of the extended partition cannot be used by the logical partition
so that parted automatically moved the actual logical partition start
by only 512 bytes which results an actual logical partition start
point
that is "not properly aligned for best performance"
(whatever that exactly means in parted's opinion - see next paragraph)
and - as far as I know - there should be an at least 63 sectors gap
(i.e. at least 63 * 512 bytes = 32256 bytes) gap between the
extended partition start and the first logical partition start.
I think when parted can make such a "Warning" message
cf.
http://blog.schlomo.schapiro.org/2015/04/warning-is-waste-of-my-time.html
parted must have some built-in "knowledge" what would be
"properly aligned for best performance" and then I wonder why
parted does not automatically align it "properly for best performance"
regardless that I had explicitly called parted with -a optimal
i.e. what the heck is parted's alignment type optimal
meant for?
When I manually specify the logical partition start with 1 MiB unit
at one MiB more from where the extended partition starts it works:
# parted -s /dev/sdb rm 5 # parted -s /dev/sdb unit MiB mkpart logical 1209 1408 # parted -s /dev/sdb unit MiB print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sdb: 2048MiB Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 8.00MiB 808MiB 800MiB primary ext2 type=83 2 808MiB 1208MiB 400MiB primary linux-swap(v1) type=82 3 1208MiB 1708MiB 500MiB extended lba, type=0f 5 1209MiB 1408MiB 199MiB logical type=83 # parted -s /dev/sdb unit B print Model: ATA QEMU HARDDISK (scsi) Disk /dev/sdb: 2147483648B Sector size (logical/physical): 512B/512B Partition Table: msdos Number Start End Size Type File system Flags 1 8388608B 847249407B 838860800B primary ext2 type=83 2 847249408B 1266679807B 419430400B primary linux-swap(v1) type=82 3 1266679808B 1790967807B 524288000B extended lba, type=0f 5 1267728384B 1476395007B 208666624B logical type=83 # echo '1267728384 / 1024 / 1024' | bc -l 1209.00000000000000000000
Lesson learned:
Do not rely on parted's automated alignment
(in particular do not rely on parted's optimal
alignment)
but calculate the right values manually and
use the right values in parted calls.
jsmeix commented at 2018-02-23 14:50:¶
I think
https://github.com/rear/rear/pull/1733#issuecomment-367680598
and
https://github.com/rear/rear/pull/1733#issuecomment-368028494
prove that "rear recover" at least in migration mode
can result a changed and badly aligned partitioning.
jsmeix commented at 2018-03-01 13:47:¶
With
https://github.com/rear/rear/pull/1733
merged
this issue should be (hopefully) fixed, cf.
https://github.com/rear/rear/pull/1733#issuecomment-369514406
But this means there is now a changed default behaviour
how ReaR behaves in migration mode when partitions can or must be
resized to fit on replacement disks with different size, cf. the merge
commit comment
https://github.com/rear/rear/commit/6414936ba30d6c13020eee8313e93a4e29debc54
[Export of Github issue for rear/rear.]