#1269 Issue closed: 400_autoresize_disks.sh incorrectly calculates $new_size with mawk (Debian)

Labels: enhancement, cleanup, fixed / solved / done, minor bug

gozora opened issue at 2017-03-28 18:10:

Relax-and-Recover (ReaR) Issue Template

  • rear version (/usr/sbin/rear -V): Relax-and-Recover 2.00 / Git
  • OS version (cat /etc/rear/os.conf or lsb_release -a): Debian GNU/Linux 8.7 (jessie)
  • rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf):
BACKUP=NETFS
OUTPUT=ISO

OUTPUT_URL=nfs://beta/mnt/rear/iso
BACKUP_URL=nfs://beta/mnt/rear

EXCLUDE_BACKUP=( ${EXCLUDE_BACKUP[@]} fs:/crash fs:/usr/sap fs:/oracle )
BACKUP_PROG_EXCLUDE=( ${BACKUP_PROG_EXCLUDE[@]} '/mnt/*' '/media/*' )

ISO_MKISOFS_BIN=/usr/bin/ebiso

GRUB_RESCUE=n

USE_STATIC_NETWORKING=y
NETWORKING_PREPARATION_COMMANDS=(ifconfig eth1 inet 192.168.0.23)
  • Are you using legacy BIOS or UEFI boot? UEFI
  • Brief description of the issue:
    During rear recover partition size is incorrectly calculated with mawk.
    Following code is executed by 400_autoresize_disks.sh
new_size=$(echo "$partition_size $resizeable_space $available_space" | awk '{ printf "%d", ($1/$2)*$3; }')

With gawk values are correct:

echo '8333033472 8333033472 39473643520' | awk '{ printf "%d", ($1/$2)*$3; }'
39473643520

With mawk however, values are maxed to 0xFFFFFFFF (2147483647)

echo '8333033472 8333033472 39473643520' | awk '{ printf "%d", ($1/$2)*$3; }'
2147483647
  • Work-around, if any:
    Change of awk conversion specification format:
- new_size=$(echo "$partition_size $resizeable_space $available_space" | awk '{ printf "%d", ($1/$2)*$3; }')
+ new_size=$(echo "$partition_size $resizeable_space $available_space" | awk '{ printf "%.0f", ($1/$2)*$3; }')

Or maybe use bash arithmetic expansion ?

- new_size=$(echo "$partition_size $resizeable_space $available_space" | awk '{ printf "%d", ($1/$2)*$3; }')
+ new_size=$(( ($partition_size/$resizeable_space)*$available_space ))

jsmeix commented at 2017-03-29 07:27:

I do very very much appreciate it whenever
needless external tool calls are replaced
by native bash 3.0 functionality to
Keep the Implementation Simple and Straightforward.

Furthermore I personally prefer when spaces
are used when possible to aid readability like

new_size=$(( ( $partition_size / $resizeable_space ) * $available_space ))

which at least helps my elderly eyes ;-)
cf. https://github.com/rear/rear/wiki/Coding-Style

For me with GNU bash version 3.2.51 on SLES11 32-bit
bash arithmetic evaluation and expansion works with your values

# partition_size=8333033472 ; resizeable_space=8333033472 ; available_space=39473643520 ; new_size=$(( ( $partition_size / $resizeable_space ) * $available_space )) ; echo $new_size
39473643520

According to
http://unix.stackexchange.com/questions/117280/what-is-the-rationale-for-the-bash-shell-not-warning-you-of-arithmetic-overflow
I tried what the maximum value is for my GNU bash version 3.2.51
on SLES11 32-bit and it is 2^63 - 1

# max=$(( 2**63 - 1 )) ; echo $max
9223372036854775807

# max=$(( 2**63 )) ; echo $max
-9223372036854775808

# echo '2^63 - 1' | bc -l
9223372036854775807

# echo '2^63' | bc -l
9223372036854775808

and I get same results with GNU bash version 4.2.47
on openSUSE Laep 42.1 64-bit:

# max=$(( 2**63 - 1 )) ; echo $max
9223372036854775807

# max=$(( 2**63 )) ; echo $max
-9223372036854775808

so that I conclude that in practice the maximum value for
bash arithmetic evaluation is 2^63 - 1 = 9223372036854775807
which is 8388608 GiB - 1 so that we can use bash arithmetic
for disks up to 8388607 GiB.

gdha commented at 2017-03-29 08:08:

@gozora choose the bash way 👍

gozora commented at 2017-03-29 08:52:

@gdha, @jsmeix

Thanks a lot for your inputs!
I'll prepare PR later today.

V.

jsmeix commented at 2017-03-29 09:00:

@gozora
I think in layout/prepare/GNU/Linux/100_include_partition_code.sh

    start=$( echo "$start" | awk '{printf "%u", $1+4096-($1%4096);}')

is also a place where mawk could fail because I think
the partition start values in ReaR use bytes as unit
so that on bigger disks it could overflow with mawk.

jsmeix commented at 2017-03-29 09:01:

@gozora
thanks a lot for your careful testing
that reveals those generic bugs in ReaR!

gozora commented at 2017-03-29 09:05:

I think in layout/prepare/GNU/Linux/100_include_partition_code.sh

start=$( echo "$start" | awk '{printf "%u", $1+4096-($1%4096);}')

is also a place where mawk could fail because I think
the partition start values in ReaR use bytes as unit
so that on bigger disks it could overflow with mawk.

Good catch @jsmeix, I'll rewrite this one as well.

thanks a lot for your careful testing
that reveals those generic bugs in ReaR!

No problem, it just somehow happens that I keep finding those bugs :-)

jsmeix commented at 2017-03-29 09:19:

I fear the whole

start=$( echo "$start" | awk '{printf "%u", $1+4096-($1%4096);}')

is plain wrong - or I do something wrong here:

# start=$(( 4096 * 1234 ))
# echo $start
5054464

# rounded_start=$( echo "$start" | awk '{printf "%u", $1+4096- $1%4096);}')
# echo $rounded_start
5058560

# echo $(( rounded_start / 4096 ))
1235

I.e. even for a exact matching start value, the current code
recalculates it to a new start + 4096 value.

I guess nobody notices a change of 4096 bytes in practice
but it would be a major bug in ReaR when it does not recreate
the partitioning exactly as it was before when possible.

If I am right the above is perhaps the cause of the mysterious
changes of partitioning after "rear recover" that you detected
during your tests with your BACKUP=BLOCKCLONE
in particular for Windows NTFS partitions, cf.
https://github.com/rear/rear/issues/1078#issuecomment-266099227

jsmeix commented at 2017-03-29 09:37:

The more I try to understand it the less I understand it.

I cannot reproduce it in practice on my original system:

# parted /dev/sda unit B print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sda: 21474836480B
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start        End           Size          Type     File system     Flags
 1      1048576B     1562378239B   1561329664B   primary  linux-swap(v1)  type=82
 2      1562378240B  21474836479B  19912458240B  primary  ext4            boot, type=83

and after "rear recover"

# parted /dev/sda unit B print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sda: 21474836480B
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start        End           Size          Type     File system     Flags
 1      1048576B     1562378239B   1561329664B   primary  linux-swap(v1)  type=83
 2      1562378240B  21474836479B  19912458240B  primary  ext4            boot, type=83

but when I try the current code with my /dev/sda2 partition
start value it calculates a wrong new start value:

# start=$(( 1048576 + 1561329664 )) ; echo $start ; echo $(( start / 4096 ))
1562378240
381440

# rounded_start=$( echo "$start" | awk '{printf "%u", $1+4096-($1%4096);}')

# echo $rounded_start ; echo $(( rounded_start / 4096 ))
1562382336
381441

Am I mad or what goes on here?

gozora commented at 2017-03-29 10:22:

Hehe, looks like a skeleton in the closet.
I'll try to check it as well ...

jsmeix commented at 2017-03-29 10:43:

@gozora
only a side note FYI if you work on 100_include_partition_code.sh
and do not like the hardcoded '4096' guess of
"most device's block size":
You may have a look at the current
USB_PARTITION_ALIGN_BLOCK_SIZE
implementation in 300_format_usb_disk.sh
cf. https://github.com/rear/rear/issues/1201
and https://github.com/rear/rear/pull/1217

jsmeix commented at 2017-03-29 13:48:

@gozora
cf. https://github.com/rear/rear/issues/1270
i.e. be careful with parted units when you like
to stay backward compatible with older parted.

jsmeix commented at 2017-03-30 08:50:

Hooray!
Even with GNU bash, version 3.1.17 on SLES10
bash arithmetic evaluation works up to 2^63 - 1
at least on x86_64:

# cat /etc/issue
Welcome to SUSE Linux Enterprise Server 10 SP4  (x86_64)

# bash --version
GNU bash, version 3.1.17(1)-release (x86_64-suse-linux)

# max=$(( 2**63 - 1 )) ; echo $max
9223372036854775807

# max=$(( 2**63 )) ; echo $max
-9223372036854775808

cf. https://github.com/rear/rear/issues/1269#issuecomment-290006467

Therefore we can use bytes 'B' as unit for parted
(cf. https://github.com/rear/rear/issues/1270)
up to disk sizes of 9223372036854775807 bytes
i.e. up to 8589934592 GiB minus one byte
or 838861 TiB minus one byte
or 8192 PiB minus one byte
or 8 EiB minus one byte
which looks sufficiently future-proof
(at least from my current point of view).

jsmeix commented at 2017-03-31 07:23:

With https://github.com/rear/rear/pull/1272 merged,
this particular issue is fixed.

Any "grand cleanup" of the code in 100_include_partition_code.sh
can be done later as time permits.

jsmeix commented at 2017-04-03 11:27:

An addendum FYI:
Also on a 32-bit SLES10 installation on a 32-bit virtual machine
(we did not find some real 32-bit hardware for that test)
with bash version 3.1.17 arithmetic evaluation works
up to 2^63 - 1

ProBackup-nl commented at 2017-04-16 20:33:

@jsmeix For GNU Bash 4.4.12 x86_64 the bash arithmetic resize code fails at least for each situation where $resizeable_space is larger than $partition_size.

( $partition_size / $resizeable_space )

$ echo $(( (2/3)*9999 ))
0

Failing workaround: Mimicking the example by first multiplying fails because bash is not able to correctly multiply 20G with 30G.

$ bash --version
GNU bash, version 3.2.53(1)-release (x86_64-apple-darwin13)
$ echo $(( 20971520000*30971520000 ))
3883808530565693440
$ bash --version
GNU bash, version 4.4.12(1)-release (x86_64-unknown-linux-gnu)
$ echo $(( 20971520000*30971520000 ))
3883808530565693440

jsmeix commented at 2017-04-26 09:40:

Summary from what we learned above
and in https://github.com/rear/rear/issues/1307

When a tool calculates correctly for numbers up to 2^N
one can in practice only use it for input numbers up to 2^(N/2)
when a single multiplication of such numbers should work
because 2^(N/2) * 2^(N/2) = 2^N

This means when bash can calculate up to 2^63 - 1
we can use in practice bash only for input numbers
up to something about 2^31 - 1.

Inputs of 2^32 - 1 do not work in general:

# echo '2^32 - 1' | bc
4294967295

# echo '4294967295 * 4294967295' | bc
18446744065119617025

# echo $(( 4294967295 * 4294967295 ))
-8589934591

Multiplication works in general up to 2^31 - 1

# echo '2^31 - 1' | bc
2147483647

# echo '2147483647 * 2147483647' | bc
4611686014132420609

# echo $(( 2147483647 * 2147483647 ))
4611686014132420609

Some bigger special cases may still work
like (2^31 - 1) * (2^32)

# echo '2^32' | bc
4294967296

# echo '2147483647 * 4294967296' | bc
9223372032559808512

# echo $(( 2147483647 * 4294967296 ))
9223372032559808512

but nothing more

# echo '2^31' | bc
2147483648

# echo '2147483648 * 4294967296' | bc
9223372036854775808

# echo $(( 2147483648 * 4294967296 ))
-9223372036854775808

jsmeix commented at 2017-04-26 11:56:

With https://github.com/rear/rear/pull/1332 merged
this issue should (hopefully) be finally solved.

ProBackup-nl commented at 2017-05-10 21:29:

@gozora My source OS (Arch Linux) has no bc tool installed (by default).

Despite all your effort for fixing the mawk (Mike Brennan's AWK speedup by using a bytecode compiler) issue, the result is that a broken awk now introduces an additional dependency for ReaR to work properly: bc.

I would love a solution that leaves the old code in place, and extend that for the cases where (m)awk will fail. For instance change 'awk' to explicitly use 'gawk'. And for the users that haven't got gawk installed, fall back to bc for math calculations.

gozora commented at 2017-05-11 06:33:

@ProBackup-nl how can we be sure that awk will be installed (by default) on your Arch next time?

V.

ProBackup-nl commented at 2017-05-11 07:41:

@gozora We can never be sure. However at the moment the gawk package is required by, for example:

jsmeix commented at 2017-05-11 08:28:

Simply put:
One cannot use any kind of 'awk' or any kind of
traditional tool for calculations nowadays.

All those tools have usually certain limitations
that hit us in this or that way when calculating big numbers.

For example using usual floating point arithmetik also leads
to wrong results when calculating big integer numbers.

We need a tool that is known to work for big numbers.

E.g. 2^100 + 1 - 2^100 must result 1 and never ever 0:

# echo '2^100 + 1 - 2^100' | bc -l
1

# awk 'BEGIN{print 2^52 + 1 - 2^52 }'
1

# awk 'BEGIN{print 2^53 + 1 - 2^53 }'  || echo fail
0

I do not even want to know why my 'awk' fails at 2^53
and not at things like 2^31 2^32 2^63 2^64 and why
my 'awk' even does not show its calculation failure
with a non-zero exit code or an error message.

All what matters is that tools like 'awk' cannot be used
because calculations just have to be correct - always.

Accordingly ReaR cannot be used on systems without
a tool that is known to work for big numbers.

FWIW:
I was 'awk' (more precisely 'gawk') package maintainer
some time ago and - guess what - every now and then
on the bug-gawk@gnu.org mailing list issues were reported
by this or that gawk user that "awk calculates wrong"
in this or that particular unexpected way.

ProBackup-nl commented at 2017-05-11 09:53:

A link to a 2013 bug submission by gratien....@gmail.com for the lack of large integer support in mawk / incorrect output of %d format and the quoted response:

This is a known limitation: mawk's format for %d is limited by the format.
The limitation is done to improve performance.

You can get more precision using one of the floating formats (and can construct
one which prints like a %d, e.g., by putting a ".0" on the end of the format).

Does ReaR need this kind of large integer precision here?
Or can the math in ReaR work with a floating point calculation that is converted to an integer?

gozora commented at 2017-05-11 10:22:

@ProBackup-nl using different conversion type was discussed already (see https://github.com/rear/rear/issues/1269#issue-217646048) and we decided to use bc instead ...

V.

ProBackup-nl commented at 2017-05-11 10:36:

In case the choice for bc decision will be reconsidered, floating point gawk returns the expected output for gratien's example:

# echo '26341277696 26341278720 47819259904' | gawk '{ printf "%.0f", ($1/$2)*$3; }'
47819258045
# gawk -V
GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

schlomo commented at 2017-05-11 12:51:

@ProBackup-nl ReaR will always require some software that might not be installed by default on some Linux distro. Therefore we check for it with REQUIRED_PROGS and also include those packages in the distro packages.

I therefore see absolutely no value in writing code that supplements bc instead of requiring bc to be present.

If you are an Archlinux user then I would kindly ask you to help ReaR by making sure that the Archlinuc package also includes such required tools.

Specifically to this point about gawk or bc I see that the current PKGBUILD file already includes a hard dependency on gawk which we could easily extend to also include bc:

depends=(lsb-release iproute2 parted cpio openssl gawk)

ProBackup-nl commented at 2017-05-11 18:17:

@schlomo As long as bc isn't an optional dependency, I don't feel any need.


[Export of Github issue for rear/rear.]