#1706 PR merged: Again support partition names with blanks issues 212 and 1563

Labels: enhancement, bug, documentation, cleanup, fixed / solved / done

jsmeix opened issue at 2018-01-24 15:35:

Implemeted two new general useful functions percent_encode() and percent_decode()
but currently percent_encode() fails for me on SLES11-SP4 with GNU bash version 3.2.57
in case of UTF-8 encoded strings (hopefully nobody uses a UTF-8 encoded GPT partition name)
cf. https://github.com/rear/rear/issues/1563#issuecomment-359797990

Those functions are used to again support GPT partition names with blanks and
re-implemented some quoting hell to provide what parted command line calls need
according to
https://github.com/rear/rear/issues/1563#issuecomment-359784102

By the way I additionally
replaced is_numeric() with hopefully new more generally usable functions
is_integer() and is_positive_integer() that also return a useful return code
and I better documented what read_and_strip_file() actually does
(it also skips lines with leading spaces) and let it return a useful return code
to be more generally usable.

Currently it is not tested.
I did the pull request nevertheless so that you can have an early look.

jsmeix commented at 2018-01-25 15:42:

@gdha
many thanks for testing it!

Also for me it works well.
Details:
On my original SLES11-SP4 system I have

# parted -l

Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sda: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start   End     Size    Type     File system     Flags
 1      1049kB  1571MB  1570MB  primary  linux-swap(v1)  type=82
 2      1571MB  21.5GB  19.9GB  primary  ext3            boot, type=83

Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2147MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name                Flags
 1      10.5MB  21.0MB  10.5MB  ext2         my first partition
 2      31.5MB  41.9MB  10.5MB  ext2         my 2. +% partition

# rear mkbackup

# grep -v '^#' var/lib/rear/layout/disklayout.conf
disk /dev/sda 21474836480 msdos
part /dev/sda 1569718272 1048576 primary none /dev/sda1
part /dev/sda 19904069632 1570766848 primary boot /dev/sda2
disk /dev/sdb 2147483648 gpt
part /dev/sdb 10485760 10485760 my%20first%20partition none /dev/sdb1
part /dev/sdb 10485760 31457280 my%202.%20%2B%25%20partition none /dev/sdb2
fs /dev/sda2 / ext3 uuid=de51dda4-8a5d-4206-b5ee-6eaa9e88bec3 label= blocksize=4096 reserved_blocks=4% max_mounts=-1 check_interval=0d bytes_per_inode=16370 options=rw,acl,user_xattr
fs /dev/sdb1 /sdb1 ext2 uuid=26465ea0-43d1-4f10-9d27-c832c0ad46a3 label= blocksize=1024 reserved_blocks=5% max_mounts=37 check_interval=180d bytes_per_inode=4096 options=rw
fs /dev/sdb2 /sdb2 ext2 uuid=89372717-fb2e-49ce-b4e0-e8629e0b5c76 label= blocksize=1024 reserved_blocks=5% max_mounts=23 check_interval=180d bytes_per_inode=4096 options=rw
swap /dev/sda1 uuid=a0a21610-d949-4b43-9a8c-3db8de0a2b0c label=

After "rear recover" I got on the recreated system

 # parted -l

Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sda: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Number  Start   End     Size    Type     File system     Flags
 1      2097kB  1572MB  1570MB  primary  linux-swap(v1)  type=83
 2      1572MB  21.5GB  19.9GB  primary  ext3            boot, type=83

Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2147MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name                Flags
 1      2097kB  12.6MB  10.5MB  ext2         my first partition
 2      12.6MB  23.1MB  10.5MB  ext2         my 2. +% partition

Same GPT partition names on sdb but the partitions on sdb moved.
I assume that happened because during "rear recover" it printed

Ambiguous possible target disks need manual configuration (more than one with same size found)
Switching to manual disk layout configuration

i.e. I got MIGRATION_MODE=true and I accepted the subsequent dialogs "as is".
I was wondering why it detected "Ambiguous possible target disks"
because on both systems there is a 20GiB sda and a 2 GiB sdb
so that there should not be "Ambiguous possible target disks".
Looking at the code in layout/prepare/default/250_compare_disks.sh shows

    # MIGRATION_MODE is needed when more than one possible target disk exists for a disk on the original system:
    if test "$found_orig_size_on_replacement_hardware" -gt 1 ; then
        LogPrint "Ambiguous possible target disks need manual configuration (more than one with same size found)"
        MIGRATION_MODE='true'
    fi

so that there is a (minor) issue in 250_compare_disks.sh
(that is not related to this issue here) that somehow 250_compare_disks.sh
detects more than one possible target disks with same size
even if actually there is only one target disk with same size.
I guess this happens because the same disks can be detected
several times under different "names" or something like that.
But the good news is that 250_compare_disks.sh bahaves
on the safe side:
Better too often detect "Ambiguous possible target disks"
than to blindly proceed and be sorry when it is too late.

jsmeix commented at 2018-01-25 15:45:

If no objections appear I will merge it tomorrow.

jsmeix commented at 2018-01-26 15:46:

FYI
how it fails for me on SLES12 with UTF-8 encoded GPT partition names:

On the original system I did

# parted -s /dev/sdb mklabel gpt

# parted -s /dev/sdb unit MiB mkpart "'my first partition'" 10 20

# part_name="$( echo -en "UTF-8 name bin\0303\0244r" )"

# echo "'$part_name'"
'UTF-8 name binär'

# parted -s /dev/sdb unit MiB mkpart "'$part_name'" 30 40

# parted -s /dev/sdb unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2048MiB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 
Number  Start    End      Size     File system  Name                Flags
 1      10.0MiB  20.0MiB  10.0MiB               my first partition
 2      30.0MiB  40.0MiB  10.0MiB               UTF-8 name binär

# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

# mkfs.ext4 /dev/sdb1

# mkfs.ext4 /dev/sdb2

# mkdir /sdb1

# mkdir /sdb2

# mount /dev/sdb1 /sdb1

# mount /dev/sdb2 /sdb2

# usr/sbin/rear -D mkbackup
...

# grep -v '^#' ./var/lib/rear/layout/disklayout.conf
disk /dev/sda 21474836480 msdos
part /dev/sda 1561329664 1048576 primary none /dev/sda1
part /dev/sda 19912458240 1562378240 primary boot /dev/sda2
disk /dev/sdb 2147483648 gpt
part /dev/sdb 10485760 10485760 my%20first%20partition none /dev/sdb1
part /dev/sdb 10485760 31457280 UTF-8%20name%20bin%C3%A4r none /dev/sdb2
fs /dev/sda2 / ext4 uuid=46d7e8be-7812-49d1-8d24-e25ed0589e94 label= blocksize=4096 reserved_blocks=5% max_mounts=-1 check_interval=0d bytes_per_inode=16377 default_mount_options=user_xattr,acl options=rw,relatime,data=ordered
fs /dev/sdb1 /sdb1 ext4 uuid=df776ff0-007b-437c-be99-97a2f6ab8463 label= blocksize=1024 reserved_blocks=5% max_mounts=-1 check_interval=0d bytes_per_inode=4096 default_mount_options=user_xattr,acl options=rw,relatime,data=ordered
fs /dev/sdb2 /sdb2 ext4 uuid=645e5656-acd1-4235-9f79-9e21612e77ed label= blocksize=1024 reserved_blocks=5% max_mounts=-1 check_interval=0d bytes_per_inode=4096 default_mount_options=user_xattr,acl options=rw,relatime,data=ordered
swap /dev/sda1 uuid=28e43119-dac1-4426-a71a-1d70b26d33d7 label=

The percent-encoded GPT partition name UTF-8%20name%20bin%C3%A4r
looks correct, nevertheless it fails during "rear recover" as follows:

RESCUE e205:~ # rear -D recover
...
Comparing disks
Ambiguous possible target disks need manual configuration (more than one with same size found)
Switching to manual disk layout configuration
Using /dev/sda (same name and same size) for recreating /dev/sda
Using /dev/sdb (same name and same size) for recreating /dev/sdb
Current disk mapping table (source -> target):
    /dev/sda /dev/sda
    /dev/sdb /dev/sdb
...
Start system layout restoration.
Creating partitions for disk /dev/sda (msdos)
Creating partitions for disk /dev/sdb (gpt)
UserInput -I LAYOUT_CODE_RUN needed in /usr/share/rear/layout/recreate/default/200_run_layout_code.sh line 80
The disk layout recreation script failed
1) Rerun disk recreation script (/var/lib/rear/layout/diskrestore.sh)
2) View 'rear recover' log file (/var/log/rear/rear-e205.log)
3) Edit disk recreation script (/var/lib/rear/layout/diskrestore.sh)
4) View original disk space usage (/var/lib/rear/layout/config/df.txt)
5) Use Relax-and-Recover shell and return back to here
6) Abort 'rear recover'
(default '1' timeout 300 seconds)
2
...
+++ parted -s /dev/sdb mkpart ''\''UTF-8 name bin<C3><A4>r'\''' 12587008B 23072767B
Error during translation: Invalid or incomplete multibyte or wide character
...
3) Edit disk recreation script (/var/lib/rear/layout/diskrestore.sh)
...
5) Use Relax-and-Recover shell and return back to here
...
rear> diff -U0 /var/lib/rear/layout/diskrestore.sh.orig /var/lib/rear/layout/diskrestore.sh     
--- /var/lib/rear/layout/diskrestore.sh.orig    2018-01-26 15:19:03.510868089 +0000
+++ /var/lib/rear/layout/diskrestore.sh 2018-01-26 15:19:43.270868089 +0000
@@ -114 +114 @@
-parted -s /dev/sdb mkpart "'UTF-8 name binär'" 12587008B 23072767B >&2
+parted -s /dev/sdb mkpart "'UTF-8 name binaer'" 12587008B 23072767B >&2
@@ -117 +117 @@
-parted -s /dev/sdb name 2 "'UTF-8 name binär'" >&2
+parted -s /dev/sdb name 2 "'UTF-8 name binaer'" >&2
rear> exit
...
1) Rerun disk recreation script (/var/lib/rear/layout/diskrestore.sh)
...
User reruns disk recreation script
Start system layout restoration.
Skipping /dev/sda (disk) as it has already been created.
Skipping /dev/sda1 (part) as it has already been created.
Skipping /dev/sda2 (part) as it has already been created.
Creating partitions for disk /dev/sdb (gpt)
Creating filesystem of type ext4 with mount point / on /dev/sda2.
Mounting filesystem /
Creating filesystem of type ext4 with mount point /sdb1 on /dev/sdb1.
Mounting filesystem /sdb1
Creating filesystem of type ext4 with mount point /sdb2 on /dev/sdb2.
Mounting filesystem /sdb2
Creating swap on /dev/sda1
Disk layout created.
Restoring from '/tmp/rear.qw7mAFK5MsOmLKr/outputfs/e205/backup.tar.gz'...
Backup restore program 'tar' started in subshell (PID=3640)
Restored 245 MiB [avg. 83823 KiB/sec] 
...
Finished recovering your system. You can explore it under '/mnt/local'.
...
RESCUE e205:~ # reboot

I.e. I replaced in diskrestore.sh the UTF-8 encoded 'binär' by the ASCII 'binaer'
and then "rear recover" works.
In the recreated system I have

# parted -s /dev/sdb unit MiB print
Model: ATA QEMU HARDDISK (scsi)
Disk /dev/sdb: 2048MiB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 
Number  Start    End      Size     File system  Name                Flags
 1      2.00MiB  12.0MiB  10.0MiB  ext4         my first partition
 2      12.0MiB  22.0MiB  10.0MiB  ext4         UTF-8 name binaer

Same GPT partition sizes on sdb but the partitions
on sdb were moved as explained above in
https://github.com/rear/rear/pull/1706#issuecomment-360505133

OliverO2 commented at 2018-01-31 18:48:

You probably get this message

+++ parted -s /dev/sdb mkpart ''\''UTF-8 name bin<C3><A4>r'\''' 12587008B 23072767B
Error during translation: Invalid or incomplete multibyte or wide character

as a result of not setting the locale during recovery. Parted tries to convert your partition name into a wide-character string. If the locale (more specifically, the LC_CTYPE category) is set to C, your argument is an invalid string which cannot be converted.

Could you try again by setting LC_CTYPE on your recovery system to the same value that was used during rear mkrescue and make sure the respective locale files are present during recovery?

jsmeix commented at 2018-02-01 11:18:

@OliverO2
I also assume the reason is the inappropriate locale when running parted cf.
https://github.com/rear/rear/issues/1563#issuecomment-361557086

Note that current ReaR uses the POSIX/C locale in any case
(because that locale is set in any case in usr/sbin/rear)
i.e. both during "rear mkrescue" and during "rear recover".

This even might mean that - strictly speaking - there might be
a bug in parted if parted in POSIX/C locale would not output
non-ASCII characters in a GPT partition name in an "appropriate way"
so that parted's own GPT partition name output in POSIX/C locale
could be later used "as is" as input for parted in POSIX/C locale
to (re)-create a partition what that same GPT partition name.

This pull request is done because its intent was only
to again support partition names with blanks but
not to support partition names with non-ASCII characters.

I filed a new issue "Implement support for UTF-8 encoded values in ReaR"
https://github.com/rear/rear/issues/1721

OliverO2 commented at 2018-02-01 12:27:

@jsmeix

This even might mean that - strictly speaking - there might be
a bug in parted if parted in POSIX/C locale would not output
non-ASCII characters in a GPT partition name in an "appropriate way"

Don't know if conforming to the POSIX Portable Character Set standard should be called a bug, but at least one could wish for such a plain ASCII-encoded output.

Actually, there seems to be a real bug in parted with respect to character set translations: It uses the C language's wide character type to convert arguments into GPT partition names (which should be UTF-16 encoded). Quoting Wikipedia: "The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers."

jsmeix commented at 2018-02-01 13:24:

@OliverO2
I do not understand what you like to say regarding
the POSIX Portable Character Set because
I think that is even a sub-set of ASCII according to
https://en.wikipedia.org/wiki/Portable_character_set
and the "Reference" listed therein?

OliverO2 commented at 2018-02-01 13:54:

@jsmeix
It's just that POSIX allows, but does not require support for characters outside the portable character set in the "C" locale. Parted uses the POSIX-conformant standard library function mbsrtowcs(3) so its actual behavior depends of the respective standard library implementation. You could still call this limitation a but do you have to?

Cf. Behavior of extended bytes/characters in C/POSIX locale - Stack Overflow

jsmeix commented at 2018-02-09 14:52:

@OliverO2
many thanks for the link
https://stackoverflow.com/questions/15649165/behavior-of-extended-bytes-characters-in-c-posix-locale
that reads (excerpt)

C and POSIX both require only a very limited set
of characters be present in the C/POSIX locale,
but allow additional characters to exist.
This leaves a great deal of freedom to the
implementation; for instance, supporting
all of Unicode (as UTF-8) in the C locale
is conforming behavior.

Now I understand that it is POSIX conforming behavior
when a tool (e.g. parted) that is run in POSIX locale
outputs non-ASCII characters e.g. in UTF-8 encoding
(or in anything else - whatever each particular tool likes).

Welcome to the "warm feeling" in encoding hell ;-)


[Export of Github issue for rear/rear.]