#2903 PR merged: Protect against colons in pvdisplay output

Labels: enhancement, fixed / solved / done

pcahyna opened issue at 2023-01-02 15:12:

Pull Request Details:
  • Type: Bug Fix

  • Impact: Normal

  • Reference to related issue (URL): fixes #1958

  • How was this pull request tested?

    • full back-up and recovery on RHEL 8.7 / x86_64 (BIOS) with SATA disk, LVM in default configuration
    • full back-up and recovery on RHEL 8.7 / x86_64 (BIOS) with SATA disk, LVM configured to show /dev/disk/by-path device symlinks instead of the usual /dev/sd* device nodes by using this line in /etc/lvm/lvm.conf:
      filter = [ "r|/dev/disk/by-path/.*-usb-|", "a|/dev/disk/by-path/pci-.*-nvme-|", "a|/dev/disk/by-path/pci-.*-scsi-|", "a|/dev/disk/by-path/pci-.*-ata-|", "a|/dev/disk/by-path/pci-.*-sas-|", "a|loop|", "r|.*|" ].
      This case breaks with the original code already during rear savelayout and is fixed by this change.
  • Brief description of the changes in this pull request:

LVM can be configured to show device names under /dev/disk/by-path in command output. These names often contain colons that separate fields like channel and target (for example /dev/disk/by-path/pci-*-scsi-0:0:1:0-*, similarly the pci-* part, which contains colon-separated PCI bus and device numbers). Since the "pvdisplay -c" output also uses colons as field separators and does not escape embedded colons in any way, embedded colons break parsing of this output.

As a fix, use the pipe character '|' as the field separator in pvdisplay output. (This would break if a PV device has a '|' in its name, but this is very much less likely than having a ':' .)

I did a survey of how similar problems are handled in other tools. Some Solaris tools (ipadm, dladm, zoneadm) have a special switch (-p) to request machine-parseable format. This format has colon-separated fields, and embedded colons are escaped by backslash to allow parsing by the shell "read" command (this escaping is the key functionality missing in the pvdisplay command). zfs has a -H option which uses an alternative parseable format: in addition to omitting headers it separates the fields by a single TAB character. This would be in principle ideal for our purpose (TABs are unlikely to occur in the values), but in our case the output is processed via an unquoted "echo" command, which would not preserve the TAB character. (This is needed to remove the two leading spaces in the output, not sure what their purpose is.) I believe that using | as the separator is the best choice in our situation.

Also, configure explicitly what fields to output - "pvdisplay -c" prints many fields, but I have not found documentation about what fields is it using exactly, so one had to guess what the output means. Using "pvdisplay -C" and selecting the fields explicitly is much clearer.

This also changes the PV size field to match documentation, the comment says that size is in bytes, but it actually was not in bytes. As nothing is actually using the PV size field, this inconsistency has not caused any problem in practice, and no code needs adjusting for the change.

Fix provided by a customer in https://bugzilla.redhat.com/show_bug.cgi?id=2091163 .

jsmeix commented at 2023-01-02 16:09:

@pcahyna
while you are at that code
and when you like to do it:

I think when we have '|' as field separator
the whole parsing could be simplified and
made more straightforward as follows

# { echo '  /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0-part1|system|107340627968|7wwpcO-KmNN-qsTE-7sp7-JBJS-vBdC-Zyt1W7' ; \
    echo '  /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0-part2 | system2 |  207340627962|2wwpcO-KmNN-qsTE-7sp7-JBJS-vBdC-Zyt1W2  ' ; } \
  | tr -d ' ' \
  | while IFS='|' read pdev vgrp size uuid ; \
    do echo -e "\npdev='$pdev'\nvgrp='$vgrp'\nsize='$size'\nuuid='$uuid'\n" ; \
    done

pdev='/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0-part1'
vgrp='system'
size='107340627968'
uuid='7wwpcO-KmNN-qsTE-7sp7-JBJS-vBdC-Zyt1W7'


pdev='/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:1:0-part2'
vgrp='system2'
size='207340627962'
uuid='2wwpcO-KmNN-qsTE-7sp7-JBJS-vBdC-Zyt1W2'

I remove all space characters via tr -d ' '
because space is the field separator in disklayout.conf
according to "Disk layout file syntax" in
https://github.com/rear/rear/blob/master/doc/user-guide/06-layout-configuration.adoc
so spaces in values are forbidden anyway.

The

... | while IFS='|' read ... ; do ...

sets the non-standard IFS value only for the 'read' command
but it keeps the standard IFS value (space, tab, newline)
for the commands in the 'while' body:

# echo 'foo|bar' | while IFS='|' read a b ; do echo "a=$a" ; echo -n "IFS=$IFS" | od -a ; echo "b=$b" ; done

a=foo
0000000   I   F   S   =  sp  ht  nl
0000007
b=bar

pcahyna commented at 2023-01-04 15:32:

Hello @jsmeix , thank you for the suggestion, but I must say I am reluctant to make such changes to the layout code, if there is no bug to fix (except the bug fixed by the PR in the current form) .

jsmeix commented at 2023-01-05 09:42:

@pcahyna
it is perfectly fine with me to only fix the actual issue here.

I had meant it only as a suggestion for further improvement.
Perhaps I may do that when I have to work on that code again.


[Export of Github issue for rear/rear.]