#2943 PR merged: s390x (IBM Z) disk formatting fixes

Labels: enhancement, bug, cleanup, fixed / solved / done

pcahyna opened issue at 2023-02-21 20:17:

Pull Request Details:
  • Type: Bug Fix

  • Impact: High

  • Reference to related issue (URL):

  • How was this pull request tested?
    Full backup and recovery on RHEL 8 s390x VM with 4 DASDs: 2 in the root VG, one for storing the backup and one that was not included in the backup (ONLY_INCLUDE_VG=( rootvg ) was set). Verified that the extra VG and the disk with the backup survived recovery and that DASDs were properly mapped during recovery (in a situation where the correct disk mapping is not an identity map).

  • Brief description of the changes in this pull request:
    The s390x architecture needs some unique handling of disks (called DASDs - Direct Access Storage Devices) during layout restoration. The DASDs need to be enabled, otherwise they are not recognized as block devices, and then to get formatted (at low level), unless they are formatted already, before they can be partitioned. The current code to do this suffers from several problems.
    Formatting of the disks is controlled by the dasdfmt directive in the disklayout.conf file. This directive is emitted for all dasds, regardless of whether they should be included in the layout. While there is component exclusion code that comments out the "disk" lines for unused disks automatically, the dasdfmt lines for unused disks stay in the layout file, which later causes the unused disks to be reformatted unconditionally. Crucially, this includes the disk with the backup (one must use network backup storage because of this). Moreover, when the user is prompted to confirm the disk layout file and layout recreation script, it is too late, because the formatting code is unconditionally executed very early, before the rest of processing of the layout, and is not part of the disklayout.sh script.
    As the formatting code runs early, it is also not affected by disk mappings, so when the device names change, formatting will be applied to wrong disks.
    Another problem is that obtaining the disk entry from the lsdasd command output uses a simple grep command. Unfortunately, this breaks for more than 26 DASDs, because grep dasda will also match dasdaa, dasdab etc., so multiple lines will pass through the filter instead of just one and break the format of the disklayout.conf file.

This PR does the following changes:

  • Simplify syntax of dasd_channel directive. The previous syntax had some useless fields, like the major:minor number and status.
  • Fix obtaining the entry from lsdasd output
  • Eliminate the "dasdfmt" directive, merge its fields into the "disk" directive. This has the advantage that the exclusion code eliminates the "disk" lines for unused disks automatically and thus prevents them from being reformatted (in addition to preventing them from being repartitioned). Unfortunately it means that the format of the "disk" directive now depends on the disk label type: dasd disks have extra fields that other disk types don't have.
  • Generate the DASD formatting code as a separate script and let the user confirm it before it gets executed. Heavily inspired by the disk wiping code (it is similarly destructive), but executed during the "layout/prepare" stage and not during the "layout/recreate" stage. The reason is that unformatted DASDs do not have their size in bytes known (it depends on format), but other scripts in the "layout/prepare" stage need to know the disk sizes (e.g. for resizing partitions). The script is generated after the mapping code has mapped original to current disk devices, ensuring that we format the correct DASDs. Reformatting can be switched off entirely by setting the FORMAT_DASDS variable to a false value.
  • The heuristics in disk comparison and mapping script will now be less accurate, because it won't know the current sizes if disks are not formatted. Since the DASDs have permanent identifiers (virtual device number) that can be used to reliably identify disks, pass this information to the mapping code in a variable called DISK_MAPPING_HINTS and use them there in preference to the usual device name and size comparison.
  • Since DASD enabling and DASD formatting code are now separate and the former is non-destructive, run it also for "rear mountonly". This fixes "rear mountonly" on s390x.

pcahyna commented at 2023-02-21 23:14:

@mutable-dan I believe you are the original author of the S/390 code - can you please have a look?

jsmeix commented at 2023-02-22 09:25:

WOW!
Looks like a severe improvement.

@pcahyna
I have a question because you wrote

The heuristics in disk comparison and mapping script
will now be less accurate

Will it be less accurate only in case of DASD disks
or will it be less accurate in general?

My fear is that a possibly disastrous automated mapping
might happen when it is less accurate in general.

jsmeix commented at 2023-02-22 09:35:

Currently (i.e. since ReaR 2.6) there is only

Initial preliminary first basic support for IBM Z
...
so that interested users can try out early
how far things work

see
https://github.com/rear/rear/blob/rear-2.7/doc/rear-release-notes.txt#L1520

@pcahyna
do I understand it correctly that this pull request
enhances ReaR towards official basic support for IBM Z
that is limited to DASD disks and perhaps some other
limitations?

pcahyna commented at 2023-02-22 10:10:

WOW! Looks like a severe improvement.

@pcahyna I have a question because you wrote

The heuristics in disk comparison and mapping script
will now be less accurate

Will it be less accurate only in case of DASD disks or will it be less accurate in general?

My fear is that a possibly disastrous automated mapping might happen when it is less accurate in general.

Only in case of DASDs, because the loss of accuracy results from the loss of size information on unformatted DASDs at this point, due to low-level formatting being performed after mapping. This is not an issue for disks that do not need low-level formatting.

pcahyna commented at 2023-02-22 10:18:

Currently (i.e. since ReaR 2.6) there is only

Initial preliminary first basic support for IBM Z
...
so that interested users can try out early
how far things work

see https://github.com/rear/rear/blob/rear-2.7/doc/rear-release-notes.txt#L1520

@pcahyna do I understand it correctly that this pull request enhances ReaR towards official basic support for IBM Z that is limited to DASD disks and perhaps some other limitations?

It is limited to virtual machines using z/VM hypervisors, I have no idea whether the code would work in a LPAR or under KVM.

Concerning disks, the other choice of disks on IBM Z are SCSI disks accessed using Fibre Channel (FCP). My limited understanding is that the SCSI disks do not need any special handling compared to SCSI disks on other architectures, so they might "just work", but I have not tested them.

pcahyna commented at 2023-02-22 10:58:

And yes, with these changes I would not consider ReaR support "preliminary" anymore (on the supported configurations).

schlomo commented at 2023-02-28 21:46:

We have too many open PRs that could be merged IMHO, so I'll just go ahead and merge this now.

@pcahyna maybe you can follow this up with some documentation changes to update the support status of s390x?

pcahyna commented at 2023-03-14 13:37:

@schlomo thanks, sorry for my belated reply. As @jsmeix said - the scripts to confirm and rerun the generated DASD format scrips are adapted from existing scripts, especially in the case of 400_run_dasd_format_code.sh. I have not tested it in entirety - I believe have I tested editing and confirming the script, but not rerunning it in case it has failed. I trust this part because it was adapted from prior code with only minimal changes. That's why I have not followed your style suggestions: they would increase the difference from the original script and thus increase the risk of introducing a bug.

Ideally all the boilerplate (the state machine) should be shared and only the messages, file names etc. would change, but I don't see how to do that easily, especially in shell.

schlomo commented at 2023-03-14 13:53:

Yes, I totally understand that. Let's hope that there will be more users testing this and provide us feedback


[Export of Github issue for rear/rear.]