#1998 Issue closed: SLE15 on ppcl: mkfs.xfs fails if sunit > 0 but swidth = 0

Labels: support / question, fixed / solved / done, not ReaR / invalid

abbbi opened issue at 2018-12-06 13:34:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):

2.4

  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"):

SLES15 on PPC

Recover works until the system tries to create the filesystem via mkfs.xfs, then errors out with:

++++ date '+%Y-%m-%d %H:%M:%S.%N '
+++ local 'timestamp=2018-12-06 13:28:11.711452462 '
+++ test 1 -gt 0
+++ echo '2018-12-06 13:28:11.711452462 Creating filesystem of type xfs with mount point / on /dev/mapper/system-root.'
2018-12-06 13:28:11.711452462 Creating filesystem of type xfs with mount point / on /dev/mapper/system-root.
+++ Print 'Creating filesystem of type xfs with mount point / on /dev/mapper/system-root.'
+++ test 1
+++ echo -e 'Creating filesystem of type xfs with mount point / on /dev/mapper/system-root.'
+++ wipefs --all --force /dev/mapper/system-root
+++ mkfs.xfs -f -m uuid=dd0e1c7d-e0ff-46f3-bd58-76abfc46b284 -i size=512 -d agcount=16 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=64 -d swidth=0 -l version=2 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/system-root
data stripe width (0) must be a multiple of the data stripe unit (64)
Usage: mkfs.xfs
/* blocksize */         [-b size=num]
/* metadata */          [-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1,reflink=0|1]
/* data subvol */       [-d agcount=n,agsize=n,file,name=xxx,size=num,
                            (sunit=value,swidth=value|su=num,sw=num|noalign),
                            sectsize=num
/* force overwrite */   [-f]
/* inode size */        [-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,
                            projid32bit=0|1,sparse=0|1]
/* no discard */        [-K]
/* log subvol */        [-l agnum=n,internal,size=num,logdev=xxx,version=n
                            sunit=value|su=num,sectsize=num,lazy-count=0|1]
/* label */             [-L label (maximum 12 characters)]
/* naming */            [-n size=num,version=2|ci,ftype=0|1]
/* no-op info only */   [-N]
/* prototype file */    [-p fname]
/* quiet */             [-q]
/* realtime subvol */   [-r extsize=num,size=num,rtdev=xxx]
/* sectorsize */        [-s size=num]
/* version */           [-V]
                        devicename
<devicename> is required unless -d name=xxx is given.
<num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),

gozora commented at 2018-12-06 13:40:

Hello @abbbi,

Do you know whether this system was upgraded from an older release (SLES12, SLES11) or cleanly installed?
Can you provide the content of /var/lib/rear/layout/xfs?

Thanks

V.

abbbi commented at 2018-12-06 13:46:

problem is that swidth is set to 0:

-d sunit=64 -d swidth=0

data stripe width (0) must be a multiple of the data stripe unit (64)

working around by setting -d swidth=64 in diskrestore.sh

System was freshly installed from scratch.
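
The manual workaround above could be scripted roughly as follows. This is only a sketch: `fix_zero_swidth` is a hypothetical helper name, and setting swidth equal to sunit simply mirrors the swidth=64 edit described in this comment; whether that matches the real storage geometry is an assumption.

```shell
# Hypothetical helper, not part of ReaR: rewrite "-d sunit=N -d swidth=0"
# to "-d sunit=N -d swidth=N" in the given script.
# Assumption: swidth = sunit is acceptable for the storage in question.
# A backup of the original file is kept with a .bak suffix.
fix_zero_swidth() {
    local script="$1"
    sed -E -i.bak \
        's/-d sunit=([1-9][0-9]*) -d swidth=0( |$)/-d sunit=\1 -d swidth=\1\2/' \
        "$script"
}
```

Within the recovery system it would be called on the generated restore script, for example `fix_zero_swidth /var/lib/rear/layout/diskrestore.sh`, before the layout recreation is rerun.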

gozora commented at 2018-12-06 13:49:

what about /var/lib/rear/layout/xfs, can you provide it ?

V.

abbbi commented at 2018-12-06 13:54:

rear> more system-root.xfs
meta-data=/dev/mapper/system-root isize=512    agcount=16, agsize=589824 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=9437184, imaxpct=25
         =                       sunit=8      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=4608, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

gozora commented at 2018-12-06 13:59:

Thanks,
I'll try to reproduce this problem, maybe I've forgotten about yet another XFS special case in filesystems-functions.sh

V.

abbbi commented at 2018-12-06 14:05:

Another interesting thing is that it also tries to mount with the mount option -o sunit=64, so one has to remove this from the mount command line in diskrestore.sh as well. I ended up removing the -d sunit=64 -d swidth=0 options from the mkfs.xfs command and the mount option, and was then able to recover the system.

gozora commented at 2018-12-06 14:15:

Honestly I don't remember how all this XFS magic works, I'll need to take a closer look later today ...

V.

gozora commented at 2018-12-06 14:16:

Anyhow, if you could upload log created by rear -d -D recover it would help ...

V.

abbbi commented at 2018-12-06 14:28:

Unfortunately it was a customer system which I no longer have remote access to. I tried to reproduce on an in-house SLES15 system on PPC, and it seems the sunit=8 setting on the other system is already a difference. My XFS filesystem (created by the standard SLES15 installation) looks like:

meta-data=/dev/sda3              isize=512    agcount=4, agsize=404544 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=1618176, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Probably those are some SAP-related filesystem options.

abbbi commented at 2018-12-06 14:32:

These options are probably auto-detected by mkfs during filesystem creation. As the system uses multipath SAN disks only, I think the XFS tools might detect other options than on a regular filesystem:

http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance

Honestly, it's the first time I've seen these options used too :)

gozora commented at 2018-12-06 14:35:

Never mind.
It looks like XFS introduced some new features... Happens quite a lot.

Can you tell me what version of XFS is used on SLE15? (Unfortunately I'm still missing such a VM in my collection.)

V.

abbbi commented at 2018-12-06 14:36:

It's

mkfs.xfs version 4.15.0

But as far as I can see, filesystems-functions.sh already has some logic that takes care of things like this:

https://github.com/rear/rear/blob/master/usr/share/rear/lib/filesystems-functions.sh#L205

gozora commented at 2018-12-06 14:38:

> but as i can see the filesystem sh already has some logic caring about stuff like this:

Yes, that is my humble work ;-)

V.

jsmeix commented at 2018-12-06 14:51:

FYI:
On a simple SLES15 system (I use a KVM/QEMU virtual machine)
with a XFS root partition (no LVM) as the SLES15 YaST installer creates it
when I only switch in YaST from btrfs to XFS as filesystem
"rear recover" just works for me.

My disklayout.conf:

# Disk /dev/sda
# Format: disk <devname> <size(bytes)> <partition label type>
disk /dev/sda 21474836480 gpt
# Partitions on /dev/sda
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/sda 8388608 1048576 rear-noname bios_grub /dev/sda1
part /dev/sda 19316867072 9437184 rear-noname legacy_boot /dev/sda2
part /dev/sda 2148515328 19326304256 rear-noname swap /dev/sda3
# Filesystems (only ext2,ext3,ext4,vfat,xfs,reiserfs,btrfs are supported).
# Format: fs <device> <mountpoint> <fstype> [uuid=<uuid>] [label=<label>] [<attributes>]
fs /dev/sda2 / xfs uuid=877d5ff9-eee3-4fe4-a1c5-90ca58f02b5c label=  options=rw,relatime,attr2,inode64,noquota
# Swap partitions or swap files
# Format: swap <filename> uuid=<uuid> label=<label>
swap /dev/sda3 uuid=729a6343-f595-498b-ade6-27e6d16297a4 label=

My sda2.xfs

meta-data=/dev/sda2              isize=512    agcount=4, agsize=1179008 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=4716032, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

jsmeix commented at 2018-12-06 15:22:

@abbbi
is that system really a plain usual "SLES15 on PPC" i.e.
"SUSE Linux Enterprise Server 15" on PPC or is it perhaps
"SUSE Linux Enterprise Server for SAP Applications 15 GA",
e.g. see
https://www.suse.com/documentation/sles-for-sap-15/

I ask because what the YaST installer creates can be different
depending on the exact product so that e.g. the YaST installer
on "SUSE Linux Enterprise Server for SAP Applications 15 GA"
might create a XFS with some subtle special differences
compared to plain usual "SUSE Linux Enterprise Server 15".

Or was that particular XFS perhaps specifically set up
by your customer with those special XFS option settings?

abbbi commented at 2018-12-06 15:28:

Can't say for sure at the moment. It was running SAP HANA, so I think chances are high it's the SAP4HANA release. All I have left at the moment is the os.conf as sourced in the ReaR backup log:

> 2018-12-06 11:05:35.336033347 Entering debugscripts mode via 'set -x'.
> + source /etc/rear/os.conf
> ++ OS_VENDOR=SUSE_LINUX
> ++ OS_VERSION=15
> ++ ARCH=Linux-ppc64le
> ++ OS=GNU/Linux
> ++ OS_VERSION=15
> ++ OS_VENDOR=SUSE_LINUX
> ++ OS_VENDOR_VERSION=SUSE_LINUX/15
> ++ OS_VENDOR_ARCH=SUSE_LINUX/ppc64le
>

gozora commented at 2018-12-06 17:08:

ReaR XFS code must work regardless of Linux distribution.
I guess that some new features/rules emerged that must be included in ReaR; the XFS guys are quite progressive when it comes to implementing new stuff.
I'm currently installing SLE15 and will suspend my work on EFISTUB for a while to look into this issue in the coming days, hope it will not be too much work :-).

V.

abbbi commented at 2018-12-06 19:29:

I'm not sure you will hit this problem with a regular SLE15 installation. I have various SLES15 systems around that do not have an XFS formatted with -d sunit=8 -d swidth=0.

I cannot even format a volume using these options to get an xfs_info output that causes the problem:

sles15ppcl:~ # mkfs.xfs -d sunit=8 -d swidth=0 ./myfile
data stripe width (0) must be a multiple of the data stripe unit (64)
[..]

I'm not quite sure how it was even possible to format it that way.

One thing I could imagine is that the volume was created too small and then resized with xfs_growfs, so that the width information output by xfs_info doesn't match anymore. Or that the multipath configuration does something strange, resulting in an xfs_info output that makes no sense.

gozora commented at 2018-12-06 19:42:

> im not quite sure how it was even possible to format it in that way.

:-) I'm exactly at the same spot right now... Can't create filesystem where sunit > 0 and swidth == 0

> One thing i could imagine is that the volume was created too small and then has been resized with xfs_growfs so that the width information output by xfs_info doesnt match anymore.

I'm afraid that this is not possible either, because if you specify sunit > 0 you must specify swidth as well, and swidth must be a multiple of sunit (and can't be 0).
This is why I was initially asking whether the system was upgraded :-), because there is a chance that an older version of xfsprogs was used for formatting (which in theory could have allowed such a combination).
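
The rule described here can be written down as a small check. This is a sketch under the assumption that the only constraints are the ones named in this thread (both values zero, or both non-zero with swidth a multiple of sunit); `stripe_geometry_ok` is a hypothetical name and this is not the actual xfsprogs validation code.

```shell
# Hypothetical check, not taken from xfsprogs or ReaR.
# Returns 0 if the sunit/swidth pair follows the rule above, 1 otherwise.
stripe_geometry_ok() {
    local sunit="$1" swidth="$2"
    # Both zero: no stripe alignment requested at all.
    if [ "$sunit" -eq 0 ] && [ "$swidth" -eq 0 ]; then
        return 0
    fi
    # Exactly one of the two is zero: invalid combination.
    if [ "$sunit" -eq 0 ] || [ "$swidth" -eq 0 ]; then
        return 1
    fi
    # Both non-zero: swidth has to be a multiple of sunit.
    [ $(( swidth % sunit )) -eq 0 ]
}
```

With the values from this issue, `stripe_geometry_ok 64 0` fails, while `stripe_geometry_ok 64 64` passes.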

V.

abbbi commented at 2018-12-06 21:19:

Not sure about the upgrade; looking at the xfsprogs source, this check has been around since 2002.
But the git logs also show some things about misalignment reported by certain hardware devices, resulting in width values not being right. At least it smells like this (Power, multipath, SAN ...).

One recent commit from xfsprogs:

commit 99a3f52e337d9c785315e64a599755bc6f7b2118
Author: Jeff Mahoney <jeffm@suse.com>
Date:   Wed Aug 1 17:06:45 2018 -0500

    mkfs: avoid divide-by-zero when hardware reports optimal i/o size as 0

    Commit 051b4e37f5e (mkfs: factor AG alignment) factored out the
    AG alignment code into a separate function.  It got rid of
    redundant checks for dswidth != 0 since calc_stripe_factors was
    supposed to guarantee that if dsunit is non-zero dswidth will be
    as well.  Unfortunately, there's hardware out there that reports its
    optimal i/o size as larger than the maximum i/o size, which the kernel
    treats as broken and zeros out the optimal i/o size.

Not sure whether we hit an issue in XFS here.
It seems the version shipped in SLES15 still has this issue, as it was introduced in 2017 and fixed in 2018. Not sure if SUSE has backported this fix. It's too late to stare at more git logs now :)

I have asked in #xfs on freenode, let's see if I get a response.

abbbi commented at 2018-12-06 21:58:

response from #xfs

22:53 <dchinner_> the problem described in that commit was that mkfs crashed because devide by zero, not that it made a filesystem with swidth = 0
22:54 <dchinner_> before that, mkfs would ignore invalid stripe configs where dswidth was zero
22:55 <abi`> hm, okey. I still cant imagine how one is able to format with sunit=8,swidth=0.. so i guess it was some auto-detection thing that went wrong.
22:55 <dchinner_> I doubt we'll ever know what caused swidth=0 to be considered valid
22:56 <dchinner_> it's a CRC enabled fs, so it had to be an XFS tool that set it that way
22:56 <dchinner_> I can create the situation manually by editting the superblock with xfs_db
22:57 <dchinner_> but I cannot make any of the tools or mount options cause it
22:59 <dchinner_> and, well, I can't mount a filesystem with that config, either
22:59 <dchinner_> the superblock verifier fires and detects an invalid stripe unit config
23:00 <dchinner_> one the first read during mount
23:00 <dchinner_> what kernel does SLES15 run?
23:01 <abi`> 4.12.14-23-default
23:02 <dchinner_> yeah, Ok, I added the superblock stripe config verification in 4.18
23:17 <sandeen> dchinner_, abi` there was also 14c57d50 mkfs.xfs: if either sunit or swidth is nonzero, the other must be as well

abbbi commented at 2018-12-06 22:17:

here is how to reproduce this situation. The filesystem is only mountable by older kernels such as on sles15 which do not check swidth/sunit parameters during mount. This should help to make REAR at least handle such situations gracefully :)

dd if=/dev/zero of=XFS bs=1M count=50
mkfs.xfs XFS
sles15ppclefix:~ # xfs_db -x XFS 
xfs_db> sb
xfs_db> write unit 8
unit = 8
xfs_db> quit
sles15ppclefix:~ # mount -o loop XFS /mnt
sles15ppclefix:~ # xfs_info /mnt/
meta-data=/dev/loop0             isize=512    agcount=2, agsize=6400 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=12800, imaxpct=25
         =                       sunit=8      swidth=0 blks
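
A saved xfs_info output can be checked for this situation before attempting a recovery. The following is a minimal sketch that assumes the output format shown in this issue (a data-section line containing both `sunit=` and `swidth=`); the function names are hypothetical and nothing here is part of ReaR.

```shell
# Hypothetical helpers, not part of ReaR.
# Read xfs_info output on stdin and print "<sunit> <swidth>" of the data section.
# The log section line has sunit= but no swidth=, so it does not match.
xfs_data_stripe() {
    sed -n 's/.*sunit=\([0-9][0-9]*\) *swidth=\([0-9][0-9]*\) blks.*/\1 \2/p' | head -n 1
}

# Succeed (return 0) if stdin describes a filesystem with sunit > 0 but swidth = 0.
has_inconsistent_stripe() {
    local geometry sunit swidth
    geometry=$(xfs_data_stripe)
    [ -n "$geometry" ] || return 1
    sunit=${geometry% *}
    swidth=${geometry#* }
    [ "$sunit" -gt 0 ] && [ "$swidth" -eq 0 ]
}
```

For example, `xfs_info /mnt | has_inconsistent_stripe && echo "sunit > 0 but swidth = 0"` on the original system, or the same check fed from a file under /var/lib/rear/layout/xfs such as system-root.xfs.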

gozora commented at 2018-12-07 07:43:

Hello @abbbi,

From what I've understood from your conversation with David, sunit=8 swidth=0 can be considered incorrect and we can only guess how/why it happened.

At first I thought that the ReaR XFS code had incorrectly parsed the stored xfs_info output, but I was wrong. ReaR handled the situation correctly and tried to recreate the filesystem as it originally was. This behavior attracted your attention and you found out that something was not right, so you could start further investigation into "why XFS changed its behavior yet another time" :-). At the same time ReaR allowed a reasonable fallback (by editing diskrestore.sh) and let you decide what the correct values for your XFS filesystem are.
In general this is how I intended this code to behave.
Graceful handling of XFS filesystem creation was already present in ReaR, where we just blindly ran mkfs.xfs <device> without explicitly handling filesystem options. This led us to quite complex issues. The one mentioned includes a couple of other issues, describing various ways things can go wrong if you use graceful handling of XFS filesystem creation.

I've assigned the label Discuss / RFC because this topic can be widely and endlessly discussed :-) and other people might have other opinions ...
And I've assigned not ReaR / Invalid because of the reasons mentioned previously in this comment, but it can be changed during discussion ;-).

In either way, if you have ideas how current code can be improved don't hesitate to share them.

V.

schlomo commented at 2018-12-07 07:50:

I still don't understand if this is a transitioning problem or a permanent problem. Or is the problem that after the xfsprogs update the output of xfs_info contains data (size 0) that is not accepted by the mkfs.xfs program?

gozora commented at 2018-12-07 08:04:

Hello @schlomo,
We don't know either; in an ideal world sunit=8 swidth=0 blks should not be possible ...

V.

gozora commented at 2018-12-07 08:07:

In theory the mkfs.xfs called during installation could have allowed such an option combination, and it could later have been upgraded to a newer version which does not allow it any more. But this is pure speculation ...

V.

jsmeix commented at 2018-12-07 10:22:

@gozora
I do fully agree with your reasoning in
https://github.com/rear/rear/issues/1998#issuecomment-445149675

By the way (cf. "by the way" below):

We already had some other issues where "rear recover" revealed
that something was severely broken on the original system.

For an example see my
https://github.com/rear/rear/issues/1907#issuecomment-434218293
which actually belongs to https://github.com/rear/rear/issues/1927
that reads (excerpt)

By setting up a disaster recovery procedure with ReaR
and by verifying that 'rear recover' actually works to
recreate the original system on replacement hardware
one gets "by the way" some kind of confirmation that
the basic setup of the original system is o.k. because
when the basic setup of the original system is broken
it is likely that then it is not possible to recreate such
a broken original system on replacement hardware.

Simply put:
Run 'rear mkbackup' on your original system
and 'rear recover' on your replacement hardware
to verify that your original system is still o.k.

In general when someone likes to use ReaR for disaster recovery
he must set up his original system so that ReaR can recreate it
but not the other way round because that might only work by luck,
cf. sections like "Disaster recovery does not just work" and
"Let's face it: Deployment via the recovery installer is a must" and
"The limitation is what the special ReaR recovery system can do" in
https://en.opensuse.org/SDB:Disaster_Recovery

That SDB article is referenced in our SLE-HA manuals, e.g. cf.
https://www.suse.com/documentation/sle-ha-15/book_sleha_guide/data/sec_ha_rear_more.html
and our SLE-HA manuals also tell very clearly that the user
must verify in advance that his disaster recovery setup does
actually work in his particular environment, cf.
https://www.suse.com/documentation/sle-ha-15/book_sleha_guide/data/sec_ha_rear_testing.html

Users who do not do that are not supported,
at least not officially supported by SUSE, cf. the sections
"Help and Support" "Feasible in advance" "Hopeless in retrospect" in
https://en.opensuse.org/SDB:Disaster_Recovery

But of course we try to help them here at
ReaR upstream on a voluntary basis - and even more:

ReaR is by design prepared to let an experienced admin even deal with such
unexpected issues during "rear recover" to be able to successfully recreate
his system with some manual intervention even in problematic cases,
cf. "Disaster recovery with Relax-and-Recover (ReaR)" in
https://en.opensuse.org/SDB:Disaster_Recovery
that reads:

Experienced users and system admins can
adapt or extend the ReaR scripts to make it work
for their particular cases and - if the worst comes
to the worst - even temporary quick and dirty
workarounds are relatively easily possible. 

Bottom line:
I think everything worked perfectly well as it should here and
this issue could be rightfully closed as "perfectly well done".

gdha commented at 2018-12-07 10:31:

@abbbi Is this not a perfect case for a short FAQ? We would like to share it on our ReaR site - see https://github.com/rear/rear.github.com/blob/master/documentation/faq.md (perhaps prepare a PR to make our life easier ;-)

jsmeix commented at 2018-12-07 10:37:

( finally I won my fight against markdown ;-)

abbbi commented at 2018-12-07 12:29:

@gozora for me it's a weird case and, as we have seen, not a problem in ReaR. But if you look closely, there seems to be something messed up in the ReaR part too. The xfs_info output prints

sunit=8

but rear tries to format with:

-d sunit=64

Not sure if that is an error caused by the fact that swidth=0, but it's worth a look anyway,
so that we don't overlook some other issue here :)

gozora commented at 2018-12-07 12:37:

@abbbi it is a feature ;-)
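if you specify mkfs.xfs -d sunit=64 you will get sunit=8 in the xfs_info output (mkfs.xfs takes sunit in 512-byte sectors, while xfs_info reports it in filesystem blocks, here 4096 bytes).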
It was great pleasure to write this code I can tell you ;-).
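
The unit difference behind this can be sketched as a one-line conversion: xfs_info reports sunit in filesystem blocks (bsize=4096 here), while `mkfs.xfs -d sunit=` expects 512-byte sectors. The function name below is hypothetical and the snippet is an illustration, not a copy of the code in filesystems-functions.sh.

```shell
# Hypothetical helper, not taken from ReaR.
# Convert a stripe value reported by xfs_info (in filesystem blocks)
# into the 512-byte sector units expected by "mkfs.xfs -d sunit=...".
xfs_info_blocks_to_mkfs_sectors() {
    local blocks="$1" bsize="$2"
    echo $(( blocks * bsize / 512 ))
}

xfs_info_blocks_to_mkfs_sectors 8 4096   # prints 64
```

So sunit=8 at bsize=4096 is 8 * 4096 / 512 = 64 sectors, which is exactly the value seen on the mkfs.xfs command line in this issue.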

V.

jsmeix commented at 2018-12-14 13:29:

@gozora
regarding "ideas how current code can be improved" in your
https://github.com/rear/rear/issues/1998#issuecomment-445149675

I have one and I like to do a pull request as soon as time permits
so that you could have a look.

Hint:
It will be about "final power to the user" or in other words:
Make @abbbi 's life a bit easier...

jsmeix commented at 2018-12-14 14:55:

@gozora
please have a look at
https://github.com/rear/rear/pull/2005

jsmeix commented at 2018-12-18 13:55:

With https://github.com/rear/rear/pull/2005 merged
it should be a bit easier to get things back to work
when mkfs.xfs failed within the recovery system, cf.
https://github.com/rear/rear/pull/2005#issuecomment-447799986

abbbi commented at 2021-06-09 11:39:

Additional note, because today this problem struck again at another customer, again on PowerPC with SLES15.

The introduced MKFS_XFS_OPTIONS property helps during creation of the filesystems, but one has to make sure that the mount options which are passed to the volume are the right ones too. It might happen that the mount options include the sunit=64 option too, resulting in mount issues:

mount -o rw,noatime,nodiratime,attr2,nobarrier,inode64,sunit=64,noquota /dev/mapper/vol /mnt/local/vol
mount: /dev/mapper/vol: wrong fs type, bad option, bad superblock on [...]

So the disklayout.conf has to be edited before recovery too, to get things working again.
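
One way to clean up such an option string is sketched below. It assumes that only the `sunit=` and `swidth=` entries need to be dropped from the comma-separated list; `strip_stripe_mount_options` is a hypothetical helper, not something ReaR provides.

```shell
# Hypothetical helper, not part of ReaR.
# Remove sunit=... and swidth=... entries from a comma-separated mount option list.
strip_stripe_mount_options() {
    printf '%s\n' "$1" \
        | tr ',' '\n' \
        | grep -v -E '^(sunit|swidth)=' \
        | paste -s -d ',' -
}
```

Applied to the options from the failing mount command, `strip_stripe_mount_options 'rw,noatime,nodiratime,attr2,nobarrier,inode64,sunit=64,noquota'` yields the same list without sunit=64, which can then be put into the options= field of the affected fs line in disklayout.conf.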


[Export of Github issue for rear/rear.]