#2927 Issue closed: SUSE 15.4 layout recreation fails due to missing chattr binary

Labels: bug, cleanup, fixed / solved / done

schlomo opened issue at 2023-02-14 07:04:

ReaR 2.7 on Leap 15.4 x86_64

image

I first thoght that this is another example for #2922, and that at least on SUSE we should add chattr to REQUIRED_PROGS IMHO, in https://github.com/rear/rear/blob/master/usr/share/rear/conf/SUSE_LINUX.conf to be specific.

OTOH, this might be a different problem. I checked for chattr in the layout save code:

$ grep -r chattr usr/share/rear/layout/
usr/share/rear/layout/prepare/GNU/Linux/136_include_btrfs_subvolumes_SLES_code.sh:            echo "chattr +C $target_system_mountpoint/$subvolume_path"
usr/share/rear/layout/prepare/GNU/Linux/135_include_btrfs_subvolumes_generic_code.sh:            echo "chattr +C '$target_subvolume_path'"
usr/share/rear/layout/save/default/550_barrel_devicegraph.sh:#  lsattr chattr
usr/share/rear/layout/save/default/550_barrel_devicegraph.sh:         lsattr chattr
usr/share/rear/layout/save/GNU/Linux/230_filesystem_layout.sh:                # The 'no copy on write' attribute is shown as 'C' in the lsattr output (see "man chattr"):
usr/share/rear/layout/save/GNU/Linux/230_filesystem_layout.sh:                        PROGS+=( chattr lsattr )

And it seems like in 136_include_btrfs_subvolumes_SLES_code.sh there is code to add chattr calls but in 230_filesystem_layout.sh there is code to actually add chattr to PROGS. I guess that those two things should be better coupled, so that it cannot happen that chattr is used but not added to the rescue system. Or add it by default.

@jsmeix what would be the right solution here?

jsmeix commented at 2023-02-14 11:14:

There is no such thing as a "right solution"
when dealing with the SUSE default btrfs structure ;-)

I will have a look...

On first glance I think in
layout/save/GNU/Linux/230_filesystem_layout.sh

PROGS+=( chattr lsattr )

should be

REQUIRED_PROGS+=( chattr )
PROGS+=( lsattr )

Can it be the case that you use SLE with its
default btrfs structure but your etc/rear/local.conf
is not based on a matching SLE template like
usr/share/rear/conf/examples/SLE12-SP2-btrfs-example.conf
https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/conf/examples/SLE12-SP2-btrfs-example.conf#L48
If yes, you will run into various problems.
See also the SUSE documentation at
https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/cha-ha-rear.html
therein in the section
"Limitations with Btrfs"

Your SLE system needs matching ReaR configuration

For example, the setups in SLE12 GA, SLE12 SP1,
and SLE12 SP2 have several incompatible Btrfs default structures.
As such, it is crucial to use a matching ReaR configuration file.
See the example files
/usr/share/rear/conf/examples/SLE12*-btrfs-example.conf.

FYI:
Since several years recreating the SUSE default btrfs structure
had worked without issues and I also got no customer reports.
But I know that code is more complex than I can fully understand
so there are certainly gaps in the code in this or that cases.

jsmeix commented at 2023-02-14 11:25:

I assume it is fixed via
https://github.com/rear/rear/commit/64e9228e9178e717534526ba8ccd9a51af0cb4a8

I don't want to overhaul the whole code that
deals with the SUSE specific btrfs structure.
More precisely:
I cannot overhaul it with reasonable effort.
I fear if I change it at one place it falls apart at another place.

schlomo commented at 2023-02-14 12:16:

I don't think that this solves the problem because chattr was available on the source system but it was not included in the rescue image. Therefore this code path was not used at all during mkrescue.

Also, let's keep it open till I can validate.

jsmeix commented at 2023-02-14 13:42:

Show me how it failes in your specific case.
I cannot reproduce it.
For me it works.

schlomo commented at 2023-02-14 14:55:

The fix didn't solve the problem, chattr is still missing.

rear-rear-sle15sp4.log

Running exit tasks
To remove the build area you may use (with caution): rm -Rf --one-file-system /var/tmp/rear.JeuiqOKrMeXMTPD
rear-sle15sp4:/src/rear # type -p chattr 
/usr/bin/chattr
rear-sle15sp4:/src/rear # grep chattr /var/log/rear/rear-rear-sle15sp4.log
rear-sle15sp4:/src/rear # find /var/tmp/rear.JeuiqOKrMeXMTPD -name chattr
rear-sle15sp4:/src/rear # cat /etc/fstab 
UUID=5f0f1233-512d-4f15-8686-606257e81b64  /                       btrfs  defaults                      0  0
UUID=5f0f1233-512d-4f15-8686-606257e81b64  /var                    btrfs  subvol=/@/var                 0  0
UUID=5f0f1233-512d-4f15-8686-606257e81b64  /usr/local              btrfs  subvol=/@/usr/local           0  0
UUID=5f0f1233-512d-4f15-8686-606257e81b64  /tmp                    btrfs  subvol=/@/tmp                 0  0
UUID=5f0f1233-512d-4f15-8686-606257e81b64  /srv                    btrfs  subvol=/@/srv                 0  0
UUID=5f0f1233-512d-4f15-8686-606257e81b64  /root                   btrfs  subvol=/@/root                0  0
UUID=5f0f1233-512d-4f15-8686-606257e81b64  /opt                    btrfs  subvol=/@/opt                 0  0
UUID=5f0f1233-512d-4f15-8686-606257e81b64  /home                   btrfs  subvol=/@/home                0  0
UUID=5f0f1233-512d-4f15-8686-606257e81b64  /boot/grub2/x86_64-efi  btrfs  subvol=/@/boot/grub2/x86_64-efi  0  0
UUID=5f0f1233-512d-4f15-8686-606257e81b64  /boot/grub2/i386-pc     btrfs  subvol=/@/boot/grub2/i386-pc  0  0
rear-sle15sp4:/src/rear # lsb_release -a
LSB Version:    n/a
Distributor ID: openSUSE
Description:    openSUSE Leap 15.4
Release:        15.4
Codename:       n/a

In the disklayout.conf I find this:

# Mounted btrfs subvolumes that have the 'no copy on write' attribute set.
# Format: btrfsnocopyonwrite <btrfs_subvolume_path>
btrfsnocopyonwrite @/var

Adding PROGS+=( ... ) like that probably doesn't work because it is in a subshell that ends on line 539 so that this assignment has no effect on ReaR.

I'd suggest being more generic here:

  1. If BTRFS is in use, then add chattr and lsattr to REQUIRED_PROGS
  2. If SUSE_LINUX, then add chattr and lsattr to REQUIRED_PROGS

BTW, we actually check for REQUIRED_PRODS in the init stage which happens very early. Adding stuff to that array after the init stage only ensures that it is also in the rescue system, but doesn't check in the source system.

If you want to get hands-on then I can provide SSH access for you to look at this.

schlomo commented at 2023-02-16 16:39:

@jsmeix ?

jsmeix commented at 2023-02-17 09:09:

Be patient...
(fist things first and one thing at a time).

The "workaround" (i.e. the currently documented solution) is
to use a local.conf that is based on a matching SLE template.

schlomo commented at 2023-02-17 09:42:

Good point with the examples, especially as we also refer to them in the default local.conf.

However, they are only about SLE12 and not SLE15 and I'd hope that we can cover "known" issues like including chattr and other progs already via our code to make it simpler for users.

The other aspect is that we can't change global variables in a subshell so that your fix won't work like that:
https://github.com/rear/rear/blob/46d4dd3fe03595cc7b969db66d6ab1a881fcb6eb/usr/share/rear/layout/save/GNU/Linux/230_filesystem_layout.sh#L505-L540

The entire - very long - block of code runs in a subshell writing the stdout to $DISKLAYOUT_FILE- hence variable assignments are lost at the end of this subshell.

jsmeix commented at 2023-02-22 13:40:

Argh!

@schlomo
thank you for finding the root cause!

Long code parts within uncommon environments (e.g. a subshell)
call for trouble when someone "just adds usual code" in between
without looking at the global stuff.

I think that subshell usage is legacy code because
a subshell is not needed to have a redirection
for a longer piece of code, cf.
https://github.com/rear/rear/blob/rear-2.7/usr/share/rear/layout/save/GNU/Linux/190_opaldisk_layout.sh#L24

jsmeix commented at 2023-02-22 13:46:

We use ) >> $DISKLAYOUT_FILE there:

usr/share/rear/layout/save/GNU/Linux/230_filesystem_layout.sh:
) >> $DISKLAYOUT_FILE

usr/share/rear/layout/save/GNU/Linux/240_swaps_layout.sh:
) >> $DISKLAYOUT_FILE

usr/share/rear/layout/save/GNU/Linux/200_partition_layout.sh:
) >> $DISKLAYOUT_FILE

I think I can replace in all those cases

(
    longer
    block
    of
    code
) >> $DISKLAYOUT_FILE

by

{
    longer
    block
    of
    code
} >> $DISKLAYOUT_FILE

which would - by the way - also fix this issue.

But I will have to have a bit closer look
if perhaps an actual subshell is really needed
in some of those cases.

jsmeix commented at 2023-02-27 13:25:

Even more Argh!

ShellCheck found it:

In usr/share/rear/layout/save/GNU/Linux/230_filesystem_layout.sh line 505:
                        REQUIRED_PROGS+=( chattr )
                        ^------------^ SC2030 (info): Modification of REQUIRED_PROGS is local (to subshell caused by (..) group).

See in
https://github.com/rear/rear/pull/2847#issuecomment-1446253316
the list of most often reported ShellCheck codes
where this one is the very last one
but it is just an 'info' - so "don't worry".
Sigh!

But at least it proves my
https://github.com/rear/rear/issues/1040#issuecomment-1034890880

the SCnnnn that appear rarely may more likely indicate real errors

which is confirmed by @pcahyna in his subsequent comment and my
https://github.com/rear/rear/issues/1040#issuecomment-1060704029

jsmeix commented at 2023-02-27 13:36:

For the (not so) fun of it:

ShellCheck also found

In usr/share/rear/layout/save/GNU/Linux/230_filesystem_layout.sh line 546:
grep -q '^fs ' $DISKLAYOUT_FILE && REQUIRED_PROGS+=( mkfs )
                                   ^------------^ SC2031 (info): REQUIRED_PROGS was modified in a subshell. That change might be lost.

which is the last but one in the list in
https://github.com/rear/rear/pull/2847#issuecomment-1446253316

As far as I see this one is plain wrong by ShellCheck
because line 546 is already outside of the subshell
so this one seems to be a parsing bug in ShellCheck
(it seems to fail to recognize the end of the subshell).

I fail to understand what
https://github.com/koalaman/shellcheck/wiki/SC2031
likes to tell.

I understand what SC2030 means
but I do not understand what SC2031 means.
Whether or not ARRAY had been modified in a subshell
has no influence on ARRAY+=( foo ) outside of a subshell
because the latter will append 'foo' to the ARRAY.

Regardless what SC2031 means,
https://github.com/rear/rear/pull/2947
will fix both SC2030 and SC2031.

thomasmerz commented at 2023-03-01 08:56:

ShellCheck found it:

This prooves how important some code-checking would and will be ❤️

jsmeix commented at 2023-03-01 09:15:

@thomasmerz
for me this proves how carefully and thorougly
we must implement our automated ShellCheck
to get the right balance between usefulness
and annoyance for us and our contributors.

E.g. in this case here:
ShellCheck reported the bug only as 'info'
but we cannot enable 'info' in general
because most what ShellCheck reports as 'info'
is an annoyance for us and our contributors.

Unfortunately '-S error --include=...' does not work
to report errors plus explicitly specified non-error codes

# shellcheck -s bash -S error --include=SC2030 usr/share/rear/layout/save/GNU/Linux/230_filesystem_layout.sh
[no output]

jsmeix commented at 2023-03-01 09:54:

With
https://github.com/rear/rear/pull/2947
merged this issue should be fixed.

@schlomo
thanks again for finding the root cause!
Likely I would have searched elsewhere for some longer time.


[Export of Github issue for rear/rear.]