#1714 Issue closed: Inaccurate btrfs example ?

Labels: enhancement, documentation, fixed / solved / done, minor bug

gozora opened issue at 2018-01-27 11:03:

Hello @jsmeix,
in SLE12-SP1-btrfs-example.conf and SLE12-SP2-btrfs-example.conf

# Files in btrfs subvolumes are excluded by 'tar --one-file-system'
# so that such files must be explicitly included to be in the backup.
# Files in the following SLE12-SP1 and SP2 default btrfs subvolumes are
# in the below example not included to be in the backup
#   /.snapshots/*  /var/crash/*
# but files in /home/* are included to be in the backup.
# You may use a command like
#   findmnt -n -r -t btrfs | cut -d ' ' -f 1 | grep -v '^/$' | egrep -v 'snapshots|crash' | sed -e "s/$/\/*'/" -e "s/^/'/" | tr '\n' ' '

As I'm far from being btrfs expert, I've used yours findmnt command without problems until now.
Let me illustrate the problem on following example:
I have /home as btrfs subvolume which is included via:

BACKUP_PROG_INCLUDE=( '/usr/local/*' '/boot/grub2/i386-pc/*' '/var/spool/*' '/var/lib/libvirt/images/*' '/var/lib/machines/*' '/var/lib/mailman/*' '/var/log/*' '/tmp/*' '/var/opt/*' '/var/lib/pgsql/*' '/var/cache/*' '/var/lib/named/*' '/opt/*' '/home/*' '/var/tmp/*' '/srv/*' '/boot/grub2/x86_64-efi/*' '/var/lib/mariadb/*' '/var/lib/mysql/*' )

With BACKUP_PROG_INCLUDE set like this and content of /home as follows:

sp2:~ # ls -alR /home
/home:
total 8
drwxr-xr-x 1 root root  52 Jan 27 11:18 .
drwxr-xr-x 1 root root 154 Dec 12 16:57 ..
-rw-r--r-- 1 root root  26 Jan 27 10:27 .password
-rw-r--r-- 1 root root  20 Jan 27 10:28 placeholder
drwxr-xr-x 1 root root  24 Jan 27 11:18 secret

/home/secret:
total 4
drwxr-xr-x 1 root root 24 Jan 27 11:18 .
drwxr-xr-x 1 root root 52 Jan 27 11:18 ..
-rw-r--r-- 1 root root 20 Jan 27 11:18 .secret_file

/home/.password will not be backed up because /home/* does not represent files that beginning with dot.
Not sure if this is a big deal or not and how many people used this example but in my opinion this could backfire in some scenarios.

btw, I did some tests and end up with following configuration directive working fine:

BACKUP_PROG_INCLUDE=( $(findmnt -n -r -t btrfs | cut -d ' ' -f 1 | grep -v '^/$' | egrep -v 'snapshots|crash' ) )

So I have two questions:

  1. Why did you go the extra mile with findmnt ... | sed -e "s/$/\/*'/" -e "s/^/'/" | tr '\n' ' ' and did not directly used findmnt ... command to expand to BACKUP_PROG_INCLUDE ?
  2. Don't you think that asterisk (*) should be removed from examples as it omits files that begin with dot in subvolume root?

V.

jsmeix commented at 2018-01-29 10:52:

The main important part of that stuff is that it is only a comment

# You may use a command like
#   findmnt -n -r -t btrfs ...
# to generate the values

where the "you may use [something] like" should indicate that the user
should care what is actually the right one for his particular needs.

Nevertheless such a comment should not mislead the user
and I agree to improve it.

I cannot remember why I used the /directory/* form
and not simply /directory/ or even only /directory.
I guess at that time I was somehow confused
by tar's own internal pattern matching, cf.
https://www.gnu.org/software/tar/manual/tar.html#SEC113
that reads in particular

These two pairs of member lists are used
in the following operations:
`--diff', `--extract', `--list', `--update'.

There are no inclusion members in
create mode (`--create' and `--append'),
since in this mode the names obtained
from the command line refer to files,
not archive members.

versus bash file name globbing that seems to be applied
according to how tar is called in
backup/NETFS/default/500_make_backup.sh

... -c -f - $(cat $TMP_DIR/backup-include.txt) ...

so that the contents of the backup-include.txt file
(which are basically the elements of the BACKUP_PROG_INCLUDE array)
appear 'as is' on the tar command line so that the
BACKUP_PROG_INCLUDE array values get evaluated
by bash file name globbing (with "nullglob" set in usr/sbin/rear
which can have sometimes unexpected "interesting effects"
but here ist seems o.k.) - as far as I currently understand the code.

On my SLE10, SLE11, and SLE12 systems
tar behaves consistent as follows:

# mkdir /foo

# echo foo >/foo/foo

# echo '.foo' >/foo/.foo

# mkdir /foobar

# echo foobar >/foobar/foobar

# echo '.foobar' >/foobar/.foobar

# find /foo*
/foo
/foo/foo
/foo/.foo
/foobar
/foobar/foobar
/foobar/.foobar

# ( echo '/foo/*' >/tmp/files ; set -x ; tar -cv $( cat /tmp/files ) | tar -tv )
++ cat /tmp/files
+ tar -cv /foo/foo
tar: Removing leading `/' from member names
+ tar -tv
/foo/foo
-rw-r--r-- root/root         4 2018-01-29 11:06:25 foo/foo

# ( echo '/foo/' >/tmp/files ; set -x ; tar -cv $( cat /tmp/files ) | tar -tv )
++ cat /tmp/files
+ tar -cv /foo/
tar: Removing leading `/' from member names
+ tar -tv
/foo/
/foo/foo
/foo/.foo
drwxr-xr-x root/root         0 2018-01-29 11:06:32 foo/
-rw-r--r-- root/root         4 2018-01-29 11:06:25 foo/foo
-rw-r--r-- root/root         5 2018-01-29 11:06:32 foo/.foo

# ( echo '/foo' >/tmp/files ; set -x ; tar -cv $( cat /tmp/files ) | tar -tv )
++ cat /tmp/files
+ tar -cv /foo
tar: Removing leading `/' from member names
+ tar -tv
/foo/
/foo/foo
/foo/.foo
drwxr-xr-x root/root         0 2018-01-29 11:06:32 foo/
-rw-r--r-- root/root         4 2018-01-29 11:06:25 foo/foo
-rw-r--r-- root/root         5 2018-01-29 11:06:32 foo/.foo

# ( echo '/foo*' >/tmp/files ; set -x ; tar -cv $( cat /tmp/files ) | tar -tv )
++ cat /tmp/files
+ tar -cv /foo /foobar
tar: Removing leading `/' from member names
+ tar -tv
/foo/
/foo/foo
/foo/.foo
/foobar/
/foobar/foobar
/foobar/.foobar
drwxr-xr-x root/root         0 2018-01-29 11:06:32 foo/
-rw-r--r-- root/root         4 2018-01-29 11:06:25 foo/foo
-rw-r--r-- root/root         5 2018-01-29 11:06:32 foo/.foo
drwxr-xr-x root/root         0 2018-01-29 11:06:53 foobar/
-rw-r--r-- root/root         7 2018-01-29 11:06:45 foobar/foobar
-rw-r--r-- root/root         8 2018-01-29 11:06:53 foobar/.foobar

Because it seems tar works as expected with plain directory names
I change the comment accordingly.

But I am against autogenerated entries in the
BACKUP_PROG_INCLUDE array by default
which would get all currently existing btrfs subvolumes
automatically into the backup because there are
certain tools that generate whatever btrfs subvolumes,
e.g. docker cf.
https://github.com/rear/rear/issues/1496

... I'd like to convince ReaR to just ignore anything
that looks like a docker-generated subvolume
(snapshot or not). AFAIK, these btrfs docker-trees
cannot be safely restored anyway in a consistent fashion.

Therefore having automatically all existing btrfs subvolumes
in the backup could suddenly blow up the backup by tons of
unexpected/unwanted stuff.

It is my assumption that usually the user prefers
to specify which btrfs subvolumes he actually wants
to have in his backup (i.e. "whitelisting").

For example in the ...-btrfs-example.conf files /tmp and /var/tmp
are listed in the BACKUP_PROG_INCLUDE example array
but the user must decide if he really wants to have them
in his backup.

gozora commented at 2018-01-29 11:11:

Hello @jsmeix

Thanks for comprehensive answer!

But I am against autogenerated entries in the
BACKUP_PROG_INCLUDE array by default
which would get all currently existing btrfs subvolumes
automatically into the backup because there are
certain tools that generate whatever btrfs subvolumes,
e.g. docker cf.

I was just wondering if there is a technical problem with such automatic approach. If only concern here is right backup content that would make my life easier ;-).
I have currently task to somehow automate btrfs subvolume includes into ReaR, because explicitly setting includes for each and every subvolume causes headaches when managed across > 1000 servers.
Using BACKUP_PROG_INCLUDE=( findmnt ...) directly, is a good starting point how this could be solved.

Thanks!

V.

jsmeix commented at 2018-01-29 11:37:

There is no technical problem with such automatic approach
because basically all files in ReaR are bash scripts.
E.g. also /usr/share/rear/conf/default.conf and /etc/rear/local.conf
are sourced (and executed) as bash scripts.

But there is the problem that nobody can fully imagine
how particular things in a bash sctipt actually evaluate
on a real system - unless one tries it out ;-)

Cf. the comment in default.conf:

# Some variables are actually bash arrays and should be treated with care.
# Use VAR=() to set an empty array.
# Use VAR=( "${VAR[@]}" 'value' ) to add a fixed value to an array.
# Use VAR=( "${VAR[@]}" "$var" ) to add a variable value to an array.
# Whether or not the latter case works as intended depends on when and
# how "$var" is set and evaluated by the Relax-and-Recover scripts.
# Be careful with values that are globbing patterns (cf. COPY_AS_IS below).
# In general using ${VAR[*]} is problematic and using ${VAR[@]} without
# double-quotes is also problematic, see 'Arrays' in "man bash" and
# see https://github.com/rear/rear/issues/1068 for some examples.
...
# Files and directories to copy as-is (with tar) into the ReaR recovery system.
# Usually globbing patterns in COPY_AS_IS are specified without quoting
# like COPY_AS_IS=( "${COPY_AS_IS[@]}" /my/directory/* /path/to/my/files* )
# so that the bash pathname expansion works as usually intended
# (for details see the build/GNU/Linux/100_copy_as_is.sh script).
# Because usr/sbin/rear sets the nullglob option globbing patterns
# like /path/to/my/files* expand to nothing if no file matches.
COPY_AS_IS=( $SHARE_DIR $VAR_DIR )

Enjoy the "warm feeling" in quoting hell ;-)

gozora commented at 2018-01-29 11:44:

Yes, I'll do lot of testing on this topic in upcoming weeks.

Enjoy the "warm feeling" in quoting hell ;-)

I'm really looking forward for all the quoting already :-)

As I currently know everything I needed, I guess we can close this issue!

Thanks again for your help!

V.

jsmeix commented at 2018-01-29 11:45:

With https://github.com/rear/rear/pull/1716 merged
I consider this issue to be sufficiently fixed.

jsmeix commented at 2018-01-30 15:01:

I verified on SLES12-SP2 with before

BACKUP_PROG_INCLUDE=( '/usr/local/*' '/var/spool/*' '/var/tmp/*' '/boot/grub2/i386-pc/*' '/tmp/*' '/var/lib/libvirt/images/*' '/var/lib/mailman/*' '/var/cache/*' '/var/lib/machines/*' '/var/lib/mysql/*' '/opt/*' '/home/*' '/var/lib/named/*' '/var/opt/*' '/srv/*' '/var/lib/pgsql/*' '/var/log/*' '/boot/grub2/x86_64-efi/*' '/var/lib/mariadb/*' )

versus now with

BACKUP_PROG_INCLUDE=( /usr/local /var/spool /var/tmp /boot/grub2/i386-pc /tmp /var/lib/libvirt/images /var/lib/mailman /var/cache /var/lib/machines /var/lib/mysql /opt /home /var/lib/named /var/opt /srv /var/lib/pgsql /var/log /boot/grub2/x86_64-efi /var/lib/mariadb )

that it works well now - in particular in the latter case
there are no unexpected additional files in the backup.

A positive side effect of the new way is that the logged tar command in backup.log
is now much shorter because with plain directories in BACKUP_PROG_INCLUDE
nothing gets expanded by bash file name globbing (i.e. the logged tar command
contains the BACKUP_PROG_INCLUDE array values "as is").

jsmeix commented at 2018-02-01 11:57:

@gozora
for even more fun of mind confusion with that stuff,
see my documentation about
BACKUP_ONLY_INCLUDE and BACKUP_ONLY_EXCLUDE
in current ReaR GitHub master code default.conf
https://raw.githubusercontent.com/rear/rear/master/usr/share/rear/conf/default.conf
that reads (excerpts):

# BACKUP_PROG_EXCLUDE is an array of strings
# that get written into a backup-exclude.txt file
# that is used e.g. in 'tar -X backup-exclude.txt'
# to get things excluded from the backup.
# Proper quoting of the BACKUP_PROG_EXCLUDE array
# members is crucial to avoid bash expansions.
# In /etc/rear/local.conf use
# BACKUP_PROG_EXCLUDE=( "${BACKUP_PROG_EXCLUDE[@]}" '/this/*' '/that/*' )
...
# BACKUP_PROG_INCLUDE is an array of strings
# that get written into a backup-include.txt file
# that is used e.g. in 'tar -c $(cat backup-include.txt)'
# to get things included in the backup.
# Proper quoting of the BACKUP_PROG_INCLUDE array
# members is crucial to avoid bash expansions.
# In /etc/rear/local.conf use
# BACKUP_PROG_INCLUDE=( '/this/*' '/that/*' )

Of course when no bash globbing charaters are used
quoting is not needed.

But it seems there could be a crucial difference
when bash globbing charaters are used
between how BACKUP_PROG_EXCLUDE is used for 'tar'
versus how BACKUP_PROG_INCLUDE is used for 'tar'
because I think
the array members of BACKUP_PROG_EXCLUDE are 'tar' patterns cf.
https://www.gnu.org/software/tar/manual/tar.html#SEC113
while in contrast the array members of BACKUP_PROG_INCLUDE
are bash globbing patterns, cf. my "tar behaves..." in the above
https://github.com/rear/rear/issues/1714#issuecomment-361208923


[Export of Github issue for rear/rear.]