#1496 Issue closed: Error during layout recreation with btrfs and docker

Labels: enhancement, fixed / solved / done

OliverO2 opened issue at 2017-09-15 12:08:

Basis

  • rear version (/usr/sbin/rear -V): Relax-and-Recover 2.2 / Git
  • OS version (cat /etc/rear/os.conf or lsb_release -a): Ubuntu 16.04.3 LTS
  • rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf): see below

Problem

Using rear mkrescue on a system with docker running on a btrfs file system makes a subsequent rear recover stop with the following error messages:

Mounting filesystem /
[  169.768852] BTRFS error (device sda2): '@/var/lib/docker/btrfs' is not a valid subvolume
An error occurred during layout recreation.

Workaround

Use systemctl stop docker.service before running rear mkrescue.

Probable root cause

Docker somehow manages to put a seemingly impossible entry in the kernel's notion of mounted file systems (proc/self/mounts):

# mount | grep docker
/dev/sda2 on /var/lib/docker/btrfs type btrfs (rw,noatime,nodiratime,ssd,space_cache,subvolid=257,subvol=/@/var/lib/docker/btrfs)

The option subvol=/@/var/lib/docker/btrfs breaks it. Btrfs subvolumes can be mounted by subvolid= or subvolume path (subvol=). In this example, subvolid 257 is the root subvolume and there is no such subvolume with a path of @/var/lib/docker/btrfs or /@/var/lib/docker/btrfs. Trying to mount such a beast manually fails:

# mount -t btrfs -o rw,noatime,nodiratime,ssd,space_cache,subvolid=257,subvol=/@/var/lib/docker/btrfs /dev/sda2 M
BTRFS error (device sda2): '/@/var/lib/docker/btrfs' is not a valid subvolume
# mount -t btrfs -o rw,noatime,nodiratime,ssd,space_cache,subvolid=257,subvol=@/var/lib/docker/btrfs /dev/sda2 M
BTRFS error (device sda2): '@/var/lib/docker/btrfs' is not a valid subvolume

This is the current list of relevant btrfs subvolumes (excluding volume snapshots named .@*):

# btrfs subvolume list -a / | grep -v '<FS_TREE>/.@'
ID 257 gen 105922 top level 5 path <FS_TREE>/@
ID 258 gen 105922 top level 5 path <FS_TREE>/@home
ID 5125 gen 27216 top level 257 path @/var/lib/docker/btrfs/subvolumes/8bfcbb9c775e910d504ac7d0f091e9668d700fffcc4c10e71118330ddad47b80
ID 5126 gen 27217 top level 257 path @/var/lib/docker/btrfs/subvolumes/4209efbc4922061100d9922741e2b5135d396a1586d1111f72d1beb9ea1cc513
ID 5127 gen 27218 top level 257 path @/var/lib/docker/btrfs/subvolumes/e44faf88d5e4a25366e7e0eb2882f40b1c0d2b5fef1142f06b0dad3cd4e1ee4f
ID 5128 gen 27219 top level 257 path @/var/lib/docker/btrfs/subvolumes/268b6c560568411d38215d3d87c08ddc0fb373bb02b4d1a114c589c790a69101
ID 5129 gen 28716 top level 257 path @/var/lib/docker/btrfs/subvolumes/4c48e4299061b98be208013f5be89f05107f075fea4c4e659f117fa55225971b
ID 5134 gen 80934 top level 257 path @/var/lib/docker/btrfs/subvolumes/e18c9690831dd193f37d5dadf0e0ff0f9cac75e6bba09e5b471b7a16538e639b
ID 5137 gen 80934 top level 257 path @/var/lib/docker/btrfs/subvolumes/813f7476ee473d6a1e64b4e08877b0fd76ce0e6a472dfea8ca4f44711e545946
ID 5149 gen 80934 top level 257 path @/var/lib/docker/btrfs/subvolumes/1f45e82870cc392d533d0976f2e1a732b8fab8b85f322ae77533c8cfd676dbff
ID 5311 gen 80934 top level 257 path @/var/lib/docker/btrfs/subvolumes/86c787fc596f7f11931fdc6403ad713014d36b958464012d73b8940460cd2664-init
ID 5312 gen 80934 top level 257 path @/var/lib/docker/btrfs/subvolumes/86c787fc596f7f11931fdc6403ad713014d36b958464012d73b8940460cd2664

What to do

I have no idea why the docker folks did this, but since this stuff is out in the wild, I'd like to convince ReaR to just ignore anything that looks like a docker-generated subvolume (snapshot or not). AFAIK, these btrfs docker-trees cannot be safely restored anyway in a consistent fashion.

I've tried to use EXCLUDE_RECREATE=("${EXCLUDE_RECREATE[@]}" "fs:@/var/lib/docker/btrfs") but apparently this doesn't match. Is there any overview documentation on how the exclusion mechanism actually works. Any hints where I should look?

jsmeix commented at 2017-09-15 12:59:

@OliverO2
Oh dear!

It seems you have to fight with btrfs subvolume stuff in Ubuntu.
I had fights with the btrfs subvolume structure in SUSE.
If you can - run away from btrfs subvolumes ;-)
If you find nested btrfs subvolumes, run even more away!

First and foremost:
I know nothing about the btrfs subvolume structure in Ubuntu.
ReaR may need adaptions and enhancements to make
it also work with the btrfs subvolume structure in Ubuntu.
Currently ReaR seems to work with btrfs for SUSE and Red Hat.

In general when you have to deal with btrfs subvolumes:

(1)
You basically must use the findmnt command
to know what is really mounted where, e.g. for btrfs use

# findmnt -t btrfs

All other comands may only lead you into more
and more confusion because they may more or less
hide what is really mounted where.

(2)
You basically must mount the whole btrfs filesystem
(which means to mount btrfs root subvolume)
at a mount point as you like e.g. via

mkdir /tmp/btrfs-root
mount -t btrfs -o subvolid=0 /dev/sdXn /tmp/btrfs-root

and then go to that mount point directory and
inspect the whole btrfs filesystem.

(3)
When not the btrfs root subvolume but a btrfs subvolume
is mounted at the '/' mountpoint then you do not see
the whole btrfs filesystem below '/' but only what is
below the there mounted btrfs subvolume.
Use (1) and (2) to get a better overview what there really is.

(4)
When not the btrfs root subvolume but a btrfs subvolume
is mounted at the '/' mountpoint then creating
more subvolumes just with a command like

# btrfs subvolume create /path/to/mysubvol

creates that new subvolume below the btrfs subvolume
that is mounted at the '/' mountpoint and usually this is
not what is intended.
Usually what is intended is to create a new subvolume
at a path below the root of the whole btrfs filesystem.
To create a new subvolume at a path below the root
of the whole btrfs filesystem you need to first mount
the btrfs root subvolume as described in item (2)
and then create the new subvolume via the mountpoint
directory where the root of the whole btrfs filesystem is.
But such a new subvolume will not be visible/accessible
when not the btrfs root subvolume but a btrfs subvolume
is mounted at the '/' mountpoint.
To make such a new subvolume visible and accessible
it must be mounted at a mountpoint directory that is
inside the btrfs subvolume that is mounted at the '/' mountpoint.
Usually a /etc/fstab entry is created to mount such a
new subvolume automatically during boot at a
mountpoint directory that is inside the btrfs subvolume
that is mounted at the '/' mountpoint.
For details how this works under SUSE see
https://lists.opensuse.org/opensuse-factory/2016-10/msg00397.html

Accordingly for this issue I guess:
When in Ubuntu not the btrfs root subvolume but
a btrfs subvolume is mounted at the '/' mountpoint
then the docker folks may have created their new subvolume
in a wrong way (i.e. as a second-level sub-subvolume below
the btrfs subvolume that is mounted at the '/' mountpoint)
and during "rear recover" such second-level sub-subvolumes
cause issues, cf.
https://github.com/rear/rear/issues/944
therein see in particular
https://github.com/rear/rear/issues/944#issuecomment-238239926
and you may also follow the other links therein.

Have fun with nested btrfs subvolumes!
;-)

OliverO2 commented at 2017-09-15 15:57:

@jsmeix
Thanks for the extensive explanation.

If you can - run away from btrfs subvolumes ;-)

Not at all! We've been using Btrfs for more than three years and it has shown to be a perfect fit over here. We snapshot each volume every 5 minutes for system-wide short-term backup (with a snapshot cleanup schedule, of course). Our unattended backup solution creates backups from snapshots, enabling consistent anytime backups without interrupting work or shutting down services. And, once a week, recent backups are compared to their snapshots to ensure that the whole backup system actually does what it is meant to do.

(1)
You basically must use the findmnt command

Thanks for pointing me to findmnt. Output over here is:

# findmnt -t btrfs
TARGET                  SOURCE                             FSTYPE OPTIONS
/                       /dev/sda2[/@]                      btrfs  rw,noatime,nodiratime,ssd,space_cache,subvolid=257,subvol=/@
├─/volumes/01           /dev/sda2                          btrfs  rw,noatime,nodiratime,ssd,space_cache,subvolid=5,subvol=/
├─/home                 /dev/sda2[/@home]                  btrfs  rw,noatime,nodiratime,ssd,space_cache,subvolid=258,subvol=/@home
├─/volumes/02           /dev/mapper/Foxtrot-02             btrfs  rw,noatime,nodiratime,ssd,space_cache,subvolid=5,subvol=/
├─/mnt/reserve          /dev/mapper/Foxtrot-02[/@]         btrfs  rw,noatime,nodiratime,ssd,space_cache,subvolid=257,subvol=/@
└─/var/lib/docker/btrfs /dev/sda2[/@/var/lib/docker/btrfs] btrfs  rw,noatime,nodiratime,ssd,space_cache,subvolid=257,subvol=/@/var/lib/docker/btrfs

Note that Btrfs automatically translates subvolid=0 to subvolid=5, its internal hardwired root id.

(2)
You basically must mount the whole btrfs filesystem

I do. Our volume organization is as follows:

  • For each disk partition the root subvolume (subvolid=0) is mounted on /volumes/ (e.g. /volumes/01).
  • On these root subvolumes, the primary system subvolumes (@, @home) are created, like (e.g. /volumes/01/@, /volumes/01/@home) and mounted in the usual locations (/, /home).
  • Snapshots are created from primary system subvolumes on the root subvolume, like /volumes/01/.@-2017-09-15T16:20:01+02:00 and /volumes/01/.@home-2017-09-15T16:20:01+02:00

So far, this is all as intended. The root subvolume (subvolid=0) is used for subvolume management only. The '/' mountpoint uses a subvolume named @.

and inspect the whole btrfs filesystem.

# btrfs subvol list -a /volumes/01 | grep -v ' \.@'
ID 257 gen 106407 top level 5 path @
ID 258 gen 106407 top level 5 path @home
ID 5125 gen 27216 top level 257 path <FS_TREE>/@/var/lib/docker/btrfs/subvolumes/8bfcbb9c775e910d504ac7d0f091e9668d700fffcc4c10e71118330ddad47b80
ID 5126 gen 27217 top level 257 path <FS_TREE>/@/var/lib/docker/btrfs/subvolumes/4209efbc4922061100d9922741e2b5135d396a1586d1111f72d1beb9ea1cc513
ID 5127 gen 27218 top level 257 path <FS_TREE>/@/var/lib/docker/btrfs/subvolumes/e44faf88d5e4a25366e7e0eb2882f40b1c0d2b5fef1142f06b0dad3cd4e1ee4f
ID 5128 gen 27219 top level 257 path <FS_TREE>/@/var/lib/docker/btrfs/subvolumes/268b6c560568411d38215d3d87c08ddc0fb373bb02b4d1a114c589c790a69101
ID 5129 gen 28716 top level 257 path <FS_TREE>/@/var/lib/docker/btrfs/subvolumes/4c48e4299061b98be208013f5be89f05107f075fea4c4e659f117fa55225971b
ID 5134 gen 80934 top level 257 path <FS_TREE>/@/var/lib/docker/btrfs/subvolumes/e18c9690831dd193f37d5dadf0e0ff0f9cac75e6bba09e5b471b7a16538e639b
ID 5137 gen 80934 top level 257 path <FS_TREE>/@/var/lib/docker/btrfs/subvolumes/813f7476ee473d6a1e64b4e08877b0fd76ce0e6a472dfea8ca4f44711e545946
ID 5149 gen 80934 top level 257 path <FS_TREE>/@/var/lib/docker/btrfs/subvolumes/1f45e82870cc392d533d0976f2e1a732b8fab8b85f322ae77533c8cfd676dbff
ID 5311 gen 80934 top level 257 path <FS_TREE>/@/var/lib/docker/btrfs/subvolumes/86c787fc596f7f11931fdc6403ad713014d36b958464012d73b8940460cd2664-init
ID 5312 gen 80934 top level 257 path <FS_TREE>/@/var/lib/docker/btrfs/subvolumes/86c787fc596f7f11931fdc6403ad713014d36b958464012d73b8940460cd2664

(3)
When not the btrfs root subvolume but a btrfs subvolume
is mounted at the '/' mountpoint then you do not see
the whole btrfs filesystem below '/'

True when looking at the directory tree, but the output of btrfs subvol list is identical regardless of the path provided, as long as that path is somewhere within the same Btrfs file system. (Even if I use btrfs subvol list -a /var/lib/docker/btrfs | grep -v '\.@' the output is the same.)

The strange thing is: Nowhere does a subvolume named /@/var/lib/docker/btrfs or @/var/lib/docker/btrfs show up, so it seems that Btrfs knows nothing about it.

(4)
When not the btrfs root subvolume but a btrfs subvolume
is mounted at the '/' mountpoint then creating
more subvolumes [...]
creates that new subvolume below the btrfs subvolume
that is mounted at the '/' mountpoint and usually this is
not what is intended.

In general, I don't see a problem with nested subvolumes as long as

  • one has a reasonable mounting strategy and
  • keeps in mind that snapshots do not cross subvolume boundaries.

This may be exactly what's intended.

So yes, docker creates subvolumes below the @ subvolume, which in turn is a subvolume of the root subvolume. Docker also creates subvolumes and snapshots even further down the hierarchy (see How the btrfs storage driver works).

In my eyes there is nothing wrong with this except for that 'bogus' volume name /@/var/lib/docker/btrfs unknown to Btrfs itself.

Conclusion so far

I think ReaR should find a way to deal with it, which includes

  • accepting subvolumes of subvolumes as normal Btrfs use,
  • automatically filtering out 'bogus' Btrfs volumes, which are impossible for ReaR to recreate, and
  • offering a configuration option to filter out Btrfs volumes on request.

I would still be grateful if someone could point me to information on how to make EXCLUDE_RECREATE cover Btrfs volumes.

jsmeix commented at 2017-09-18 07:35:

@OliverO2
curently there is no support for EXCLUDE_RECREATE
for any of the btrfs* components in ReaR.

jsmeix commented at 2017-09-18 07:45:

@OliverO2
only a side note related to what you wrote:

We snapshot each volume every 5 minutes
for system-wide short-term backup ...
Our unattended backup solution creates
backups from snapshots, enabling
consistent anytime backups without
interrupting work or shutting down services.

I argue that such backups are not consistent,
depending on what kind of consistency one
expects to get from a backup of files:
Reason:
See the section
"Generic files backup with the plain SUSE installation system" at
https://en.opensuse.org/SDB:Disaster_Recovery
that reads (excerpts):

Consistent backup means that all files
in the backup are consistent on the
application level. For example assume
there is an application program running
that changes several files simultaneously.
When that program is running during files backup,
the backup may contain old data in some files
and new data in other files which could be
an inconsistent state that leads to errors
in that application after such an
inconsistent backup was restored. 

OliverO2 commented at 2017-09-18 11:24:

@jsmeix

I argue that such backups are not consistent,

What you get is intra-file-system consistency: Since btrfs snapshots are atomic operations, snapshots offer the same consistency level you would get from modern file systems after a power failure. When you backup from such snapshots, you won't have problems with any application programmed to survive a power failure (which you should expect from any reasonable application) For example, file based DBMS generally change multiple files in a safe sequence (possibly with added journaling) so that any inconsistencies from a sudden outage can and will be rectified during the next startup.

Of course, if you have unsafe applications, you have a problem. But then you'd have a problem anyway since such an application wouldn't survive its own crash due to internal (bug) or external circumstances (resource exhaustion).

jsmeix commented at 2017-09-19 08:22:

With https://github.com/rear/rear/pull/1499
and https://github.com/rear/rear/pull/1497
this issue should be fixed.

@OliverO2
many thanks for your valuable contributions to ReaR!


[Export of Github issue for rear/rear.]