#2158 Issue closed: OOM when restoring with BACKUP=BAREOS

Labels: support / question, won't fix / can't fix / obsolete, external tool

MiningPickaxe opened issue at 2019-06-09 19:29:

  • ReaR version ("/usr/sbin/rear -V"): 2.3 / Git

  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"): Ubuntu 18.04 (issue happens in the ReaR bootable iso though)

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
    /etc/rear/local.conf:

BACKUP=BAREOS
BAREOS_CLIENT=$(hostname).FQDN.com-fd
BAREOS_FILESET=UnixDefault
OUTPUT=ISO
OUTPUT_URL=cifs://secret
OUTPUT_OPTIONS="cred=/etc/rear/cifs"
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR): KVM with 1 core (Xeon E3-1270v6), 2 (later 4) GB of RAM

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device): x86-64, Intel Xeon E3-1270v6

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot): SeaBIOS

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe): Virtual Disk (virtIO) on SSD

  • Description of the issue (ideally so that others can reproduce it):
    When recovering the VM, which I broke by running rm -rf / (this was a test of the recovery system before rolling it out; it doesn't matter if data is lost, there was none), with Bareos and ReaR, the OOM killer kills the bareos-fd as well as other services and the restore stops halfway through.

Detailed Walkthrough:

  • Boot from ISO, choose Recover test01

  • login as root

  • execute rear recover

  • enter yes for automatic disk layout configuration
    When "waiting for job to start" appears do the following on the bareos-director:

  • restore client=test01.FQDN.com-fd

  • select most recent backup

  • mark *

Verify the settings:

JobName:         test01.FQDN.com-restore
Bootstrap:       /var/lib/bareos/bareos-dir.restore.5.bsr
Where:           /
Replace:         Always
FileSet:         UnixDefault
Backup Client:   test01.FQDN.com-fd
Restore Client:  test01.FQDN.com-fd
Format:          Native
Storage:         File
When:            2019-06-09 19:12:24
Catalog:         MyCatalog
Priority:        1
Plugin Options:  *None*
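
For reference, the director-side steps above roughly correspond to a bconsole session along these lines (a sketch, not verbatim output; menu wording varies between Bareos versions):

# bconsole
*restore client=test01.FQDN.com-fd
    (pick the "most recent backup for a client" menu entry)
$ mark *
$ done
    (the job summary shown above is printed)
OK to run? (yes/mod/no): yes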

The following appears on screen: https://paste.pics/73e7a398577b3f328d2daa01f5937f86
The Bareos Director log tells the following: https://gist.github.com/MiningPickaxe/9079c61d60fc2abde7ad31c3ef1b5499

  • Workaround, if any: found none so far

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

gdha commented at 2019-06-12 07:59:

@MiningPickaxe You are using an old version of rear (2.3) and the latest bareos version - IMHO this is not a supported combination. Please use the latest rear version (2.5) instead. If the problem still remains then we can look further.

MiningPickaxe commented at 2019-06-12 11:38:

@gdha OK, I just reinstalled the machine, and instead of using the version available via apt I built it from the sources from git, newest version.

The issue still persists.

gdha commented at 2019-06-12 15:24:

Before we released rear-2.5 we did test Ubuntu 18.04 with bareos 18.2.5 (cfr. https://gist.githubusercontent.com/gdha/94c8f5842c96f5404c6696fafb142849/raw/b7ca59c1174c2551c59972ddf38110f820d7a710/rear-client-recover.log) and had no issues with it.
I did find an older OOM issue with bareos, but nothing recent. What you can do is open a support ticket at https://bugs.bareos.org

@MiningPickaxe opened a bug report at Bareos: https://bugs.bareos.org/view.php?id=1092

jsmeix commented at 2019-06-19 12:18:

@MiningPickaxe
does your thumbs up emoji to
https://github.com/rear/rear/issues/2158#issuecomment-501324331
mean that the issue is now solved for you?

If yes, what was the root cause and how was it solved?

If no:
In your https://paste.pics/73e7a398577b3f328d2daa01f5937f86
the RSS values of the OOM-killed processes are rather small (only a few MB),
so it seems the memory usage of those OOM-killed processes
is not the root cause and something else has filled up the memory.

What is the memory usage when you are logged in as root
in the booted ReaR recovery system (without running "rear recover")?
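
A generic way to check that from inside the booted recovery system (nothing ReaR-specific, assuming the usual procps/coreutils tools are included) would be:

# free -m
# df -h

Note that the ReaR rescue system runs from an initrd in RAM, so whatever fills its filesystems also counts against memory.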

If almost no free memory is left inside the ReaR recovery system
what is the size of your ReaR recovery system ISO image
and what does your ReaR recovery system contain?

To find out what your ReaR recovery system contains
set in your /etc/rear/local.conf

KEEP_BUILD_DIR="yes"

run rear mkrescue and inspect the contents
of $TMPDIR/rear.XXXXXXXXXXXXXXX/rootfs/
cf. the KEEP_BUILD_DIR description in usr/share/rear/conf/default.conf
e.g. see our current GitHub master code at
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L128

If your ReaR recovery system contents are rather big
you need to investigate what makes them so big.

For example on my test system
(I do not use BACKUP=BAREOS but BACKUP=NETFS):

# usr/sbin/rear -D mkrescue
...
Wrote ISO image: /root/rear.github.master/var/lib/rear/output/rear-g243.iso (172M)
...

# du -hs /tmp/rear.*/rootfs/
447M    /tmp/rear.rN8koxyLA17c3pR/rootfs/

I get a ReaR recovery system of 447M as a 172M ISO image.
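
To see the biggest contributors inside such a rootfs, a generic du/sort one-liner helps (a sketch; adjust the path to your actual build directory):

# du -ahx /tmp/rear.*/rootfs/ | sort -rh | head -n 20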

MiningPickaxe commented at 2019-06-19 22:49:

@jsmeix the thumbs up is because I messaged the bareos usergroup for help.

Memory usage when just being logged in and idling is at 27 MB, with 733 MB reserved, the largest "memory hogs" being systemd.

The ISO is about 250 MB in size, the tmp directory being 700 MB, with the sizes of the folders as follows:

38M     bin
7.1M    boot
100K    dev
2.8M    etc
0       init
570M    lib
4.0K    lib64
940K    md5sums.txt
12K     mnt
4.0K    proc
16K     root
36K     run
0       sbin
4.0K    selinux
4.0K    sys
20K     tmp
80M     usr
168K    var

which I don't think I can really cut down further; the biggest parts of lib are drivers.
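
To double-check that, a generic drill-down like the following (a sketch; paths assume the usual Ubuntu module/firmware layout) shows whether kernel modules, firmware or something else dominates lib:

# cd /tmp/rear.*/rootfs
# du -sh lib/modules lib/firmware 2>/dev/null
# du -sh lib/* | sort -rh | head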

jsmeix commented at 2019-06-25 08:25:

@MiningPickaxe
the next step to find out what may fill up the memory during "rear recover"
is to run "rear recover" in MIGRATION_MODE as follows:

# export MIGRATION_MODE='true'

# rear -D recover

Now you get several user dialogs that you can confirm step by step.
In each one you can also Use Relax-and-Recover shell
and return to that particular user dialog.
In the Relax-and-Recover shell you can inspect the memory usage
at that state during the "rear recover" process.
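
For example, from within that Relax-and-Recover shell (a generic sketch, assuming the usual procps/coreutils tools are in the rescue system):

# free -m
# ps axo rss,comm --sort=-rss | head
# du -sh /tmp /mnt/local 2>/dev/null

free and ps show overall and per-process memory usage; the du line shows whether restored data is accumulating in the RAM-backed /tmp instead of on the target disk mounted at /mnt/local.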

In particular I think the user dialog

Confirm the recreated disk layout or go back one step

is of interest because after that dialog the backup gets restored.
I assume when you Use Relax-and-Recover shell in that dialog
the memory usage is still rather low.
I assume that after that dialog when the backup gets restored
the memory usage goes up until OOM happens.
This would indicate the root cause is actually your particular
backup restore program (i.e. the BAREOS backup restore program)
that somehow fills up your memory (and not something else
that was run before the backup restore program).
I think in this case only BAREOS could help you further.
In particular I do not use BAREOS or any other third-party backup tool.
Therefore I can neither reproduce such issues
nor can I verify what could actually fix it.

jsmeix commented at 2019-06-25 08:42:

@MiningPickaxe
I see you still have ReaR 2.3

But for the Confirm the recreated disk layout or go back one step
user dialog you would need at least ReaR 2.4, cf.
https://github.com/rear/rear/commit/c09bae8c230b55ae5c9310af0c2bb040bd97c712
and
https://github.com/rear/rear/pull/1733

But meanwhile ReaR 2.5 was released so that I recommend
to upgrade to our latest version, cf.
http://relax-and-recover.org/download/

Alternatively you could try out our current ReaR GitHub master code
from within a separate directory as a test to find out if things work better
with our current master code compared to your ReaR 2.3.

Basically "git clone" our current ReaR upstream GitHub master code
into a separated directory and then configure and run ReaR
from within that directory like:

# git clone https://github.com/rear/rear.git

# mv rear rear.github.master

# cd rear.github.master

# vi etc/rear/local.conf

# usr/sbin/rear -D mkbackup

Note the relative paths "etc/rear/" and "usr/sbin/".
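
To confirm which code is actually being run from the checkout, the version flag from the issue template works here as well:

# usr/sbin/rear -V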

I recommend to try out our current ReaR upstream GitHub master code
because that is the only place where we fix bugs - i.e. bugs in older
ReaR versions are not fixed by us (i.e. by ReaR upstream).
If it works with our current ReaR upstream GitHub master code
we would appreciate an explicit positive feedback.

MiningPickaxe commented at 2019-07-05 17:24:

Thank you @jsmeix for the assistance. I tried running it in Migration Mode with the -D option and examined the memory usage after each step; it stays around 35 MB, just as you predicted :D So I guess it really is an issue with bareos itself.

About your comment on the version: as I stated in a previous comment, I updated to 2.5. My package sources were outdated, so I built it from the source code directly; the correct version is 2.5-git.3374.a60d6191.master / 2019-06-07. So that should no longer be an issue.

gdha commented at 2019-09-11 12:17:

@MiningPickaxe Did you involve Bareos to look into your issue?

cyberfarer commented at 2019-10-08 15:45:

Hello,
I'm also encountering this issue on Ubuntu 18.04, REAR 2.5 and BareOS 18.2.5-131.1.

Running bareos-fd -d 500 -f on the rear client and initiating the restore from the director, I see a lot of communication and then the OOM error after a "Make path /tmp/bareos-restores//lib/systemd/network" message.

Not sure if it's related, but I'm stuck.

cyberfarer commented at 2019-10-08 17:12:

For me, this was a noob error. When setting up clients I use proper host names and do not append '-fd'. The rear scripts specifically look for $HOSTNAME-fd. Adjusting the restore script resolved the issue, and the error message is erroneous. Hopefully this helps someone else.

jsmeix commented at 2019-10-10 11:58:

@cyberfarer
that $HOSTNAME-fd value is just the default/fallback that is used
by the scripts
https://github.com/rear/rear/blob/master/usr/share/rear/prep/BAREOS/default/500_check_BAREOS_bconsole_results.sh#L32
and
https://github.com/rear/rear/blob/master/usr/share/rear/restore/BAREOS/default/400_restore_backup.sh#L103

See
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L1794
which shows that BAREOS_CLIENT is meant to be set as needed by the user
and it is set by the initial reporter @MiningPickaxe - see his initial post
https://github.com/rear/rear/issues/2158#issue-453930524

jsmeix commented at 2019-10-10 12:07:

@MiningPickaxe
I wonder if perhaps your BAREOS_CLIENT=$(hostname).FQDN.com-fd
results in a wrong value which causes an OOM error, as for @cyberfarer in
https://github.com/rear/rear/issues/2158#issuecomment-539612769

For example I get

# hostname
g243

# BAREOS_CLIENT=$(hostname).FQDN.com-fd

# echo $BAREOS_CLIENT
g243.FQDN.com-fd

Did you perhaps mean

# hostname --fqdn
g243.suse.de

# BAREOS_CLIENT=$(hostname --fqdn).com-fd

# echo $BAREOS_CLIENT
g243.suse.de.com-fd

?

jsmeix commented at 2019-10-10 14:10:

@aussendorf
could you have a look here what might be the root cause
why OOM can happen when restoring with BACKUP=BAREOS?

jsmeix commented at 2020-02-12 07:46:

No activity for months, so I assume
this issue was either somehow solved
or is no longer of interest.

amtuannguyen commented at 2021-09-17 19:01:

In case someone stumbles upon this: I tried this today and got the same error. It turned out the default restore location (Where) is /tmp/bareos-restores, which in the rear recover console is in RAM, so you will get an OOM pretty quickly. You need to specify the Where parameter and set it to /mnt/local, which is the restore destination disk. Hope it helps someone.
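
For reference, the Where location can be changed either directly on the restore command line or via "mod" at the confirmation prompt (a sketch using the client name from this issue; exact prompts vary between Bareos versions):

*restore client=test01.FQDN.com-fd where=/mnt/local

or, when the job summary is shown:

OK to run? (yes/mod/no): mod
    (choose the "Where" item and enter /mnt/local)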


[Export of Github issue for rear/rear.]