#2158 Issue closed: OOM when restoring with BACKUP=BAREOS¶
Labels: support / question, won't fix / can't fix / obsolete, external tool
MiningPickaxe opened issue at 2019-06-09 19:29:¶
- ReaR version ("/usr/sbin/rear -V"): 2.3 / Git
- OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"): Ubuntu 18.04 (the issue happens in the ReaR bootable ISO, though)
- ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
  /etc/rear/local.conf:
  BACKUP=BAREOS
  BAREOS_CLIENT=$(hostname).FQDN.com-fd
  BAREOS_FILESET=UnixDefault
  OUTPUT=ISO
  OUTPUT_URL=cifs://secret
  OUTPUT_OPTIONS="cred=/etc/rear/cifs"
- Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR): KVM with 1 core (Xeon E3-1270v6), 2 (later 4) GB of RAM
- System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device): x86-64, Intel Xeon E3-1270v6
- Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot): SeaBIOS
- Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe): Virtual disk (virtio) on SSD
- Description of the issue (ideally so that others can reproduce it):
  When recovering with Bareos and ReaR a VM that I broke by running rm -rf / (this was a test of the recovery system before I roll it out; it doesn't matter if data is lost, there was none), the OOM killer kills bareos-fd as well as other services and the restore stops halfway through.
Detailed walkthrough:
- Boot from the ISO and choose "Recover test01"
- Log in as root
- Execute "rear recover"
- Enter "yes" for automatic disk layout configuration
When "waiting for job to start" appears, do the following on the bareos-director (see the consolidated session sketch after the settings below):
- restore client=test01.FQDN.com-fd
- select the most recent backup
- mark *
Verify the settings:
JobName: test01.FQDN.com-restore
Bootstrap: /var/lib/bareos/bareos-dir.restore.5.bsr
Where: /
Replace: Always
FileSet: UnixDefault
Backup Client: test01.FQDN.com-fd
Restore Client: test01.FQDN.com-fd
Format: Native
Storage: File
When: 2019-06-09 19:12:24
Catalog: MyCatalog
Priority: 1
Plugin Options: *None*
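Put together, the director-side bconsole session looks roughly like the following sketch (based on the steps above; menu entries and prompts vary between Bareos versions):
*restore client=test01.FQDN.com-fd
(choose "Select the most recent backup for a client" from the selection menu)
mark *
done
OK to run? (yes/mod/no): yes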
The following appears on screen:
https://paste.pics/73e7a398577b3f328d2daa01f5937f86
The Bareos Director log shows the following:
https://gist.github.com/MiningPickaxe/9079c61d60fc2abde7ad31c3ef1b5499
- Workaround, if any: none found so far
- Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):
gdha commented at 2019-06-12 07:59:¶
@MiningPickaxe You are using an old version of ReaR (2.3) with the latest Bareos version - IMHO this is not a supported combination. Please use the latest ReaR version (2.5) instead. If the problem still remains, we can look further.
MiningPickaxe commented at 2019-06-12 11:38:¶
@gdha OK, I just reinstalled the machine and, instead of using the version available via apt, built it from the Git sources (newest version).
The issue still persists.
gdha commented at 2019-06-12 15:24:¶
Before we released rear-2.5 we did test Ubuntu 18.04 with Bareos 18.2.5 (cf.
https://gist.githubusercontent.com/gdha/94c8f5842c96f5404c6696fafb142849/raw/b7ca59c1174c2551c59972ddf38110f820d7a710/rear-client-recover.log)
and had no issues with it.
I did find an older OOM issue with Bareos, but nothing recent. What you can do is open a support ticket at
https://bugs.bareos.org
@MiningPickaxe opened a bug report at Bareos: https://bugs.bareos.org/view.php?id=1092
jsmeix commented at 2019-06-19 12:18:¶
@MiningPickaxe
does your thumbs-up emoji on
https://github.com/rear/rear/issues/2158#issuecomment-501324331
mean that the issue is now solved for you?
If yes, what was the root cause and how was it solved?
If no:
In your
https://paste.pics/73e7a398577b3f328d2daa01f5937f86
the RSS values of the OOM-killed processes are rather small (only a few MB), so it seems the memory usage of those OOM-killed processes is not the root cause and something else has filled up the memory.
What is the memory usage when you are logged in as root in the booted ReaR recovery system (without running "rear recover")?
If almost no free memory is left inside the ReaR recovery system, what is the size of your ReaR recovery system ISO image and what does your ReaR recovery system contain?
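A quick way to check that inside the booted recovery system (a sketch, assuming the usual free, df, and ps tools were copied into the rescue system) is:
# free -m
# df -h /
# ps aux --sort=-rss | head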
To find out what your ReaR recovery system contains, set in your /etc/rear/local.conf
KEEP_BUILD_DIR="yes"
run "rear mkrescue" and inspect the contents of $TMPDIR/rear.XXXXXXXXXXXXXXX/rootfs/
cf. the KEEP_BUILD_DIR description in usr/share/rear/conf/default.conf,
e.g. see our current GitHub master code at
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L128
If your ReaR recovery system contents are rather big, you need to investigate what makes them so big.
For example on my test system (I do not use BACKUP=BAREOS but BACKUP=NETFS):
# usr/sbin/rear -D mkrescue
...
Wrote ISO image: /root/rear.github.master/var/lib/rear/output/rear-g243.iso (172M)
...
# du -hs /tmp/rear.*/rootfs/
447M /tmp/rear.rN8koxyLA17c3pR/rootfs/
I get a ReaR recovery system of 447M as a 172M ISO image.
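To see which directories inside the rescue system contribute most to its size, a per-directory breakdown can help, for example (a sketch, assuming the default build area under /tmp):
# du -hs /tmp/rear.*/rootfs/* | sort -h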
MiningPickaxe commented at 2019-06-19 22:49:¶
@jsmeix the thumbs-up is because I messaged the Bareos user group for help.
Memory usage when just being logged in and idling is at 27 MB, 733 MB reserved, with the largest "memory hogs" being systemd.
The ISO is about 250 MB in size, the tmp directory being 700 MB, with the sizes of the folders being like this:
38M bin
7.1M boot
100K dev
2.8M etc
0 init
570M lib
4.0K lib64
940K md5sums.txt
12K mnt
4.0K proc
16K root
36K run
0 sbin
4.0K selinux
4.0K sys
20K tmp
80M usr
168K var
which I don't think I can cut down much further; the biggest parts of lib are drivers.
jsmeix commented at 2019-06-25 08:25:¶
@MiningPickaxe
the next step to find out what may fill up the memory during "rear recover" is to run "rear recover" in MIGRATION_MODE as follows:
# export MIGRATION_MODE='true'
# rear -D recover
Now you get several user dialogs that you can confirm step by step.
In each one you can also "Use Relax-and-Recover shell" and then return to that particular user dialog.
In the Relax-and-Recover shell you can inspect the memory usage at that state of the "rear recover" process.
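For example, from the Relax-and-Recover shell at each dialog a quick check like this (plain shell commands, assuming free and df were copied into the rescue system) shows the memory state:
# free -m
# df -h
A steadily growing "used" value while the restore runs would indicate that data is accumulating in RAM instead of being written to the target disks mounted under /mnt/local.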
In particular I think the user dialog "Confirm the recreated disk layout or go back one step" is of interest, because after that dialog the backup gets restored.
I assume that when you "Use Relax-and-Recover shell" in that dialog, the memory usage is still rather low, and that after that dialog, when the backup gets restored, the memory usage goes up until OOM happens.
This would indicate the root cause is actually your particular backup restore program (i.e. the BAREOS backup restore program) that somehow fills up your memory, and not something else that was run before the backup restore program.
I think in this case only BAREOS could help you further.
In particular I do not use BAREOS or any other third-party backup tool, therefore I can neither reproduce such issues nor verify what could actually fix them.
jsmeix commented at 2019-06-25 08:42:¶
@MiningPickaxe
I see you still have ReaR 2.3.
But for the "Confirm the recreated disk layout or go back one step" user dialog you would need at least ReaR 2.4, cf.
https://github.com/rear/rear/commit/c09bae8c230b55ae5c9310af0c2bb040bd97c712
and
https://github.com/rear/rear/pull/1733
But meanwhile ReaR 2.5 was released, so I recommend upgrading to our latest version, cf.
http://relax-and-recover.org/download/
Alternatively you could try out our current ReaR GitHub master code from within a separate directory, as a test to find out if things work better with our current master code compared to your ReaR 2.3.
Basically "git clone" our current ReaR upstream GitHub master code into a separate directory and then configure and run ReaR from within that directory like:
# git clone https://github.com/rear/rear.git
# mv rear rear.github.master
# cd rear.github.master
# vi etc/rear/local.conf
# usr/sbin/rear -D mkbackup
Note the relative paths "etc/rear/" and "usr/sbin/".
I recommend trying out our current ReaR upstream GitHub master code because that is the only place where we fix bugs - i.e. bugs in older ReaR versions are not fixed by us (i.e. by ReaR upstream).
If it works with our current ReaR upstream GitHub master code, we would appreciate explicit positive feedback.
MiningPickaxe commented at 2019-07-05 17:24:¶
Thank you @jsmeix for the assistance. I tried running it in migration mode with the -D option and examined the memory usage after each step; it stays around 35 MB, just as you predicted :D So I guess it really is an issue with Bareos itself.
About your comment on the version: as I stated in a previous comment, I updated to 2.5. My package sources were outdated, so I built it from source code directly; the correct version would be 2.5-git.3374.a60d6191.master / 2019-06-07. So that should no longer be an issue.
gdha commented at 2019-09-11 12:17:¶
@MiningPickaxe Did you involve Bareos to look into your issue?
cyberfarer commented at 2019-10-08 15:45:¶
Hello,
I'm also encountering this issue on Ubuntu 18.04, ReaR 2.5 and Bareos 18.2.5-131.1.
Running "bareos-fd -d 500 -f" on the ReaR client and initiating the restore from the director, I see a lot of communication and then the OOM error after a "Make path /tmp/bareos-restores//lib/systemd/network" message.
Not sure if it's related, but I'm stuck.
cyberfarer commented at 2019-10-08 17:12:¶
For me, this was a noob error. When setting up clients I use proper host names and do not append "-fd", but the ReaR scripts specifically look for $HOSTNAME-fd. Adjusting the restore script resolved the issue; the error message is misleading. Hopefully this helps someone else.
jsmeix commented at 2019-10-10 11:58:¶
@cyberfarer
that $HOSTNAME-fd value is just the default/fallback that is used by the scripts
https://github.com/rear/rear/blob/master/usr/share/rear/prep/BAREOS/default/500_check_BAREOS_bconsole_results.sh#L32
and
https://github.com/rear/rear/blob/master/usr/share/rear/restore/BAREOS/default/400_restore_backup.sh#L103
See
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L1794
which shows that BAREOS_CLIENT is meant to be set as needed by the user,
and it is set by the initial reporter @MiningPickaxe - see his initial post
https://github.com/rear/rear/issues/2158#issue-453930524
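For illustration, setting it explicitly in /etc/rear/local.conf to the exact Client resource name defined on the Bareos director avoids relying on that fallback (the client name below is just the example from the initial post):
BACKUP=BAREOS
BAREOS_CLIENT="test01.FQDN.com-fd"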
jsmeix commented at 2019-10-10 12:07:¶
@MiningPickaxe
I wonder if perhaps your BAREOS_CLIENT=$(hostname).FQDN.com-fd results in a wrong value, which causes an OOM error as for @cyberfarer in
https://github.com/rear/rear/issues/2158#issuecomment-539612769
For example I get
# hostname
g243
# BAREOS_CLIENT=$(hostname).FQDN.com-fd
# echo $BAREOS_CLIENT
g243.FQDN.com-fd
Did you perhaps mean
# hostname --fqdn
g243.suse.de
# BAREOS_CLIENT=$(hostname --fqdn).com-fd
# echo $BAREOS_CLIENT
g243.suse.de.com-fd
?
jsmeix commented at 2019-10-10 14:10:¶
@aussendorf
could you have a look here at what might be the root cause why OOM can happen when restoring with BACKUP=BAREOS?
jsmeix commented at 2020-02-12 07:46:¶
No activity for months, so I assume this issue was either somehow solved or is no longer of interest.
amtuannguyen commented at 2021-09-17 19:01:¶
In case someone stumbles upon this: I tried this today and got the same error. It turned out the default restore location ("Where") is /tmp/bareos-restores, which in the ReaR recover console lives in RAM, so you get OOM pretty quickly. You need to specify the "Where" parameter and set it to /mnt/local, which is where the destination disks are mounted. Hope it helps someone.
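For reference, starting the restore with an explicit restore location might look roughly like this (a sketch of the same bconsole session as in the initial walkthrough; the where= option is standard bconsole syntax, but exact prompts vary between Bareos versions):
*restore client=test01.FQDN.com-fd where=/mnt/local
Alternatively, answering "mod" instead of "yes" at the final "OK to run?" prompt lets you change the "Where" value of an already prepared restore job.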
[Export of Github issue for rear/rear.]