#465 Issue closed: rear configured with netfs backup deletes other folders on NFS at failure

Labels: bug, fixed / solved / done

andreas-tarp opened issue at 2014-09-24 15:31:

Hi,

when rear is configured with BACKUP=NETFS and NETFS_URL=... points to a folder that contains not only the backups of a single node but also other data (such as backups of other nodes), rear can delete all files on the NFS location when cleaning up its tmp directories. Here is the scenario to reproduce:

  1. rear version Relax-and-Recover 1.16.1 (on SLES 11.3)
  2. configure rear with BACKUP=NETFS and point NETFS_URL to a directory which is not empty, for example one that other machines also use to store their backups
  3. login via ssh to the machine to be backed up
  4. start creating a backup with "rear -v mkbackup" and wait until rear shows progress of the file system backup on the terminal (something like "Archived 434 MiB [avg 8722 KiB/sec]")
  5. lose the connection to your machine (either by accident or by using "kill PID", where PID is the PID of your ssh session)
  6. rear now recognizes that the session died
    (Log entry in rear-.log:
    /usr/share/rear/lib/_input-output-functions.sh: line 1: 38654 Hangup)
  7. rear now tries to unmount the NFS share, but this unfortunately fails, most likely because data is still being transferred
    (Log entry in rear-.log:
    ++++ umount -f -v /tmp/rear.ZsnBUJDGiKeYRNw/outputfs
    .....................
    umount2: Device or resource busy)
  8. rear ignores the umount problem and forces removal of "temp" data
    (Log entry in rear-.log:
    2014-09-24 16:56:43 Removing build area /tmp/rear.ZsnBUJDGiKeYRNw
    ++++ rm -Rf /tmp/rear.ZsnBUJDGiKeYRNw/tmp
    ++++ rm -Rf /tmp/rear.ZsnBUJDGiKeYRNw/rootfs
    ++++ rm -Rf /tmp/rear.ZsnBUJDGiKeYRNw/outputfs)
  9. the result is that all folders in the NFS directory mounted by rear are gone (including the ones not created by rear)

In my view this is very nasty behaviour. Rear should not clean up the outputfs folder when the NFS unmount fails.

A workaround is to point NETFS_URL= to a dedicated directory on the NFS server, so that rear can only delete the backups of that single node.
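
A minimal sketch of that workaround in /etc/rear/local.conf might look like the following. The server name and the path are hypothetical placeholders, not values from this report; the only point is that the URL ends in a directory used by this node alone.

```shell
# /etc/rear/local.conf (sketch; host and path are assumed placeholders)
BACKUP=NETFS
# Point at a directory that holds nothing but this node's backups,
# so a failed cleanup cannot touch data belonging to other machines.
NETFS_URL=nfs://nfs-server.example.com/backups/rear/node20
```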

Edit: I have a rear-.log with debugscript mode enabled. Unfortunately I cannot attach it; GitHub only accepts images?

Edit 2: Besides this rear works like a charm on HP DL380G8 servers running SLES 11.3. Thanks a lot for the great tool.

BR, Andreas

gdha commented at 2014-09-24 17:09:

@andreas-tarp It is the /tmp/rear.ZsnBUJDGiKeYRNw/outputfs directory that is causing you the pain. You can upload the log via gist.github.com and paste the link here.

andreas-tarp commented at 2014-09-25 06:40:

Hi,

thanks for the gist.github.com tip. Here is the logfile:
https://gist.github.com/andreas-tarp/210bee9b4ef52dec4d60/download#

gdha commented at 2014-09-25 07:20:

I see in the log file:

++++ rm -Rf /tmp/rear.ZsnBUJDGiKeYRNw/outputfs
rm: cannot remove `/tmp/rear.ZsnBUJDGiKeYRNw/outputfs': Device or resource busy
++++ rmdir -v /tmp/rear.ZsnBUJDGiKeYRNw
rmdir: removing directory, `/tmp/rear.ZsnBUJDGiKeYRNw'
rmdir: failed to remove `/tmp/rear.ZsnBUJDGiKeYRNw': Directory not empty

that the outputfs directory could not be removed. Therefore, I cannot (yet) understand how files beneath outputfs get deleted. Are you saying that all the files got deleted except the directory itself?

andreas-tarp commented at 2014-09-25 07:41:

On the NFS server, all files and folders below the folder configured in the NETFS_URL parameter in rear's local.conf are deleted. I think the message

rm: cannot remove `/tmp/rear.ZsnBUJDGiKeYRNw/outputfs': Device or resource busy

is shown because rear cannot delete the outputfs folder on the machine running "rear mkbackup" while the NFS directory is still mounted (umount failed because the device was busy).

Below are some printouts from the NFS server showing the behavior there before and after rear terminated due to the lost ssh connection:

  1. rear backup running. ls -lsh output on the NFS server in the backup destination folder configured in /etc/rear/local.conf. As you can see, there are also a few "nodeX" subfolders that do not belong to the current backup process, which is writing to the subfolder "Node20"
-bash-3.00# pwd
/flardir/Core/
-bash-3.00# ls -lsh *
node1:
Gesamt 20496
20496 -rw------T   1 root     root         10M Sep 24 16:54 backup.0

node2:
Gesamt 0

node3:
Gesamt 0

Node20:
Gesamt 800776
33200 -rw-------   1 root     root         16M Sep 23 15:43 backup.log
642640 -rw-------   1 root     root        314M Sep 24 16:56 backup.tar.gz
110608 -rw-------   1 root     root         54M Sep 24 16:55 Node20.initrd.cgz
7728 -rw-------   1 root     root        3,8M Sep 24 16:55 Node20.kernel
   2 -rw-------   1 root     root         271 Sep 24 16:55 Node20.message
   2 -rw-------   1 root     root         516 Sep 24 16:55 README
   2 -rw-------   1 root     root         529 Sep 24 16:55 rear-Node20
6592 -rw-------   1 root     root        3,2M Sep 24 16:55 rear.log
   2 -rw-------   1 root     root         271 Sep 24 16:55 VERSION
  2. the ssh session with rear running in the foreground got killed; afterwards all files in the mounted directory on the NFS server are deleted (ls does not show anything anymore):
-bash-3.00# pwd
/flardir/Core/
-bash-3.00# ls -lsh *
  3. on the machine which had "rear mkbackup" running, the NFS destination is still mounted, but rear itself has already terminated. That's the reason why outputfs cannot be deleted:
Node20:/var/log/rear # mount
..............
192.168.0.177:/flardir/Core/Node20/rear/ on /tmp/rear.ZsnBUJDGiKeYRNw/outputfs type nfs (rw,nfsvers=3,nolock,addr=192.168.0.177)

gdha commented at 2014-10-08 10:33:

@andreas-tarp perhaps you could try the following: in lib/framework-functions.sh, in the function cleanup_build_area_and_end_program, change the line rm -Rf $BUILD_DIR/outputfs into rmdir $v $BUILD_DIR/outputfs
Let me know whether this fixes your problem.
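
The principle behind this change can be sketched as follows: rmdir only removes empty directories, so if the share is still mounted (or the directory still has content), nothing below it is touched. The function name remove_empty_dir_only is hypothetical and this is not the actual ReaR code, only an illustration of why rmdir is the safer call here.

```shell
# Hypothetical helper, not part of ReaR: remove a directory only if it is
# empty. Unlike "rm -Rf", rmdir never descends into the directory, so the
# contents of a still-mounted share cannot be deleted by accident.
remove_empty_dir_only() {
    dir="$1"
    if rmdir "$dir" 2>/dev/null; then
        return 0
    fi
    echo "Leaving $dir in place: not empty or still mounted" >&2
    return 1
}
```

Called on a directory that still contains files, the function returns non-zero and leaves every file where it was.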

andreas-tarp commented at 2014-10-13 11:11:

@gdha thanks a lot for looking into the issue. With the replaced line the system now behaves much better. Rear no longer deletes folders on the NFS location. Of course it now fails to delete the local /tmp/rear.xxxx folder, as it is not empty. In addition, the NFS location is still mounted after rear has finished.
But a manual unmount works fine after logging in to the system again. Maybe it would make sense to implement some waiting (maybe 10 or 20 seconds) and a umount retry afterwards. I assume that the umount fails until the system has written cached data to the NFS location, which might take several seconds.
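
The wait-and-retry idea could be sketched roughly like this. The helper name retry, the number of attempts and the delay are assumptions for illustration; nothing here is taken from the ReaR sources.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between failed attempts. Returns 0 as soon as the command
# succeeds, 1 if every attempt failed.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Possible use (assumed values): try the unmount three times, 10 s apart
#   retry 3 10 umount -v "$BUILD_DIR/outputfs"
```

Only if the retried umount finally succeeds would it be safe to remove the then empty outputfs directory.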

gdha commented at 2015-02-13 17:38:

added to the release notes so we can close this issue

pcahyna commented at 2021-05-05 19:53:

The bug was reintroduced in 1.18. I reported it as #2611.


[Export of Github issue for rear/rear.]