#36 Issue closed: Copy complains 'are the same file' during mkbackup

Labels: bug

dagwieers opened issue at 2012-03-29 09:06:

Reported by retvog at SF#3485728 on 2012-02-08 05:20:11 PST

Original report

I just tried to make the ISO of an RHEL x64 6.2 Installation with ReaR version 1.12. But during the execution of rear mkbackup I always get the following error:

[root@ws11 tmp]# rear -d mkbackup
2012-02-08 14:08:38.970702426 Using \'blkid\' for vol_id
Relax and Recover 1.12.0 / 2011-11-22 10:21:35 +0100
Creating disk layout
Creating root FS layout
WARNING: To login as root via ssh you need to setup an authorized_keys file in /root/.ssh
Copy files and directories
Copy program files & libraries
ERROR: [LibCopyTo] Could not copy \'/usr/local/ibm/gsk8/lib/libgsk8cms.so\' to \'/tmp/rear.euxhvPZ676LzJGh/rootfs/lib\'
Aborting due to an error, check /tmp/rear-ws11.log for details
Finished in 3 seconds
You should also rm -Rf /tmp/rear.euxhvPZ676LzJGh
Terminated
[root@ws11 tmp]#

The same procedure with version 1.11 does work. We use IBM Tivoli as Backup solution.

The error log reports a file duplication ...:

... (list of binaries and libraries ommited)

cp: `/usr/local/ibm/gsk8/lib/libgsk8cms.so\' and `/tmp/rear.ySBEWR3uxIv22as/rootfs/lib/libgsk8cms.so\' are the same file
Trace: 122 StopIfError /usr/share/rear/lib/_input-output-functions.sh
Trace: 134 LibCopyTo /usr/share/rear/lib/linux-functions.sh
Trace: 61 source /usr/share/rear/build/GNU/Linux/39_copy_binaries_libraries.sh
Trace: 40 Source /usr/share/rear/lib/framework-functions.sh
Trace: 79 SourceStage /usr/share/rear/lib/framework-functions.sh
Trace: 24 WORKFLOW_mkbackup /usr/share/rear/lib/mkbackup-workflow.sh
Trace: 242 main /usr/sbin/rear
2012-02-08 14:12:55 ERROR: [LibCopyTo] Could not copy \'/usr/local/ibm/gsk8/lib/libgsk8cms.so\' to \'/tmp/rear.ySBEWR3uxIv22as/rootfs/lib\'
2012-02-08 14:12:55 Running exit tasks.
2012-02-08 14:12:55 Finished in 3 seconds
2012-02-08 14:12:55 Removing build area /tmp/rear.ySBEWR3uxIv22as
2012-02-08 14:12:55 End of program reached

What could be the problem here?

dagwieers commented at 2012-03-29 09:06:

The original report on Sourceforge at SF#3485728 contains a lot more details.

dagwieers commented at 2012-03-29 11:26:

What is happening here is this:

[dag@moria test]$ vim a.txt
[dag@moria test]$ ln -sf a.txt b.txt
[dag@moria test]$ cp -a a.txt b.txt
cp: `a.txt' and `b.txt' are the same file

The reason this is happening is because both file's device and inode are the same (the device and inode of the symlink are those of the file):

[dag@moria test]$ stat --printf '%n %D %i\n' a.txt
a.txt fd02 1705409
[dag@moria test]$ stat --printf '%n %D %i\n' b.txt
b.txt fd02 1705408
[dag@moria test]$ stat -L --printf '%n %D %i\n' b.txt
b.txt fd02 1705409

Which means that cp is dereferencing TARGET.

Why the symlink already exists, that is the real question, I thought the list of files to copy would be sort'd/uniq'd. Either we prevent the same file to be copied twice, or we check if the file already exists (and points to the same file). If not, error out...

dagwieers commented at 2012-03-29 11:29:

Or we could simply use:

cp -a --remove-destination a.txt b.txt

or

cp -au a.txt b.txt

dagwieers commented at 2012-03-29 12:07:

OK, after talking this through with Jeroen, we understand what is happening and the cause of the issue is that /usr/bin is a symlink to /bin. Let me explain:

  • If a binary file in /opt is symlinked to both /bin and /usr/bin, our copying will first copy the symlink in /bin and then the symlink in /usr/bin, since /usr/bin is a symlink to bin we are replacing two identical symlinks (and we are dereferencing both SOURCE and TARGET), BOOM same file
  • Imagine a binary file in /bin that is symlinked to /usr/bin, our copying will first copy the original file to /bin, and then the symlink in /usr/bin, since /usr/bin is a symlink to /bin we are trying to replace the real file with a symlink (to itself). BOOM same file

The exact same thing happens for /lib and /usr/lib, which is what we believe are seeing here. Which means that none of the proposed solutions would actually help:

  • we cannot remove the destination, because we might replace a binary with a symlink to itself
  • we cannot update the file (if newer) because the original could be the symlink, not the real file
  • we cannot harmonize (sort -u) the complete list of files, because those files look unique but are not because of the /bin-and-/usr/bin-symlink

Which brings us to the heart of the matter: Why are we symlinking /usr/bin to /bin and /usr/lib to /lib ?

Maybe we should leave the merge of /bin, /lib and /usr up to the distribution and is it better to leave everything as it is working on the system ?

schlomo commented at 2012-03-29 13:55:

Hi,

as the one who originally decided upon the file structure in the rescue
system I can try to shed some light on the history...

On 29 March 2012 14:07, Dag Wieƫrs <
reply@reply.github.com

wrote:

OK, after talking this through with Jeroen, we understand what is
happening and the cause of the issue is that /usr/bin is a symlink to /bin.
Let me explain:

  • If a binary file in /opt is symlinked to both /bin and /usr/bin, our
    copying will first copy the symlink in /bin and then the symlink in
    /usr/bin, since /usr/bin is a symlink to bin we are replacing two identical
    symlinks (and we are dereferencing both SOURCE and TARGET), BOOM same
    file
  • Imagine a binary file in /bin that is symlinked to /usr/bin, our
    copying will first copy the original file to /bin, and then the symlink in
    /usr/bin, since /usr/bin is a smlink to bin we are trying to replace the
    real file with a symlink (to itself). BOOM same file

Thanks for analyzing this, I concur that this sound like the likely root
cause.

The exact same thing happens for /lib and /usr/lib, which is what we
believe are seeing here. Which brings us to the heart of the matter: Why
are we symlinking /usr/bin to /bin and /usr/lib to /lib ?

Maybe we should leave the merge of /bin, /lib and /usr up to the
distribution and is it better to leave it as it is working on the system ?

Back then we found that different distros spread the binaries a little bit
differntly through /bin and /usr/bin. Instead of faithfully copying that I
decided to unify everything into /bin and /lib and symlink all other
locations to support software that uses hard-coded paths. Back then this
decision significantly simplified the code.

Maybe we should revisit this decision and think about faithfully cloning
the source OS. Maybe it is enough to simply not fail ReaR if cp meets the
same file and rely on cp to dereference symlinks. That way we could keep
our simplified rescue root.

I would say that the final decision is up to those who write the code and
validate it...

Regards,
Schlomo

dagwieers commented at 2012-03-29 14:05:

We have moved this issue to Rear v1.15 milestone since it has worked for so long without any issues. Since a change to the core of Rear is something that needs to be well tested, we will wait until we have the automated testing infrastructure (Rear v1.14) in place so we can see what the effect is of our changes on the various supported and tested distributions.

So while I would like to see this fixed asap, let's make sure we get our priorities right.

gdha commented at 2012-05-10 10:21:

Be aware that in Fedora 17 /bin is a symlink to /usr/bin (so fedora reversed the logic). In rear we keep the symlink to /bin and create other directories if required. That piece of code has already been changed (especially for F17)

dagwieers commented at 2012-06-03 00:16:

Fixed by Upstart expert @jhoekx in commit c1ef53238ee009d726d8a2da8ad4b8526393ca88 :-)


[Export of Github issue for rear/rear.]