#1012 Issue closed: Deduplicate files in the rescue/recovery system

Labels: enhancement, cleanup, fixed / solved / done, won't fix / can't fix / obsolete

jsmeix opened issue at 2016-09-28 10:03:

Currently there are can be identical files (under different filenames)
in the rescue/recovery system, for example see
https://github.com/rear/rear/issues/953#issuecomment-249210310

Hereby I like to investigate if it is possible (with reasonable effort)
to add a script that deduplicates files in the rescue/recovery system.

For example the "fdupes" tool can be used to deduplicate files,
but it needs to be used carefully, see
https://en.opensuse.org/openSUSE:Packaging_Conventions_RPM_Macros#.25fdupes

I like to experimet with "fdupes" and if it works well for me
we may call it optionally (when it is installed in the original system)
to deduplicate files in the rescue/recovery system.

jsmeix commented at 2016-09-28 10:39:

Luckily fdupes is installed on my SLE12-SP2 system
because it is required by some Ruby stuff

# rpm -e --test fdupes
error: Failed dependencies:
fdupes is needed by (installed) ruby-common-2.1-16.1.noarch

and Ruby in general is needed in particular by YaST so that
at least on SUSE systems it seems fdupes is always installed.

gozora commented at 2016-09-28 11:37:

Hi, @jsmeix careful with de-duplication around ReaR recover ISO images. Remember that ELILO vs GRUB2 differences when booting from ISO? (same kernel and initrd had to be present in two different places in order to do successful boot ...)

jsmeix commented at 2016-09-28 12:23:

@gozora
many thanks for the hint!
But I think kernel and initrd for the ISO image are
not in $ROOTFS_DIR (e.g. /tmp/rear.ZNEvXvGgqgJ00Ij/rootfs )
but in $TMP_DIR (e.g. /tmp/rear.ZNEvXvGgqgJ00Ij/tmp )
and I would only deduplicate in $ROOTFS_DIR.

Here some first data after "rear -d -D mkrescue"
on my SLES12-SP2 test system

# fdupes -rSm /tmp/rear.ZNEvXvGgqgJ00Ij/rootfs
517 duplicate files (in 418 sets), occupying 12.1 megabytes
# du -hs /tmp/rear.ZNEvXvGgqgJ00Ij/rootfs
342M    /tmp/rear.ZNEvXvGgqgJ00Ij/rootfs

I.e. deduplication in $ROOTFS_DIR could at most
save about 12 MB which is only about 3.5% of the 342M.

From my current point of view that is too little benefit
compared to the additional effort and compared to
the additional risk to mess up something subtle
in the recovery system because of deduplication.

The intent of that issue is hereby fulfilled for me:

I investigated if deduplication is possible (with reasonable effort)
and I found out that it is not.

Therefore I can close it as both "fixed " and "won't fix" ;-)


[Export of Github issue for rear/rear.]