#851 Issue closed: First boot after restore on FC21 fails trying to find UUID originally generated by mkfs before tune2fs changes it

Labels: enhancement, bug, waiting for info, fixed / solved / done

exedor opened issue at 2016-05-27 22:36:

  • rear version (/usr/sbin/rear -V): rear-1.18-1.git201605271003
  • OS version (cat /etc/rear/os.conf or lsb_release -a): Fedora 21
  • rear configuration files (cat /etc/rear/site.conf or cat /etc/rear/local.conf):
OUTPUT=USB
BACKUP=NETFS
BACKUP_URL=usb:///dev/disk/by-label/REAR-000
USB_RETAIN_BACKUP_NR=1
BACKUP_PROG_EXCLUDE=( "${BACKUP_PROG_EXCLUDE[@]}"
                      '/backup'
                      '/sys'
                      '/root/.ssh/known_hosts'
                      '/root/.ssh/known_hosts2'
                      '/root/.ccache'
                      '/root/.gem'
                      '/selinux'
                      '/var/cache/ccache'
                      '/var/cache/yum'
                      '/var/cache/man'
                      '/usr/local/src'
                      '/usr/share/doc'
                      '/usr/share/man'
                      '/usr/share/gtk-doc'
                      '/usr/share/info'
                      '/usr/src/*'
                      '/usr/lib/modules/3.8.13-ds'
                      '/var/log/messages'
                      '/var/log/maillog'
                      '/var/log/Xorg.0.log'
                      '/var/log/Xorg.0.log.old'
                      '/var/log/secure'
                      '/var/log/journal'
                      '/var/log/audit'
                      '/var/log/wtmp'
                      '/var/log/lastlog'
                      '/dev'
                      '/proc'
                      '/var/lib/rear/output' )
COPY_AS_IS_EXCLUDE=( ${copy_as_is_exclude[@]}
                      '/backup'
                      '/sys'
                      '/root/.ssh/known_hosts'
                      '/root/.ssh/known_hosts2'
                      '/root/.ccache'
                      '/root/.gem'
                      '/selinux'
                      '/var/cache/ccache'
                      '/var/cache/yum'
                      '/var/cache/man'
                      '/usr/local/src'
                      '/usr/share/doc'
                      '/usr/share/man'
                      '/usr/share/gtk-doc'
                      '/usr/share/info'
                      '/usr/src/*'
                      '/var/log/messages'
                      '/var/log/maillog'
                      '/var/log/Xorg.0.log'
                      '/var/log/Xorg.0.log.old'
                      '/var/log/secure'
                      '/var/log/journal'
                      '/var/log/audit'
                      '/var/log/wtmp'
                      '/var/log/lastlog'
                      '/dev'
                      '/proc'
                      '/var/lib/rear/output' )
USE_DHCLIENT=No
USE_STATIC_NETWORKING=Yes
TMPDIR="/home/sim/release-1-9-40/devel/install/images/temp"
  • Brief description of the issue:

This is only a problem when restoring to a system with an identical disk drive. After the restore completes, the initrd image of the restored system fails to boot because it is still looking for the UUIDs that mkfs allocated when the filesystems were created, before the call to tune2fs changed them back to what they were on the backed up system. This makes no sense unless something is monitoring the disk for changes and then automatically updating the initrd, but therein lies the mystery.

  • Work-around, if any:

1: Let dracut go to its emergency shell (takes around 5 minutes)
2: Mount root file system somewhere
3: chroot to mounted root file system
4: mount /boot at its respective mount point
5: run dracut -f /boot/initrd_image_filename
6: reboot
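
The steps above can be sketched as a small shell helper. This is only a sketch: ROOT_DEV, BOOT_DEV, MNT and the `run` and `rebuild_initrd` helpers are hypothetical names, the device paths are placeholders rather than values from this report, and the bind mounts are an assumption that is not part of the listed steps. It prints the commands unless RUN=1 is set.

```shell
# Hypothetical placeholders: adjust to the restored system's real devices and image name.
ROOT_DEV=${ROOT_DEV:-/dev/sda2}
BOOT_DEV=${BOOT_DEV:-/dev/sda1}
MNT=${MNT:-/sysroot}
INITRD=${INITRD:-/boot/initrd_image_filename}

# Dry run by default: print each command. Set RUN=1 to execute for real.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

rebuild_initrd() {
    run mount "$ROOT_DEV" "$MNT" || return 1        # step 2
    run mount "$BOOT_DEV" "$MNT/boot" || return 1   # step 4, done from outside the chroot here
    # Assumption: dracut inside the chroot needs /dev, /proc and /sys.
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$MNT/$fs" || return 1
    done
    run chroot "$MNT" dracut -f "$INITRD"           # steps 3 and 5
}
```

Calling `rebuild_initrd` prints the command sequence for review; `RUN=1 rebuild_initrd` executes it, after which the system can be rebooted (step 6).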

exedor commented at 2016-05-28 11:35:

I found a fix for this in the code. There is clearly something very wrong with first calling mkfs and then calling tune2fs afterwards, at least on Fedora. Trying to figure out how to create a pull request now...
Tried to submit a pull request: permission denied. Tried to upload a patch file: file type not supported. Zipping it first still gave "we don't support that file type," so I posted the patch to the mailing list.

gdha commented at 2016-05-30 07:45:

See small tutorial at https://yangsu.github.io/pull-request-tutorial/

exedor commented at 2016-05-30 07:49:

I read docs and looked up a different tutorial URL but git tells me I
have no permission to push my branch with the changes in it.

gdha commented at 2016-05-30 15:17:

@exedor you first need to fork rear to your own GitHub account and make your modification on your fork. When done, you can open a pull request.

exedor commented at 2016-05-30 19:17:

Ahhh, that was the missing link. Thank you!

gdha commented at 2016-06-01 09:54:

@exedor I just applied your pull request. Please try it out and let me know whether it works well.

jsmeix commented at 2016-06-22 15:04:

It seems it no longer works on RHEL 5.10
see https://github.com/rear/rear/issues/890

gdha commented at 2016-06-22 15:41:

@jsmeix perhaps it is wiser to revert this pull request and investigate why tune2fs failed on F21?

jsmeix commented at 2016-06-23 09:01:

@exedor
your change causes a regression on RHEL 5.10
see https://github.com/rear/rear/issues/890

Therefore we need to find another way.

Regardless that I am not at all an expert in initrd issues
I like to get some understanding in this issue.

First an unrelated side note: your setting

USE_DHCLIENT=No

means the same as if you set

USE_DHCLIENT=yes_please_really_use_it

because /usr/share/rear/conf/default.conf reads

# PLEASE NOTE:
...
# * Boolean variables can be set to anything as we only check wether the variable
#   is not empty.

which matches the code in
usr/share/rear/skel/default/etc/scripts/system-setup.d/58-start-dhclient.sh

# check if we have USE_DHCLIENT=y, if not then we run 60/62 scripts
[[ -z "$USE_DHCLIENT"  ]] && return

If you do not want to USE_DHCLIENT
you must set it to empty and not to "No".
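
A minimal sketch of that non-empty check, to show why "No" still enables dhclient. The function name `dhclient_enabled` is hypothetical and only mirrors the `-z` test quoted above; it is not rear code.

```shell
# Mirrors the quoted test: any non-empty value counts as "true".
dhclient_enabled() {
    [ -n "$1" ]
}

for value in "No" "yes_please_really_use_it" ""; do
    if dhclient_enabled "$value"; then
        echo "USE_DHCLIENT='$value' -> dhclient is used"
    else
        echo "USE_DHCLIENT='$value' -> dhclient is skipped"
    fi
done
```

Only the empty value takes the "skipped" branch, so `USE_DHCLIENT=` (or leaving it unset) is the way to turn it off.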

Now to the actual issue:

I do not understand what you mean with

This is only a problem on a system with an identical disk drive.

I would assume (but I did not verify it) that
if the disk is identical, then its UUID must be identical
(otherwise the disk is not identical)
and when the UUID did not change I do not understand
how there could be a UUID issue.

In other words when the UUID is identical
it should not matter if rear can set it to the same value
or if that fails and the old UUID on the disk is left as is.

Could you explain in more detail how the issue happens?

In particular, what is the UUID on the original system,
what is the UUID in the recovery system before "rear recover" is run,
and what is the UUID in the recovery system after "rear recover" has finished?
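
One way to collect those values is sketched below. The helper names `show_uuids` and `fstab_uuids` are hypothetical, the device names in the usage note are placeholders, and using `-c /dev/null` to skip the blkid cache is an assumption about what is useful when debugging, not a diagnosis of this issue.

```shell
# Print "device uuid" for each given block device.
# "-c /dev/null" makes blkid probe the device instead of reading its cache file.
show_uuids() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(blkid -c /dev/null -s UUID -o value "$dev")"
    done
}

# List the UUIDs that an fstab-style file refers to, one per line.
fstab_uuids() {
    sed -n 's/^[[:space:]]*UUID=\([^[:space:]]*\).*/\1/p' "$1"
}
```

For example, `show_uuids /dev/sda1 /dev/sda2` could be run on the original system and in the recovery system before and after "rear recover", and `fstab_uuids /mnt/local/etc/fstab` (path assumed) shows what the restored system expects.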

exedor commented at 2016-06-23 14:58:

On 6/23/2016 3:01 AM, Johannes Meixner wrote:

> @exedor
> your change causes a regression on RHEL 5.10
> see #890 https://github.com/rear/rear/issues/890
>
> Therefore we need to find another way.

Agreed.

> Regardless that I am not at all an expert in initrd issues
> I like to get some understanding in this issue.
>
> Now to the actual issue:
>
> I do not understand what you mean with
>
> This is only a problem on a system with an identical disk drive.

When doing an "AUTOMATIC" restore to a system with different drive
geometry where the partitions must be proportionately rebuilt, I get
extra prompts where I must use option 5 to proceed....twice. That's
fine, and the restore proceeds and executes successfully, the system
reboots, the partitions using their original UUID mount up just fine
from the UUID references contained in the original restored fstab file,
and all is well.

When doing an "AUTOMATIC" restore to a system that has a hard drive with
identical geometry to the backed up system, I run into problems.
Everything is normal including the extra prompts disappearing, and after
selecting which backup to use (I only have one btw, so that menu, IMO is
superfluous) the system restores as expected. However upon reboot,
initrd is unable to read any data on any of the partitions because it is
attempting to access them using UUIDs that do not exist and never
existed on the source system.

> I would assume (but I did not verify it) that
> if the disk is identical, then its UUID must be identical
> (otherwise the disk is not identical)

By identical I mean restoring to a system with identical drive geometry,
manufacturer, model, etc.

> and when the UUID did not change I do not understand
> how there could be a UUID issue.

Here is how. The system backs up fine. During restore, restoring the
filesystems including their respective UUIDs was a two-step process:
1: Call mkfs to create the file systems
2: Check if there were originally UUIDs via the former if [ -n "$uuid" ]...
and if so, use tune2fs to "fix" the UUID to its original value
from the backed up system.

The problem was that mkfs, when it creates the new filesystem generates
its own new random UUIDs for the file system, then tune2fs comes back
around and "fixes them up" to be what they were previously. That step
was not working or not syncing or something so that when initrd was
created it grabbed the wrong UUIDs because the ones tune2fs should have
written apparently didn't. Using the shell when boot failed during the
restore, I was having a heck of a time figuring out where those UUIDs
were coming from in the error messages initrd was puking all over the
screen until I found out they were the random ones generated by mkfs call.

As I said, I didn't have the time to keep digging to determine why
tune2fs was not doing its job or if there was some other issue like
maybe no sync to disk prior to initrd generation. But whenever a brand
new drive, using identical geometry was used in the target system for
restore, the boot immediately after restore would fail every single time
on failure to find filesystems with the original UUIDs since tune2fs
call didn't seem to take effect when it should have.

I tested my revision so that instead of calling tune2fs after the file
system was created, I instead check at the mkfs call and then use its
-U parameter so it just uses the correct UUID at filesystem creation
time instead of first creating bogus UUIDs and then trying to fix them
up later. Seemed like a cleaner approach until it broke older versions.
It is possible that doing something more, like tune2fs followed
immediately by a sync call could also resolve the problem. I didn't
have a week to climb into the guts of those utilities to find out why
they weren't actually writing their changes to disk.

> In other words when the UUID is identical
> it should not matter if rear can set it to the same value
> or if that fails and the old UUID on the disk is left as is.

There is no UUID on the disk when doing a restore to a blank
drive....that is until mkfs creates one....and then tune2fs (pre-patch)
tries to set it back to what it needs to be in order for the originally
backed up OS to find it after restore.

> Could you explain in more detail how the issue happens.

Yep, see above.

> In particular what is the UUID on the original system,

UUID on original system is arbitrary. I can put it in here if you want,
but it doesn't matter. It can just be the UUID on whatever system was
backed up.

> what is the UUID in the recovery system before "rear recover" is run

There isn't one....brand new clean, unmolested, untouched, virgin drive.

> what is the UUID in the recovery system after "rear recover" had finished.

They match, which is what made this issue so puzzling and took so long
to find, but initrd during boot was still failing because it was looking
for the UUIDs mkfs originally generated during restore....I'm still
unsure exactly why. Like I said, I can only imagine it's because the new
UUIDs were not actually written to disk by the time the restore process
created the new initrd for the restored system. If you want, I can go
back to the original code and try adding a sync call or something to the
script immediately following the tune2fs call but it sure seemed cleaner
to just have mkfs use the right one (if it was needed) in the first place.


jsmeix commented at 2016-06-23 16:22:

Many thanks for your explanation.

Currently I don't have a Fedora system for testing.

I will try to install one and see how it behaves.

But tomorrow I don't have time
and next week is SUSE Hackweek (see https://github.com/rear/rear/issues/841)
which means you need to be patient...

jsmeix commented at 2016-06-27 12:29:

With https://github.com/rear/rear/pull/894
it still uses "mkfs -U" by default but if that fails
it falls back to the old behaviour
using mkfs without -U plus "tune2fs/tune4fs -U".

I tested it on SLE11 with ext3 and SLE12 with ext4.
On those systems "mkfs -U" works.
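
The fallback described above might look roughly like the following. This is a simplified sketch and not the code from the pull request: the function name `make_fs_with_uuid` and the MKFS and TUNEFS variables are hypothetical, and the defaults assume ext4.

```shell
# Hypothetical defaults; override MKFS/TUNEFS for ext3 or for tune4fs.
MKFS=${MKFS:-mkfs.ext4}
TUNEFS=${TUNEFS:-tune2fs}

# Create a filesystem on $1 and give it the UUID $2 (if one is given).
make_fs_with_uuid() {
    local dev=$1 uuid=$2
    if [ -z "$uuid" ]; then
        # No UUID recorded for this filesystem: let mkfs pick one.
        "$MKFS" -q "$dev"
        return
    fi
    # Preferred: set the UUID at creation time.
    if "$MKFS" -q -U "$uuid" "$dev"; then
        return 0
    fi
    # Fallback for older mkfs without a working -U: create, then change the UUID.
    "$MKFS" -q "$dev" && "$TUNEFS" -U "$uuid" "$dev"
}
```

With this shape the new behaviour is tried first, and systems where "mkfs -U" fails still end up with the old mkfs plus tune2fs sequence.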

@exedor
accordingly I assume current rear master
also still works for you on FC21
but to be on the safe side I would appreciate it
if you could verify that current rear master
still works for you on FC21.

gdha commented at 2016-08-31 12:05:

related to issue #649 and #790


[Export of Github issue for rear/rear.]