#210 Issue closed: grub recovery gets confused by --prefix option, system hangs during boot

Labels: bug

abbbi opened issue at 2013-03-20 10:36:

hi,

while trying to recover a SLES11 x64 system with REAR everything went well but the system was unable to boot. GRUB installation was reported as beeing succesfull but in the detailed logging it failed:

GUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename. ]
grub> device (hd0) /dev/sda
grub> root (hd0,1)
Filesystem type is reiserfs, partition type 0x83
grub> setup --stage2=/boot/grub/stage2 --prefix=/grub (hd0)
Checking if "/grub/stage1" exists... no

Error 15: File not found
grub> quit

it seems the --prefix option did confuse it some kind of a way, with a simple:

setup --stage2=/boot/grub/stage2 (hd0)

everything went well.

Relax-and-Recover 1.14 / Git

abbbi commented at 2013-03-20 10:40:

the system did not have a dedicated boot partition, i think it went wrong because of this, according to the script the prefix stays /grub if no seperate disk is found:

finalize/Linux-i386/21_install_grub.sh

bootparts=$( (find_partition fs:/boot; find_partition fs:/) | sort | uniq -u )
grub_prefix=/grub
if [[ -z "$bootparts" ]]; then
    bootparts=$(find_partition fs:/)
    grub_prefix=/boot/grub
fi

i think defaulting to /boot/grub as the prefix makes sense anyway?

gdha commented at 2013-03-26 08:39:

do you mean /boot is not mounted by default?

abbbi commented at 2013-03-26 15:04:

yes, there is no seperate partition for /boot, it exists on / however:

/dev/sda2 on / type reiserfs (rw,acl,user_xattr)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)

dbupdate:~ # cat /etc/fstab
/dev/sda2 / reiserfs acl,user_xattr 1 1
/dev/sda1 swap swap defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs noauto 0 0
debugfs /sys/kernel/debug debugfs noauto 0 0
devpts /dev/pts devpts mode=0620,gid=5 0 0
/dev/fd0 /media/floppy auto noauto,user,sync 0 0

gdha commented at 2013-03-31 15:10:

@jhoekx any suggestions on this topic?

jhoekx commented at 2013-03-31 15:18:

@dagwieers knows this better. I was just sitting idly next to him when we wrote than and all I did was just nod when he asked a question :-)

But if I look at it, I think we want our initial search fot $bootparts to return nothing, then the prefix is set to /boot/grub. I don't know why we also search for fs:/ and do the sort and uniq there.

mclien commented at 2013-04-09 10:35:

Had/have the same issue. Did the workaround another way by creating a symlink /grub -> /boot/grub on the System before doing the backup
But I think the script should get /boot/grub as prefix, when no separate boot partition is found

abbbi commented at 2013-06-04 08:29:

hi guys,

anything new on that? I just ran into the same troubles again and i think the fix is quite straight-forward :)

gdha commented at 2013-06-04 09:24:

Some background info (which is important to understand the why's I guess)

From the mail archive I found the following (from @dagwieers ):
Re: [rear-users] mbr broken after recover

Anyway, the reason there is a distinction between /grub/stage1 and
/boot/grub/stage1 is related to the fact that it could be on a separate
filesystem, in that case /grub/stage1 is correct (in fact in most cases
this is what is happening). So apparently it now fails for cases where it
should be using /boot/grub/stage1.

The reason the grub installation code is more complex is because we had to
support this second possibility. Why it now fails, I don't know. We did
test it when we wrote it ;-)

So how did we do this ?

To know exactly what devices are involved with the boot partition, we
search for the dependencies of fs:/boot and we remove any dependencies we
find for fs:/ as shown below:

 bootparts=$( (find_partition fs:/boot; find_partition fs:/) | sort | uniq -u )

If in this case bootparts is empty, it means that fs:/boot and fs:/ share
the same partition(s). In which case we need to use /boot/grub as
grub_prefix.

The reason we have this complexity is because if you have software raid,
you need to be sure both disks are being updated, so we find more than one
partition ! This explains the complexity, otherwise we could just compare
the partition of fs:/ and fs:/boot.

Once we know the boot partition(s) and we have the correct grub_prefix, we
can go and look for the disks that relate to these partitions and install
grub on these.

So how to proceed to debug the problem ? Enable debugging and check what
Rear reports for find_partition fs:/boot and find_partition fs:/, that
should give an indication to what is going on.

PS: we're talking about code from finalize/Linux-i386/21_install_grub.sh and/or finalize/Linux-i386/22_install_grub2.sh

dagwieers commented at 2013-06-04 11:16:

@gdha I am sorry, I should have replied to the bug report, instead of the mailinglist :-/

(unknown) commented at 2013-06-05 12:05:

hi,

i tried to reproduce, the strange thing is:

after recovering the data to /mnt/local and exiting the rear> command prompt with "exit" it does not show any error, the grub failure however is reported to the logfile

if get back into the recovery mode again, skipping the disk partitioninig it then complains correctly about
missing the boot partition:

rear> exit
Did you restore the backup to /mnt/local ? Are you ready to continue recovery ? yes
exit
Updated initramfs with new drivers for this system.
Installing GRUB boot loader
ERROR: BUG BUG BUG! Unable to find any /boot partitions
=== Issue report ===
Please report this unexpected issue at: https://github.com/rear/rear/issues
Also include the relevant bits from /var/log/rear/rear-dbupdate.log

HINT: If you can reproduce the issue, try using the -d or -D option !

Aborting due to an error, check /var/log/rear/rear-dbupdate.log for details
Terminated

so it seems there must be a difference between executing the recovery the first and the second time!
Attached you can find the logfile for the first recovery which does NOT catch the BugIfError, here is the relevant
part with -D

++++ grep -E '^[^ ]+ /dev/sda2 ' /var/opt/sesam/var/lib/rear//var/lib/rear/layout/disktodo.conf
++++ cut -d ' ' -f 3
+++ sort
+++ type=part
+++ [[ part != \p\a\r\t ]]
+++ echo /dev/sda2
+++ for component in '"${ancestors[@]}"'
+++ [[ -n part ]]
++++ get_component_type /dev/sda
++++ grep -E '^[^ ]+ /dev/sda ' /var/opt/sesam/var/lib/rear//var/lib/rear/layout/disktodo.conf
++++ cut -d ' ' -f 3
+++ uniq -u
+++ type=disk
+++ [[ disk != \p\a\r\t ]]
+++ continue
++ bootparts=/dev/sda2
++ grub_prefix=/grub
++ [[ -z /dev/sda2 ]]
++ [[ -n /dev/sda2 ]]
++ BugIfError 'Unable to find any /boot partitions'
++ (( 0 != 0 ))
+++ grep '^disk ' /var/opt/sesam/var/lib/rear//var/lib/rear/layout/disklayout.conf
+++ cut '-d ' -f2

contents of diskalyout/todo:

disk /dev/sda 5368709120 msdos
part /dev/sda 12409206784 32256 primary none /dev/sda1
part /dev/sda 4770662400 575769600 primary boot /dev/sda2
fs /dev/sda2 / reiserfs uuid=3c83197e-b9f8-46ed-9586-105a48932e1b label= options=rw,acl,user_xattr
swap /dev/sda1 uuid= label=

done /dev/sda disk
done /dev/sda1 part
done /dev/sda2 part
done fs:/ fs
done swap:/dev/sda1 swap

I think the error is maybe also caused by the situation that /dev/sda1 is swap, and /dev/sda2 is / aswell as /boot/
if you need any further logfiles please tell me!

dagwieers commented at 2013-06-07 17:38:

I think I get it now, although I don't know why. Look at this piece of code:

    # Find exclusive partitions belonging to /boot (subtract root partitions from deps)
    bootparts=$( (find_partition fs:/boot; find_partition fs:/) | sort | uniq -u )
    grub_prefix=/grub
    if [[ -z "$bootparts" ]]; then
        bootparts=$(find_partition fs:/)
        grub_prefix=/boot/grub
    fi
    # Should never happen
    [[ "$bootparts" ]]
    BugIfError "Unable to find any /boot partitions"

If in your case /boot is in the root partition, we expect to get from find_partition foor both fs:/ and fs:/boot to get the same partition back (/dev/sda2). This means that "sort | uniq -u" removes non-unique entries:

echo -e "/dev/sda2\n/dev/sda2" | sort | uniq -u

And so bootparts is expected to be empty, but in this case it isn't empty and that's the real problem. From your code I cannot tell what is going on (not enough copy&pasted).

Let me do this on my own running system (this is a good way to debug the code BTW:

[root@moria ~]# LAYOUT_DEPS=/var/lib/rear/layout/diskdeps.conf
[root@moria ~]# LAYOUT_FILE=/var/lib/rear/layout/disklayout.conf 
[root@moria ~]# LAYOUT_TODO=/var/lib/rear/layout/disktodo.conf 
[root@moria ~]# source /usr/share/rear/lib/array-functions.sh 
[root@moria ~]# source /usr/share/rear/lib/layout-functions.sh
[root@moria ~]# find_partition fs:/
/dev/sda2
[root@moria ~]# find_partition fs:/boot
/dev/sda1
/dev/sda2
[root@moria ~]# ( find_partition fs:/; find_partition fs:/boot )
/dev/sda2
/dev/sda1
/dev/sda2
[root@moria ~]# ( find_partition fs:/; find_partition fs:/boot ) | sort
/dev/sda1
/dev/sda2
/dev/sda2
[root@moria ~]# ( find_partition fs:/; find_partition fs:/boot ) | sort | uniq -u
/dev/sda1

This is to be expected in my case (/dev/sda1 is the /boot partition). In your case it should return nothing. Which means / and /boot are the same device.

dagwieers commented at 2013-06-07 19:37:

Ok, I got it:

[root@moria ~]# find_partition fs:/
/dev/sda2
[root@moria ~]# find_partition fs:/usr

So the find_partition only works for mountpoints :-(

dagwieers commented at 2013-06-07 20:21:

Please test the fix and reopen the bug if it does not work.


[Export of Github issue for rear/rear.]