#210 Issue closed: grub recovery gets confused by --prefix option, system hangs during boot
Labels: bug
abbbi opened issue at 2013-03-20 10:36:
hi,
while trying to recover a SLES11 x64 system with REAR everything went well, but the system was unable to boot. The GRUB installation was reported as being successful, but the detailed log shows that it failed:
GNU GRUB version 0.97 (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename. ]
grub> device (hd0) /dev/sda
grub> root (hd0,1)
Filesystem type is reiserfs, partition type 0x83
grub> setup --stage2=/boot/grub/stage2 --prefix=/grub (hd0)
Checking if "/grub/stage1" exists... no
Error 15: File not found
grub> quit
it seems the --prefix option confused it somehow; with a simple:
setup --stage2=/boot/grub/stage2 (hd0)
everything went well.
Relax-and-Recover 1.14 / Git
abbbi commented at 2013-03-20 10:40:
the system did not have a dedicated boot partition, and I think it went wrong because of this. According to the script, the prefix stays /grub if no separate disk is found:
finalize/Linux-i386/21_install_grub.sh
bootparts=$( (find_partition fs:/boot; find_partition fs:/) | sort | uniq -u )
grub_prefix=/grub
if [[ -z "$bootparts" ]]; then
bootparts=$(find_partition fs:/)
grub_prefix=/boot/grub
fi
I think defaulting to /boot/grub as the prefix makes sense anyway?
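For illustration only, here is a minimal sketch of a different way to pick the prefix: compare the device numbers of the restored root directory and its boot directory. This is not what 21_install_grub.sh does (it works from the saved layout via find_partition), and the function name choose_grub_prefix is hypothetical. It assumes GNU stat and that the restored system is mounted under the directory passed in.

```shell
# Hypothetical helper, not part of ReaR: print the GRUB legacy prefix for a
# system whose root filesystem is mounted at the given directory.
choose_grub_prefix() {
    local root_dir=$1
    local root_dev boot_dev

    # %d prints the device number of the filesystem holding the path (GNU stat)
    root_dev=$(stat -c %d "$root_dir") || return 1
    boot_dev=$(stat -c %d "$root_dir/boot") || return 1

    if [ "$root_dev" = "$boot_dev" ]; then
        # /boot is just a directory on the root filesystem
        echo /boot/grub
    else
        # /boot is its own filesystem, so GRUB sees the files under /grub
        echo /grub
    fi
}
```

During a recovery this could be called as choose_grub_prefix /mnt/local. It does not handle the software RAID case that the real script has to cover.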
gdha commented at 2013-03-26 08:39:
do you mean /boot is not mounted by default?
abbbi commented at 2013-03-26 15:04:
yes, there is no separate partition for /boot; it exists on / however:
/dev/sda2 on / type reiserfs (rw,acl,user_xattr)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
dbupdate:~ # cat /etc/fstab
/dev/sda2 / reiserfs acl,user_xattr 1 1
/dev/sda1 swap swap defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs noauto 0 0
debugfs /sys/kernel/debug debugfs noauto 0 0
devpts /dev/pts devpts mode=0620,gid=5 0 0
/dev/fd0 /media/floppy auto noauto,user,sync 0 0
gdha commented at 2013-03-31 15:10:
@jhoekx any suggestions on this topic?
jhoekx commented at 2013-03-31 15:18:
@dagwieers knows this better. I was just sitting idly next to him when we wrote that, and all I did was nod when he asked a question :-)
But if I look at it, I think we want our initial search for $bootparts to return nothing, so that the prefix is set to /boot/grub. I don't know why we also search for fs:/ and do the sort and uniq there.
mclien commented at 2013-04-09 10:35:
Had/have the same issue. I worked around it another way, by creating a symlink /grub -> /boot/grub on the system before doing the backup.
But I think the script should use /boot/grub as the prefix when no separate boot partition is found.
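The symlink workaround described in the comment above could look roughly like this. It is a sketch only: the function name add_grub_symlink is made up for illustration, and the link is created relative (boot/grub) so that it still resolves when the restored system is mounted somewhere else, such as /mnt/local.

```shell
# Hypothetical helper: create <root>/grub -> boot/grub so that a GRUB legacy
# "setup --prefix=/grub" finds stage1 on a system without a separate /boot.
add_grub_symlink() {
    local root_dir=${1:-/}

    # Nothing to link to if boot/grub is missing
    [ -d "$root_dir/boot/grub" ] || return 1

    # Leave an existing file, directory or symlink named grub alone
    if [ -e "$root_dir/grub" ] || [ -L "$root_dir/grub" ]; then
        return 0
    fi

    ln -s boot/grub "$root_dir/grub"
}
```

Run as root on the source system before the backup (add_grub_symlink /), so the link ends up inside the backup and is restored with it.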
abbbi commented at 2013-06-04 08:29:
hi guys,
anything new on that? I just ran into the same trouble again and I think the fix is quite straightforward :)
gdha commented at 2013-06-04 09:24:
Some background info (which is important to understand the why's I guess)
From the mail archive I found the following (from @dagwieers ):
Re: [rear-users] mbr broken after recover
Anyway, the reason there is a distinction between /grub/stage1 and /boot/grub/stage1 is related to the fact that it could be on a separate filesystem, in that case /grub/stage1 is correct (in fact in most cases this is what is happening). So apparently it now fails for cases where it should be using /boot/grub/stage1.
The reason the grub installation code is more complex is because we had to support this second possibility. Why it now fails, I don't know. We did test it when we wrote it ;-)
So how did we do this?
To know exactly what devices are involved with the boot partition, we search for the dependencies of fs:/boot and we remove any dependencies we find for fs:/ as shown below:
bootparts=$( (find_partition fs:/boot; find_partition fs:/) | sort | uniq -u )
If in this case bootparts is empty, it means that fs:/boot and fs:/ share the same partition(s). In which case we need to use /boot/grub as grub_prefix.
The reason we have this complexity is because if you have software raid, you need to be sure both disks are being updated, so we find more than one partition! This explains the complexity, otherwise we could just compare the partition of fs:/ and fs:/boot.
Once we know the boot partition(s) and we have the correct grub_prefix, we can go and look for the disks that relate to these partitions and install grub on these.
So how to proceed to debug the problem? Enable debugging and check what Rear reports for find_partition fs:/boot and find_partition fs:/, that should give an indication to what is going on.
PS: we're talking about code from finalize/Linux-i386/21_install_grub.sh and/or finalize/Linux-i386/22_install_grub2.sh
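To make the software RAID argument from the quoted mail concrete, here is a small stand-alone sketch of the "sort | uniq -u" step. find_partition_stub is a hard-coded stand-in for ReaR's find_partition, and the device names are an invented layout (/ on sda2+sdb2, /boot on sda1+sdb1). As in the output dagwieers posts later in this thread, the lookup for fs:/boot is assumed to list the root partitions too.

```shell
# Stand-in for ReaR's find_partition with a hypothetical RAID1 layout.
find_partition_stub() {
    case $1 in
        fs:/boot) printf '%s\n' /dev/sda1 /dev/sdb1 /dev/sda2 /dev/sdb2 ;;
        fs:/)     printf '%s\n' /dev/sda2 /dev/sdb2 ;;
    esac
}

# Partitions listed for both lookups appear twice after sorting, so
# "uniq -u" drops them and only the /boot-exclusive partitions remain.
bootparts=$( (find_partition_stub fs:/boot; find_partition_stub fs:/) | sort | uniq -u )

echo "$bootparts"
```

With this layout both /dev/sda1 and /dev/sdb1 are left over, which is why the script cannot simply compare a single partition for fs:/ and fs:/boot.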
dagwieers commented at 2013-06-04 11:16:
@gdha I am sorry, I should have replied to the bug report instead of the mailing list :-/
(unknown) commented at 2013-06-05 12:05:
hi,
I tried to reproduce it. The strange thing is:
after recovering the data to /mnt/local and exiting the rear> command prompt with "exit", it does not show any error; the grub failure, however, is reported in the logfile.
If I go back into recovery mode again, skipping the disk partitioning, it then correctly complains about the missing boot partition:
rear> exit
Did you restore the backup to /mnt/local ? Are you ready to continue recovery ? yes
exit
Updated initramfs with new drivers for this system.
Installing GRUB boot loader
ERROR: BUG BUG BUG! Unable to find any /boot partitions
=== Issue report ===
Please report this unexpected issue at:
https://github.com/rear/rear/issues
Also include the relevant bits from /var/log/rear/rear-dbupdate.log
HINT: If you can reproduce the issue, try using the -d or -D option !
Aborting due to an error, check /var/log/rear/rear-dbupdate.log for details
Terminated
so it seems there must be a difference between executing the recovery the first and the second time!
Attached you can find the logfile for the first recovery, which does NOT trigger the BugIfError. Here is the relevant part with -D:
++++ grep -E '^[^ ]+ /dev/sda2 ' /var/opt/sesam/var/lib/rear//var/lib/rear/layout/disktodo.conf
++++ cut -d ' ' -f 3
+++ sort
+++ type=part
+++ [[ part != \p\a\r\t ]]
+++ echo /dev/sda2
+++ for component in '"${ancestors[@]}"'
+++ [[ -n part ]]
++++ get_component_type /dev/sda
++++ grep -E '^[^ ]+ /dev/sda ' /var/opt/sesam/var/lib/rear//var/lib/rear/layout/disktodo.conf
++++ cut -d ' ' -f 3
+++ uniq -u
+++ type=disk
+++ [[ disk != \p\a\r\t ]]
+++ continue
++ bootparts=/dev/sda2
++ grub_prefix=/grub
++ [[ -z /dev/sda2 ]]
++ [[ -n /dev/sda2 ]]
++ BugIfError 'Unable to find any /boot partitions'
++ (( 0 != 0 ))
+++ grep '^disk ' /var/opt/sesam/var/lib/rear//var/lib/rear/layout/disklayout.conf
+++ cut '-d ' -f2
contents of disklayout/todo:
disk /dev/sda 5368709120 msdos
part /dev/sda 12409206784 32256 primary none /dev/sda1
part /dev/sda 4770662400 575769600 primary boot /dev/sda2
fs /dev/sda2 / reiserfs uuid=3c83197e-b9f8-46ed-9586-105a48932e1b label= options=rw,acl,user_xattr
swap /dev/sda1 uuid= label=
done /dev/sda disk
done /dev/sda1 part
done /dev/sda2 part
done fs:/ fs
done swap:/dev/sda1 swap
I think the error may also be caused by the fact that /dev/sda1 is swap, and /dev/sda2 holds / as well as /boot.
If you need any further logfiles, please tell me!
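A quick way to see from the saved layout whether /boot has its own filesystem entry at all is to look for an "fs" line with that mountpoint. The sketch below assumes the line format visible in the layout quoted in this comment (keyword, device, mountpoint, filesystem type, ...); layout_has_fs is a hypothetical helper name and not a ReaR function.

```shell
# Hypothetical helper: succeed if the layout file contains an "fs" entry
# whose third field is exactly the given mountpoint.
layout_has_fs() {
    local layout_file=$1
    local mountpoint=$2

    awk -v mp="$mountpoint" '
        $1 == "fs" && $3 == mp { found = 1 }
        END { exit !found }
    ' "$layout_file"
}
```

For the layout above, layout_has_fs disklayout.conf / succeeds while layout_has_fs disklayout.conf /boot fails, which matches the fact that the todo list only contains fs:/.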
dagwieers commented at 2013-06-07 17:38:
I think I get it now, although I don't know why. Look at this piece of code:
# Find exclusive partitions belonging to /boot (subtract root partitions from deps)
bootparts=$( (find_partition fs:/boot; find_partition fs:/) | sort | uniq -u )
grub_prefix=/grub
if [[ -z "$bootparts" ]]; then
bootparts=$(find_partition fs:/)
grub_prefix=/boot/grub
fi
# Should never happen
[[ "$bootparts" ]]
BugIfError "Unable to find any /boot partitions"
If in your case /boot is in the root partition, we expect find_partition to return the same partition (/dev/sda2) for both fs:/ and fs:/boot. This means that "sort | uniq -u" removes the non-unique entries:
echo -e "/dev/sda2\n/dev/sda2" | sort | uniq -u
And so bootparts is expected to be empty, but in this case it isn't empty and that's the real problem. From your code I cannot tell what is going on (not enough copy&pasted).
Let me do this on my own running system (this is a good way to debug the code, BTW):
[root@moria ~]# LAYOUT_DEPS=/var/lib/rear/layout/diskdeps.conf
[root@moria ~]# LAYOUT_FILE=/var/lib/rear/layout/disklayout.conf
[root@moria ~]# LAYOUT_TODO=/var/lib/rear/layout/disktodo.conf
[root@moria ~]# source /usr/share/rear/lib/array-functions.sh
[root@moria ~]# source /usr/share/rear/lib/layout-functions.sh
[root@moria ~]# find_partition fs:/
/dev/sda2
[root@moria ~]# find_partition fs:/boot
/dev/sda1
/dev/sda2
[root@moria ~]# ( find_partition fs:/; find_partition fs:/boot )
/dev/sda2
/dev/sda1
/dev/sda2
[root@moria ~]# ( find_partition fs:/; find_partition fs:/boot ) | sort
/dev/sda1
/dev/sda2
/dev/sda2
[root@moria ~]# ( find_partition fs:/; find_partition fs:/boot ) | sort | uniq -u
/dev/sda1
This is to be expected in my case (/dev/sda1 is the /boot partition). In your case it should return nothing, which would mean / and /boot are on the same device.
dagwieers commented at 2013-06-07 19:37:
Ok, I got it:
[root@moria ~]# find_partition fs:/
/dev/sda2
[root@moria ~]# find_partition fs:/usr
So find_partition only works for mountpoints :-(
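This finding explains the trace posted earlier (bootparts=/dev/sda2, grub_prefix=/grub). "sort | uniq -u" keeps every line that occurs exactly once, whichever lookup it came from, so an empty result for fs:/boot leaves the root partition in bootparts. The sketch below reproduces that with a stub (find_partition_stub is a stand-in, not the real function) and contrasts it with comm -23, which only keeps lines from the first list. The comm variant is shown as one possible alternative, not as the fix that was committed.

```shell
# Stand-in for find_partition on a system where /boot is a plain directory
# on the root filesystem: only the mountpoint fs:/ yields a partition.
find_partition_stub() {
    case $1 in
        fs:/) echo /dev/sda2 ;;
    esac
}

# Logic from the script: /dev/sda2 occurs once, so it survives "uniq -u"
bootparts=$( (find_partition_stub fs:/boot; find_partition_stub fs:/) | sort | uniq -u )
echo "sort | uniq -u: ${bootparts:-<empty>}"

# One-directional difference: lines in the /boot list that are not in the / list
tmpdir=$(mktemp -d)
find_partition_stub fs:/boot | sort -u > "$tmpdir/boot"
find_partition_stub fs:/ | sort -u > "$tmpdir/root"
exclusive=$(comm -23 "$tmpdir/boot" "$tmpdir/root")
rm -rf "$tmpdir"
echo "comm -23: ${exclusive:-<empty>}"
```

With the one-directional difference the result is empty, so the existing "if [[ -z "$bootparts" ]]" branch would pick /boot/grub for this layout.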
dagwieers commented at 2013-06-07 20:21:
Please test the fix and reopen the bug if it does not work.
[Export of Github issue for rear/rear.]