#492 Issue closed: Grub2 Install Issue Upon Recovery of CentOS 7 System

Labels: waiting for info, support / question, fixed / solved / done

zwilliams360 opened issue at 2014-10-24 16:44:

I was attempting to recover a CentOS 7 machine to the same machine with different (larger) hard drives. My general setup is as follows:

  • two drives each with 2 partitions:
    • 512 MB partition that held /boot a long time ago - now unused
    • 80 GB partition
  • the 80 GB partitions are combined into a RAID 1 software RAID array (/dev/md127)
  • the /dev/md127 array is defined as an LVM physical volume and holds a volume group with my root, swap, and any snapshot logical volumes (a rough sketch of matching commands follows this list)
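A rough sketch of commands that would produce such a layout (device names, the volume group name, and sizes are assumptions; the name "centos" is only suggested by a log message later in this thread):

    mdadm --create /dev/md127 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    pvcreate /dev/md127              # the RAID 1 array becomes the LVM physical volume
    vgcreate centos /dev/md127       # volume group on top of the array
    lvcreate -n root -L 70G centos   # root logical volume (size assumed)
    lvcreate -n swap -L 8G centos    # swap logical volume (size assumed)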

rear -V shows: Relax-and-Recover 1.16.1 / Git

/etc/rear/local.conf contains the following additions:
BACKUP=NETFS
BACKUP_URL=usb:///dev/disk/by-label/REAR-000
OUTPUT=USB
I also included the following line in the same file, and commented out the original in /usr/share/rear/conf/default.conf. I realize I probably should have just appended the "/data01/*" entry in local.conf instead, but I didn't, and I don't think it's relevant to any of the issues I'm having.
BACKUP_PROG_EXCLUDE=( '/tmp/' '/dev/shm/' $VAR_DIR/output/* '/data01/*')
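(For reference, since local.conf is sourced as bash after default.conf, the cleaner approach alluded to above would presumably be to append to the existing array rather than redefine it:)

    # In /etc/rear/local.conf, extend rather than replace the default excludes:
    BACKUP_PROG_EXCLUDE+=( '/data01/*' )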

Recovery seems mostly successful, until it is time to install GRUB2:

Installing GRUB2 boot loader

WARNING ! For this system
RedHatEnterpriseServer/7 on Linux-i386 (based on Fedora/7/i386)
there is no code to install a boot loader on the recovered system or the code
that we have failed to install the boot loader correctly.

Please contribute this code to the Relax-and-Recover project. To do so
please take a look at the scripts in /usr/share/rear/finalize,
for an example you can use the script for Fedora (and RHEL/CentOS/SL) in
/usr/share/rear/finalize/Linux-i386/20_install_grub.sh

--------------------  ATTENTION ATTENTION ATTENTION -------------------
|                                                                     |
|          IF YOU DO NOT INSTALL A BOOT LOADER MANUALLY,              |
|                                                                     |
|          THEN YOUR SYSTEM WILL N O T BE ABLE TO BOOT !              |
|                                                                     |
-----------------------------------------------------------------------

You can use 'chroot /mnt/local bash --login' to access the recovered system.
Please remember to mount /proc before trying to install a boot loader.
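For reference, the manual installation those instructions point to might look roughly like this on a BIOS/MBR CentOS 7 system (treating /dev/sda and /dev/sdb as the two RAID members; the device names are assumptions):

    # Bind-mount the virtual filesystems the boot loader tools need, then chroot:
    for fs in proc sys dev ; do mount --bind /$fs /mnt/local/$fs ; done
    chroot /mnt/local /bin/bash --login
    # ...then, inside the chroot:
    grub2-install /dev/sda                  # write GRUB2 to the first RAID member's MBR
    grub2-install /dev/sdb                  # and to the second, so either disk can boot
    grub2-mkconfig -o /boot/grub2/grub.cfg  # regenerate the GRUB2 configuration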

I'm curious to know if the cause of this is:

  • there is no code to install a boot loader on the recovered system
    or
  • the code that we have failed to install the boot loader correctly

Are there known issues surrounding the ReaR GRUB2 install on RedHat/CentOS 7 systems? I've run into GRUB2 installation problems multiple times now, using completely different sets of machines, but consistently with the same setup: similar partitioning, software RAID, and LVM. I've had to manually install GRUB2 each time, and it's been a pretty painful procedure, so I would love to see if I can get this part automated. I can provide more information, such as disklayout.conf, diskrestore.sh, and logs, but I don't want to start pasting too much information that may not be relevant.

Zach Williams

gdha commented at 2014-10-27 17:14:

will check it out myself...

Reiner030 commented at 2014-10-27 19:05:

I guess that's the same problem as on Debian => https://github.com/rear/rear/issues/473

zwilliams360 commented at 2014-10-27 20:41:

Feel free to request more information from me if it will help confirm. I can run more backups and recoveries and add debug flags if possible. Meanwhile, I'll see if I can understand what is going on in issue #473.

Reiner030 commented at 2014-10-27 22:32:

Perhaps my last comment in the mentioned issue is already sufficient.
@gdha has added common GRUB2 support, but it was failing in my test environment with automatic recovery. When I tested some days ago with manual recovery, it was working fine.

zwilliams360 commented at 2014-10-31 20:54:

I believe I have the same issue whether I choose the first option under my hostname, or the second one labelled with "AUTOMATIC RECOVERY." Are these the manual and autorecovery methods you are referring to? It actually took me a while to figure out exactly what you meant by manual, but I think I understand. Under the first option, I end up typing "rear -v recover" and then everything seems about the same as the second option.

My issue is that my partition table comes back differently. The disklayout.conf file shows the correct numbers (I believe the units are bytes). But the diskrestore.sh script ends up with different numbers, in effect shifting the partitions back to the first track. My first partition /dev/sda1, for example, originally began at sector 4096. Upon recovery, the disklayout.conf file reflects the partition beginning at the 4096-sector boundary, but it gets created at the 64-sector boundary. It seems like this shouldn't be a big deal, since the partitions do get created with the correct size. (I was going for filesystem alignment and SSD cell alignment, but in a disaster recovery situation that probably wouldn't be the highest-priority concern.)

So, all goes well until the GRUB2 installation, where I get the above-mentioned error. I also saw a message giving the reason: my core.img was too large to fit in the embedding area. So my solution right now is simply to edit the parted commands in the diskrestore.sh script so that the partitions are created as they were on the original machine (beginning at the 4096-sector boundary instead of 64). This isn't too bad; it's just something I'll need to document in our recovery plan.
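For illustration, that edit might look like this in diskrestore.sh (the byte values are hypothetical, worked out for a 512 MB first partition; 32768 bytes is sector 64, 2097152 bytes is sector 4096):

    # As generated, with the start shifted back to sector 64:
    parted -s /dev/sda mkpart primary 32768B 536903679B
    # Edited to match the original 4096-sector alignment:
    parted -s /dev/sda mkpart primary 2097152B 538968063B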

Is it expected that ReaR wants to create the first partition right after the first track? It would seem reasonable, except for the issue I am running into with GRUB2 not fitting.

I should mention one thing that caused me to pull my hair out a bit while troubleshooting: multiple times the recovery would (seem to) hang while creating the LVM volumes. I finally figured out what was going on by opening another terminal and looking at the log file: it was just waiting for a (y/n) response, because it had found previous LVM info on the disk. So I hit 'y' and 'enter' where I thought it was hanging, and the recovery scripts kept going. Don't get me wrong, I'm just giving feedback, not complaining. This software is awesome!

gdha commented at 2014-11-01 09:24:

Perhaps you could check whether the code from
https://github.com/gsbit/rear/commit/f5ef907e7cc54f2cfe486bf98479fd3649420d4f
merged into rear (issue #487) could have caused the shift in the partition start boundary?

zwilliams360 commented at 2014-11-03 16:37:

I tried adding the new code from #487, but it didn't work for me, and I don't think it applies, since I'm only working with primary partitions. I looked through the rest of the code in 10_include_partition_code.sh and I see the "while" loop where the partition start information ($pstart) is read, but it looks like the start value is only taken from $pstart if I'm not in "Migration Mode"; otherwise it just remains at 32768, which is what it was initialized to. In my case, I am migrating to slightly larger disks, so my MIGRATION_MODE variable is set to true. I haven't yet tried altering the "if" statement to force it to use the $pstart value, but I'll try that next.
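Paraphrasing the relevant logic (simplified; note that the initial 32768 is in bytes, which on 512-byte sectors is exactly the sector-64 start observed above):

    start=32768   # default partition start: 32768 bytes = sector 64
    # ... inside the while loop that reads each partition from the layout file:
    if [ -z "$MIGRATION_MODE" ] && ! [ "$pstart" = "unknown" ] ; then
        start="$pstart"   # the original start is reused only when NOT migrating
    fi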

zwilliams360 commented at 2014-11-03 21:31:

I just tried updating this line of the code (line 134) in /usr/share/rear/layout/prepare/GNU/Linux/10_include_partition_code.sh:
if [ -z "$MIGRATION_MODE" ] && ! [ "$pstart" = "unknown" ] ; then
to be
if ! [ -z "$MIGRATION_MODE" ] && ! [ "$pstart" = "unknown" ] ; then
Notice the "!" in front of the test for whether or not "$MIGRATION_MODE" is an empty string.

This allowed the script to progress into the "if" block and set the start to the correct $pstart value instead of leaving it at its initialized value of 32768. With this change, I can proceed through the remaining recovery steps and the partitions get created as expected.

Edit - I changed the line number above from 124 to 134.

gdha commented at 2014-11-04 14:20:

@jhoekx Jeroen - I would value your feedback concerning @zwilliams360's tests with the /usr/share/rear/layout/prepare/GNU/Linux/10_include_partition_code.sh modification.

@zwilliams360 concerning the statement above about waiting for a (y/n) response due to previous LVM info on the disk: do you still remember which LVM command it was (in the diskrestore.sh script)?

zwilliams360 commented at 2014-11-04 16:29:

The y/n issue comes from: /usr/share/rear/layout/prepare/GNU/Linux/11_include_lvm_code.sh
Line 107: echo "lvm lvcreate -l $nrextents -n ${lvname} ${vgrp#/dev/} >&2"

This is in the create_lvmvol() function. A quick search suggested that adding a --yes parameter to lvcreate would force the volume creation, but maybe just sending it to stdout instead of stderr is better here?

I found a picture I took of one of the messages related to this. It said: "WARNING: xfs signature detected on /dev/centos/root at offset 0. Wipe it? [y/n]" The message before it appeared while creating the swap LV and was similar, except that it said a swap signature had been detected.
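A sketch of the suggested change to the generated lvcreate command (whether --yes is preferable to wiping old signatures beforehand is an open question):

    # As generated today; lvcreate prompts when it finds an old signature:
    lvm lvcreate -l $nrextents -n ${lvname} ${vgrp#/dev/} >&2
    # With --yes, lvcreate answers the "Wipe it? [y/n]" prompt itself:
    lvm lvcreate --yes -l $nrextents -n ${lvname} ${vgrp#/dev/} >&2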

gdha commented at 2014-12-05 17:04:

@jhoekx could you please give your feedback on the above-mentioned migration topic?

jhoekx commented at 2014-12-05 17:58:

Sorry, didn't see your previous question...

I think we can just increase the default start to 4096*512 bytes. I don't see problems there, but I might of course be wrong. If we restore the original system to equally sized disks, we use the old value. In case we're migrating, we need to be prepared for the common scenario, which these days means we need a little bit of extra space for grub2.

That would solve @zwilliams360's problem and let the code keep its original intent?
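In code terms the proposal presumably amounts to a one-line change of the default in 10_include_partition_code.sh (a sketch, not a committed patch):

    # old: start=32768             (sector 64; too small for grub2's core.img here)
    start=$(( 4096 * 512 ))        # 2097152 bytes = sector 4096, a 2 MiB boundary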

gdha commented at 2015-02-13 17:41:

added to the release notes so we can close this issue


[Export of Github issue for rear/rear.]