#649 Issue closed: wrong uuid in initrd for bootfs

Labels: bug, support / question, fixed / solved / done

mattihautameki opened issue at 2015-09-09 14:34:

Hi!

I am having problems with the UUID of /dev/sda1 in the initrd that is created during the recovery process.
My setup:

  • SLES12 with latest patches applied
  • Relax-and-Recover 1.17.1 / Git
  • /boot on /dev/sda1
  • the rest on lvm with /dev/sda2 as PV

I do a restore onto the same machine that the backup comes from. Before the restore my disks are discovered like this:

RESCUE BLIXESX3:/dev/disk/by-uuid # ls -l
total 0
lrwxrwxrwx 1 root root  9 Sep  9 14:27 2015-09-02-16-23-48-00 -> ../../sr0
lrwxrwxrwx 1 root root 10 Sep  9 14:27 ef2113b0-9341-415d-a45d-6835beae7148 -> ../../sda1

After the restore I have strange device links underneath /dev/disk/by-uuid/. Sadly, these links are what systemd in the initrd uses.

RESCUE BLIXESX3:/dev/disk/by-uuid # ls -l
total 0
lrwxrwxrwx 1 root root 10 Sep  9 13:20 0e7135a4-1606-443c-99c5-064a03f26932 -> ../../dm-3
lrwxrwxrwx 1 root root 10 Sep  9 13:20 0fcf1353-0909-4423-a88f-c00f5272b6ab -> ../../dm-3
lrwxrwxrwx 1 root root 10 Sep  9 13:20 128f2ecd-a000-46f6-933f-2b6661458544 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Sep  9 13:20 14208f02-35c0-47cd-8201-c1fd12042f68 -> ../../dm-6
lrwxrwxrwx 1 root root  9 Sep  9 13:20 2015-09-02-16-23-48-00 -> ../../sr0
lrwxrwxrwx 1 root root 10 Sep  9 13:20 47378618-9b36-4a4e-8d5a-195a6e0e4e8a -> ../../dm-1
lrwxrwxrwx 1 root root 10 Sep  9 13:20 5907f02a-b6e4-4729-82e3-a234ebea3fea -> ../../dm-8
lrwxrwxrwx 1 root root 10 Sep  9 13:20 5f5c8f2a-789f-43b4-ae84-f7c737a3c2e9 -> ../../dm-4
lrwxrwxrwx 1 root root 10 Sep  9 13:20 641a1c07-3107-43e5-b458-5afcf288562c -> ../../dm-2
lrwxrwxrwx 1 root root 10 Sep  9 13:20 81cc7d96-07c2-4518-a036-9c1c37d9a0ec -> ../../dm-4
lrwxrwxrwx 1 root root 10 Sep  9 13:20 83da97b5-5b9c-40aa-bf6e-1caaad73b2b2 -> ../../dm-5
lrwxrwxrwx 1 root root 10 Sep  9 13:20 a99b81a7-90b5-45e2-8816-352f9cdd9097 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Sep  9 13:20 bc597625-8240-45c7-a389-d5abae9cb5ef -> ../../dm-8
lrwxrwxrwx 1 root root 10 Sep  9 13:20 c9227301-e341-47df-970f-46f786beead8 -> ../../sda1
lrwxrwxrwx 1 root root 10 Sep  9 13:20 d5f49f17-21c6-4415-9a1b-a496ef5fee88 -> ../../dm-2
lrwxrwxrwx 1 root root 10 Sep  9 13:20 ed9f0cef-ef0f-4640-b7ae-73811d5162d2 -> ../../dm-6
lrwxrwxrwx 1 root root 10 Sep  9 13:20 f0811e55-82a6-41d7-85ed-fa3231c5105c -> ../../dm-0
lrwxrwxrwx 1 root root 10 Sep  9 13:20 f5729f73-47dc-4bed-8671-6e4e8a8465a8 -> ../../dm-7

I don't know why there are two links for every dm device, but in my opinion the link for /dev/sda1 should match the output of blkid:

RESCUE BLIXESX3:/dev/disk/by-uuid # blkid /dev/sda1
/dev/sda1: UUID="ef2113b0-9341-415d-a45d-6835beae7148" TYPE="xfs" PARTUUID="000dc137-01"

This is also the UUID that is defined in disklayout.conf, and according to the log file the creation of the XFS filesystem on /dev/sda1 works without errors.

RESCUE BLIXESX3:/var/lib/rear/layout # cat disklayout.conf | grep sda1
part /dev/sda 270532608 1048576 primary boot /dev/sda1
fs /dev/sda1 /boot xfs uuid=ef2113b0-9341-415d-a45d-6835beae7148 label=  options=rw,relatime,attr2,inode64,noquota

The problem is that the generated initrd searches for the disk c9227301-e341-47df-970f-46f786beead8 and therefore the restored system won't boot.

Chrooted to /mnt/local/:

BLIXESX3:/ # lsinitrd | grep "initrd.target.wants/dev"
lrwxrwxrwx   1 root     root           78 Sep  9 15:22 etc/systemd/system/initrd.target.wants/dev-disk-by\\x2duuid-c9227301\\x2de341\\x2d47df\\x2d970f\\x2d46f786beead8.device -> ../dev-disk-by\\x2duuid-c9227301\\x2de341\\x2d47df\\x2d970f\\x2d46f786beead8.device

The expected result would be a unit file for ef2113b0-9341-415d-a45d-6835beae7148 in the initrd.
As a workaround I can reboot the system with the rear ISO again, mount all the filesystems and recreate the initrd with dracut (roughly as sketched below). This way the correct unit file is created.
If it is a bug, maybe it is fixed in 1.17.2, but I didn't find anything that sounds related to me.
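
A rough sketch of that manual workaround (the mount point /mnt/local is the one rear uses, the devices are the ones from this machine, and <kver> is a placeholder for the kernel version installed on the restored system):

# boot the rear rescue ISO, then mount the restored system
mount /dev/mapper/vg0-lvroot /mnt/local
mount /dev/sda1 /mnt/local/boot
# bind-mount the pseudo filesystems so dracut can work inside the chroot
for dir in proc sys dev ; do mount --bind /$dir /mnt/local/$dir ; done
# rebuild the initrd for the installed kernel from inside the chroot
chroot /mnt/local dracut --force /boot/initrd-<kver> <kver>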

Please excuse my bad English!

gdha commented at 2015-09-10 07:36:

@mattihautameki I am trying to understand where the c9227301-e341-47df-970f-46f786beead8 -> ../../sda1 is coming from. Did you check the rear.log after the recovery?
If I understand it correctly, in the saved layout file and before the recovery the UUID was the correct one (ef2113b0-9341-415d-a45d-6835beae7148)? Then we need to find out why the UUID was changed after recovery. Was it forced somehow?

mattihautameki commented at 2015-09-10 11:26:

There is a problem in the rear.log that relates to the integrity check, but that is because of my ISO_MAX_SIZE setting.
Yes, before the recovery and in the saved layout the UUID is correct. According to the log, the filesystem is also created with the correct UUID.

I found some errors in the journal

Sep 10 10:44:49 BLIXESX3 systemd-udevd[2240]: failed to execute '/usr/sbin/multipath' '/usr/sbin/multipath -i -u sda': No such file or directory
Sep 10 10:44:49 BLIXESX3 systemd[1]: dev-disk-by\x2duuid-ef2113b0\x2d9341\x2d415d\x2da45d\x2d6835beae7148.device changed plugged -> dead
Sep 10 10:44:49 BLIXESX3 systemd[1]: dev-disk-by\x2dpath-pci\x2d0000:00:10.0\x2dscsi\x2d0:0:0:0\x2dpart1.device changed plugged -> dead
Sep 10 10:44:49 BLIXESX3 systemd[1]: dev-sda1.device changed plugged -> dead
Sep 10 10:44:49 BLIXESX3 systemd[1]: sys-devices-pci0000:00-0000:00:10.0-host0-target0:0:0-0:0:0:0-block-sda-sda1.device changed plugged -> dead

The thing is that multipath does not exist in the recovery system, and on the original system it is located under /sbin/ instead of /usr/sbin/.

gdha commented at 2015-10-15 19:02:

@jsmeix Is this something you have seen before with SLES12?

jsmeix commented at 2015-10-16 07:35:

I do not regularly test rear with LVM.

My recent tests are with SLES12-SP1 and rear 1.17.2 on a virtual KVM/QEMU machine with a single 20GB virtual hard disk, using only the SLES12-SP1 default btrfs structure (i.e. no LVM), and for me the recovered system boots.

I am totally unable to regularly test the various different kinds of setups that are listed as supported by rear - I simply will never have the time for that.

@mattihautameki
what is the complete content of your disklayout.conf file (i.e. without "grep sda1")?

In general, when a particular rear version does not work, you should try out the newest version on a test machine (of course not on your production server). Even if the newest rear version does not work either, it is the version where development happens.

jsmeix commented at 2015-10-16 11:49:

@mattihautameki
if you like, you may try out the newest SUSE-specific rear version,
that is rear 1.17.2 with the SLE12-SP1-btrfs.patch, which makes it work
specifically for the default btrfs structure in SLE12-SP1, see
https://github.com/rear/rear/issues/556

You can get it from the openSUSE Build Service project "Archiving",
package "rear", e.g. for direct RPM download
for SLE12 x86_64 from
http://download.opensuse.org/repositories/Archiving/SLE_12/

mattihautameki commented at 2015-10-16 13:00:

I have now upgraded the server to the latest SLE12 patch level and installed rear-1.17.2-1.x86_64. So far no difference; the initrd still has the wrong disk in initrd.target.wants.

Interestingly, I noticed that the problem only exists when I do the restore on the same machine the backup was created from. When the "destination /dev/sda1" has a different UUID than the one from disklayout.conf, or it has no layout at all, then everything works fine.

disklayout.conf:

disk /dev/sda 33285996544 msdos
part /dev/sda 270532608 2097152 primary boot /dev/sda1
part /dev/sda 33013362688 272633856 primary lvm /dev/sda2
lvmdev /dev/vg0 /dev/sda2 V7mYIN-wcYA-uCbz-7Me2-zRLx-LCKV-3HopVx 64479224
lvmgrp /dev/vg0 4096 7870 32235520
lvmvol /dev/vg0 lvdata0 256 2097152 
lvmvol /dev/vg0 lvhome 256 2097152 
lvmvol /dev/vg0 lvlogs 256 2097152 
lvmvol /dev/vg0 lvproducts 256 2097152 
lvmvol /dev/vg0 lvroot 1536 12582912 
lvmvol /dev/vg0 lvswap0 256 2097152 
lvmvol /dev/vg0 lvsysbkp 1280 10485760 
lvmvol /dev/vg0 lvtmp 256 2097152 
lvmvol /dev/vg0 lvvar 512 4194304 
# Filesystems (only ext2,ext3,ext4,vfat,xfs,reiserfs,btrfs are supported).
# Format: fs <device> <mountpoint> <fstype> [uuid=<uuid>] [label=<label>] [<attributes>]
fs /dev/mapper/vg0-lvdata0 /data0 xfs uuid=a99b81a7-90b5-45e2-8816-352f9cdd9097 label=  options=rw,relatime,attr2,inode64,noquota
fs /dev/mapper/vg0-lvhome /home xfs uuid=47378618-9b36-4a4e-8d5a-195a6e0e4e8a label=  options=rw,nodev,relatime,attr2,inode64,noquota
fs /dev/mapper/vg0-lvlogs /logs xfs uuid=d5f49f17-21c6-4415-9a1b-a496ef5fee88 label=  options=rw,relatime,attr2,inode64,noquota
fs /dev/mapper/vg0-lvproducts /products xfs uuid=0e7135a4-1606-443c-99c5-064a03f26932 label=  options=rw,relatime,attr2,inode64,noquota
fs /dev/mapper/vg0-lvroot / xfs uuid=5f5c8f2a-789f-43b4-ae84-f7c737a3c2e9 label=  options=rw,relatime,attr2,inode64,noquota
fs /dev/mapper/vg0-lvsysbkp /sysbkp xfs uuid=14208f02-35c0-47cd-8201-c1fd12042f68 label=  options=rw,relatime,attr2,inode64,noquota
fs /dev/mapper/vg0-lvtmp /tmp xfs uuid=f5729f73-47dc-4bed-8671-6e4e8a8465a8 label=  options=rw,nosuid,nodev,noexec,relatime,attr2,inode64,noquota
fs /dev/mapper/vg0-lvvar /var xfs uuid=5907f02a-b6e4-4729-82e3-a234ebea3fea label=  options=rw,relatime,attr2,inode64,noquota
fs /dev/sda1 /boot xfs uuid=ef2113b0-9341-415d-a45d-6835beae7148 label=  options=rw,relatime,attr2,inode64,noquota
swap /dev/mapper/vg0-lvswap0 uuid=83da97b5-5b9c-40aa-bf6e-1caaad73b2b2 label=

I will give the rpm from http://download.opensuse.org/repositories/Archiving/SLE_12/ a try and post the outcomes.

jsmeix commented at 2015-10-16 13:42:

You wrote
"problem only exists when I do the restore on the same machine"
and this together with that strange "multipath" stuff might
indicate that the root cause is "magic strings" on the disk
that get autodetected by such tools, see "man wipefs"
(excerpt)

wipefs can erase filesystem, raid or partition-table signatures
(magic strings) from the specified device to make the
signatures invisible for libblkid.

In the rear 1.17.2 files I cannot find "wipefs", which indicates that
rear does not use it, so such special signatures on the disk
are not removed during recovery.

@gdha
wasn't there already another issue related to "wipefs"?

In general when re-using a used disk one should use wipefs
to clean up possible remainders of the previous usage.

You may have a look at my script for
"Generic system installation with the plain SUSE installation system"
in https://en.opensuse.org/SDB:Disaster_Recovery
to see how I use wipefs on all partitions before I create filesystems there
(YaST does the same during installation).

@gdha
I think wipefs support should be added to rear.
It could be done optionally, like this:
if the wipefs executable exists in the recovery system, then use it.
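
A minimal sketch of that optional approach (just the idea, not actual rear code; $device and $fstype stand for the device and filesystem type being created):

# wipe old signatures only when the wipefs binary is available in the recovery system
if type -p wipefs >/dev/null ; then
    wipefs -a $device
fi
mkfs -t $fstype $device >&2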

jsmeix commented at 2015-10-16 13:47:

@mattihautameki
the rpm from
http://download.opensuse.org/repositories/Archiving/SLE_12/
has no change compared to upstream rear 1.17.2
that could solve this issue (as far as I can imagine).

jsmeix commented at 2015-10-16 14:06:

@mattihautameki

an untested proposal for how to add wipefs support:

In
usr/share/rear/layout/prepare/GNU/Linux/13_include_filesystem_code.sh
directly before each "mkfs ... $device" command
insert a "wipefs -a $device" for example as follows (untested !):

wipefs -a $device
mkfs -t ${fstype}${blocksize}${fragmentsize}${bytes_per_inode} $device >&2
.
.
.
wipefs -a $device
mkfs.xfs -f $device
.
.
.
mount | grep -q $device || { wipefs -a $device ; mkfs -t $fstype -f $device ; }
.
.
.
wipefs -a $device
mkfs.vfat $device
.
.
.
wipefs -a $device
mkfs -t $fstype $device >&2

Because this way wipefs is called unconditionally,
you must make it available in the recovery system,
usually in /etc/rear/local.conf, by using

REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" wipefs )

jsmeix commented at 2015-10-16 14:31:

I implemented my above proposal for how to add wipefs support
and did a quick test with my SLE12-SP1 test system
with btrfs and recovery still works for me (but I never had
such an issue where wipefs would actually be needed).

@mattihautameki
please also implement my above proposal for how to add wipefs support
and test on your system whether wipefs helps in your case.

mattihautameki commented at 2015-10-23 11:11:

I put a simple debug log into every ".sh" file in /usr/share/rear just to see when the link is not correct anymore.

First and second line of each script:

MYDEBUGDEV="`ls -l /dev/disk/by-uuid/ | grep sda1`"
Log "DEVICE for sda: $MYDEBUGDEV"

Here is the output http://pastebin.ca/3213322
I also noticed that a udevadm trigger corrects the by-uuid link instantly after the restore.

@jsmeix
I will report the outcome of your wipefs approach shortly.

mattihautameki commented at 2015-10-28 09:17:

@jsmeix
Sadly, it's the same outcome with wipefs applied. http://pastebin.ca/3222953

jsmeix commented at 2015-11-06 12:52:

Regarding my above question
"wasn't there already another issue related to 'wipefs'?"

I found that other issue:
It is
https://github.com/rear/rear/issues/540
"Implement a generic 'cleanupdisk' function."

gdha commented at 2015-11-20 08:58:

@mattihautameki Now that the wipefs mechanism has been added by @jsmeix, could you retry your experiments to verify whether your problem is gone with this patch?

jsmeix commented at 2015-11-20 13:48:

@mattihautameki
note that currently for using wipefs you must have

REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" wipefs )

in your /etc/rear/local.conf.

Making wipefs automatically available to the rear recovery system
when it is available in the original system is a next step, cf.
https://github.com/rear/rear/pull/704
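
One conceivable way to do that (not necessarily what the referenced pull request does; treating PROGS as rear's list of optional programs that are copied into the rescue system only when they exist on the original system is an assumption here):

# list wipefs among the optional programs rather than the required ones,
# so the rescue system gets it whenever the original system has it
PROGS=( "${PROGS[@]}" wipefs )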

gdha commented at 2016-02-09 11:19:

@mattihautameki we have not received any feedback from you in a long time - is this issue resolved for you?

mattihautameki commented at 2016-02-11 12:09:

Sorry! I was busy. I have just tested commit 8190db4ffd7d1a5a7bc5cec796bdaf2d48788f95 with no success. I then inserted "udevadm trigger" before the initrd is created; this seems to fix the problem but not the cause. https://gist.github.com/mattihautameki/249ffbe89cfbc32e5d40 (lines 965 and 966)
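
For context, a sketch of what that insertion amounts to (the settle call is an extra that is commonly paired with trigger to wait for the generated events; it is not part of the change tested above):

# re-run the udev rules for all devices so that /dev/disk/by-uuid/ reflects
# the filesystems that were just created, before the initrd is rebuilt
udevadm trigger
# optionally wait until udev has processed all triggered events
udevadm settle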

mattihautameki commented at 2016-03-02 15:43:

@gdha I inserted 'udevadm trigger' in https://github.com/mattihautameki/rear/commit/f46722a3d1480cd391568be4bd73e21280278713. Is it possible to implement something like that, or do you think this may cause other problems?

gdha commented at 2016-03-02 15:58:

@mattihautameki it won't hurt, so why not prepare a pull request?

mattihautameki commented at 2016-03-04 23:33:

@gdha I just created a pull request. I wanted to ask first because I was not sure whether you would accept a workaround rather than a fix for the real cause.

exedor commented at 2016-05-26 07:09:

For what it's worth, I'm having the same problem and I don't think it has anything to do with wipefs or pre-existing metadata. Every time I repeat the installation I get the same error saying that /dev/disk/by-uuid/--random-UUID-- does not exist, but --random-UUID-- is different on each run, even though it fails the same way. Out of the 8 systems I'm testing on, 2 of them exhibit this behavior. One has identical drive geometry and the other does not.

mattihautameki commented at 2016-06-17 11:27:

@exedor What distribution do you use? For me this issue only appears on SLES12 and not on RHEL7/CentOS7. Since the workaround in https://github.com/rear/rear/commit/f46722a3d1480cd391568be4bd73e21280278713, SLES12 works for me. I always tested on one machine (make a backup -> restore the backup on the same host).

exedor commented at 2016-06-18 02:09:

I was running Fedora Core 21 and using the latest checkout of the source repository. I eventually figured out what the problem was and tested a fix that works equally well on all other platforms. The shell script in rear was using mkfs to create the filesystem, which generates a random UUID, and afterwards it was calling tune2fs to set the UUID back to the one used on the original backed-up system. That call to tune2fs was not sticking, so when the system tried to mount the filesystems by their original UUIDs on reboot after the restore, the OS load would fail looking for the wrong UUID. I'm still not entirely sure why that was the case, but mkfs takes a command line parameter for setting the UUID, and using that (only when a UUID is present, of course) instead of tune2fs resolved the issue.
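
For illustration, the difference between the two approaches ($device and $uuid are placeholders; the actual change in rear may look different):

# old approach: create the filesystem with a random UUID, then change it afterwards
mkfs.ext4 $device
tune2fs -U $uuid $device

# new approach: set the UUID directly when the filesystem is created
mkfs.ext4 -U $uuid $device
# for XFS the equivalent would be
mkfs.xfs -f -m uuid=$uuid $device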

jsmeix commented at 2016-09-21 13:20:

I think https://github.com/rear/rear/issues/649 fixes it.
Therefore I close it.
If I am wrong, it can be reopened.

exedor commented at 2016-09-22 16:39:

Sounds good.


jsmeix commented at 2016-09-23 07:33:

Typo in my comment https://github.com/rear/rear/issues/649#issuecomment-248609839

I meant that the fixes for https://github.com/rear/rear/issues/851
(in particular https://github.com/rear/rear/pull/894)
also fix this issue here.


[Export of Github issue for rear/rear.]