#1846 Issue closed: Partitioning errors in RAWDISK creation on Debian 7 and CentOS 6

Labels: enhancement, fixed / solved / done

GreenBlood opened issue at 2018-06-28 13:17:

Relax-and-Recover (ReaR) Issue

  • ReaR version:
    Relax-and-Recover 2.4-git.3020.aa7b197.master / 2018-06-21
  • OS version:
    At least CentOS 6 and Debian 7 (fully updated)
  • ReaR configuration files:
BACKUP=NETFS
OUTPUT=RAWDISK
BACKUP_TYPE=incremental
FULLBACKUPDAY="Sun"


SSH_ROOT_PASSWORD="XXXXXXX"
USE_DHCLIENT=yes

BACKUP_URL=cifs://XXXXX/XXXXX
BACKUP_OPTIONS="cred=/etc/rear/.cifs_credentials"
  • System architecture (x86 compatible or POWER and/or what kind of virtual machine):
    amd64
  • Are you using BIOS or UEFI or another way to boot?
    classic BIOS
  • Brief description of the issue:
    I've found an issue while trying to use the RAWDISK output on older Linux distributions (CentOS 6 and Debian 7).
    When the mkrescue command reaches the 280_create_bootable_disk_image.sh script, it creates the raw file with dd and correctly adds the rescue partition. However, on line 86 (mkfs.vfat) the program does not find the loop device's partition (/dev/loop0p1).
+++ losetup --show --find /tmp/rear.H1ohZi7mUy6CZWh/tmp/rear-debian7.raw
++ disk_device=/dev/loop0
++ StopIfError 'Could not create loop device on /tmp/rear.H1ohZi7mUy6CZWh/tmp/rear-debian7.raw'
++ ((  0 != 0  ))
++ AddExitTask 'losetup -d /dev/loop0 >&2'
++ EXIT_TASKS=("$*" "${EXIT_TASKS[@]}")
++ Debug 'Added '\''losetup -d /dev/loop0 >&2'\'' as an exit task'
++ test 1
++ Log 'Added '\''losetup -d /dev/loop0 >&2'\'' as an exit task'
+++ date '+%Y-%m-%d %H:%M:%S.%N '
++ local 'timestamp=2018-06-28 14:58:49.154127597 '
++ test 1 -gt 0
++ echo '2018-06-28 14:58:49.154127597 Added '\''losetup -d /dev/loop0 >&2'\'' as an exit task'
2018-06-28 14:58:49.154127597 Added 'losetup -d /dev/loop0 >&2' as an exit task
++ partprobe /dev/loop0
++ local boot_partition=/dev/loop0p1
++ mkfs.vfat -v /dev/loop0p1 -n 'RESCUE SYS'
/dev/loop0p1: No such file or directory
mkfs.vfat 3.0.13 (30 Jun 2012)
++ Error 'Could not create boot file system'
++ LogPrintError 'ERROR: Could not create boot file system'

I ran the losetup command myself and indeed no /dev/loop0p1 appears, even though gdisk run on the loop0 device correctly finds the partition created beforehand.

I guess it has to do with how those distros "refresh" the available partitions, but I don't really know how to work around that.

  • Work-around, if any:
    None yet.

Regards,
Green

jsmeix commented at 2018-06-29 08:01:

Support for RAWDISK output
(plus TCG Opal 2-compliant self-encrypting disks)
was implemented by @OliverO2 in
https://github.com/rear/rear/pull/1659
where the code has been tested successfully on Ubuntu 16.04.3 LTS.

@OliverO2
could you please have a look what goes on here?

GreenBlood commented at 2018-06-29 11:22:

Just tried on Ubuntu 14 LTS and it works fine.

I then updated the Debian kernel to 3.16 (using backports), but that did not change anything.
But as Debian 7 is no longer supported (my information was outdated), we might choose to drop it. That still leaves CentOS 6, which is still maintained.

On another note, with Debian 8 there is no issue.

It might be related to the losetup/util-linux version.

jsmeix commented at 2018-06-29 12:50:

I know nothing at all about RAWDISK output
but from plain looking at the code in
usr/share/rear/output/RAWDISK/Linux-i386/280_create_bootable_disk_image.sh

disk_device="$(losetup --show --find "$disk_image")"
...
local boot_partition="${disk_device}p1"
...
mkfs.vfat $v "$boot_partition" ...

it seems one cannot assume that the boot_partition device name
is always of the form ${disk_device}p1.
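A minimal sketch of a naming rule that avoids hard-coding the "p" separator, assuming the common kernel convention that a "p" is inserted only when the disk name itself ends in a digit (loop0 becomes loop0p1, sda becomes sda1). The helper name partition_device_name is hypothetical, and as the later comments in this issue show, such a rule still cannot predict partitions that only appear as device-mapper nodes like /dev/dm-2.

```shell
# Hypothetical helper: derive the expected partition device node from a disk
# device node and a partition number, following the common naming rule:
# names ending in a digit get a "p" separator (loop0 -> loop0p1),
# other names get the number appended directly (sda -> sda1).
partition_device_name() {
    local disk_device="$1" number="$2"
    case "$disk_device" in
        *[0-9]) echo "${disk_device}p${number}" ;;
        *)      echo "${disk_device}${number}" ;;
    esac
}

partition_device_name /dev/loop0 1   # prints /dev/loop0p1
partition_device_name /dev/sda 1     # prints /dev/sda1
```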

I assume that how partitions are named in this case also depends on
what each particular Linux distribution likes to do in this area,
just as each version of each Linux distribution
implements the naming of multipath device nodes differently, cf.
https://github.com/rear/rear/pull/1765
in particular see
https://github.com/rear/rear/pull/1765#issuecomment-381498504

@GreenBlood
if my above assumption is right we would need to know
how on each version of each of your Linux distributions
the partitions are actually named in case of loop devices.

You could add a line

read -p "Press ENTER to continue ... " 0<&6 1>&7 2>&8

anywhere in output/RAWDISK/Linux-i386/280_create_bootable_disk_image.sh
e.g. directly before the mkfs.vfat ... line so that it stops there
and you could inspect what there actually is
at exactly that state on your system.

OliverO2 commented at 2018-06-29 13:17:

@GreenBlood Thanks for reporting and your research done so far.

@jsmeix What you have argued so far sounds reasonable. I'll try to figure out what could be done to improve portability.

GreenBlood commented at 2018-06-29 13:56:

@jsmeix Yeah, I've actually already tried running the commands by hand (dd, gdisk, losetup and such) and no loopXp1 appears at all. But running losetup -a or fdisk -l /dev/loop0 shows that the loop device is active.
HOWEVER
Using a more recent kernel on CentOS 6 I've managed to get it working.

I noticed that util-linux was at different versions on CentOS 6 and 7, so I grabbed the latest version's source RPM and compiled it.

Using the newly compiled binaries (I have not installed them, as it's a pretty important package), I got this result:

[root@centos6 util-linux-2.23.2]# losetup --show --find /tmp/rear.MOL0gr9BL4uKxck/tmp/rear-centos6.raw
/dev/loop0
[root@centos6 util-linux-2.23.2]# partx -a /dev/loop0
HDIO_GETGEO: Inappropriate ioctl for device
[root@centos6 util-linux-2.23.2]# ./partx -a /dev/loop0
[root@centos6 util-linux-2.23.2]# ls /dev/loop*
/dev/loop0  /dev/loop0p1  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4  /dev/loop5  /dev/loop6  /dev/loop7  /dev/loop-control

While the recently compiled partx reloads the partitions, the system one does not.
Using the standard kernel with the compiled partx, it no longer works. So I guess it's the combination of a newer kernel and an updated partx that does the trick.

I don't know what to do with this information; I feel like I went too far, but meh. Seems like a dead end to me. My VM is now a CentOS Frankenstein monster.

OliverO2 commented at 2018-06-29 14:13:

AFAIK creating device names for partitions is a kernel exercise. Unfortunately, there doesn't seem to be a tool that reports assigned partition device names. For example, partprobe --summary just outputs

/dev/loop0: gpt partitions 1

As it looks like there are different opinions on how to create partition device names and no one to ask, my best guess would be to rely on partition names consisting of the device name followed by some suffix. In this case there is only one partition, so its name shouldn't be too hard to guess...

@GreenBlood Could you change the line

local boot_partition="${disk_device}p1"

to

local boot_partition="$(echo "${disk_device}"?*)"

in usr/share/rear/output/RAWDISK/Linux-i386/280_create_bootable_disk_image.sh and see if that works on each distribution?
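The glob suggested above could be made a bit stricter. The following sketch (hypothetical helper find_single_partition, plain bash) expands the same "${disk_device}"?* pattern but refuses to guess when there is no match or more than one, for example when /dev/loop1?* also matches /dev/loop10 and /dev/loop11 on a system with many loop devices. Like the original glob, it cannot find a partition that only appears as /dev/dm-N.

```shell
# Hypothetical helper: find the single partition device node whose name starts
# with the disk device name (e.g. /dev/loop0p1 for /dev/loop0). Fails if there
# is no match or more than one match instead of silently picking one.
find_single_partition() {
    local disk_device="$1"
    local -a candidates
    local saved_nullglob
    # Remember whether nullglob was set so it can be restored afterwards.
    saved_nullglob="$(shopt -p nullglob)" || true
    shopt -s nullglob                  # an unmatched glob expands to nothing
    candidates=( "${disk_device}"?* )
    eval "$saved_nullglob"             # restore the previous setting
    if (( ${#candidates[@]} != 1 )); then
        echo "expected exactly one partition node for $disk_device, found ${#candidates[@]}" >&2
        return 1
    fi
    echo "${candidates[0]}"
}

# Possible use in 280_create_bootable_disk_image.sh (Error is ReaR's function):
# local boot_partition
# boot_partition="$(find_single_partition "$disk_device")" || Error "No unique partition on $disk_device"
```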

OliverO2 commented at 2018-06-29 14:25:

@GreenBlood There was an overlap in our comments: If it's not a naming issue but a kernel failing to update its partition table, there is probably not much we can do here. Modern kernels should update their partition tables automatically. Otherwise, partprobe "$disk_device" should instruct them to. Another idea would be to use losetup's --partscan option like this:

disk_device="$(losetup --partscan --show --find "$disk_image")"

Would this make it work?
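Even when the kernel does pick up the partition table (automatically, after partprobe, or via losetup --partscan), the device node may show up slightly later, once udev has processed the kernel events, so a check made immediately afterwards can race with it. A bounded wait is one way to avoid that race; it does not help when, as on CentOS 6 here, the node never appears at all. The helper name wait_for_block_device and the 10-second default are assumptions for illustration.

```shell
# Hypothetical helper: wait up to $2 seconds (default 10) for $1 to exist as a
# block device. Returns 0 as soon as it appears, 1 on timeout.
wait_for_block_device() {
    local device="$1" timeout="${2:-10}" waited=0
    until test -b "$device"; do
        if (( waited >= timeout )); then
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))   # not (( waited++ )), which fails under set -e when 0
    done
}

# Possible use after partprobe (Error is ReaR's function):
# partprobe "$disk_device"
# wait_for_block_device "$boot_partition" 10 || Error "Could not make the kernel recognize loop device partitions"
```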

GreenBlood commented at 2018-06-29 14:44:

@OliverO2 Well, it seems that since CentOS 6 is nearly ten years old, its losetup does not include --partscan.

Using my compiled losetup, it accepts this argument but does not work any better.

[root@centos6 util-linux-2.23.2]# ./losetup --show --find --partscan "/tmp/rear.IdSQo1PE0lrU95R/tmp/rear-centos6.raw"
/dev/loop0
[root@centos6 util-linux-2.23.2]# ls /dev/loop*
/dev/loop0  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4  /dev/loop5  /dev/loop6  /dev/loop7

I guess CentOS 6 is off the list for RAWDISK, unless there is a way that I'm unaware of.
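Because older util-linux releases (such as the one shipped with CentOS 6) lack losetup --partscan, a script that wants to use the option where available could probe for it first. A sketch, assuming the option is listed in the command's --help output; the helper name supports_option is hypothetical. As the test above shows, the option alone is not enough on an old kernel, since the kernel itself also has to support partition scanning of loop devices.

```shell
# Hypothetical helper: check whether a command lists a long option in its
# --help output.
supports_option() {
    local cmd="$1" option="$2" help_text
    # Capture the help text first so that grep -q exiting early cannot make
    # the producing command fail with SIGPIPE when pipefail is enabled.
    help_text="$("$cmd" --help 2>&1)" || true
    # -w: match the whole option, so --no does not match --noediting.
    grep -qw -e "$option" <<< "$help_text"
}

# Example use (needs a real disk image):
# partscan_option=""
# supports_option losetup --partscan && partscan_option="--partscan"
# disk_device="$(losetup $partscan_option --show --find "$disk_image")"
```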

jsmeix commented at 2018-06-29 15:11:

@OliverO2
as a minimal improvement could you add a test
that such a $boot_partition actually exists
and error out if not like

local boot_partition="${disk_device}p1"
test -b "$boot_partition" || Error "Cannot create raw disk image (no $boot_partition partition on $disk_device)"

This does not make things work in environments where it currently cannot work
but it would at least tell the user what is unexpected or wrong in his environment.

jsmeix commented at 2018-06-29 15:15:

@OliverO2
in general regarding things like "kernel failing to update its partitions table"
you may have a look at the somewhat similar or related issue
https://github.com/rear/rear/issues/791

OliverO2 commented at 2018-07-03 11:14:

@GreenBlood Maybe there is a solution after all if you're able to install the kpartx tool on your older distributions: Could you just try to apply commit bcb0ed3e01d5e9225c6e243bb4971de90ea3c57b in my branch https://github.com/OliverO2/rear/tree/feature/rawdisk-portability-improvements?

@jsmeix This should also improve the error message. I'll create a PR if this has been tested successfully (currently I don't have one of these older kernels available so I could not test it fully).

GreenBlood commented at 2018-07-03 13:24:

@OliverO2 OK, so I tried your patch, but no luck.

But I think what @jsmeix was suggesting earlier might be the issue we're facing. It seems that running losetup and then kpartx -a /dev/loopX makes the device appear in /dev/disk/by-*.
See:

[root@centos6 ~]# losetup -d /dev/loop0
[root@centos6 ~]# losetup --show --find rear-centos6.raw 
/dev/loop0
[root@centos6 ~]# ls /dev/disk/by-id/
(System disks)
[root@centos6 ~]# kpartx -a /dev/loop0
[root@centos6 ~]# ls /dev/disk/by-id/
dm-name-loop0p1   dm-uuid-part1-loop0        (System disks)

So even though kpartx says that it's adding a /dev/loop0p1 device, that's not what happens in this situation. On my server the symlinks in /dev/disk/by-* pointed to /dev/dm-2.
I integrated this into the rear workflow to test it with the correct device on the server, and it works.
Here is a very ugly diff showing what I modified (hardcoded value, but it was just for testing purposes):

@@ -67,9 +67,9 @@
 StopIfError "Could not create loop device on $disk_image"
 AddExitTask "losetup -d $disk_device >&2"

-partprobe "$disk_device" || Error "Could not make the kernel recognize loop device partitions"
-local boot_partition="${disk_device}p1"

+kpartx -a "$disk_device" || Error "Could not make the kernel recognize loop device partitions"
+local boot_partition="/dev/dm-2"

 ### Create and populate the boot file system

@@ -144,6 +144,7 @@

 umount "$boot_partition_root" || Error "Could not unmount boot file system"
 RemoveExitTask "umount $boot_partition_root >&2"
+kpartx -v -d /dev/loop0
 losetup -d "$disk_device" || Error "Could not delete loop device"
 RemoveExitTask "losetup -d $disk_device >&2"

I don't currently know how to "detect" where the loop device's partition ends up, though.

EDIT: It seems that lsblk -n --output "KNAME" $disk_device shows, on the second line, where in /dev the partition is.
Debian 8 :

root@debian8:~# lsblk -n --output "KNAME" /dev/loop0
loop0
loop0p1

CentOS 6 :

[root@centos6 ~]# lsblk -n --output "KNAME" /dev/loop0
loop0
dm-2

Ubuntu 16 LTS :

root@ubuntu16:/tmp# lsblk -n --output "KNAME" /dev/loop0
loop0
loop0p1

I'd have to test on more Linux distros, but as lsblk is part of util-linux it should a priori be available on every distro.
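The lsblk observation above could be turned into a small parser. A sketch, assuming lsblk prints the disk on the first line and exactly one partition (or its device-mapper node, as on CentOS 6) on the second, as in the outputs shown; the helper name partition_from_lsblk is hypothetical and reads the lsblk output on stdin so it can be tried without a loop device.

```shell
# Hypothetical helper: read the output of
#   lsblk -n --output KNAME "$disk_device"
# on stdin and print the /dev path of the single partition listed after the
# disk itself (loop0p1 on Debian 8 / Ubuntu 16, dm-2 on CentOS 6 with kpartx).
partition_from_lsblk() {
    local -a names=()
    local name
    while read -r name; do
        # Skip empty lines; keep every device name lsblk printed.
        if [[ -n "$name" ]]; then
            names+=( "$name" )
        fi
    done
    # First line is the disk itself; expect exactly one partition after it.
    if (( ${#names[@]} != 2 )); then
        echo "expected one disk and one partition, got ${#names[@]} names" >&2
        return 1
    fi
    echo "/dev/${names[1]}"
}

# Possible use on a live system:
# boot_partition="$(lsblk -n --output KNAME "$disk_device" | partition_from_lsblk)"
```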

OliverO2 commented at 2018-07-03 14:59:

@GreenBlood Thanks for trying. I think we're at least on the right path here. Note that my patch uses the -u option of kpartx, not -a. Unfortunately, kpartx is not well documented so it's trial and error.

Note that the mapping device dm-name-loop0p1 you saw is absolutely OK. kpartx creates device mappings like these but should also create links for the proper loop device path /dev/loop0p1.

Diagnosis

Could you post the relevant section of the rear log when running the code with my patch? Maybe you could even run it again and examine the state after the kpartx call by inserting

read -p "Press ENTER to continue ... " 0<&6 1>&7 2>&8

before the comment

# If unsuccessful, say so.

and look for loop devices.

What does kpartx -l say?

Alternative solution without losetup:

In addition, could you try this?

  1. Replace the lines
local disk_device  # separate 'local' statement to avoid losing $(...) exit status - cf. https://stackoverflow.com/a/10397996
disk_device="$(losetup --show --find "$disk_image")"
StopIfError "Could not create loop device on $disk_image"
AddExitTask "losetup -d $disk_device >&2"

local boot_partition="${disk_device}p1"

with

local kpartx_fields=($(kpartx -asv "$disk_image"))
[[ ${#kpartx_fields[*]} == 6 ]] || Error "kpartx could not create loop device and its partitions (result: ${kpartx_fields[*]})"

local disk_device="${kpartx_fields[4]}"
local boot_partition="/dev/${kpartx_fields[0]}"

AddExitTask "kpartx -d $disk_image >&2"

LogPrint "loop device: $disk_device, boot partition: $boot_partition"
  2. Uncomment these lines:
losetup -d "$disk_device" || Error "Could not delete loop device"
RemoveExitTask "losetup -d $disk_device >&2"
  3. Then run rear and post the relevant section of the log.

OliverO2 commented at 2018-07-03 15:10:

Correction - should be

AddExitTask "kpartx -d $disk_image >&2"

(not $disk_device)

OliverO2 commented at 2018-07-03 20:53:

@GreenBlood

Update: I have pushed a new commit 26e6eece4b5e9e911784e881c6cf2241b5f0e827 onto my branch https://github.com/OliverO2/rear/commits/feature/rawdisk-portability-improvements. With that commit I could successfully build a RAWDISK output file on CentOS 6.

Rear configuration:

OUTPUT=RAWDISK
OUTPUT_URL="file://$VAR_DIR/output"

Platform configuration:

  • CentOS-6.10-x86_64-minimal.iso
  • additional packages: gdisk, dosfstools

Terminal log:

Relax-and-Recover 2.4-git.3028.60985ac.featurerawdiskportabilityimprovements.changed / 2018-07-03
Using log file: /var/log/rear/rear-centos6.log
Creating disk layout
Using guessed bootloader 'GRUB' (found in first bytes on /dev/sda)
Creating root filesystem layout
Copying logfile /var/log/rear/rear-centos6.log into initramfs as '/tmp/rear-centos6-partial-2018-07-03T22:38:09+0200.log'
Copying files and directories
Copying binaries and libraries
Copying kernel modules
Copying all files in /lib*/firmware/
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (74476711 bytes) in 8 seconds
Creating 83 MiB raw disk image "rear-centos6.raw"
Using syslinux to install a Legacy BIOS bootloader
Copying resulting files to file location
Saving /var/log/rear/rear-centos6.log as rear-centos6.log to file location
Exiting rear mkrescue (PID 2930) and its descendant processes
Running exit tasks

jsmeix commented at 2018-07-09 09:29:

@GreenBlood
with https://github.com/rear/rear/pull/1850 merged
this issue should be fixed where "fixed" means that now
ReaR tries as far as possible to make the needed partition device nodes appear
but if they finally won't appear it must error out because it cannot proceed
without the needed partition device nodes.
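A sketch of that "try as far as possible, then error out" approach, for illustration only; it is not the code merged in https://github.com/rear/rear/pull/1850. It assumes the first partition shows up either as /dev/loopNp1 (after partprobe or partx -a) or as /dev/mapper/loopNp1 (after kpartx -a, matching the dm-name-loop0p1 entry seen earlier in this issue). The helper name make_first_partition_appear and the 5-second udev timeout are hypothetical.

```shell
# Hypothetical helper: try several tools in turn to make the kernel create a
# device node for the first partition on loop device $1, and print the node
# that appeared. Returns 1 if none of the candidate nodes ever shows up.
make_first_partition_appear() {
    local disk_device="$1"
    # Assumed candidate names: kernel partition node and kpartx mapping.
    local -a candidates=( "${disk_device}p1" "/dev/mapper/${disk_device##*/}p1" )
    local tool candidate
    for tool in partprobe "partx -a" "kpartx -a"; do
        # Skip tools that are not installed.
        command -v "${tool%% *}" >/dev/null 2>&1 || continue
        # Word splitting of $tool is intended ("partx -a" -> partx -a).
        $tool "$disk_device" >/dev/null 2>&1 || true
        # Give udev a chance to create the device nodes.
        if command -v udevadm >/dev/null 2>&1; then
            udevadm settle --timeout=5 >/dev/null 2>&1 || true
        fi
        for candidate in "${candidates[@]}"; do
            if test -b "$candidate"; then
                echo "$candidate"
                return 0
            fi
        done
    done
    echo "no partition device node appeared for $disk_device" >&2
    return 1
}

# Possible use (Error is ReaR's function):
# boot_partition="$(make_first_partition_appear "$disk_device")" || Error "Could not make the kernel recognize loop device partitions"
```

Checking a fixed list of candidate names keeps the sketch short; on a system where the partition only appears as some other /dev/dm-N node, the lsblk-based lookup discussed above would be needed instead.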


[Export of Github issue for rear/rear.]