#1846 Issue closed: Partitioning errors in RAWDISK creation on Debian 7 and CentOS 6¶
Labels: enhancement, fixed / solved / done
GreenBlood opened issue at 2018-06-28 13:17:¶
Relax-and-Recover (ReaR) Issue¶
- ReaR version:
Relax-and-Recover 2.4-git.3020.aa7b197.master / 2018-06-21
- OS version:
At least CentOS 6 and Debian 7 (fully updated)
- ReaR configuration files:
BACKUP=NETFS
OUTPUT=RAWDISK
BACKUP_TYPE=incremental
FULLBACKUPDAY="Sun"
SSH_ROOT_PASSWORD="XXXXXXX"
USE_DHCLIENT=yes
BACKUP_URL=cifs://XXXXX/XXXXX
BACKUP_OPTIONS="cred=/etc/rear/.cifs_credentials"
- System architecture (x86 compatible or POWER and/or what kind of virtual machine):
amd64
- Are you using BIOS or UEFI or another way to boot?
classic BIOS
- Brief description of the issue:
I've found an issue while trying to use the RAWDISK output on old Linux distributions (CentOS 6 and Debian 7).
When the mkrescue command reaches the 280_create_bootable_disk_image.sh file, it creates the raw file with dd and correctly adds the rescue partition. However, on line 86 (mkfs.vfat) the program does not find the loop device's partition (/dev/loop0p1).
+++ losetup --show --find /tmp/rear.H1ohZi7mUy6CZWh/tmp/rear-debian7.raw
++ disk_device=/dev/loop0
++ StopIfError 'Could not create loop device on /tmp/rear.H1ohZi7mUy6CZWh/tmp/rear-debian7.raw'
++ (( 0 != 0 ))
++ AddExitTask 'losetup -d /dev/loop0 >&2'
++ EXIT_TASKS=("$*" "${EXIT_TASKS[@]}")
++ Debug 'Added '\''losetup -d /dev/loop0 >&2'\'' as an exit task'
++ test 1
++ Log 'Added '\''losetup -d /dev/loop0 >&2'\'' as an exit task'
+++ date '+%Y-%m-%d %H:%M:%S.%N '
++ local 'timestamp=2018-06-28 14:58:49.154127597 '
++ test 1 -gt 0
++ echo '2018-06-28 14:58:49.154127597 Added '\''losetup -d /dev/loop0 >&2'\'' as an exit task'
2018-06-28 14:58:49.154127597 Added 'losetup -d /dev/loop0 >&2' as an exit task
++ partprobe /dev/loop0
++ local boot_partition=/dev/loop0p1
++ mkfs.vfat -v /dev/loop0p1 -n 'RESCUE SYS'
/dev/loop0p1: No such file or directory
mkfs.vfat 3.0.13 (30 Jun 2012)
++ Error 'Could not create boot file system'
++ LogPrintError 'ERROR: Could not create boot file system'
I tried running the losetup command myself, and indeed no /dev/loop0p1 appears, even though gdisk run on the loop0 device correctly finds the partition created beforehand.
I guess it has to do with how those distros handle "refreshing" the available partitions, but I don't really know how to work around that.
- Work-around, if any:
None yet.
Regards,
Green
jsmeix commented at 2018-06-29 08:01:¶
Support for RAWDISK output
(plus TCG Opal 2-compliant self-encrypting disks)
was implemented by @OliverO2 in
https://github.com/rear/rear/pull/1659
where "The code has been tested successfully on Ubuntu 16.04.3 LTS."
@OliverO2
could you please have a look what goes on here?
GreenBlood commented at 2018-06-29 11:22:¶
Just tried on Ubuntu 14 LTS and it works fine.
Then I tried updating the Debian kernel to 3.16 (using backports), but that does not change anything.
But as Debian 7 is not supported anymore (my information was outdated), we might choose to drop it. That still leaves CentOS 6, which is still active.
In other news, there is no issue on Debian 8.
Might be related to the losetup/util-linux version.
jsmeix commented at 2018-06-29 12:50:¶
I know nothing at all about RAWDISK output
but from plain looking at the code in
usr/share/rear/output/RAWDISK/Linux-i386/280_create_bootable_disk_image.sh
disk_device="$(losetup --show --find "$disk_image")"
...
local boot_partition="${disk_device}p1"
...
mkfs.vfat $v "$boot_partition" ...
it seems one cannot assume that the boot_partition device name
is always of the form ${disk_device}p1.
I assume how partitions are named in this case also depends on
what each particular Linux distribution likes to do in this area
like the various ways how each version of each Linux distribution
implements their naming of multipath device nodes differently, cf.
https://github.com/rear/rear/pull/1765
in particular see
https://github.com/rear/rear/pull/1765#issuecomment-381498504
@GreenBlood
if my above assumption is right we would need to know
how on each version of each of your Linux distributions
the partitions are actually named in case of loop devices.
You could add a line
read -p "Press ENTER to continue ... " 0<&6 1>&7 2>&8
anywhere in
output/RAWDISK/Linux-i386/280_create_bootable_disk_image.sh
e.g. directly before the mkfs.vfat ...
line so that it stops there
and you could inspect what there actually is
at exactly that state on your system.
OliverO2 commented at 2018-06-29 13:17:¶
@GreenBlood Thanks for reporting and your research done so far.
@jsmeix What you have argued so far sounds reasonable. I'll try to figure out what could be done to improve portability.
GreenBlood commented at 2018-06-29 13:56:¶
@jsmeix Yeah, I've actually already tried to run the commands by hand (dd, gdisk, losetup and such) and the loopXp1 device does not appear either way. But running losetup -a or fdisk -l /dev/loop0 shows that the loop device is activated.
HOWEVER
Using a more recent kernel on CentOS 6, I've managed to get it working.
I noticed that util-linux was at a different version on CentOS 6 and 7, so I went and grabbed the latest version's source RPM and compiled it.
Using the newly compiled binaries (I have not installed them as it's a pretty important package), I got this result:
[root@centos6 util-linux-2.23.2]# losetup --show --find /tmp/rear.MOL0gr9BL4uKxck/tmp/rear-centos6.raw
/dev/loop0
[root@centos6 util-linux-2.23.2]# partx -a /dev/loop0
HDIO_GETGEO: Inappropriate ioctl for device
[root@centos6 util-linux-2.23.2]# ./partx -a /dev/loop0
[root@centos6 util-linux-2.23.2]# ls /dev/loop*
/dev/loop0 /dev/loop0p1 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5 /dev/loop6 /dev/loop7 /dev/loop-control
While the newly compiled partx reloads the partitions, the system one does not work.
Using the standard kernel and the compiled partx, it no longer works. So I guess it's the combination of the kernel version and a partx update that does the trick.
I don't know what to do with this information. I feel like I went too far, but meh. Seems like a dead end to me. My VM is now a CentOS Frankenstein monster.
OliverO2 commented at 2018-06-29 14:13:¶
AFAIK creating device names for partitions is a kernel exercise. Unfortunately, there doesn't seem to be a tool which reports assigned partition device names. For example, partprobe --summary just outputs
/dev/loop0: gpt partitions 1
As it looks like there are different opinions on how to create partition device names and no one to ask, my best guess would be to just rely on the fact that partition names will consist of the device name followed by some suffix. In this case, there is only one partition, so its name shouldn't be too hard to guess...
@GreenBlood Could you change the line
local boot_partition="${disk_device}p1"
to
local boot_partition="$(echo "${disk_device}"?*)"
in
usr/share/rear/output/RAWDISK/Linux-i386/280_create_bootable_disk_image.sh
and see if that works on each distribution?
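A minimal sketch of the glob-based guess suggested above, assuming exactly one partition exists on the loop device. The function name `guess_single_partition` is hypothetical and not part of ReaR. Unlike a bare `echo`, it refuses to guess when the glob matches nothing or matches several names.

```shell
#!/bin/bash
# Hypothetical helper (not part of ReaR): print the only existing path that
# starts with the given device name followed by at least one more character.
guess_single_partition() {
    local disk_device="$1"
    local candidates=()
    local candidate
    # An unmatched glob stays literal in bash, so keep only paths that exist.
    for candidate in "$disk_device"?*; do
        [[ -e "$candidate" ]] && candidates+=("$candidate")
    done
    # Succeed only when the guess is unambiguous.
    [[ ${#candidates[@]} -eq 1 ]] || return 1
    printf '%s\n' "${candidates[0]}"
}
```

This remains a heuristic: for `/dev/loop1` the same glob would also match unrelated devices such as `/dev/loop10`, and it cannot find a partition that is exposed under a different name.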
OliverO2 commented at 2018-06-29 14:25:¶
@GreenBlood There was an overlap in our comments: If it's not a naming issue but a kernel failing to update its partition table, there is probably not much we can do here. Modern kernels should update their partition tables automatically. Otherwise partprobe "$disk_device" should instruct them to. Another idea would be to use losetup's --partscan option like this:
disk_device="$(losetup --partscan --show --find "$disk_image")"
Would this make it work?
GreenBlood commented at 2018-06-29 14:44:¶
@OliverO2 Well, it seems that, CentOS 6 being nearly ten years old, its losetup does not include --partscan.
My compiled losetup accepts this argument, but it does not work any better.
[root@centos6 util-linux-2.23.2]# ./losetup --show --find --partscan "/tmp/rear.IdSQo1PE0lrU95R/tmp/rear-centos6.raw"
/dev/loop0
[root@centos6 util-linux-2.23.2]# ls /dev/loop*
/dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5 /dev/loop6 /dev/loop7
I guess CentOS 6 is off the list for RAWDISK, unless there is a way that I'm unaware of.
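Since older losetup versions lack --partscan, a script could probe for the option before using it. The following is only a sketch under that assumption; `command_mentions_option` is a hypothetical name, not a ReaR function, and it merely searches the tool's help text.

```shell
#!/bin/bash
# Hypothetical helper (not part of ReaR): succeed if the help text of a
# command mentions the given option string.
command_mentions_option() {
    local command_name="$1" option="$2"
    # Old tools may print usage on stderr and exit non-zero, so merge both
    # streams and look only at the text. -F treats the option literally and
    # -e keeps grep from parsing the leading dashes as its own options.
    "$command_name" --help 2>&1 | grep -q -F -e "$option"
}

# Usage sketch: only add --partscan when losetup advertises it.
# losetup_options=(--show --find)
# command_mentions_option losetup --partscan && losetup_options+=(--partscan)
```

As the output above shows, a losetup that accepts the option does not guarantee that partition nodes appear on an old kernel, so the resulting device node still needs to be checked.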
jsmeix commented at 2018-06-29 15:11:¶
@OliverO2
as a minimal improvement could you add a test
that such a $boot_partition actually exists
and error out if not like
local boot_partition="${disk_device}p1"
test -b "$boot_partition" || Error "Cannot create raw disk image (no $boot_partition partition on $disk_device)"
This does not make things work in environments where it currently cannot work,
but it would at least tell the user what is unexpected or wrong in his environment.
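A possible variation on that check, sketched under the assumption that device nodes may show up with a short delay after the partition table is re-read: poll a few times before giving up. The function `wait_for_node` and its default of five attempts are hypothetical and not taken from ReaR.

```shell
#!/bin/bash
# Hypothetical helper (not part of ReaR): poll until a path passes the given
# 'test' operator (e.g. -b for a block device) or the attempts run out.
wait_for_node() {
    local operator="$1" path="$2" attempts="${3:-5}"
    local attempt
    for (( attempt = 0; attempt < attempts; attempt++ )); do
        test "$operator" "$path" && return 0
        sleep 1
    done
    return 1
}

# Usage sketch inside a ReaR script, where Error is ReaR's own function:
# wait_for_node -b "$boot_partition" || Error "No $boot_partition partition on $disk_device"
```

Polling does not help when the kernel never creates the node, as on the CentOS 6 system described in this issue; it only avoids failing too early.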
jsmeix commented at 2018-06-29 15:15:¶
@OliverO2
in general, regarding things like "kernel failing to update its partitions table",
you may have a look at the somewhat similar or related issue
https://github.com/rear/rear/issues/791
OliverO2 commented at 2018-07-03 11:14:¶
@GreenBlood Maybe there is a solution after all if you're able to install the kpartx tool on your older distributions: Could you just try to apply commit bcb0ed3e01d5e9225c6e243bb4971de90ea3c57b in my branch?
https://github.com/OliverO2/rear/tree/feature/rawdisk-portability-improvements
@jsmeix This should also improve the error message. I'll create a PR if this has been tested successfully (currently I don't have one of these older kernels available so I could not test it fully).
GreenBlood commented at 2018-07-03 13:24:¶
@OliverO2 OK, so I tried your patch, but no luck.
But I think what @jsmeix was suggesting earlier might be the issue we're facing. It seems that running losetup, then kpartx -a /dev/loopX makes the device appear in /dev/disk/by-*
See:
[root@centos6 ~]# losetup -d /dev/loop0
[root@centos6 ~]# losetup --show --find rear-centos6.raw
/dev/loop0
[root@centos6 ~]# ls /dev/disk/by-id/
(System disks)
[root@centos6 ~]# kpartx -a /dev/loop0
[root@centos6 ~]# ls /dev/disk/by-id/
dm-name-loop0p1 dm-uuid-part1-loop0 (System disks)
So even though kpartx says that it's adding a /dev/loop0p1 device, that's not the case in this situation. On my server the symlinks in /dev/disk/by-* pointed to /dev/dm-2.
I integrated it into the ReaR workflow to test using the correct devices on the server, and it works.
I have this very ugly diff to show you what I modified (hardcoded value, but it was just for testing purposes):
@@ -67,9 +67,9 @@
StopIfError "Could not create loop device on $disk_image"
AddExitTask "losetup -d $disk_device >&2"
-partprobe "$disk_device" || Error "Could not make the kernel recognize loop device partitions"
-local boot_partition="${disk_device}p1"
+kpartx -a "$disk_device" || Error "Could not make the kernel recognize loop device partitions"
+local boot_partition="/dev/dm-2"
### Create and populate the boot file system
@@ -144,6 +144,7 @@
umount "$boot_partition_root" || Error "Could not unmount boot file system"
RemoveExitTask "umount $boot_partition_root >&2"
+kpartx -v -d /dev/loop0
losetup -d "$disk_device" || Error "Could not delete loop device"
RemoveExitTask "losetup -d $disk_device >&2"
I don't currently know how to "detect" where the loop device is going, though.
EDIT: It would seem that
lsblk -n --output "KNAME" $disk_device
shows on the second line where in /dev the partition is.
Debian 8 :
root@debian8:~# lsblk -n --output "KNAME" /dev/loop0
loop0
loop0p1
CentOS 6 :
[root@centos6 ~]# lsblk -n --output "KNAME" /dev/loop0
loop0
dm-2
Ubuntu 16 LTS :
root@ubuntu16:/tmp# lsblk -n --output "KNAME" /dev/loop0
loop0
loop0p1
I'd have to test on more Linux distros, but as lsblk is part of util-linux, it should a priori be available in every distro.
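The lsblk observation above could be turned into a small parser. This is only a sketch: `first_child_device` is a hypothetical name, not a ReaR function, and it assumes that the second output line always names the first partition or mapping, which has only been observed on the three systems listed above.

```shell
#!/bin/bash
# Hypothetical helper (not part of ReaR): read 'lsblk -n --output KNAME'
# output on stdin and print the /dev path for the second line.
first_child_device() {
    local kname
    # Line 1 is the disk itself; line 2 is its first partition or mapping.
    kname="$(sed -n '2p' | tr -d '[:space:]')"
    # Fail when there is no second line, i.e. no partition is visible.
    [[ -n "$kname" ]] || return 1
    printf '/dev/%s\n' "$kname"
}

# Usage sketch:
# boot_partition="$(lsblk -n --output KNAME "$disk_device" | first_child_device)"
```

With the CentOS 6 output shown above this yields /dev/dm-2, and with the Debian 8 and Ubuntu 16 output it yields /dev/loop0p1.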
OliverO2 commented at 2018-07-03 14:59:¶
@GreenBlood Thanks for trying. I think we're at least on the right path here. Note that my patch uses the -u option of kpartx, not -a.
Unfortunately, kpartx is not well documented so it's trial and error.
Note that the mapping device dm-name-loop0p1 you saw is absolutely OK. kpartx creates device mappings like these but should also create links for the proper loop device path /dev/loop0p1.
Diagnosis¶
Could you post the relevant section of the rear log when running the code with my patch? Maybe you could run it again and even examine the state after the kpartx call by inserting
read -p "Press ENTER to continue ... " 0<&6 1>&7 2>&8
before the comment
# If unsuccessful, say so.
and look for loop devices.
What does kpartx -l say?
Alternative solution without losetup:¶
In addition, could you try this?
- Replace the lines
local disk_device # separate 'local' statement to avoid losing $(...) exit status - cf. https://stackoverflow.com/a/10397996
disk_device="$(losetup --show --find "$disk_image")"
StopIfError "Could not create loop device on $disk_image"
AddExitTask "losetup -d $disk_device >&2"
local boot_partition="${disk_device}p1"
with
local kpartx_fields=($(kpartx -asv "$disk_image"))
[[ ${#kpartx_fields[*]} == 6 ]] || Error "kpartx could not create loop device and its partitions (result: $kpartx_fields)"
local disk_device="${kpartx_fields[4]}"
local boot_partition="/dev/${kpartx_fields[0]}"
AddExitTask "kpartx -d $disk_image >&2"
LogPrint "loop device: $disk_device, boot partition: $boot_partition"
- Comment out these lines:
losetup -d "$disk_device" || Error "Could not delete loop device"
RemoveExitTask "losetup -d $disk_device >&2"
- Then run rear and post the relevant section of the log.
OliverO2 commented at 2018-07-03 15:10:¶
Correction - should be
AddExitTask "kpartx -d $disk_image >&2"
(not $disk_device)
OliverO2 commented at 2018-07-03 20:53:¶
@GreenBlood
Update: I have pushed a new commit 26e6eece4b5e9e911784e881c6cf2241b5f0e827 onto my branch https://github.com/OliverO2/rear/commits/feature/rawdisk-portability-improvements. With that commit I could successfully build a RAWDISK output file on CentOS 6.
ReaR configuration:
OUTPUT=RAWDISK
OUTPUT_URL="file://$VAR_DIR/output"
Platform configuration:
- CentOS-6.10-x86_64-minimal.iso
- additional packages: gdisk, dosfstools
Terminal log:
Relax-and-Recover 2.4-git.3028.60985ac.featurerawdiskportabilityimprovements.changed / 2018-07-03
Using log file: /var/log/rear/rear-centos6.log
Creating disk layout
Using guessed bootloader 'GRUB' (found in first bytes on /dev/sda)
Creating root filesystem layout
Copying logfile /var/log/rear/rear-centos6.log into initramfs as '/tmp/rear-centos6-partial-2018-07-03T22:38:09+0200.log'
Copying files and directories
Copying binaries and libraries
Copying kernel modules
Copying all files in /lib*/firmware/
Creating recovery/rescue system initramfs/initrd initrd.cgz with gzip default compression
Created initrd.cgz with gzip default compression (74476711 bytes) in 8 seconds
Creating 83 MiB raw disk image "rear-centos6.raw"
Using syslinux to install a Legacy BIOS bootloader
Copying resulting files to file location
Saving /var/log/rear/rear-centos6.log as rear-centos6.log to file location
Exiting rear mkrescue (PID 2930) and its descendant processes
Running exit tasks
jsmeix commented at 2018-07-09 09:29:¶
@GreenBlood
with
https://github.com/rear/rear/pull/1850
merged
this issue should be fixed, where "fixed" means that ReaR now
tries as far as possible to make the needed partition device nodes appear,
but if they still do not appear it must error out because it cannot proceed
without the needed partition device nodes.
[Export of Github issue for rear/rear.]