#2006 Issue closed: No code has been generated to recreate disk

Labels: waiting for info, support / question, no-issue-activity

gk1 opened issue at 2018-12-17 20:28:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"): Relax-and-Recover 2.4 / Git

  • OS version ("cat /etc/rear/os.conf" or "lsb_release -a" or "cat /etc/os-release"): RHEL 7.4

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):

OUTPUT=ISO
### create a backup using the internal NETFS method, using 'tar'
BACKUP=NETFS
### write both rescue image and backup to the device labeled REAR-000
BACKUP_URL=cifs://192.168.187.35/backup/
BACKUP_OPTIONS="cred=/etc/rear/.cifs"
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR): Baremetal restoring to ESXI VM

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device): x86_64

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot): GRUB

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe): Local software RAID 10

  • Description of the issue (ideally so that others can reproduce it): No code is generated to recreate disk /dev/md127

  • Workaround, if any: none

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

gk1 commented at 2018-12-17 20:30:

rear-PriSMS-1.log

jsmeix commented at 2018-12-18 09:14:

@gk1
from only your "rear -v -D recover" log in
https://github.com/rear/rear/files/2687734/rear-PriSMS-1.log
it is in practice impossible to imagine your system disk layout,
so we also need at least your disklayout.conf file, or even better
all files in your /var/lib/rear directory and in its sub-directories, cf. the
section "Debugging issues with Relax-and-Recover" in
https://en.opensuse.org/SDB:Disaster_Recovery

Additionally - if possible - could you provide us the output of

lsblk -i -p -o NAME,KNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT

on your original system?

I do not understand what RHEL 7.4 together with
"Baremetal restoring to ESXI VM" means - but I am no VMware ESXi user.
As far as I understand
https://en.wikipedia.org/wiki/VMware_ESXi
ESXi runs on bare metal,
so I assume what you mean is that you would like to recover
a RHEL 7.4 system that runs on VMware ESXi -
or what do I misunderstand here?

gdha commented at 2018-12-18 10:19:

@gk1 As DRBD is involved, we are dealing with a cluster setup. Is it possible to submit the /var/lib/rear/layout/disklayout.conf (all content of that directory is even better) and the rear.log (of the recovery) of both nodes so we can compare?

gk1 commented at 2018-12-18 14:40:

> @gk1
> from only your "rear -v -D recover" log in
> https://github.com/rear/rear/files/2687734/rear-PriSMS-1.log
> it is in practice impossible to imagine your system disk layout,
> so we also need at least your disklayout.conf file, or even better
> all files in your /var/lib/rear directory and in its sub-directories, cf. the
> section "Debugging issues with Relax-and-Recover" in
> https://en.opensuse.org/SDB:Disaster_Recovery
>
> Additionally - if possible - could you provide us the output of
>
> lsblk -i -p -o NAME,KNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT
>
> on your original system?
>
> I do not understand what RHEL 7.4 together with
> "Baremetal restoring to ESXI VM" means - but I am no VMware ESXi user.
> As far as I understand
> https://en.wikipedia.org/wiki/VMware_ESXi
> ESXi runs on bare metal,
> so I assume what you mean is that you would like to recover
> a RHEL 7.4 system that runs on VMware ESXi -
> or what do I misunderstand here?

As far as the bare metal question: we are running these clusters on bare metal, and we are trying to restore to an ESXi-based development system so we can resolve issues beyond just the disaster recovery. We have replicated the number of disks in the ESXi guest to be able to do the restore.

It is a cluster, as mentioned above. The second node of the cluster was able to restore the system fine; beyond that, DRBD was broken, as you can imagine, which is why we wanted to restore the primary as well. The two boot ISO images are different sizes, with the one for the primary being smaller, which I do not understand as it has more files and a complete running image; the secondary boot ISO is about 40 MB larger. I will gather the requested lsblk information later today and get back to you. I do not have full-time direct access to the machines and need to request access to them again.

As for the ReaR folder contents, is uploading a tar of the files OK?

jsmeix commented at 2018-12-18 15:02:

@gk1
when your original system is running ... on bare metal and you
are trying to restore to an ESXi-based development system,
you are actually doing a migration from bare metal to VMware ESXi;
cf. the section about "Virtual machines" in
https://en.opensuse.org/SDB:Disaster_Recovery

In general, migrating with ReaR can become a complicated task,
depending on how much the replacement system differs
from the original system.

Regarding migrating a system with ReaR see for example
http://lists.relax-and-recover.org/pipermail/rear-users/2018-November/003626.html
and follow the links therein.

Any format of uploaded files is OK for us, provided it is
generally readable with usual free software; in particular,
good old traditional tar.gz is fine.
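
For example (a minimal sketch - the output filename is just an example):

# Package the whole /var/lib/rear directory tree for upload:
tar czf rear-var-lib.tar.gz /var/lib/rear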

jsmeix commented at 2018-12-19 11:18:

@gk1
regarding your two boot ISO images being different sizes:

To compare what is in each of your two ReaR recovery systems,
set KEEP_BUILD_DIR="yes" in /etc/rear/local.conf, cf.
https://github.com/rear/rear/blob/master/usr/share/rear/conf/default.conf#L129
and create the ReaR recovery systems with "rear mkrescue" so that you can
compare what there is in $TMPDIR/rear.XXXXXXXXXXXXXXX/rootfs/
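
For example (a sketch only - the rear.XXXXXXXXXXXXXXX directory names below are placeholders that differ on each run, assuming the kept build directories end up under /tmp):

# In /etc/rear/local.conf on both systems:
KEEP_BUILD_DIR="yes"

# After "rear mkrescue" on each system, compare the kept recovery system
# contents (the actual rear.XXXXXXXXXXXXXXX names will differ):
diff -r /tmp/rear.AAAAAAAAAAAAAAA/rootfs /tmp/rear.BBBBBBBBBBBBBBB/rootfs | less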

gk1 commented at 2018-12-20 18:53:

Output of lsblk on both computers
ISBLK P1.txt
ISBLK P2.txt

gk1 commented at 2018-12-20 18:54:

> @gk1 As DRBD is involved, we are dealing with a cluster setup. Is it possible to submit the /var/lib/rear/layout/disklayout.conf (all content of that directory is even better) and the rear.log (of the recovery) of both nodes so we can compare?

Is that the rear folder on the original machine, or from when I try to do the restore?

gk1 commented at 2018-12-20 19:09:

Contents of /var/lib/rear on the running machines:

rear-folder.zip

gdha commented at 2018-12-21 15:47:

@gk1 From what I saw of the delivered output, the md raid10 savelayout output is incorrect, and therefore the recover part fails as the (saved) input is incorrect (sounds obvious).

To find out what goes wrong, could you send us the output of cat /proc/mdstat, and perhaps run rear -vD savelayout and send us the rear log of this?

gk1 commented at 2018-12-21 15:59:

What's weird, though, is that the first time we made a backup/recovery of the 2nd machine, we were able to restore that image to a VM without incident. At least from the perspective of ReaR, the machine was able to boot and recover in auto mode, and work. Just not the primary machine.

output.zip

gdha commented at 2018-12-25 10:28:

@gk1 The output of https://github.com/rear/rear/issues/2006#issuecomment-449426209 is different from the output of https://github.com/rear/rear/issues/2006#issuecomment-449104059

[output of rear-folder.zip]
$ grep raid /var/lib/rear.p1/layout/disklayout.conf
#part /dev/sda 499843596288 1048576 primary raid /dev/sda1
#part /dev/sda 263192576 499844644864 primary boot,raid /dev/sda2
#part /dev/sdb 499843596288 1048576 primary raid /dev/sdb1
#part /dev/sdb 263192576 499844644864 primary boot,raid /dev/sdb2
#part /dev/sdc 499843596288 1048576 primary raid /dev/sdc1
#part /dev/sdc 263192576 499844644864 primary boot,raid /dev/sdc2
#part /dev/sdd 499843596288 1048576 primary raid /dev/sdd1
#part /dev/sdd 263192576 499844644864 primary boot,raid /dev/sdd2
raid /dev/md127 metadata= level= raid-devices= uuid= name=boot
raid /dev/md126 metadata= level= raid-devices= uuid= name=pv00

[output of output.zip]
$ grep raid var/lib/rear/layout/disklayout.conf
part /dev/sda 499843596288 1048576 primary raid /dev/sda1
part /dev/sda 263192576 499844644864 primary boot,raid /dev/sda2
part /dev/sdb 499843596288 1048576 primary raid /dev/sdb1
part /dev/sdb 263192576 499844644864 primary boot,raid /dev/sdb2
part /dev/sdc 499843596288 1048576 primary raid /dev/sdc1
part /dev/sdc 263192576 499844644864 primary boot,raid /dev/sdc2
part /dev/sdd 499843596288 1048576 primary raid /dev/sdd1
part /dev/sdd 263192576 499844644864 primary boot,raid /dev/sdd2
raid /dev/md127 metadata=1.0 level=raid10 raid-devices=4 uuid=1453db52:62d03d02:f96be2ec:37089c20 name=boot layout=n2 chunk=512 devices=/dev/sda2,/dev/sdb2,/dev/sdc2,/dev/sdd2
raid /dev/md126 metadata=1.2 level=raid10 raid-devices=4 uuid=404a87ef:f078fa0f:4fd31a09:03035519 name=pv00 layout=n2 chunk=512 devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1

What did you do differently? That would explain why it did work the first time you ran a recover, of course.

gk1 commented at 2018-12-26 16:17:

The machines are the same. The first one was built by hand, the second one was built from the Kickstart file of the first one. Not sure why the output would be different.

gdha commented at 2018-12-27 08:38:

@gk1 I noticed the same behavior on p1 and p2. What I meant was: "was the configuration changed somehow?" Or was ReaR updated in between on both systems?

gk1 commented at 2018-12-27 09:00:

I installed the version of ReaR that is in the RHEL repo via yum; I believe it was the same version. I don't believe that anything was changed after the installation. There are others that log in to the machines, so it is possible. But we did make more than one backup, and the results were the same each time.

gdha commented at 2018-12-28 11:29:

@gk1 OK - perhaps follow the procedure on p1 and p2 described on the web page http://www.it3.be/2016/06/08/rear-diskrestore/ and attach the corresponding /var/lib/rear/layout/diskrestore.sh script. Thanks.

gk1 commented at 2019-01-29 01:18:

mrds.zip
Attached is the output from the make_rear_diskrestore script. I have included both the /var/lib/rear/layout folder and the /var/log/rear/ folder and their contents. The script executes and then asks to proceed with recovery. The output is as follows:

/tmp/make_rear_diskrestore_script.sh

##################################################################################
#       Starting make_rear_diskrestore_script.sh to produce layout code script
#       (for debugging purposes only)
#
#       Log file : /var/log/rear/make_rear_diskrestore_script-20190128-2009.log
#       date : Mon Jan 28 20:09:14 EST 2019
##################################################################################


Proceed with recovery (yes) otherwise manual disk layout configuration is enforced
(default 'yes' timeout 30 seconds)
yes
No code has been generated to recreate /dev/md127 (raid).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
Manually add code that recreates /dev/md127 (raid)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 300 seconds)
5
ERROR: User chose to abort 'rear recover' in /usr/share/rear/layout/prepare/default/600_show_unprocessed.sh
Aborting due to an error, check  for details
Terminated

gdha commented at 2019-01-29 07:40:

@gk1 thanks for the logs. We are getting closer to the issue:

2019-01-28 01:30:04.072910737 Saving Software RAID configuration.
/usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh: line 44: let: sparedevices=-: syntax error: operand expected (error token is "-")
/usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh: line 65: [: : integer expression expected
/usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh: line 44: let: sparedevices=-: syntax error: operand expected (error token is "-")
/usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh: line 65: [: : integer expression expected

Could you drop me the output of the following commands?

echo mdadm --detail --scan --config=partitions
mdadm --detail --scan --config=partitions
for p in a b c d
do
  echo mdadm --misc --detail /dev/sd${p}1
  mdadm --misc --detail /dev/sd${p}1
  echo mdadm --misc --detail /dev/sd${p}2
  mdadm --misc --detail /dev/sd${p}2
done

gk1 commented at 2019-01-29 08:08:

mdadm --detail --scan --config=partitions
ARRAY /dev/md/boot metadata=1.0 name=localhost.localdomain:boot UUID=1453db52:62d03d02:f96be2ec:37089c20
ARRAY /dev/md/pv00 metadata=1.2 name=localhost.localdomain:pv00 UUID=404a87ef:f078fa0f:4fd31a09:03035519

mdadm --misc --detail /dev/sda1
mdadm: /dev/sda1 does not appear to be an md device
mdadm --misc --detail /dev/sda2
mdadm: /dev/sda2 does not appear to be an md device
mdadm --misc --detail /dev/sdb1
mdadm: /dev/sdb1 does not appear to be an md device
mdadm --misc --detail /dev/sdb2
mdadm: /dev/sdb2 does not appear to be an md device
mdadm --misc --detail /dev/sdc1
mdadm: /dev/sdc1 does not appear to be an md device
mdadm --misc --detail /dev/sdc2
mdadm: /dev/sdc2 does not appear to be an md device
mdadm --misc --detail /dev/sdd1
mdadm: /dev/sdd1 does not appear to be an md device
mdadm --misc --detail /dev/sdd2
mdadm: /dev/sdd2 does not appear to be an md device

gdha commented at 2019-01-29 08:42:

@gk1 Nice, thanks. So what I would like to see then is:

mdadm --misc --detail  /dev/md/boot 
mdadm --misc --detail  /dev/md/pv00

gk1 commented at 2019-01-29 08:45:

[root@PriSMS-1 digitalglue]# mdadm --misc --detail  /dev/md/boot
/dev/md/boot:
           Version : 1.0
     Creation Time : Thu Jan 25 17:54:39 2018
        Raid Level : raid10
        Array Size : 513024 (501.00 MiB 525.34 MB)
     Used Dev Size : 256512 (250.50 MiB 262.67 MB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Jan 29 03:42:28 2019
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : near=2
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : localhost.localdomain:boot
              UUID : 1453db52:62d03d02:f96be2ec:37089c20
            Events : 183

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync set-A   /dev/sda2
       1       8       18        1      active sync set-B   /dev/sdb2
       2       8       34        2      active sync set-A   /dev/sdc2
       3       8       50        3      active sync set-B   /dev/sdd2

[root@PriSMS-1 digitalglue]# mdadm --misc --detail  /dev/md/pv00
/dev/md/pv00:
           Version : 1.2
     Creation Time : Thu Jan 25 17:54:16 2018
        Raid Level : raid10
        Array Size : 975994880 (930.78 GiB 999.42 GB)
     Used Dev Size : 487997440 (465.39 GiB 499.71 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Tue Jan 29 03:45:21 2019
             State : active
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : near=2
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : localhost.localdomain:pv00
              UUID : 404a87ef:f078fa0f:4fd31a09:03035519
            Events : 77819

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync set-A   /dev/sda1
       1       8       17        1      active sync set-B   /dev/sdb1
       2       8       33        2      active sync set-A   /dev/sdc1
       3       8       49        3      active sync set-B   /dev/sdd1
[root@PriSMS-1 digitalglue]#

jsmeix commented at 2019-01-29 13:14:

@gdha regarding your finding in
https://github.com/rear/rear/issues/2006#issuecomment-458438023

/usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh: line 44: let: sparedevices=-: syntax error: operand expected (error token is "-")
/usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh: line 65: [: : integer expression expected

The matching lines in layout/save/GNU/Linux/210_raid_layout.sh are

42 ndevices=$( grep "Raid Devices" $TMP_DIR/mdraid | tr -d " " | cut -d ":" -f "2")
43 totaldevices=$( grep "Total Devices" $TMP_DIR/mdraid | tr -d " " | cut -d ":" -f "2")
44 let sparedevices=$totaldevices-$ndevices
...
65 if [ "$sparedevices" -gt 0 ] ; then

It seems the code in layout/save/GNU/Linux/210_raid_layout.sh
does not sufficiently care about possible errors
(cf. https://github.com/rear/rear/wiki/Coding-Style)
or is not sufficiently safe against empty variables
(cf. https://github.com/rear/rear/pull/2021).

Should I review layout/save/GNU/Linux/210_raid_layout.sh
and do a pull request to make ReaR behave more fail-safe
when needed values are empty?

In this case "fail-safe" also means that we can error out early here, because
'layout/save' scripts run during "rear mkrescue", where it is o.k. to error out
because the user can correct what leads to the error and redo "rear mkrescue",
as opposed to letting "rear recover" fail when it is too late to correct the root cause.
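
A hedged sketch of what such more fail-safe code could look like (not an actual ReaR fix - is_positive_integer is one of the existing ReaR helper functions mentioned later in this thread, and Error is ReaR's usual way to abort):

# Sketch: validate the values parsed from the mdadm output before doing
# arithmetic with them, so that "rear mkrescue" errors out early instead
# of writing invalid 'raid' entries into disklayout.conf:
ndevices=$( grep "Raid Devices" $TMP_DIR/mdraid | tr -d " " | cut -d ":" -f "2" )
totaldevices=$( grep "Total Devices" $TMP_DIR/mdraid | tr -d " " | cut -d ":" -f "2" )
is_positive_integer $ndevices || Error "Cannot determine 'Raid Devices' for $device from mdadm output"
is_positive_integer $totaldevices || Error "Cannot determine 'Total Devices' for $device from mdadm output"
let sparedevices=$totaldevices-$ndevices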

gdha commented at 2019-01-29 13:27:

@gk1 @jsmeix There is something strange going on with these systems, as I have seen output where it was successful:

raid /dev/md127 metadata=1.0 level=raid10 raid-devices=4 uuid=8317f7a0:292074d7:eee8ccf4:80f8c853 name=boot layout=n2 chunk=512 devices=/dev/sda2,/dev/sdb2,/dev/sdc2,/dev/sdd2
raid /dev/md126 metadata=1.2 level=raid10 raid-devices=4 uuid=fa2be560:fbb4b0eb:a1f25070:9a8ae44b name=pv00 layout=n2 chunk=512 devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1

and where it fails (like above):

raid /dev/md127 metadata= level= raid-devices= uuid= name=boot
raid /dev/md126 metadata= level= raid-devices= uuid= name=pv00

The code in the script /usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh should work, but what I would like to see before making decisions in this particular case is the debugging output of this script only (I do not want to see all the debugging).
Therefore, if possible, add set -x at the top of the script and set +x at the bottom of it, then run rear -v savelayout and post the log of this action.
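
I.e. something like this (a sketch of the suggested instrumentation):

# Added as the first line of /usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh:
set -x

# ... existing script body unchanged ...

# Added as the last line of the same script:
set +x

# Then re-run the layout save and collect the resulting log from /var/log/rear/:
rear -v savelayout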

gk1 commented at 2019-01-29 14:24:

setx.zip
I ran the script. Here are the disklayout files and the rear logs.

gk1 commented at 2019-01-29 14:28:

If it would help, I can provide you with TeamViewer access to the machines to speed this along; we would just need to coordinate a time.

gdha commented at 2019-01-29 14:43:

> setx.zip
> I ran the script. Here are the disklayout files and the rear logs.

In this debug session it was successful again - looks like an intermittent issue.
Perhaps run it a few times and check when the raid line is bad:

grep ^raid /var/lib/rear/layout/disklayout.conf

We are interested in the logs when it fails.

gk1 commented at 2019-01-29 14:49:

I ran it 15 times, and it was the same output each time.

raid /dev/md127 metadata=1.0 level=raid10 raid-devices=4 uuid=1453db52:62d03d02:f96be2ec:37089c20 name=boot layout=n2 chunk=512 devices=/dev/sda2,/dev/sdb2,/dev/sdc2,/dev/sdd2
raid /dev/md126 metadata=1.2 level=raid10 raid-devices=4 uuid=404a87ef:f078fa0f:4fd31a09:03035519 name=pv00 layout=n2 chunk=512 devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1

gdha commented at 2019-01-29 14:51:

@gk1 Is the result the same on both systems? The above output is what it should be.

gk1 commented at 2019-01-29 14:57:

So on P2, when I started, I grep'd for raid like above and the line did not have the UUID values like the successful output. Once I added the set -x in the script and ran it again, 10 times, I got 10 successful outputs:

raid /dev/md127 metadata=1.0 level=raid10 raid-devices=4 uuid=8317f7a0:292074d7:eee8ccf4:80f8c853 name=boot layout=n2 chunk=512 devices=/dev/sda2,/dev/sdb2,/dev/sdc2,/dev/sdd2
raid /dev/md126 metadata=1.2 level=raid10 raid-devices=4 uuid=fa2be560:fbb4b0eb:a1f25070:9a8ae44b name=pv00 layout=n2 chunk=512 devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1

jsmeix commented at 2019-01-29 15:15:

@gdha
what I meant was more about a generally more fail-safe coding style.

In this particular case code like

let sparedevices=$totaldevices-$ndevices

can never work as intended when $totaldevices or $ndevices
is empty or unset or contains a string like "fubar",
because it can only evaluate correctly when both
$totaldevices and $ndevices are non-negative integer values
(we only have is_integer and is_positive_integer functions)
and $totaldevices >= $ndevices.

If usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh had
failed with an error for @gk1 during "rear mkrescue/mkbackup",
he would have reported a much easier-to-understand issue here
(which would have been an advantage in finding the root cause),
and he would not have had to learn the hard way that
"something is somehow wrong" in his specific case
when - at the very end - "rear recover" failed for him.

Or in other words:
When "rear mkrescue" writes entries to 'disklayout.conf'
that are invalid to recreate the system like

raid /dev/md127 metadata= level= raid-devices= uuid= name=boot
raid /dev/md126 metadata= level= raid-devices= uuid= name=pv00

then "rear mkrescue" must error out.
According to the "Disk layout file syntax" in
https://github.com/rear/rear/blob/master/doc/user-guide/06-layout-configuration.adoc

raid /dev/<name> level=<RAID level> raid-devices=<nr of devices> [uuid=<uuid>] [spare-devices=<nr of spares>] [layout=<RAID layout>] [chunk=<chunk size>] devices=<device1,device2,...>

it is invalid when level or raid-devices is empty or when devices is missing
(I don't know about metadata, which is not documented there).

gk1 commented at 2019-01-31 15:41:

Any thoughts on the output I sent?

gdha commented at 2019-01-31 16:31:

@gk1 So to recap: in debug mode the output for the raid rebuild is fine, and when you switch off debug mode it is randomly good or bad.
The only thing we can do, as @jsmeix proposed, is bail out when the output doesn't make sense. IMHO the output is produced by underlying commands outside of ReaR's control; we can only act when the resulting output is clearly garbage. However, I suspect there might be something flaky on the systems themselves if the output is randomly different...

jsmeix commented at 2019-02-05 13:58:

...which would not be the first time that using ReaR revealed
that something subtle was badly broken on the original system...

I am thinking about adding a generic script that runs after disklayout.conf
was written (i.e. at the end of the layout/save stage) e.g.
layout/save/GNU/Linux/910_verify_disklayout_file.sh
that verifies that the entries in disklayout.conf match syntactically
what is specified in the section "Disk layout file syntax" at
https://github.com/rear/rear/blob/master/doc/user-guide/06-layout-configuration.adoc

@gdha
if you agree I will do a pull request with such a verify_disklayout_file script.
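
For example, a minimal sketch of the kind of check such a script could do, limited to the 'raid' entries discussed in this issue (a hypothetical illustration only - the script name follows the proposal above and DISKLAYOUT_FILE is assumed to point at disklayout.conf):

# Hypothetical layout/save/GNU/Linux/910_verify_disklayout_file.sh sketch:
# error out when a 'raid' entry lacks a non-empty level=, raid-devices= or devices= value:
while read keyword device options ; do
    for required in level raid-devices devices ; do
        grep -q " $required=[^ ]" <<<" $options " || Error "Invalid '$keyword $device' entry in $DISKLAYOUT_FILE ('$required' missing or empty)"
    done
done < <(grep "^raid " $DISKLAYOUT_FILE)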

gk1 commented at 2019-02-06 02:16:

OK, so what is the next step then? We would really like this to work. Can I get you guys on the boxes to look? Will that help?

Also, I took the disklayout file that was created with all the metadata in it, copied it to the machine I am trying to restore onto, and ran a manual restore. I was able to map the disks, but it is not able to create the DRBD partition. I don't have the error at the moment. We would like to be able to use ReaR to bare-metal backup these machines.

I do agree there has to be something strange on these machines, but I cannot figure out anything that would cause this. Like you said, the commands output the data, although not always correctly it seems. These are machines that have been hardened with the government STIG process, which does a ton of things to them for security. Some of those things could affect it, but it is hard to tell what.

I know we have asked a lot of you, and if it's a question of purchasing a support contract, just say the word; we clearly need the help on this.

Thanks!!

gdha commented at 2019-02-06 14:54:

@gk1 We suppose you have a support contract on your hardware? If yes, please ask for a pro-active firmware investigation of your different hardware components. It makes no sense to try to fix something in ReaR if we cannot trust the hardware. The output of the savelayout has to be consistent at all times.

gk1 commented at 2019-02-06 19:10:

Yes, but I don't believe that the issue is hardware-related. The machine works just fine: it reboots and starts fine and does what it is supposed to do. I do believe that something in the software config or security settings could be the issue. But why does ReaR work correctly generating the layout with debug on, but not with it off? What is the difference that could be causing that? That part really makes no sense to me.

gdha commented at 2019-02-06 19:19:

The difference between debug on/off is the speed between the commands, I guess (cf. https://github.com/rear/rear/issues/2006#issuecomment-461148541) - that is the only explanation I have (so far).

jsmeix commented at 2019-02-11 11:42:

@gk1
i.e. you may experiment with inserting some "sleep 1" delays
before or after certain commands in
/usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh

That script basically runs

while read array device junk ; do
    ...
    mdadm --misc --detail $device > $TMP_DIR/mdraid
    ...
done < <(mdadm --detail --scan --config=partitions)

i.e. it runs mdadm in a loop for each device
and perhaps things go wrong on your machine
when mdadm is called several times in a loop?
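
For example, one could experiment with something like this (a hypothetical diagnostic sketch only, not an actual change in ReaR):

# Sketch: retry the mdadm call with a short delay when its output
# unexpectedly lacks the needed "Raid Devices" value:
while read array device junk ; do
    ...
    for attempt in 1 2 3 ; do
        mdadm --misc --detail $device > $TMP_DIR/mdraid
        grep -q "Raid Devices" $TMP_DIR/mdraid && break
        sleep 1
    done
    ...
done < <(mdadm --detail --scan --config=partitions)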

Furthermore, you may add set -x at the beginning of
/usr/share/rear/layout/save/GNU/Linux/210_raid_layout.sh
and set +x at the end to get debug messages in your log
only while that script runs; perhaps you will see something of interest
(oh! - wait - you won't see anything of interest, because it works
in debug mode ;-)

gdha commented at 2019-02-11 16:14:

@gk1 Try to use ReaR with debug always enabled and see how far you get now - the layout should be fine then. What was the issue with DRBD? As @jsmeix explained in https://github.com/rear/rear/issues/2006#issuecomment-462299156, a sleep 1 might fix this, but we are not sure. However, as it seems to work in debug mode (which introduces a slight delay), this might just be the case.

gk1 commented at 2019-02-15 18:48:

What is the command-line argument to force debug in its entirety? I will try to generate a new backup next week, and I will add the sleep delays as well.

gozora commented at 2019-02-15 19:21:

@gk1 rear -d -D mkbackup

V.

jsmeix commented at 2019-02-22 15:21:

Regarding my above
https://github.com/rear/rear/issues/2006#issuecomment-460646685
see
https://github.com/rear/rear/issues/1681#issuecomment-466432296

github-actions commented at 2020-06-28 01:33:

Stale issue message


[Export of Github issue for rear/rear.]