#1697 PR merged: Automatically add 'missing' devices to MD arrays with not enough physical devices upon restore¶
Labels: enhancement, fixed / solved / done
rmetrich opened issue at 2018-01-19 09:18:¶
If a software RAID device is in a degraded state at the time of the rear mkrescue creation (e.g. a device has been removed from a RAID1 array), the disk layout restoration will fail with the following error seen in the log:
mdadm --create /dev/md127 --force --metadata=1.2 --level=raid1 --raid-devices=2 --uuid=a0fc5637:fdf5dafa:a345e010:83293548 --name=boot /dev/sdb1
+++ echo Y
mdadm: You haven't given enough devices (real or missing) to create this array
The least intrusive fix is the following: add "missing" special devices to the mdadm --create command upon restore whenever the expression raid-devices - #devices > 0 is true.
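A minimal sketch of that idea (illustrative only, with hypothetical variable names; not the actual ReaR layout code) could look like this:

```bash
# Illustrative sketch only (hypothetical variable names), not the actual ReaR code:
# pad the device list with "missing" entries so that mdadm still accepts
# --raid-devices=N when fewer than N physical devices are present.
raid_devices=2                    # value of --raid-devices from disklayout.conf
devices=( /dev/sdb1 )             # physical devices actually listed
missing_count=$(( raid_devices - ${#devices[@]} ))
for (( i=0 ; i < missing_count ; i++ )) ; do
    devices+=( missing )
done
mdadm --create /dev/md127 --force --metadata=1.2 --level=raid1 \
      --raid-devices=$raid_devices --uuid=a0fc5637:fdf5dafa:a345e010:83293548 \
      --name=boot "${devices[@]}"
```

With one device missing from a two-device RAID1, this reproduces the failing command from the log above but with a trailing "missing" argument, which mdadm accepts.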
jsmeix commented at 2018-01-19 10:10:¶
I am not at all a RAID expert, but I think when any kind of RAID
is in any kind of degraded state at the time of "rear mkrescue",
that looks like a sufficiently severe issue in the original system
that ReaR should do something about it at the time of "rear mkrescue".
My basic idea behind this is to let "rear mkrescue" ensure that the content
of disklayout.conf represents the state of a clean original system,
so that things can be expected to "just work" later during "rear recover".
Accordingly I think that, in general, when at the time of "rear mkrescue"
the original system is in any kind of degraded state,
then "rear mkrescue" should error out.
I do not like it so much to try to fix, later at the time of "rear recover",
an original system that was already degraded at the time of "rear mkrescue".
Or is there a reasonable use case to let "rear mkrescue" finish successfully
even though the original system is in some kind of degraded state?
Regardless of my reasoning above, I also think "rear recover"
should work as robustly as possible (with reasonable effort)
against failing to recreate the system, so I appreciate your pull request
because it makes "rear recover" finish successfully more often than before.
But I think an additional test during "rear mkrescue" to detect
a degraded RAID state would be needed even more,
to preventively avoid possible issues during "rear recover".
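One possible shape for such a preventive test (a sketch only, not part of this pull request) would be to scan /proc/mdstat and ask mdadm for the state of each array:

```bash
# Sketch of a preventive check during "rear mkrescue" (not part of this PR):
# report every MD array whose state mdadm shows as degraded.
while read -r array _ ; do
    if mdadm --detail "/dev/$array" 2>/dev/null | grep -q 'State :.*degraded' ; then
        echo "MD array /dev/$array is degraded" >&2
        # In ReaR this could become an Error exit or a user confirmation.
    fi
done < <(grep '^md' /proc/mdstat)
```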
jsmeix commented at 2018-01-19 10:31:¶
@rear/contributors
is there a RAID expert among us who could actually review it?
rmetrich commented at 2018-01-19 10:40:¶
@jsmeix I agree, something must be done on the "mkrescue" side as well, but
that should be another PR, because it is not that easy. It will likely
require user interaction to decide whether creating a rescue image shall
be permitted or not, depending on the RAID status.
Some admins would likely want to be able to create the backup anyway
(typically you removed a failing device but had no spare device to put
back in yet).
rmetrich commented at 2018-01-19 10:45:¶
mdadm manpage:
-n, --raid-devices= Specify the number of active devices in the array. This, plus the number of spare devices (see below) must equal the number of component-devices (including "missing" devices) that are listed on the command line for --create. Setting a value of 1 is probably a mistake and so requires that --force be specified first. A value of 1 will then be allowed for linear, multipath, RAID0 and RAID1. It is never allowed for RAID4, RAID5 or RAID6. This number can only be changed using --grow for RAID1, RAID4, RAID5 and RAID6 arrays, and only on kernels which provide the necessary support.
jsmeix commented at 2018-01-19 10:46:¶
@rmetrich
according to my "man mdadm" on SLES11
To create a "degraded" array in which some devices are missing, simply give the word "missing" in place of a device name. This will cause mdadm to leave the corresponding slot in the array empty. For a RAID4 or RAID5 array at most one slot can be "missing"; for a RAID6 array at most two slots. For a RAID1 array, only one real device needs to be given. All of the others can be "missing".
I think a test whether devices_found is zero should be added,
probably with an explicit Error exit message,
to avoid "rear recover" dying later at a non-working
"mdadm create missing missing missing ..."
command in the diskrestore.sh script, cf.
"Try to care about possible errors" in
https://github.com/rear/rear/wiki/Coding-Style
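A sketch of such a test (hypothetical variable name, assuming ReaR's Error function is available in that script context):

```bash
# Sketch only, assuming ReaR's Error function is available in this context:
# refuse to emit an mdadm call that would consist solely of "missing" devices,
# because mdadm needs at least one real component device.
if (( devices_found == 0 )) ; then
    Error "No physical device left for MD array /dev/md127 in disklayout.conf"
fi
```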
rmetrich commented at 2018-01-19 11:07:¶
That would indicate a totally broken array with no active device, which is unlikely to happen, since data couldn't be read out of it.
jsmeix commented at 2018-01-19 12:41:¶
@rmetrich
regarding your
https://github.com/rear/rear/pull/1697#issuecomment-358929793
I agree that what needs to be done on the "mkrescue" side
can and should be implemented via a separate pull request.
Regarding "required user interaction":
Since ReaR 2.3 provides the UserInput function,
user interaction no longer causes problems,
because any user interaction via the UserInput function
supports a default response after a timeout, and also
any specifically needed (non-default) user interaction
can be automated by the user (i.e. user interaction
no longer requires a real user to be present),
so that "rear mkrescue/mkbackup" can run unattended
even at e.g. 03:00 at night (the same applies to user interaction
via the UserInput function during "rear recover"), cf.
"First steps towards running Relax-and-Recover unattended in general"
in
http://relax-and-recover.org/documentation/release-notes-2-3
and have a look at
https://github.com/gdha/rear-automated-testing/issues/36#issuecomment-342769055
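As an illustration of how such a confirmation could look (a sketch only; the UserInput options and the USER_INPUT_* automation variable shown below are assumed from ReaR 2.3 and would need to be checked against the actual implementation):

```bash
# Sketch only: the UserInput options (-I, -p, -D, -t) and the corresponding
# USER_INPUT_MKRESCUE_DEGRADED_RAID automation variable are assumed here
# and would need to be verified against the actual ReaR 2.3 implementation.
answer="$( UserInput -I MKRESCUE_DEGRADED_RAID -t 60 -D 'no' \
           -p "MD array /dev/md127 is degraded, continue with 'rear mkrescue' anyway?" )"
is_true "$answer" || Error "Aborting 'rear mkrescue' because of a degraded MD array"
```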
jsmeix commented at 2018-01-23 10:46:¶
If there are no objections I would like to merge it this afternoon.
jsmeix commented at 2018-01-23 15:16:¶
@rmetrich
many thanks for making "rear recover" run more robustly!
[Export of Github issue for rear/rear.]