#2475 Issue closed
: Opal PBA does not support NVMe drives¶
Labels: enhancement
, fixed / solved / done
bmastenbrook opened issue at 2020-08-08 21:49:¶
Relax-and-Recover (ReaR) Issue Template¶
Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):
-
ReaR version ("/usr/sbin/rear -V"): Relax-and-Recover 2.6-git.4108.0f2f4a02.master / 2020-08-07
-
OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"): Ubuntu 20.04.1 LTS (Focal Fossa)
-
ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"): N/A
-
Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PoverVM LPAR): PC
-
System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device): x86
-
Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot): UEFI with GRUB
-
Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe): NVMe WDC PC SN730 SDBQNTY-256G-1001
-
Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT" or "lsblk" as makeshift):
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 238.5G 0 disk
├─nvme0n1p1 259:1 0 94M 0 part /boot/efi
├─nvme0n1p2 259:2 0 15.3G 0 part
│ └─swap_crypt 253:0 0 15.3G 0 crypt [SWAP]
└─nvme0n1p3 259:3 0 223.1G 0 part /home
- Description of the issue (ideally so that others can reproduce it):
NVMe devices are represented by the kernel as a root device
(/dev/nvme0
in this example) and a sub-device for each namespace of
the drive (/dev/nvme0n1
in this example). On most client SSDs, there
is only one namespace, and it contains the entirety of the user-visible
LBAs on the drive. sedutil-cli
expects to be passed the device
representing the entire drive, but partprobe
needs to be passed the
device representing the namespace.
All of the functions in opal-functions.sh
attempt to pass the same
device to both sedutil-cli
and partprobe
, and partprobe
returns an
error status when handed the root device for an NVMe drive instead of
the namespace device. As such, the unlocking function fails.
- Workaround, if any:
None.
- Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):
jsmeix commented at 2020-08-10 10:17:¶
@OliverO2
could you have a look at this issue here?
OliverO2 commented at 2020-08-10 10:30:¶
I'll look into this one too. I'd have to familiarize myself with NVME stuff first but the issue seems to contain sufficient details to figure out a solution.
jsmeix commented at 2020-08-10 10:41:¶
@bmastenbrook
I do not have a NVME device so I cannot reproduce things myself.
I am puzzled by your lsblk
output versus your description
NVMe devices are represented by the kernel as a root device
(/dev/nvme0 in this example) and a sub-device for each namespace
of the drive (/dev/nvme0n1 in this example)
because your lsblk
output does not contain a nvme0
kernel device
so it seems there is no such device on your system.
But when there is no nvme0
kernel device I wonder how sedutil-cli
can
expect to be passed the device representing the entire drive as nvme0
?
I assume for sedutil-cli ... <device>
the device needs to be a disk
type device
which is usually e.g. something like /dev/sda
.
In your lsblk
output nvme0n1
is show as a disk
type device
so I would expect that sedutil-cli ... /dev/nvme0n1
works.
But it seems in case of NVMe devices there are two layers of disk
like
devices:
A hidden / non-visible low-level nvme0
kind of whole-disk
type
device plus
one or more normal visible nvme0n1
kind of normal-disk
type devices?
Only FYI regarding "fun with device naming" for mmcblk/eMMC devices
cf.
https://github.com/rear/rear/issues/2087#issuecomment-475650821
and
https://github.com/rear/rear/issues/2087#issuecomment-477551079
OliverO2 commented at 2020-08-10 12:12:¶
I assume for
sedutil-cli ... <device>
the device needs to be adisk
type device
which is usually e.g. something like/dev/sda
.
Actually it could be a toaster as long as its driver provides the
necessary ioctls. sedutil-cli
uses different ioctl commands for NVME
and SATA devices.
jsmeix commented at 2020-08-10 13:12:¶
With disk
type device I meant what the TYPE
column of lsblk
shows
so of course also whole toasters when shown as disk
by lsblk
in contrast to toaster slots that could be shown as part
by lsblk
;-)
OliverO2 commented at 2020-08-10 13:42:¶
@jsmeix
Actually, it seems like sedutil-cli
uses the nvme character device
nvme0
in the above example, which is just a control device. This
control device is used to create the actual disk-type devices called
namespaces (= "amount of NVM storage formatted for block access") like
nvme0n1
. Each namespace device can then be partitioned.
sedutil-cli --scan
is used to determine available Opal devices and
this seems to spit out the control devices in case of NVME. These
character devices are not considered by lsblk
.
Maybe we should campaign for toasters being supported as NVME devices, too, with the slots being the namespaces. ;-)
@bmastenbrook
Could you change usr/share/rear/lib/opal-functions.sh
like this and
try again?
-
Insert this function just above the definion of
function opal_device_regenerate_dek_ERASING_ALL_DATA()
:function opal_device_partprobe() { local device="${1:?}" # uses partprobe to update partition tables for all block devices belonging to the given Opal device. case "$device" in (*/nvme*) partprobe "$device"n[0-9] # consider all namespace block devices (NOTE: relies on nullglob) ;; (*) partprobe "$device" ;; esac }
-
Change all
partprobe
invocations toopal_device_partprobe
jsmeix commented at 2020-08-10 14:07:¶
@OliverO2
I noticed device="${1:?}"
and found the ${VAR:?}
form many times in
Opal related code
but I this leads to an unfriendly exit without any error message when
VAR is null or unset.
This is described in "man bash" (GNU bash, version 4.4.23 in my case):
${parameter:?word}
Display Error if Null or Unset.
If parameter is null or unset, the expansion of word
(or a message to that effect if word is not present)
is written to the standard error and
the shell, if it is not interactive, exits.
Otherwise, the value of parameter is substituted.
For example when I put
LogPrint TEST begin
foo=""
bar="${foo:?}" || BugError "foo null or unset"
LogPrint TEST end
in a script ReaR exits as follows
# usr/sbin/rear -D mkrescue
Relax-and-Recover 2.6 / Git
Running rear mkrescue (PID 8066 date 2020-08-10 16:04:49)
Using log file: /root/rear.github.master/var/log/rear/rear-linux-h9wr.log
TEST begin
Exiting rear mkrescue (PID 8066) and its descendant processes ...
Running exit tasks
rear mkrescue failed, check /root/rear.github.master/var/log/rear/rear-linux-h9wr.log for details
OliverO2 commented at 2020-08-10 15:33:¶
@jsmeix
These checks are just a compact attempt to catch internal bugs early
without putting too much error-checking distractions in the code. The
log file in your example helps to spot the cause:
2020-08-10 17:20:16.233077604 TEST begin
/home/oliver/Repositories/open-source/rear/usr/share/rear/prep/default/009_test.sh: line 3: foo: parameter null or not set
2020-08-10 17:20:16.234854487 Exiting rear mkrescue (PID 20879) and its descendant processes ...
Is there a better alternative?
- Not checking at all (like most code in ReaR) would let ReaR either continue with undesirable results or trigger a follow-up error later on which is probably harder to diagnose.
- Putting in explicit error-checking code everywhere (like
[[ -z "$foo" ]] || BugError "foo: parameter null or not set" ]]
) distracts from the primary functionality and makes the code harder to read.
jsmeix commented at 2020-08-11 10:42:¶
@OliverO2
I think explicit testing is the best way for bash scripts.
I try to explain my reasoning why below.
Regarding bar="${foo:?}"
:
The problem for the user is that he does not get the error message on
his terminal.
Let him search the actual error message from the log is not user
friendly.
He needs at least some message about what went wrong on his terminal.
Then he can search the log for more details related to that message.
To show the bash error foo: parameter null or not set
on his
terminal
and have that message also in the log so the user can find the matching
part in the log
{ bar="${foo:?}" ; } 2>> >( tee -a "$RUNTIME_LOGFILE" 1>&8 )
is the only way that works for me.
Why a compound group command { ... ; } 2>
must be used see
https://github.com/rear/rear/blob/master/usr/share/rear/lib/_input-output-functions.sh#L840
and for the complicated redirection via tee
process substitution see
https://github.com/rear/rear/blob/master/usr/share/rear/backup/BORG/default/500_make_backup.sh#L71
and see the explanatory comment for that code.
and for background information see the discussion in
https://github.com/rear/rear/pull/2382
in particular
https://github.com/rear/rear/pull/2382#discussion_r417852393
and subsequent comments there like
https://github.com/rear/rear/pull/2382#discussion_r418018571
and
https://github.com/rear/rear/pull/2382#discussion_r419378558
But what would code like
function opal_device_partprobe() {
{ local device="${1:?}" ; } 2>> >( tee -a "$RUNTIME_LOGFILE" 1>&8 )
actually help?
E.g. it would not catch if opal_device_partprobe /etc/fstab
was
called.
Others who read that code would get a longer exercise in reverse
engineering
what that complicated code intends to do or a longer comment is needed
that explains them what the idea behind is.
In contrast explicit testing like
function opal_device_partprobe() {
local device="$1"
test -b "$device" -o -c "$device" || BugError "opal_device_partprobe called for '$device' that is neither a block nor a character device"
makes it obvious for a code reader what that code is about and
the user gets a helpful error message via ReaR standard functionality.
OliverO2 commented at 2020-08-11 13:40:¶
@jsmeix Since this is about ReaR coding conventions in general, could we continue in a separate issue? This could also prompt some feedback from other contributors. I don't know if is is possible to move the above issue comments to a new issue. If so, I'd appreciate it.
jsmeix commented at 2020-08-11 13:42:¶
@OliverO2
yes, it does not belong to this issue.
jsmeix commented at 2020-08-12 08:42:¶
@OliverO2
would you like to open that separated issue about ReaR coding
conventions?
I ask you because I think when you do the initial description
the issue would show things from your point of view which is the point
of view
we need primarily as initial input to get something new and interesting.
In contrast if I did the initial description the issue would
show things from my point of view which is what I described in
https://github.com/rear/rear/issues/2475#issuecomment-671871423
which is no new and interesting input but "just old known stuff" as in
https://github.com/rear/rear/wiki/Coding-Style
And there is no time pressure, in particular no need to change things
in your code right now, so take your time as long as you need.
OliverO2 commented at 2020-08-12 09:53:¶
@jsmeix Sure, I can open the issue. The only downside is that I don't know how to move comments from this issue to the new one. If this were possible at all, it would probably be restricted to maintainers. But let's do it this way, I'll start the new issue sometime this week, so stay tuned. :-)
jsmeix commented at 2020-08-12 11:09:¶
@OliverO2
thank you in advance!
I don't know if it is possible to move comments to another issue.
I see a "Reference in new issue" functionality but that seems to be
something different.
I can hide comments in issues.
So to "move" a comment I would manually copy&paste it into the new
issue
and then I could hide the old comment in the old issue.
OliverO2 commented at 2020-08-23 20:04:¶
@bmastenbrook Thanks again for opening this issue. Just a friendly reminder: Could you please respond to the suggested solution (https://github.com/rear/rear/issues/2475#issuecomment-671361418) or state if you have lost interest, so that we have an idea on how to proceed?
OliverO2 commented at 2020-09-04 08:50:¶
@bmastenbrook I have prepared the above pull request to ease testing. You could use the patch https://github.com/rear/rear/pull/2488.patch or the entire code at https://github.com/OliverO2/rear/tree/feature-opal-nvme-support. As time permits, of course. 😊
github-actions commented at 2020-11-04 01:34:¶
Stale issue message
jsmeix commented at 2020-11-05 13:51:¶
With
https://github.com/rear/rear/pull/2488
merged
this issue should be fixed.
gitthangbaby commented at 2020-11-15 20:56:¶
i got the git version, flashed Secure Boot PBA and a nice plymouth dialog is displayed after 11 seconds (not very fast, exactly like LUKS delay, but better than DTA PBA of 5 seconds + forced reboot 15 sec). After setting password, the message says "drive couldn't be unlocked" and next attempts fail to progress booting. In fact, the NVME is indeed unlocked with first attempt, as I can see when i fallback to OPAL PBA> command line (and by a negative check of falling back before password entry). I can unlock with sed-opal-unlocker also as only PBA changed. I have two NVME so the fastest approach is sed-opal-unlocker service with 1 sec delay but not suitable for single drive nor boot files being on the SED drive which i'd like.
unlockers summary:
BIOS: password prompt can be triggered with MBREnable=false but it
doesn't work
DTA PBA: works with 5 + 15sec delay, except Ryzen systems, unreadable
GUI with no pwd repeat, no Secure Boot?
DTA forks: some bring different hashing, some bring support for Ryzen,
added password save for suspend, none are compatible with DTA.
unreadable GUI with no pwd repeat, no Secure Boot?
sed-opal-unlocker: almost no delay, password can be in hashed file,
added password save for suspend, runs as a service too late to move any
boot files to SED drive, compatible with DTA, Secure boot ok
rear: 11sec delay, nice GUI with pwd repeat, added password save for
suspend?, compatible with DTA, Secure boot ok
OliverO2 commented at 2020-11-16 17:47:¶
@gitthangbaby Thanks for your report.
If the PBA fails to unlock your device, could you please use the rescue system to provide a detailed log?
This is what you could do to help find the cause:
- Prepare a rescue system on a USB stick according to Step 3 in Quick Start: Setting up Self-Encrypting Disks.
- Shut down your system so that SEDs become locked.
- Boot the rescue system.
- Log in as
root
(locally, or from another system viassh
) - Run these commands:
lsblk -o +MODEL
rear opaladmin info
- Provide the output of the above commands.
- Run this command to attempt to unlock SEDs.
rear -D opaladmin unlock
- Provide the complete contents of the log file
/var/log/rear/rear-*.log
- NOTE: Capture the log contents right away as it will be overwritten by the next rear invocation.
- Run once more:
rear opaladmin info
- Provide the output of the above command.
gitthangbaby commented at 2020-11-24 18:14:¶
Hi. Sorry for no reply, I had a misfortune of flashing the Rear USB
image (thinking it's PBA) to unlocked NVME drive (thinking it's locked),
so i lost all distros. The Rear USB started with grub> only.
Since i have advantage of two drives, i decided to put effort to
initram-ize the boot process and it was extremely difficult but
successful.
OliverO2 commented at 2020-11-24 22:04:¶
@gitthangbaby No problem. It would still be valuable if you could try the above steps and provide output and log file. I'm pretty sure whatever is is, it can be fixed easily once we have the necessary information.
[Export of Github issue for rear/rear.]