#2507 PR merged
: Fix #2474: add AMD firmware to OPAL PBA if necessary¶
Labels: enhancement
, fixed / solved / done
OliverO2 opened issue at 2020-11-03 14:47:¶
Pull Request Details:¶
-
Type: Bug Fix
-
Impact: Normal
-
Reference to related issue (URL): #2474
-
How was this pull request tested?
As AMD graphics hardware was not available on the test system, tests were run by mocking as follows:
-
Add these lines in
usr/share/rear/lib/mkopalpba-workflow.sh
afterSourceStage "prep"
:LogPrintError "FIRMWARE_FILES=<${FIRMWARE_FILES[*]}>" exit
-
Run
usr/sbin/rear mkopalpba
with default configuration (Intel graphics), expect result:FIRMWARE_FILES=<no>
. -
In
default.conf
use these single-line configuration changes:FIRMWARE_FILES=( yes )
, expect result:FIRMWARE_FILES=<no>
.FIRMWARE_FILES=( a b c )
, expect result:FIRMWARE_FILES=<a b c>
.OPAL_PBA_FIRMWARE_FILES=( one two )
, expect result:FIRMWARE_FILES=<one two>
.
-
In
usr/share/rear/conf/GNU/Linux.conf
, change grep expressionAMD
toIntel
, expect result:FIRMWARE_FILES=<amdgpu>
.
-
-
Brief description of the changes in this pull request:
OPAL PBA images only:
- Add a required firmware file when AMD graphics hardware is present.
- Allow setting a PBA-specific firmware configuration via the
OPAL_PBA_FIRMWARE_FILES
configuration variable.
The issue report #2474 indicates that a single firmware file is all that is required to make a PBA boot with AMD graphics hardware. If field testing shows that more changes are necessary, this PR's solution should be easy to extend.
jsmeix commented at 2020-11-04 09:44:¶
@OliverO2
as always thank you so much for your continuous valuable contributions
to ReaR
that help so much to make ReaR step by step better and better.
I think it is not right that a single firmware file is all that is
required
to make a system boot with AMD graphics hardware because on my
openSUSE Leap 15.1 system find /lib*/firmware -ipath '*amdgpu*'
lists more than 300 firmware files and their names indicate that
each kind of AMD GPU chips has and needs its own matching firmware
which is what I would expect for all those many different AMD GPU
chips,
cf.
https://en.wikipedia.org/wiki/Radeon
On my openSUSE Leap 15.1 system there are 24MB of AMD GPU firmware files:
# du -hs /lib/firmware/amdgpu/
24M /lib/firmware/amdgpu/
I hope this is not too much for a PBA?
OliverO2 commented at 2020-11-04 16:04:¶
@jsmeix
Thank you for the thorough review, it seems to be a bit more tricky than
I had hoped for.
Note that all of the following only applies to PBA images. Standard
ReaR images will contain the amdgpu
module and all firmware by
default.
The PBA image on my Ubuntu 20.04 desktop is about 46 MB (du -hs
). AMD
firmware looks like this:
$ du -hs /lib/firmware/amdgpu
33M /lib/firmware/amdgpu
On Ubuntu 20.04 the amdgpu
firmware directory contains 336 files for
30 different GPU models. While we do not know which of those might
trigger the boot-time freeze, at least some AMD GPUs seem to be
affected:
Without this package installed, poor 2D/3D performance is commonly experienced. Some GPUs may require firmware to function properly at all.
The spec. allows a PBA size up to 128 MB. So we could just throw in that entire firmware directory here to avoid boot-time freezes. Downside: Slows down PBA uploading to the device (can take minutes).
Additional notes:
-
We should not change
FIRMWARE_FILES
: By default (non-PBA), it already includes all available files. If the user chose to restrict firmware files, we should not override this choice by auto-configuration. -
With respect to modules, I don't think we need to change anything (PBA or non-PBA):
default.conf
containsMODULES=( 'all_modules' )
, which should include theamdgpu
module automatically.usr/share/rear/prep/OPALPBA/Linux-i386/001_configure_workflow.sh
containsMODULES=( 'loaded_modules' )
which apparently also includes theamdgpu
module, otherwise the issue #2474 would not have occured as described (the kernel would have never tried to load AMD firmware).
OliverO2 commented at 2020-11-04 17:21:¶
With 0a1ba2f3c8fa08de8d5528dc1bb583d94a5eddcc on Ubuntu 20.04 desktop, the PBA size is
- 48 MB for systems without AMD GPU,
- 59 MB for systems with AMD GPU.
Looks like we get a good compression ratio on these files in
initrd.cgz
.
jsmeix commented at 2020-11-05 09:00:¶
@OliverO2
thank you for your explanatory reply.
It helped me a lot to avoid a mistake.
I fully agree with your reasoning in
https://github.com/rear/rear/pull/2507#issuecomment-721820442
therein in particular
We should not change FIRMWARE_FILES:
By default (non-PBA), it already includes all available files.
If the user chose to restrict firmware files, we should not override this choice by auto-configuration.
Yesterday I was too fast with my (false) conclusion that because of
/lib/firmware/amdgpu/* firmware files plus the amdgpu kernel module
are needed in any case to boot the ReaR recovery system
on systems with AMD-GPU graphics hardware
we should have in ReaR some hardcoded
if type -p lspci >/dev/null && lspci | grep --quiet ' VGA .*AMD'; then
MODULES+=( amdgpu )
FIRMWARE_FILES+=( '*amdgpu*' )
My conclusion is wrong because it would be against what I always
demand:
"Final power to the user!"
In contrast with such hardcoded things it would be ReaR that has the
final power
what gets included in the recovery system regardless what the user may
have
specified in his etc/rear/local.conf.
jsmeix commented at 2020-11-05 09:01:¶
@rear/contributors
if there are no objections I would like to merge it today afternoon.
jsmeix commented at 2020-11-05 10:56:¶
@OliverO2
I think my argument in
https://github.com/rear/rear/pull/2507#issuecomment-722241196
also applies to
if type -p lspci >/dev/null && lspci | grep --quiet ' VGA .*AMD'; then
OPAL_PBA_FIRMWARE_FILES+=( '*/amdgpu/*' )
fi
which hardcodedly adds amdgpu firmware files regardless what
the user has specified for OPAL_PBA_FIRMWARE_FILES in his local.conf.
So I wonder if not something like
if type -p lspci >/dev/null && lspci | grep --quiet ' VGA .*AMD'; then
# Add amdgup firmware files unless the user has specified to not add any firmware files:
is_false "$OPAL_PBA_FIRMWARE_FILES" || OPAL_PBA_FIRMWARE_FILES+=( '*/amdgpu/*' )
fi
would be better?
I know this is probably somewhat nitpicking or even over the top
but I like to understand how OPAL_PBA_FIRMWARE_FILES is meant to be
used.
OliverO2 commented at 2020-11-05 12:29:¶
@jsmeix
Yes, in general you are right that OPAL_PBA_FIRMWARE_FILES+=...
always
extends the user's configuration. In this respect it works more like
COPY_AS_IS
and others in usr/share/rear/conf/GNU/Linux.conf
.
If we would like to give the user more control, supporting just a "no firmware" decision seems insufficient. More likely an expert user would want to include some firmware as in
OPAL_PBA_FIRMWARE_FILES=( '*/amdgpu/vega20*' )
To support this, we could choose to enable AMD firmware auto-detection
only if OPAL_PBA_FIRMWARE_FILES
is empty. The downside would be that
if a user wanted to include some unrelated firmware like
*/intel-ucode/*
, the user would also have to remember dealing with AMD
GPU quirks, probably without knowing.
The question is, given that
- the added weight of AMD firmware files is not that significant and
- those files are only present when an AMD GPU is detected, and
- the probable outcome of not correctly including AMD firmware is a non-booting system,
is it worth it?
jsmeix commented at 2020-11-05 13:45:¶
@OliverO2
I will merge it as is because in its current state it is at least "good
enough".
If further enhancements are needed we can add them as actually needed.
[Export of Github issue for rear/rear.]