#2474 Issue closed: Opal PBA does not boot on AMD APU/GPU systems: Missing firmware for amdgpu module

Labels: enhancement, fixed / solved / done

bmastenbrook opened issue at 2020-08-08 21:43:

Relax-and-Recover (ReaR) Issue Template

Fill in the following items before submitting a new issue
(quick response is not guaranteed with free support):

  • ReaR version ("/usr/sbin/rear -V"):
    Relax-and-Recover 2.6-git.4108.0f2f4a02.master / 2020-08-07

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):
    Ubuntu 20.04.1 LTS (Focal Fossa)

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):
    N/A without workaround described below

  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PoverVM LPAR):
    PC (ThinkPad X395)

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):
    x86

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):
    UEFI / GRUB

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):
    NVMe WDC PC SN730 SDBQNTY-256G-1001

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,SIZE,MOUNTPOINT" or "lsblk" as makeshift):

NAME           MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme0n1        259:0    0 238.5G  0 disk  
├─nvme0n1p1    259:1    0    94M  0 part  /boot/efi
├─nvme0n1p2    259:2    0  15.3G  0 part  
│ └─swap_crypt 253:0    0  15.3G  0 crypt [SWAP]
└─nvme0n1p3    259:3    0 223.1G  0 part  /home
  • Description of the issue (ideally so that others can reproduce it):

The Opal PBA image includes modules from the running system but omits firmware. During boot, KMS is enabled by default, and the amdgpu module attempts to enable it. However, without firmware, this process hangs, resulting in an unusable PBA.

  • Workaround, if any:

OPAL_PBA_KERNEL_CMDLINE="nomodeset" in local.conf prevents amdgpu from attempting to configure KMS and prevents the hang. Alternatively, editing prep/OPALPBA/Linux-i386/001_configure_workflow.sh to prevent FIRMWARE_FILES from being set to no results in a raw image that works when booted off of a flash drive, but which is too large to be used as a PBA.

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

N/A

jsmeix commented at 2020-08-10 10:13:

I know nothing about Opal PBA but I guess some code as in
usr/share/rear/build/GNU/Linux/420_copy_firmware_files.sh
https://github.com/rear/rear/blob/master/usr/share/rear/build/GNU/Linux/420_copy_firmware_files.sh
might be also useful when creating the Opal PBA image?

@OliverO2
could you have a look at this issue here?

OliverO2 commented at 2020-08-10 10:25:

@bmastenbrook Thanks for this detailed report and @jsmeix thanks for pinging me. I'll look into this.

OliverO2 commented at 2020-08-10 10:45:

My first idea would be to edit usr/share/rear/prep/OPALPBA/Linux-i386/001_configure_workflow.sh as follows:

  • change FIRMWARE_FILES=( 'no' )
  • to FIRMWARE_FILES=( 'amdgpu' )

@bmastenbrook Could you check if this change works?

If so, I'd envision to introduce a variable OPAL_PBA_FIRMWARE_FILES which would enable overriding the current configuration for the PBA only. In addition, maybe it would be a good idea to set FIRMWARE_FILES=( 'no' ) only if the initial setting has been FIRMWARE_FILES=( 'yes' ) or FIRMWARE_FILES=() as these would make the PBA become too large.

OliverO2 commented at 2020-08-23 20:05:

@bmastenbrook Thanks again for opening this issue. Just a friendly reminder: Could you please respond to the suggested solution above or state if you have lost interest, so that we have an idea on how to proceed?

bmastenbrook commented at 2020-09-03 01:59:

@OliverO2 Sorry, I've been pretty busy. I'll give this a try next week, but I can't promise anything sooner. Apologies for the issue-and-run 🙃

OliverO2 commented at 2020-09-04 08:27:

@bmastenbrook That's OK 😊. As you came up with a high quality issue report and seems a permanent fix is just a few steps ahead, I justed wanted to grab the occasion. Take your time, it would still be great if you could test the above suggestion. Maybe you could even report the output of lspci | grep VGA. If we could auto-configure the FIRMWARE_FILES setting you (and others) would never have to worry about this again.

github-actions commented at 2020-11-04 01:34:

Stale issue message

jsmeix commented at 2020-11-05 13:58:

With https://github.com/rear/rear/pull/2507 merged
this issue should be fixed.


[Export of Github issue for rear/rear.]