#3208 Issue closed: BACKUP=CDM missing security certificate for replica Rubrik instance

Labels: waiting for info, support / question, external tool, no-issue-activity

rmccrack opened issue at 2024-04-19 18:51:

  • ReaR version ("/usr/sbin/rear -V"):
    rear-2.7-1.git.5366

  • If your ReaR version is not the current version, explain why you can't upgrade:

  • OS version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf"):
    RHEL 7.9

  • ReaR configuration files ("cat /etc/rear/site.conf" and/or "cat /etc/rear/local.conf"):

BACKUP=CDM
OUTPUT=ISO
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/python2.7/site-packages/;/usr/lib64/bind9-export/:/usr/lib64/R/lib/"
  • Hardware vendor/product (PC or PowerNV BareMetal or ARM) or VM (KVM guest or PowerVM LPAR):
    Vmware VM

  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device):

  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot):

  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe):

  • Storage layout ("lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT"):

  • Description of the issue (ideally so that others can reproduce it):

When I attempt a restore from an iso created
using the rear -v mkrescue command,
and select the option to restore from a replica Rubrik cluster,
the agent will not register on the replica cluster
when I attempt to register it using the IP address.
I get an error indicating the agent is not running
(refusal to connect on port 12801).
When I check on the partially restored system,
the Rubrik agents are not running.
Attempts to start the agents using
"systemctl start rubrikagents"
result in an error
"rubrikagents.service not found"

  • Workaround, if any:

None I can find so far.

  • Attachments, as applicable ("rear -D mkrescue/mkbackup/recover" debug log files):

The backup log files did not reflect any errors.
The system I was trying to recover (a Vmware VM)
no longer exists and I have no option left to restore
except to use the Rubrik replica copy of the backup.

You can drag-drop log files into this editor to create an attachment
or paste verbatim text like command output or file content
by including it between a leading and a closing line of
three backticks like this:

verbatim content

gdha commented at 2024-04-26 14:20:

@rmccrack If you look into the /usr/share/rear/conf/default.conf file you will find:

####
# BACKUP=CDM (Rubrik CDM; Cloud Data Management)
##
# ReaR support for Rubrik Cloud Data Management (CDM).
# ReaR will copy the Rubrk RBS agent and required OS binaries to its ISO for incluson on boot.
# ReaR will start the Rubrik RBS agent when 'rear recover' is run.
COPY_AS_IS_CDM=( /etc/rubrik /usr/bin/rubrik /var/log/rubrik /etc/pki )
COPY_AS_IS_EXCLUDE_CDM=( /var/log/rubrik/* )
PROGS_CDM=( /usr/bin/rubrik/backup_agent_main /usr/bin/rubrik/bootstrap_agent_main openssl uuidgen )
####

And also check script /usr/share/rear/verify/CDM/default/450_start_cdm_rbs.sh - could be that the content is a bit outdated?

rmccrack commented at 2024-05-13 19:52:

The issue is not with starting the Rubrik agent, it is that the original Rubrik installation, which is what the iso contains, does not have the correct security certificate for the replica Rubrik instance. This is embedded in the rpm package that is normally downloaded from the Rubrik instance, and is installed when that package is installed. It is possible to download that package, as the curl utility does seem to be available. But once downloaded, neither rpm nor yum are available to install the replacement package, and without that one cannot complete the step of registering the system by IP address on the replica Rubrik instance. Consequently, one cannot complete the recovery.

gdha commented at 2024-05-14 06:23:

@rmccrack Certificates of Rubrik are stored on your original system on a dedicated location. Please check the Rubrik documentation if you don't know it by heart. You could also verify inside the RPM package with rpm -qpl rubrik.rpm command.
Once found the location you could add it to the COPY_AS_IS_CDM=( /etc/rubrik /usr/bin/rubrik /var/log/rubrik /etc/pki ) array in the /usr/share/rear/conf/default.conf file.

jsmeix commented at 2024-05-14 06:25:

@rmccrack
the ReaR recovery system is intentionally a minimal system
that contains only what is needed for "rear recover"
because for each system one system specific
ReaR recovery system is needed so the ReaR recovery system
should not be bigger than what is actually needed.
Think about bigger environments where ReaR is used
for very many systems - one needs storage space for
all those system specific ReaR recovery systems.

In particular there is no software package management
in the ReaR recovery system by default.

You could add what you like to the ReaR recovery system
via COPY_AS_IS, REQUIRED_PROGS, and LIBS
see their descriptions in
usr/share/rear/conf/default.conf

All what is needed for "rear recover" must be put
into the ReaR recovery system during "rear mkrescue".

So the missing "security certificate" has to be be put
into the ReaR recovery system during "rear mkrescue".

This is what @gdha tried to tell you.
I.e. you may use e.g. COPY_AS_IS_CDM
to copy the missing "security certificate"
into the ReaR recovery system during
"rear mkrescue/mkbackup".

In case of emergency during "rear recover" you can add
missing files into the running ReaR recovery system
e.g. via "curl" or "scp" or via USB stick or whatever.
But you would have to provide the actual missing files
and not those files somehow packaged into a package format
that you cannot unpack within the ReaR recovery system.
So if the missing "security certificate" is one file
you could use e.g. "curl" to get that file from another
system and copy it into the running ReaR recovery system.
When you need to copy many files into the running ReaR
recovery system you can use a tar archive because 'tar'
is always available in the ReaR recovery system, see
REQUIRED_PROGS in usr/share/rear/conf/default.conf
and see also RECOVERY_UPDATE_URL therein.

FYI
in general regarding issues with third-party backup tools:

Usually we at ReaR upstream do not have or use third-party backup tools
(in particular not if a third-party backup tool is proprietary software)
so usually we cannot reproduce issues with third-party backup tools.

In case of issues with third-party backup tools and ReaR
we at ReaR upstream can usually do nothing but totally
depend on contributions and help from those specific users
who use and know about each specific third-party backup tool.

rmccrack commented at 2024-05-14 16:05:

First, all the "COPY_AS_IS" information is beside the point, because the required certificate for the Rubrik replica server is not on the original system. True, I could anticipate and stash the required file somewhere, but then the user would have to have a simple way to stop the Rubrik agent, move the certificate file into place, and restart the agent. With the usual "service" or "systemctl" utilities non-existent (apparently) that's not a trivial issue and at the very least your program should be putting detailed instructions to the screen for this when a replica is to be used.

If you are not intending to support use of a Rubrik replica instance for a restore, why do you prompt for a yes/no answer regarding whether the original CDM instance is being used for recovery? And then, if the answer is no, prompt for the IP address of a node in the replica instance? These are valid questions if you can use the replica to restore from. Otherwise, they are misleading because the fact is, one can only use the original instance as the code stands today.

What I need is for our average off-hours support technician to be able to use this product to get a trashed unix virtual system back up and running in a reasonable period of time. They aren't going to know all the intricacies of either ReaR or Rubrik to "work around" this issue. And for reasons I won't detail, as often as not a replica copy needs to be used.

github-actions commented at 2024-07-14 02:28:

Stale issue message

gdha commented at 2024-09-06 12:47:

@rmccrack The setting in configuration file conf/default.conf:PROGS_CDM=( /usr/bin/rubrik/backup_agent_main /usr/bin/rubrik/bootstrap_agent_main openssl uuidgen ) makes sure the needed software is copied into the rescue image of ReaR.

The script prep/CDM/default/450_check_cdm_client.sh checks if the the client software /usr/bin/rubrik/backup_agent_main is present. However, it seems there is no code present to start the executable it seems, or am I wrong?

What is the content of rubrikagents.service on your system?
And, there must be a public Rubrik certificate available on your client system as otherwise I don't see how if can identify itself with the Rubrik master?

rmccrack commented at 2024-09-06 13:51:

We figured out what is going wrong and developed a workaround.

When the option to use a replica is selected, four things are attempted:

  1. The /etc/rubrik/keys/agent.pem, agent.crt, and rubrik.crt files are renamed.
  2. The Solaris agent package is downloaded from the replica server, unpacked, and the /etc/rubrik/keys/rubrik.crt file replaced.
  3. The /etc/rubrik/rba_keygen.sh script is run to replace the /etc/rubrik/keys/agent.pem and agent.crt files.
  4. The two agent programs are re-launched.

A significant percentage of the time, one or both of operations 2 and 3 fail, silently. This leaves one or more of the agent.pem, agent.crt, and rubrik.crt files missing and as a result the agent restarts fail, which in turn causes the remainder of the recovery to fail.

Our workaround is a script that we place in the /etc/rear directory that re-attempts the previously described steps, with extensive error-checking. We choose the option to restore from the original Rubrik cluster, and when we reach the prompt following filesystem creation, etc. we run our script. After successful completion, we continue the normal procedure of registering by IP address on the replica cluster, etc.

Regards,
Ron McCracken

github-actions commented at 2025-02-16 02:40:

Stale issue message


[Export of Github issue for rear/rear.]