#531 Issue closed: rear git201501071534 recovery system on Fedora 21: systemd at 100% CPU (/sbin/udevd missing)

Labels: enhancement, support / question, fixed / solved / done

jsmeix opened issue at 2015-01-20 11:18:

I am testing rear git201501071534 version on Fedora 21:

When I boot the rear recovery system it works, but there are two processes,
"systemd" and "systemd-journal", each continuously running
at about 50% CPU usage.

The following dirty hack seems to "fix" it for me:

ln -s /bin/true /sbin/udevd

Details:

RESCUE f42:~ # cat /etc/os-release 
NAME=Fedora
VERSION="21 (Twenty One)"
ID=fedora
VERSION_ID=21
PRETTY_NAME="Fedora 21 (Twenty One)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:21"
HOME_URL="https://fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=21
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=21

RESCUE f42:~ # top
...
  PID USER  ...  %CPU ...  COMMAND
   54  root          50.0         systemd-journal
     1  root          43.8         systemd

RESCUE f42:~ # journalctl
...
 systemd[1]: Enqueued job udev.service/start as 14149229
 systemd[1]: Installed new job udev.service/start as 14149229
 systemd[1]: Trying to enqueue job udev.service/start/replace
 systemd[1]: Incoming traffic on udev-kernel.socket
 systemd[1]: udev-kernel.socket changed running -> listening
 systemd[1]: udev-kernel.socket got notified about service death (failed permanently: no)
 systemd[1]: Started udev Kernel Device Manager.
 systemd[1]: Job udev.service/start finished, result=done
 systemd[1]: Starting of udev.service requested but condition failed. Ignoring.
 systemd[1]: ConditionPathExists=/sbin/udevd failed for udev.service.
 systemd[1]: udev-kernel.socket changed listening -> running
 systemd[1]: Enqueued job udev.service/start as 14149223
 systemd[1]: Installed new job udev.service/start as 14149223
 systemd[1]: Trying to enqueue job udev.service/start/replace
 systemd[1]: Incoming traffic on udev-kernel.socket
 systemd[1]: udev-kernel.socket changed running -> listening
 systemd[1]: udev-kernel.socket got notified about service death (failed permanently: no)
 systemd[1]: Started udev Kernel Device Manager.
 systemd[1]: Job udev.service/start finished, result=done
 systemd[1]: Starting of udev.service requested but condition failed. Ignoring.
 systemd[1]: ConditionPathExists=/sbin/udevd failed for udev.service.
...

RESCUE f42:~ # ls /sbin/udevd
ls: cannot access /sbin/udevd: No such file or directory

RESCUE f42:~ # ps auxw | grep udev
root ... /usr/lib/systemd/systemd-udevd

RESCUE f42:~ # ln -s /bin/true /sbin/udevd

RESCUE f42:~ # top
...
  PID USER  ...  %CPU ...  COMMAND
     1  root           0.0          systemd

RESCUE f42:~ # journalctl
...
 systemd[1]: udev.service failed.
 systemd[1]: Unit udev.service entered failed state.
 systemd[1]: Unit udev-kernel.socket entered failed state.
 systemd[1]: udev-kernel.socket changed running -> failed
 systemd[1]: udev-kernel.socket got notified about service death (failed permanently: yes)
 systemd[1]: Failed to start udev Kernel Device Manager.
 systemd[1]: Job udev.service/start finished, result=failed
 systemd[1]: udev.service changed dead -> failed
 systemd[1]: start request repeated too quickly for udev.service
 systemd[1]: Starting udev Kernel Device Manager...
 systemd[1]: ConditionPathExists=/sbin/udevd succeeded for udev.service.

jsmeix commented at 2015-01-20 11:47:

See also https://bugzilla.opensuse.org/show_bug.cgi?id=908854#c9

gdha commented at 2015-01-20 12:49:

That must be introduced by the ./skel/default/usr/lib/systemd/system/udev.service unit file, and I think the guilty entry might be Restart=on-failure in the service description:

[Unit]
Description=udev Kernel Device Manager
Wants=udev-control.socket udev-kernel.socket
After=udev-control.socket udev-kernel.socket
Before=basic.target
DefaultDependencies=no
ConditionPathExists=/sbin/udevd

[Service]
Type=notify
OOMScoreAdjust=-1000
Sockets=udev-control.socket udev-kernel.socket
Restart=on-failure
ExecStart=/sbin/udevd

Perhaps adding RestartSec=10s would be enough to reduce the CPU load?

jsmeix commented at 2015-01-20 13:51:

Still the same CPU load in the recovery system, now with:

RESCUE f42:~ # cat /usr/lib/systemd/system/udev.service
[Unit]
Description=udev Kernel Device Manager
Wants=udev-control.socket udev-kernel.socket
After=udev-control.socket udev-kernel.socket
Before=basic.target
DefaultDependencies=no
ConditionPathExists=/sbin/udevd
[Service]
Type=notify
OOMScoreAdjust=-1000
Sockets=udev-control.socket udev-kernel.socket
Restart=on-failure
RestartSec=10s
ExecStart=/sbin/udevd
RESCUE f42:~ #

But CPU load is not the actual problem - it is only a symptom.

Regarding the actual problem:

It seems in my original Fedora 21 system there is no udevd executable:

[root@f42 ~]# type -a udevd
-bash: type: udevd: not found
[root@f42 ~]# ls -l /sbin/udevd /usr/sbin/udevd /bin/udevd /usr/bin/udevd
ls: cannot access /sbin/udevd: No such file or directory
ls: cannot access /usr/sbin/udevd: No such file or directory
ls: cannot access /bin/udevd: No such file or directory
ls: cannot access /usr/bin/udevd: No such file or directory

Accordingly, it does not seem to make sense to have a systemd unit file in the rear recovery system with "ExecStart=/sbin/udevd", because that can never work.

gdha commented at 2015-01-20 14:45:

@jsmeix That is the service for older Fedora releases (and/or other distros) that still use udevd. If you comment out the Restart=on-failure entry, the problem is gone.
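
For illustration only, a minimal sketch (the file path, the $ROOTFS_DIR variable and the sed call are assumptions, not the actual rear code) of how that entry could be commented out in the recovery system copy during the build:

# Hypothetical sketch (not actual rear code): comment out Restart=on-failure
# in the recovery system's copy of udev.service, as suggested above.
# $ROOTFS_DIR is assumed to point to the recovery system tree being built.
unit="$ROOTFS_DIR/usr/lib/systemd/system/udev.service"
if test -f "$unit" ; then
    sed -i -e 's/^Restart=on-failure/#&/' "$unit"
fi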

jsmeix commented at 2015-01-20 14:53:

FYI:

What "ExecStart=..." exist in my recovery system and what executables are missing there:

RESCUE f42:~ # find /usr/lib/systemd/ | xargs grep -h 'ExecStart=' 2>/dev/null | grep -o '/[^ ]*' | sort -u
/bin/agetty
/bin/dbus-daemon
/bin/systemctl
/etc/scripts/boot
/etc/scripts/run-sshd
/etc/scripts/run-syslog
/etc/scripts/system-setup
/lib/systemd/systemd-logger
/lib/systemd/systemd-shutdownd
/sbin/agetty
/sbin/udevd
/usr/bin/udevadm
/usr/lib/systemd/systemd-journald
/usr/lib/systemd/systemd-udevd
RESCUE f42:~ # for e in $( find /usr/lib/systemd/ | xargs grep -h 'ExecStart=' 2>/dev/null | grep -o '/[^ ]*' | sort -u ) ; do ls -l $e | grep 'No such' ; done
ls: cannot access /lib/systemd/systemd-logger: No such file or directory
ls: cannot access /sbin/udevd: No such file or directory
RESCUE f42:~ # 

/lib/systemd/systemd-logger is used only in /usr/lib/systemd/system/systemd-logger.service as ExecStart=/lib/systemd/systemd-logger

/sbin/udevd is used only in /usr/lib/systemd/system/udev.service as
ConditionPathExists=/sbin/udevd and ExecStart=/sbin/udevd

I think that when the rear recovery system is generated, there should be a test to ensure that the executables needed by its systemd unit files actually exist.
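
For example, such a test could look roughly like the following (only a sketch; it assumes $ROOTFS_DIR points to the recovery system tree and that a LogPrint function is available, as in other rear build scripts):

# Hypothetical sketch: warn about ExecStart targets of systemd unit files
# that do not exist inside the recovery system tree ($ROOTFS_DIR).
# Uses the same rough path-extraction heuristic as the pipeline above.
for exe in $( grep -rh 'ExecStart=' $ROOTFS_DIR/usr/lib/systemd/system/ 2>/dev/null \
              | grep -o '/[^ ]*' | sort -u ) ; do
    test -e "$ROOTFS_DIR$exe" || LogPrint "Systemd unit file references missing $exe"
done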

gdha commented at 2015-01-20 15:17:

@jsmeix Basically I agree with you. However, systemd development moves so fast and differs so much between distributions that it is very hard for us to keep track of what is used where and what has become obsolete in the meantime. Remember, this is part of the skel tree.
If we want to change this, we would have to create additional scripts that figure this out during the recovery image build. Not impossible, but time-consuming and error-prone, because it is impossible to test all possible combinations in a limited time period (which means other stuff may get broken).

To be honest, I do not care much about some missing executables, because once the system is recovered, rear's work is done. I care about rear's functionality - doing a proper recovery.

jsmeix commented at 2015-01-20 15:33:

I fully agree that the systemd stuff consists, at least currently, of various fast-moving targets, so that others simply cannot keep up with that madness.

Feel free to close this issue.

Nevertheless, I am willing to have a look at whether it is possible to add some check scripts that implement very basic tests of whether the rear recovery system is self-consistent.

In that case I would like to get an initial hint on how to implement it, in particular where such scripts should be located in rear and how to make them run automatically during "rear mkbackup/mkrescue".

jsmeix commented at 2015-01-20 15:48:

FYI:

What "ExecStart=..." exist in my recovery system for openSUSE 13.2
and what executables are missing there:

RESCUE f74:~ # find /usr/lib/systemd/ | xargs grep -h 'ExecStart=' 2>/dev/null | grep -o '/[^ ]*' | sort -u
/bin/agetty
/bin/dbus-daemon
/bin/systemctl
/etc/scripts/boot
/etc/scripts/run-sshd
/etc/scripts/run-syslog
/etc/scripts/system-setup
/lib/systemd/systemd-logger
/lib/systemd/systemd-shutdownd
/sbin/agetty
/sbin/udevd
/usr/bin/udevadm
/usr/lib/systemd/systemd-journald
/usr/lib/systemd/systemd-udevd
RESCUE f74:~ # for e in $( find /usr/lib/systemd/ | xargs grep -h 'ExecStart=' 2>/dev/null | grep -o '/[^ ]*' | sort -u ) ; do ls -l $e | grep 'No such' ; done
ls: cannot access /lib/systemd/systemd-logger: No such file or directory
ls: cannot access /lib/systemd/systemd-shutdownd: No such file or directory
RESCUE f74:~ # find /usr/lib/systemd/ | xargs grep '/lib/systemd/systemd-logger' 2>/dev/null
/usr/lib/systemd/system/systemd-logger.service:ExecStart=/lib/systemd/systemd-logger
RESCUE f74:~ # find /usr/lib/systemd/ | xargs grep '/lib/systemd/systemd-shutdownd' 2>/dev/null
/usr/lib/systemd/system/systemd-shutdownd.service:ExecStart=/lib/systemd/systemd-shutdownd

gdha commented at 2015-01-23 15:24:

@jsmeix about:

Nevertheless, I am willing to have a look at whether it is possible to add some check scripts that implement very basic tests of whether the rear recovery system is self-consistent.

In that case I would like to get an initial hint on how to implement it, in particular where such scripts should be located in rear and how to make them run automatically during "rear mkbackup/mkrescue".

I would add these scripts somewhere in the build stage (use the -s option to see which scripts get sourced); a sketch of such a new script follows the list below:

Source rescue/GNU/Linux/95_cfg2html.sh
Source rescue/GNU/Linux/96_collect_MC_serviceguard_infos.sh
Source build/GNU/Linux/00_create_symlinks.sh
Source build/GNU/Linux/09_create_lib_directories_and_symlinks.sh
Source build/GNU/Linux/10_copy_as_is.sh
Source build/GNU/Linux/11_touch_empty_files.sh
Source build/GNU/Linux/13_create_dotfiles.sh
Source build/GNU/Linux/15_adjust_permissions.sh
Source build/GNU/Linux/16_adjust_sshd_config.sh
Source build/GNU/Linux/39_copy_binaries_libraries.sh
Source build/GNU/Linux/40_copy_modules.sh
Source build/default/50_patch_sshd_config.sh
Source build/GNU/Linux/60_verify_and_adjust_udev.sh
Source build/GNU/Linux/61_verify_and_adjust_udev_systemd.sh
Source build/default/96_remove_encryption_keys.sh
Source build/default/97_add_rear_release.sh
Source build/default/98_verify_rootfs.sh
Source build/default/99_update_os_conf.sh
Source pack/Linux-i386/30_copy_kernel.sh
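
As an illustration of the wiring only (script name and numbering are assumptions, not an existing rear file), the check sketched earlier could live in a new file that gets sourced there automatically:

# Hypothetical new script, e.g. build/GNU/Linux/62_check_systemd_executables.sh
# (name and numbering are only an example). Scripts in this directory are
# sourced automatically during the build stage of "rear mkbackup/mkrescue".
test -d $ROOTFS_DIR/usr/lib/systemd/system || return 0
Log "Checking executables referenced by systemd unit files"
# ... run the ExecStart existence check sketched in the earlier comment ...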

jsmeix commented at 2015-01-26 13:12:

Right now I found another issue where something is missing
in the recovery system:

When a vfat filesystem is to be recreated, "rear recover" fails with

/var/lib/rear/layout/diskrestore.sh: line 156: dosfslabel: command not found

One can manually make it work by adding

PROGS=( ${PROGS[@]} dosfslabel )

in /etc/rear/local.conf, but it would be nicer if "rear mkbackup" itself
detected what is needed in the recovery system and reported an error
if something that the recovery system needs cannot be found
on the original system where "rear mkbackup" runs.

gdha commented at 2015-01-26 17:46:

Try a 'grep -r dosfslabel .' and you will see it is detected for EFI systems, where a vfat partition is required. Do you need it for something else?

jsmeix commented at 2015-01-27 09:03:

Out of curiosity I created on my test system an additional normal data partition (/dev/sda7) with a FAT filesystem on it.

In 23_filesystem_layout.sh there is

  (vfat)
  ...
  label=$(dosfslabel $device | tail -1)

and in 13_include_filesystem_code.sh there is

  (vfat)
  ...
  echo "dosfslabel $device $label >&2" >> "$LAYOUT_CODE"

gdha commented at 2015-01-27 11:00:

Was the FAT partition mounted? If yes, then you found a defect.

jsmeix commented at 2015-01-27 11:21:

It is mounted.

On the original system (it is the SLES12 system that I used in https://github.com/rear/rear/issues/533 for testing MD stuff, where just for fun I used the last remaining cylinder of the disk as a separate partition and, because it is so small, put a FAT filesystem on it):

f143:~ # mount -t ext2,ext4,vfat
/dev/md0 on / type ext4 (rw,relatime,stripe=16,data=ordered)
/dev/sda7 on /remainder type vfat (rw,nosuid,nodev,noexec,relatime,gid=100,fmask=0002,dmask=0002,allow_utime=0020,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro)
/dev/sda2 on /boot type ext2 (rw,relatime)
f143:~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 1.0
  Creation Time : Mon Jan 26 13:21:59 2015
     Raid Level : raid0
     Array Size : 20945856 (19.98 GiB 21.45 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
    Update Time : Mon Jan 26 13:21:59 2015
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
     Chunk Size : 32K
           Name : any:0
           UUID : d83d58d6:10642523:9803931f:4969ea21
         Events : 0
    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1       8        6        1      active sync   /dev/sda6
f143:~ # fdisk -l -u=cylinders
Disk /dev/sda: 24 GiB, 25769803776 bytes, 50331648 sectors
Geometry: 255 heads, 63 sectors/track, 3133 cylinders
Units: cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000edd6b
Device     Boot Start   End Cylinders Size Id Type
/dev/sda1           1   393       392   3G 82 Linux swap / Solaris
/dev/sda2  *      393   524       132   1G 83 Linux
/dev/sda3         524  3134      2610  20G  f W95 Ext'd (LBA)
/dev/sda5         524  1827      1304  10G fd Linux raid autodetect
/dev/sda6        1828  3131      1304  10G fd Linux raid autodetect
/dev/sda7        3132  3132         1   7M  c W95 FAT32 (LBA)
Disk /dev/md0: 20 GiB, 21448556544 bytes, 41891712 sectors
Geometry: 2 heads, 4 sectors/track, 5236464 cylinders
Units: cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 32768 bytes / 65536 bytes

gdha commented at 2015-01-27 11:30:

Just added dosfslabel to the conf/GNU/Linux.conf file, which should fix your vfat dosfslabel issue.
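
Presumably the change amounts to something like the following line in conf/GNU/Linux.conf (shown only as an illustration, not the exact commit):

# Sketch of the kind of addition, mirroring the local.conf workaround above:
PROGS=( "${PROGS[@]}" dosfslabel )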

gdha commented at 2015-01-28 16:15:

@jsmeix can this issue be closed?

jsmeix commented at 2015-01-29 10:00:

@gdha
I am sorry. I misused this issue when I found that "dosfslabel" is missing in the recovery system. I should have submitted a new separate issue for "dosfslabel". The "dosfslabel" part of this issue is fixed now.

This issue is initially about "on Fedora 21: systemd at 100% CPU (/sbin/udevd missing)". In my last test with Fedora 21 it still happened, so I think the initial issue is not yet fixed.

gdha commented at 2015-01-30 15:56:

I guess we can add some more lines in here to remove the obsolete udev.service:

2015-01-30 14:55:25 Including build/GNU/Linux/61_verify_and_adjust_udev_systemd.sh
+ . /usr/share/rear/build/GNU/Linux/61_verify_and_adjust_udev_systemd.sh
++ test -d /tmp/rear.Vd25lI6NQG7yU5K/rootfs/usr/lib/systemd/system
++ Log 'Cleaning up systemd udev socket files'
++ test 1 -gt 0
+++ Stamp
+++ date '+%Y-%m-%d %H:%M:%S '
++ echo '2015-01-30 14:55:25 Cleaning up systemd udev socket files'
2015-01-30 14:55:25 Cleaning up systemd udev socket files
++ my_udev_files=($(find $ROOTFS_DIR/usr/lib/systemd/system/sockets.target.wants -type l -name "*udev*"  -printf "%P\n"))
+++ find /tmp/rear.Vd25lI6NQG7yU5K/rootfs/usr/lib/systemd/system/sockets.target.wants -type l -name '*udev*' -printf '%P\n'
++ for m in '"${my_udev_files[@]}"'
++ [[ ! -h /lib/systemd/system/sockets.target.wants/systemd-udevd-control.socket ]]
++ for m in '"${my_udev_files[@]}"'
++ [[ ! -h /lib/systemd/system/sockets.target.wants/systemd-udevd-kernel.socket ]]
++ for m in '"${my_udev_files[@]}"'
++ [[ ! -h /lib/systemd/system/sockets.target.wants/udev-control.socket ]]
++ [[ ! -h /usr/lib/systemd/system/sockets.target.wants/udev-control.socket ]]
++ rm -f /tmp/rear.Vd25lI6NQG7yU5K/rootfs/usr/lib/systemd/system/sockets.target.wants/udev-control.socket
++ for m in '"${my_udev_files[@]}"'
++ [[ ! -h /lib/systemd/system/sockets.target.wants/udev-kernel.socket ]]
++ [[ ! -h /usr/lib/systemd/system/sockets.target.wants/udev-kernel.socket ]]
++ rm -f /tmp/rear.Vd25lI6NQG7yU5K/rootfs/usr/lib/systemd/system/sockets.target.wants/udev-kernel.socket
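
For illustration, the additional lines could look roughly like this (only a sketch, reusing the $ROOTFS_DIR and Log conventions visible in the trace above, not the actual commit):

# Hypothetical addition to 61_verify_and_adjust_udev_systemd.sh: drop the
# obsolete udev.service and its sockets from the recovery system when its
# ExecStart target /sbin/udevd does not exist there.
if test -f $ROOTFS_DIR/usr/lib/systemd/system/udev.service \
   && ! test -x $ROOTFS_DIR/sbin/udevd ; then
    Log "Removing obsolete udev.service (no /sbin/udevd in the recovery system)"
    rm -f $ROOTFS_DIR/usr/lib/systemd/system/udev.service
    rm -f $ROOTFS_DIR/usr/lib/systemd/system/udev-control.socket
    rm -f $ROOTFS_DIR/usr/lib/systemd/system/udev-kernel.socket
fi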

gdha commented at 2015-02-06 13:22:

Did a test and now we have sane CPU usage again! Fixed.

gdha commented at 2015-02-23 13:49:

Added to the release notes, so we can close this issue.


[Export of Github issue for rear/rear.]