#3251 Issue open: Decide on test matrix of distributions/versions

Labels: discuss / RFC, ReaR Project

pcahyna opened issue at 2024-06-17 16:19:

Moved from https://github.com/rear/rear/pull/3239 .

I am less concerned with builds (they only rarely fail) and more concerned with tests (they fail fairly often, so it is a more pressing issue). We need to pay attention to failing tests, because they may mean either that:

  1. the PR introduces a regression and should not be merged
  2. something changed in the distro that we test against, so the PR can be merged, but we still need to know about it and fix the issue ASAP

In the first case, "if it is broken then eventually somebody will fix it" is not the right approach. The PR author needs to fix it, and not "eventually", but before the PR is merged.

For this we need the tests not to be flaky. "continuously failing building and testing against unknown Linux distro versions" is not even the worst case (if a test fails continuously, it is probably a case of point 2). The worst case is that tests fail randomly. If tests fail for random reasons too often, we will learn to disregard them, and cases of point 1 will creep in.

I understand @schlomo 's view

I see value in continuously building and testing against unknown Linux distro versions, because that gives us an early warning for problems that our users might be facing tomorrow.
Yes, there is - and will be - a permanent risk of such builds failing for external, e.g. non-ReaR, reasons. IMHO this is an accepted risk and in the end we still benefit more from the continued validation than we suffer from the flaky builds. For me this is a trade-off decision where I choose the information provided or confirmation supplied by these builds.

but also @jsmeix 's

When automated tests result in too many false positives,
those automated tests do more harm than good.

The question is: should we test on development versions of distros (in our case, Fedora Rawhide for now)? We will catch incompatibilities with updates to the distro, but won't the occasional false positives due to unrelated breakage in the distro hide true test failures and lead us to ignore them?

schlomo commented at 2024-06-17 16:35:

@pcahyna thank you and good catch:

The question is: should we test on development versions of distros (in our case, Fedora Rawhide for now)? We will catch incompatibilities with updates to the distro, but won't the occasional false positives due to unrelated breakage in the distro hide true test failures and lead us to ignore them?

I'm totally fine with not doing anything on development versions of Linux distros and rather solving problems when we see them on the released versions. In any case I don't expect a distro to fix something because ReaR stopped working.

BTW, #2972 was a nice example of an upstream problem that I found where maybe (at least I like to believe that) we were helpful in getting it fixed faster - because our CI found it and I followed up on it and it turned out to be an upstream bug.

pcahyna commented at 2024-06-17 16:50:

@schlomo well, https://github.com/rear/rear/issues/2972 contradicts your "I don't expect a distro to fix something because ReaR stopped working". Even if you mean "only because ReaR stopped working and nothing else": if the distro does not fix the incompatible change that breaks our use, we still need to develop a fix on our side, so we need to know about it.

As a compromise, I think I'll enable testing on fedora-branched ( https://packit.dev/docs/configuration#aliases ). This includes all supported stable versions, as well as the version that was branched from the development branch (rawhide) and is undergoing stabilization in order to become the next stable version. Testing on it will hopefully allow us enough time to catch issues due to distribution changes before the stable distribution version release, while avoiding most of the breakage associated with Rawhide.

(This is for longer term; in the short term I will disable even the F40 stable version though due to a known but incompletely understood test failure as discussed elsewhere.)

schlomo commented at 2024-06-17 17:02:

Yes @pcahyna, that is indeed a counterexample. I was wondering whether I should change ReaR to not use the broken cp option, but then I decided that I first wanted to try reporting it upstream and simply waiting for upstream to fix it. Which happened in the end. If this had been for a release that had to go out, then I would indeed have changed ReaR as a workaround (and reverted that for the next release).

jsmeix commented at 2024-06-18 04:32:

Clarification of my unclear wording:
With
"continuously failing building and testing
against unknown Linux distro versions"
I actually meant the worst case
that continuously this or that test fails randomly
so we are continuously hit by various false alarms.
I did not mean that specific tests fail continuously.

jsmeix commented at 2024-06-18 05:01:

Regarding

We need to pay attention to failing tests, because they may mean
...
2. something changed in the distro that we test against,
   so the PR can be merged, but we still need to know about it
   and fix the issue ASAP

"Tertium datur!" ;-)

When a test for a specific distribution fails
for a longer time (i.e. considered "continuously"),
it may not mean we have to fix something in ReaR.

When that specific distribution is not a stable release
but something that changes, then a test that fails
for a longer time could be a bug in that distribution
which will be eventually fixed by that distribution,
e.g. as in https://github.com/rear/rear/issues/2972

Only if this bug in that distribution is not fixed
by the time that distribution is released may we think
about implementing a workaround in ReaR - but this
requires that there is someone who uses that distribution
and who contributes the workaround - i.e. as in
https://github.com/rear/rear/issues/3251#issuecomment-2173907303

Because of my above reasoning I still think what I wrote in
https://github.com/rear/rear/pull/3239#issuecomment-2173626556

What I would prefer - if possible - is that our CI tests
run in stable environments that are under our control

i.e. that we run the CI tests only on (stable) released
Linux distribution versions, the same as @schlomo wrote in
https://github.com/rear/rear/issues/3251#issuecomment-2173857729

I'm totally fine with not doing anything
on development versions of Linux distros and
rather solving problems when we see them
on the released versions.

On the other hand I fully understand that @pcahyna
would like to test on development versions of
the distro he personally cares about (Fedora Rawhide)
because he may like to know early when there are issues
in Fedora Rawhide that affect ReaR, e.g. to avoid
that such issues later hit Red Hat customers who use RHEL.

The crucial difference here is that
when @pcahyna enables tests on Fedora Rawhide
and when he cares about "his" test results
then all is well and I really appreciate it
because test failures on Fedora Rawhide may
indicate early a generic issue in ReaR,
e.g. because of a newer version of some software
as in https://github.com/rear/rear/issues/3021

In contrast when we enable tests on various
development versions of Linux distros
but there is no dedicated person (maintainer)
who cares about those test results,
then at least I feel somewhat annoyed by
"randomly failing tests" and all I could do
in this case with reasonable effort is to
simply ignore particular failing tests
(i.e. tests where the root cause of the failure
is not obvious to me with reasonable effort).

pcahyna commented at 2024-06-18 09:33:

@jsmeix

"Tertium datur!" ;-)

When a test for a specific distribution fails
for a longer time (i.e. considered "continuously"),
it may not mean we have to fix something in ReaR.

When that specific distribution is not a stable release
but something that changes, then a test that fails
for a longer time could be a bug in that distribution
which will be eventually fixed by that distribution,
e.g. as in https://github.com/rear/rear/issues/2972

Don't rely on "the bug will be eventually fixed" without our action. In many cases, we will need to actively report the bug in order to get it fixed, otherwise nobody will care. This is especially the case for obscure features of otherwise standard software that do not affect anyone else if they get broken. So, even if we don't implement a workaround in ReaR, we still need to know about the problem in order to get the maintainers to fix it.

The crucial difference here is that
when @pcahyna enables tests on Fedora Rawhide
and when he cares about "his" test results
then all is well and I really appreciate it
because test failures on Fedora Rawhide may
indicate early a generic issue in ReaR,
e.g. because of a newer version of some software
as in https://github.com/rear/rear/issues/3021

The main question is, if a test on rawhide fails on a PR, what will you (or other maintainers) do? Debug the cause of the test failure? Ask me to debug it and wait with merging until the cause is known? Or just merge the PR despite the test failing? In the third case, won't that lead you to gradually ignore all test failures, even when they occur on stable versions where the likelihood of the failure being a real bug is much higher?

If there is a risk that the third case would apply, I prefer not testing on rawhide (see https://github.com/rear/rear/issues/3251#issuecomment-2173885768 for a compromise solution that would IMO bring lots of the benefits without most of the drawbacks).

jsmeix commented at 2024-06-18 10:31:

@pcahyna
regarding the first part:
Simply put, in practice I can only care about (open)SUSE
because I have basically no time for other distributions.

Regarding the second part:
I only meant that you could enable tests on whatever
distribution you like provided you care about its
test results. I did not mean you should enable
tests on Fedora Rawhide - but you could if you
care about its test results.

In general:
I am against CI tests on a Linux distribution
(regardless of how many users it has "out there")
when we have no one here at ReaR upstream
who actively cares about that Linux distribution,
cf. what I wrote about Ubuntu and Canonical in
https://github.com/rear/rear/pull/3239#issuecomment-2175051154

Bottom line from my point of view:
It is not about what would be good to do or good to have.
It is about what we can get done with reasonable effort.

E.g. for about a year I have wished that a new ReaR release
could get done - but see how long we need in practice
to actually get such a very basic task done.
So everything that might delay us getting our basic tasks
done should be avoided as far as possible.

pcahyna commented at 2024-06-18 14:59:

@jsmeix

I only meant that you could enable tests on whatever
distribution you like provided you care about its
test results. I did not mean you should enable
tests on Fedora Rawhide - but you could if you
care about its test results.

I will care about the CI results - I already do.

Even then, it sometimes means waiting with PR merges while I debug CI failures - is that acceptable to you?

Are you willing to have a quick look at CI test failures on your PRs yourself first, please? From time to time actual bugs will be introduced, and an initial investigation by the PR author would save me some work. Just as you did in https://github.com/rear/rear/pull/3128#issuecomment-1892136367 where you correctly identified that the failures were something to investigate in more detail.

I understand that you may not have the time for this, but then I will not be able to enable rawhide tests, as the load would be too high.

jsmeix commented at 2024-06-19 05:55:

@pcahyna
how I behaved at the time of
https://github.com/rear/rear/pull/3128#issuecomment-1892136367
versus how I behave now regarding CI test failures
explains perfectly well what I mean with
https://github.com/rear/rear/pull/3239#issuecomment-2154754393

When automated tests result in too many false positives,
those automated tests do more harm than good.

I think more than one or two percent false positives
is already the point where they do more harm than good.

When for each pull request 10 automated tests are run
and each one has 98% reliability, then the overall
reliability is only 82% (0.98 ^ 10 = 0.817),
so roughly every 5th pull request is hit by a false alarm.

To have 98% overall reliability (for 10 CI tests)
each one must have 99.8% reliability
(0.998 ^ 10 = 0.980).

With "reliability" I mean that a failing CI test
correctly indentifies an issue in our own code.
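
A minimal sketch of this arithmetic (just an illustration
assuming independent tests, not part of ReaR or its CI setup):

```python
# Illustration: how per-test reliability compounds across independent CI tests.
# "Reliability" here = probability that a single test does not raise a false alarm.

def overall_reliability(per_test: float, num_tests: int) -> float:
    """Probability that none of num_tests independent tests fails for an unrelated reason."""
    return per_test ** num_tests

print(overall_reliability(0.98, 10))   # ~0.817 -> roughly every 5th pull request hits a false alarm
print(overall_reliability(0.998, 10))  # ~0.980 -> about 2% of pull requests hit a false alarm
```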

At the time of
https://github.com/rear/rear/pull/3128#issuecomment-1892136367
the automated tests were rather new for me
so I cared and paid attention to failing tests
because a failed CI thing was interesting.

Meanwhile for any pull request there are basically
always some CI things that fail (regardless of what kind
of CI thing fails - basically always something fails).

So I became rather accustomed to failed CI things,
and because most failed CI things were not caused
by the actual code changes, failed CI things were
no longer interesting but "just some normal noise"
that one had better "simply ignore".

Actually when something is mostly only noise
one must ignore the noise to be able to still focus
on what one actually wants to get done - otherwise,
if one listened to the noise (paid attention to the noise),
one could not get done what one actually wants to get done.

So ignoring failed CI things is currently a necessity
or a way out for me, so that I can still
get done what I actually want to get done.

pcahyna commented at 2024-06-19 08:19:

@jsmeix exactly, and that's why I think disabling Rawhide tests is a good thing to do despite missing the opportunities to catch important changes or bugs in the distribution early. Hopefully the CI will be all green most of the time then and developers will start paying attention to it again.


[Export of Github issue for rear/rear.]