#1427 Issue closed: Align AUTHORS and COPYRIGHT persons in various sources

Labels: enhancement, documentation, cleanup, fixed / solved / done

jsmeix opened issue at 2017-07-21 10:58:

In
https://github.com/rear/rear/commit/fd13be8f1bb091e1d324d35d3be527b34346a38e
I noticed changes of the AUTHORS and COPYRIGHT parts
in doc/rear.8.adoc.

Those changes are missing
in doc/rear.8 and
in usr/sbin/rear and
in the AUTHORS file.

I can adapt those files but I would need to know
what the actually right list of authors is.

schlomo commented at 2017-07-21 12:03:

In my opinion the list of authors contains

  1. those who can write to the rear git repo
  2. those who add themselves to the list via a PR together with a relevant or significant contribution

In any case we give write access to our regular contributors to make it easier for them and to reward their contribution.

What do you think?

jsmeix commented at 2017-07-21 13:03:

For this issue here I had hoped I could get
a "ready-to-use" list of names,
not an abstract definition ;-)

My current main issue here is that the AUTHORS file
content differs compared to doc/rear.8.adoc.

Clearly @gozora "Vladimir Gozora" is missing as author.

But what about other major contributors even without GitHub
write access like @schabrolles "Sebastien Chabrolles"?

In contrast what about long inactive major contributors?

Could it have some legal relevance who is listed as "author"?
I think the only legal relevance regarding license issues is
who contributed and not who is listed as what in which file.

I wonder if it is o.k. to list only the current main contributors
as "author" in the current source files to keep the
information who is an "author" usefully up-to-date?

Is the AUTHORS file meant as up-to-date information
for interested users who are current "main persons"?

Or is the AUTHORS file meant as reference of all
persons who had ever contributed something?

According to
https://www.gnu.org/prep/maintain/html_node/Recording-Contributors.html
the latter is meant.

But I wonder if nowadays git history isn't much better
to provide that information.

At least I will not continuously maintain an AUTHORS file
that contains all that information.

Or we "just generate" it using "git log -p --follow" for each file
to get tons of strictly correct but useless information ;-)

schlomo commented at 2017-07-21 13:13:

I see, sorry. And yes, I would be happy to see @schabrolles mentioned there.

Maybe for starters we consolidate all the different locations to a single one (e.g. the AUTHORS file) and reference that from the others? That way we have only one file to maintain. We could then copy this file as a doc file into the RPM/DEB packages to make it also available offline.

jsmeix commented at 2017-07-24 10:18:

I played around how we could generate an AUTHORS file
that is to some reasonable extent in compliance with
https://www.gnu.org/prep/maintain/html_node/Recording-Contributors.html

I think there are two main measures
cf. https://en.wikipedia.org/wiki/Measure_%28mathematics%29
how to measure the "legally significant contribution weight"
of a person:

The simpler one is the number of git commits
which is directly available from within a git checkout via

git shortlog -e -s -n

E.g. this results on my current git checkout:

$ git shortlog -e -s -n | head -n20
   902  Gratien D'haese <gratien.dhaese@gmail.com>
   491  Johannes Meixner <jsmeix@suse.com>
   317  Dag Wieers <dag@wieers.com>
   223  Jeroen Hoekx <jeroen.hoekx@hamok.be>
   198  Schlomo Schapiro <github@schlomo.schapiro.org>
    98  Sebastien Chabrolles <s.chabrolles@fr.ibm.com>
    95  Dag Wieërs <dag@wieers.com>
    45  Vladimir Gozora <c@gozora.sk>
    28  Didac Oliveira <didac@brainupdaters.net>
    27  gozora <c@gozora.sk>
    26  Jeroen Hoekx <jeroen@hoekx.be>
    25  schabrolles <s.chabrolles@fr.ibm.com>
    21  Pavol Domin <pavol.domin@hpe.com>
    21  Schlomo Schapiro <schlomo@schapiro.org>
    19  Florent <florent.rochette@gmail.com>
    19  Yoann Laissus <yoann.laissus@gmail.com>
    16  ProBackup-nl <b1686429@prtnx.com>
    15  Thomas Schumm <thomas.schumm@gmx.de>
    15  rowens275 <rowens@fdrinnovation.com>
    13  Robin Schneider <ypid23@aol.de>

Note that same persons could be listed under
different git author IDs.

This provides a reasonably short overview of
who the main contributors are (or have been in the past)
but it does not tell how much a person really contributed
and it does not tell where a person contributed,
so this alone is insufficient to be in compliance with
https://www.gnu.org/prep/maintain/html_node/Recording-Contributors.html

Therefore an additional, more precise (and more complicated)
measure is needed to be in compliance with
https://www.gnu.org/prep/maintain/html_node/Recording-Contributors.html
at least to some reasonable extent for automated generation
of an AUTHORS file.

Hereby I propose as such a measure to count in each source file
how many lines of code a person had contributed over the time.

I think that "had contributed over the time" is crucial
because as far as I understand
https://www.gnu.org/prep/maintain/html_node/Recording-Contributors.html
the AUTHORS file is meant to contain all authors
from the very beginning and not only some recent ones.

Therefore things like "git blame $source_file" are insufficient.
Instead the whole source file history needs to be considered
i.e. "git log --follow $source_file".

As far as I understand
https://www.gnu.org/prep/maintain/html_node/Legally-Significant.html#Legally-Significant
it follows that only added code lines count as
"Legally Significant Changes": if only code lines that really exist
count, then removed code lines can no longer count.
Accordingly my following script counts only added code lines
and it ignores whitespace changes (because I assume whitespace
is not real existing code) but it can easily be changed to also count
removed lines and/or whitespace changes (e.g. sometimes
a whitespace change could fix even a severe bug).

Furthermore I skip anything that is not a regular file
and I skip symbolic links (to avoid that things get counted twice).

My script that counts in each source file how many lines of code
a person had contributed over the time:

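for src in $( git ls-files ) ; do
    # Skip anything that is not a regular file and skip symbolic links
    test -f "$src" || continue
    test -L "$src" && continue
    echo
    echo "$src"
    # -U0 implies patch output and -w ignores whitespace changes.
    # Drop the "+++ b/file" diff header lines first because after 'grep -o'
    # they would be indistinguishable from added lines.
    git log --minimal -w -U0 --follow -- "$src" \
        | grep -v '^+++ ' \
        | grep -E -o '^Author: .*|^\+' \
        | while read -r key value ; do
              if test "Author:" = "$key" ; then
                  author="$value"
                  continue
              fi
              echo "$author"
          done | sort | uniq -c | sort -n -r
done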

Some excerpts of its output:

packaging/rpm/rear.spec
    265 Dag Wieërs <dag@wieers.com>
    139 Gratien D'haese <gratien.dhaese@gmail.com>
     39 Johannes Meixner <jsmeix@suse.com>
     11 Dag Wieers <dag@wieers.com>
     10 Thomas Schumm <thomas.schumm@gmx.de>
      3 Masanori Mitsugi <mitsugi@vnet.linux.ibm.com>
      2 Schlomo Schapiro <schlomo.schapiro@zalando.de>
      2 Robin Schneider <ypid23@aol.de>
      2 Florent <florent.rochette@gmail.com>
      2 Didac Oliveira <didac@brainupdaters.net>

usr/sbin/rear
    874 Johannes Meixner <jsmeix@suse.com>
    525 Schlomo Schapiro <github@schlomo.schapiro.org>
    164 Gratien D'haese <gratien.dhaese@gmail.com>
    161 Dag Wieers <dag@wieers.com>
     49 Jeroen Hoekx <jeroen.hoekx@hamok.be>
     32 Dag Wieërs <dag@wieers.com>
     13 Didac Oliveira <didac@brainupdaters.net>
      6 Thomas Schumm <thomas.schumm@gmx.de>
      3 Jeroen Hoekx <jeroen@hoekx.be>

usr/share/rear/backup/BLOCKCLONE/default/400_copy_disk_struct_files.sh
     24 Vladimir Gozora <c@gozora.sk>

usr/share/rear/backup/BLOCKCLONE/default/500_start_clone.sh
     35 Vladimir Gozora <c@gozora.sk>

usr/share/rear/backup/BORG/default/100_get_suffix.sh
     26 gozora <c@gozora.sk>
      7 Vladimir Gozora <c@gozora.sk>
      2 Johannes Meixner <jsmeix@suse.com>

Again same persons could be listed under different git author IDs.
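
Because the per-commit and the per-file lists above both use the form "count Name <email>", entries that share an e-mail address can be summed with a small filter. The following is only a sketch: the function name merge_by_email is hypothetical, and it merges only IDs with an identical address, so persons who used several addresses would still need a manual mapping (for example via git's .mailmap file).

```shell
# Sum the counts of all lines on stdin that share the same e-mail address.
# Expected input format: "<count>  <name> <email>" as printed by
# 'git shortlog -e -s -n' or by 'uniq -c' in the script above.
# merge_by_email is a hypothetical helper name, not part of ReaR.
merge_by_email() {
    awk '{
        count = $1
        email = $NF                  # last field is the address in angle brackets
        $1 = "" ; $NF = ""           # what remains is the author name
        gsub(/^ +| +$/, "")
        total[email] += count
        if (!(email in name)) name[email] = $0   # keep the first name seen
    }
    END {
        for (email in total)
            printf "%6d  %s %s\n", total[email], name[email], email
    }' | sort -n -r
}
```

It could be used as "git shortlog -e -s -n | merge_by_email" in a git checkout.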

gozora commented at 2017-07-24 15:00:

@jsmeix just an idea ...

What about using github api for getting all contributor data?
ReaR contributor statistics can be obtained at https://api.github.com/repos/rear/rear/stats/contributors , but there might be a small problem with parsing JSON :-).

V.

schlomo commented at 2017-07-24 15:07:

I think it is important that we can build ReaR offline:

  1. create a dist tar.gz from a git clone without Internet
  2. create OS packages from the dist tar.gz without git

That way we can be sure to support also older or strange systems (e.g. SLES10 doesn't come with git) and we enable users who must work offline to build from source. In certain strongly validated environments this is necessary.
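
One way a generated AUTHORS file could fit the two offline build steps above is to regenerate it only when a git checkout is available and otherwise keep the copy shipped in the dist tar.gz. This is a minimal sketch under that assumption; the function name generate_authors and its behaviour are hypothetical and not part of the ReaR build.

```shell
# Regenerate the AUTHORS file from git history when possible,
# otherwise keep the file that is already there (e.g. from a dist tar.gz).
# generate_authors is a hypothetical helper name.
generate_authors() {
    authors_file="${1:-AUTHORS}"
    if git rev-parse --is-inside-work-tree >/dev/null 2>&1 ; then
        # Names only (no e-mail addresses), most commits first.
        # An explicit HEAD is needed because without a revision
        # 'git shortlog' reads from stdin when stdin is not a terminal.
        git shortlog -s -n HEAD \
            | sed -e 's/^ *[0-9]*[[:space:]]*//' > "$authors_file"
    elif test -s "$authors_file" ; then
        echo "Not a git checkout, keeping existing $authors_file" >&2
    else
        echo "No git history and no $authors_file available" >&2
        return 1
    fi
}
```

With this approach only the first step (creating the dist tar.gz from a git clone) would need git, and neither step would need Internet access.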

gozora commented at 2017-07-24 15:14:

Not sure I understand your https://github.com/rear/rear/issues/1427#issuecomment-317453229 @schlomo , aren't we talking here about the creation of the AUTHORS file?
For me building a package is a different story ...

V

schlomo commented at 2017-07-24 18:01:

@jsmeix thank you very much for bringing this up. After today's discussion I came to the conclusion that we should indeed change the way how we deal with this. I am preparing a new PR right now which I will put up for discussion. My direction is to remove most of the names from the software and to give much more honor to all the many contributors and all the maintainers who helped ReaR to become the leading DR software.

Thank you @jsmeix also for making a concrete suggestion in #1428 which helped me to understand the critical aspects of the details at that level. Code and specific suggestions really help to bring such a discussion to a clear point where we can take a decision. I would kindly ask you to take down this list of email addresses to protect our contributors as much as we can.

I tried to understand what to do with the copyright notice and came upon https://www.gnu.org/licenses/gpl-faq.en.html#AssignCopyright which makes me wonder if we should have asked all ReaR contributors to assign their copyright to a singular entity. Since we didn't do that I suppose that every contributor actually retains his copyright over his code pieces.

I don't know yet if we should do something about that, maybe somebody else has some more experience in this area.


[Export of Github issue for rear/rear.]