Skip to content

model: Add a new field for authors#3627

Merged
sschuberth merged 2 commits into
oss-review-toolkit:masterfrom
boschglobal:add-declared-contributors
Feb 18, 2021
Merged

model: Add a new field for authors#3627
sschuberth merged 2 commits into
oss-review-toolkit:masterfrom
boschglobal:add-declared-contributors

Conversation

@bs-ondem

@bs-ondem bs-ondem commented Feb 11, 2021

Copy link
Copy Markdown
Member

This is a preparation for solving #3400 and making copyright holder
curations possible. A new field declaredAuthors is added to
Package, PackageCuration and Project in order to be able to track
fields like authors from various package managers. These changes
don't include the adaptation of the package managers.

@sschuberth

Copy link
Copy Markdown
Member

authors/developers/contributrs

"contributors"

@sschuberth

sschuberth commented Feb 11, 2021

Copy link
Copy Markdown
Member

And please rebase @bs-ondem to resolve a conflict.

Comment thread model/src/main/kotlin/Package.kt Outdated
* TODO: Remove default value after upgrading all package manager implementations.
*/
@JsonInclude(JsonInclude.Include.NON_NULL)
val declaredContributors: SortedSet<String>? = null,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to avoid nullable properties in the model, in this case an empty set is just as good as null and makes the property easier to use.
Instead of adding a default value here, you could already pass an empty set in all package managers and add the TODOs there, this would make it easier to track for which package managers this still needs to be implemented.
Would it make sense to capture this for projects as well? At least for some package managers like Maven we get the same metadata for projects as for packages. It might not be needed for copyright curations, but it would be more consistent.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the declaredContributors field also in the Project class and my intention is to track also the contributors for the projects as well, even if this is not required to solve the issue #3400.

@bs-ondem bs-ondem force-pushed the add-declared-contributors branch 2 times, most recently from 063309a to 9f62ce8 Compare February 11, 2021 17:15
tsteenbe
tsteenbe previously requested changes Feb 11, 2021

@tsteenbe tsteenbe left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sschuberth @bs-ondem We collect this information to do notices generation to show who owns the code and under which license they distributed it. Contributors are often not from a legal point of view authors/code owner as they signed a copyright assignment e.g. this is why most ORT code is 'HERE Europe B.V.' and/or 'Bosch.IO GmbH'.

Author is a term used in copyright law see the copyright law of the united states, June 2020 - it contains 622 matches on 'author' and 0 matches on 'contributor' which is exactly why most package managers I mention in my #3400 comment use author as a field name

Propose to rename declaredContributors to declaredAuthors if your intent is to store the under copyright law recognized authors/owner of the code for use in notice generation.

Furthermore I would update the comments referring to 'authors/developers/contributors' to just mention authors. Why? Some package manager such as npm has both author and contributors fields see 1, also Maven has developers/contributors see 2. My understanding of this field is that we only want to capture code owners and do not plan to capture any of the other people fields as part of #3400.

@sschuberth

sschuberth commented Feb 12, 2021

Copy link
Copy Markdown
Member

Maven has developers/contributors

@tsteenbe, would it then be fine in your view to populate declaredAuthors from Maven's developers? Edit: And should the "declared author" be the "developer's" name, or the organization? Maybe take the organization if present, and fall back to the author?

And "contributors" we should generally ignore then, correct?

@bs-ondem

Copy link
Copy Markdown
Member Author

@sschuberth In case of Maven: Do you mean the organization of the project or the organization of each developer, or both? Because in Maven it's possible to specify the organization of both.

@tsteenbe Thanks for the clarification. I will rename it to declaredAuthors accordingly.

@sschuberth

sschuberth commented Feb 12, 2021

Copy link
Copy Markdown
Member

Do you mean the organization of the project or the organization of each developer, or both?

Good question. I was referring to developer -> organization only, but now think that we should have both and also capture organization -> name as an author. @tsteenbe?

@bs-ondem bs-ondem force-pushed the add-declared-contributors branch from 92ca1ec to c90974e Compare February 12, 2021 13:40
@bs-ondem bs-ondem changed the title model: Add a new field for declared contributors model: Add a new field for declared authors Feb 12, 2021
@tsteenbe

tsteenbe commented Feb 12, 2021

Copy link
Copy Markdown
Member

@sschuberth to your above questions I propose we do the following:

<organization>
    <name>The Apache Software Foundation</name>
    <url>http://www.apache.org/</url>
</organization>
  • https://maven.apache.org/pom.html#developers capture as author developer->organization if not found for developer capture developer->name under the assumption that a developer working for an org that the org is the copyright holder. Think it's save to assume that if a person made it to be a maintainer of the project he/she/their org has made significant commits to have a copyright on the project.
  • https://maven.apache.org/pom.html#contributors are not maintainers (developers) may in some case if they made a large enough contributions be a copyright holder however I believe adding contributors to declaredAuthors will likely yield a lot of false positives. If an ORT user believes a contributor is a copyright holder then they can use this new declaredAuthors to manual add them ;-)

and then run a de-duplicate to ensure we only have unique authors.

Comment thread analyzer/src/main/kotlin/managers/Bower.kt
@bs-ondem bs-ondem force-pushed the add-declared-contributors branch from c90974e to dc35ac1 Compare February 15, 2021 13:23
oheger-bosch
oheger-bosch previously approved these changes Feb 15, 2021
Comment thread model/src/main/kotlin/Package.kt Outdated
* The list of authors declared for this package.
*/
@JsonInclude(JsonInclude.Include.NON_DEFAULT)
val declaredAuthors: SortedSet<String> = sortedSetOf(),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe what @mnonnenmacher meant when he said "Instead of adding a default value here, you could already pass an empty set in all package managers" was that you should not only not assign null here, but no default value at all. That way you're forced to initialize the field in all package managers, which IMO was the intent. And that also implies that you should squash the two commits.

Also, as for licenses, we IMO should always serialize the field, even if it's empty.

Finally, I'm a bit unsure about the field ordering. My hunch is that we should keep all license-related fields grouped together, and add the new declaredAuthors between concludedLicense and description. What do you think, @mnonnenmacher?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe what @mnonnenmacher meant when he said "Instead of adding a default value here, you could already pass an empty set in all package managers" was that you should not only not assign null here, but no default value at all. That way you're forced to initialize the field in all package managers, which IMO was the intent.

Introducing this new field without a default value is a rather breaking change because then existing ORT results can no longer be deserialized. This would also impact the data stored in scan result storages. This may be acceptable, but is something we should be aware of.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I didn't think of deserializing old results. @mnonnenmacher I guess we want / need to maintain backward-compatibility here? But we should still drop @JsonInclude(JsonInclude.Include.NON_DEFAULT), right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotation is not necessary. We put it here for practical reasons because otherwise all tests for analyzers would fail that compare current results against stored yml files. Maybe we could keep the annotation until we have reworked the analyzer implementations to populate the authors field? In the course of this change the test files would have to be adapted anyway.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, I'm a bit unsure about the field ordering.

I grouped declaredLicenses and declaredAuthors in order to have to the "declared" things together.

But we should still drop @JsonInclude(JsonInclude.Include.NON_DEFAULT), right?

As @oheger-bosch already mentioned, if we drop the annotation here then we have to adapted all analyzer tests in this PR and again after we have reworked the analyzers. Can we keep the annotation here with a TODO comment e.g. "Can be removed after all package manager implementations have filled the field declaredAuthors accordingly."?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's one more thought: Does it even make sense to call this declaredAuthors instead of just authors after all? While our original idea of naming the field declaredCopyrights made sense to distinguish from detected copyrights, we already figured that this field does not contain full copyright statements in the "detected copyright" sense as it lacks the actual "Copyright (C)" prefix and the year.

So why not treat this as just another metadata field, like e.g. homepageUrl, name it just authors, put it before description, and let later logic consider "authors" as "declared copyright holders" (similar to like we also consider the "homepageUrl" later as a fallback for a VCS URL)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original plan was to treat author information analogously to declared licenses and allow an analogous curation mechanism. In this context, the name declaredAuthors would make sense.

If we treat authors as regular metadata, would we then still want to curate them? Or would the curation rather apply to the logic that transforms authors to copyright holders? If so, would the curations still be defined in curations.yml or would this be a different mechanism?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have a look at the PackageCurationData class: We can curate any metadata, no matter whether it start with a "declared" prefix or not. So my point is purely about how to name the field, the original plan and all processing of the field's data stays the same.

My (somewhat rhetorical) question boils down to: If we had a declaredAuthors field, what does "declared" mean here, and how does it semantically differ from just authors, as any metadata in the Package / Project classes is just "declared" (and not detected) metadata. Phrased differently, why is homepageUrl not called declaredHomepageUrl. Well, because "declared" is redundant in this (the analyzer's) context.

Going further off-topic: Consequently, I actually believe we should have named the declaredLicenses fields also just licenses here, and only further tooling that also has to deal with other kinds of licenses should refer to it as declaredLicenses.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why is declaredAuthors then between declaredLicenses and declaredLicensesProcessed instead of after declaredLicensesProcessed?

Because for me declaredLicensesProcessed is not a field for gathering metadata directly from the package manager but used a processor in order to map the declaredLicenses to SpdxExpression.

Phrased differently, why is homepageUrl not called declaredHomepageUrl. Well, because "declared" is redundant in this (the analyzer's) context.

From this point of view it makes sense to name it "authors" but as you mentioned we have to rename consequently also "declaredLicenses" to "licenses".

But we have to come to a conclusion and therefor I'll try to summarize the suggestions and we should decide on one:

  1. Field name declaredAuthors and put it before declaredLicenses.
  2. Field name declaredAuthors and put it before description.
  3. Field name authors and put it (?) before declaredLicenses or descriptions. In addition, rename declaredLicenses to licenses.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put this onto the agenda for the next dev meeting to move forward.

Comment thread model/src/main/kotlin/PackageCurationData.kt Outdated
Comment thread model/src/main/kotlin/Project.kt Outdated
Comment thread analyzer/src/main/kotlin/managers/Bower.kt Outdated
Comment thread analyzer/src/main/kotlin/managers/Bower.kt
This is a preparation for solving oss-review-toolkit#3400 and making copyright holder
curations possible. A new field authors is added to Package,
PackageCuration and Project in order to be able to track fields like
authors from various package managers. These changes don't include the
adaptation of the package managers.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
To keep track of which package manager needs to be expanded with
authors, initialize all Project/Package objects with an empty set for
authors and add a TODO comment above. The package managers can be
expanded step by step later.

Signed-off-by: Onur Demirci <onur.demirci@bosch.io>
@bs-ondem bs-ondem force-pushed the add-declared-contributors branch from ad65845 to 4f8c184 Compare February 18, 2021 11:58
@bs-ondem bs-ondem changed the title model: Add a new field for declared authors model: Add a new field for authors Feb 18, 2021
@sschuberth sschuberth dismissed stale reviews from tsteenbe and mnonnenmacher February 18, 2021 15:01

Comments addressed.

@sschuberth sschuberth merged commit cf93025 into oss-review-toolkit:master Feb 18, 2021
@sschuberth sschuberth deleted the add-declared-contributors branch February 18, 2021 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants