Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set extra=forbid for Pydantic Schema Fields #1865

Closed
wants to merge 3 commits into from

Conversation

ytausch
Copy link
Contributor

@ytausch ytausch commented Mar 15, 2024

The Pydantic model currently allows additional fields for BuildPlatform, OSVersion, ProviderType, which does not make a lot of sense and can hide errors IMO.

(This PR previously contained changed related to bot.inspection which were addressed separately.)

@ytausch ytausch requested a review from a team as a code owner March 15, 2024 15:38
@beckermr
Copy link
Member

@isuruf didnt we have os version for a reason?

@beckermr
Copy link
Member

beckermr commented Mar 15, 2024

We allow false for bot inspection: https://github.com/regro/cf-scripts/blob/master/conda_forge_tick/migrators/version.py#L316

It turns off all hints of any kind from the bot.

Copy link
Member

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why we are excluding configs for these keys. They are allowed by the code in smithy as far as I know.

See eg https://github.com/conda-forge/conda-smithy/blob/main/conda_smithy/configure_feedstock.py#L818

We cannot deprecate and remove them without first changing smithy to deprecate them, doing an admin migration over the feedstocks, and then finally remove them.

Those removal cycles are expensive.

@isuruf
Copy link
Member

isuruf commented Mar 15, 2024

extra=forbid is wrong for all of these. I'd like to support new platforms without changing conda-smithy.

@ytausch
Copy link
Contributor Author

ytausch commented Mar 15, 2024

Why is it needed to support new platforms without changing conda-smithy? My point is:

  • The documentation (which is generated from the schema) should always be up-to-date
  • There are few feedstocks today that use an invalid build platform, which is ignored by the conda-smithy linter, causing hidden problems. This should not be the case. I think a typo is much more likely than wanting to use a build platform that has never been mentioned in the code and documentation here.

@isuruf

@ytausch
Copy link
Contributor Author

ytausch commented Mar 18, 2024

We allow false for bot inspection: https://github.com/regro/cf-scripts/blob/master/conda_forge_tick/migrators/version.py#L316

It turns off all hints of any kind from the bot.

I see - three thoughts about this:

  1. This behavior is not mentioned in the old documentation and only 11 feedstocks (list below) use it.
  2. I think a disabled option makes a lot more sense than allowing false as an option.
  3. Even if we want to stick to false, the current Pydantic model is wrong since true is NOT supported as an option! In fact, any truthy value which is not one of the supported string options crashes the bot in get_dep_updates_and_hints.

Affected Feedstocks

  • cdsdashboards
  • spyder
  • dustgoggles
  • cx_freeze
  • semi-ate-testers
  • sisl
  • unicorn-binance-suite
  • sepal-ui
  • apsg
  • intake_pattern_catalog
  • requests-cache

Path Forward

I see two possible paths forward here:

  1. Introduce a disabled option, change those 11 feedstocks manually to use it, remove support for false from the bot, raise a proper error in the bot if a non-supported value is used
  2. Leave the false option, change this Pydantic model to Literal[False] instead of bool, raise a proper error in the bot if a non-supported value is used, clearly state in the Pydantic field documentation what false means (this is not currently the case)

I will also open an issue in cf-scripts regarding the bug. It will appear below this comment.

What do you think? @beckermr

@jaimergp
Copy link
Member

I don't get why we are excluding configs for these keys. They are allowed by the code in smithy as far as I know.

The main idea behind excluding by default is to prevent typos from leaking in silently; e.g. tiny mistakes like linxu_64 would result in no-ops otherwise.

@jaimergp
Copy link
Member

I see two possible paths forward here:

Option (2) sounds the cheapest in terms of effort and it's not too inelegant imo!

@jaimergp
Copy link
Member

Also, are we always validating? We could just leave the linter on for validations, but let smithy operate on the raw, unvalidated conda-forge.yml if updating the schema for new keys gets too annoying (I don't think we are adding new keys that often, fwiw, but no strong feelings).

@beckermr
Copy link
Member

The main idea behind excluding by default is to prevent typos from leaking in silently; e.g. tiny mistakes like linxu_64 would result in no-ops otherwise.

We should make the model match what the code accepts. The code is the source of truth.

@ytausch
Copy link
Contributor Author

ytausch commented Mar 18, 2024

I see two possible paths forward here:

Option (2) sounds the cheapest in terms of effort and it's not too inelegant imo!

We're already working on (1), see regro/cf-scripts#2272

@xhochy
Copy link
Member

xhochy commented Mar 18, 2024

As simple as these changes sound, it would be better to have each of them in a separate PR, as each of them will be a separate discussion.

For the bot inspection, as there is already work over at regro/cf-scripts#2272, we could also continue the discussion there. I'm happy with that, so I won't comment there immediately.

Also, are we always validating? We could just leave the linter on for validations, but let smithy operate on the raw, unvalidated conda-forge.yml if updating the schema for new keys gets too annoying (I don't think we are adding new keys that often, fwiw, but no strong feelings).

I would very much like that. There is a lot of value in the lints (and them being more strict). At least from all the errors, I had in the past, it would have saved me a lot of time.

extra=forbid is wrong for all of these. I'd like to support new platforms without changing conda-smithy.

@isuruf Can we even support new platforms without touching conda-smithy at all? Probably yes, as it would generate the matching .ci_support files already? Thus, would you be happy if we switched to lint-only and had a warning for a while?

Personally, I see great value in validating the schema and thus being a bit stricter than the code itself. When I'm building new extensions/platforms, I would be OK with getting some warnings as I would already be in uncharted territories.

@isuruf
Copy link
Member

isuruf commented Mar 18, 2024

@isuruf Can we even support new platforms without touching conda-smithy at all? Probably yes, as it would generate the matching .ci_support files already? Thus, would you be happy if we switched to lint-only and had a warning for a while?

Yes, we can support new platforms without touching conda-smithy. I would be happy with just a warning.

@isuruf
Copy link
Member

isuruf commented Mar 18, 2024

I recently fixed a small bug in conda-build to support new platforms without explicitly adding the new platforms. Works just fine afterwards.

@ytausch
Copy link
Contributor Author

ytausch commented Mar 18, 2024

I readded the option that bot.inspection can be False (bool is clearly wrong). From my side, this is ready now if we accept the stricter schema.
regro/cf-scripts#2272 will eventually lead to replacing False with disabled.

@ytausch ytausch requested a review from beckermr March 18, 2024 17:58
Copy link
Member

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are unresolved issues about extra=forbid that need to be resolved before we can merge if I understand the thread correctly.

@ytausch ytausch requested a review from beckermr March 25, 2024 10:50
@ytausch
Copy link
Contributor Author

ytausch commented Mar 25, 2024

@beckermr As far as I am concerned, the correct behavior for validation errors was now introduced with #1885. So I don't see anything to add here. Is that correct?

@beckermr
Copy link
Member

@isuruf Are you ok keeping the extra forbid settings here now that smithy has switched to warning on validation errors?

I couldn't tell from the discussion above.

@isuruf
Copy link
Member

isuruf commented Mar 25, 2024

No, the linter is still failing. We have to move these specific errors to warnings.

@ytausch
Copy link
Contributor Author

ytausch commented Mar 25, 2024

(please disregard a previous comment that was here)
You mean the specific errors for extra keys? I can check if that's possible.

@isuruf
Copy link
Member

isuruf commented Mar 25, 2024

Why does the linter need to go through?

To make automerging go through.

@ytausch
Copy link
Contributor Author

ytausch commented Mar 25, 2024

Okay, so my problem currently is: We could add some special case handling to here

lints = []
hints = []
for error in validator.iter_errors(config):
if isinstance(error, DeprecatedFieldWarning):
hints.append(error)
else:
lints.append(error)

catching errors related to those 3 extra fields (that's possible by checking error.validator == "additionalProperties" and error.jsonPath). But this would break the conformity of JSON files that the linter accepts as compared with the Pydantic schema, which is available to users and exported by the conda-smithy conda package (we want to use it in cf-scripts, for example).

This would be pretty bad API design if you ask me. Providing a schema but requiring the user to ignore schema validation errors in some weird edge cases. All conda-forge.yml files in conda-forge should conform to the Pydantic schema, whatever it might look like.

I still did not understand your use case @isuruf that needs the autotickbot to auto-merge when playing around with new platforms that are not yet added to conda-smithy - but I also don't have any experience with adding new platforms to conda-forge. Isn't it enough that you can run commands like rerender (that only warn you) before actually adding the platform to here? I would be pleased if you could elaborate on that and don't see why it should be even possible to add a feedstock that contains a new platform to conda-forge without saying "this platform now exists" here in this repo.

There should be a schema, as strict as reasonably possible, that every feedstock in conda-forge conforms to. This would have prevented a lot of headaches I ran into while working on the autotickbot.

@ytausch ytausch changed the title Minor Pydantic Model Fixes for conda-forge.yml Minor Pydantic Model Fixes for conda-forge.yml (extra=forbid) Mar 25, 2024
@jaimergp
Copy link
Member

I mostly agree with @ytausch. The point of the extra=forbid bits is to prevent accidental typos from happening, and imo that's an important point of failure (we fixed a few feedstocks that had been carrying some of those for years).

I don't think we are adding new platforms so often (the two last ones have been osx-arm64 and win-arm64 and I don't think we are expecting any new ones) to make this UX issue acceptable. I do get that there's a bootstrapping issue when adding new platforms and this would remove one step from the "add new platform" checklist.

To make automerging go through.

Could we configure automerge to ignore (certain) linter errors instead? (certain) being either specific type of error (extra keys) or a specific type of PR (opened by conda-forge bots == automated).

@beckermr
Copy link
Member

Automerge relies on the GitHub status / checks apis. Getting it to ignore a subset of linter errors is not at all obviously easy to do and sounds like a pain TBH.

@isuruf
Copy link
Member

isuruf commented Mar 26, 2024

I don't understand why people always want to support only their use-case and break other use-cases when there are compromises that can support both (given minor inconveniences to both).

@ytausch
Copy link
Contributor Author

ytausch commented Mar 26, 2024

I don't think we are adding new platforms so often (the two last ones have been osx-arm64 and win-arm64 and I don't think we are expecting any new ones)

Automerge relies on the GitHub status / checks apis. Getting it to ignore a subset of linter errors is not at all obviously easy to do and sounds like a pain TBH.

All of this sounds to me like failing the linter on new platforms (as it's currently implemented in this PR) is simply the option with the best tradeoff between extensibility, schema consistency, and implementation complexity. The amount of work required to add a new platform to here should be manageable, isn't it?

@ytausch
Copy link
Contributor Author

ytausch commented Mar 26, 2024

I don't understand why people always want to support only their use-case and break other use-cases when there are compromises that can support both (given minor inconveniences to both).

Which compromise are you proposing?

@ytausch ytausch changed the title Minor Pydantic Model Fixes for conda-forge.yml (extra=forbid) Set extra=forbid for Pydantic Schema Fields Mar 26, 2024
@isuruf
Copy link
Member

isuruf commented Mar 26, 2024

Which compromise are you proposing?

Adding special casing in #1865 (comment) and use conda_smithy.validate_schema.validate_json_schema as the source of truth about hard vs soft errors in downstream code.

@ytausch
Copy link
Contributor Author

ytausch commented Mar 26, 2024

Which compromise are you proposing?

Adding special casing in #1865 (comment) and use conda_smithy.validate_schema.validate_json_schema as the source of truth about hard vs soft errors in downstream code.

Some parts of this solution sound reasonable, yes. And implementing it is probably not the end of the world for cf-scripts if we give your use case high priority. Here are my 2 reasons why I still prefer not doing it this way (I hope I don't repeat myself, my personal weight estimate in parenthesis):

  1. Complexity. Instead of giving downstream users a JSON schema or a Pydantic model they can use in a flexible way, we force them into calling a custom method for schema validation, which is also not very extensible. For example, with Pydantic support, I can use the conda-forge.yml schema very easily as a subfield of another model. This is done in cf-scripts, for example. (weak to medium weight)
  2. Pydantic Model. The benefit of having a Pydantic model is not only the "schema validation" itself but also having type-safe access to the model fields after deserialization. Downgrading the Pydantic model to a means of generating JSON schema files which are thereafter processed by custom validation logic completely prevents us from using those features of Pydantic.
    A lot of conda-forge code uses dictionary-style JSON manipulation (like config["bot"]["inspection"]) which always has suboptimal type safety, even if done properly. Pydantic-style code like config.bot.inspection removes redundant edge case checking by providing defaults (does config even have a bot field or is it None?), prevents nonconformant data from interacting with the codebase without duplicate type definitions, and also has better IDE support out of the box (Questions like "Where is the bot.inspection field used in the codebase?" are currently a huge pain to answer, at least they were for me while working on cf-scripts).
    I see a future conda-forge making more use of Pydantic, what I think we should not prevent with this change (strong weight).

In my eyes, avoiding a single simple PR for maybe one additional platform per year does not outweigh these drawbacks. However, we need to come to a solution.

@isuruf
Copy link
Member

isuruf commented Mar 26, 2024

In light of #1893 please update your answer. If you have a use-case other than the bot, please say so. Otherwise please don't break my workflows.

I certainly don't understand your insistence on make your use-case higher priority over other peoples' workflows.
I'll refrain from commenting on this further.

@ytausch
Copy link
Contributor Author

ytausch commented Mar 26, 2024

In light of #1893 please update your answer.

I don't see how #1893 changes the situation. The bot uses the entirety of conda-forge.yml, not only the bot section. My example above could have also been with provider.linux.

If you have a use-case other than the bot, please say so. Otherwise please don't break my workflows.

The other use case I have in mind is:

  • conda-smithy would also profit from more type-safe access to conda-forge.yml, and could facilitate its own Pydantic model for it pretty easily.
  • Otherwise, everyone else who has to read a conda-forge.yml file (that's clear, of course)

I certainly don't understand your insistence on make your use-case higher priority over other peoples' workflows.
I'll refrain from commenting on this further.

I am sorry if I gave you a reason for not commenting further. Maybe other maintainers can help us resolving this argument?

@beckermr
Copy link
Member

I am sorry if I gave you a reason for not commenting further. Maybe other maintainers can help us resolving this argument?

Thanks for this comment @ytausch. Indeed we are at a bit of an impasse. I do not think other maintainers are going to help resolve this beyond suggesting additional technical approaches. We generally work by consensus and we have not reached it here.

@jaimergp
Copy link
Member

What about having two possible schemas? Something like:

  • conda-forge.strict.schema.json: restricts platforms
  • conda-forge.schema.json: allows extra platforms

Then folks can pick which one to use for each purpose. The linter would still use the flexible schema so it wouldn't error out on new platforms, but the bot could use the strict one if that really helps with something. Would that help achieve some consensus?

@beckermr
Copy link
Member

I'd prefer the autotick bot to use the less-strict schema as well too. The conda-forge ecosystem is very fluid and the bot reporting strict validation errors is likely more noise than signal.

@ytausch
Copy link
Contributor Author

ytausch commented Mar 27, 2024

So how about this option then: We reject this PR and leave the schema as-is, but introduce custom hint code that detects unknown platforms? I still don't really like this but I think we are able to find a consensus about it.

@beckermr
Copy link
Member

I am ok with that. @isuruf?

@isuruf
Copy link
Member

isuruf commented Mar 27, 2024

Sure. I'm okay with that. I'm also okay with your suggestion #1865 (comment).

@ytausch
Copy link
Contributor Author

ytausch commented Apr 2, 2024

Superseded by #1900.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants