Set extra=forbid for Pydantic Schema Fields #1865

ytausch · 2024-03-15T15:38:12Z

The Pydantic model currently allows additional fields for BuildPlatform, OSVersion, ProviderType, which does not make a lot of sense and can hide errors IMO.

(This PR previously contained changed related to bot.inspection which were addressed separately.)

beckermr · 2024-03-15T15:44:58Z

@isuruf didnt we have os version for a reason?

beckermr · 2024-03-15T15:48:45Z

We allow false for bot inspection: https://github.com/regro/cf-scripts/blob/master/conda_forge_tick/migrators/version.py#L316

It turns off all hints of any kind from the bot.

beckermr

I don't get why we are excluding configs for these keys. They are allowed by the code in smithy as far as I know.

See eg https://github.com/conda-forge/conda-smithy/blob/main/conda_smithy/configure_feedstock.py#L818

We cannot deprecate and remove them without first changing smithy to deprecate them, doing an admin migration over the feedstocks, and then finally remove them.

Those removal cycles are expensive.

isuruf · 2024-03-15T15:59:34Z

extra=forbid is wrong for all of these. I'd like to support new platforms without changing conda-smithy.

ytausch · 2024-03-15T16:58:48Z

Why is it needed to support new platforms without changing conda-smithy? My point is:

The documentation (which is generated from the schema) should always be up-to-date
There are few feedstocks today that use an invalid build platform, which is ignored by the conda-smithy linter, causing hidden problems. This should not be the case. I think a typo is much more likely than wanting to use a build platform that has never been mentioned in the code and documentation here.

@isuruf

ytausch · 2024-03-18T08:47:03Z

We allow false for bot inspection: https://github.com/regro/cf-scripts/blob/master/conda_forge_tick/migrators/version.py#L316

It turns off all hints of any kind from the bot.

I see - three thoughts about this:

This behavior is not mentioned in the old documentation and only 11 feedstocks (list below) use it.
I think a disabled option makes a lot more sense than allowing false as an option.
Even if we want to stick to false, the current Pydantic model is wrong since true is NOT supported as an option! In fact, any truthy value which is not one of the supported string options crashes the bot in get_dep_updates_and_hints.

Affected Feedstocks

cdsdashboards
spyder
dustgoggles
cx_freeze
semi-ate-testers
sisl
unicorn-binance-suite
sepal-ui
apsg
intake_pattern_catalog
requests-cache

Path Forward

I see two possible paths forward here:

Introduce a disabled option, change those 11 feedstocks manually to use it, remove support for false from the bot, raise a proper error in the bot if a non-supported value is used
Leave the false option, change this Pydantic model to Literal[False] instead of bool, raise a proper error in the bot if a non-supported value is used, clearly state in the Pydantic field documentation what false means (this is not currently the case)

I will also open an issue in cf-scripts regarding the bug. It will appear below this comment.

What do you think? @beckermr

jaimergp · 2024-03-18T16:14:56Z

I don't get why we are excluding configs for these keys. They are allowed by the code in smithy as far as I know.

The main idea behind excluding by default is to prevent typos from leaking in silently; e.g. tiny mistakes like linxu_64 would result in no-ops otherwise.

jaimergp · 2024-03-18T16:15:56Z

I see two possible paths forward here:

Option (2) sounds the cheapest in terms of effort and it's not too inelegant imo!

jaimergp · 2024-03-18T16:18:38Z

Also, are we always validating? We could just leave the linter on for validations, but let smithy operate on the raw, unvalidated conda-forge.yml if updating the schema for new keys gets too annoying (I don't think we are adding new keys that often, fwiw, but no strong feelings).

beckermr · 2024-03-18T16:46:13Z

The main idea behind excluding by default is to prevent typos from leaking in silently; e.g. tiny mistakes like linxu_64 would result in no-ops otherwise.

We should make the model match what the code accepts. The code is the source of truth.

ytausch · 2024-03-18T17:16:06Z

I see two possible paths forward here:

Option (2) sounds the cheapest in terms of effort and it's not too inelegant imo!

We're already working on (1), see regro/cf-scripts#2272

xhochy · 2024-03-18T17:35:42Z

As simple as these changes sound, it would be better to have each of them in a separate PR, as each of them will be a separate discussion.

For the bot inspection, as there is already work over at regro/cf-scripts#2272, we could also continue the discussion there. I'm happy with that, so I won't comment there immediately.

Also, are we always validating? We could just leave the linter on for validations, but let smithy operate on the raw, unvalidated conda-forge.yml if updating the schema for new keys gets too annoying (I don't think we are adding new keys that often, fwiw, but no strong feelings).

I would very much like that. There is a lot of value in the lints (and them being more strict). At least from all the errors, I had in the past, it would have saved me a lot of time.

extra=forbid is wrong for all of these. I'd like to support new platforms without changing conda-smithy.

@isuruf Can we even support new platforms without touching conda-smithy at all? Probably yes, as it would generate the matching .ci_support files already? Thus, would you be happy if we switched to lint-only and had a warning for a while?

Personally, I see great value in validating the schema and thus being a bit stricter than the code itself. When I'm building new extensions/platforms, I would be OK with getting some warnings as I would already be in uncharted territories.

isuruf · 2024-03-18T17:40:55Z

@isuruf Can we even support new platforms without touching conda-smithy at all? Probably yes, as it would generate the matching .ci_support files already? Thus, would you be happy if we switched to lint-only and had a warning for a while?

Yes, we can support new platforms without touching conda-smithy. I would be happy with just a warning.

isuruf · 2024-03-18T17:42:09Z

I recently fixed a small bug in conda-build to support new platforms without explicitly adding the new platforms. Works just fine afterwards.

ytausch · 2024-03-18T17:56:22Z

I readded the option that bot.inspection can be False (bool is clearly wrong). From my side, this is ready now if we accept the stricter schema.
regro/cf-scripts#2272 will eventually lead to replacing False with disabled.

beckermr

There are unresolved issues about extra=forbid that need to be resolved before we can merge if I understand the thread correctly.

ytausch · 2024-03-25T10:51:53Z

@beckermr As far as I am concerned, the correct behavior for validation errors was now introduced with #1885. So I don't see anything to add here. Is that correct?

beckermr · 2024-03-25T12:47:34Z

@isuruf Are you ok keeping the extra forbid settings here now that smithy has switched to warning on validation errors?

I couldn't tell from the discussion above.

isuruf · 2024-03-25T13:57:32Z

No, the linter is still failing. We have to move these specific errors to warnings.

ytausch · 2024-03-25T14:14:31Z

(please disregard a previous comment that was here)
You mean the specific errors for extra keys? I can check if that's possible.

isuruf · 2024-03-25T14:18:10Z

Why does the linter need to go through?

To make automerging go through.

ytausch · 2024-03-25T14:50:23Z

Okay, so my problem currently is: We could add some special case handling to here

conda-smithy/conda_smithy/validate_schema.py

Lines 50 to 56 in 56750bb

    
           lints = [] 
        
           hints = [] 
        
           for error in validator.iter_errors(config): 
        
               if isinstance(error, DeprecatedFieldWarning): 
        
                   hints.append(error) 
        
               else: 
        
                   lints.append(error)

catching errors related to those 3 extra fields (that's possible by checking error.validator == "additionalProperties" and error.jsonPath). But this would break the conformity of JSON files that the linter accepts as compared with the Pydantic schema, which is available to users and exported by the conda-smithy conda package (we want to use it in cf-scripts, for example).

This would be pretty bad API design if you ask me. Providing a schema but requiring the user to ignore schema validation errors in some weird edge cases. All conda-forge.yml files in conda-forge should conform to the Pydantic schema, whatever it might look like.

I still did not understand your use case @isuruf that needs the autotickbot to auto-merge when playing around with new platforms that are not yet added to conda-smithy - but I also don't have any experience with adding new platforms to conda-forge. Isn't it enough that you can run commands like rerender (that only warn you) before actually adding the platform to here? I would be pleased if you could elaborate on that and don't see why it should be even possible to add a feedstock that contains a new platform to conda-forge without saying "this platform now exists" here in this repo.

There should be a schema, as strict as reasonably possible, that every feedstock in conda-forge conforms to. This would have prevented a lot of headaches I ran into while working on the autotickbot.

jaimergp · 2024-03-26T07:57:44Z

I mostly agree with @ytausch. The point of the extra=forbid bits is to prevent accidental typos from happening, and imo that's an important point of failure (we fixed a few feedstocks that had been carrying some of those for years).

I don't think we are adding new platforms so often (the two last ones have been osx-arm64 and win-arm64 and I don't think we are expecting any new ones) to make this UX issue acceptable. I do get that there's a bootstrapping issue when adding new platforms and this would remove one step from the "add new platform" checklist.

To make automerging go through.

Could we configure automerge to ignore (certain) linter errors instead? (certain) being either specific type of error (extra keys) or a specific type of PR (opened by conda-forge bots == automated).

beckermr · 2024-03-26T11:53:54Z

Automerge relies on the GitHub status / checks apis. Getting it to ignore a subset of linter errors is not at all obviously easy to do and sounds like a pain TBH.

isuruf · 2024-03-26T14:53:00Z

I don't understand why people always want to support only their use-case and break other use-cases when there are compromises that can support both (given minor inconveniences to both).

ytausch · 2024-03-26T14:54:59Z

I don't think we are adding new platforms so often (the two last ones have been osx-arm64 and win-arm64 and I don't think we are expecting any new ones)

Automerge relies on the GitHub status / checks apis. Getting it to ignore a subset of linter errors is not at all obviously easy to do and sounds like a pain TBH.

All of this sounds to me like failing the linter on new platforms (as it's currently implemented in this PR) is simply the option with the best tradeoff between extensibility, schema consistency, and implementation complexity. The amount of work required to add a new platform to here should be manageable, isn't it?

ytausch · 2024-03-26T15:35:57Z

I don't understand why people always want to support only their use-case and break other use-cases when there are compromises that can support both (given minor inconveniences to both).

Which compromise are you proposing?

isuruf · 2024-03-26T16:10:30Z

Which compromise are you proposing?

Adding special casing in #1865 (comment) and use conda_smithy.validate_schema.validate_json_schema as the source of truth about hard vs soft errors in downstream code.

ytausch · 2024-03-26T17:13:24Z

Which compromise are you proposing?

Adding special casing in #1865 (comment) and use conda_smithy.validate_schema.validate_json_schema as the source of truth about hard vs soft errors in downstream code.

Some parts of this solution sound reasonable, yes. And implementing it is probably not the end of the world for cf-scripts if we give your use case high priority. Here are my 2 reasons why I still prefer not doing it this way (I hope I don't repeat myself, my personal weight estimate in parenthesis):

Complexity. Instead of giving downstream users a JSON schema or a Pydantic model they can use in a flexible way, we force them into calling a custom method for schema validation, which is also not very extensible. For example, with Pydantic support, I can use the conda-forge.yml schema very easily as a subfield of another model. This is done in cf-scripts, for example. (weak to medium weight)
Pydantic Model. The benefit of having a Pydantic model is not only the "schema validation" itself but also having type-safe access to the model fields after deserialization. Downgrading the Pydantic model to a means of generating JSON schema files which are thereafter processed by custom validation logic completely prevents us from using those features of Pydantic.
A lot of conda-forge code uses dictionary-style JSON manipulation (like config["bot"]["inspection"]) which always has suboptimal type safety, even if done properly. Pydantic-style code like config.bot.inspection removes redundant edge case checking by providing defaults (does config even have a bot field or is it None?), prevents nonconformant data from interacting with the codebase without duplicate type definitions, and also has better IDE support out of the box (Questions like "Where is the bot.inspection field used in the codebase?" are currently a huge pain to answer, at least they were for me while working on cf-scripts).
I see a future conda-forge making more use of Pydantic, what I think we should not prevent with this change (strong weight).

In my eyes, avoiding a single simple PR for maybe one additional platform per year does not outweigh these drawbacks. However, we need to come to a solution.

isuruf · 2024-03-26T17:21:15Z

In light of #1893 please update your answer. If you have a use-case other than the bot, please say so. Otherwise please don't break my workflows.

I certainly don't understand your insistence on make your use-case higher priority over other peoples' workflows.
I'll refrain from commenting on this further.

ytausch · 2024-03-26T17:47:28Z

In light of #1893 please update your answer.

I don't see how #1893 changes the situation. The bot uses the entirety of conda-forge.yml, not only the bot section. My example above could have also been with provider.linux.

If you have a use-case other than the bot, please say so. Otherwise please don't break my workflows.

The other use case I have in mind is:

conda-smithy would also profit from more type-safe access to conda-forge.yml, and could facilitate its own Pydantic model for it pretty easily.
Otherwise, everyone else who has to read a conda-forge.yml file (that's clear, of course)

I certainly don't understand your insistence on make your use-case higher priority over other peoples' workflows.
I'll refrain from commenting on this further.

I am sorry if I gave you a reason for not commenting further. Maybe other maintainers can help us resolving this argument?

beckermr · 2024-03-26T18:04:57Z

I am sorry if I gave you a reason for not commenting further. Maybe other maintainers can help us resolving this argument?

Thanks for this comment @ytausch. Indeed we are at a bit of an impasse. I do not think other maintainers are going to help resolve this beyond suggesting additional technical approaches. We generally work by consensus and we have not reached it here.

jaimergp · 2024-03-27T15:14:01Z

What about having two possible schemas? Something like:

conda-forge.strict.schema.json: restricts platforms
conda-forge.schema.json: allows extra platforms

Then folks can pick which one to use for each purpose. The linter would still use the flexible schema so it wouldn't error out on new platforms, but the bot could use the strict one if that really helps with something. Would that help achieve some consensus?

beckermr · 2024-03-27T15:21:19Z

I'd prefer the autotick bot to use the less-strict schema as well too. The conda-forge ecosystem is very fluid and the bot reporting strict validation errors is likely more noise than signal.

ytausch · 2024-03-27T15:39:23Z

So how about this option then: We reject this PR and leave the schema as-is, but introduce custom hint code that detects unknown platforms? I still don't really like this but I think we are able to find a consensus about it.

beckermr · 2024-03-27T15:43:55Z

I am ok with that. @isuruf?

isuruf · 2024-03-27T15:45:30Z

Sure. I'm okay with that. I'm also okay with your suggestion #1865 (comment).

ytausch · 2024-04-02T14:31:07Z

Superseded by #1900.

ytausch requested a review from a team as a code owner March 15, 2024 15:38

beckermr requested changes Mar 15, 2024

View reviewed changes

ytausch mentioned this pull request Mar 18, 2024

conda-forge.yml Pydantic Model: Single Values instead of Lists #1869

Closed

This was referenced Mar 18, 2024

Bot Crashes with Meaningless Error Message for Unsupported bot.inspection Value, wrong type annotation regro/cf-scripts#2272

Closed

Add Pydantic Data Model to Document Node Attributes Data Structure regro/cf-scripts#2239

Merged

ytausch requested a review from beckermr March 18, 2024 17:58

beckermr requested changes Mar 18, 2024

View reviewed changes

ytausch force-pushed the pydantic-fixes branch from 926c8a7 to 4a44fd8 Compare March 25, 2024 10:48

ytausch requested a review from beckermr March 25, 2024 10:50

ytausch mentioned this pull request Mar 25, 2024

Rerenders: do not raise on validation errors #1885

Merged

1 task

ytausch changed the title ~~Minor Pydantic Model Fixes for conda-forge.yml~~ Minor Pydantic Model Fixes for conda-forge.yml (extra=forbid) Mar 25, 2024

ytausch added 2 commits March 26, 2024 17:01

disallow extra fields

85ce6b2

small fixes

f4847ab

ytausch force-pushed the pydantic-fixes branch from 03ee598 to f4847ab Compare March 26, 2024 16:05

ytausch changed the title ~~Minor Pydantic Model Fixes for conda-forge.yml (extra=forbid)~~ Set extra=forbid for Pydantic Schema Fields Mar 26, 2024

regenerate schema

0dfc135

ytausch mentioned this pull request Mar 27, 2024

Use an external json schema for botconfig #1894

Draft

ytausch mentioned this pull request Apr 2, 2024

refactor conda-forge.yml linting logic, hint about extra fields #1900

Closed

1 task

ytausch closed this Apr 2, 2024

ytausch mentioned this pull request Nov 22, 2024

Hint about extra fields in conda-forge.yml #1920

Open

1 task

Set extra=forbid for Pydantic Schema Fields #1865

Set extra=forbid for Pydantic Schema Fields #1865

Conversation

ytausch commented Mar 15, 2024 • edited Loading

beckermr commented Mar 15, 2024

beckermr commented Mar 15, 2024 • edited Loading

beckermr left a comment • edited Loading

Choose a reason for hiding this comment

isuruf commented Mar 15, 2024

ytausch commented Mar 15, 2024 • edited Loading

ytausch commented Mar 18, 2024 • edited Loading

Affected Feedstocks

Path Forward

jaimergp commented Mar 18, 2024

jaimergp commented Mar 18, 2024

jaimergp commented Mar 18, 2024

beckermr commented Mar 18, 2024

ytausch commented Mar 18, 2024

xhochy commented Mar 18, 2024 • edited Loading

isuruf commented Mar 18, 2024

isuruf commented Mar 18, 2024

ytausch commented Mar 18, 2024 • edited Loading

beckermr left a comment

Choose a reason for hiding this comment

ytausch commented Mar 25, 2024

beckermr commented Mar 25, 2024

isuruf commented Mar 25, 2024

ytausch commented Mar 25, 2024 • edited Loading

isuruf commented Mar 25, 2024

ytausch commented Mar 25, 2024 • edited Loading

jaimergp commented Mar 26, 2024

beckermr commented Mar 26, 2024

isuruf commented Mar 26, 2024

ytausch commented Mar 26, 2024

ytausch commented Mar 26, 2024

isuruf commented Mar 26, 2024

ytausch commented Mar 26, 2024

isuruf commented Mar 26, 2024

ytausch commented Mar 26, 2024

beckermr commented Mar 26, 2024

jaimergp commented Mar 27, 2024

beckermr commented Mar 27, 2024

ytausch commented Mar 27, 2024

beckermr commented Mar 27, 2024

isuruf commented Mar 27, 2024

ytausch commented Apr 2, 2024

ytausch commented Mar 15, 2024 •

edited

Loading

beckermr commented Mar 15, 2024 •

edited

Loading

beckermr left a comment •

edited

Loading

ytausch commented Mar 15, 2024 •

edited

Loading

ytausch commented Mar 18, 2024 •

edited

Loading

xhochy commented Mar 18, 2024 •

edited

Loading

ytausch commented Mar 18, 2024 •

edited

Loading

ytausch commented Mar 25, 2024 •

edited

Loading

ytausch commented Mar 25, 2024 •

edited

Loading