Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PEP 621 project table parsing and validation module #647

Open
chrisjsewell opened this issue Dec 22, 2022 · 19 comments
Open

Add PEP 621 project table parsing and validation module #647

chrisjsewell opened this issue Dec 22, 2022 · 19 comments

Comments

@chrisjsewell
Copy link

As prototyped in #646, I believe this repo is a nice place to house a general parsing and validation function for the PEP 621 [project] table in the pyproject.toml.

This returns a list of (build tool independent) validation errors and a TypedDict of the project data.

@brettcannon
Copy link
Member

I guess the first question is do we think this is a good fit for this project? For that I'm not sure as we have not really done front-end stuff before. We already have enough work just to keep up with the back-end-related, so I'm not sure if we want to take this extra work on if it isn't going to be used by e.g. pip (which I don't think pip does since it will read PKG-INFO for sdists and the [build-system] table is the back-end-specific part of pyproject.toml). Or put another way, does an installer need access to this table?

My next point is I don't think a custom ParseResult object to collect all validation errors is the right thing long-term in the face of https://peps.python.org/pep-0654/ ; I think writing a shim for ExceptionGroup for Python < 3.11 is a better fit. I also don't think a named tuple should be used for VError (named tuples were created for old APIs to transition to a more object-oriented format, like stuff wrapping C code, not new APIs where you have to support indexing and names forever).

@brettcannon
Copy link
Member

To be clear, I still appreciate the work and I see value of this making it up to PyPI somehow, I just don't know if @pradyunsg and I think this project is the best fit.

@chrisjsewell
Copy link
Author

chrisjsewell commented Dec 23, 2022

For that I'm not sure as we have not really done front-end stuff before.

The point of this is for builders to use (flit, etc). This is what I'm using it for.
I understand them to be backend?

@chrisjsewell
Copy link
Author

I think writing a shim for ExceptionGroup

I'm not sure if this is achieving quite the same purpose.
The key thinking was that I want to collect all the issues with validation, rather than fail on the first one, in order to report to the users all the problems at once, rather than them have to rerun many times.
I.e. this is the same as how a linter works.

@chrisjsewell
Copy link
Author

I also don't think a named tuple should be used

Yeh can change names tuples for dataclasses, I'm not fussed either way

@chrisjsewell
Copy link
Author

chrisjsewell commented Dec 23, 2022

I just don't know if @pradyunsg and I think this project is the best fit.

It does basically use every aspect of this module, which is why I felt it was a good fit.
Otherwise you are just gonna have to have packaging as a dependency, if it goes somewhere else.

But yeh up to you guys, I'll be using this code anyway 😄

@pradyunsg
Copy link
Member

pradyunsg commented Dec 23, 2022

Outside of the specifics of the implementation, I do think it's a good idea to have some sort of validation for the standards-backed pyproject.toml->[project] metadata stuff in packaging; especially since we're established that we wanna go down some of that route through packaging.metadata already. It might make sense as a part of that; when we come around to handling that aspect of things?

@pradyunsg
Copy link
Member

FWIW, https://validate-pyproject.readthedocs.io/en/latest/index.html exists already.

@brettcannon
Copy link
Member

The key thinking was that I want to collect all the issues with validation, rather than fail on the first one, in order to report to the users all the problems at once, rather than them have to rerun many times.

That's exactly what exception groups were designed for.

especially since we're established that we wanna go down some of that route through packaging.metadata already.

Sort of. I mean packaging.metadata is at a different layer than pyproject.toml's [project] table.

I think I'm +0 on the idea, but I personally don't have the capacity to do the review on the PR to get it merged (I'm already having to consider picking up Donald's draft PR for metadata to see it through). If I were to do this myself, I would write a module that did the following:

  • Defined a convert()/parse() function that took a dict and returned a TypedDict that covers everything that's in a spec converted to objects provided by this project (I think the PR already does this, although I don't know if it covers [build-system]); this aligns with the plans for packaging.metadata since TOML implicitly handles the "raw" parsing of the format. I'm not sure if it's best to parse everything or parse each potential table separately, but we should cover it all if we are taking on the responsibility of handling packaging specs around pyproject.toml.
  • Not do any I/O (which I think the PR already does).
  • Use exception groups for the collected failures.

@chrysle
Copy link

chrysle commented Apr 7, 2024

  • Defined a convert()/parse() function that took a dict and returned a TypedDict that covers everything that's in a spec converted to objects provided by this project (I think the PR already does this, although I don't know if it covers [build-system]); this aligns with the plans for packaging.metadata since TOML implicitly handles the "raw" parsing of the format. I'm not sure if it's best to parse everything or parse each potential table separately, but we should cover it all if we are taking on the responsibility of handling packaging specs around pyproject.toml.
  • Not do any I/O (which I think the PR already does).
  • Use exception groups for the collected failures.

The recently adopted pyproject-metadata covers the first two aspects, while the third has already been proposed (crosslink pypa/pyproject-metadata#45), although it doesn't seem likely to be included soon. Should the project be recommended instead of including the functionality in packaging itself?

@abravalheri
Copy link
Contributor

Should the project be recommended instead of including the functionality in packaging itself?

Because build-backends have a limitation in terms of dependencies and usually have to bundle them to prevent dependency cycles, I think it would be best if these things are concentrated in packaging (because it is already a dependency on the majority of backends anyway).

@chrysle
Copy link

chrysle commented Apr 8, 2024

Would merging that into packaging be an option, then (without having asked the maintainers)? Seemingly, development is stalled currently due to the lack of privileges. Merging could also simplify that.

@abravalheri
Copy link
Contributor

If there is no dependencies and the maintainers think that is a good idea, I think that would be fine.

@chrisjsewell
Copy link
Author

Sounds cool to me 😄

@chrysle
Copy link

chrysle commented Apr 13, 2024

Great!

cc @FFY00 @henryiii @frostming, which I think are the maintainers currently. What are your thoughts on that?

@frostming
Copy link

frostming commented Apr 14, 2024

Merging pyproject-metadata into packaging is a good idea for me.

@henryiii
Copy link
Contributor

I don't think it's ready. There's still quite a bit of work to be done, like using the standard library email module, and better validation. In the near future, though, it might make a good candidate?

@brettcannon
Copy link
Member

I don't think it's ready. There's still quite a bit of work to be done, like using the standard library email module, and better validation. In the near future, though, it might make a good candidate?

Would a short term step of what @chrisjsewell originally proposed and doing something like packaging.metadata be useful?:

  • A raw TypedDict for what's expected when parsing the TOML
  • An enriched object representation that translates the TypedDict into packaging objects

@henryiii
Copy link
Contributor

henryiii commented Aug 9, 2024

I think the main issue is with .as_rfc822(), which is extremely useful, but does some (occasionally incorrect) magic for two fields: Metadata and Dynamic:

  • Metadata: Updating a default is not backward compatible.
  • Dynamic: This one is really messy. pyproject-metadata attempts to guess which fields will by dynamic (meaning the wheel might have different values than the SDist), and it can be wrong.

It's also implemented on a custom class instead of using email.message.EMailMessage.

I think there are two options that would unblock this:

  1. Just remove this method for now and all the internals related to it.
  2. Make dynamic and metadata_version keyword arguments, the later being required. Ideally rework this to support any EMailMessage-like class, and have a user create and pass it.

Especially if the later was chosen, I'd probably want a few cleanups that are easier with a completely new interface, like removing reliance on custom features of the EMailMessage-like class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants