Skip to content

Conversation

@dcreager
Copy link
Member

@dcreager dcreager commented Apr 8, 2025

This PR adds very basic inference of generic typevars at call sites. It does not bring in a full unification algorithm, and there are a few TODOs in the test suite that are not discharged by this. But it handles a good number of useful cases! And the PR does not add anything that would go away with a more sophisticated constraint solver.

In short, we just look for typevars in the formal parameters, and assume that the inferred type of the corresponding argument is what that typevar should map to. If a typevar appears more than once, we union together the corresponding argument types.

Cases we are not yet handling:

  • We are not widening literals.
  • We are not recursing into parameters that are themselves generic aliases.
  • We are not being very clever with parameters that are union types.

This depends on #17023.

dcreager and others added 30 commits March 30, 2025 12:00
* main:
  [red-knot] Add property tests for callable types (#17006)
  [red-knot] Disjointness for callable types (#17094)
  [red-knot] Flatten `Type::Callable` into four `Type` variants (#17126)
  mdtest.py: do a full mdtest run immediately when the script is executed (#17128)
  [red-knot] Fix callable subtyping for standard parameters (#17125)
  [red-knot] Fix more `redundant-cast` false positives (#17119)
  Sync vendored typeshed stubs (#17106)
  [red-knot] support Any as a class in typeshed (#17107)
  Visit `Identifier` node as part of the `SourceOrderVisitor` (#17110)
  [red-knot] Don't infer Todo for quite so many tuple type expressions (#17116)
  CI: Run pre-commit on depot machine (#17120)
  Error instead of `panic!` when running Ruff from a deleted directory (#16903) (#17054)
  Control flow graph: setup (#17064)
  [red-knot] Playground improvements (#17109)
  [red-knot] IDE crate (#17045)
  Update dependency vite to v6.2.4 (#17104)
  [red-knot] Add redundant-cast error (#17100)
  [red-knot] Narrowing on `in tuple[...]` and `in str` (#17059)
* main:
  [red-knot] Specialize `str.startswith` for string literals (#17351)
  [syntax-errors] `yield`, `yield from`, and `await` outside functions (#17298)
  [red-knot] Refresh diagnostics when changing related files (#17350)
  Add `Checker::import_from_typing` (#17340)
  Don't add chaperone space after escaped quote in triple quote (#17216)
  [red-knot] Silence errors in unreachable type annotations / class bases (#17342)
Copy link
Member Author

@dcreager dcreager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carljm, the "identity specialization" worked well, and let me remove the new InstanceType enum variant as we discussed.

return x

# TODO: no error
# TODO: revealed: str
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was thinking is that when first parameter passed ("a"), then T typevar locks to Literal["a"] and by second parameter which is Literal["b"] it will invalidate the T.

This is a good way to think about how the implementation works here, and how it will need to grow to support some of the other mdtests that still have TODOs.

Right now, the only "solving" that we do is to see that the param is a typevar, and the argument is "some type", and add to the pending specialization that the typevar maps to the type. But if we've already seen that typevar in a different parameter, instead of replacing the previous type, or requiring the previous and new types to be the same (as you thought might be the case), we merge them together into a union. This is the only "unification" that the implementation in this PR does.

This first example in this section is one that this PR doesn't addres, where we'll need to not just blindly union everything together — the type annotation is meant to be an actual restriction that we should enforce.

.member_lookup_with_policy(
db,
name.into(),
MemberLookupPolicy::NO_INSTANCE_FALLBACK | policy,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callers must now provide the full policy that they want to use, and NO_INSTANCE_FALLBACK is not implicitly added. That allows us to use try_call_dunder_with_policy (instead of copy/pasting it) down below in try_call_constructor, to find and call the __new__ class method.

Comment on lines -3524 to +3533
mut argument_types: CallArgumentTypes<'_, 'db>,
argument_types: &mut CallArgumentTypes<'_, 'db>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking in a mut reference here let's us use with_self down below, which reuses and modifies a CallArgumentTypes in place.

Comment on lines 3757 to 3758
MemberLookupPolicy::MRO_NO_OBJECT_FALLBACK
| MemberLookupPolicy::META_CLASS_NO_TYPE_FALLBACK,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This the lookup policy that does not include NO_INSTANCE_FALLBACK, where we can now use try_call_dunder_with_policy because of the change described above


/// Create a new [`CallArgumentTypes`] by prepending a synthetic argument to the front of this
/// argument list.
pub(crate) fn prepend_synthetic(&self, synthetic: Type<'db>) -> Self {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant with with_self, as long as you can reuse the CallArgumentTypes instance

Comment on lines +240 to +245
pub fn display(&'db self, db: &'db dyn Db) -> DisplayGenericContext<'db> {
DisplayGenericContext {
typevars: self.variables(db),
db,
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually used in the PR, but I was using it for some printf debugging at one point, and it seemed worth keeping.

Comment on lines +193 to +197
let builder = self
.types
.entry(typevar)
.or_insert_with(|| UnionBuilder::new(self.db));
builder.add_in_place(ty);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is where (as discussed above) we are unioning together the mappings for a typevar that appears in multiple parameters

@dcreager dcreager marked this pull request as ready for review April 11, 2025 15:31
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic.

return x

# TODO: no error
# TODO: revealed: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That example will also require type context (bidirectional checking).

return x

# TODO: no error
# TODO: revealed: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I said in the first comment here that I wasn't actually saying we should remove all these TODOs asking for wider types... but I kind of think we should? Unless someone wants to argue that we definitely want this change. Having TODOs around suggests to a contributor that a PR to widen all these types would be welcomed, when I'm not sure it would/should be.

mod bind;
pub(super) use arguments::{Argument, CallArgumentTypes, CallArguments};
pub(super) use bind::Bindings;
pub(super) use bind::{Bindings, CallableBinding};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be used in this module?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's reexported here so that this line can pull it in from the call module instead of from call::bind. I think that's so that we can use pub(super) here to limit it to the types::* module tree, and not let it leak to the entire crate? (We'd need a pub(grandparent) to do that on its definition in bind.)

I honestly don't love that pattern. (Do we really need that level of control over visibility? And if so, would it be better to use more crates to achieve it? If we don't want more crates, is that not an argument that we don't really need that level of control?) But I don't feel that strongly, and it's the pattern currently being used, so I didn't want to conflate an extra thing into this PR.

Type::FunctionLiteral(function),
"__new__" | "__init__",
) => Type::FunctionLiteral(
function.with_generic_context(db, alias.origin(db).generic_context(db)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this. So we still have the specialized function, and we don't remove that specialization, but we attach the class' generic context to the function? What effect does having the specialization still attached have?

I guess I'm having some trouble following the semantics of the "generic context" field on a function type; maybe this is also related to your comment just above. I wonder if it would be clearer to have a separate dedicated field for this special case of "already-specialized method, but we want call binding to infer a specialization from it"?

I guess a maybe related question is, what about if the class is explicitly specialized already (e.g. C[int](...) constructor call)? In that case, it seems like we don't want/need to infer a specialization from the constructor, we just want to use the existing specialization and check the arguments against it. How is that handled currently?

Copy link
Member Author

@dcreager dcreager Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this. So we still have the specialized function, and we don't remove that specialization, but we attach the class' generic context to the function? What effect does having the specialization still attached have?

This is related to your question below: we shouldn't be doing this special-case lookup here at all, it should only happen down in ClassLiteralType, where neither the generic class nor its __new__ method have been specialized yet. So I've removed this.

I think the equivalent logic in ClassLiteralType (which we keep) is correct, at least for __new__ methods that are not further specialized. (For ones that are, we run afoul of my comment above about not handling that nested generic context especially well yet.) You would have:

class C[T]:
    def __new__(cls, x: T) -> "C"[T]: ...

In ClassLiteralType, we would looking up C.__new__. C is not yet specialized, so the method inherits the class's generic context and (via the with_generic_context call) becomes C.__new__[T]. Then when we call it, a specialization gets inferred, and that specialization gets applied to the class as well, resulting in e.g. C.__new__[int]C[int].

class_literal
.own_class_member(db, name)
.map_type(|ty| self.specialize_type(db, ty))
class_literal.own_class_member(db, name).map_type(|ty| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to implement the special case for __init__ and __new__ both here and in ClassLiteralType::own_class_member, when this method delegates to ClassLiteralType::own_class_member?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above; we shouldn't have this logic in both places, only in ClassLiteralType, where the generic class has not yet been specialized

@dcreager dcreager marked this pull request as draft April 15, 2025 20:55
@dcreager
Copy link
Member Author

Still working on more thorough tests of __new__ and __init__. Will move it back out of draft when ready for more review

* main: (44 commits)
  [`airflow`] Extend `AIR311` rules (#17422)
  [red-knot] simplify union size limit handling (#17429)
  [`airflow`] Extract `AIR311` from `AIR301` rules (`AIR301`, `AIR311`) (#17310)
  [red-knot] set a size limit on unions of literals (#17419)
  [red-knot] make large-union benchmark slow again (#17418)
  [red-knot] optimize building large unions of literals (#17403)
  [red-knot] Fix comments in type_api.md (#17425)
  [red-knot] Do not assume that `x != 0` if `x` inhabits `~Literal[0]` (#17370)
  [red-knot] make large-union benchmark more challenging (#17416)
  [red-knot] Acknowledge that `T & anything` is assignable to `T` (#17413)
  Update Rust crate clap to v4.5.36 (#17381)
  Raise syntax error when `\` is at end of file (#17409)
  [red-knot] Add regression tests for narrowing constraints cycles (#17408)
  [red-knot] Add some knowledge of `__all__` to `*`-import machinery (#17373)
  Update taiki-e/install-action digest to be7c31b (#17379)
  Update Rust crate mimalloc to v0.1.46 (#17382)
  Update PyO3/maturin-action action to v1.49.1 (#17384)
  Update Rust crate anyhow to v1.0.98 (#17380)
  dependencies: switch from `chrono` to `jiff`
  Update Rust crate bstr to v1.12.0 (#17385)
  ...
…nto dcreager/infer-function-calls

* origin/dcreager/infer-function-calls:
  Real function body
  Add comment about potential class method inference
  Add TODO about nested contexts
  Remove TODOs
@dcreager dcreager marked this pull request as ready for review April 16, 2025 17:04
@dcreager
Copy link
Member Author

Going to go ahead and merge this; if anyone sees anything that needs to be addressed as follow-on work, please let me know!

@dcreager dcreager merged commit 914095d into main Apr 16, 2025
23 checks passed
@dcreager dcreager deleted the dcreager/infer-function-calls branch April 16, 2025 19:07
dcreager added a commit that referenced this pull request Apr 18, 2025
* main: (123 commits)
  [red-knot] Handle explicit class specialization in type expressions (#17434)
  [red-knot] allow assignment expression in call compare narrowing (#17461)
  [red-knot] fix building unions with literals and AlwaysTruthy/AlwaysFalsy (#17451)
  [red-knot] Type narrowing for assertions (take 2) (#17345)
  [red-knot] class bases are not affected by __future__.annotations (#17456)
  [red-knot] Add support for overloaded functions (#17366)
  [`pyupgrade`] Add fix safety section to docs (`UP036`) (#17444)
  [red-knot] more type-narrowing in match statements (#17302)
  [red-knot] Add some narrowing for assignment expressions (#17448)
  [red-knot] Understand `typing.Protocol` and `typing_extensions.Protocol` as equivalent (#17446)
  Server: Use `min` instead of `max` to limit the number of threads (#17421)
  [red-knot] Detect version-related syntax errors (#16379)
  [`pyflakes`] Add fix safety section (`F841`) (#17410)
  [red-knot] Add `KnownFunction` variants for `is_protocol`, `get_protocol_members` and `runtime_checkable` (#17450)
  Bump 0.11.6 (#17449)
  Auto generate `visit_source_order` (#17180)
  [red-knot] Initial tests for protocols (#17436)
  [red-knot] Dataclasses: synthesize `__init__` with proper signature (#17428)
  [red-knot] Dataclasses: support `order=True` (#17406)
  [red-knot] Super-basic generic inference at call sites (#17301)
  ...
dcreager added a commit that referenced this pull request Apr 29, 2025
We are currently representing type variables using a `KnownInstance`
variant, which wraps a `TypeVarInstance` that contains the information
about the typevar (name, bounds, constraints, default type). We were
previously only constructing that type for PEP 695 typevars. This PR
constructs that type for legacy typevars as well.

It also detects functions that are generic because they use legacy
typevars in their parameter list. With the existing logic for inferring
specializations of function calls (#17301), that means that we are
correctly detecting that the definition of `reveal_type` in the typeshed
is generic, and inferring the correct specialization of `_T` for each
call site.

This does not yet handle legacy generic classes; that will come in a
follow-on PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants