Skip to content

Type Inference

Jake Chitel edited this page Nov 9, 2017 · 1 revision

Ideally, the developer should rarely, if ever, have to explicitly specify types if they don't want to. The verbosity of specifying types is one of the biggest complaints of strongly-typed languages. Newer strongly typed languages get around this by building intelligence into their compilers that infers types based on usage, so developers don't always need to specify types.

Type Flow

When dealing with type inference, one often encounters the concept of "type flow". When the type of one expression is inferred, that type flows through anywhere that the expression's value is used. This cascades through an entire function, module, and program. Because "downstream type flow" like this follows the same direction as the logic, this is often very easy to follow, and in fact, Ren already has this type of inference.

Upstream Type Flow

Where this becomes difficult is the concept of "upstream type flow", which happens when you want to infer the type of an expression that has no inherent type. This is almost always an identifier such as a parameter, field, variable, or generic type parameter. These constructs are just identifiers, references to other expressions that have types.

Declaration paired directly with initialization

When the identifier declaration is paired directly with its initialization (for example, a simple variable initialization), this is easy. Simply get the type of the initializing expression, and assign that type to the identifier.

Identifiers that will have an initialization

But for parameters, fields, or variables that are declared without an initializer, the actual initialization can be occuring anywhere. In the instance of fields or variables, this is still a "downstream" situation because the initialization will occur after the declaration. You can simply hold onto the identifier, wait for it to be used, then set its type.

Parameters

Parameters (and anything that depends on them) are a different story entirely. Parameters are placeholders for a value that will be passed into a function or type. In order to infer their types, you need to either look for the usages of the parameters, or look for the usages of the function that declares the parameters. In this instance, you have a choice: base the type of the parameter on the parameter usage, or the function usage. In almost all cases, you want to use the parameter usage, otherwise the function becomes dependent on how it is used rather than what it does, which is weird. But in some cases the parameter usage is not sufficient, so that is another valid option.

Process

Now how do we want this to work? Ideally, you will start with the main function of the application and work your way through all references in the program. When you encounter an identifier with no type, you want to try to get its type from its declaration before trying to get it from its usage. This will involve tracking down the declaration and tracing through that. This process will be heavily recursive. If a declaration is processed and no type is inferred, you can then go on to usages. Place the identifier into a table for "yet to be typed identifiers". Whenever you encounter the usage of an identifier in the table, you should be able to infer the type of the identifier from the context it is used. If not, that's fine (but it needs to be determined whether this is possible). Ideally, when you have finished traversing a program, you have been able to assign types to everything. If you haven't, the interesting thing to consider is that it may not even matter. If a value is simply passed around, and no operation is done on it that applies to a specific type, then a type need not be assigned to it. The place the identifier's value came from, and the place its value went, simply did not require a specific type. The program should be just fine.

Typable elements to be inferred

As of writing, the following language constructs can be tagged with a type:

  • type declarations (obviously)
  • function parameters
  • function return types
  • lambda function parameters
  • type parameter constraints
  • type arguments (applications of type parameters)
  • struct/class fields
  • array base types
  • tuple types
  • variables

Any non-identifier expression's type can be automatically inferred based on its inherent behavior. Numbers can be inferred to have numeric types, likewise with other primitive types.

Inference restrictions

At the moment, this will be the set of available restrictions on type inference:

  • require types for any of each typable element
  • disallow types for any of each typable element
  • require types on exported values
  • require types on function signatures
  • strict inference:
    • types will never be inferred to any or union types, so any situation where a type is inferred and another usage conflicts with that inference will be an error
    • types must be inferrable, meaning that any non-inferred type, even if it doesn't break the program, will trigger an error

Explicit typing

A related feature that is actually the opposite of type inference is "explicit typing", where you actually do provide the types for things. At the moment, almost everything is either inferred or explicitly typed, but everything should be able to be explicitly typed:

  • function parameters (already possible)
  • function return types (already possible)
  • lambda function parameters (already possible)
  • lambda function return types (not currently possible)
  • type arguments (already possible)
  • fields (already possible)
  • variables (not currently possible)
  • cast expressions (not currently even in the language)