Is Parameter Inference with Integer Subtyping still Deterministic? #87

@VonTum

Description

I've simplified the issue to make it as concrete as possible. Because Latency Counting behaves similarly to subtyping, I believe that we can leave it out completely and just express the problem in terms of integer subtyping.

Example module that has parameters:

module SplitAt #(T, int SIZE, int SPLIT_POINT) {
	interface SplitAt : T[SIZE] i -> T[SPLIT_POINT] left, T[SIZE - SPLIT_POINT] right

	for int I in 0..SPLIT_POINT {
		left[I] = i[I]
	}
	for int I in 0..SIZE - SPLIT_POINT {
		right[I] = i[I+SPLIT_POINT]
	}
}

Example use:

module use_SplitAt {
    gen int MY_ARR_SIZE = 5

    bool[MY_ARR_SIZE] myArr

    bool[3] a, bool[] b = SplitAt(myArr)
    // ^ Infers to 
    // bool[3] a, bool[2] b = SplitAt #(T: bool, SIZE: 5, SPLIT_POINT: 3)(myArr)
}

As an example, synthesizing use_SplitAt goes as follows:

  • First all modules are parsed, global resolution, etc, and "abstract typechecked" (irrelevant here)
  • We execute the generative code in use_SplitAt. So MY_ARR_SIZE is assigned '5', myArr gets the type bool[5], and an "instantiation candidate" is created for SplitAt. Represented as SplitAt#(T: bool, SIZE: var_1, SPLIT_POINT: var_2)
  • Concrete typechecking starts unifying the type value parameters, and through inverting the equations on module ports of SplitAt, it can infer var_1 and var_2.
  • As soon as all parameters of a submodule become known, it is executed (and typechecked) in turn. If execution/typechecking is successful, the concrete type parameters of its ports are injected back into the parent typing context. In this case SplitAt execution is successful, and the types for the ports bool[5] i, bool[3] left, and bool[2] right are injected back into the typing context of use_SplitAt. From this we can now assign the type of b, namely: bool[2] b
  • Once all variables are inferred, and no type errors happened, the instantiation of use_SplitAt is successful.
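
The walkthrough above can be sketched in Python (with hypothetical names, not the real compiler's representation): an instantiation candidate's parameters are inferred by inverting the port formulas, and once all parameters are known the submodule is "executed" and its concrete port types injected back.

```python
# Sketch of the SplitAt inference walkthrough (hypothetical names).

def infer_splitat(known_ports):
    """Infer SplitAt's parameters from the concrete sizes its ports connect to.

    known_ports maps port name -> concrete array size, or None if unknown.
    Port formulas: i: T[SIZE], left: T[SPLIT_POINT], right: T[SIZE - SPLIT_POINT].
    """
    size = known_ports.get("i")       # invert T[SIZE] against bool[5]
    split = known_ports.get("left")   # invert T[SPLIT_POINT] against bool[3]
    if size is None or split is None:
        return None                   # not all parameters known yet: wait
    # All parameters known: "execute" SplitAt and inject port types back.
    return {"SIZE": size, "SPLIT_POINT": split,
            "ports": {"i": size, "left": split, "right": size - split}}

# use_SplitAt: myArr is bool[5], a is bool[3], b's size is still unknown.
result = infer_splitat({"i": 5, "left": 3, "right": None})
assert result["ports"]["right"] == 2  # b infers to bool[2]
```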

More in depth

Submodule (and struct) parameter inference always happens from both known values (the size of the array in bool[3] a) and unknown values (bool[] b). This does not lead to the need for any symbolic manipulation, since concrete typing inference is only started after execution of the generative code of a module, and thus any symbols or expressions will have resolved to concrete values. (E.g. bool[MY_ARR_SIZE] myArr will have resolved to the concrete type bool[5] before concrete type checking has started.)

The typechecker is purely value-based. Any "structure" of the types (like bool[][]...) has been resolved at the earlier Abstract Typing stage and falls outside of the scope here. Concrete typing is just responsible for filling out the concrete value parameters of types. So in bool[var_1][var_2], resolving var_1 and var_2 to concrete values like 3 and 5.

Parameters of submodules are inferred by inverting the symbolic formulas on the modules ports (Like SIZE in T[SIZE] i). If it is known that i connects to bool[5] then we can infer SIZE to be 5.

Inferrable expressions

Given the next bits, I think it is pertinent to make a distinction between "inferrable" expressions and "non-inferrable" expressions.

bool[SIZE + 3] <: bool[10] can be used to infer SIZE = 7, and even bool[V * 4 + 2] <: bool[10] could be used to infer V = 2. But of course, bool[V % 4] <: bool[3] cannot infer V.
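
Inverting such inferrable expressions amounts to solving a linear form coeff * V + offset against a known size; a minimal sketch (hypothetical helper, not the real compiler):

```python
def invert_linear(coeff, offset, target):
    """Solve coeff * V + offset == target for V, if a unique integer solution exists.

    Returns V, or None when the expression is not invertible this way
    (non-linear forms like V % 4 would never reach this helper).
    """
    if coeff == 0:
        return None
    diff = target - offset
    if diff % coeff != 0:
        return None  # no integer solution: a type error, not an inference
    return diff // coeff

assert invert_linear(1, 3, 10) == 7   # SIZE + 3 <: 10   =>  SIZE = 7
assert invert_linear(4, 2, 10) == 2   # V * 4 + 2 <: 10  =>  V = 2
```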

Feature 1: use already-inferred parameters to invert more complex expressions

If we already know one parameter, then we could use that information to infer other parameters. If SIZE == 5 in T[SIZE - SPLIT_POINT] right and right connects to bool[2], then we can infer SPLIT_POINT == 3
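
Feature 1 can be sketched as substituting already-resolved variables before attempting inversion. A minimal Python sketch for the SIZE - SPLIT_POINT case (hypothetical representation):

```python
def infer_from_difference(known, lhs, rhs, target):
    """Given the constraint lhs - rhs == target, infer whichever of lhs/rhs
    is still unknown, using variables already resolved in `known`."""
    if lhs in known and rhs not in known:
        return rhs, known[lhs] - target   # SIZE known: SPLIT_POINT = SIZE - target
    if rhs in known and lhs not in known:
        return lhs, known[rhs] + target   # SPLIT_POINT known: SIZE = SPLIT_POINT + target
    return None

# right connects to bool[2], and SIZE was already inferred to 5:
var, value = infer_from_difference({"SIZE": 5}, "SIZE", "SPLIT_POINT", 2)
assert (var, value) == ("SPLIT_POINT", 3)
```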

Feature 2: Integer Subtyping

Of course subtyping could be more general than this, but integers are a useful enough use case.

Integer is defined in std/core.sus as:

__builtin__ struct int #(int MIN, int MAX) {}

Integers are defined by their minimum and maximum (inclusive) values.

Integer subtyping lets us assign an int #(MIN: 5, MAX: 10) to an int #(MIN: 0, MAX: 255) for instance.
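
This subtyping rule amounts to interval inclusion; a minimal check might look like:

```python
def int_is_subtype(min_a, max_a, min_b, max_b):
    """int #(MIN: min_a, MAX: max_a) <: int #(MIN: min_b, MAX: max_b)
    holds iff the interval [min_a, max_a] fits inside [min_b, max_b]."""
    return min_b <= min_a and max_b >= max_a

assert int_is_subtype(5, 10, 0, 255)       # int#(5, 10) <: int#(0, 255)
assert not int_is_subtype(0, 300, 0, 255)  # MAX 300 exceeds MAX 255
```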

Example:

module Ints {
	interface Ints : int #(MIN: 0, MAX: 15) a, bool c

    int b // Infers to int #(MIN: 0, MAX: 20)

    when c { // "Hardware 'if'"
        b = a
    } else {
        b = 20 // "20" is of type int #(MIN: 20, MAX: 20)
    }
}
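
In the Ints example, b must be a supertype of both branches, so its inferred bounds are the join (smallest containing interval) of int #(MIN: 0, MAX: 15) and int #(MIN: 20, MAX: 20). As a sketch:

```python
def join(a, b):
    """Smallest int #(MIN, MAX) interval that both inputs are subtypes of."""
    return (min(a[0], b[0]), max(a[1], b[1]))

# b receives either a (int #(0, 15)) or the literal 20 (int #(20, 20)):
assert join((0, 15), (20, 20)) == (0, 20)  # b infers to int #(MIN: 0, MAX: 20)
```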

Because the concrete typing system does not work on type trees, but rather just on the type parameters, we specify the subtyping rules to operate on these value variables instead.

Constraints can be of three kinds: (We use "must be" to make it clear that the constraints are applied to val_a and only constrain val_a, not val_b)

  • Exact equality: val_a == val_b
  • Less than: val_a must be <= val_b
  • Greater than: val_a must be >= val_b

Typing assignments then produce these constraints. Examples:

  • bool[val_a] <: bool[val_b] produces val_a == val_b
  • int #(MIN: min_a, MAX: max_a) <: int #(MIN: min_b, MAX: max_b) produces min_b must be <= min_a and max_b must be >= max_a

Exact equality is handled by immediate unification of the variables and can be done for any type, whereas the inequality constraints are explicitly stored and can only be performed for int parameters.

If a variable is only bounded by Less than (respectively Greater than) constraints, then once all the right sides are resolved to a value, the variable is unified with the minimum (respectively maximum) of all the right side values.
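
This resolution rule can be sketched as: collect all constraints of one kind on a variable, wait until every right-hand side is concrete, then unify the variable with the extremum (Python, hypothetical representation):

```python
def resolve_inequalities(kind, rhs_values):
    """Resolve a variable bounded only by one kind of inequality constraint.

    kind is "<=" (the variable must be <= each rhs) or ">=" (must be >= each rhs).
    Returns None while any right-hand side is still unresolved.
    """
    if any(v is None for v in rhs_values):
        return None                       # some rhs not yet resolved: wait
    return min(rhs_values) if kind == "<=" else max(rhs_values)

assert resolve_inequalities("<=", [10, 7, 12]) == 7    # unify with the minimum
assert resolve_inequalities(">=", [3, 5]) == 5         # unify with the maximum
assert resolve_inequalities("<=", [10, None]) is None  # still waiting
```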

As an aside, a consequence of this way of generating and resolving constraints is that type inference for inequalities can only go forwards (from subexpressions to their parent expressions), whereas exact-equality unification can propagate in either direction.

Desirable properties:

Property 1: Determinism

If a module typechecks, there should be exactly one obvious typing, regardless of the order in which submodule parameters are inferred, or the order in which they are executed.
If a module doesn't typecheck, this isn't necessary of course.
Of course, spurious failures due to this ordering are also disallowed.

Formal-ish Problem statement

Without Feature 1 and Feature 2, the base implementation is fully deterministic, as it's just a really simple kind of Hindley-Milner. It is when we add these features that things get murky.

Does only Feature 1 break Determinism?

With only Feature 1, I believe Determinism still holds, since any situation in which different inference orders would find differing options should still produce a type error.

How to design Inference such that Feature 2 doesn't break Determinism?

A simple example to show that Feature 2 might break determinism is:

module eq #(int MAX) {
    interface eq : int #(MIN: 0, MAX: MAX) a, int #(MIN: 0, MAX: MAX) b -> bool is_eq

    is_eq = a == b
}
module use_eq #(int MAX) {
    int #(MIN: 0, MAX: 3) a
    int #(MIN: 0, MAX: 5) b
    
    bool c = eq(a, b)
    // Might get inferred from a.MAX to be eq#(MAX: 3), in which case b doesn't typecheck
    // Or gets inferred from b.MAX to be eq#(MAX: 5), in which case a is a subtype of int#(MIN: 0, MAX: 5)
}
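
Handling inference like the subtyping constraints, i.e. collecting a "must be >=" bound from every inferrable site and taking the maximum once all sites are known, would resolve the eq example deterministically. A sketch:

```python
def infer_parameter(lower_bounds):
    """Infer a parameter constrained only by 'must be >= value' inference sites,
    by taking the maximum over all sites once every site is known."""
    if any(v is None for v in lower_bounds):
        return None
    return max(lower_bounds)

# use_eq: a forces MAX >= 3, b forces MAX >= 5.
assert infer_parameter([3, 5]) == 5  # eq#(MAX: 5); both a and b then typecheck
```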

Perhaps we could handle parameter inference the same way as integer subtyping: Less than / Greater than relations are created for inferrable variables. Of course, if there are any exact equality constraints on the same variable, they take precedence and immediately resolve the variable.

Then again, requiring that we grab the minimum / maximum from all possible places where a value might be inferred could come with its own drawbacks, and it leaves non-inferrable expressions that still contain a parameter in an awkward place.

Perhaps we can constrain the "inference" sites a bit better still: outputs could never receive a valid inferrable value. That way, the MAX parameter appearing on the output total does not prevent MAX from being inferred when the MAX bounds of both a and b are known.

module modulus_add #(int MAX) {
    interface modulus_add : int #(MIN: 0, MAX: MAX) a, int #(MIN: 0, MAX: MAX) b -> int #(MIN: 0, MAX: MAX) total

    //...
}
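
Restricting inference sites to input ports, as suggested above, might be sketched as (hypothetical port representation):

```python
def gather_inference_sites(ports, param):
    """Collect lower bounds for `param` from input ports only; output ports
    (like `total`) mention the parameter but never provide a value for it."""
    return [p["bound"] for p in ports
            if p["dir"] == "in" and p["param"] == param]

ports = [
    {"dir": "in",  "param": "MAX", "bound": 3},     # a: int #(0, 3)
    {"dir": "in",  "param": "MAX", "bound": 7},     # b: int #(0, 7)
    {"dir": "out", "param": "MAX", "bound": None},  # total: not an inference site
]
assert max(gather_inference_sites(ports, "MAX")) == 7  # MAX inferred despite total
```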

In summary:
Is the base typing system extended with Feature 2 (where inequality-constrained inference requires valid values at all inferrable sites) still Deterministic?

If Base + Feature 2 is Deterministic, does adding in Feature 1 re-break Determinism?

Say Base + Feature 2 turns out Deterministic; then if we were to add in Feature 1, that might turn previously non-inferrable sites into inferrable sites. For example, bool[A - B] <: bool[10] is not an inferrable site for A or B, but once one of them is known, the other becomes inferrable.

Metadata

Labels: Language Design (issues about the design of the language), bug