Possible next steps for executable semantics #627

geoffromer · 2021-07-08T22:16:22Z

geoffromer
Jul 8, 2021
Collaborator

Now that the end is in sight for the work I've been doing on Expression, I'm starting to think about what to work on next, and I thought I'd lay out my thoughts here so we can compare notes about where we want executable-semantics to go.

At a high level, I think we should be trying to get to a point where everyone on the Carbon project feels equipped to implement new features in executable semantics, and where executable semantics implementation work proceeds roughly in tandem with Carbon proposal development. To that end, I think our main focus should be on improving the executable-semantics developer experience. However, it's also important to keep working on implementing Carbon features so that we don't fall too far behind in the meantime, and so that we get direct feedback on what parts of the developer experience need improvement.

What follows are some specific tasks I've come up with in those two areas. What have I missed? What should I have left out? Any thoughts on how we should prioritize?

New user-facing features

Accepted proposals

There are a number of proposals that have been accepted, but aren't yet implemented in executable-semantics. It might be nice to have some way of tracking them more systematically, but the ones I know of are:

I think those are roughly in decreasing order of importance, but also in roughly decreasing order of difficulty (other than #199, which sort of ill-defined because we don't have a String type yet), so in terms of return on investment, they all seem about evenly matched.

Upcoming proposals

I think the big one here is generics, but there are probably others.

Exploratory work

From a suggestion by @chandlerc: we've talked about having a principle that all of Carbon's types behave like library types, even if they're implemented with intrinsics, but executable-semantics currently special-cases Int and Bool as built-in types; it even lexes them as keywords, rather than as identifiers. It could be useful to explore what we'd need to add to Carbon, and what we'd need to change in executable-semantics, to treat those as library types.

The hard-mode version of this question would be to do the same thing for Type.

Developer experience

Documentation

I think the most important thing we can do to make the executable-semantics codebase more accessible is to add a lot more documentation. The only reasons this isn't already my primary focus are that I need to understand it before I can document it, and in a lot of cases the documentation I could add right now would be made obsolete by the structural changes that we've been implementing, or at least considering. But as those structural changes land, this should be an increasing focus.

Sum type restructuring

I think the 5 major sum types in executable semantics (Expression, Statement, Declaration, Value, and Action) should be restructured so that each alternative is represented by a unique class type, and a unique enumerator, whose names have a completely mechanical correspondence to each other. Adding an alternative should involve as little additional boilerplate as possible, and there should be few (if any) restrictions on the content of the alternative types. In particular, they should be able to have non-trivial data members (such as std::string), and the data members that are currently pointers in order to satisfy triviality requirements should be changed to values.

I've largely finished doing this for Expression, but the other four types need the same treatment.

Reduce use of pointers

We've largely settled that AST nodes (i.e. Expression, Statement, and Declaration) should be immutable, arena-allocated, and passed/stored by const pointer, and the tentative plan is to treat Value the same way. Actions, on the other hand, must be mutable, and have a bounded lifetime, so they should generally be stack- or heap-allocated, stored as values, and passed as values or const references. The same applies to essentially all uses of standard library types, including std::string, std::vector, std::list, and std::pair (although the latter two should generally be avoided anyway).

Decouple `Value`s from the Carbon heap

Currently, Carbon Values are inherently tied to the Carbon heap. For example, a tuple value is represented as basically a vector of Addresses. This makes it easy to do things like take the address of a tuple element, but it forces us to blur the distinction between compile-time and run-time (because we need to be able to work with Values during type checking), and it makes it much harder to reason about mutability at the C++ level (because even if a Value object is immutable, the Carbon value it represents may change during evaluation). I think we should instead take the approach Dave took with the Swift interpreter, where a composite Value is composed of other Values, and doesn't directly or indirectly contain an Address unless it represents a Carbon value that contains an address.

In order to make this work while still supporting things like taking the address of a subobject, Address will need to represent something more like a path expression than a simple memory index.

"Subtypes" of sum types

In Carbon, an expression is "just" a pattern that doesn't contain any bindings or placeholders, and a type is "just" a value whose type is Type. Even a value is in a sense "just" the no-placeholders special case of something more general that we don't really have a name for ("pattern value", perhaps?). However, those special cases are important, and central to how we think about Carbon code. In fact, they're so important that in some cases we've named the C++ types after the special cases, rather than what they actually represent: the Expression type represents patterns, not just expressions, and the Value type represents not just values but "pattern values". This is liable to surprise and confuse unwary programmers; I for one was certainly surprised and confused to realize that a Value can contain variables and placeholders. Furthermore, there are a lot of contexts where we know we are dealing with one of those special cases, and the code would be clearer and more statically safe if the code could explicitly say so.

Conceptually, what we want here is to be able to define and work with subtypes of a sum type, e.g. to make Expression a subtype of Pattern (which we currently call Expression), and Type a subtype of Value. Note that this isn't just a matter of subsetting the alternatives: an Expression node isn't just a Pattern node whose kind is not PatternVariableExpression; it's a Pattern node whose Pattern children are also Expressions.

I'm not yet sure how to model this sort of subtyping in C++, at least not without excessive amounts of boilerplate. I think Clang may have had to solve similar problems in their AST, so that may be a good place to look for inspiration.

Static name lookup

If I understand correctly, the Carbon interpreter currently performs name lookup in both the typechecker and the interpreter. Not only does this result in duplicated work, it makes it hard to distinguish static from dynamic dispatch, and creates the risk that the two sets of name lookup logic might accidentally fall out of sync. I think name lookup should be performed once, prior to or during typechecking, and the results stored in the AST so that they can be reused by the interpreter.

Handling AST mutation

The AST is currently almost entirely stateless: essentially the only information that's available for a given node is the information that was available to the parser. The type checker does perform some structural transformations on the AST, such as statically evaluating certain expressions and replacing them with literals, but it does not attach new kinds of information to the AST. However, it seems likely that we will eventually want later phases of semantic analysis (or even the run-time interpreter) to be able to access information computed by earlier analysis phases. Name lookup is one example, as discussed above, but it seems likely that we'll also eventually want the types of declarations and expressions to persist after typechecking.

However, there are also very good reasons to want the AST to be immutable, and to have strong static guarantees about what information is available about a given node. One way to reconcile that tension would be for the AST node types to be templated, with a parameter that (roughly speaking) specifies what information is present in the node (see here for a discussion of how the GHC Haskell compiler does this kind of thing).

I don't think the time is ripe to pursue this yet, but it's something to keep in mind.

jonmeow · 2021-07-09T17:12:10Z

jonmeow
Jul 9, 2021
Maintainer

Thank you for this writeup! It's very helpful for understanding where you're expecting executable semantics to go next. (I think I'll need to spend more time looking at the code to give better comment, and I currently feel a little momentum with the migration tool)

0 replies

jsiek · 2021-07-12T19:43:19Z

jsiek
Jul 12, 2021

Regarding static name lookup, that's something that was done in the Swift implementation as well.
It was done prior to type checking. By placing it prior to type checking, we were assuming that
things like function overloads would be handled by aggregating the overload set for a
particular scope into one name declaration.

1 reply

jsiek Jul 12, 2021

Though I should mention that it is more traditional in the PL theory world to do name lookup during type checking.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible next steps for executable semantics #627

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 1 reply

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

Possible next steps for executable semantics #627

geoffromer Jul 8, 2021 Collaborator

New user-facing features

Accepted proposals

Upcoming proposals

Exploratory work

Developer experience

Documentation

Sum type restructuring

Reduce use of pointers

Decouple Values from the Carbon heap

"Subtypes" of sum types

Static name lookup

Handling AST mutation

Replies: 2 comments · 1 reply

jonmeow Jul 9, 2021 Maintainer

jsiek Jul 12, 2021

jsiek Jul 12, 2021

geoffromer
Jul 8, 2021
Collaborator

Decouple `Value`s from the Carbon heap

Replies: 2 comments 1 reply

jonmeow
Jul 9, 2021
Maintainer

jsiek
Jul 12, 2021