Possible next steps for executable semantics #627
Replies: 2 comments 1 reply
-
Thank you for this writeup! It's very helpful for understanding where you're expecting executable semantics to go next. (I think I'll need to spend more time looking at the code to give better comment, and I currently feel a little momentum with the migration tool) |
Beta Was this translation helpful? Give feedback.
-
Regarding static name lookup, that's something that was done in the Swift implementation as well. |
Beta Was this translation helpful? Give feedback.
-
Now that the end is in sight for the work I've been doing on
Expression
, I'm starting to think about what to work on next, and I thought I'd lay out my thoughts here so we can compare notes about where we want executable-semantics to go.At a high level, I think we should be trying to get to a point where everyone on the Carbon project feels equipped to implement new features in executable semantics, and where executable semantics implementation work proceeds roughly in tandem with Carbon proposal development. To that end, I think our main focus should be on improving the executable-semantics developer experience. However, it's also important to keep working on implementing Carbon features so that we don't fall too far behind in the meantime, and so that we get direct feedback on what parts of the developer experience need improvement.
What follows are some specific tasks I've come up with in those two areas. What have I missed? What should I have left out? Any thoughts on how we should prioritize?
New user-facing features
Accepted proposals
There are a number of proposals that have been accepted, but aren't yet implemented in executable-semantics. It might be nice to have some way of tracking them more systematically, but the ones I know of are:
I think those are roughly in decreasing order of importance, but also in roughly decreasing order of difficulty (other than #199, which sort of ill-defined because we don't have a
String
type yet), so in terms of return on investment, they all seem about evenly matched.Upcoming proposals
I think the big one here is generics, but there are probably others.
Exploratory work
From a suggestion by @chandlerc: we've talked about having a principle that all of Carbon's types behave like library types, even if they're implemented with intrinsics, but executable-semantics currently special-cases
Int
andBool
as built-in types; it even lexes them as keywords, rather than as identifiers. It could be useful to explore what we'd need to add to Carbon, and what we'd need to change in executable-semantics, to treat those as library types.The hard-mode version of this question would be to do the same thing for
Type
.Developer experience
Documentation
I think the most important thing we can do to make the executable-semantics codebase more accessible is to add a lot more documentation. The only reasons this isn't already my primary focus are that I need to understand it before I can document it, and in a lot of cases the documentation I could add right now would be made obsolete by the structural changes that we've been implementing, or at least considering. But as those structural changes land, this should be an increasing focus.
Sum type restructuring
I think the 5 major sum types in executable semantics (
Expression
,Statement
,Declaration
,Value
, andAction
) should be restructured so that each alternative is represented by a unique class type, and a unique enumerator, whose names have a completely mechanical correspondence to each other. Adding an alternative should involve as little additional boilerplate as possible, and there should be few (if any) restrictions on the content of the alternative types. In particular, they should be able to have non-trivial data members (such asstd::string
), and the data members that are currently pointers in order to satisfy triviality requirements should be changed to values.I've largely finished doing this for
Expression
, but the other four types need the same treatment.Reduce use of pointers
We've largely settled that AST nodes (i.e.
Expression
,Statement
, andDeclaration
) should be immutable, arena-allocated, and passed/stored by const pointer, and the tentative plan is to treatValue
the same way.Action
s, on the other hand, must be mutable, and have a bounded lifetime, so they should generally be stack- or heap-allocated, stored as values, and passed as values or const references. The same applies to essentially all uses of standard library types, includingstd::string
,std::vector
,std::list
, andstd::pair
(although the latter two should generally be avoided anyway).Decouple
Value
s from the Carbon heapCurrently, Carbon
Value
s are inherently tied to the Carbon heap. For example, a tuple value is represented as basically a vector ofAddress
es. This makes it easy to do things like take the address of a tuple element, but it forces us to blur the distinction between compile-time and run-time (because we need to be able to work withValues
during type checking), and it makes it much harder to reason about mutability at the C++ level (because even if aValue
object is immutable, the Carbon value it represents may change during evaluation). I think we should instead take the approach Dave took with the Swift interpreter, where a compositeValue
is composed of otherValues
, and doesn't directly or indirectly contain anAddress
unless it represents a Carbon value that contains an address.In order to make this work while still supporting things like taking the address of a subobject,
Address
will need to represent something more like a path expression than a simple memory index."Subtypes" of sum types
In Carbon, an expression is "just" a pattern that doesn't contain any bindings or placeholders, and a type is "just" a value whose type is
Type
. Even a value is in a sense "just" the no-placeholders special case of something more general that we don't really have a name for ("pattern value", perhaps?). However, those special cases are important, and central to how we think about Carbon code. In fact, they're so important that in some cases we've named the C++ types after the special cases, rather than what they actually represent: theExpression
type represents patterns, not just expressions, and theValue
type represents not just values but "pattern values". This is liable to surprise and confuse unwary programmers; I for one was certainly surprised and confused to realize that aValue
can contain variables and placeholders. Furthermore, there are a lot of contexts where we know we are dealing with one of those special cases, and the code would be clearer and more statically safe if the code could explicitly say so.Conceptually, what we want here is to be able to define and work with subtypes of a sum type, e.g. to make
Expression
a subtype ofPattern
(which we currently callExpression
), andType
a subtype ofValue
. Note that this isn't just a matter of subsetting the alternatives: anExpression
node isn't just aPattern
node whose kind is notPatternVariableExpression
; it's aPattern
node whosePattern
children are alsoExpression
s.I'm not yet sure how to model this sort of subtyping in C++, at least not without excessive amounts of boilerplate. I think Clang may have had to solve similar problems in their AST, so that may be a good place to look for inspiration.
Static name lookup
If I understand correctly, the Carbon interpreter currently performs name lookup in both the typechecker and the interpreter. Not only does this result in duplicated work, it makes it hard to distinguish static from dynamic dispatch, and creates the risk that the two sets of name lookup logic might accidentally fall out of sync. I think name lookup should be performed once, prior to or during typechecking, and the results stored in the AST so that they can be reused by the interpreter.
Handling AST mutation
The AST is currently almost entirely stateless: essentially the only information that's available for a given node is the information that was available to the parser. The type checker does perform some structural transformations on the AST, such as statically evaluating certain expressions and replacing them with literals, but it does not attach new kinds of information to the AST. However, it seems likely that we will eventually want later phases of semantic analysis (or even the run-time interpreter) to be able to access information computed by earlier analysis phases. Name lookup is one example, as discussed above, but it seems likely that we'll also eventually want the types of declarations and expressions to persist after typechecking.
However, there are also very good reasons to want the AST to be immutable, and to have strong static guarantees about what information is available about a given node. One way to reconcile that tension would be for the AST node types to be templated, with a parameter that (roughly speaking) specifies what information is present in the node (see here for a discussion of how the GHC Haskell compiler does this kind of thing).
I don't think the time is ripe to pursue this yet, but it's something to keep in mind.
Beta Was this translation helpful? Give feedback.
All reactions