Refactor tracing and compilation aspects#2118
Merged
vext01 merged 4 commits intoykjit:masterfrom Mar 5, 2026
Merged
Conversation
Most of the time -- and certainly if you create "good" traces! -- `seen_hls` is pretty small. In such cases (from experience, up to perhaps 16-20 elements), a `Vec` will be faster to create, insert, and search than a `HashSet`. But the real reason for this change is that it allows us to know the order we encountered `Location`s, which will soon be useful to us. While I'm here, add a Lua test that makes sure this change is correct. [I got this commit wrong at first, and broke this test, so having it here, now, is useful.]
The problem with storing `const *`s is that it's easy to forget when we have, or haven't, held onto a strong `Arc` link. This commit thus moves to storing `Arc`s and puts them in a new struct `SeenHotLocations` where we can hide the (small but noticeable) horror of "have we unrolled?" behind a function call, rather than endlessly duplicating the check.
We removed "connector" as a concept in favour of "coupler" some time ago, but it seems I did not do a full sweep of the codebase.
Not only are these terms are now somewhat anachronistic, but the code for compiling them -- which is fairly complex -- contained lots of duplication with minor variations. This commit smooths the mt.rs side of things out, borrowing from j2 the idea of `TraceStart` and `TraceEnd` as a way of taming the complexity of the different kinds of trace we record and compile. This lets us remove the separate `root_compile` and `sidetrace_compile` functions with just `compile`. This commit doesn't push this quite as far as it could: aot_to_hir still has an unfortunate `BuildKind` enum. But this commit is still an improvement, and quite big enough on its own, that tackling more now doesn't seem like a good idea! To my surprise, this commit must very slightly change compilation times, because some tests which deterministically passed before now very occasionally execute before compilation completed. They should always have had `YKD_SERIALISE_COMPILATION=1` set. Overall, a benchmarking run shows there's no meaningful change in performance as a result of this commit.
c26b402 to
4d9a376
Compare
Contributor
Author
|
There are more tests which should have |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Originally this PR was going to be the start of a bigger PR, but it's become big enough, and useful enough, on its own to be worth doing separately now.
There are two main parts:
seen_hlsinto shape. Previously we used aHashSetofconst *pointers. That means we can't (in the future) associate control point calls withHotLocations and we also have a soundness issue, because theconst *pointers might be deallocated. The first two commits fix both issues. Note: if you have ludicrously long traces as a matter of course (e.g. BF) d5552a9 slows tracing down. In "normal" scenarios, there's no measurable performance difference.Note that, once the new test is removed from proceedings, this PR is able to simplify and shorten quite a lot of code. I think it stands on its own, even without the functional improvements I hope it enables in the near future.