Reduce indirection and memory overhead for slot maps. #1782
Reducing the property-related memory overhead of JavaScript objects.
Current slot map model
Currently JavaScript objects in Rhino store their properties in a slot map, which is held by a slot map container. The map itself then holds either an array or some other collection structure, which actually holds the slots. So the structure looks something like this:
And in any environment which relies on thread safe slot map handling there is an additional stamped lock object like this:
There is one optimisation here, in that the arrays for slots in an `EmbeddedSlotMap` are not created until slots are stored in the map, but all the other objects are created eagerly. If I look at a heap dump from one of our production systems and analyse the number of properties stored in slot maps then we see the following:
The remaining objects are distributed over a fairly wide range.
The prevalence of zero-size slot maps may be surprising, but is due to the large number of common objects that are implemented in Rhino itself as `IdScriptableObject`s with built-in properties not held as real slots (functions, strings, arrays, wrapped native Java methods, etc.). This means we pay a significant memory overhead on the 75% of objects that will never have properties, and we aren't optimising for the remaining common case of a single property. This type of property map size distribution appears consistent with my experience on TruffleRuby and other dynamic language implementations, so I would expect roughly similar distributions for other users.
In our particular case we have a small number of objects shared between threads, and so enable thread safety, so we pay for a lock with every one of these objects.
The memory overheads we incur are as follows (on a 64-bit JVM with compressed oops):
SlotMapContainer
StampedLock
EmbeddedSlotMap
Slot[]
Combined with the data on object sizes above, this gives us total sizes as follows:
Reducing indirection
Ideally we would want the common cases to incur as little overhead as possible, and potentially store a small number of properties within the `ScriptableObject` itself if that significantly reduced overhead. Short of that, we can optimise for the common cases and change the existing slot map promotion chain of:
to something more like:
I do not propose to embed slots in the `ScriptableObject`s themselves at this time, but instead exploit the fact that we can use a singular immutable empty slot map; refactor out the slot map container for single-threaded use; introduce a single-entry slot map to reduce overhead and indirection in the remaining common case; and finally remove the use of slot map containers in the multi-threaded case.
PRs
I've split this into a number of stacked PRs, detailed below.
Introduce a singleton `EMPTY_SLOT_MAP` (#1782)
This PR.
Introduces a singleton `EMPTY_SLOT_MAP`, reducing the common empty case to:
This saves 32 bytes per object with an empty property map.
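As a rough illustration of the idea, the following is a minimal sketch of a shared immutable empty map that only allocates real storage on the first write. All names here (`DemoSlotMap`, `EmptyDemoSlotMap`, `HashDemoSlotMap`, `DemoObject`) are hypothetical and are not Rhino's actual classes or API; the sketch also assumes that a write returns the map to use afterwards, which may not be how the PR performs the swap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal slot map interface; not Rhino's real SlotMap API.
interface DemoSlotMap {
    int size();

    Object get(String key);

    // Returns the map the owner should hold after the write (possibly a new map).
    DemoSlotMap put(String key, Object value);
}

// Immutable and stateless, so a single instance can be shared by every object.
final class EmptyDemoSlotMap implements DemoSlotMap {
    static final EmptyDemoSlotMap EMPTY_SLOT_MAP = new EmptyDemoSlotMap();

    private EmptyDemoSlotMap() {}

    public int size() {
        return 0;
    }

    public Object get(String key) {
        return null;
    }

    public DemoSlotMap put(String key, Object value) {
        // First write: real storage is allocated only now.
        return new HashDemoSlotMap().put(key, value);
    }
}

// Stand-in for a real, mutable slot map.
final class HashDemoSlotMap implements DemoSlotMap {
    private final Map<String, Object> slots = new HashMap<>();

    public int size() {
        return slots.size();
    }

    public Object get(String key) {
        return slots.get(key);
    }

    public DemoSlotMap put(String key, Object value) {
        slots.put(key, value);
        return this;
    }
}

// Stand-in for a scriptable object: starts out pointing at the shared empty map.
final class DemoObject {
    private DemoSlotMap slotMap = EmptyDemoSlotMap.EMPTY_SLOT_MAP;

    void put(String key, Object value) {
        slotMap = slotMap.put(key, value);
    }

    Object get(String key) {
        return slotMap.get(key);
    }

    DemoSlotMap slotMap() {
        return slotMap;
    }
}
```

In this sketch, objects that never receive a property all reference the same `EMPTY_SLOT_MAP` instance, so they pay only for the reference field.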
Start to factor out slot map container (#1783)
Introduces the concept of a `SlotMapOwner` and refactors `ScriptableObject` and `SlotMapContainer` to inherit from `SlotMapOwner`. The slot maps are made responsible for their own promotion via the `SlotMapOwner`. As well as reducing the amount of indirection needed to access the slots, it also means we can avoid promotion checks except when we know the collection might overflow some internal limit (for example, when the array in `EmbeddedSlotMap` would be expanded). This changes the empty map case to:
and the occupied map case to:
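The owner-driven promotion described here can be sketched roughly as follows. This is an assumption-laden illustration: `OwnerSketch`, `OwnedMap`, `OwnedEmptyMap`, `SmallArrayMap`, `LargeHashMap` and the `LIMIT` of 4 are all hypothetical and do not reflect the real `SlotMapOwner` API or Rhino's actual thresholds. The point is only that the map receives its owner and replaces itself when it would overflow, so the owner performs no promotion check on each write.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical map interface: the owner is passed in so the map can replace itself.
interface OwnedMap {
    Object get(String key);

    void put(OwnerSketch owner, String key, Object value);
}

// Hypothetical owner base class; it holds the map directly, with no container in between.
abstract class OwnerSketch {
    private OwnedMap map = OwnedEmptyMap.INSTANCE;

    final OwnedMap getMap() {
        return map;
    }

    final void setMap(OwnedMap newMap) {
        map = newMap;
    }

    final void put(String key, Object value) {
        map.put(this, key, value); // no promotion check here
    }

    final Object get(String key) {
        return map.get(key);
    }
}

final class OwnedEmptyMap implements OwnedMap {
    static final OwnedEmptyMap INSTANCE = new OwnedEmptyMap();

    private OwnedEmptyMap() {}

    public Object get(String key) {
        return null;
    }

    public void put(OwnerSketch owner, String key, Object value) {
        OwnedMap next = new SmallArrayMap();
        owner.setMap(next);
        next.put(owner, key, value);
    }
}

// Small array-backed map that promotes itself only when the array would overflow.
final class SmallArrayMap implements OwnedMap {
    static final int LIMIT = 4; // illustrative limit, not Rhino's

    private final String[] keys = new String[LIMIT];
    private final Object[] values = new Object[LIMIT];
    private int count;

    public Object get(String key) {
        for (int i = 0; i < count; i++) {
            if (keys[i].equals(key)) return values[i];
        }
        return null;
    }

    public void put(OwnerSketch owner, String key, Object value) {
        for (int i = 0; i < count; i++) {
            if (keys[i].equals(key)) {
                values[i] = value;
                return;
            }
        }
        if (count < LIMIT) {
            keys[count] = key;
            values[count] = value;
            count++;
            return;
        }
        // Overflow: copy into a larger map and install it on the owner.
        LargeHashMap bigger = new LargeHashMap();
        for (int i = 0; i < count; i++) bigger.store(keys[i], values[i]);
        bigger.store(key, value);
        owner.setMap(bigger);
    }
}

final class LargeHashMap implements OwnedMap {
    private final Map<String, Object> slots = new HashMap<>();

    void store(String key, Object value) {
        slots.put(key, value);
    }

    public Object get(String key) {
        return slots.get(key);
    }

    public void put(OwnerSketch owner, String key, Object value) {
        slots.put(key, value);
    }
}

// Stand-in for a scriptable object that inherits the map field from its owner base class.
final class ScriptableSketch extends OwnerSketch {}
```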
Single-entry slot maps (#1784)
Introduces an immutable single-entry slot map. It is immutable and always promotes to `EmbeddedSlotMap`, both to make its future use in multi-threaded cases easier and because it seems a reasonable assumption that any single-entry slot map that is mutated is likely to continue being mutated.
This changes the single slot case to:
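A minimal sketch of the "immutable, always promotes" behaviour is shown below. The names (`SketchMap`, `SingleEntrySketchMap`, `MultiEntrySketchMap`) are hypothetical; `MultiEntrySketchMap` merely stands in for the map a real single-entry map would promote to, and the return-the-new-map style is an assumption made to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: a write returns the map to use from then on.
interface SketchMap {
    int size();

    Object get(String key);

    SketchMap put(String key, Object value);
}

// Holds exactly one entry in final fields, so it can be read without locking.
final class SingleEntrySketchMap implements SketchMap {
    private final String key;
    private final Object value;

    SingleEntrySketchMap(String key, Object value) {
        this.key = key;
        this.value = value;
    }

    public int size() {
        return 1;
    }

    public Object get(String k) {
        return key.equals(k) ? value : null;
    }

    public SketchMap put(String k, Object v) {
        // Any mutation promotes, even an overwrite of the existing key:
        // this instance is never modified.
        MultiEntrySketchMap promoted = new MultiEntrySketchMap();
        promoted.put(key, value);
        promoted.put(k, v);
        return promoted;
    }
}

// Stand-in for the larger mutable map that a single-entry map promotes to.
final class MultiEntrySketchMap implements SketchMap {
    private final Map<String, Object> slots = new LinkedHashMap<>();

    public int size() {
        return slots.size();
    }

    public Object get(String key) {
        return slots.get(key);
    }

    public SketchMap put(String key, Object value) {
        slots.put(key, value);
        return this;
    }
}
```

Because the single-entry instance is never modified, a reader holding a reference to it always sees a consistent key and value, which is the property that makes it convenient in the multi-threaded case.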
Introduce thread safe versions of slot maps (#1785)
Introduces an API on the slot map owner to compare-and-swap the current slot map, enabling thread-safe promotion of maps, and introduces thread-safe versions of the slot map types which handle locking where necessary. We avoid the use of locks on empty and single-entry maps as they are immutable, and introduce APIs and share locks between further promotions to avoid the overhead of an additional container.
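One way such a compare-and-swap on the owner could look is sketched below, using the JDK's `AtomicReferenceFieldUpdater` so that no extra wrapper object is allocated per owner. This is an assumption about the mechanism, not the PR's implementation, and `CasOwnerSketch`, `CasMap`, `CasEmptyMap`, `CasSingleMap`, `casMap` and `promoteFromEmpty` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical map types; both are immutable, so reading them needs no lock.
interface CasMap {
    int size();
}

final class CasEmptyMap implements CasMap {
    static final CasEmptyMap INSTANCE = new CasEmptyMap();

    private CasEmptyMap() {}

    public int size() {
        return 0;
    }
}

final class CasSingleMap implements CasMap {
    final String key;
    final Object value;

    CasSingleMap(String key, Object value) {
        this.key = key;
        this.value = value;
    }

    public int size() {
        return 1;
    }
}

// Hypothetical owner exposing a compare-and-swap of its map field.
class CasOwnerSketch {
    private static final AtomicReferenceFieldUpdater<CasOwnerSketch, CasMap> MAP =
            AtomicReferenceFieldUpdater.newUpdater(CasOwnerSketch.class, CasMap.class, "map");

    // Must be volatile for AtomicReferenceFieldUpdater.
    private volatile CasMap map = CasEmptyMap.INSTANCE;

    CasMap getMap() {
        return map;
    }

    // Installs the replacement only if the current map is still the expected one.
    boolean casMap(CasMap expected, CasMap replacement) {
        return MAP.compareAndSet(this, expected, replacement);
    }

    // Promotes empty -> single entry. If another thread promoted first, returns the
    // map that thread installed; a real caller would then retry its write against it.
    CasMap promoteFromEmpty(String key, Object value) {
        CasMap replacement = new CasSingleMap(key, value);
        return casMap(CasEmptyMap.INSTANCE, replacement) ? replacement : getMap();
    }
}
```

The retry of a write that loses the race is deliberately left out; it depends on how the thread-safe map types handle locking, which this sketch does not model.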
Benchmark comparison
Benchmark scores are roughly neutral. On the two full runs I performed, some benchmarks gained performance and some lost performance, but they all show a degree of instability on repeated runs.