In the code currently generated for C#, the stemmer uses internal state to keep track of position etc. Unfortunately, this prevents the same stemmer instance from being reused across multiple threads. Creating a new stemmer instance per thread (or, in a web application, per request) means the stemmer's constructor, including all the generated Among-type arrays, runs frequently, rebuilding the same data each time and creating additional GC pressure.
Moving the state into the main stem-processing method should also make it possible to switch to Span<char> for much of the work currently done with StringBuilder, avoiding further allocations.
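To make the idea concrete, here is a rough sketch of the shape I have in mind. It is not the current generated code: every name in it (SketchStemmer, Suffixes, State, Stem) is hypothetical, and the suffix-stripping logic is a toy stand-in for a real stemming algorithm. The point is only that the lookup data is built once and shared, while the per-call state lives inside the method.

```csharp
using System;

// Hypothetical sketch: names and logic are illustrative, not the generated Snowball API.
public sealed class SketchStemmer
{
    // Read-only lookup data, built once and shared by every call
    // (a stand-in for the generated Among-type arrays).
    private static readonly string[] Suffixes = { "ing", "ed", "s" };

    // Per-call working state. As a ref struct it can only live on the stack,
    // so nothing here is shared between threads.
    private ref struct State
    {
        public Span<char> Buffer;
        public int Limit;
    }

    // Copies 'word' into 'destination', stems it in place, and returns the stem length.
    // 'destination' must be at least as long as 'word', otherwise CopyTo throws.
    public int Stem(ReadOnlySpan<char> word, Span<char> destination)
    {
        word.CopyTo(destination);
        var state = new State { Buffer = destination, Limit = word.Length };

        foreach (var suffix in Suffixes)
        {
            Span<char> current = state.Buffer.Slice(0, state.Limit);
            if (current.Length > suffix.Length && current.EndsWith(suffix.AsSpan()))
            {
                // Toy rule: drop the first matching suffix and stop.
                state.Limit -= suffix.Length;
                break;
            }
        }

        return state.Limit;
    }
}

public static class Demo
{
    public static void Main()
    {
        var stemmer = new SketchStemmer();        // one instance, no mutable instance fields
        Span<char> buffer = stackalloc char[64];  // caller-owned scratch space, no heap allocation
        int length = stemmer.Stem("walking", buffer);
        Console.WriteLine(buffer.Slice(0, length).ToString()); // prints "walk"
    }
}
```

With this shape a single stemmer instance could be registered as a singleton and called from any thread, and the only allocation per call is the final string, if the caller needs one at all.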
I'm a C# dev, not a C dev, so while I want to help wherever I can, my capacity to help on the code generation side of things would be somewhat limited.