Initial proof of concept: fewer string allocations #1826
base: master
Can you also benchmark with the config option
Sure.
I modified the test set to generate 100_000 rows with random int values, since having just 50 values repeating over and over is not realistic. Here are the numbers with caching enabled:
Dang. I was planning on doing this at some point, so it's great that you could do it. This will be a breaking change, so I may deploy it with some other changes. We'll see.
I would really love to also support netstandard and net4.5 properly, but I didn't find a parser. The span-based parser overloads are only present in netcoreapp.
You might be able to use this: https://www.nuget.org/packages/System.Memory/
There is a lot more work that needs to go into this. All the type converters would need to be switched to use
It seems to be a lot slower when using the field cache. If the data is duplicated, the field cache should help; otherwise it's a lot more processing.
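To illustrate why the cache only pays off for duplicated data, here is a minimal Python sketch of a field cache (a stand-in for the idea, not CsvHelper's implementation; `FieldCache` and `read_column` are hypothetical names). Each raw field is looked up by its bytes: on a hit the previously decoded string is reused, on a miss the field is decoded and stored. With 50 repeating values nearly every lookup hits. With random ints nearly every lookup misses, so the hashing and dictionary inserts are extra work on top of the decoding that still has to happen.

```python
import random


class FieldCache:
    """Hypothetical field cache: reuses one str object per distinct raw field."""

    def __init__(self) -> None:
        self._cache: dict[bytes, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, raw: bytes) -> str:
        cached = self._cache.get(raw)
        if cached is not None:
            self.hits += 1  # duplicate value: no new string needed
            return cached
        # Miss: we still decode (the allocation we hoped to avoid)
        # and additionally pay for hashing and storing the entry.
        self.misses += 1
        text = raw.decode("utf-8")
        self._cache[raw] = text
        return text


def read_column(cache: FieldCache, rows: list[bytes]) -> list[str]:
    """Read one column's raw fields through the cache."""
    return [cache.get(raw) for raw in rows]


# 100_000 rows drawn from 50 repeating values vs. random ints.
repeated = [str(i % 50).encode() for i in range(100_000)]
rng = random.Random(0)
unique = [str(rng.randrange(2**31)).encode() for _ in range(100_000)]

dup_cache, rnd_cache = FieldCache(), FieldCache()
read_column(dup_cache, repeated)
read_column(rnd_cache, unique)
print(dup_cache.hits, dup_cache.misses)  # 99950 50
print(rnd_cache.misses)                  # almost every lookup misses
```

The cache trades a dictionary lookup per field for fewer string objects, which only wins when the same values recur.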
I will try it, but Utf8Parser assumes UTF-8 encoding, so we would need to convert the span of char to a span of byte, which has to be allocated or rented from an array pool and also takes some time.
I agree; I was just proposing the design to get feedback.
Sure, that's possible but breaking. With my approach all old converters (including the ones written by library consumers) still work, and the span-based converter is opt-in. I'm also fine with just converting all string-returning methods to spans, as that won't pollute the API with a bunch of extra overloads.
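A rough Python sketch of that extra transcoding step, assuming ASCII-only numeric fields (one byte per char) and a single reusable `bytearray` standing in for a buffer rented from an array pool; `transcode_ascii` is a hypothetical helper. A byte-based parser could then read the returned view, but every field pays for the copy loop first.

```python
def transcode_ascii(chars: str, start: int, end: int, buffer: bytearray) -> memoryview:
    """Copy chars[start:end] into buffer as ASCII bytes; return a view of the used part.

    Hypothetical helper. Assumes ASCII-only input. The buffer plays the role of a
    pooled array, so it is reused across fields instead of allocating per field.
    The returned view aliases the buffer: consume it before the next call.
    """
    length = end - start
    if length > len(buffer):
        raise ValueError("buffer too small; a real pool would rent a larger one")
    for i in range(length):
        code = ord(chars[start + i])
        if code > 0x7F:
            raise ValueError("non-ASCII char; full UTF-8 encoding would be needed")
        buffer[i] = code
    return memoryview(buffer)[:length]


line = "42,1337,-7"
pooled = bytearray(64)  # reused for every field on the line
view = transcode_ascii(line, 3, 7, pooled)
print(bytes(view))  # b'1337'
```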
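A minimal Python sketch of that opt-in dispatch, with hypothetical names throughout (`SpanConverter`, `convert_from_string`, `convert_from_span`, `convert_field` are not CsvHelper's API). Converters that only know strings keep working because the reader decodes the field for them, while a converter that also implements the span method receives a zero-copy `memoryview` slice of the row (a rough analogue of a span) and no string is created.

```python
from typing import Any, Protocol, runtime_checkable


class StringConverter(Protocol):
    """Existing contract: every converter can convert from a string."""

    def convert_from_string(self, text: str) -> Any: ...


@runtime_checkable
class SpanConverter(Protocol):
    """Hypothetical opt-in contract: converters may also read a raw slice."""

    def convert_from_span(self, buf: memoryview) -> Any: ...


class LegacyIntConverter:
    """Old-style converter (e.g. written by a library consumer): strings only."""

    def convert_from_string(self, text: str) -> int:
        return int(text)


class SpanIntConverter:
    """Opt-in converter: parses unsigned ASCII digits straight from the slice."""

    def convert_from_string(self, text: str) -> int:
        return int(text)

    def convert_from_span(self, buf: memoryview) -> int:
        value = 0
        for b in buf:  # iterating a bytes memoryview yields ints
            value = value * 10 + (b - 0x30)
        return value


def convert_field(converter: StringConverter, row: bytes, start: int, end: int) -> Any:
    view = memoryview(row)[start:end]  # zero-copy slice of the row
    if isinstance(converter, SpanConverter):  # opt-in fast path
        return converter.convert_from_span(view)
    # Fallback: materialize a string so old converters work unchanged.
    return converter.convert_from_string(bytes(view).decode("ascii"))


row = b"1,1337"
print(convert_field(LegacyIntConverter(), row, 2, 6))  # 1337
print(convert_field(SpanIntConverter(), row, 2, 6))    # 1337
```

The design choice is that the reader, not the converter author, decides which path to take, so nothing breaks for converters that never heard of spans.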
No, caching strings is not faster; see my second benchmark post. Another idea would be to cache the converted object instead of the raw value; that way you could eliminate the converter call.
That could be an issue if people are expecting a new object each time. If they have a list of objects and edit one, they could be editing multiple without knowing. Also, that is exactly what Maybe you were talking about the member values and not the object itself. That's an interesting idea. I think that would work fine with
Yup, that is what I meant to say.
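Caching the converted member values rather than the raw strings could look roughly like this Python sketch (hypothetical `ValueCache`, keyed by the raw field bytes). The converter runs once per distinct value and later rows reuse the result. This is only safe for immutable values such as ints, dates, or strings; reusing a mutable object would bring back the shared-edit problem described above.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class ValueCache(Generic[T]):
    """Hypothetical cache from raw field bytes to the converted member value."""

    def __init__(self, convert: Callable[[bytes], T]) -> None:
        self._convert = convert
        self._values: dict[bytes, T] = {}
        self.converter_calls = 0

    def get(self, raw: bytes) -> T:
        try:
            return self._values[raw]  # hit: skip the converter entirely
        except KeyError:
            self.converter_calls += 1
            value = self._convert(raw)  # miss: convert once and remember
            self._values[raw] = value
            return value


cache: ValueCache[int] = ValueCache(lambda raw: int(raw))  # int() accepts bytes
column = [b"3", b"7", b"3", b"3", b"7"]
values = [cache.get(raw) for raw in column]
print(values, cache.converter_calls)  # [3, 7, 3, 3, 7] 2
```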
I don't understand. What is utf8parser? |
The parser in that NuGet package.
Oh... I just meant to be able to use
That's working fine with the dependencies we have, but we need something that turns the spans into ints, longs, DateTimes and so on.
I certainly don't want to replicate the code in https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Number.Parsing.cs
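For a sense of what turning a span into an int involves, here is a deliberately small Python sketch (`parse_int32` is a hypothetical helper) that parses an optionally signed ASCII decimal integer straight out of a byte buffer without creating a substring. Everything it skips (surrounding whitespace, culture-specific signs, thousands separators, hex, exponent forms) is handled by the runtime's Number.Parsing code, which is a good argument for reusing an existing parser rather than writing one.

```python
def parse_int32(buf: memoryview, start: int, end: int) -> int:
    """Parse an optionally signed ASCII decimal integer from buf[start:end].

    Hypothetical helper: creates no substring, but only handles the happy path
    plus a basic Int32 range check.
    """
    if start >= end:
        raise ValueError("empty field")
    negative = buf[start] == ord("-")
    if buf[start] in (ord("-"), ord("+")):
        start += 1
        if start == end:
            raise ValueError("sign without digits")
    value = 0
    for i in range(start, end):
        digit = buf[i] - ord("0")
        if not 0 <= digit <= 9:
            raise ValueError(f"invalid digit at offset {i}")
        value = value * 10 + digit
    if negative:
        value = -value
    # Python ints are unbounded, so check the Int32 range explicitly.
    if not -(2**31) <= value <= 2**31 - 1:
        raise OverflowError("value does not fit in Int32")
    return value


row = memoryview(b"42,-1337,+7")
print(parse_int32(row, 0, 2), parse_int32(row, 3, 8), parse_int32(row, 9, 11))  # 42 -1337 7
```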
Ha. I completely forgot about the converters needing a
Maybe in the case of older frameworks the converters can do a
Yep, that will always work.
When I get some time I'll play around with this idea and see whether I want the parser to return only
Fixes #1825
I used the benchmark project to generate a file with 1,000,000 records and then ran the benchmark to read them with
+40% speed
-100% string allocations when reading records
-100% type[] allocations with the default ctor
If the general design is accepted, I will implement more type converters and come up with a way to check whether a row should be skipped without allocating.
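One possible shape for that check, sketched in Python under the assumption that a row should be skipped when it contains only delimiters and whitespace (the rule and the names `is_blank_row` and `row_bounds` are hypothetical): scan the row's raw bytes and stop at the first meaningful byte, so no field strings are created for rows that end up discarded.

```python
WHITESPACE = frozenset(b" \t\r\n")  # set of byte values


def is_blank_row(buf: memoryview, start: int, end: int, delimiter: int = ord(",")) -> bool:
    """Return True if buf[start:end] holds only delimiters and whitespace."""
    for i in range(start, end):
        b = buf[i]
        if b != delimiter and b not in WHITESPACE:
            return False  # real content found; stop scanning early
    return True


def row_bounds(buf: memoryview):
    """Yield (start, end) offsets of each newline-terminated row (no quoting support)."""
    start = 0
    for i, b in enumerate(buf):
        if b == ord("\n"):
            yield start, i
            start = i + 1
    if start < len(buf):
        yield start, len(buf)


data = memoryview(b"a,b\n , ,\n1,2\n")
kept = [(s, e) for s, e in row_bounds(data) if not is_blank_row(data, s, e)]
print(kept)  # [(0, 3), (9, 12)]
```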