Expose GetFieldSpan on IParser #2142
base: master
I am going to introduce a new parser, written from scratch, that uses SIMD operations for a major performance gain, based on nietras' findings. This parser will use spans and other nice things. This may also lead to a rewrite of CsvReader, since it could use the new parser instead of using strings for everything.
@JoshClose Sounds like a huge endeavor; any time frame in mind? "Major performance gain" makes me drool and I can hardly wait :D
I'm going to get through all the pull requests, then some major bugs, and then I'll start actual work on it. I've done some prototyping to see how it all works, and I think it'll actually be a nicer way of parsing. That being said, it'll all depend on how busy I am. Hopefully within a few months. It should give a huge performance gain based on the numbers at https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers. CsvHelper implements many more features, which in general will slow things down somewhat, but I'd expect double the current speed.
Sounds good to me. I've rebased anyway to resolve conflicts. Feel free to close it otherwise.
Co-authored-by: JanEggers <[email protected]>
This PR changes `private string GetField` to `public ReadOnlySpan<char> GetFieldSpan` on `CsvParser`. It also adds a `RawRecordSpan` property and exposes both on `IParser`.

Aside from any buffer resizes, it allows allocation-free parsing (note the constant Allocated column):
It works by storing processed fields as `Memory<char>` instances over the internal buffer(s) rather than as strings. This requires some tweaks to the usage of `processFieldBuffer` so that it is used on a per-row basis rather than per field.

Because these APIs return a view over an internal buffer, they pose some risk if misused (in particular, when a reference to the returned span is kept across subsequent calls to `Read`). They are therefore deliberately not defined on any of the higher-level reading classes or interfaces, so that they are less discoverable and more likely to be used only by someone who is prepared to take the risk. Their documentation carries a similar warning.

They are not hooked up to any of the record creation logic, but in theory, if the type converters took a `ReadOnlySpan<char>` instead of a `string`, allocations could be reduced there with some simple changes.

The interface additions are defined with default interface methods (DIMs) that defer to the string variants on .NET (Core) targets, but not on the .NET Standard or .NET Framework targets, which do not support DIMs. I'm indifferent as to whether they are defined with DIMs.
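To make the idea of storing a row's fields as `Memory<char>` slices over a reusable buffer concrete, and to show why keeping a returned span across a later read is risky, here is a minimal sketch. It is not CsvParser's implementation: `ReadRow`, the fixed 64-character buffer, and the comma-only splitting (no quoting, escaping, or refills) are illustrative assumptions, and the local `GetFieldSpan` only mirrors the shape of the new API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, heavily simplified parser state (not CsvHelper's code): one reusable
// character buffer plus the current row's fields stored as Memory<char> slices over it.
char[] buffer = new char[64];
var fields = new List<Memory<char>>();

// Copies a line into the shared buffer and records each comma-separated field
// as a slice of that buffer, so no per-field string is allocated.
void ReadRow(string line)
{
    if (line.Length > buffer.Length)
        throw new ArgumentException("Line is too long for this sketch's fixed buffer.");

    line.CopyTo(0, buffer, 0, line.Length);
    fields.Clear(); // slices are reset once per row, not once per field

    int start = 0;
    for (int i = 0; i <= line.Length; i++)
    {
        if (i == line.Length || buffer[i] == ',')
        {
            fields.Add(buffer.AsMemory(start, i - start));
            start = i + 1;
        }
    }
}

// Same shape as the GetFieldSpan idea: a view that is only valid until the next ReadRow.
ReadOnlySpan<char> GetFieldSpan(int index) => fields[index].Span;

ReadRow("42,abc");
ReadOnlySpan<char> kept = GetFieldSpan(1);   // risky: keeps a view into the shared buffer
string copied = GetFieldSpan(1).ToString();  // safe: an explicit copy
Console.WriteLine(kept.ToString()); // abc

ReadRow("7,xyz"); // overwrites the shared buffer
Console.WriteLine(kept.ToString()); // yzc (the stale view now shows characters from the new row)
Console.WriteLine(copied);          // abc
```

Calling `ToString()` on the span, as `copied` does, is the safe way to keep a value beyond the next read, at the cost of exactly the allocation the span API is meant to avoid.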
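On the type-converter point, a converter that accepted `ReadOnlySpan<char>` could parse straight from the parser's buffer. The sketch below uses a hypothetical `ConvertInt32` helper (CsvHelper's converters take a `string`, so this is not its API) built on the `int.Parse(ReadOnlySpan<char>, NumberStyles, IFormatProvider)` overload available since .NET Core 2.1, next to a string-based equivalent for comparison.

```csharp
using System;
using System.Globalization;

// Hypothetical span-based converter: parses the field directly from the
// characters it points at, with no intermediate string.
static int ConvertInt32(ReadOnlySpan<char> text) =>
    int.Parse(text, NumberStyles.Integer, CultureInfo.InvariantCulture);

// String-based equivalent: the caller must first materialize the field as a string.
static int ConvertInt32FromString(string text) =>
    int.Parse(text, NumberStyles.Integer, CultureInfo.InvariantCulture);

char[] row = "17,-5".ToCharArray();           // stands in for the parser's internal buffer
ReadOnlySpan<char> first = row.AsSpan(0, 2);  // "17"
ReadOnlySpan<char> second = row.AsSpan(3, 2); // "-5"

Console.WriteLine(ConvertInt32(first) + ConvertInt32(second)); // 12
Console.WriteLine(ConvertInt32FromString(second.ToString()));  // -5 (ToString allocates a new string)
```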
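The arrangement of defining the interface additions as DIMs that defer to the string variants can be sketched roughly as follows. `IFieldSource` and `LegacyFieldSource` are hypothetical names, the `NETCOREAPP3_0_OR_GREATER` split is an assumption about how the targets might be separated, and the `#else` branch reflects one reading of how the non-DIM targets are handled; none of this is CsvHelper's actual code.

```csharp
using System;

// Hypothetical, trimmed-down stand-in for IParser (the real interface has many more members).
public interface IFieldSource
{
    string this[int index] { get; }
    string RawRecord { get; }

#if NETCOREAPP3_0_OR_GREATER
    // Default interface methods: implementers written before these members existed
    // keep compiling and get a fallback that wraps the string-based members.
    // Wrapping a string in a span saves no allocations by itself; only types that
    // override these members with buffer-backed views do.
    ReadOnlySpan<char> GetFieldSpan(int index) => this[index].AsSpan();
    ReadOnlySpan<char> RawRecordSpan => RawRecord.AsSpan();
#else
    // Assumption: without DIM support the members are ordinary interface members,
    // so every implementer must provide them.
    ReadOnlySpan<char> GetFieldSpan(int index);
    ReadOnlySpan<char> RawRecordSpan { get; }
#endif
}

// An implementer that predates the span members and does not define them.
// It compiles unchanged only where the DIM branch above is active.
public sealed class LegacyFieldSource : IFieldSource
{
    private readonly string[] _fields;

    public LegacyFieldSource(string rawRecord)
    {
        RawRecord = rawRecord;
        _fields = rawRecord.Split(',');
    }

    public string this[int index] => _fields[index];
    public string RawRecord { get; }
}
```

Default interface members are not inherited as members of the implementing class, so the fallback is reachable only through the interface type: `((IFieldSource)source).GetFieldSpan(1)` compiles, while calling `GetFieldSpan` on a `LegacyFieldSource`-typed variable does not.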