Expose GetFieldSpan on IParser #2142

Rob-Hague · 2023-04-02T20:50:06Z

This PR changes private string GetField to public ReadOnlySpan<char> GetFieldSpan on CsvParser. It also adds a RawRecordSpan property and exposes both on IParser.

Aside from any buffer resizes, it allows allocation-free parsing (note the constant Allocated column):

Method	RowCount	ColumnCount	Mean	Error	Gen0	Allocated
Sum_Span	100	10	39.02 μs	0.269 μs	10.1929	20.91 KB
Sum_String	100	10	52.38 μs	0.381 μs	25.4517	52.08 KB
Sum_Span	100	100	334.17 μs	1.405 μs	9.7656	20.91 KB
Sum_String	100	100	494.95 μs	4.586 μs	163.0859	333.33 KB
Sum_Span	1000	10	427.13 μs	1.321 μs	9.7656	20.91 KB
Sum_String	1000	10	592.86 μs	7.818 μs	197.2656	403.64 KB
Sum_Span	1000	100	4,010.94 μs	79.261 μs	7.8125	20.91 KB
Sum_String	1000	100	5,800.72 μs	36.086 μs	1882.8125	3848.96 KB

class Program
{
    static void Main(string[] args)
    {
        BenchmarkRunner.Run<Benchmarks>();
    }
}

[MemoryDiagnoser]
public class Benchmarks
{
    [Params(100, 1_000)]
    public int RowCount { get; set; }

    [Params(10, 100)]
    public int ColumnCount { get; set; }

    private string _csvString;

    [GlobalSetup]
    public void GlobalSetup()
    {
        StringBuilder sb = new();

        for (int i = 0; i < RowCount; i++)
        {
            sb.Append(i * RowCount);

            for(int j = 1; j < ColumnCount; j++)
            {
                sb.Append(',');
                sb.Append(i * RowCount + j);
            }

            sb.AppendLine();
        }

        _csvString = sb.ToString();
    }

    [Benchmark]
    public int Sum_String()
    {
        using (var reader = new StringReader(_csvString))
        using (var parser = new CsvParser(reader, CultureInfo.InvariantCulture))
        {
            int sum = 0;
            while (parser.Read())
            {
                for (int i = 0; i < parser.Count; i++)
                {
                    sum += parser[i].Length;
                }
            }
            return sum;
        }
    }

    [Benchmark]
    public int Sum_Span()
    {
        using (var reader = new StringReader(_csvString))
        using (var parser = new CsvParser(reader, CultureInfo.InvariantCulture))
        {
            int sum = 0;
            while (parser.Read())
            {
                for (int i = 0; i < parser.Count; i++)
                {
                    sum += parser.GetFieldSpan(i).Length;
                }
            }
            return sum;
        }
    }
}

It works by storing processed fields as Memory<char>s over the internal buffer(s) rather than as strings. This requires some tweaks to the usage of processFieldBuffer so that it is used on a per-row basis rather than per-field.

Because these APIs return a view over an internal buffer, they represent some danger if misused (in particular, when keeping a reference to the returned span during subsequent calls to Read). Thus they are explicitly not defined on any of the higher level reading classes/interfaces in order to make them less discoverable, and more likely to only be used by someone who is prepared to take the risk. They are documented similarly.

They are not hooked up to any of the record creation logic, but in theory if the type converters took a ReadOnlySpan<char> instead of a string then allocation reductions could be realised there with some simple changes.

The interface additions are defined with default interface methods (DIMs) deferring to the string variants on .NET (Core) targets, but not for the .NET Standard or Framework targets which do not support DIMs. I'm indifferent as to whether they are defined with DIMs.

Rand-Random · 2023-04-03T21:38:28Z

Seems duplicate of
#1826

03113a5#diff-d16372217c6c2da5d653d00f2d21c0d2e14fe6041a8e387910df4de183b2ff3b

Rob-Hague · 2023-04-04T07:16:14Z

I agree they look similar. And while I would not say that that PR influenced any of my decisions, I have looked at it in the past, and so I have added @JanEggers as a co-author.
That PR is a proof of concept showing how this idea can be extended "up the stack". This PR is a complete unit of work - just at the parser level - which (hopefully) is closer to merging.

JoshClose · 2024-01-24T20:09:49Z

I am going to introduce a new parser written from scratch to utilize SIMD operations for a major performance gain, based on nietras' findings. This parser will be using Spans and other nice things. This may end up causing a rewrite of CsvReader also, as it could utilize this instead of using strings for everything.

Rand-Random · 2024-01-24T20:16:32Z

@JoshClose sounds like a huge endeavor, any time frame in mind? „major performance gain“ makes me drool and I can hardly wait :D

JoshClose · 2024-01-24T20:27:13Z

any time frame in mind?

I'm going to get through all the pull requests, then some major bugs, then I'll start actual work on it. I've done some prototyping to see how it all works, and it'll actually be a nicer way of parsing I think.

That being said, it'll all depend on how busy I am. Hopefully within a few months.

It should get a huge performance gain based on the numbers. https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers CsvHelper does implement many more features, which in general will slow things done some, but I'd expect double the current speed.

Rob-Hague · 2024-01-25T13:32:11Z

Sounds good to me. I've rebased anyway to resolve conflicts. Feel free to close it otherwise

Co-authored-by: JanEggers <[email protected]>

Rob-Hague force-pushed the getfieldspan branch from b9acbb7 to 3ad60c6 Compare April 4, 2023 07:13

Rob-Hague force-pushed the getfieldspan branch from 3ad60c6 to b16ef63 Compare January 25, 2024 13:31

Expose GetFieldSpan on IParser

b7627ff

Co-authored-by: JanEggers <[email protected]>

Rob-Hague force-pushed the getfieldspan branch from b16ef63 to b7627ff Compare January 25, 2024 13:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose GetFieldSpan on IParser #2142

Expose GetFieldSpan on IParser #2142

Rob-Hague commented Apr 2, 2023 •

edited

Loading

Rand-Random commented Apr 3, 2023 •

edited

Loading

Rob-Hague commented Apr 4, 2023

JoshClose commented Jan 24, 2024

Rand-Random commented Jan 24, 2024

JoshClose commented Jan 24, 2024

Rob-Hague commented Jan 25, 2024

Expose GetFieldSpan on IParser #2142

Are you sure you want to change the base?

Expose GetFieldSpan on IParser #2142

Conversation

Rob-Hague commented Apr 2, 2023 • edited Loading

Rand-Random commented Apr 3, 2023 • edited Loading

Rob-Hague commented Apr 4, 2023

JoshClose commented Jan 24, 2024

Rand-Random commented Jan 24, 2024

JoshClose commented Jan 24, 2024

Rob-Hague commented Jan 25, 2024

Rob-Hague commented Apr 2, 2023 •

edited

Loading

Rand-Random commented Apr 3, 2023 •

edited

Loading