
Rewrite ChunkedMemoryStream #2828

Open · wants to merge 10 commits into base: release/3.1.x
Conversation

@JimBobSquarePants (Member) commented Oct 22, 2024

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Fixes #2806

ChunkedMemoryStream contained multiple writing bugs and was too costly to fix/maintain relative to performance benefits, so I'm just ditching it.

  • Complete rewrite of ChunkedMemoryStream to simplify it and fix numerous bugs.
  • Ensure ImageEncoder uses the chunked stream when encoding non-seekable streams.

@antonfirsov (Member) commented Oct 22, 2024

> relative to performance benefits

IMO the memory benefits were significant. With the current decoder & encoder design, non-seekable streams are always fully buffered into memory. A switch to MemoryStream will reintroduce significant GC allocations for large inputs all around the library, which will be a noticeable regression for users who deal with HTTP or other kinds of network streams.

Recommendations:

  • In case there is no bug in stream writing, do not remove ChunkedMemoryStream for decoders, since the largest streams are typically the input ones.
  • Instead of using MemoryStream for encoders, implement our own non-chunked MemoryStream that still uses MemoryAllocator. Although buffers over 4MB won't be pooled, in typical scenarios encoded files are smaller. (A rough sketch of this idea follows below.)
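To make the second recommendation concrete, here is a rough, hypothetical sketch of a non-chunked, MemoryAllocator-backed write stream for encoder output. The class name and shape are illustrative only, not part of this PR or ImageSharp's API; the only library call assumed is MemoryAllocator.Allocate<byte>(int).

// Hypothetical sketch only (not ImageSharp code): a non-chunked write stream that
// rents its single backing buffer from MemoryAllocator and grows it by doubling.
using System;
using System.Buffers;
using System.IO;
using SixLabors.ImageSharp.Memory;

internal sealed class PooledWriteStream : Stream
{
    private readonly MemoryAllocator allocator;
    private IMemoryOwner<byte> buffer;
    private int length;

    public PooledWriteStream(MemoryAllocator allocator, int initialCapacity = 81920)
    {
        this.allocator = allocator;
        this.buffer = allocator.Allocate<byte>(initialCapacity);
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => this.length;
    public override long Position { get => this.length; set => throw new NotSupportedException(); }

    public override void Write(ReadOnlySpan<byte> source)
    {
        this.EnsureCapacity(this.length + source.Length);
        source.CopyTo(this.buffer.Memory.Span[this.length..]);
        this.length += source.Length;
    }

    public override void Write(byte[] buffer, int offset, int count)
        => this.Write(buffer.AsSpan(offset, count));

    // Copy the buffered bytes to the real (non-seekable) target stream.
    public void WriteTo(Stream destination)
        => destination.Write(this.buffer.Memory.Span[..this.length]);

    public override void Flush()
    {
    }

    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    private void EnsureCapacity(int required)
    {
        if (required <= this.buffer.Memory.Length)
        {
            return;
        }

        // Buffers above the allocator's pooling threshold simply come from the GC heap.
        IMemoryOwner<byte> grown = this.allocator.Allocate<byte>(Math.Max(required, this.buffer.Memory.Length * 2));
        this.buffer.Memory.Span[..this.length].CopyTo(grown.Memory.Span);
        this.buffer.Dispose();
        this.buffer = grown;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            this.buffer.Dispose();
        }

        base.Dispose(disposing);
    }
}

With something along these lines, encoder output could keep using pooled buffers while a chunked stream stays reserved for buffering non-seekable decoder input.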

@JimBobSquarePants changed the title from "Remove ChunkedMemoryStream" to "Rewrite ChunkedMemoryStream" on Oct 23, 2024
@JimBobSquarePants (Member Author)

> relative to performance benefits
>
> IMO the memory benefits were significant. With the current decoder & encoder design, non-seekable streams are always fully buffered into memory. A switch to MemoryStream will reintroduce significant GC allocations for large inputs all around the library, which will be a noticeable regression for users who deal with HTTP or other kinds of network streams.
>
> Recommendations:
>
>   • In case there is no bug in stream writing, do not remove ChunkedMemoryStream for decoders, since the largest streams are typically the input ones.
>   • Instead of using MemoryStream for encoders, implement our own non-chunked MemoryStream that still uses MemoryAllocator. Although buffers over 4MB won't be pooled, in typical scenarios encoded files are smaller.

Thanks for the review @antonfirsov. I've instead chosen to completely rewrite the ChunkedMemoryStream to simplify the implementation. It's much easier to maintain now!

public override void Flush()
{
_ = this.Read(this.singleByteBuffer, 0, 1);
return MemoryMarshal.GetReference<byte>(this.singleByteBuffer);
@mgravell commented Oct 30, 2024

since the byte[] path goes via AsSpan(), why not use the span approach directly?

Span<byte> buffer = stackalloc byte[1];
return Read(buffer) == 1 ? buffer[0] : -1;

ideally with SkipLocalsInit enabled

alt that elides a range check:

byte b = 0;
return Read(MemoryMarshal.CreateSpan(ref b, 1)) == 1 ? b : -1;

(you can't do this for the async path, though)

@JimBobSquarePants (Member Author)

Ah... I'd completely forgotten about that method!

Rather than decorating a public method with [SkipLocalsInit] I've opted for the following:

/// <inheritdoc/>
public override int ReadByte()
{
    Unsafe.SkipInit(out byte b);
    return this.Read(MemoryMarshal.CreateSpan(ref b, 1)) == 1 ? b : -1;
}

Member

Looks much better!

int offset = 0;
int count = buffer.Length;
while (count > 0)
while (bytesToRead != 0 && this.currentChunk != this.memoryChunkBuffer.Length)


I don't know enough about the underlying implementation here; if this is doing additional downstream reads, you might prefer to exit after the first read (you're only required to read "some" data - you don't need to fill the supplied buffer, just return at least 1 byte, or 0 for EOF). If all the data is already loaded, I wonder whether your memoryChunkBuffer is duplicating the innards of ReadOnlySequence<byte> - that already has all the Slice, CopyTo etc. you might want; just a suggestion, though (I can help you grok ReadOnlySequence<T> if you're not already familiar).

@JimBobSquarePants (Member Author)

Yeah, looking at MemoryChunkBuffer there is definitely some overlap. Ideally, I should be tracking the buffer and chunk indexes internally within that class.

However, this type needs to be expandable on-demand, which AFAIK is not possible with ReadOnlySequence<T>.

@mgravell commented Oct 31, 2024

The underlying buffer-chain is as mutable as you want it to be (it is your chain, ultimately); if you want to resize it, that is usually as simple as creating a new ROS (which is a lightweight struct just tracking the start and end), specifying the new bounds. The chain bits aren't trivial, but not too complex. I guess if what you already have works well, it might be overkill to touch it, though.
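For readers unfamiliar with the type, this is roughly what the segment-chain approach looks like; a minimal hedged sketch with illustrative names, not the PR's MemoryChunkBuffer:

using System;
using System.Buffers;

// Each pooled chunk becomes a node in a linked chain of segments.
internal sealed class ChunkSegment : ReadOnlySequenceSegment<byte>
{
    public ChunkSegment(ReadOnlyMemory<byte> memory) => this.Memory = memory;

    // Append a new chunk to the chain and return it so appends can continue.
    public ChunkSegment Append(ReadOnlyMemory<byte> memory)
    {
        ChunkSegment next = new(memory)
        {
            RunningIndex = this.RunningIndex + this.Memory.Length,
        };

        this.Next = next;
        return next;
    }
}

// Usage: whenever the chain grows, rebuild the (cheap, struct) sequence over the
// current bounds; Slice, CopyTo and ToArray then come for free.
// ChunkSegment first = new(chunkA);
// ChunkSegment last = first.Append(chunkB).Append(chunkC);
// ReadOnlySequence<byte> sequence = new(first, 0, last, last.Memory.Length);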

}

return chunkBuffer.GetSpan()[this.readOffset++];
MemoryMarshal.Write(this.singleByteBuffer, ref value);


ditto stackalloc; or possibly even the more exotic:

byte b = 0;
var span = MemoryMarshal.CreateSpan(ref b, 1);

@JimBobSquarePants (Member Author)

/// <inheritdoc/>
public override void WriteByte(byte value)
    => this.Write(MemoryMarshal.CreateSpan(ref value, 1));

/// </summary>
/// <returns>The <see cref="T:byte[]"/>.</returns>
/// <returns>A new <see cref="T:byte[]"/>.</returns>


Note that this is very inefficient; I would suggest trying to deprecate this kind of API - especially if we can use ROS, for example:

[Obsolete("prefer " + nameof(AsReadOnlySequence))] public byte[] ToArray() => AsReadOnlySequence().ToArray();
public ReadOnlySequence<byte> AsReadOnlySequence() => /* magic happens */

@JimBobSquarePants (Member Author) commented Oct 31, 2024

Unfortunately, we need this when reading XMP data for the V3 build; however, I wish to rewrite the XMPProfile type for V4 to avoid passing arrays around. Once this is merged to the V3 branch I'll upstream and make additional changes.

@antonfirsov (Member)

@JimBobSquarePants I believe I will have some time to also review this over the weekend.

@@ -12,44 +14,23 @@ namespace SixLabors.ImageSharp.IO;
/// Chunks are allocated by the <see cref="MemoryAllocator"/> assigned via the constructor
/// and is designed to take advantage of buffer pooling when available.
/// </summary>
internal sealed class ChunkedMemoryStream : Stream
public class ChunkedMemoryStream : Stream


Consider reverting this back to being sealed. I.e. is this class designed to be sub-classed? If not, then sealing can allow the JITter to make certain optimisations around calls to the virtual/override methods.

@JimBobSquarePants (Member Author)

It's supposed to be internal sealed actually! I forgot to change it back after rewriting (public makes the IDE tell me to add method docs).

private readonly int allocatorCapacity;

// Has the stream been disposed.
private long length;


Sidenote: IMO it's fairly conventional in C# land to prefix private field names with an underscore, to allow easy distinction from local variables and to avoid excessive use of "this.". I assume this is your personal preference, but thought I'd mention it, as IMO it is somewhat non-idiomatic C#.

@JimBobSquarePants (Member Author)

Thanks, but I'd rather stick to using the language as designed rather than using conventions carried over from C.

The this keyword provides important context IMO and encourages consistency throughout a codebase.

Member

@colgreen the project's preferred coding style is based on Framework design guidelines, and on StyleCop recommendations. The guidelines explicitly prohibit prefixing variables.


The guidelines state "Internal and private fields are not covered by guidelines" (I believe those guidelines are primarily related to the public API surface, rather than private/internal naming). The underscore prefix for private fields is very common in my experience, e.g. it's used widely in Microsoft .NET repos.

However, this topic is probably not relevant in the context of this PR :)

// Has the stream been disposed.
private long length;
private long position;
private int currentChunk;


These names could be considered a little misleading/confusing.

E.g. currentChunk is an index into memoryChunkBuffer, so I think maybe call it currentChunkIdx.

Whereas currentChunkIdx is an offset/index within the current chunk, so maybe call it intraChunkByteIdx, chunkByteIdx, or chunkByteOffset? etc.

@JimBobSquarePants (Member Author)

Yeah.. good point. I've opted for bufferIndex and chunkIndex as an improvement.

chunk.Dispose();
chunk = chunk.Next;
this.Dispose(true);
GC.SuppressFinalize(this);


I think it's not necessary to call GC.SuppressFinalize(this) in a sealed class with no finalizer. This would be to cover sub-types that have a finalizer (in scenarios where there is no finalizer defined directly on the type).

@JimBobSquarePants (Member Author)

Force of habit. Well spotted!

@antonfirsov (Member) left a comment

  • Previous reviewers had good points.
  • Added some nitpicking.
  • Improving functional coverage would be valuable.

Looks good otherwise.

return i < 16 ? b128K * (1 << (int)((uint)i / 4)) : b4M;
}

private void Dispose(bool disposing)
Member

Given https://github.com/SixLabors/ImageSharp/pull/2828/files#r1823357071, the Dispose(bool disposing) is not even needed.

Same for MemoryChunk.

@JimBobSquarePants (Member Author)

That's an override of the base Stream method though; I can't implement Dispose() directly. All other implementations have been simplified now.
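For context on the pattern being discussed, a minimal sketch (placeholder class, not the PR's code): the base Stream.Dispose() already calls Dispose(true) and GC.SuppressFinalize(this), so a sealed subclass with no finalizer only supplies the protected override.

using System.IO;

internal sealed class DisposeSketchStream : Stream
{
    private bool isDisposed;

    // Stream.Dispose() -> Close() -> Dispose(true) + GC.SuppressFinalize(this),
    // so nothing more is needed in a sealed type without a finalizer.
    protected override void Dispose(bool disposing)
    {
        if (!this.isDisposed && disposing)
        {
            // Return pooled chunks to the allocator here.
        }

        this.isDisposed = true;
        base.Dispose(disposing);
    }

    // Remaining abstract members stubbed out only to keep the sketch compilable.
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => 0;
    public override long Position { get => 0; set { } }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => 0;
    public override long Seek(long offset, SeekOrigin origin) => 0;
    public override void SetLength(long value) { }
    public override void Write(byte[] buffer, int offset, int count) { }
}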

Comment on lines 216 to 221
if (remaining > count)
{
remaining = count;
}

Span<byte> chunkBuffer = this.writeChunk.Buffer.GetSpan();
int chunkSize = this.writeChunk.Length;
int count = buffer.Length;
int offset = 0;
while (count > 0)
int bytesToWrite = (int)remaining;
Member

remaining is not being used after this line.

-        if (remaining > count)
-        {
-            remaining = count;
-        }

-        int bytesToWrite = (int)remaining;
+        int bytesToWrite = count;

Same for Read.

Comment on lines 446 to 450
public IEnumerator<MemoryChunk> GetEnumerator()
=> ((IEnumerable<MemoryChunk>)this.memoryChunks).GetEnumerator();

IEnumerator IEnumerable.GetEnumerator()
=> ((IEnumerable)this.memoryChunks).GetEnumerator();
Member

I don't see any code enumerating this with foreach, so IEnumerable implementation can be deleted.


@@ -30,7 +30,7 @@ public class ChunkedMemoryStreamTests
[Fact]
Member

Is it possible to extend these tests to stress the corner cases which were buggy in the previous implementation?

@JimBobSquarePants (Member Author)

Yeah. Tests have been massively expanded. We now test reading larger buffers and test encoding to WebP for all test images.

Comment on lines 107 to 108
[InlineData(DefaultSmallChunkSize * 16)]
public void MemoryStream_ReadByteBufferSpanTest(int length)
@antonfirsov (Member) commented Nov 3, 2024

I would also include lengths over DefaultSmallChunkSize * 16 and make buffer.Length a parameter. That would cover cases where buffer.Length > DefaultSmallChunkSize.
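Something along these lines would cover it; a hedged sketch in the style of the existing xunit tests, assuming the ChunkedMemoryStream(MemoryAllocator) constructor, a seekable stream after writing, and a hypothetical CreateRandomBytes helper:

[Theory]
[InlineData(DefaultSmallChunkSize * 16, 128)]
[InlineData(DefaultSmallChunkSize * 32, DefaultSmallChunkSize * 2)]
public void MemoryStream_ReadSpan_VariableBufferLength(int length, int bufferLength)
{
    byte[] expected = CreateRandomBytes(length); // hypothetical helper
    using ChunkedMemoryStream stream = new(MemoryAllocator.Create());
    stream.Write(expected);
    stream.Position = 0;

    byte[] actual = new byte[length];
    Span<byte> buffer = new byte[bufferLength];
    int read = 0;
    while (read < length)
    {
        int n = stream.Read(buffer);
        Assert.True(n > 0);
        buffer[..n].CopyTo(actual.AsSpan(read));
        read += n;
    }

    Assert.Equal(expected, actual);
}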

@JimBobSquarePants (Member Author)

length is already determined by the parameter, but I've expanded it to double the previous maximum length.

@@ -167,18 +167,20 @@ public void MemoryStream_WriteToTests()
[Fact]
public void MemoryStream_WriteToSpanTests()
Member

Write tests could stress more cases with different sizes. See my comment on the read tests.

int offset = 0;
int count = buffer.Length;
while (count > 0)
while (bytesToRead != 0 && this.currentChunk != this.memoryChunkBuffer.Length)
@antonfirsov (Member) commented Nov 3, 2024

Even though the current code looks good and Slice would throw for a negative number, I find this safer to maintain. Same for Write.

Suggested change:
- while (bytesToRead != 0 && this.currentChunk != this.memoryChunkBuffer.Length)
+ while (bytesToRead > 0 && this.currentChunk != this.memoryChunkBuffer.Length)
