DispatchIO.read spinning while pipe is open on Windows #820

Open · DimaKozhinov opened this issue Mar 6, 2024 · 37 comments · May be fixed by #853 or #854

@DimaKozhinov commented Mar 6, 2024

sourcekit-lsp.exe consumes ~24% of CPU power all the time, even when idle. This bug does not appear under Linux and does not appear in Swift versions earlier than 5.9; it is still present in 5.10. I tested this under VirtualBox; my virtual machine has 4 virtual processors. The host machine has a 13th Gen Intel Core i7 processor, so a ~24% load amounts to a lot of mysterious computation.

How to reproduce:

  1. Install Swift 5.9 or 5.10. Install Microsoft VS Code and the Swift extension for it (v1.8.1 from the Swift Server Work Group).
  2. Create a test Swift package:
    mkdir test
    cd test
    swift package init
  3. Open the test folder in VS Code (File | Open Folder...). At this point CPU usage increases; you don't even need to edit any source file. On my (usually silent) PC, the fan immediately starts spinning much faster, producing loud noise.
  4. Check the CPU usage of sourcekit-lsp.exe: press Ctrl-Shift-Esc, find Visual Studio Code in the process list, click ">" next to it to show its sub-processes, and observe the high CPU usage of sourcekit-lsp.
  5. Return to step 1 and this time install Swift 5.8.1 or earlier. Repeat step 2 (create the Swift package), because an earlier version of Swift won't open a package created with a later one. Repeat the remaining steps and observe ~0% CPU usage by sourcekit-lsp.exe.

This bug has nothing to do with VS Code: I originally found it using my own app that runs sourcekit-lsp.exe, not VS Code.

@ahoppen (Member) commented Mar 6, 2024

Tracked in Apple’s issue tracker as rdar://124157417

@litewrap commented Mar 6, 2024

Same problem here, using Swift on Windows 11. I am running Windows 11 under Parallels Desktop on a MacBook M1.
With VS Code 1.8 or the recent 1.8.1, Task Manager shows sourcekit-lsp CPU usage from 25% (a single Swift file open) to 71% (many .swift files open).

@DimaKozhinov (Author)

To rule out the influence of virtual environments like VirtualBox or Parallels, I've tested this on bare metal (3 GHz Intel Core i5-7400, Windows 10):

Swift 5.8.1 sourcekit-lsp CPU usage: 0% (the expected behavior)
Swift 5.10 sourcekit-lsp CPU usage: 29%

Small note: this time sourcekit-lsp appeared in Task Manager not as a sub-process of Visual Studio Code, but as a separate background process, for some reason.

@nieuwenhoven

Same problem. Running Swift 5.10 directly on Windows 11 x64

@ahoppen (Member) commented Mar 13, 2024

I have managed to reduce the problem to the following.

  1. Replace Sources/sourcekit-lsp/SourceKitLSP.swift with the following main function that doesn’t do anything and just causes sourcekit-lsp to listen for data from stdin.
Sources/sourcekit-lsp/SourceKitLSP.swift

import Dispatch
import Foundation

#if canImport(CDispatch)
import struct CDispatch.dispatch_fd_t
#endif

@main
struct MyCommand {
  static func main() throws {
    let fh = try! FileHandle(forWritingTo: URL(fileURLWithPath: #"C:/Users/rintaro/out.txt"#))
    try! fh.seekToEnd()
    try! fh.write("start\n".data(using: .utf8)!)

    let queue = DispatchQueue(label: "jsonrpc-queue", qos: .userInitiated)
    #if os(Windows)
    let rawInFD = dispatch_fd_t(bitPattern: FileHandle.standardInput._handle)
    #else
    let rawInFD = FileHandle.standardInput.fileDescriptor
    #endif

    let receiveIO = DispatchIO(type: .stream, fileDescriptor: rawInFD, queue: queue) { (error: Int32) in
      if error != 0 {
        print("IO error \(error)")
      }
    }
    receiveIO.setLimit(lowWater: 1)
    receiveIO.read(offset: 0, length: Int.max, queue: queue) { done, data, errorCode in
      print("received \(data?.count ?? -1) data")
      // Append whatever was received to the log file.
      let fh = try! FileHandle(forWritingTo: URL(fileURLWithPath: #"C:/Users/rintaro/out.txt"#))
      try! fh.seekToEnd()
      try! fh.write(contentsOf: data!)
    }

    dispatchMain()
  }
}

  2. Build a local toolchain.
  3. Launch Visual Studio Code with the locally modified version of sourcekit-lsp by running
set SDKROOT=S:\Program Files\Swift\Platforms\Windows.platform\Developer\SDKs\Windows.sdk
path S:\Program Files\Swift\Runtimes\0.0.0\usr\bin;S:\Program Files\Swift\Toolchains\0.0.0+Asserts\usr\bin;%PATH%
"C:\path\to\Microsoft VS Code\Code.exe" C:\path\to\a\swiftpm\project
  4. Open Task Manager and notice that sourcekit-lsp is constantly using CPU.

I was unable to reproduce this issue by launching the modified sourcekit-lsp from the command prompt and passing data to stdin, either by typing or by piping data to it using

Get-Content C:\path\to\input.txt | .\sourcekit-lsp.exe

With the following input.txt

input.txt

Content-Length: 5887

{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"processId":1948,"clientInfo":{"name":"Visual Studio Code","version":"1.87.2"},"locale":"en","rootPath":"s:\\SourceCache\\sourcekit-lsp","rootUri":"file:///s%3A/SourceCache/sourcekit-lsp","capabilities":{"workspace":{"applyEdit":true,"workspaceEdit":{"documentChanges":true,"resourceOperations":["create","rename","delete"],"failureHandling":"textOnlyTransactional","normalizesLineEndings":true,"changeAnnotationSupport":{"groupsOnLabel":true}},"configuration":true,"didChangeWatchedFiles":{"dynamicRegistration":true,"relativePatternSupport":true},"symbol":{"dynamicRegistration":true,"symbolKind":{"valueSet":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]},"tagSupport":{"valueSet":[1]},"resolveSupport":{"properties":["location.range"]}},"codeLens":{"refreshSupport":true},"executeCommand":{"dynamicRegistration":true},"didChangeConfiguration":{"dynamicRegistration":true},"workspaceFolders":true,"foldingRange":{"refreshSupport":true},"semanticTokens":{"refreshSupport":true},"fileOperations":{"dynamicRegistration":true,"didCreate":true,"didRename":true,"didDelete":true,"willCreate":true,"willRename":true,"willDelete":true},"inlineValue":{"refreshSupport":true},"inlayHint":{"refreshSupport":true},"diagnostics":{"refreshSupport":true}},"textDocument":{"publishDiagnostics":{"relatedInformation":true,"versionSupport":false,"tagSupport":{"valueSet":[1,2]},"codeDescriptionSupport":true,"dataSupport":true},"synchronization":{"dynamicRegistration":true,"willSave":true,"willSaveWaitUntil":true,"didSave":true},"completion":{"dynamicRegistration":true,"contextSupport":true,"completionItem":{"snippetSupport":true,"commitCharactersSupport":true,"documentationFormat":["markdown","plaintext"],"deprecatedSupport":true,"preselectSupport":true,"tagSupport":{"valueSet":[1]},"insertReplaceSupport":true,"resolveSupport":{"properties":["documentation","detail","additionalTextEdits"]},"insertTextModeSupport":{"valueSet":[1,2]},"labelDetailsSupport":true},"insertTextMode":2,"completionItemKind":{"valueSet":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]},"completionList":{"itemDefaults":["commitCharacters","editRange","insertTextFormat","insertTextMode","data"]}},"hover":{"dynamicRegistration":true,"contentFormat":["markdown","plaintext"]},"signatureHelp":{"dynamicRegistration":true,"signatureInformation":{"documentationFormat":["markdown","plaintext"],"parameterInformation":{"labelOffsetSupport":true},"activeParameterSupport":true},"contextSupport":true},"definition":{"dynamicRegistration":true,"linkSupport":true},"references":{"dynamicRegistration":true},"documentHighlight":{"dynamicRegistration":true},"documentSymbol":{"dynamicRegistration":true,"symbolKind":{"valueSet":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]},"hierarchicalDocumentSymbolSupport":true,"tagSupport":{"valueSet":[1]},"labelSupport":true},"codeAction":{"dynamicRegistration":true,"isPreferredSupport":true,"disabledSupport":true,"dataSupport":true,"resolveSupport":{"properties":["edit"]},"codeActionLiteralSupport":{"codeActionKind":{"valueSet":["","quickfix","refactor","refactor.extract","refactor.inline","refactor.rewrite","source","source.organizeImports"]}},"honorsChangeAnnotations":true},"codeLens":{"dynamicRegistration":true},"formatting":{"dynamicRegistration":true},"rangeFormatting":{"dynamicRegistration":true,"rangesSupport":true},"onTypeFormatting":{"dynamicRegistration":true},"rename":{"dynamicRegistration":true,"prepareSu
pport":true,"prepareSupportDefaultBehavior":1,"honorsChangeAnnotations":true},"documentLink":{"dynamicRegistration":true,"tooltipSupport":true},"typeDefinition":{"dynamicRegistration":true,"linkSupport":true},"implementation":{"dynamicRegistration":true,"linkSupport":true},"colorProvider":{"dynamicRegistration":true},"foldingRange":{"dynamicRegistration":true,"rangeLimit":5000,"lineFoldingOnly":true,"foldingRangeKind":{"valueSet":["comment","imports","region"]},"foldingRange":{"collapsedText":false}},"declaration":{"dynamicRegistration":true,"linkSupport":true},"selectionRange":{"dynamicRegistration":true},"callHierarchy":{"dynamicRegistration":true},"semanticTokens":{"dynamicRegistration":true,"tokenTypes":["namespace","type","class","enum","interface","struct","typeParameter","parameter","variable","property","enumMember","event","function","method","macro","keyword","modifier","comment","string","number","regexp","operator","decorator"],"tokenModifiers":["declaration","definition","readonly","static","deprecated","abstract","async","modification","documentation","defaultLibrary"],"formats":["relative"],"requests":{"range":true,"full":{"delta":true}},"multilineTokenSupport":false,"overlappingTokenSupport":false,"serverCancelSupport":true,"augmentsSyntaxTokens":true},"linkedEditingRange":{"dynamicRegistration":true},"typeHierarchy":{"dynamicRegistration":true},"inlineValue":{"dynamicRegistration":true},"inlayHint":{"dynamicRegistration":true,"resolveSupport":{"properties":["tooltip","textEdits","label.tooltip","label.location","label.command"]}},"diagnostic":{"dynamicRegistration":true,"relatedDocumentSupport":false}},"window":{"showMessage":{"messageActionItem":{"additionalPropertiesSupport":true}},"showDocument":{"support":true},"workDoneProgress":true},"general":{"staleRequestSupport":{"cancel":true,"retryOnContentModified":["textDocument/semanticTokens/full","textDocument/semanticTokens/range","textDocument/semanticTokens/full/delta"]},"regularExpressions":{"engine":"ECMAScript","version":"ES2020"},"markdown":{"parser":"marked","version":"1.1.0"},"positionEncodings":["utf-16"]},"notebookDocument":{"synchronization":{"dynamicRegistration":true,"executionSummarySupport":true}}},"trace":"verbose","workspaceFolders":[{"uri":"file:///s%3A/SourceCache/sourcekit-lsp","name":"sourcekit-lsp"}]}}

@ahoppen (Member) commented Mar 13, 2024

@tristanlabelle This sounds like a similar issue to swiftlang/sourcekit-lsp#752, which you fixed in #796. Do you have any idea what might be going wrong here?

@tristanlabelle (Contributor)

That's very interesting, thanks for the reduced repro. It points to another pipe-handling problem in libdispatch, likely on the reading side, but I don't know what it might be. What kind of received <N> data output do you get when testing this? And does that print send a broken response back to VS Code and cause it to misbehave?

@ahoppen (Member) commented Mar 14, 2024

I just managed to reduce it even further without any dependency on VS Code.

It appears that DispatchIO.read is constantly using CPU time until the pipe to it has been closed.

To reproduce this one:

  • Unzip the attached package
  • swift build (to build both parent.exe and child.exe)
  • Run .build\debug\parent.exe
  • Notice that child.exe is constantly using CPU without any work happening.

testpackage.zip
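
The package itself isn't inlined here, but the shape of the repro is simple: the parent hands child.exe a pipe as its stdin and then never writes to it, while the child listens with DispatchIO.read (much like the reduced SourceKitLSP.swift above). A rough Win32 sketch of the parent side, just to illustrate the condition (the real test package is a Swift package, and these names are made up):

// Illustrative parent only; the attached testpackage.zip contains the real
// (Swift) parent and child. The point is just: open a pipe to child.exe's
// stdin and never write to it.
#include <windows.h>

int main(void) {
    SECURITY_ATTRIBUTES sa = { sizeof(sa), NULL, TRUE };  // inheritable handles
    HANDLE childStdinRead = NULL, childStdinWrite = NULL;
    if (!CreatePipe(&childStdinRead, &childStdinWrite, &sa, 0))
        return 1;
    // Keep the write end on the parent side only.
    SetHandleInformation(childStdinWrite, HANDLE_FLAG_INHERIT, 0);

    STARTUPINFOA si = { sizeof(si) };
    si.dwFlags = STARTF_USESTDHANDLES;
    si.hStdInput = childStdinRead;
    si.hStdOutput = GetStdHandle(STD_OUTPUT_HANDLE);
    si.hStdError = GetStdHandle(STD_ERROR_HANDLE);

    PROCESS_INFORMATION pi;
    char cmd[] = "child.exe";
    if (!CreateProcessA(NULL, cmd, NULL, NULL, TRUE, 0, NULL, NULL, &si, &pi))
        return 1;

    // Never write to childStdinWrite and never close it: the child's stdin pipe
    // stays open but empty, which is exactly when DispatchIO.read starts spinning.
    WaitForSingleObject(pi.hProcess, INFINITE);
    return 0;
}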

@tristanlabelle (Contributor)

Nice repro, @ahoppen! Let me know if this gets into gnarly libdispatch Win32 pipe semantics and you need help. We're tracking this bug as affecting our developers, and we could lend a hand with the investigation.

ahoppen transferred this issue from swiftlang/sourcekit-lsp on Mar 14, 2024
ahoppen changed the title from "Excessive CPU usage even when idle (Windows platform)" to "DispatchIO.read spinning while pipe is open on Windows" on Mar 14, 2024
@ahoppen (Member) commented Mar 15, 2024

@tristanlabelle I have reached that stage. If you could investigate it, I would really appreciate it.

@tristanlabelle (Contributor)

Hi @ahoppen , we'll look into it. Currently our priorities in this area are roughly:

  1. The SourceKit-LSP file locking issue
  2. Frequent crashes on Windows with Assertion failed: IDAndOffset.first.isValid() && "invalid FileID for transition" (sourcekit-lsp#1139)
  3. This issue

@ahoppen (Member) commented Jul 31, 2024

It looks like we are spending the time in _dispatch_event_loop_leave_immediate. @tristanlabelle Have you seen this before in your previous fixes of hangs in libdispatch?

(Screenshot: profiler trace from 2024-07-30 showing the time spent in _dispatch_event_loop_leave_immediate)

Edit: Sorry, I just realized that I already reached this debugging state a few months ago. I remembered that you looked into Dispatch before but forgot that the DispatchIO issue was still open.

@tristanlabelle (Contributor)

@ahoppen I don't remember seeing this before specifically. The trace will be useful though, as this is next on the priority list. We got through the file locking issue, and I don't think we're hitting the crash anymore (from my list in the previous comment). fyi @z2oh

@ahoppen (Member) commented Jul 31, 2024

That would be amazing. I wonder what kind of fix this will be.

@z2oh commented Aug 6, 2024

I've figured out what's causing this, but I'm not sure of a proper fix yet.

By default, Windows pipes are blocking on both the read and write side (PIPE_WAIT).

libdispatch seems to expect non-blocking write semantics on pipes, so this was implemented on Windows in #781 by setting the PIPE_NOWAIT flag here:
https://github.com/apple/swift-corelibs-libdispatch/blob/85369bae730669dcee18b771f27f885250971900/src/io.c#L1440-L1456

Perhaps surprisingly, this also changes the read side of the pipe to do non-blocking reads. The original Windows pipe-event implementation (#501) took advantage of Windows' blocking-by-default read semantics and spawned a lightweight thread to call ReadFile on the pipe in a loop.

I'm not sure when ReadFile on a PIPE_WAIT pipe would result in ERROR_NO_DATA (the docs don't describe this case in detail), but the dispatch code handles this error by restarting the loop. Under PIPE_NOWAIT, ERROR_NO_DATA is returned immediately on a read of an empty pipe, turning this code into a CPU-hogging polling loop.

https://github.com/apple/swift-corelibs-libdispatch/blob/85369bae730669dcee18b771f27f885250971900/src/event/event_windows.c#L299-L303

The comment in the code here seems to indicate that this error implies we have a non-blocking pipe (which is true as we've seen), but I can't intuit why that should cause the loop to restart. Perhaps there was an assumption that the pipe is still initializing and would eventually be set to PIPE_WAIT? I'm not sure.
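
For intuition, here is a stripped-down paraphrase of what the monitoring loop amounts to; this is not the actual libdispatch code (see the event_windows.c link above for the real implementation), and the helper name is made up:

#include <windows.h>

// Hypothetical helper standing in for the real signaling/wait logic.
void notify_reader_and_wait(HANDLE hPipe);

// Simplified paraphrase of the pipe monitor thread's loop, for illustration only.
// The 0-byte ReadFile is used purely as a "wait until data is available" primitive.
void monitor_pipe(HANDLE hPipe)
{
    for (;;) {
        DWORD bytesRead = 0;
        BOOL ok = ReadFile(hPipe, NULL, 0, &bytesRead, NULL);
        if (!ok && GetLastError() == ERROR_NO_DATA) {
            // PIPE_WAIT:   the 0-byte read blocks until data arrives, so this
            //              branch is rarely hit and retrying is harmless.
            // PIPE_NOWAIT: the read returns immediately on an empty pipe, so
            //              this retry turns into a CPU-hogging polling loop.
            continue;
        }
        // Data is available (or the pipe closed): wake the actual reader and
        // wait for it to drain the pipe before polling again.
        notify_reader_and_wait(hPipe);
    }
}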

#781 was cherry-picked into Swift 5.9 (and after), which explains why this issue isn't present in earlier versions.

I've built a local dispatch.dll that switches the pipe back to PIPE_WAIT and I no longer observe the perpetually spinning thread in sourcekit-lsp or in the test repro provided by @ahoppen. Presumably, this reintroduces the bug(s) that #781 resolved, so the actual fix will need to be more sophisticated. I'll continue to look into this.

@z2oh commented Aug 15, 2024

I've done some more investigation here:

The key insight is that the PIPE_WAIT pipe was cleverly overloaded to perform both data transfer and synchronization (via blocking on a 0-byte read). As far as I can tell, a PIPE_NOWAIT pipe requires some external signaling mechanism to achieve the same kind of synchronization. Because external writers may only have access to the file descriptor of the write side of the pipe, there's no way to get reliable synchronization from outside of libdispatch. Even overlapped I/O (the preferred mechanism for non-blocking pipe semantics) requires explicit signaling when calling WriteFile.

This is to say, I don't see a way to fix this problem without switching back to PIPE_WAIT. As mentioned previously, this will require an alternate solution for the write-hang bug that was mitigated by switching to PIPE_NOWAIT in #781, which is what I will be looking for next.
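
For reference, the "explicit signaling" that overlapped I/O requires looks roughly like this. This is a generic Win32 sketch, assuming the handle was opened with FILE_FLAG_OVERLAPPED; libdispatch does not currently do this, and an external writer that only holds a plain file descriptor can't be forced to:

#include <windows.h>

// Generic sketch of an overlapped pipe write; assumes hPipe was created with
// FILE_FLAG_OVERLAPPED. The caller must supply the event and wait on it, which
// is the explicit signaling that can't be imposed on arbitrary external writers.
BOOL write_overlapped(HANDLE hPipe, const void *buf, DWORD len, DWORD *written)
{
    OVERLAPPED ov = {0};
    ov.hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);  // manual-reset event
    if (!ov.hEvent) return FALSE;

    BOOL ok = WriteFile(hPipe, buf, len, NULL, &ov);
    if (!ok && GetLastError() == ERROR_IO_PENDING) {
        // The write did not complete immediately (e.g. the pipe buffer is full);
        // wait here (or on an I/O completion port) until the kernel signals it.
        ok = GetOverlappedResult(hPipe, &ov, written, TRUE);
    } else if (ok) {
        ok = GetOverlappedResult(hPipe, &ov, written, FALSE);
    }
    CloseHandle(ov.hEvent);
    return ok;
}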

@tristanlabelle (Contributor) commented Aug 16, 2024

It makes sense to me that a final solution would use PIPE_WAIT pipes, especially since PIPE_NOWAIT is deprecated. For context on the original issue: a PIPE_WAIT write will block if there's not enough buffer space to complete the write (which the dispatch code isn't expecting), and we were never able to find a reliable way to know that there will not be enough buffer space before attempting the write. The function that is supposed to report that information spuriously reports a misleading value of zero bytes.
Some possible avenues of investigation:

  • Can we get the reader side of the pipe from the writer and query how much data is available to read as a proxy to determine how much of the pipe is empty and available to write?
  • Can we do an overlapped write operation and then cancel it if it doesn't complete immediately because the pipe is full?

@litewrap

Just for my information: is there any progress on resolving this very annoying Windows bug, identified 8 months ago?

@tristanlabelle (Contributor)

@litewrap, I can only speak for the Browser Company, and we're not scheduling this work right now. Our investigation to this point showed that it will require a refactoring of libdispatch that stretches our familiarity with the codebase. That doesn't mean we won't do it, as I agree it's a bad bug, but we're not currently blocked on it.

@rauhul (Member) commented Oct 29, 2024

FWIW, any rewrites of parts of corelibs-dispatch should consider using Swift; I'm currently battling a dispatch segfault on Linux.

@Keithcat1

Yeah, just ran into this on Swift 6.0.2, Windows 11 64-bit. Simply leaving VSCode open will make my laptop fan go crazy.

@z2oh commented Jan 9, 2025

I had some more time to look into this, and I think I have a fix!

To summarize the problem:

  • In #781 (Use PIPE_NOWAIT on Windows to avoid blocking on writes), the pipe was changed from PIPE_WAIT to PIPE_NOWAIT to achieve POSIX-like O_NONBLOCK semantics, because attempting to write a large buffer would cause the dispatch queue thread to block long enough to hit a timeout.
  • The Windows pipe synchronization thread, _dispatch_pipe_monitor_thread, uses a blocking 0-byte read as a synchronization mechanism to signal to the actual reader threads that there is data waiting in the pipe. Of course, for this to work, the read must actually block. With a PIPE_NOWAIT pipe, the read will not block, causing the thread to spin endlessly and constantly wake the actual reader thread to perform 0-byte reads. This spinning thread is the root cause of the problem in this issue.

Switching the pipe back to PIPE_WAIT is easy enough, but then the original problem of blocking writes resurfaces. However, after a lot of investigation into Windows pipe semantics, I think #796 already fixes the blocking problem (this was a follow-up PR to fix a problem with the PIPE_NOWAIT implementation), which would mean the simple change back to PIPE_WAIT is all that's needed.

Bounding the size of the write to the size of the buffer ensures we don't block for too long on the dispatch queue:

IO_STATUS_BLOCK iosb;
FILE_PIPE_LOCAL_INFORMATION fpli;
NTSTATUS status = _dispatch_NtQueryInformationFile(hFile,
		&iosb, &fpli, sizeof(fpli), FilePipeLocalInformation);
if (NT_SUCCESS(status)) {
	// WriteQuotaAvailable is unreliable in the presence
	// of a blocking reader, when it can return zero, so only
	// account for it otherwise
	if (fpli.WriteQuotaAvailable > 0) {
		len = MIN(len, fpli.WriteQuotaAvailable);
	}
	len = MIN(len, fpli.OutboundQuota);
}

  • We first check the WriteQuotaAvailable of the pipe (which is the size of the pipe's buffer minus the total amount of space queued to be read), and use this as an initial upper bound.
  • If WriteQuotaAvailable == 0, this implies we have a reader that has requested a full buffer's worth of data from the pipe, and so we bound by the size of the pipe's buffer (OutboundQuota). Either:
    • that reader is currently draining the pipe, in which case writing the full size of the buffer will be able to finish quickly after the pipe is drained,
    • or the pipe is currently empty with a blocked reader waiting for data, in which case the full-buffer write will also be able to finish quickly.

With the following simple patch:

diff --git a/src/io.c b/src/io.c
index e31e28c..38e46b6 100644
--- a/src/io.c
+++ b/src/io.c
@@ -1441,16 +1441,15 @@ _dispatch_fd_entry_create_with_fd(dispatch_fd_t fd, uintptr_t hash)
                                // from Foundation.Pipe may not have FILE_WRITE_ATTRIBUTES.
                                DWORD dwPipeMode = 0;
                                if (GetNamedPipeHandleState((HANDLE)fd, &dwPipeMode, NULL,
-                                               NULL, NULL, NULL, 0) && !(dwPipeMode & PIPE_NOWAIT)) {
-                                       dwPipeMode |= PIPE_NOWAIT;
+                                               NULL, NULL, NULL, 0) && !(dwPipeMode & PIPE_WAIT)) {
+                                       dwPipeMode |= PIPE_WAIT;
                                        if (!SetNamedPipeHandleState((HANDLE)fd, &dwPipeMode,
                                                        NULL, NULL)) {

I'm seeing no spinning from sourcekit-lsp or the test program provided above. Anecdotally, sourcekit-lsp feels much snappier and more responsive with this patch in place (though I haven't measured to be sure). I've convinced myself in theory that the original hang condition that precipitated the switch to PIPE_NOWAIT won't occur, thanks to the bounding described above. The only thing left to do is to demonstrate this in practice: replicate the hang with PIPE_WAIT and without the bounding logic in place, then verify that re-adding the logic avoids the hang.

I hope to have a PR up soon!

@ahoppen (Member) commented Jan 10, 2025

Thank you so much for diving deep and investigating this @z2oh 🤩

@z2oh commented Jan 24, 2025

PR is up!

#853 just switches the pipe back to PIPE_WAIT, but #854 adds an additional trick to enable WriteQuotaAvailable == 1 to be used as a sentinel value to distinguish the "output buffer is full" case. By always leaving a free byte of space in the buffer when writing, the only way that WriteQuotaAvailable == 0 can happen is if a reader has requested a full buffer's worth of data, so we can avoid blocking on the WriteFile in all cases. There may be issues with this approach, as it means we can no longer serially write-then-read bufsize worth of data, only <bufsize.

Both of these PRs fix the spinning thread issue though, and if #854 is not mergeable then I think #853 will do.

@ahoppen (Member) commented Jan 24, 2025

This is great, @z2oh! Thank you so much for your investigation and the fix 🎉 🙌🏽

@litewrap

Is this fixed ?

@z2oh commented Feb 21, 2025

Is this fixed ?

Not yet. Either #854 or #853 still needs to be reviewed and merged. This is a subtle, tricky change with a lot of background context to ingest, so a thorough review is not easy. I'll see if I can resurface this on the relevant folks' radar to hopefully get a review soon.

@DimaKozhinov (Author)

It's been a year already... This bug is really painful and damaging to the Swift ecosystem on Windows.

@compnerd (Member) commented Mar 8, 2025

@DimaKozhinov you are welcome to take up the effort to solve the issue. We have tried a few different approaches, but each turns out to be insufficient in its own way.

@JoaoVictorVP

Can't we just put a small sleep in the thread while it is spinning, as a short-term mitigation? Its drawbacks seem less bad than constantly consuming 15% of the CPU and making the development experience laggier for whoever is affected.

@z2oh commented Mar 31, 2025

We had a chat with a Win32 pipes maintainer at Microsoft, and it was confirmed that there is no way to reliably inspect the free space of a pipe's outbound buffer (even in a single-threaded context). The guidance offered was to use worker threads on the write side to emulate asynchronous writing. This solution is not intractable, although it introduces complexity and adds additional thread pressure to the system. I'm not sure when I will be able to dedicate time to pursuing it (though I may be able to at some point in the future).

In the meantime, I still think either #854 or #853 is worth merging. These solutions are not flawless: they trade the perpetually spinning thread for a tiny bit of correctness, where under very specific conditions it is possible for a write to never make progress (importantly: without blocking the dispatch queue). But in the weeks that I've been using this patched version of libdispatch, I have not been able to produce those conditions in practice, even with synthetic pathological experiments.

@ahoppen (Member) commented Apr 8, 2025

Thank you for continuing to investigate this, @z2oh, and for checking with the Microsoft engineers 🙏🏽

What’s your opinion on which is the better solution to fix this problem, #854 or #853? Also, your comment here sounds like there is a correctness issue with both approaches, but the description of #854 mentions that it doesn’t have the correctness issue of #853. I just want to make sure that you didn’t make any new discoveries that aren’t reflected in the PR descriptions.

Also, one question about the explanation in #853

So if nothing is actively reading the data, the queue will be blocked from making progress. However, I have been unable to trigger this case in practice. If there is no reader on the pipe, WriteFile will fail quickly. From what I could grok from the libdispatch code, the pipe synchronization mechanism will always have a reader waiting on some I/O Completion Port that will quickly resume when data in the pipe is available to be read.

I might be misunderstanding what you mean by "if nothing is actively reading the data". Couldn’t the other end of the pipe be a separate process that just never reads data from the pipe? Since that process might be implemented without libdispatch, I’m not sure how assumptions about the libdispatch reader can affect the writer.

@z2oh commented Apr 8, 2025

There are no new discoveries, but information is a bit scattered between the two PRs and this issue thread. I will attempt to collate and clarify here.

Summary

There are two potential correctness issues I'm thinking about:

  1. The dispatch queue blocks (permanently or long enough to hit timeouts) on the call to WriteFile.
  2. Data passed into libdispatch is not written through to the pipe, despite a reader attempting to read data. N.B. In the absence of a reader, I would consider it not incorrect to avoid writing data.

The first type of incorrectness was present in libdispatch >=2 years ago. I can't find the issue now, but I recall that this was directly affecting users of sourcekit-lsp with crashes/stalls on large-completion buffers (which would attempt to write the full buffer in a single synchronous call, blocking the queue until all data was read or a timeout was encountered), and was fixed by #781 which switched the pipe to PIPE_NOWAIT to ensure that WriteFile no longer blocked. However, this change inadvertently introduced two other bugs:

  1. the spinning thread issue due to reads on a PIPE_NOWAIT pipe no longer blocking on the monitoring thread
  2. the second correctness issue listed above (materialized as sourcekit-lsp#752: Large completion lists (e.g. typing "Se") break language services in VS Code on Windows). A WriteFile on a PIPE_NOWAIT pipe may not write any bytes if the entire write cannot fit into the pipe's buffer, meaning we would never make progress writing if the write request exceeded the pipe's outbound buffer size.

The second bug was fixed by a followup PR (#796) which puts an upper bound on the attempted write. Because the write request is now bounded by the pipe's outbound buffer size, we can be confident that progress will be made as long as there is a reader draining the pipe. So now we are left with the spinning thread issue, and this is the state of things today.

Both #854 and #853 switch the pipe back to PIPE_WAIT in order to fix the spinning thread issue. As libdispatch pipes expect O_NONBLOCK semantics it was tempting to try and leave PIPE_NOWAIT around and instead fix this by changing the pipe monitoring thread mechanism to no longer rely on a blocking read, but PIPE_NOWAIT mode is not recommended for use and only still exists for backwards compatibility, so it behooves us to pursue an alternate approach.

WriteFile on a PIPE_WAIT pipe is never guaranteed not to block, so we cannot completely avoid correctness issue 1 without a dedicated writer thread (however, we are confident that writing less data than the free space of a pipe's outbound buffer will not block, even though this is not guaranteed by the API). Because the write is bounded by the size of the outbound buffer in either case, this is unlikely to be a problem.

Analysis of #853:

In #853, the write-size bounding logic is:

if (fpli.WriteQuotaAvailable > 0) {
	len = MIN(len, fpli.WriteQuotaAvailable);
}
len = MIN(len, fpli.OutboundQuota);

Keep in mind that WriteQuotaAvailable = OutboundQuota - space_filled_by_writer - data_requested_by_reader. So if we have an empty outbound pipe, but a reader in another process requests a full buffer's worth of data, we will counterintuitively see WriteQuotaAvailable == 0 on the writer (without this idiosyncratic behavior, this problem would be easy to solve).
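
A quick worked example with a hypothetical 4096-byte outbound buffer makes the ambiguity concrete:

OutboundQuota = 4096
Pipe empty, reader has requested 4096 bytes:      WriteQuotaAvailable = 4096 - 0 - 4096 = 0
Pipe full (4096 bytes queued), no pending read:   WriteQuotaAvailable = 4096 - 4096 - 0 = 0

The writer observes the same value in both cases, even though the first case is safe to write a full buffer into and the second is not.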

If a reader is connected to the pipe (WriteFile will fail fast if one isn't) but is not attempting to read any data, I suspect that a second write will see WriteQuotaAvailable == 0 because the write buffer is full, but continue to bound the write size by OutboundQuota, and then block the dispatch queue because WriteFile will block if the data can't fit in the outbound buffer (correctness issue 1). We can't just skip the write in this case, because WriteQuotaAvailable may just be 0 because the buffer is empty but a full buffer's worth of data is being requested by a reader, and then we would never make progress (which would be correctness issue 2).

This sounds bad; however, it just brings the implementation back to how it was before #781, with the improvement introduced by #796 to prevent very large writes from overwhelming the queue. As long as a reader is making progress, we will split the write up into OutboundQuota-sized chunks, preventing the dispatch queue from blocking (at least not for long).

Conclusion: if a reader is connected but not making progress, this solution may block the dispatch queue.

Analysis of #854:

In #854, the write-size bounding logic is:

if (fpli.WriteQuotaAvailable == 0) {
	len = MIN(len, fpli.OutboundQuota - 1);
} else {
	len = MIN(len, fpli.WriteQuotaAvailable - 1);
}

The idea here is that we never write a full buffer's worth of data, which reserves WriteQuotaAvailable == 0 to indicate that there is a blocked reader waiting for a full buffer's worth of data. When WriteQuotaAvailable == 1 we assume the pipe is full from the last write, and write 0 bytes, avoiding blocking the dispatch queue. Unfortunately, I suspect there is a possible race condition here which can result in correctness issue 2:

  1. The pipe is empty
  2. The reader requests exactly bufsize - 1 bytes of data
  3. The writer sees WriteQuotaAvailable == 1, assumes the pipe is full, and does not make progress.

I was unable to produce this case in practice, even under synthetic conditions.

This PR also breaks a test, which is why I mentioned a "requirements change" in the PR. There is a test which serially:

  1. Writes bufsize bytes of data to the pipe
  2. Waits for the write to complete
  3. Reads bufsize bytes of data from the pipe
  4. Waits for the read to complete

This no longer works with my PR, because we will only ever write bufsize - 1 bytes of data in the absence of a reader making progress. I changed the test to only write/read bufsize - 1 bytes, and I don't think this is a big deal because there's no reason to serially write and read through a pipe like this on a single thread, and attempting to do so with e.g. bufsize + 1 bytes was never supported for the same reason. But because this scenario was explicitly tested, I felt it worth calling out as a "requirements change".
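
For concreteness, the serial pattern that test exercises boils down to something like the following (a plain Win32 sketch with a hypothetical buffer size, not the actual libdispatch test, which goes through dispatch I/O):

#include <windows.h>

// Single-threaded write-then-read through one pipe. This only works at all
// because the whole write fits in the pipe's buffer. Since #854 always reserves
// one byte of the buffer, the analogous full-bufsize round trip no longer works,
// which is the "requirements change" mentioned above.
#define BUFSIZE 4096  // hypothetical; CreatePipe's size argument is only a hint

int main(void) {
    HANDLE readEnd, writeEnd;
    static char out[BUFSIZE], in[BUFSIZE];
    DWORD written = 0, readBytes = 0;

    if (!CreatePipe(&readEnd, &writeEnd, NULL, BUFSIZE))
        return 1;

    // Steps 1-2: write BUFSIZE bytes and wait for completion. This returns
    // without blocking only because the pipe buffer can hold the entire write;
    // with BUFSIZE + 1 bytes the call would block forever (no reader yet).
    if (!WriteFile(writeEnd, out, BUFSIZE, &written, NULL))
        return 1;

    // Steps 3-4: read the same BUFSIZE bytes back on the same thread.
    if (!ReadFile(readEnd, in, BUFSIZE, &readBytes, NULL))
        return 1;

    return (written == BUFSIZE && readBytes == BUFSIZE) ? 0 : 1;
}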

Conclusion: if a reader requests bufsize - 1 bytes of data from an empty pipe, the write operation may not make progress, but will not block the dispatch queue.

Tradeoffs

The current implementation persistently spins the monitoring thread, which does not actually affect correctness, but impacts performance under load and devastates idle performance.

#853 brings the implementation back to how it was 2 years ago, with some added logic to try to avoid blocking the dispatch queue for too long; however, a misbehaving reader may still be able to block the thread.

#854 updates the write bounding logic to guarantee that we avoid blocking the dispatch queue (via writing less than the free space in the buffer; WriteFile is never guaranteed to not block), but may introduce a race condition that can prevent a write operation from progressing.

Obviously none of these are perfect, but I think #854 is the least likely to encounter a problem, and the problem that can materialize is often non-fatal (a non-progressing write may be able to be canceled, though not necessarily).

I think that #853 and #854 are both better than the status quo, as the spinning thread affects everyone in every case. This is predicated on my belief that the problems with both solutions are encountered rarely. Of course, if I'm wrong about this, the solutions are not suitable, but I was never able to reproduce the problem mentioned for #854, and the problem with #853 lingered in the code for years and is less likely to be encountered today thanks to #796.

Alternatives

As mentioned in the previous comment, the Microsoft-approved way of fixing this would be dedicated writer threads with a signaling mechanism, performing synchronous WriteFiles off the dispatch queue thread in order to emulate asynchronous pipe-writing semantics (roughly as sketched below). This is likely to impact performance and increase thread pressure on the system, though it will be correct and much better than the spinning thread of today. This solution also adds complexity, and I'm not sure when I will be able to pursue it.
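
A generic illustration of that shape, with made-up names and not a proposed libdispatch patch: the dispatch queue hands the buffer to a dedicated thread, which performs the potentially blocking WriteFile and signals completion back.

#include <windows.h>

// Illustrative only: a dedicated writer thread performs the (possibly blocking)
// WriteFile off the dispatch queue thread and reports completion via an event.
// All names here are hypothetical; this is not how libdispatch is structured today.
typedef struct {
    HANDLE hPipe;        // write end of the pipe (PIPE_WAIT)
    const char *buf;
    DWORD len;
    DWORD written;
    BOOL ok;
    HANDLE doneEvent;    // signaled when the write has finished (or failed)
} write_request;

static DWORD WINAPI writer_thread(LPVOID param)
{
    write_request *req = (write_request *)param;
    // This call may block until a reader drains the pipe, but only this
    // worker thread blocks; the dispatch queue stays responsive.
    req->ok = WriteFile(req->hPipe, req->buf, req->len, &req->written, NULL);
    SetEvent(req->doneEvent);
    return 0;
}

// Called from the dispatch queue: start the write and return immediately.
// The completion event can later be waited on (or fed into whatever signaling
// mechanism the I/O layer uses) to deliver the result asynchronously.
static BOOL start_async_write(write_request *req)
{
    req->doneEvent = CreateEventW(NULL, TRUE, FALSE, NULL);
    if (!req->doneEvent) return FALSE;
    HANDLE hThread = CreateThread(NULL, 0, writer_thread, req, 0, NULL);
    if (!hThread) { CloseHandle(req->doneEvent); return FALSE; }
    CloseHandle(hThread);  // only doneEvent is needed, not the thread handle
    return TRUE;
}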

There are some heuristics we could use to try to avoid the problems mentioned above, but they aren't foolproof. For example: with #854 we could instead write bufsize - 2 bytes, and then when the "pipe full" case (WriteQuotaAvailable == 2 now) persists for some period of time (say, 16 iterations of the dispatch_operation_t through _dispatch_operation_perform with no change), we can try writing only 1 byte. If we were in the ambiguous case where a reader was waiting for bytes, then WriteQuotaAvailable will move to 3 and the write will again be able to progress. If, after this, WriteQuotaAvailable == 1 for a while, we can assume the pipe was actually full and there is no reader making progress, in which case we can take some other action (cancellation, or just waiting forever). But this sort of just feels like kicking the can down the road.

Conclusion

I didn't intend to write an essay here, my apologies for the long comment 😅 . To directly answer your questions:

What’s your opinion on which is the better solution to fix this problem? #854 or #853?

I think #854 is the better approach of the two, although this is predicated on my belief that the incorrectness condition mentioned will be exceedingly rare.

I might be misunderstanding what you mean by if nothing is actively reading the data. Couldn’t the other end of the pipe be a separate process that just never reads data from the pipe. Since that process might be implemented without libdispatch, I’m not sure how assumptions about the libdispatch reader can affect the writer.

Yes, I think my understanding was off here. The synchronization thread will always signal an I/O completion port when there is data available, but this does not imply that there is an active reader waiting for a signal to start consuming data. If there is a reader connected to the pipe but not reading data, I think that #853 could cause the dispatch queue to become blocked (as explained above).

@ahoppen (Member) commented Apr 8, 2025

Thanks a lot for that detailed compilation of the tradeoffs (I would have called it a summary if not for the length 😅).

I also prefer the solution in #854 because:

  1. Having a setup where a reader does not make progress seems like a common mistake that can be made during development. Blocking the writer because of this may make problems like these even harder to diagnose.
  2. The potential error scenario that you described in #854 (Switch Win32 pipes to PIPE_WAIT with sentinel bufsize) seems unlikely to occur in practice.
  3. You weren’t able to produce the correctness issue in #854 in practice.
  4. I don’t think the change in tests is a blocker since I don’t see a real-world use case where this serial write-read behavior would be required.
  5. Both solutions are probably better than the status quo.

Before going ahead, I would like to clarify one thing about your test setup for the potential issue in #854 though:

If you replicate the setup that you describe there, what do you see for WriteQuotaAvailable? The test setup sounds to me like it should deterministically produce WriteQuotaAvailable == 1, so I’m curious why you can’t reproduce that behavior.

@z2oh commented Apr 8, 2025

My test setup was attempting to read bufsize - 1 bytes to trigger the ambiguous case as described, but I never beat the write side in the race to the pipe, so the read would always succeed because the data was there. I'll try another set of experiments to stress test this and will report back this evening or tomorrow.

@ahoppen (Member) commented Apr 8, 2025

Shouldn’t you be able to hit this pretty consistently by, e.g., waiting 1 second before doing the write?

@xwu commented Apr 11, 2025

There's some heuristics we could use to try and avoid the problems mentioned above, but they aren't foolproof. For example: with #854 we could instead write bufsize - 2 bytes [...]. But this sort of just feels like kicking the can down the road.

I'm very naive in this area, but in the implementation of our random number generation API, we use a (widely used) rejection sampling technique that theoretically requires an unbounded number of iterations but in practice always terminates very quickly.

Does each heuristic "kick" here move the "can" down the road exponentially further, such that even if you figure out a way to trigger the bufsize - 1 case consistently, you effectively make the issue a statistical near-impossibility if (for example) you give yourself three kicks of the can by starting at bufsize - 4?
