-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DispatchIO.read spinning while pipe is open on Windows #820
Comments
Tracked in Apple’s issue tracker as rdar://124157417 |
Same problem here. |
To rule out influence of virtual environments like VirtualBox or Parallels, I've tested this on bare metal (3 GHz Intel Core i5-7400, Windows 10): Swift 5.8.1 sourcekit-lsp CPU usage: 0% (the expected behavior) Small note: This time sourcekit-lsp appeared in Task Manager not as a sub-process of Visual Studio Code, but as a separate background process due to some reason. |
Same problem. Running Swift 5.10 directly on Windows 11 x64 |
I have managed to reduce the problem to the following.
Sources/sourcekit-lsp/SourceKitLSP.swift
import Dispatch
import Foundation
#if canImport(CDispatch)
import struct CDispatch.dispatch_fd_t
#endif
@main
struct MyCommand {
static func main() throws {
let fh = try! FileHandle(forWritingTo: URL(fileURLWithPath: #"C:/Users/rintaro/out.txt"#))
try! fh.seekToEnd()
try! fh.write("start\n".data(using: .utf8)!)
let queue: DispatchQueue = DispatchQueue(label: "jsonrpc-queue", qos: .userInitiated)
#if os(Windows)
let rawInFD = dispatch_fd_t(bitPattern: FileHandle.standardInput._handle)
#else
let rawInFD = inFD.fileDescriptor
#endif
let receiveIO = DispatchIO(type: .stream, fileDescriptor: rawInFD, queue: queue) { (error: Int32) in
if error != 0 {
print("IO error \(error)")
}
}
receiveIO.setLimit(lowWater: 1)
receiveIO.read(offset: 0, length: Int.max, queue: queue) { done, data, errorCode in
print("received \(data?.count ?? -1) data")
let fh = try! FileHandle(forWritingTo: URL(fileURLWithPath: #"C:/Users/rintaro/out.txt"#))
try! fh.seekToEnd()
try! fh.write(contentsOf: data!)
}
dispatchMain()
}
}
set SDKROOT=S:\Program Files\Swift\Platforms\Windows.platform\Developer\SDKs\Windows.sdk
path S:\Program Files\Swift\Runtimes\0.0.0\usr\bin;S:\Program Files\Swift\Toolchains\0.0.0+Asserts\usr\bin;%PATH%
"C:\path\to\Microsoft VS Code\Code.exe" C:\path\to\a\swiftpm\project
I was unable to reproduce this issue by launching this modified version of sourcekit-lsp from the command prompt and passing data to stdin by typing or by piping data to the modified version of sourcekit-lsp using Get-Content C:\path\to\input.txt | .\sourcekit-lsp.exe With the following input.txt
|
@tristanlabelle This sounds like a similar issue to swiftlang/sourcekit-lsp#752, which you fixed in #796. Do you have any idea what might be going wrong here? |
That's very interesting, thanks for the reduced repro. It points to another pipe handling problem in libdispatch, likely on the reading side, but I don't know what it may be. What kind of |
I just managed to reduce it even further without any dependency on VS Code. It appears that To reproduce this one:
|
Nice repro @ahoppen ! Let me know if this gets into gnarly libdispatch Win32 pipe semantics and you need help. We're tracking this bug as affecting our developers and we could give a hand to the investigation. |
@tristanlabelle I have reached that stage. If you could investigate it, I would really appreciate it. |
Hi @ahoppen , we'll look into it. Currently our priorities in this area are roughly:
|
It looks like we are spending the time in ![]() Edit: Sorry, I just realized that I already reached this debugging state a few months ago. I remembered that you looked into Dispatch before but forgot that the DispatchIO issue was still open. |
@ahoppen I don't remember seeing this before specifically. The trace will be useful though as this is next in the priority list. We got through the file locking issue and I don't think we're hitting the crash anymore (from my list in previous comment). fyi @z2oh |
That would be amazing. I wonder what kind of fix this will be. |
I've figured out what's causing this, but I'm not sure of a proper fix yet. By default, Windows pipes are blocking on both the read and write side ( libdispatch seems to expect non-blocking write semantics on pipes, and so this was effected on Windows in #781 by setting the Perhaps surprisingly, this also changes the read side of the pipe to do non-blocking reads. The original Windows pipe-event implementation (#501) took advantage of Windows' blocking-by-default read semantics and spawned a lightweight thread to call I'm not sure when The comment in the code here seems to indicate that this error implies we have a non-blocking pipe (which is true as we've seen), but I can't intuit why that should cause the loop to restart. Perhaps there was an assumption that the pipe is still initializing and would eventually be set to #781 was cherry-picked into Swift 5.9 (and after), which explains why this issue isn't present in earlier versions. I've built a local |
I've done some more investigation here: The key insight is that the This is to say, I don't see a way to fix this problem without switching back to |
It makes sense to me that a final solution would be with
|
Just for my information, is there any progress on resolving this very annoying bug on Windows and identified 8 months ago? |
@litewrap , I can only speak for the Browser Company and we're not scheduling this work right now. Our investigation to this point showed that it will require a refactoring of libdispatch that stretches our familiarity with the codebase. It doesn't mean we won't do it as I agree it's a bad bug, but we're not currently blocked on it. |
FWIW any rewrites of parts of corelibs dispatch should consider using swift, I'm currently battling a dispatch segfault on linux. |
Yeah, just ran into this on Swift 6.0.2, Windows 11 64-bit. Simply leaving VSCode open will make my laptop fan go crazy. |
I had some more time to look into this, and I think I have a fix! To summarize the problem:
Switching the pipe back to Bounding the size of the write to the size of the buffer ensures we don't block for too long on the dispatch queue: swift-corelibs-libdispatch/src/io.c Lines 2516 to 2528 in 0c38954
With the following simple patch:
I'm seeing no spinning from sourcekit-lsp or the test program provided above. Anecdotally, sourcekit-lsp feels much snappier and more responsive with this patch in place (though I haven't measured to be sure). I've convinced myself in theory that the original hang condition that precipitated the switch to I hope to have a PR up soon! |
Thank you so much for diving deep and investigating this @z2oh 🤩 |
PR is up! #853 just switches the pipe back to Both of these PRs fix the spinning thread issue though, and if #854 is not mergeable then I think #853 will do. |
This is great :z2oh: Thank you some much for your investigation and the fix 🎉 🙌🏽 |
Is this fixed ? |
It's been a year already... The bug is really painful and damaging for Swift ecosystem on Windows. |
@DimaKozhinov you are welcome to take up the efforts to solve the issue. We have tried a few different approaches, but they all find different ways that they are insufficient. |
Can't we like put a small sleep on the thread while it is spinning as a short-time solution, I think the drawbacks of this are less likely to be worse than consuming constantly 15% of CPU and making the experience of whoever is developing laggier because of it |
We had a chat with a Win32 pipes maintainer at Microsoft and it was confirmed that there is no way to reliably inspect the free space of a pipe's outbound buffer (even in a single-threaded context). The guidance offered was to use worker threads on the write side to emulate asynchronous writing. This solution is not intractable, although it introduces complexity and adds additional thread pressure to the system. I'm not sure when I will be able to dedicate time in pursuing this solution (though I may be able to at some point in the future). In the meantime, I still think etiher #854 or #853 is worth merging. These solutions are not flawless: they trade off the perpetually spinning thread for a tiny bit of correctness, where under very specific conditions it is possible for a write to never make progress (importantly: without blocking the dispatch queue). But in the weeks that I've been using this patched version of libdispatch, I have not able to produce these conditions in practice, even with synthetic pathological experiments. |
Thank you for continuing to investigate this, @z2oh and checking with the Microsoft engineers 🙏🏽 What’s your opinion on which is the better solution to fix this problem? #854 or #853? Also, your comment here sounds like there is a correctness issue with both approaches but the description of #854 mentions that it doesn’t have the correctness issue #853. I just want to make sure that you didn’t make any new discoveries that aren’t reflected in the PR descriptions. Also, one question about the explanation in #853
I might be misunderstanding what you mean by if nothing is actively reading the data. Couldn’t the other end of the pipe be a separate process that just never reads data from the pipe. Since that process might be implemented without libdispatch, I’m not sure how assumptions about the libdispatch reader can affect the writer. |
There are no new discoveries, but information is a bit scattered between the two PRs and this issue thread. I will attempt to collate and clarify here. SummaryThere are two potential correctness issue I'm thinking about:
The first type of incorrectness was present in libdispatch >=2 years ago. I can't find the issue now, but I recall that this was directly affecting users of sourcekit-lsp with crashes/stalls on large-completion buffers (which would attempt to write the full buffer in a single synchronous call, blocking the queue until all data was read or a timeout was encountered), and was fixed by #781 which switched the pipe to
The second bug was fixed by a followup PR (#796) which puts an upper bound on the attempted write. Because the write request is now bounded by the pipe's outbound buffer size, we can be confident that progress will be made as long as there is a reader draining the pipe. So now we are left with the spinning thread issue, and this is the state of things today. Both #854 and #853 switch the pipe back to
Analysis of #853:In #853, the write-size bounding logic is:
Keep in mind that If a reader is connected to the pipe ( This sounds bad, however, it just brings the implementation back to how it was before #781 with the improvement introduced by #796 to prevent very large writes from overwhelming the queue. As long as a reader is making progress, we will split the write up into Conclusion: if a reader is connected but not making progress, this solution may block the dispatch queue. Analysis of #854:In #854, the write-size bounding logic is:
The idea here is that we never write a full buffer's worth of data, which reserves
I was unable to produce this case in practice, even under synthetic conditions. This PR also breaks a test, which is why I mentioned a "requirements change" in the PR. There is a test which serially:
This no longer works with my PR, because we will only write Conclusion: if a reader requests TradeoffsThe current implementation persistently spins the monitoring thread, which does not actually affect correctness, but impacts performance under load and devastates idle performance. #853 brings the implementation back to how it was 2 years ago, but with some added logic to try and avoid blocking the dispatch queue for too long, but a misbehaving reader may still be able to block the thread. #854 updates the write bounding logic to guarantee that we avoid blocking the dispatch queue (via writing less than the free space in the buffer; Obviously none of these are perfect, but I think #854 is the least likely to encounter a problem, and the problem that can materialize is often non-fatal (a non-progressing write may be able to be canceled, though not necessarily). I think that #853 and #854 are both better than the status quo, as the spinning thread affects everyone under every case. This is predicated on my belief that the problems with both solutions are encountered rarely. Of course, if I'm wrong about this, the solutions are not suitable, but I was unable to ever reproduce the problem mentioned with #854, and the problem with #853 lingered in the code for years, and is less likely to be encountered today thanks to #796. AlternativesAs mentioned in the previous comment, the Microsoft-approved way of fixing this will be dedicated writer threads with a signaling mechanism to perform synchronous There's some heuristics we could use to try and avoid the problems mentioned above, but they aren't foolproof. For example: with #854 we could instead write ConclusionI didn't intend to write an essay here, my apologies for the long comment 😅 . To directly answer your questions:
I think #854 is the better approach of the two, although this is predicated on my belief that the incorrectness condition mentioned will be exceedingly rare.
Yes, I think my understanding was off here. The synchronization thread will always signal an I/O completion port when there is data available, but this does not imply that there is an active reader waiting for a signal to start consuming data. If there is a reader connected to the pipe but not reading data, I think that #853 could cause the dispatch queue to become blocked (as explained above). |
Thanks a lot for that detailed compilation of the tradeoff (would have called it summary if not for the length 😅). I also prefer the solution in #854 because:
Before going ahead, I would like to clarify one thing about your test setup for the potential issue in #854 though: If you replicate the setup that you describe there, what do you see for |
My test setup was attempting to read |
Shouldn’t you be able to hit this pretty consistently by eg. waiting 1 second before doing the write? |
I'm very naive in this area, but in the implementation of our random number generation API, we use a (widely used) rejection sampling technique that theoretically requires an unbounded number of iterations but in practice always terminates very quickly. Does each heuristic "kick" here move the "can" down the road exponentially further, such that even if you figure out a way to trigger the |
The sourcekit-lsp.exe consumes ~24% of CPU power all the time, even when idle. This bug does not appear under Linux, and does not appear in Swift versions earlier than v5.9. It is still present in v5.10. I tested this under VirtualBox, my virtual machine has 4 virtual processors. The host machine has 13th Gen Intel Core i7 processor, so ~24% load seems like a lot of mysterious calculations.
How to reproduce:
mkdir test
cd test
swift package init
This bug has nothing to do with VS Code. I found this bug when using my own app that runs sourcekit-lsp.exe, and not VS Code.
The text was updated successfully, but these errors were encountered: