Fix race conditions in FTP socket closure that cause intermittent errors #17540
Conversation
I just ran a test with Cyberduck on Windows - on one of the FTP servers, there are NO issues - hundreds of transfers are fast and don't cause 426. On the other server, the first file just hangs at 100% forever. Other clients (e.g. WinSCP) work fine for both servers. This is likely a separate issue though, so let's ignore it for now.
I'm trying to compile the Windows version with this PR to see if it happens to resolve the hanging issue. However, I'm getting errors pulling the NuGet dependencies from nuget.config:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
    <packageSources>
        <clear />
        <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
        <add key="gh-iterate-ch" value="https://nuget.pkg.github.com/iterate-ch/index.json" protocolVersion="3" />
        <add key="gh-ikvmnet" value="https://nuget.pkg.github.com/ikvmnet/index.json" protocolVersion="3" />
    </packageSources>
    <packageSourceMapping>
        <packageSource key="nuget.org">
            <package pattern="IKVM.ByteCode" />
            <package pattern="*" />
        </packageSource>
        <packageSource key="gh-ikvmnet">
            <package pattern="IKVM*" />
        </packageSource>
        <packageSource key="gh-iterate-ch">
            <package pattern="iterate-ch.*" />
        </packageSource>
    </packageSourceMapping>
</configuration>

It gives me a 401, and if I put in my GH token, it gives a 403.
Sorry for the unclear instructions, I'll update the building part of the readme. In the meantime, make sure to use a "Legacy/Classic" personal access token with at least the "read:packages" permission, then register the credentials for the GitHub package sources from a Terminal in the Cyberduck repo root.

After that, building should succeed.
@AliveDevil thanks, that worked. Two more notes:
Sadly, even with the changes from this PR applied, the transfer still hangs on one of the FTP servers on Windows. However, I looked at its Wireshark dump, and this issue seems directly connected to the one on macOS. The problem on Windows is that the client sometimes doesn't send FIN with the last FTP-DATA packet, so the server naturally waits for more data. The client, on its side, is waiting for the server response, so it hangs. I believe that my fix from this PR should have worked here too, but perhaps due to the Windows networking stack and/or IKVM cross-compilation, it has no effect and/or different behavior. I collected TCP dumps of various FTP clients that work properly, both on macOS and Windows, and they all shared the same flow: the client half-closes the data connection (sends FIN) after the last data packet, waits for the server to ACK and send its own FIN, and only then fully closes the socket.
To be honest, there might be a better way to achieve a correct flow here. Could this perhaps be a bug somewhere outside of the read/write close() functions? Maybe a race between when data is drained and when close is called? I'm not familiar with Cyberduck's codebase. Sadly, as things stand now, Cyberduck's FTP implementation seems very broken on both Mac and Windows.
Good news, I came up with a proper implementation that fixes both Windows and macOS across all the clients I tested. Additionally, the macOS delay side effect is now gone. Will polish it up and update this PR. Ultimately, this turned out to be a bug in Apache Commons Net's implementation of FTPClient - more specifically, its socket handling.
Force-pushed the new fix. To summarize, the issue was in Apache Commons Net's SocketOutputStream:

public class SocketOutputStream extends FilterOutputStream {
    private final Socket socket;

    public SocketOutputStream(Socket socket, OutputStream stream) {
        super(stream);
        this.socket = socket;
    }

    public void close() throws IOException {
        super.close();
        this.socket.close();
    }

    public void write(byte[] buffer, int offset, int length) throws IOException {
        this.out.write(buffer, offset, length);
    }
}

The close() here tears the socket down immediately after closing the stream, with no orderly shutdown, so the socket can be closed before the server has ACKed the last data - producing the RST seen in the dumps.

There are two ways I could think of fixing this - either ensure connections are finalized in all the places where this stream is closed, or wrap the socket itself. I wasn't sure just how many places would need changing for the first approach, and whether it's possible at all, since at least some functions of Apache's FTPClient perform the closure on their own, so you have no control over them. So I opted to wrap the socket instead - since we have a nice hook at the beginning of the connection, the new socket will be used throughout Cyberduck's and Apache's code, and anywhere it is closed, proper shutdown will be enforced.

I went a bit heavy with the comments in the code given the low-level and quirky nature. If something doesn't make sense, please let me know. I was also unsure about the socket timeout part, but I will leave a comment in the PR for us to discuss it.

As mentioned in my previous comment, this fix works with all of my FTP servers with no side effects. However, it's worth noting that I only tested plaintext FTP over LAN. I haven't tested FTP+TLS, SFTP, Active mode, etc. I don't expect there to be issues, but you might want to check any more exotic configurations on your end.
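For illustration, the general shape of this wrapping approach - a minimal sketch under my assumptions, not the actual PR code (the class name is made up):

import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

// Sketch: a socket wrapper that enforces an orderly TCP release on close().
public class GracefulCloseSocket extends Socket {
    private final Socket delegate;

    public GracefulCloseSocket(final Socket delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized void close() throws IOException {
        try {
            // 1. Half-close: send our FIN so the server knows we are done writing
            if(!delegate.isOutputShutdown()) {
                delegate.shutdownOutput();
            }
            // 2. Drain input until EOF, i.e. until the server has ACKed and sent its own FIN
            final InputStream in = delegate.getInputStream();
            final byte[] buffer = new byte[4096];
            while(in.read(buffer) != -1) {
                // discard drained bytes
            }
        }
        finally {
            // 3. Release the descriptor regardless of any previous error
            delegate.close();
        }
    }
}

The real implementation additionally gates the drain on whether data was written (see below), and the socket read timeout I mentioned would bound how long a misbehaving server can stall the close.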
Alright, with the latest changes, I'm happy to announce that Cyberduck works on all of my test FTP servers, on both Windows and macOS, with no performance regressions. I stress-tested all of them with 1,000 small files back-to-back, and there were no issues. What's more, the performance of Cyberduck is significantly better than all the other FTP clients I tested, both with many small files and with a single large file. I think the PR is now ready to merge, at least from my view. Please let me know if you have any questions.
finally {
    log.debug("Closing socket {}", delegate);
    // Work around macOS quirk where Java NIO's SocketDispatcher.close0() has a 1,000ms delay
    CompletableFuture.runAsync(() -> {
Running in a background thread and missing any error looks dangerous to me, as we will not report any error when closing the stream fails, which may indicate that the file is not properly persisted on the remote server.
I fully agree. I tried really hard to find a workaround, but couldn't come up with one. Having a whole one-second delay between each file upload is really bad when you upload lots of small files, and it causes Cyberduck to be upwards of 10x slower than competing FTP clients. If you have any suggestion here, I'm more than happy to take it.
Actually, I don't think a failure here specifically could result in data loss. By the time we've reached this point, we have already flushed our data, closed our stream, and waited for the server to close its stream.
The only case where data loss should occur is if "Failed to shutdown output for socket" is hit. Maybe we can change that to throw instead of logging and discarding.
You are probably right that there is no risk here in the real world. But it should probably not be in the finally block, as it does not need to be executed when the above code fails with an I/O error.
Is that close quirk on macOS specific to server implementations?
Yes, it is. I really have no explanation why, but it's 100% consistent. Even if I/O fails, don't we want to make sure the socket is closed, though? Otherwise we might leak it. I'm not sure how Java deals with this specifically.
I mean, do you see this delay regardless of what server implementation you connect to? If it shows only when connected to a specific server implementation, we would not want to implement a workaround.
But I agree that we will want to close the socket regardless of any previous error.
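A minimal sketch of that shape (hypothetical helper name; delegate as in this PR):

import java.io.IOException;
import java.net.Socket;

public final class SocketRelease {
    private SocketRelease() {
    }

    // Propagate the first I/O error, but always release the socket.
    public static void close(final Socket delegate) throws IOException {
        try {
            delegate.shutdownOutput(); // may fail with an I/O error
            // ... drain input, wait for the server FIN ...
        }
        finally {
            delegate.close(); // runs even if the shutdown or drain above threw
        }
    }
}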
Yes, I only see this delay with one specific FTP server. With that server, it happens 100% of the time. I looked at the packet dump and the server responds immediately - the delay is purely on the client side, with no pending packets. I can look again for something that might be triggering it, but no other FTP client has this delay on macOS, so I'm inclined to believe it's a bug that could surface with other servers too.
Can you comment on which server implementations you tested this changeset with, to avoid regressions?
Found related documentation in Orderly Versus Abortive Connection Release in Java.
I tested with the following FTP servers:
However, I just noticed a peculiar issue exclusively with FileZilla Server - there's insane read amplification. I confirmed that us draining the input on download operations actually triggers the server to re-send the file again and again. Since we already wrap the input/output streams, I'm thinking we could track how many bytes they've processed, and when we close, only do the full graceful close if the output byte count is > 0. Does that make sense to you? The only caveat is that if a socket is used for both input and output, and draining the input starts a re-send, we can't avoid this. But at least in FTP this should not be the case - data sockets are either input-only or output-only.
Pushed my fix as described above. Seems to work everywhere now, and no more read amplification.
We will want to test interoperability with common FTP server implementations in use.
The embedded integration tests run against Apache FTP Server.
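For reference, an embedded Apache FTP Server for such a test can be spun up roughly like this (a sketch using the org.apache.ftpserver API; the user/home-directory setup and the test port are assumptions):

import org.apache.ftpserver.FtpServer;
import org.apache.ftpserver.FtpServerFactory;
import org.apache.ftpserver.ftplet.FtpException;
import org.apache.ftpserver.listener.ListenerFactory;

public class EmbeddedFtpServerTest {
    public static void main(final String[] args) throws FtpException {
        final FtpServerFactory serverFactory = new FtpServerFactory();
        final ListenerFactory listenerFactory = new ListenerFactory();
        // Arbitrary non-privileged port for the test listener
        listenerFactory.setPort(2121);
        serverFactory.addListener("default", listenerFactory.createListener());
        final FtpServer server = serverFactory.createServer();
        server.start();
        try {
            // ... exercise uploads/downloads against localhost:2121 ...
        }
        finally {
            server.stop();
        }
    }
}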
public synchronized InputStream getInputStream() throws IOException {
    if(inputStreamWrapper == null) {
        inputStreamWrapper = new ProxyInputStream(delegate.getInputStream()) {
            @Override
            public void close() throws IOException {
                // super.close() will call delegate.close(), so override it with ours instead
                FTPSocket.this.close();
            }
        };
    }
    return inputStreamWrapper;
}
I don't see that we are required to override the implementation here, if it should not apply when no data is sent to the server.
Because if the socket was used for output but only the input stream is closed, that would bypass the proper closure.
private final AtomicInteger outputBytesCount = new AtomicInteger(0);

private InputStream inputStreamWrapper;
private OutputStream outputStreamWrapper;
If counting bytes is really required, then reuse org.apache.commons.io.output.CountingOutputStream.
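That suggestion might look like this (a sketch; the outputStreamWrapper field and FTPSocket class from this PR are assumed):

import java.io.IOException;
import java.io.OutputStream;

import org.apache.commons.io.output.CountingOutputStream;

// Hypothetical replacement inside FTPSocket#getOutputStream():
public synchronized OutputStream getOutputStream() throws IOException {
    if(outputStreamWrapper == null) {
        // CountingOutputStream tracks the bytes written without a manual AtomicInteger
        outputStreamWrapper = new CountingOutputStream(delegate.getOutputStream()) {
            @Override
            public void close() throws IOException {
                // Route closure through the socket, as with the input wrapper
                FTPSocket.this.close();
            }
        };
    }
    return outputStreamWrapper;
}

// On close, the graceful path would then be gated on
// ((CountingOutputStream) outputStreamWrapper).getByteCount() > 0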
        bytesRead++;
    }
    if(bytesRead > 0) {
        log.debug("Drained {} bytes from socket {}", bytesRead, delegate);
Log with warn level, as this is unexpected.
 * 2. Drain input to wait for ACKs until we receive server FIN
 * 3. Close the socket to release resources
 */
public class FTPSocket extends Socket {
Extend from a socket that delegates only to the proxy implementation:

- public class FTPSocket extends Socket {
+ public class FTPSocket extends DelegatingSocket {
package ch.cyberduck.core.socket;
/*
* Copyright (c) 2002-2025 iterate GmbH. All rights reserved.
* https://cyberduck.io/
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.SocketAddress;
import java.net.SocketException;
import java.nio.channels.SocketChannel;
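/**
 * Socket that forwards every operation to a delegate, so subclasses
 * (such as FTPSocket) only need to override the methods they care about.
 */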
public class DelegatingSocket extends Socket {
protected final Socket delegate;
public DelegatingSocket(final Socket delegate) {
this.delegate = delegate;
}
@Override
public void bind(final SocketAddress bindpoint) throws IOException {
delegate.bind(bindpoint);
}
@Override
public synchronized void close() throws IOException {
delegate.close();
}
@Override
public void connect(final SocketAddress endpoint) throws IOException {
delegate.connect(endpoint);
}
@Override
public void connect(final SocketAddress endpoint, final int timeout) throws IOException {
delegate.connect(endpoint, timeout);
}
@Override
public SocketChannel getChannel() {
return delegate.getChannel();
}
@Override
public InetAddress getInetAddress() {
return delegate.getInetAddress();
}
@Override
public InputStream getInputStream() throws IOException {
return delegate.getInputStream();
}
@Override
public boolean getKeepAlive() throws SocketException {
return delegate.getKeepAlive();
}
@Override
public InetAddress getLocalAddress() {
return delegate.getLocalAddress();
}
@Override
public int getLocalPort() {
return delegate.getLocalPort();
}
@Override
public SocketAddress getLocalSocketAddress() {
return delegate.getLocalSocketAddress();
}
@Override
public boolean getOOBInline() throws SocketException {
return delegate.getOOBInline();
}
@Override
public OutputStream getOutputStream() throws IOException {
return delegate.getOutputStream();
}
@Override
public int getPort() {
return delegate.getPort();
}
@Override
public synchronized int getReceiveBufferSize() throws SocketException {
return delegate.getReceiveBufferSize();
}
@Override
public SocketAddress getRemoteSocketAddress() {
return delegate.getRemoteSocketAddress();
}
@Override
public boolean getReuseAddress() throws SocketException {
return delegate.getReuseAddress();
}
@Override
public synchronized int getSendBufferSize() throws SocketException {
return delegate.getSendBufferSize();
}
@Override
public int getSoLinger() throws SocketException {
return delegate.getSoLinger();
}
@Override
public synchronized int getSoTimeout() throws SocketException {
return delegate.getSoTimeout();
}
@Override
public boolean getTcpNoDelay() throws SocketException {
return delegate.getTcpNoDelay();
}
@Override
public int getTrafficClass() throws SocketException {
return delegate.getTrafficClass();
}
@Override
public boolean isBound() {
return delegate.isBound();
}
@Override
public boolean isClosed() {
return delegate.isClosed();
}
@Override
public boolean isConnected() {
return delegate.isConnected();
}
@Override
public boolean isInputShutdown() {
return delegate.isInputShutdown();
}
@Override
public boolean isOutputShutdown() {
return delegate.isOutputShutdown();
}
@Override
public void sendUrgentData(final int data) throws IOException {
delegate.sendUrgentData(data);
}
@Override
public void setKeepAlive(final boolean on) throws SocketException {
delegate.setKeepAlive(on);
}
@Override
public void setOOBInline(final boolean on) throws SocketException {
delegate.setOOBInline(on);
}
@Override
public void setPerformancePreferences(final int connectionTime, final int latency, final int bandwidth) {
delegate.setPerformancePreferences(connectionTime, latency, bandwidth);
}
@Override
public synchronized void setReceiveBufferSize(final int size) throws SocketException {
delegate.setReceiveBufferSize(size);
}
@Override
public void setReuseAddress(final boolean on) throws SocketException {
delegate.setReuseAddress(on);
}
@Override
public synchronized void setSendBufferSize(final int size) throws SocketException {
delegate.setSendBufferSize(size);
}
@Override
public void setSoLinger(final boolean on, final int linger) throws SocketException {
delegate.setSoLinger(on, linger);
}
@Override
public synchronized void setSoTimeout(final int timeout) throws SocketException {
delegate.setSoTimeout(timeout);
}
@Override
public void setTcpNoDelay(final boolean on) throws SocketException {
delegate.setTcpNoDelay(on);
}
@Override
public void setTrafficClass(final int tc) throws SocketException {
delegate.setTrafficClass(tc);
}
@Override
public void shutdownInput() throws IOException {
delegate.shutdownInput();
}
@Override
public void shutdownOutput() throws IOException {
delegate.shutdownOutput();
}
@Override
public String toString() {
final StringBuilder sb = new StringBuilder("DelegatingSocket{");
sb.append("delegate=").append(delegate);
sb.append('}');
return sb.toString();
}
}
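With that base class in place, the suggested change would reduce FTPSocket to something like this (a sketch based on the suggestion above, not the final code):

import java.io.IOException;
import java.net.Socket;

public class FTPSocket extends DelegatingSocket {
    public FTPSocket(final Socket delegate) {
        super(delegate);
    }

    @Override
    public synchronized void close() throws IOException {
        // graceful shutdown sequence from this PR, then delegate the close
        super.close();
    }

    // getInputStream()/getOutputStream() wrappers as in this PR
}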
Hello!
First things first, I want to clarify that this issue was only experienced on macOS. I don't know whether Windows is also affected.
Over the past years, I've been observing intermittent instances of

426 Connection closed; transfer aborted.

across various FTP servers (TLS and not). Recently, I tried to use Cyberduck to transfer files to my Nintendo Switch, where I host an FTP server using Sphaira or FTPd. There I realized that the same intermittent issue now has more of a 50% chance to happen on every single file, rendering Cyberduck basically useless. I tried a variety of other clients and they never experienced such issues, so I had just abandoned Cyberduck for this purpose. However, today I decided to dig into this properly. For some context, I am connecting over plain FTP, passive mode, over LAN. I noticed that the 426 occurs just at the end of a file transfer (e.g. a 10GB file would transfer fine for minutes and then fail at 100%). I sniffed the packets with Wireshark and noticed something interesting - Cyberduck was sometimes closing the socket right after the data is sent, before the server ACKs, resulting in an RST. Here are dumps of a working and a non-working case:

TCP logs
The fix in this PR definitely doesn't follow proper OOP - it's merely a PoC - but I think it gets the idea right. We half-close the socket (output only), signaling to the FTP server that we're done. The server then ACKs and responds with "226 OK", and only then do we fully close the socket (input as well).
I re-ran my previously failing test cases a few hundred times, and all worked fine with this PR! The only caveat is that with this PR, transfers to one of the FTP servers have a small delay between one file upload completing and the next transfer starting. I compared with other FTP clients (Forklift on macOS) and there was no such delay. On the other FTP server, neither Cyberduck nor Forklift has this delay.