TP v.0.1.14 takes a lot to complete handshakes after the first one #80
Comments
TP logs:
Pool logs:
Pool restarted:
@Sjors as you can see from the logs, the second time I run the Pool, it's at But from the TP logs, it's not possible to see the second noise connection request until So that demonstrates how the Pool gets stuck for about 18 seconds |
This is probably not related to the changes in More likely the problem is that once the TP has a connection, it's (partially) blocked from listening to new connections. This might be tricky to debug. I'm going to try the laziest solution first, namely to update the I'll ping you when that's done so you can test it. If it works I'll tag a new release, otherwise I'll investigate more but it might be two weeks before I get to that. |
I can't reproduce this with the latest But also not with the Both compiled from source as a debug build: For the pool I used your PR at c76a161cfe6e7ea6b5b2f11381a6eb6ebb62035b. Both on the same machine, using SRI signet. I didn't mine. Tested on (Apple Silicon) macOS 15.3.1 Let me know how it goes on your end... Do you get the same issue when you keep the pool connected, and then connect from another role? If not, then maybe the problem is related to the disconnection. Also, did you test whether v.0.1.13 has the same issue? (maybe just dropping the |
It's also a good idea to wipe the |
Sorry, I was stuck on other things over the last few days. I just compiled your last commit (5fe5296) and tested again. Unfortunately I still get the same behaviour.
I also get the same issue if I connect a JDC to the TP after the Pool.
v.0.1.13 doesn't have this issue at all, I tested it last week and everything was fine. |
Thanks for the update. Difficult to debug until I can reproduce... I noticed you're using macOS 13, is that an Intel machine? I'll try that later, maybe it helps in reproducing. How are you building and compiling both Bitcoin Core and the pool role? Are you running them straight from the command line terminal, or in some Docker-like setup? Have you tried on a Linux machine? |
Yes, it is. I'm running everything from the command line. |
Maybe there's a detail there that I'm doing differently. Also, are you using the SRI signet, or testnet4 or mainnet? |
I'm using On Ubuntu I just used the binary provided by you btw, I haven't tried to compile it from the latest commit. |
This would be useful for comparison. Can you try again with more verbose logging: When building, can you add I'd also like to know if this only happens when you disconnect and reconnect, or if it also happens if you get a second connection (e.g. from a JDC).
The timing here suggests that our main thread is blocked, and then suddenly becomes available again and serves both a normal peer (p2p) and your pool (sv2). cc @vasild I can't reproduce with testnet4 either, but will try on an Intel Mac next week. |
Building with this flag seems to have improved things a little bit (I don't know if it could be related in any way :D). The problem starts to manifest when I connect a second client to it though. |
Debug mode is slower, but maybe there's some threading behavior difference that "helps". But that's not a bug fix. |
I ran the
Using the Release (default) build type since you mentioned the Debug build didn't reproduce the issue. I started the node:
Note that I waited for testnet4 to sync. I then ran the pool role from your branch, against testnet4. After a few seconds I stopped it and started again. It indeed takes a very long time to connect. The more detailed log provides clues. The pool was stopped at:
The TP didn't notice this for 90 seconds until it tried to send a new template:
Even then it took 5 seconds to fail. But more importantly, I had already restarted the pool by this time.
Six minutes (!) went by before the handshake completed, according to the pool. But notice that the Template Provider "only" needed two minutes to answer the new connection:
But it took a minute before it processed the handshake:
And another minute to send the handshake reply:
And another 90 seconds to finish the handshake:
It doesn't appear that the main node thread is blocked, because in the meantime there's plenty of p2p traffic. Perhaps there's something wrong in the TP event loop. Or perhaps it's waiting on the socket for something more to happen. |
Have you also tried with release I haven't noticed the issue there, so my impression is that some change between Let me know if I can help with anything else! |
One possibility that has been raised is that |
I pushed new commit 6efae13 to the Suggested testing with:
I also noticed a problem, which I've solved with an ugly hack: if a client disconnected while Perhaps this can be done cleanly in combination with the |
The template provider main loop in When a new connection arrives |
@vasild that's probably unchanged between my old code and the new commit, so I'll look into that more.
Much longer in the new version. It first waits The previous version would only run for 1 second. |
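To make the hypothesis discussed above concrete, here is a minimal sketch of how a single-threaded loop that waits for a long time before checking its listening socket delays the first reply to a new client. This is not the Template Provider's code: `serve_once`, `measure_reply_delay`, the `b"hello"` payload standing in for a handshake reply, and the wait durations are all hypothetical. It only assumes that the kernel queues incoming TCP connections in the listen backlog, so the client's `connect()` returns quickly even while the application is busy.

```python
import socket
import threading
import time


def serve_once(listener: socket.socket, blocking_wait: float) -> None:
    """One iteration of a single-threaded loop: wait first, then accept and reply."""
    time.sleep(blocking_wait)      # stands in for a long wait inside the loop
    conn, _ = listener.accept()    # only now is the queued connection picked up
    with conn:
        conn.sendall(b"hello")     # stands in for the handshake reply


def measure_reply_delay(blocking_wait: float) -> float:
    """Return the seconds between a client's connect() and the first reply byte."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]

    server = threading.Thread(target=serve_once, args=(listener, blocking_wait))
    server.start()

    start = time.monotonic()
    with socket.create_connection(("127.0.0.1", port)) as client:
        # connect() has already returned: the kernel accepted the TCP handshake.
        client.recv(5)  # blocks until the loop reaches accept() and replies
    elapsed = time.monotonic() - start

    server.join()
    listener.close()
    return elapsed


if __name__ == "__main__":
    for wait in (0.1, 1.0):
        delay = measure_reply_delay(wait)
        print(f"loop wait {wait:.1f}s -> first reply after {delay:.2f}s")
```

If the real loop behaves like this, the time a client waits for its first handshake message would track the length of the wait in the loop rather than network latency, which would fit a difference between a 1 second wait and a much longer one.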
Is there an existing issue for this?
Current behaviour
I tested v.0.1.14 together with my corresponding PR on SRI: stratum-mining/stratum#1456, to test the new CoinbaseOutputConstraints message.
Using just the Pool, it's easy to see that the first time it connects to the TP, everything is fine.
But if I restart the Pool, the second handshake takes a lot of time to finish, and sometimes it doesn't finish at all.
Expected behaviour
As with previous versions, the handshakes should be completed quickly
Steps to reproduce
Run the latest version of TP (v.0.1.14) and check out my PR on SRI: stratum-mining/stratum#1456.
Then run the Pool role for the first time, kill it, and then start it again.
Relevant log output
No response
How did you obtain Bitcoin Core
Compiled from source
What version of Bitcoin Core are you using?
TP v.0.1.14
Operating system and version
Mac OS 13.0.1
Machine specifications
No response