New 5.9.0-dev networking is not allowing legitimate players to connect under some rare circumstances #14765

Open
KaylebJay opened this issue Jun 20, 2024 · 6 comments
Labels: @ Network; Regression (something that used to work no longer does); Unconfirmed bug (bug report that has not been confirmed to exist/be reproducible)


@KaylebJay

Minetest version

Minetest-5.9.0-dev-93f4844c

Irrlicht device

No response

Operating system and version

Ubuntu 20.04

CPU model

No response

GPU model

No response

Active renderer

No response

Summary

Since upgrading our server to Minetest 5.9.0-dev, we have had two cases of users being unable to log in while having no difficulty logging into other servers that are on 5.8.0 or a later version.

The first case involves one of our users (we'll call her Jane Doe) who had no problems logging in before and then started getting a "Connection timed out" message whenever she tried to join. After investigating a bit, we applied this patch: https://gitlab.com/tunnelers-abyss/minetest/-/commit/f9ece9553a970dd550bf2f97b7999580ae60f502
This allowed Jane Doe to get past the initial connecting phase, but her client would then stall on Item Definitions. After waiting there for 10 or 15 minutes, her client finally logged in. Once she is in-game, it's mostly smooth sailing. Logging in is sometimes much faster and sometimes slower, and when the server was on 5.8.0 she apparently did not have this problem. Her internet is also pretty good, so it's not that, and she doesn't have a firewall or anything similar that might be interfering.

The second case involves a user who has trouble logging in (stuck at Item Definitions again), and when they finally get in, everything moves at a snail's pace around them: almost no packets are being sent/received for some reason. This user was fine before the upgrade. Additionally, this user has like 80mb up/down internet, which is more than enough...

I don't really have the time to do deep debugging with Wireshark or something similar, which is why tweaking the new checks in the networking code was my first thought. The vast majority of our users have had absolutely no problems, which makes this a bit more difficult to debug.

Steps to reproduce

I really have no idea. If you give me things to try I can test them with Jane Doe, who wants very much to get this issue fixed!

KaylebJay added the Unconfirmed bug label Jun 20, 2024
sfan5 added the Regression and @ Network labels Jun 20, 2024
@sfan5
Member

sfan5 commented Jun 20, 2024

The changes in question: #14217.

What would help is a verbose log from the server (ideally the client too) that shows the problems being reproduced.

@KaylebJay
Author

KaylebJay commented Jun 20, 2024

This is what we had gathered earlier. (The user's name and IP address have been obfuscated just in case she wanted that. I'm OK with sharing the IP address privately but I'm not sure if that would even help in this situation.)

2024-06-04 17:53:57: INFO[Server]: Server: New connection: "jane_doe" from 177.26.134.237 (peer_id=12171)
2024-06-04 17:55:20: INFO[Server]: Server: New connection: "jane_doe" from 177.26.134.237 (peer_id=46659)
... no other logs ...
2024-06-04 17:54:28: INFO[ConnectionSend]: con(4/1)RunTimeouts(): Peer 12171 has timed out (outgoing reliables channel=0)
2024-06-04 17:56:10: INFO[ConnectionSend]: con(4/1)RunTimeouts(): Peer 46659 has timed out (outgoing reliables channel=0)

After we applied the fix mentioned in the OP, we got this:

2024-06-04 18:44:51: INFO[Server]: Server: New connection: "jane_doe" from 177.26.134.237 (peer_id=32695)
2024-06-04 18:44:53: WARNING[ConnectionSend]: con(4/1) Packet quota used up for peer_id=32695, was 682 pkts

Immediately after, the client stalled at Item Definitions and no new information about the client was logged or sent to the server. After 10 minutes or so, finally the client joined, with no other interesting logs.
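For anyone wanting to line up the connect and timeout entries in excerpts like the first one above, here is a minimal Python sketch. It is not part of Minetest; the function name `seconds_until_timeout` and the regular expressions are assumptions based only on the line format shown in these excerpts, and it assumes a peer's connect line appears in the file before its timeout line. Peer IDs may be reused over a long-running log, so treat the result as a rough aid rather than a definitive analysis.

```python
import re
from datetime import datetime

# Sample lines copied from the first log excerpt in this thread.
SAMPLE_LOG = """\
2024-06-04 17:53:57: INFO[Server]: Server: New connection: "jane_doe" from 177.26.134.237 (peer_id=12171)
2024-06-04 17:55:20: INFO[Server]: Server: New connection: "jane_doe" from 177.26.134.237 (peer_id=46659)
2024-06-04 17:54:28: INFO[ConnectionSend]: con(4/1)RunTimeouts(): Peer 12171 has timed out (outgoing reliables channel=0)
2024-06-04 17:56:10: INFO[ConnectionSend]: con(4/1)RunTimeouts(): Peer 46659 has timed out (outgoing reliables channel=0)
"""

# Patterns inferred from the excerpt only (assumption, not an official format).
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
CONNECT_RE = re.compile(TS + r'.*New connection: "([^"]+)" from \S+ \(peer_id=(\d+)\)')
TIMEOUT_RE = re.compile(TS + r".*Peer (\d+) has timed out")


def parse_ts(text):
    """Parse the timestamp prefix used in the log lines."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def seconds_until_timeout(lines):
    """Return {peer_id: (player_name, seconds from connect to timeout)}."""
    connects = {}
    result = {}
    for line in lines:
        m = CONNECT_RE.match(line)
        if m:
            # Remember the most recent connection seen for this peer_id.
            connects[int(m.group(3))] = (m.group(2), parse_ts(m.group(1)))
            continue
        m = TIMEOUT_RE.match(line)
        if m:
            peer_id = int(m.group(2))
            if peer_id in connects:
                name, started = connects[peer_id]
                elapsed = (parse_ts(m.group(1)) - started).total_seconds()
                result[peer_id] = (name, elapsed)
    return result


if __name__ == "__main__":
    for peer_id, (name, secs) in seconds_until_timeout(SAMPLE_LOG.splitlines()).items():
        print(f"peer {peer_id} ({name}): timed out after {secs:.0f} s")
```

Running it on a full server log would mean passing the file's lines instead of `SAMPLE_LOG.splitlines()`.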

@sfan5
Member

sfan5 commented Jun 20, 2024

The entire log please, I can't work with fragments.
The best way would be to create an isolated test server that is otherwise identical and have the user connect there.
And also: are they able to connect to other 5.9.0-dev servers?

@KaylebJay
Author

This is all I got - I could try running the server in verbose logging and have Jane Doe connect for more information. We ran it in info logging before - is that not good?

I hadn't checked that, because I don't know of any other servers that are using 5.9.0-dev with the new networking code. That's a good idea though, we'll check that if we can.

@KaylebJay
Author

I checked quickly and we are getting many more timeouts from users with the new networking code as well, around 400-500% more than previously.
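A comparison like this can be made reproducible by counting timeout lines per day in the server log. The following is a rough Python sketch, not anything shipped with Minetest: `timeouts_per_day` and `percent_increase` are hypothetical helper names, and the pattern assumes the "has timed out" message format seen in the excerpts earlier in this thread and the same log level before and after the upgrade.

```python
import re
from collections import Counter

# Matches lines such as:
# 2024-06-04 17:54:28: INFO[ConnectionSend]: ... Peer 12171 has timed out (...)
# The format is inferred from the excerpts in this thread (assumption).
TIMEOUT_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}: .*has timed out")


def timeouts_per_day(lines):
    """Count 'has timed out' log lines, keyed by date string."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("need a positive baseline count")
    return (after - before) / before * 100.0


if __name__ == "__main__":
    import sys

    # Usage: python count_timeouts.py debug.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for day, count in sorted(timeouts_per_day(f).items()):
            print(day, count)
```

Comparing daily counts from before and after the upgrade with `percent_increase` gives a figure that can be checked against the estimate above, ideally normalised by the number of connections per day.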

@sfan5
Member

sfan5 commented Jun 21, 2024

> This is all I got - I could try running the server in verbose logging and have Jane Doe connect for more information. We ran it in info logging before - is that not good?

Yes, verbose logging is needed. trace would be even better.
