Busy loop when client is connected #12

Open
kpcyrd opened this issue Jan 15, 2018 · 7 comments

@kpcyrd
Contributor

kpcyrd commented Jan 15, 2018

I've noticed that, if a client connects to the server, both the server and the client enter a busy loop until one of them disconnects. Both of them call epoll_pwait as fast as possible until data is returned. I assume there is a timeout of 0 somewhere that shouldn't be there.

[...]
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[pid  7563] epoll_pwait(3, [{EPOLLOUT, {u32=3, u64=3}}], 1024, -1, NULL, 8) = 1
[...]
@kpcyrd
Contributor Author

kpcyrd commented Jan 15, 2018

Logs from the server:

During connect:

DEBUG:scaproust::core::socket: [Socket:0] recv
DEBUG:scaproust::proto::rep: [Socket:0] switch from Idle to RecvOnHold
DEBUG:scaproust::reactor::dispatcher: process_io Token(1) Readable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Initial to HandshakeTx
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from HandshakeTx to HandshakeRx
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Readable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from HandshakeRx to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::core::socket: [Socket:0] ep 2 send ready: true 
DEBUG:scaproust::proto::rep: [Socket:0] switch from RecvOnHold to RecvOnHold
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
[...]

During disconnect (^C on the client):

DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Writable
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Active
DEBUG:scaproust::reactor::dispatcher: process_io Token(2) Readable | Writable | Hup
INFO:scaproust::transport::async::state: [Socket:0 Pipe:2] error while Active: Error { repr: Custom(Custom { kind: Other, error: StringError("hup") }) }
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Active to Dead
DEBUG:scaproust::proto::rep: [Socket:0] switch from RecvOnHold to RecvOnHold
DEBUG:scaproust::transport::async: [Socket:0 Pipe:2] switch from Dead to Dead

I think the problem is the wakeup from Writable, even when no data is about to be written (?).

@blabaere
Owner

The zero timeout in the event loop thread is there to minimize latency, and that means one core is busy spinning all the time. One way to fix it is to let the user configure the timeout in the session options. Another is to have the endpoint state machine register for the writable event only when required; this is clearly more difficult and risky.
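
For illustration only, a rough sketch of what the second option could look like with the mio 0.6 API that scaproust builds on (the Pipe struct, the has_pending_send flag, and the update_registration method are hypothetical names, not scaproust's actual code): the pipe asks for writable notifications only while it actually has something to send, so an idle connection stops waking the event loop.

use std::io;

use mio::net::TcpStream;
use mio::{Poll, PollOpt, Ready, Token};

// Hypothetical pipe wrapper; none of these names come from scaproust itself.
struct Pipe {
    stream: TcpStream,
    token: Token,
    // Set when the protocol has handed the pipe a message to write out.
    has_pending_send: bool,
}

impl Pipe {
    // Re-register with only the interests needed right now, so that an idle
    // connection no longer produces spurious writable wakeups.
    fn update_registration(&self, poll: &Poll) -> io::Result<()> {
        let interest = if self.has_pending_send {
            Ready::readable() | Ready::writable()
        } else {
            Ready::readable()
        };
        poll.reregister(&self.stream, self.token, interest, PollOpt::edge())
    }
}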

@kpcyrd
Contributor Author

kpcyrd commented Jan 16, 2018

Considering the additional log lines and skimming through some mio docs, I'm guessing it's explicitly requesting to be notified when the socket can be written to, i.e. the kernel write buffer has capacity. Since this is almost always the case for idle connections, epoll_wait returns instantly.

I agree that the correct way would be deregistering from write notifications. I'd argue that having a syscall return as soon as something needs to be done should have at least the same latency as constantly polling to check whether something needs to be done. ;) Having the process execute zero instructions while idle would also potentially reduce wear on low-power hardware like Raspberry Pis.

It seems PollOpt::edge() is what most people go with, but I still need to learn some things about mio. If you're busy over the next few days, I'll read some more about mio and try to submit a PR.
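
As a point of reference, here is a minimal mio 0.6 sketch of the difference under discussion (the address, token, and overall setup are made up for illustration, not taken from scaproust): with PollOpt::level() a connected socket whose send buffer has room keeps waking the poll call, which is exactly what the strace output above shows, while with PollOpt::edge() the writable event is only delivered when readiness changes.

use mio::net::TcpStream;
use mio::{Events, Poll, PollOpt, Ready, Token};

fn main() -> std::io::Result<()> {
    let poll = Poll::new()?;
    let addr = "127.0.0.1:5555".parse().unwrap();
    let stream = TcpStream::connect(&addr)?;

    // Level-triggered writable interest: as long as the kernel send buffer has
    // room, every poll returns immediately with a writable event.
    poll.register(&stream, Token(0), Ready::readable() | Ready::writable(), PollOpt::level())?;

    // Edge-triggered instead: the writable event is delivered when the socket
    // becomes writable; once it has been consumed, an idle connection no
    // longer wakes the loop and the poll below blocks.
    poll.reregister(&stream, Token(0), Ready::readable() | Ready::writable(), PollOpt::edge())?;

    let mut events = Events::with_capacity(1024);
    poll.poll(&mut events, None)?;
    Ok(())
}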

@blabaere
Owner

I remember starting with PollOpt::level and switching to edge at some point, but I can't remember which feature I was implementing. I'm pretty sure it is related to the fact that protocols require pipes to raise a send_ready event after each successful send; with edge notifications this is much easier to implement. The interactions between the protocol and pipe state machines are very sensitive to this kind of change. You can see this in the Windows-specific code for the pipe active state, for example.

It is because of this kind of problem that I think it would be better to rely on tokio rather than mio.
Anyway, I recommend being extra cautious when touching this part; it has driven me mad several times already. May the force be with you on this one.
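
To illustrate the pipe/protocol interaction mentioned above (SendOutcome, Pipe, and on_send_done are all hypothetical names for this sketch, not scaproust's API): with edge-triggered writable events, a pipe that completed a send without hitting WouldBlock can raise send_ready again immediately, rather than waiting for the next level-triggered writable wakeup.

// Entirely hypothetical sketch; these names are not scaproust's API.
enum SendOutcome {
    Done,       // whole message written, socket still had buffer space
    WouldBlock, // kernel buffer filled up, wait for the next writable edge
}

struct Pipe {
    // Tracks whether the socket is believed writable since the last edge event.
    writable: bool,
}

impl Pipe {
    // Returns true when the protocol should be told "send_ready" right away.
    fn on_send_done(&mut self, outcome: SendOutcome) -> bool {
        match outcome {
            SendOutcome::Done => {
                self.writable = true;
                true
            }
            SendOutcome::WouldBlock => {
                self.writable = false;
                false
            }
        }
    }
}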

@inv2004

inv2004 commented Jan 24, 2018

I see the same problem in my app: 100% CPU all the time, even when it is just waiting on a timeout. With message sending disabled, it drops to 0%.

Could you please recommend: is it better to migrate to nanomsg.rs or to wait for a fix?

Thank you,

@blabaere
Owner

If you need something that actually works, it really is a better idea to use nanomsg.rs. It is a Rust wrapper around the original C library, so one can be much more confident in it.

I don't see a fix happening in the near future.
