Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the coordinator less CPU-spin happy #30

Open
ms705 opened this issue Jul 14, 2015 · 8 comments
Open

Make the coordinator less CPU-spin happy #30

ms705 opened this issue Jul 14, 2015 · 8 comments

Comments

@ms705
Copy link
Collaborator

ms705 commented Jul 14, 2015

Currently, the coordinator spins hard on incoming messages in the StreamSocketsAdapter when doing asynchronous receives. This was the easiest way of implementing the functionality, but is not actually required.

Instead, we should rework the StreamSocketsAdapter and StreamSocketsChannel code to use the Boost ASIO proactor pattern correctly (with a finishing async receive callback setting up the next async receive), or move to a libev or select-based implementation.

This isn't a correctness issue, but a performance one: the coordinator currently uses an entire CPU core on each machine. Usually, that's not an issue, but we may as well not waste resources unnecessarily.

@mgrosvenor
Copy link
Contributor

An alternative option would be to move to CamIO2, which can be made to sleep, but will still have an asyc pattern for speed.

On 14 Jul 2015, at 4:57 pm, Malte Schwarzkopf [email protected] wrote:

Currently, the coordinator spins hard on incoming messages in the StreamSocketsAdapter when doing asynchronous receives. This was the easiest way of implementing the functionality, but is not actually required.

Instead, we should rework the StreamSocketsAdapter and StreamSocketsChannel code to use the Boost ASIO proactor pattern correctly (with a finishing async receive callback setting up the next async receive), or move to a libev or select-based implementation.

This isn't a correctness issue, but a performance one: the coordinator currently uses an entire CPU core on each machine. Usually, that's not an issue, but we may as well not waste resources unnecessarily.


Reply to this email directly or view it on GitHub #30.

@WIZARD-CXY
Copy link

WIZARD-CXY commented May 4, 2016

Now I met the same problem with the latest code, the coordinator just uses 100% cpu
2016-05-04 9 41 45
I'm not sure whether it is caused by this issue?

@WIZARD-CXY
Copy link

ping @ms705 Is there updates in this issue?

@ms705
Copy link
Collaborator Author

ms705 commented May 12, 2016

Hi @WIZARD-CXY,

Sorry for the delayed response -- we've been bogged down with a bunch of paper deadlines recently.

This issues is still outstanding. Our plan to fix it is to replace our own home-baked, polling RPC layer with gRPC, which will fix the spinning. However, it will take us a while to implement this.

In the meantime, you can work around the issue by adding a sleep() or nanosleep() invocation at line 83 in stream_sockets_adapter.h. If you sleep there (e.g., for 5ms), you will slow down the receive loop, leading to reduced CPU consumption. However, your RPC processing latency will also be increased.

Hope that helps!

@WIZARD-CXY
Copy link

@ms705 Thanks, I will give it a shot tomorrow!

@WIZARD-CXY
Copy link

2016-05-16 3 14 48
Is this the right trick?I added usleep above. but the coordinator still using 100% cpu @ms705

@ms705
Copy link
Collaborator Author

ms705 commented May 16, 2016

@WIZARD-CXY Ah -- this workaround only works if there actually are any channels (i.e., other running coordinators or running tasks). Otherwise, the code falls into the early exit condition on line 59 and you spin until channels show up.

Here's a patch that works even when there are no channels (with the same provision re RPC latency as above):

diff --git a/src/engine/node.cc b/src/engine/node.cc
index 5c01a75..6353e55 100644
--- a/src/engine/node.cc
+++ b/src/engine/node.cc
@@ -106,6 +106,7 @@ void Node::AwaitNextMessage() {
   VLOG(3) << "Waiting for next message from adapter...";
   m_adapter_->AwaitNextMessage();
-  //boost::this_thread::sleep(boost::posix_time::seconds(1));
+  usleep(500000);
 }

In other words, add the usleep invocation to AwaitNextMessage() in src/engine/node.cc. If you add this, you can get rid of the usleep statements in stream_sockets_adapter.h.

@WIZARD-CXY
Copy link

It works now@ms705, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants