pass through an RpcFinished(None) when grpc write streams are terminated early #7422
I'm pretty uncomfortable with blindly ignoring errors here without understanding why they're happening... Do you know when `RpcFinished(None)` is happening and why? Is it a server-side issue we're working around? Is it an issue in `grpcio` we're working around? In particular, this seems to be common, because there are already retries, so if this is happening often enough that requests are failing despite multiple attempts, it would be really good to address the root cause.
Marked WIP as per the above comment. This PR currently just has logging changes that are being used to diagnose this.
As per discussion with @illicitonion "offline", I'm going to be splitting out the changes adding logging to a separate PR and focus this one on specifically the |
Thanks!
…ted early (#7422)

### Problem

When using remotely executed scala compiles by setting `--execution-strategy=hermetic` with zinc compiles and pointing the `--remote-execution-server` and `--remote-store-server`s at an internal instance of https://github.com/twitter/scoot to compile scala in https://github.com/twitter/util, we are seeing some [`RpcFinished(None)` errors](https://docs.rs/grpcio/0.4.2/grpcio/enum.Error.html) causing upload digest requests to fail after a few retries.

In #6344 we see a case of a spurious `RpcFinished(None)` which caused us to fork grpcio. We believe this is similarly a spurious failure caused by a non-erroneous race condition, as in logs from Scoot we are seeing that the digests reporting a failure have successfully been uploaded to the remote store. In testing, we have found that this occurs with files which are large and are used by multiple process requests (e.g. `scala-reflect.jar` in zinc compilations). We believe the `RpcFinished(None)` we are converting into an `Ok(None)` here occurs when pants has multiple concurrent write requests against the remote store, and the remote store cancels the others after it successfully receives the first write request.

### Solution

- Pass through an [`RpcFinished(None)`](https://docs.rs/grpcio/0.4.2/grpcio/enum.Error.html) in `store.rs`.

### Result

Pants no longer considers it a failure to receive an `RpcFinished(None)` response. The discussion is ongoing as to whether this should be an accepted part of the remote execution API at https://groups.google.com/forum/#!topic/remote-execution-apis/NXUe3ItCw68/discussion.
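As a rough illustration of the conversion described under Solution, the sketch below maps an early-terminated write stream to `Ok(None)` while still surfacing every other error. It is not the code from `store.rs`: `RpcStatus`, `WriteError`, `WriteResponse`, and `pass_through_early_finish` are hypothetical stand-ins defined here so the example compiles without grpcio, and they only loosely mirror the shape of the grpcio error linked above.

```rust
// Hypothetical stand-in for an RPC status; NOT the real grpcio type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct RpcStatus {
    code: i32,
    details: Option<String>,
}

// Hypothetical stand-in for the error returned by a write stream.
// Only the two variants needed for the illustration are modelled.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum WriteError {
    // The stream finished before the client was done; the status may be absent.
    RpcFinished(Option<RpcStatus>),
    // The server reported an explicit failure status.
    RpcFailure(RpcStatus),
}

// Hypothetical stand-in for the response to a completed write.
#[derive(Debug, Clone, PartialEq)]
struct WriteResponse {
    committed_size: i64,
}

// Treat `RpcFinished(None)` as a non-failure (assumption: the store already
// holds the blob because a concurrent write won), and report anything else.
fn pass_through_early_finish(
    result: Result<WriteResponse, WriteError>,
) -> Result<Option<WriteResponse>, String> {
    match result {
        // Normal completion: hand the response back to the caller.
        Ok(response) => Ok(Some(response)),
        // Early termination with no status: pass through as "no response".
        Err(WriteError::RpcFinished(None)) => Ok(None),
        // Everything else, including RpcFinished(Some(status)), stays an error.
        Err(other) => Err(format!("Error from server: {:?}", other)),
    }
}
```

Matching only on `RpcFinished(None)` keeps the exemption narrow: a stream that finishes early with an actual status attached is still reported, so the retry logic mentioned in the Problem section would continue to apply to it.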