pool: log error when failed to connect to shard #276
@muzarski let's add an integration test for the relevant case (the issue has most of what's needed). Also, I'm not sure about the solution here: it's going to hide any failure from the connection method.
            total_shards=self.host.sharding_info.shards_count)
        conn.original_endpoint = self.host.endpoint
    except Exception as exc:
        log.error("Failed to open connection to %s, on shard_id=%i: %s", self.host, shard_id, exc)
Reraise the exception here; otherwise it's going to hide all possible failures from connection creation, and there are plenty of those.
Also, I'm not sure how this fixes anything regarding the blocked shard-aware port.
This PR doesn't fix the issue mentioned in the comment.
As mentioned in the issue, the log messages are very misleading - they don't suggest that the shard-aware port is blocked. The only thing I change here is to log something when the shard-aware CQL connection fails.
The main issue (ThreadPoolExecutor blocking enqueued tasks) is still there.
@fruch but the main issue is still not fixed. Is there a point in adding an integration test?
I was more concerned with the behavior change you introduced. Anyhow, the log doesn't really help in directing a user to what the issue is and what they can do to handle it.
Yeah, I forgot to reraise the exception - fixed it already.
I'm not sure I understand correctly. As for me, I wish I'd seen this kind of log message when I was debugging the issue. During the SCT run, there was no information about the shard-aware CQL connection failure - nothing suggested that this was the real issue.
@Lorak-mmk can you review this one?
@muzarski Please rebase on top of the current master.
I agree with the change - previously the exception was silent and it was hard to debug connection problems. The printed exception should give some clues to why the connection has failed.
cassandra/pool.py
        conn.original_endpoint = self.host.endpoint
    except Exception as exc:
        log.error("Failed to open connection to %s, on shard_id=%i: %s", self.host, shard_id, exc)
        raise exc
Wouldn't it be better to just `raise`? That way you will preserve the original backtrace, IIUC.
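A minimal sketch of the log-and-reraise pattern suggested above, using a bare `raise`. The names `open_connection` and `open_connection_to_shard` are hypothetical stand-ins for illustration, not the driver's actual functions; only the log message format is taken from the diff.

```python
import logging

log = logging.getLogger("pool_sketch")


def open_connection(shard_id):
    # Hypothetical stand-in for the driver's connection factory; it always fails here.
    raise ConnectionRefusedError("shard-aware port unreachable (shard %d)" % shard_id)


def open_connection_to_shard(host, shard_id):
    try:
        return open_connection(shard_id)
    except Exception as exc:
        log.error("Failed to open connection to %s, on shard_id=%i: %s", host, shard_id, exc)
        # Bare raise: re-raises the exception currently being handled,
        # so the caller still sees the failure and its original traceback.
        raise
```

With this shape the failure is both logged and propagated, so the behavior seen by callers stays the same as before the log line was added.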
One thing I'd like to know about this change that I won't be able to understand from code changes: What happens with the exception? Why isn't it logged?
4ab160b to db5df52
In the original issue we encountered during the SCT run, the error was silently dropped in the following loop:

    for shard_id in range(self.host.sharding_info.shards_count):
        if skip_shard_id is not None and skip_shard_id == shard_id:
            continue
        future = self._session.submit(self._open_connection_to_missing_shard, shard_id)
        if isinstance(future, Future):
            self._connecting.add(shard_id)
            self._shard_connections_futures.append(future)

I believe the reason this exception was ignored is that we don't check what the future holds (either a result or an exception) after execution. I'll add a short description in the commit message.
The exception from `HostConnection::_open_connection_to_missing_shard` during connection failure is silently dropped by the callers. This function is submitted to the `ThreadPoolExecutor` which assigns the result of this function to the future (either success or exception). The callers throughout the code ignore the future's result and that is why this exception is ignored. In this commit we add an error log when opening a connection to the specific shard fails.
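The behavior described in the commit message can be reproduced with the standard library alone. The sketch below is illustrative only: `open_connection_to_missing_shard` and `log_failure` are hypothetical stand-ins, and the done-callback is just one way to observe the stored exception, not what this PR does (the PR logs inside the submitted function instead).

```python
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("pool_sketch")


def open_connection_to_missing_shard(shard_id):
    # Hypothetical stand-in for the real method; it always fails.
    raise ConnectionRefusedError("cannot reach shard %d" % shard_id)


def log_failure(future):
    # Runs once the future has finished (assumes it was not cancelled);
    # exception() returns None when the task succeeded.
    exc = future.exception()
    if exc is not None:
        log.error("Shard connection task failed: %s", exc)


with ThreadPoolExecutor(max_workers=2) as executor:
    # Fire-and-forget: the exception is stored on the future, nothing is raised here.
    dropped = executor.submit(open_connection_to_missing_shard, 0)
    # Same kind of task, but a callback inspects the outcome.
    observed = executor.submit(open_connection_to_missing_shard, 1)
    observed.add_done_callback(log_failure)

# Both futures are finished once the with block exits; the failure of
# `dropped` only becomes visible if someone asks the future for it.
print(type(dropped.exception()).__name__)
```

Unless a caller invokes `result()` or `exception()` on the future, or attaches a callback, the failure leaves no trace, which matches the silent drop described above.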
db5df52 to c7e6ebb
v2:
This change is related to #275.
Silently ignoring the connection failure can be misleading, which was the case in the issue provided above.
This PR adds an error log when the driver fails to open a CQL connection to a specific shard.