For the hostgroup with a single backend, if the backend server goes down, the ProxySQL will return the error "Max connect timeout reached while reaching hostgroup" after the "mysql-connect_timeout_server_max" duration elapses if the backend is marked as "SHUNNED". In contrast, if establishing a direct connection without ProxySQL, the client will get a "connection refused" error or error code 111 immediately, which makes more sense.
After debugging the latest binary, it appears that ProxySQL keeps attempting to get a good connection at intervals controlled by "mysql-connect_retries_delay" until "mysql-connect_timeout_server_max" is reached according to the method MySQL_Session::handler_again___status_CONNECTING_SERVER.
|
bool MySQL_Session::handler_again___status_CONNECTING_SERVER(int *_rc) { |
And the retry is not even controlled by "mysql-connect_retries_on_failure" (as it should be) because the process always falls into this condition in the scenario I mentioned above.
|
if (mybe->server_myds->myconn==NULL) { |
If debug further, we will find the method MyHGC::get_random_MySrvC returns NULL all the time during the retry for this single "SHUNNED" backend.
|
MySrvC *MyHGC::get_random_MySrvC(char * gtid_uuid, uint64_t gtid_trxid, int max_lag_ms, MySQL_Session *sess) { |
According to this method, the ProxySQL should attempt to bring the "SHUNNED" backend online but it fails all the time because the "mysrvc->time_last_detected_error" is always in the future related to "mysql-monitor_ping_interval".
// if Monitor is enabled and mysql-monitor_ping_interval is
// set too high, ProxySQL will unshun hosts that are not
// available. For this reason time_last_detected_error will
// be tuned in the future
if (mysql_thread___monitor_enabled) {
int a = mysql_thread___shun_recovery_time_sec;
int b = mysql_thread___monitor_ping_interval;
b = b/1000;
if (b > a) {
t = t + (b - a);
}
}
mysrvc->time_last_detected_error = t;
I'm not sure whether the logic is intentionally designed this way. However, the retry time should be controlled by 'mysql-connect_retries_on_failure' rather than its current implementation. Additionally, I believe it would be more effective to return the exact failure error to the client.
For the hostgroup with a single backend, if the backend server goes down, the ProxySQL will return the error "Max connect timeout reached while reaching hostgroup" after the "mysql-connect_timeout_server_max" duration elapses if the backend is marked as "SHUNNED". In contrast, if establishing a direct connection without ProxySQL, the client will get a "connection refused" error or error code 111 immediately, which makes more sense.
After debugging the latest binary, it appears that ProxySQL keeps attempting to get a good connection at intervals controlled by "mysql-connect_retries_delay" until "mysql-connect_timeout_server_max" is reached according to the method MySQL_Session::handler_again___status_CONNECTING_SERVER.
proxysql/lib/MySQL_Session.cpp
Line 3177 in 27e71d2
And the retry is not even controlled by "mysql-connect_retries_on_failure" (as it should be) because the process always falls into this condition in the scenario I mentioned above.
proxysql/lib/MySQL_Session.cpp
Line 3244 in 27e71d2
If debug further, we will find the method MyHGC::get_random_MySrvC returns NULL all the time during the retry for this single "SHUNNED" backend.
proxysql/lib/MyHGC.cpp
Line 55 in 27e71d2
According to this method, the ProxySQL should attempt to bring the "SHUNNED" backend online but it fails all the time because the "mysrvc->time_last_detected_error" is always in the future related to "mysql-monitor_ping_interval".
I'm not sure whether the logic is intentionally designed this way. However, the retry time should be controlled by 'mysql-connect_retries_on_failure' rather than its current implementation. Additionally, I believe it would be more effective to return the exact failure error to the client.