You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently the C API (implemented in #915) calls abort() if any exception is thrown in the underlying C++ functions called. This is not ideal because some exceptions are non fatal and should be able to be handled (eg DBConnector throwing an exception when the database is not yet online, but it might come online later). Since it is not permitted to unwind through a C ABI function, we must decide on an appropriate method for passing exceptions across the ffi boundary.
Here are a few options and thoughts on the matter. Opinions welcome, as internal discussion hasn't come to an agreement.
Hard coding non-fatal exceptions
We could enumerate all non-fatal exceptions and code those into the signature of the C apis.
Eg: we would change SWSSDBConnector_new_tcp to something like this
// If this function returns null, db did not respond, but it might if you retrySWSSDBConnectorSWSSDBConnector_new_tcp(int32_tdbId, constchar*hostname, uint16_tport, uint32_ttimeout) {
SWSSTry({
try {
return (SWSSDBConnector) newDBConnector(dbId, string(hostname), port, timeout);
} catch (system_error&e) {
// we know this error is nonfatal so we hard coded it in herereturnnullptr;
}
// any other error will still abort() because any other error must be fatal
});
}
This is not ideal because now the C api becomes logically coupled to the implementation of the underlying function. If another nonfatal exception is added to DBConnector, they also have to change this C function to properly return null. We will have to do this for every possible nonfatal exception that every C function could trigger.
This may be OK if there are very few nonfatal exceptions, we expect to practically never add more, and if the C api stays very small. Otherwise, anyone who works on swsscommon now also has to understand these implications on the C api. It is also definitely the cleanest and best if we do think we can get away with it.
Global error string, never abort
We could introduce a error interface to the C api that provides informational error strings instead of aborting. This is probably a bad way of doing it, but I mention it because this is how I previously implemented the C api. We store errors as a global error string, and SWSSTry + code using it looks something like this:
#include<cstring>// for strdup#include<cstdlib>// for freeconstchar*globalErrorString=nullptr;
extern"C" { constchar*SWSSGetError(void) { returnglobalErrorString; } }
#defineSWSSTry(failure_value, ...) \
try { \
__VA_ARGS__ \
; \
} catch (std::exception &e) { \
free(globalErrorString); \
globalErrorString = strdup(e.what().c_str()); \
return failure_value; \
}
SWSSDBConnectorSWSSDBConnector_new_tcp(int32_tdbId, constchar*hostname, uint16_tport, uint32_ttimeout) {
SWSSTry(nullptr, {
// upon any exception, returns nullptr and globalErrorString is set to e.what()return (SWSSDBConnector) newDBConnector(dbId, string(hostname), port, timeout)
});
}
On the rust (or any other language) side:
fnmake_a_dbconnector(){loop{let dbconnector = unsafe{SWSSDBConnector_new_tcp(...)};if dbconnector.is_null(){// If any error occurred fatal or nonfatal, all we get is a descriptive stringlet error_str = unsafe{ str::from_utf8_unchecked(SWSSGetError())};// Magic strings!! yuck!!if error_str.contains("address not available"){// nonfatal error - retry latersleep_10_seconds();}else{panic!("fatal error: {error_str}");}}else{return dbconnector;}}}
I originally wrote the api this way, but it was nixed by Riff (#915 (comment)) because we are losing the core dump/stack trace for fatal errors. Another problem is that all we get is a string - if we want to catch specific exceptions, we have to know what e.what() will return. This is really bad because foreign code is logically coupled to the exact exception messages in swss-common. If somebody fixes a typo in an exception message, that the magic string in foreign code is now wrong!
Global error string, elective abort using pseudo-try/catch
Here's another approach to an error string interface that may be less bad. Introduce a generic try/catch to the C interface, like so:
boolglobalDisableAborts= false;
constchar*globalErrorString=nullptr;
extern"C" {
voidSWSSTry_(void) { free(globalErrorString); globalErrorString=nullptr; globalDisableAborts= true; }
constchar*SWSSCatch(void) { globalDisableAborts= false; returnglobalErrorString; }
}
#defineSWSSTry(failure_value, ...) \
try { \
__VA_ARGS__ \
; \
} catch (std::exception &e) { \
if (globalDisableAborts) {
free(globalErrorString); \
globalErrorString=strdup(e.what().c_str()); \
returnfailure_value; \
} else { \
SWSS_LOG_ERROR(... e.what() ...); \
abort(); \
} \
}
```rsfnmake_a_dbconnector_retrying() {
// We will only try 10 times before performing an aborting callfortriesin0..10 {
// begin a pseudo try blockunsafe { SWSSTry_() };
letdbconnector=unsafe { SWSSDBConnector_new_tcp(...) };
// end the pseudo try block, get the error if one happened.leterr=unsafe { SWSSCatch() };
iferr.is_null() {
// happy path: error is null so there was no error - dbconnector is validreturndbconnector;
} else {
// This might've been a fatal or nonfatal error but we aren't checking with a magic string. // We don't need to check because we are limiting our number of tries.leterror_str=unsafe { str::from_utf8_unchecked(SWSSGetError()) };
println!("error: {error_str}, retrying");
sleep_10_seconds();
}
}
// We have tried 10 times - it's probably a fatal error, so let's bite the bullet and do an aborting call// Since we did not use SWSSTry_, any exception will still abort. Just like current behavior.// We will get a nice core dump because abort() was called;letdbconnector=unsafe { SWSSDBConnector_new_tcp(...) };
returndbconnector;
}
With this solution, we can choose where to abort and where not to. Using SWSSTry_ and SWSSCatch we can retry something a few times before electing to lay down our cards and get a proper core dump. In simple cases, where any error is fatal, we can elide SWSSTry_ and SWSSCatch, and use aborting behavior on the first try.
Reimplement C++ standard exceptions for better exception catching & matching?
Prior art of note in this discussion is SWIG's solution to exceptions, which is to implement and FFI safe version of every standard C++ exception. This seems like a good idea, but I am unclear on how much work it would be to maintain. This would make matching on caught exceptions somewhat less magical, since we can consistently match on exception class (assuming swsscommon never throws non-standard exception types). https://github.com/swig/swig/blob/master/Lib/exception.i
The text was updated successfully, but these errors were encountered:
Currently the C API (implemented in #915) calls abort() if any exception is thrown in the underlying C++ functions called. This is not ideal because some exceptions are non fatal and should be able to be handled (eg
DBConnector
throwing an exception when the database is not yet online, but it might come online later). Since it is not permitted to unwind through a C ABI function, we must decide on an appropriate method for passing exceptions across the ffi boundary.Here are a few options and thoughts on the matter. Opinions welcome, as internal discussion hasn't come to an agreement.
Hard coding non-fatal exceptions
We could enumerate all non-fatal exceptions and code those into the signature of the C apis.
Eg: we would change SWSSDBConnector_new_tcp to something like this
This is not ideal because now the C api becomes logically coupled to the implementation of the underlying function. If another nonfatal exception is added to DBConnector, they also have to change this C function to properly return null. We will have to do this for every possible nonfatal exception that every C function could trigger.
This may be OK if there are very few nonfatal exceptions, we expect to practically never add more, and if the C api stays very small. Otherwise, anyone who works on swsscommon now also has to understand these implications on the C api. It is also definitely the cleanest and best if we do think we can get away with it.
Global error string, never abort
We could introduce a error interface to the C api that provides informational error strings instead of aborting. This is probably a bad way of doing it, but I mention it because this is how I previously implemented the C api. We store errors as a global error string, and SWSSTry + code using it looks something like this:
On the rust (or any other language) side:
I originally wrote the api this way, but it was nixed by Riff (#915 (comment)) because we are losing the core dump/stack trace for fatal errors. Another problem is that all we get is a string - if we want to catch specific exceptions, we have to know what
e.what()
will return. This is really bad because foreign code is logically coupled to the exact exception messages in swss-common. If somebody fixes a typo in an exception message, that the magic string in foreign code is now wrong!Global error string, elective abort using pseudo-try/catch
Here's another approach to an error string interface that may be less bad. Introduce a generic try/catch to the C interface, like so:
With this solution, we can choose where to abort and where not to. Using
SWSSTry_
andSWSSCatch
we can retry something a few times before electing to lay down our cards and get a proper core dump. In simple cases, where any error is fatal, we can elideSWSSTry_
andSWSSCatch
, and use aborting behavior on the first try.Reimplement C++ standard exceptions for better exception catching & matching?
Prior art of note in this discussion is SWIG's solution to exceptions, which is to implement and FFI safe version of every standard C++ exception. This seems like a good idea, but I am unclear on how much work it would be to maintain. This would make matching on caught exceptions somewhat less magical, since we can consistently match on exception class (assuming swsscommon never throws non-standard exception types).
https://github.com/swig/swig/blob/master/Lib/exception.i
The text was updated successfully, but these errors were encountered: