-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible concurrency issue with pal_os_event #83
Comments
Hello @matteofumagalli1275 optiga-trust-m/examples/utilities/pkcs11_trustm.c Line 1321 in 3b2a33c
An additional example of a real world qualified software (with multi-thread testing) can be found here and here |
I checked again. I already do the lock/unlock around the the apis, but the point i wanted to confirm is if the pal_os_event_trigger_registered_callback has to be synchronized with the api calls since i may get a context switch from ISR. If task A is calling for example optiga_crypt_ecc_generate_keypair, is it normal that the queue scheduler runs asynchronously in task B (event task) and and may call optiga_cmd_execute while optiga_crypt_ecc_generate_keypair has not returned yet? I attach the pal_os_event.c #define OPTIGA_TM_HANDLER_PRIORITY 4
#define OPTIGA_TM_HANDLER_STACK_SIZE configMINIMAL_STACK_SIZE*2
/*********************************************************************************************************************
* LOCAL DATA
*********************************************************************************************************************/
/// @cond hidden
/// Callback function when timer elapses
static volatile register_callback callback_registered = NULL;
static pal_os_event_t pal_os_event_0 = {0};
SemaphoreHandle_t xOSEventSemaphore = NULL;
/**
* Timer callback handler.
*
* This get called from the TIMER elapse event.<br>
* Once the timer expires, the registered callback funtion gets called from the timer event handler, if
* the call back is not NULL.<br>
*
*/
void pal_os_event_trigger_registered_callback(void)
{
register_callback func = NULL;
void *func_args = NULL;
do
{
if (xSemaphoreTake(xOSEventSemaphore, (TickType_t) portMAX_DELAY) == pdTRUE)
{
if (pal_os_event_0.callback_registered)
{
func = pal_os_event_0.callback_registered;
pal_os_event_0.callback_registered = NULL;
func_args = pal_os_event_0.callback_ctx;
func((void *)func_args);
}
}
}
while (1);
}
void _pal_os_event_trigger_registered_callback(void *pvParameters)
{
pal_os_event_trigger_registered_callback();
}
void vTimerCallback(void)
{
BaseType_t xHigherPriorityTaskWoken;
xSemaphoreGiveFromISR(xOSEventSemaphore, &xHigherPriorityTaskWoken);
portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}
void pal_os_event_init(void)
{
usec_timer_init();
xOSEventSemaphore = xSemaphoreCreateBinary();
xTaskCreate(_pal_os_event_trigger_registered_callback, "OptigaEvt",
OPTIGA_TM_HANDLER_STACK_SIZE, NULL, OPTIGA_TM_HANDLER_PRIORITY, NULL);
}
void pal_os_event_start(pal_os_event_t *p_pal_os_event, register_callback callback,
void *callback_args)
{
if (FALSE == p_pal_os_event->is_event_triggered)
{
p_pal_os_event->is_event_triggered = TRUE;
pal_os_event_register_callback_oneshot(p_pal_os_event, callback, callback_args, 1000);
}
}
void pal_os_event_stop(pal_os_event_t *p_pal_os_event)
{
//lint --e{714} suppress "The API pal_os_event_stop is not exposed in header file but used as extern in
//optiga_cmd.c"
p_pal_os_event->is_event_triggered = FALSE;
}
pal_os_event_t *pal_os_event_create(register_callback callback, void *callback_args)
{
if ((NULL != callback)&&(NULL != callback_args))
{
pal_os_event_start(&pal_os_event_0, callback, callback_args);
}
return (&pal_os_event_0);
}
/// @endcond
/**
* Platform specific event call back registration function to trigger once when timer expires.
* <br>
*
* <b>API Details:</b>
* This function registers the callback function supplied by the caller.<br>
* It triggers a timer with the supplied time interval in microseconds.<br>
* Once the timer expires, the registered callback function gets called.<br>
*
* \param[in] callback Callback function pointer
* \param[in] callback_args Callback arguments
* \param[in] time_us time in micro seconds to trigger the call back
*
*/
void pal_os_event_register_callback_oneshot(pal_os_event_t *p_pal_os_event,
register_callback callback,
void *callback_args,
uint32_t time_us)
{
configASSERT(pal_os_event_0.callback_registered == NULL);
pal_os_event_0.callback_registered = callback;
pal_os_event_0.callback_ctx = callback_args;
usec_timer_start(time_us, (usec_timer_clbk_t)vTimerCallback);
}
/**
* @}
*/
//lint --e{818,715} suppress "As there is no implementation, pal_os_event is not used"
void pal_os_event_destroy(pal_os_event_t *pal_os_event)
{
}
/**
* @}
*/ |
if the API is protected with a mutext mechanism, it shall be fine, all callback calls are synchronized and happens one by one, as soon as it's guaranteed that one API is called at once (the command + the response are atomic and protected), all underlaying API calls will be called sequentially. |
Hi, thanks for the suggestion. I'll also add that with the following modification i don't get issues: pal_os_event_lock();
crypt_sync_status = optiga_crypt_ecc_generate_keypair(me, curve_id,
(optiga_key_usage_t)(OPTIGA_KEY_USAGE_KEY_AGREEMENT | OPTIGA_KEY_USAGE_AUTHENTICATION),
FALSE,
&slot,
public_key,
(uint16_t *)&public_key_len) ;
pal_os_event_unlock(); and void pal_os_event_lock()
{
if(xOSEventMutex != NULL)
{
xSemaphoreTake(xOSEventMutex, (TickType_t) portMAX_DELAY);
}
}
void pal_os_event_unlock()
{
if(xOSEventMutex != NULL)
{
xSemaphoreGive(xOSEventMutex);
}
}
void pal_os_event_trigger_registered_callback(void)
{
register_callback func = NULL;
void *func_args = NULL;
do
{
if (xSemaphoreTake(xOSEventSemaphore, (TickType_t) portMAX_DELAY) == pdTRUE)
{
pal_os_event_lock();
if (pal_os_event_0.callback_registered)
{
func = pal_os_event_0.callback_registered;
pal_os_event_0.callback_registered = NULL;
func_args = pal_os_event_0.callback_ctx;
func((void *)func_args);
}
pal_os_event_unlock();
}
}
while (1);
} The main thing i would like to understand is the following. case OPTIGA_CMD_EXEC_REQUEST_LOCK:
case OPTIGA_CMD_EXEC_REQUEST_SESSION:
{
*exit_loop = TRUE;
if (me->cmd_sub_execution_state == OPTIGA_CMD_EXEC_REQUEST_SESSION)
{
me->exit_status = optiga_cmd_request_session(me);
}
else
{
me->exit_status = optiga_cmd_request_lock(me, OPTIGA_CMD_QUEUE_REQUEST_LOCK);
vTaskDelay(10); //MODIFICATION
}
if (OPTIGA_LIB_SUCCESS != me->exit_status)
{
EXIT_STATE_WITH_ERROR(me,*exit_loop);
break;
}
me->cmd_sub_execution_state = OPTIGA_CMD_EXEC_PREPARE_APDU;
break;
} I do get inside optiga_cmd_execute_prepare_command two times simultaneously from two different task (the main task calling the api and the event task) with the same context being written to. |
As indicated during the first post, the event task has greater priority then main task. |
Ok, I think it might be related to the point below. optiga-trust-m/examples/utilities/pkcs11_trustm.c Lines 1321 to 1352 in 3b2a33c
How you wait for the response it's up to you, but the optiga stack shall serve only for one full API call at once. In the baremetal implementation this isn't a problem as there is no alternative, in a multithread environment you need to have a command and a response to it executed atomically (from the optiga stack perspective). That shall not have any significant performance issues, as OPTIGA is capable of processing/executing only one crypto command at once. |
Hi, pal_os_lock_acquire(&optiga_mutex);
pkcs11_context.object_list.optiga_lib_status = OPTIGA_LIB_BUSY;
pal_os_event_lock();
xResult = optiga_crypt_ecc_generate_keypair(pkcs11_context.object_list.optiga_crypt_instance,
OPTIGA_ECC_CURVE_NIST_P_256,
(uint8_t)OPTIGA_KEY_USAGE_SIGN,
FALSE,
&lOptigaOid,
pucDumbData,
&ucDumbDataLength);
pal_os_event_unlock();
if (OPTIGA_LIB_SUCCESS != xResult)
{
PKCS11_PRINT( ( "ERROR: Failed to invalidate a keypair \r\n" ) );
xResult = CKR_FUNCTION_FAILED;
}
if (OPTIGA_LIB_SUCCESS == xResult)
{
while (OPTIGA_LIB_BUSY == pkcs11_context.object_list.optiga_lib_status)
{
}
// Either by timout or because of success it should end up here
if (OPTIGA_LIB_SUCCESS != pkcs11_context.object_list.optiga_lib_status)
{
PKCS11_PRINT( ( "ERROR: Failed to invalidate a keypair \r\n" ) );
xResult = CKR_FUNCTION_FAILED;
}
}
pal_os_lock_release(&optiga_mutex); I think that this condition occurs if i wakeup from IRQ and event task has superior priority then application task. |
Hi, _STATIC_H void optiga_cmd_queue_reset_slot(const optiga_cmd_t * me)
{
// Reset the arrival time
me->p_optiga->optiga_cmd_execution_queue[me->queue_id].arrival_time = 0xFFFFFFFF;
//add optiga_cmd ctx
me->p_optiga->optiga_cmd_execution_queue[me->queue_id].registered_ctx = NULL;
//add request type
me->p_optiga->optiga_cmd_execution_queue[me->queue_id].request_type = 0;
// set the slot state to assigned
me->p_optiga->optiga_cmd_execution_queue[me->queue_id].state_of_entry = OPTIGA_CMD_QUEUE_ASSIGNED;
// start the event scheduler
pal_os_event_start(me->p_optiga->p_pal_os_event_ctx, optiga_cmd_queue_scheduler, me->p_optiga);
} What i see is that when i opened the application, it keeps running the pal_os_event_register_callback_oneshot every ms even if i'm not doing anything. // call self
pal_os_event_register_callback_oneshot(my_os_event, optiga_cmd_queue_scheduler,
p_optiga_ctx,OPTIGA_CMD_SCHEDULER_IDLING_TIME_MS); So when i call any API i see that the queue gets written and then it relies on the optiga_cmd_queue_scheduler that is periodically called to poll the queue. If you want i can show you a live debug, it would probably be faster for both of us. |
no, it is correct. Till any of the created util/crypt instances are in the memory the scheduler will work to receive new commands from the user application for further processing. The event tasks does push the internal queue to complete all running sessions in the right order. I'm trying to understand now the issue you have or you may have, recalling your previous post:
In general that doesn't sound like a valid thing, as only one command to send shall be prepared at the time. Could it be because you have introduced that task delay? The internal state machine is designed in a way that one task (main) shall wait till the event task pushes an internal queue to process the stack and receive the final response from the chip. |
HI, |
No open questions or response requested since 14 December 2021. Issue will be closed. |
Prerequisites
Can you reproduce the problem reliably?
Yes
Did you check current release notes for known issues?
Yes
Description:
Optiga queue scheduler stops running after a 10 minute benchmark.
Steps to Reproduce:
Event task running with priority 4
Main task running with priority 3
Timer that triggers event runs with its own peripheral and IRQ and wakes event task with xSemaphoreGiveFromISR
Run a benchmark code with continuous signature and verify
Additional info:
Shielded connection is currently disabled.
Actual Result:
When we call for example optiga_crypt_ecc_generate_keypair it calls optiga_cmd_execute with state OPTIGA_CMD_EXEC_PREPARE_COMMAND and substate OPTIGA_CMD_EXEC_REQUEST_LOCK.
After 'optiga_cmd_request_lock' the queue scheduler may be triggered, thus processing the next phase while OPTIGA_CMD_EXEC_PREPARE_APDU hasn't been set yet.
I found this condition while logging each phase and subphase.
I saw that in normal condition a subphase OPTIGA_CMD_EXEC_REQUEST_LOCK is followed by OPTIGA_CMD_EXEC_PREPARE_APDU, but when the library gets stuck i see two consecutive OPTIGA_CMD_EXEC_REQUEST_LOCK.
By wrapping each optiga_crypt function with enter/exit critical the benchmark nevers stops.
Shouldn't each api be wrapped with a mutex since both the event task and an application task may call optiga_cmd_execute?
Or is it wrong that i see the queue scheduler always running (even if i'm not encrypting anything)?
Frequency (1/1, 1/50, 1/100):
1/1
Build/Commit:
optiga-trust-m-release-v3.1.2
Target:
STM32F7
Windows
Compiler:
IAR 8.32
Environment
FreeRTOS
The text was updated successfully, but these errors were encountered: