[BUG] Unstable 5G tests(suci_enabled) in github actions #2072

martin-mat · 2024-06-09T20:56:49Z

Describe the bug
5G tests fail frequently in GitHub Actions.
Example:
https://github.com/cnti-testcatalog/testsuite/actions/runs/9439225077/job/25997107636

To Reproduce
Run the GitHub Actions workflow a few times; failures are quite frequent.

Expected behavior
Tests should pass when there is no error or code change.

svteb · 2024-06-24T17:36:55Z

EDIT: This comment is entirely incorrect; the issue most likely stems from timing issues with tshark (though some of the information below is still interesting).

The issue stems from ueransim not having all SUCI components working during the test:

From the suci_enabled task in src/tasks/workload/5g_validator.cr:

if K8sTshark.regex_tshark_log(/"nas_5gs.mm.type_id": "1"/, tshark_log_name) &&
            !K8sTshark.regex_tshark_log(/"nas_5gs.mm.suci.scheme_id": "0"/, tshark_log_name) &&
            !K8sTshark.regex_tshark_log(/"nas_5gs.mm.suci.pki": "0"/, tshark_log_name)
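For readers less familiar with the NAS fields, the condition above can be restated as a small Python sketch. It assumes the tshark log is JSON text (as produced by `tshark -T json`) held in a string, and `suci_enabled` here is a hypothetical helper, not the testsuite's Crystal API. Per 3GPP TS 24.501, identity type 1 is SUCI and protection scheme 0 is the null scheme (no concealment), so the check passes only when a SUCI is seen and no null-scheme or key-id-0 SUCI appears anywhere in the log.

```python
import re

# Hypothetical Python mirror of the Crystal condition in the suci_enabled task.
# Dots are escaped here; the Crystal regexes leave them unescaped, which
# matches the same field names in practice.
SUCI_IDENTITY = re.compile(r'"nas_5gs\.mm\.type_id": "1"')        # identity type 1 = SUCI
NULL_SCHEME = re.compile(r'"nas_5gs\.mm\.suci\.scheme_id": "0"')  # scheme 0 = null scheme
ZERO_PKI = re.compile(r'"nas_5gs\.mm\.suci\.pki": "0"')           # home network public key id 0


def suci_enabled(tshark_log: str) -> bool:
    """True if the log shows a SUCI and no null-scheme or key-id-0 SUCI."""
    return (
        SUCI_IDENTITY.search(tshark_log) is not None
        and NULL_SCHEME.search(tshark_log) is None
        and ZERO_PKI.search(tshark_log) is None
    )


if __name__ == "__main__":
    concealed = '{"nas_5gs.mm.type_id": "1", "nas_5gs.mm.suci.scheme_id": "1", "nas_5gs.mm.suci.pki": "1"}'
    null_scheme = '{"nas_5gs.mm.type_id": "1", "nas_5gs.mm.suci.scheme_id": "0", "nas_5gs.mm.suci.pki": "0"}'
    print(suci_enabled(concealed))    # True
    print(suci_enabled(null_scheme))  # False
```

Note that the check is log-wide: a single null-scheme registration anywhere in the capture is enough to fail the test.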

During failed tests, the values nas_5gs.mm.suci.scheme_id and nas_5gs.mm.suci.pki are both equal to 0. This seems to happen because the sample_open5gs CNF does not actually deploy "successfully", even though all of its components are running. The cnf-testsuite.yml file in the CNF's directory contains the key values that enable encryption:

protectionScheme: 1
publicKey: 0ac95ceeb93308df01be82ff9994d8330e38804ece1700ee4b972d8028796275
publicKeyId: 1

These sometimes don't register (for as yet unknown reasons), so ueransim does not use them, as can be seen in the GitHub Actions logs:

Successful run:

I, [2024-06-09 23:07:08 +00:00 #432707]  INFO -- cnf-testsuite: ue_values: amf:
  hostname: open5gs-amf-ngap

mcc: '999'
mnc: '70'
sst: 1
sd: "0x111111"
tac: '0001'

protectionScheme: 1
publicKey: '0ac95ceeb93308df01be82ff9994d8330e38804ece1700ee4b972d8028796275'
publicKeyId: 1
routingIndicator: '0000'
# protectionScheme: 0
# publicKey:
# publicKeyId: 1
# routingIndicator: '0000'

Unsuccessful run:

I, [2024-06-09 20:25:19 +00:00 #360248]  INFO -- cnf-testsuite: ue_values: amf:
  hostname: open5gs-amf-ngap

mcc: '999'
mnc: '70'
sst: 1
sd: "0x111111"
tac: '0001'





# protectionScheme: 0
# publicKey:
# publicKeyId: 1
# routingIndicator: '0000'
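Comparing the two logs, the unsuccessful run lacks the uncommented protectionScheme, publicKey and publicKeyId lines entirely; only the commented-out defaults remain. A minimal Python sketch for flagging this case, assuming the rendered ue_values text is available as a string (`active_protection_scheme` is a hypothetical helper, not part of the testsuite). It uses a line-anchored regex rather than a YAML parser to stay dependency-free, so it only looks at top-level `protectionScheme:` lines.

```python
import re
from typing import Optional

# Matches an uncommented "protectionScheme: <n>" line; lines starting with "#" never match.
ACTIVE_PROTECTION_SCHEME = re.compile(
    r"^[ \t]*protectionScheme:[ \t]*(\d+)[ \t]*$", re.MULTILINE
)


def active_protection_scheme(ue_values: str) -> Optional[int]:
    """Return the first uncommented protectionScheme value, or None if it is absent."""
    match = ACTIVE_PROTECTION_SCHEME.search(ue_values)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    successful = "tac: '0001'\nprotectionScheme: 1\npublicKeyId: 1\n# protectionScheme: 0\n"
    unsuccessful = "tac: '0001'\n\n# protectionScheme: 0\n# publicKey:\n"
    print(active_protection_scheme(successful))    # 1
    print(active_protection_scheme(unsuccessful))  # None
```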

Will investigate further.

Refs: cnti-testcatalog#2072 cnti-testcatalog#2087
- Prior functionality was bound to a fixed execution time (120 s), which introduced problems in testing (the tshark session could end before the test began).
- The new functionality mainly implements indefinite tshark execution, along with the possibility of terminating it when deemed appropriate. This is complemented by robust error handling and termination of the tshark process on unexpected crashes during initialization. NOTE: The main tests currently do not handle states where a crash could occur elsewhere, so a hanging tshark session can still happen (although it is unlikely).
- The module is properly commented, which should allow the user to quickly understand its functionality.
- The user-facing functionality remains the same, with easier-to-comprehend function names.
- Handling of PIDs is rather problematic due to the nature of the exec_by_node_bg function, which does not return the PID of the tshark process but rather the PID of the shell executing it (unverified). This is why the PID retrieval may seem rather complicated (especially the pid_command variable). Possible solutions are listed in a comment, but they don't quite work for various reasons (globbing issues, return of an incorrect PID, etc.). As for issuing kill -15 followed by kill -9, some tshark sessions would get stuck in a zombie state if the commands were not executed in this order.

Signed-off-by: svteb
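The pattern described in this commit (a capture with no fixed duration, stopped explicitly, with kill -15 escalating to kill -9 and cleanup on crashes) can be sketched in Python as follows. This is an illustration under stated assumptions, not the testsuite's Crystal implementation: here the capture is a local child process, whereas the testsuite signals a PID inside a node pod through exec_by_node_bg. `stop_process` and `open_ended_capture` are hypothetical names, and a sleeping Python process stands in for tshark.

```python
import contextlib
import signal
import subprocess
import sys
from typing import Iterator, List


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Send SIGTERM (kill -15), escalating to SIGKILL (kill -9) if the process lingers.

    Waiting on the process afterwards reaps it, so no zombie entry is left behind.
    """
    if proc.poll() is not None:  # already exited; poll() also reaps it
        return proc.returncode
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


@contextlib.contextmanager
def open_ended_capture(cmd: List[str], grace_seconds: float = 5.0) -> Iterator[subprocess.Popen]:
    """Run cmd with no fixed duration; stop it when the block exits, even on an exception."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        stop_process(proc, grace_seconds)


if __name__ == "__main__":
    # Stand-in for a long-running capture such as ["tshark", "-i", "any", "-w", "/tmp/cap.pcap"].
    stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
    with open_ended_capture(stand_in) as proc:
        pass  # the actual test steps would run here
    print(proc.returncode)  # -15 on POSIX: terminated by SIGTERM
```

The `finally` clause is what ties the capture's lifetime to the test rather than to a timer: the capture ends exactly when the test body finishes or raises, which avoids the "session ended before the test began" failure mode of a fixed 120 s window.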

martin-mat added the "bug" label (Something isn't working) Jun 9, 2024

martin-mat assigned svteb Jun 24, 2024

svteb mentioned this issue Jul 1, 2024

Feat: Rework k8s_tshark functionality and apply the changes in tests #2097

Merged

martin-mat closed this as completed in #2097 Jul 9, 2024