[BUG] Unstable 5G tests(suci_enabled) in github actions #2072

Closed
martin-mat opened this issue Jun 9, 2024 · 1 comment · Fixed by #2097

@martin-mat
Collaborator

Describe the bug
5G tests fail frequently in GitHub Actions.
Example:
https://github.com/cnti-testcatalog/testsuite/actions/runs/9439225077/job/25997107636

To Reproduce
Run the GitHub Actions workflow a few times. Failures are quite frequent.

Expected behavior
Tests should pass when there is no error or change.

@martin-mat martin-mat added the bug Something isn't working label Jun 9, 2024
@svteb
Collaborator

svteb commented Jun 24, 2024

EDIT: This comment is entirely incorrect; the issue likely stems from timing issues with tshark. It still contains some interesting information.

The issue stems from ueransim not having all SUCI components working during the test:

From task suci_enabled in src/tasks/workload/5g_validator.cr:

if K8sTshark.regex_tshark_log(/"nas_5gs.mm.type_id": "1"/, tshark_log_name) &&
   !K8sTshark.regex_tshark_log(/"nas_5gs.mm.suci.scheme_id": "0"/, tshark_log_name) &&
   !K8sTshark.regex_tshark_log(/"nas_5gs.mm.suci.pki": "0"/, tshark_log_name)
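
The condition above amounts to three independent searches over the tshark log. A minimal Python sketch of the same logic follows, for illustration only. It is not the project's Crystal code: `suci_enabled` and the sample strings are hypothetical stand-ins, and the real check reads a log file through `K8sTshark.regex_tshark_log`.

```python
import re

# Patterns mirrored from the quoted condition (dots left unescaped, as in the original).
TYPE_ID_ONE = re.compile(r'"nas_5gs.mm.type_id": "1"')
SCHEME_ID_ZERO = re.compile(r'"nas_5gs.mm.suci.scheme_id": "0"')
PKI_ZERO = re.compile(r'"nas_5gs.mm.suci.pki": "0"')


def suci_enabled(log_text: str) -> bool:
    """Hypothetical helper: True when type_id 1 is present and neither
    scheme_id nor pki is reported as 0 anywhere in the log text."""
    return (
        TYPE_ID_ONE.search(log_text) is not None
        and SCHEME_ID_ZERO.search(log_text) is None
        and PKI_ZERO.search(log_text) is None
    )


# Made-up log fragments for illustration only, not real capture output.
protected = '"nas_5gs.mm.type_id": "1", "nas_5gs.mm.suci.scheme_id": "1", "nas_5gs.mm.suci.pki": "1"'
unprotected = '"nas_5gs.mm.type_id": "1", "nas_5gs.mm.suci.scheme_id": "0", "nas_5gs.mm.suci.pki": "0"'
empty_capture = ""

print(suci_enabled(protected))      # True
print(suci_enabled(unprotected))    # False
print(suci_enabled(empty_capture))  # False
```

Note that a log with no matching packets at all also evaluates to false, so an empty or incomplete capture fails this check just as an unprotected one does.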

During failed tests, the values nas_5gs.mm.suci.scheme_id and nas_5gs.mm.suci.pki are equal to 0. This seems to happen because the CNF sample_open5gs does not actually deploy "successfully", although all of its components are running. The file cnf-testsuite.yml, which is part of the CNF's directory, contains the key values that enable encryption:

protectionScheme: 1
publicKey: 0ac95ceeb93308df01be82ff9994d8330e38804ece1700ee4b972d8028796275
publicKeyId: 1

These sometimes don't register (for as-yet-unknown reasons), and thus ueransim does not use them, as can be seen in the logs from GitHub Actions:

Successful run:

I, [2024-06-09 23:07:08 +00:00 #432707]  INFO -- cnf-testsuite: ue_values: amf:
  hostname: open5gs-amf-ngap

mcc: '999'
mnc: '70'
sst: 1
sd: "0x111111"
tac: '0001'

protectionScheme: 1
publicKey: '0ac95ceeb93308df01be82ff9994d8330e38804ece1700ee4b972d8028796275'
publicKeyId: 1
routingIndicator: '0000'
# protectionScheme: 0
# publicKey:
# publicKeyId: 1
# routingIndicator: '0000'

Unsuccessful run:

I, [2024-06-09 20:25:19 +00:00 #360248]  INFO -- cnf-testsuite: ue_values: amf:
  hostname: open5gs-amf-ngap

mcc: '999'
mnc: '70'
sst: 1
sd: "0x111111"
tac: '0001'





# protectionScheme: 0
# publicKey:
# publicKeyId: 1
# routingIndicator: '0000'
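
The two dumps differ in whether the encryption keys appear on uncommented lines. A small Python sketch of how that difference could be detected, assuming the `ue_values` dump is available as plain text. `active_keys` is a hypothetical helper, and the sample strings are abbreviated from the logs above.

```python
from __future__ import annotations

REQUIRED_KEYS = ("protectionScheme", "publicKey", "publicKeyId")


def active_keys(ue_values: str) -> set[str]:
    """Return the required keys found on non-comment lines with a non-empty value."""
    found = set()
    for line in ue_values.splitlines():
        stripped = line.strip()
        # Skip blank lines and commented-out defaults such as "# protectionScheme: 0".
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition(":")
        if sep and key in REQUIRED_KEYS and value.strip():
            found.add(key)
    return found


successful = """mcc: '999'
protectionScheme: 1
publicKey: '0ac95ceeb93308df01be82ff9994d8330e38804ece1700ee4b972d8028796275'
publicKeyId: 1
# protectionScheme: 0
# publicKey:
# publicKeyId: 1
"""

unsuccessful = """mcc: '999'

# protectionScheme: 0
# publicKey:
# publicKeyId: 1
"""

print(sorted(set(REQUIRED_KEYS) - active_keys(successful)))    # []
print(sorted(set(REQUIRED_KEYS) - active_keys(unsuccessful)))  # ['protectionScheme', 'publicKey', 'publicKeyId']
```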

Will investigate further.

svteb added a commit to svteb/testsuite that referenced this issue Jul 1, 2024
Refs: cnti-testcatalog#2072 cnti-testcatalog#2087
- Prior functionality was bound to a fixed execution time (120s), which introduced problems
in testing (the tshark session ending before the test began).
- New functionality mainly implements indefinite tshark execution along with the possibility
of terminating it when deemed appropriate. This is complemented with robust error handling
and termination of the tshark process on unexpected crashes during initialization.
NOTE: The main tests currently do not handle states where a crash could occur elsewhere, and
thus a hanging tshark session can still happen (although unlikely).
- The module is properly commented, which should allow the user to get a quick understanding
of its functionality.
- The user functionality remains the same, with easier-to-comprehend function names.
- Handling of PIDs is rather problematic due to the nature of the exec_by_node_bg function, which
does not return the PID of the tshark process but rather the PID of the shell executing it
(unverified). This is why the retrieval of the PID may seem rather complicated (especially the
pid_command variable). Possible solutions are listed in a comment, but these don't quite
work for various reasons (globbing issues, return of incorrect PID, etc.).
As for the kill -15 and kill -9 repetition, some tshark sessions would
get stuck in a zombie state if the commands were not executed in this order.

Signed-off-by: svteb <[email protected]>
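
The start-then-terminate pattern described in the commit message can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: the real module is written in Crystal and runs tshark on a cluster node via exec_by_node_bg, whereas here a sleeping child process stands in for tshark, and `start_capture` and `stop_capture` are hypothetical names.

```python
from __future__ import annotations

import subprocess
import sys


def start_capture(command: list[str]) -> subprocess.Popen:
    """Start a long-running process with no built-in duration limit."""
    return subprocess.Popen(command)


def stop_capture(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Request termination first (SIGTERM, like kill -15), then force it
    (SIGKILL, like kill -9) if the process has not exited in time."""
    proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        # wait() also reaps the child, so it does not linger as a zombie.
        return proc.wait()


# Stand-in for an open-ended capture: a child process that sleeps for a long time.
capture = start_capture([sys.executable, "-c", "import time; time.sleep(600)"])
try:
    pass  # the traffic under test would be generated here
finally:
    exit_code = stop_capture(capture)

print(capture.poll() is not None)  # True: the process has exited
```

Wrapping the test body in `try`/`finally` is one possible way to make sure the capture is stopped even when the body raises, which relates to the hanging-session caveat in the NOTE above.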
svteb added a commit to svteb/testsuite that referenced this issue Jul 4, 2024
svteb added a commit to svteb/testsuite that referenced this issue Jul 9, 2024
svteb added a commit to svteb/testsuite that referenced this issue Jul 9, 2024
martin-mat pushed a commit that referenced this issue Jul 9, 2024
…2097)

Refs: #2072 #2087