[BUG] rv-virt/citest: test_hello or test_pipe failed #14808

Open
1 task done
lupyuen opened this issue Nov 15, 2024 · 2 comments
Labels
Arch: risc-v, Area: Build system, OS: Linux, Type: Bug

lupyuen commented Nov 15, 2024

Description / Steps to reproduce the issue

Since yesterday, rv-virt/citest has been failing from test_hello onwards, or from test_pipe onwards, which hangs our CI Checks in GitHub and the Build Farm. (GitHub cancels the job after 6 hours.)

It might have been caused by one of these NuttX Commits:

Or maybe one of these NuttX Apps Commits:

Also, when one test fails, why do the remaining tests take such a long time to fail, hanging our CI Checks in GitHub and the Build Farm?

Fail at test_hello onwards: https://github.com/NuttX/nuttx/actions/runs/11833005280/job/32970891697#step:7:143

Configuration/Tool: rv-virt/citest
$ cd /github/workspace/sources/nuttx/tools/ci/testrun/script
$ python3 -m pytest -m 'qemu or rv_virt' ./ -B rv-virt -P /github/workspace/sources/nuttx -L /github/workspace/sources/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu -R qemu -C --json=/github/workspace/sources/nuttx/boards/risc-v/qemu-rv/rv-virt/configs/citest/logs/rv-virt/qemu/pytest.json

test_framework/test_cmocka.py::test_cmocka PASSED                        [  0%]
test_example/test_example.py::test_hello FAILED                          [  0%]
test_example/test_example.py::test_helloxx FAILED                        [  0%]
test_example/test_example.py::test_pipe FAILED                           [  0%]
test_example/test_example.py::test_popen FAILED                          [  0%]
test_example/test_example.py::test_usrsocktest FAILED                    [  0%]
[ Everything fails very slowly ]

Fail at test_pipe onwards: https://github.com/NuttX/nuttx/actions/runs/11850442831/job/33025374105#step:7:145

test_framework/test_cmocka.py::test_cmocka PASSED                        [  0%]
test_example/test_example.py::test_hello PASSED                          [  0%]
test_example/test_example.py::test_helloxx PASSED                        [  0%]
test_example/test_example.py::test_pipe FAILED                           [  0%]
test_example/test_example.py::test_popen FAILED                          [  0%]
test_example/test_example.py::test_usrsocktest FAILED                    [  0%]
[ Everything fails very slowly ]
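
To tell the two failure patterns apart without reading the whole log, the first FAILED line can be extracted automatically. The snippet below is only a sketch: the function name `first_failure` and the regular expression are illustrative assumptions, not part of the NuttX CI scripts, and it only recognises the `PASSED` / `FAILED` result lines shown in the logs above.

```python
import re

# Matches verbose pytest result lines such as:
#   test_example/test_example.py::test_hello FAILED      [  0%]
# Only PASSED and FAILED are handled (assumption: other outcomes are ignored).
RESULT_LINE = re.compile(r"^(\S+::\S+)\s+(PASSED|FAILED)\b")


def first_failure(log_text):
    """Return the test ID of the first FAILED line, or None if nothing failed."""
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match and match.group(2) == "FAILED":
            return match.group(1)
    return None


# Excerpt copied from the "Fail at test_pipe onwards" log above
LOG = """\
test_framework/test_cmocka.py::test_cmocka PASSED                        [  0%]
test_example/test_example.py::test_hello PASSED                          [  0%]
test_example/test_example.py::test_helloxx PASSED                        [  0%]
test_example/test_example.py::test_pipe FAILED                           [  0%]
test_example/test_example.py::test_popen FAILED                          [  0%]
"""

print(first_failure(LOG))
```

Run against the first log excerpt instead, the same function would report `test_hello` as the starting point of the failures.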

On which OS does this issue occur?

[OS: Linux]

What is the version of your OS?

Ubuntu LTS at GitHub Actions

NuttX Version

master

Issue Architecture

[Arch: risc-v]

Issue Area

[Area: Build System]

Verification

  • I have verified before submitting the report.
@lupyuen added the Type: Bug label Nov 15, 2024
@github-actions bot added the Arch: risc-v, Area: Build system, and OS: Linux labels Nov 15, 2024

lupyuen commented Nov 16, 2024

The Timeout Values are configured to One Minute or longer for some Python Tests. What if we reduce the Timeout Values? https://github.com/search?q=repo%3Aapache%2Fnuttx+timeout%3D+language%3APython+path%3A%2F%5Etools%5C%2Fci%5C%2Ftestrun%5C%2F%2F&type=code

Update: Nope, doesn't work: https://github.com/lupyuen/nuttx-build-farm/blob/main/run-job-macos.sh#L107-L131

Somehow the Timeout Value is hard-coded inside expect? https://github.com/apache/nuttx/blob/master/tools/ci/testrun/utils/common.py#L229-L288


lupyuen commented Nov 16, 2024

For now we patched the NuttX Mirror Repo to kill the CI Test if it exceeds 2 hours. The same applies to the Ubuntu Build Farm and macOS Build Farm.
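
The idea of killing a test run after a fixed deadline can be sketched in Python as follows. This is not the actual patch applied to the Mirror Repo or the Build Farm scripts. The helper name `run_with_deadline` is hypothetical, and a short sleeping child process stands in for the real `pytest` invocation.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd and kill it if it is still running after deadline_s seconds.

    Returns the process exit code, or None if the process had to be killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        proc.kill()   # deadline exceeded: terminate the hung process
        proc.wait()   # reap the killed process so it does not linger
        return None


if __name__ == "__main__":
    # Stand-in for a hung test run (assumption: the real command would be pytest)
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    # The demo uses a 1-second deadline; a 2-hour limit would be 2 * 60 * 60
    result = run_with_deadline(hung, deadline_s=1)
    print("killed" if result is None else f"exit code {result}")
```

Returning `None` on a kill lets the caller distinguish a timed-out run from an ordinary test failure, so the next build can proceed either way.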

lupyuen added a commit to lupyuen2/wip-nuttx that referenced this issue Nov 19, 2024
CI Test will sometimes run for 6 hours (before getting auto-terminated by GitHub):
- apache#14808
- apache#14680

This is a problem because:
- It will increase our usage of GitHub Runners, which may overrun the [GitHub Actions Budget](https://infra.apache.org/github-actions-policy.html) allocated by ASF.
- Suppose right after CI Test there's another build. If CI Test runs for all 6 hours, then the build after CI Test will never run.

For this PR: We assume that Every CI Job (e.g. risc-v-05) will complete normally within 2 hours. If any CI Job exceeds 2 hours: This PR will kill the CI Test Process `pytest` and allow the next build to run.
xiaoxiang781216 pushed a commit that referenced this issue Nov 19, 2024
JaeheeKwon pushed a commit to JaeheeKwon/nuttx that referenced this issue Nov 28, 2024