Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add bench workflow for AWS #1330

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

tschneider-aneo
Copy link
Contributor

@tschneider-aneo tschneider-aneo commented Oct 31, 2024

Motivation

Related to ArmoniK benchmarking automation project, Bench application is the most suited one for testing the ArmoniK's sheer orchestration performance

Description

Use Bench action from Armonik.Action.Deploy to benchmark ArmoniK with Bench application on localhost on each commit on main, and on AWS at each release.

There's also a possibility to launch this workflow manually, with an option to choose whether the infrastructure must be automatically destroyed after the session is finished. This can be useful to perform post-mortem analysis.

It introduces a Python script that is meant to retrieve the throughput of a session and return a JSON containing its value.
It also introduces a infrastructure of reference for ArmoniK benchmarks.

Testing

This workflow was run on every push on the PR's branch.

Impact

This PR implements a milestone for ArmoniK benchmarking capacity, as AK would be benchmarked frequently.

Additional Information

Needs aneoconsulting/Armonik.Action.Deploy#24 to be merged.

Sometimes infrastructure destruction fails on AWS due to a deadlock between a security group and a subnet that happens unpredictably.

When this happens, the destruction must be taken over manually by destroying the security group and the network interface associated with it, and finished with make recipes destroy and bootstrap-destroy with the prefix used by the GitHub workflow (currently benchmark).

Checklist

  • My code adheres to the coding and style guidelines of the project.
  • I have performed a self-review of my code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • I have thoroughly tested my modifications and added tests when necessary.
  • Tests pass locally and in the CI.
  • I have assessed the performance impact of my modifications.

ttlSecondsAfterFinished: 0
template:
spec:
containers:

Check warning

Code scanning / SonarCloud

Service account permissions should be restricted Medium

Bind this resource's automounted service account to RBAC or disable automounting. See more on SonarQube Cloud
template:
spec:
containers:
- name: bench-session

Check warning

Code scanning / SonarCloud

Memory limits should be enforced Medium

Specify a memory limit for this container. See more on SonarQube Cloud
tools/ci/bench-job-template.yml Fixed Show fixed Hide fixed
@tschneider-aneo tschneider-aneo force-pushed the ts/add-bench-aws branch 4 times, most recently from ee84b2d to e5e9472 Compare November 6, 2024 14:25
@tschneider-aneo tschneider-aneo force-pushed the ts/add-bench-aws branch 3 times, most recently from 701ae60 to e767835 Compare November 20, 2024 16:46
.github/workflows/bench-aws.yml Outdated Show resolved Hide resolved
.github/workflows/bench-aws.yml Outdated Show resolved Hide resolved
.github/workflows/bench-aws.yml Outdated Show resolved Hide resolved
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_EC2_METADATA_DISABLED: true
run: |
aws s3 cp "$BENCH_RESULTS_PATH" "s3://test-armonik-bench-storage/benchclient_benchmark_${EVENT_NAME}_${TYPE}_${GHRUNID}.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you should use an environment variable here instead of the putting directly the bucket uri

Copy link
Contributor Author

@tschneider-aneo tschneider-aneo Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get your point. Could you be a bit more explicit ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bucket should be configuration variable that is stored within the repo so that we can change it if needed without modifying the workflow.
See: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables#defining-configuration-variables-for-multiple-workflows

.github/workflows/bench-aws.yml Outdated Show resolved Hide resolved
run: |
set -ex
if [ "$TRIGGER" == 'push' ]; then
echo '{"include":[{"type": "localhost", "ntasks":1000, "polling-limit": 300}, {"type": "aws", "ntasks":1200000, "polling-limit": 1000, "parameters-file-path": "tools/ci/bench-aws.tfvars"}]}' > matrix.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ntasks":1200000`

Why such an oddly specific number?

Copy link
Contributor Author

@tschneider-aneo tschneider-aneo Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had session resume troubles when resuming 1.5M tasks with this infra size. I wanted to ensure we wouldn't encounter this kind of problem in the CI and 1,2M seemed to be fine.

I know it is odd to not have a million tasks but I thought 1M could be a bit short for the bench duration.

@tschneider-aneo tschneider-aneo marked this pull request as ready for review December 23, 2024 11:32
.github/workflows/bench-benchmark.yml Outdated Show resolved Hide resolved
tools/ci/bench-aws.tfvars Outdated Show resolved Hide resolved
tools/ci/bench-aws.tfvars Outdated Show resolved Hide resolved
qdelamea-aneo
qdelamea-aneo previously approved these changes Jan 6, 2025
DATE=$(date +"%Y-%m-%d")
aws s3 cp "$BENCH_RESULTS_PATH" "s3://armonik-bench-storage/${FILE_PREFIX}/${GHRUNID}_${DATE}/benchclient_benchmark_${EVENT_NAME}_${TYPE}.json"

- if: ${{ (github.event_name == 'workflow_dispatch' && inputs.destroy-on-session-end) || (github.event_name != 'workflow_dispatch' && always()) }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the 'always()' really necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always() is here to ensure that if the workflow isn't triggered manually the infrastructure on AWS is always destroyed even when a previous step has failed.

- name: BenchOptions__PayloadSize
value: "1"
- name: BenchOptions__ResultSize
value: "1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to put zero here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never tested ! Worth trying

Copy link

sonarqubecloud bot commented Jan 8, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants