
updates for optimizing failure scenario #402

Merged
merged 2 commits on Jun 30, 2023

Conversation

@kami619 (Contributor) commented Jun 29, 2023

@kami619 requested review from ahus1 and tkyjovsk on Jun 29, 2023 at 13:11
@@ -34,6 +34,8 @@ JAVA_OPTS="${JAVA_OPTS} -Xmx1G -XX:+HeapDumpOnOutOfMemoryError"
DEBUG_MODE="${DEBUG:-false}"
DEBUG_PORT="${DEBUG_PORT:-8787}"

CHAOS_MODE="${CHAOS_MODE:-false}"
Contributor:

For distributed runners, this mode is not very usable, as we don't want to run the chaos script from all runners, but only once.

We can move the trigger back to the workflow (the log directory can also be obtained there) and then print the info from the pod in another workflow step, after the kcb script finishes. This way we would also get rid of the coupling of the kcb script with OpenShift. Any thoughts @kami619 @ahus1 @tkyjovsk?
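For illustration only, a rough sketch of what triggering the chaos script from a workflow step (instead of from kcb.sh) could look like; the LOGS_DIR value and the kcb.sh/kc-chaos.sh arguments are assumptions, not the actual workflow:

# Hypothetical workflow step; paths and arguments are assumptions
LOGS_DIR="results/logs"                  # log directory obtained in the workflow itself
mkdir -p "$LOGS_DIR"
./kc-chaos.sh "$LOGS_DIR" &              # chaos trigger runs once, from the workflow, not from kcb.sh
CHAOS_PID=$!
./kcb.sh                                 # actual benchmark invocation, arguments omitted
kill "$CHAOS_PID" 2>/dev/null || true    # stop the chaos script when the benchmark finishes
# a later workflow step can then print the pod info collected under "$LOGS_DIR"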

Contributor:

Yes, for distributed runners we wouldn't want this to be run from kcb.sh.

If it helps you with the things you're doing at the moment, you could still merge it. The benefit would be that the folder information is passed to the script, including the right timeout parameter.

Eventually I'd like to see it removed for the reasons Michal described, or changed to an environment variable like SUT_DESCRIBE through which the caller could specify any script to perform any kind of chaos (not only Kubernetes chaos). But let's see if we ever do this and how the chaos testing proceeds; we might eventually not use the shell script once we move to something more advanced.
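For illustration, a minimal sketch of such a hook inside kcb.sh; CHAOS_SCRIPT is a made-up variable name (modeled on SUT_DESCRIBE), not something that exists today:

# Hypothetical: the caller points CHAOS_SCRIPT at any script (Kubernetes chaos or otherwise);
# kcb.sh only invokes it and passes context such as the log directory
CHAOS_SCRIPT="${CHAOS_SCRIPT:-}"
if [ -n "$CHAOS_SCRIPT" ]; then
  "$CHAOS_SCRIPT" "$LOGS_DIR" &
  CHAOS_SCRIPT_PID=$!
fi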

Contributor:

I also don't think that kubectl should be called from anywhere in kcb.sh, as the benchmark is supposed to be agnostic of the system under test's provisioning layer. Failure testing, and any other kind of testing which requires a specific provisioning platform (kubectl/oc/docker-compose), should be separated from the benchmark itself, IMO.

I think the same applies to the stress-testing loop which runs multiple iterations of Gatling runs. In the context of distributed execution, this loop would need to live outside the script that runs the benchmark in distributed mode.
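For illustration, a rough sketch of such an outer loop once it lives outside the benchmark script; ITERATIONS and the kcb.sh invocation are assumptions:

# Hypothetical outer stress-testing loop, kept outside of kcb.sh and the distributed runner
ITERATIONS="${ITERATIONS:-10}"
for i in $(seq 1 "$ITERATIONS"); do
  echo "Stress-test iteration $i of $ITERATIONS"
  ./kcb.sh            # each iteration triggers one (possibly distributed) Gatling run, arguments omitted
done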

@tkyjovsk (Contributor) commented Jun 29, 2023:

In the context of PR #295 I've placed the startup-time measuring scripts inside provisioning/minikube/measure but I wonder whether we should instead place them in a separate test directory like test/k8s/measure or something like that.

Then we could also move the chaos testing scripts to this directory. WDYT?

Contributor:

So I would suggest creating a separate script which would call both kubectl (or whatever else is needed for the test) and kcb.sh from the outside, either locally or remotely (via the ansible layer).
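A minimal sketch of such a wrapper, with hypothetical paths and variable names (not part of this PR):

#!/usr/bin/env bash
# Hypothetical wrapper: keeps kubectl/oc usage outside of kcb.sh so the benchmark
# stays agnostic of the provisioning layer
set -euo pipefail

LOGS_DIR="${LOGS_DIR:-results/logs}"     # assumed variable name
mkdir -p "$LOGS_DIR"

./kc-chaos.sh "$LOGS_DIR" &              # platform-specific failure injection (uses kubectl internally)
CHAOS_PID=$!

./kcb.sh "$@"                            # run the benchmark itself, locally or via the ansible layer

kill "$CHAOS_PID" 2>/dev/null || true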

Contributor (Author):

Thanks for the review comments, everyone. Given that the majority of the feedback leans towards modularizing this, I shall work with Michal tomorrow to come up with a better design and implement it.

Contributor (Author):

In order to keep the scope intact and not let the design decisions delay this PR from getting merged, we are going to merge this as is. I have created a ticket to tackle the modularization of the script and to solve the puzzle with distributed load tests in mind as the next increment.

@ahus1 (Contributor) left a comment:

See my review below - it is a good step in the right direction. All comments can be addressed in future PRs.


kubectl get pods -n "${PROJECT}" -l app=keycloak -o wide
kubectl logs -f -n "${PROJECT}" "${RANDOM_KC_POD}" > "$LOGS_DIR/${ATTEMPT}-${RANDOM_KC_POD}.log" 2>&1 &
kubectl describe -n "${PROJECT}" pod "${RANDOM_KC_POD}" > "$LOGS_DIR/${ATTEMPT}-${RANDOM_KC_POD}-complete-resource.log" 2>&1
Contributor:

Instead of retrieving the text format, I'd suggest also pulling the JSON format via

kubectl get -n ... pod -o json

as we can analyze that better using jq.
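For example, something along these lines (the jq fields are illustrative, reusing the variables from the snippet above):

# Assumed example: store the pod definition as JSON next to the describe output,
# then query it with jq, e.g. restart counts and last container state
kubectl get -n "${PROJECT}" pod "${RANDOM_KC_POD}" -o json \
  > "$LOGS_DIR/${ATTEMPT}-${RANDOM_KC_POD}.json"
jq '.status.containerStatuses[] | {name, restartCount, lastState}' \
  "$LOGS_DIR/${ATTEMPT}-${RANDOM_KC_POD}.json"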

Contributor (Author):

Yup, we had it in mind to parse the information and create a JSON file akin to what you did for the SUT information, but as a first pass we are storing it in the log files. I would suggest that at least this part be done in a subsequent PR to speed things up.


benchmark/src/main/content/bin/kc-chaos.sh (outdated review thread, resolved)
Development

Successfully merging this pull request may close these issues: Optimise failure scenario

4 participants