
updates for optimizing failure scenario #402

Merged
merged 2 commits on Jun 30, 2023

Conversation

@kami619 (Contributor) commented Jun 29, 2023

@kami619 requested review from ahus1 and tkyjovsk on Jun 29, 2023 at 13:11
@@ -34,6 +34,8 @@ JAVA_OPTS="${JAVA_OPTS} -Xmx1G -XX:+HeapDumpOnOutOfMemoryError"
DEBUG_MODE="${DEBUG:-false}"
DEBUG_PORT="${DEBUG_PORT:-8787}"

CHAOS_MODE="${CHAOS_MODE:-false}"
Contributor:

For distributed runners, this mode is not very usable, as we don't want to run the chaos script from all runners, but only once.

We can move the trigger back to the workflow (the log directory can also be obtained there) and then print the info from the pod in another workflow step, after the kcb script finishes. This way we would also get rid of the coupling of the kcb script with OpenShift. Any thoughts @kami619 @ahus1 @tkyjovsk?
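For illustration only, a rough sketch of what triggering the chaos script from a workflow step (instead of from kcb.sh) could look like; the LOGS_DIR value and the kcb.sh/kc-chaos.sh arguments are assumptions, not the actual workflow:

# Hypothetical workflow step; paths and arguments are assumptions
LOGS_DIR="results/logs"                  # log directory obtained in the workflow itself
mkdir -p "$LOGS_DIR"
./kc-chaos.sh "$LOGS_DIR" &              # chaos trigger runs once, from the workflow, not from kcb.sh
CHAOS_PID=$!
./kcb.sh                                 # actual benchmark invocation, arguments omitted
kill "$CHAOS_PID" 2>/dev/null || true    # stop the chaos script when the benchmark finishes
# a later workflow step can then print the pod info collected under "$LOGS_DIR"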

Contributor:

Yes, for distributed runners we wouldn't want this to be run from kcb.sh.

If it helps you with the things you're doing at the moment, you could still merge it. The benefit would be that the folder information is passed to the script, including the right timeout parameter.

Eventually I'd like to see it removed for the reasons Michal described, or changed to an environment variable like SUT_DESCRIBE through which the caller could specify any script to perform any kind of chaos (not only Kubernetes chaos). But let's see if we ever do this and how the chaos testing proceeds; we might eventually not use the shell script once we move to something more advanced.
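For illustration, a minimal sketch of such a hook inside kcb.sh; CHAOS_SCRIPT is a made-up variable name (modeled on SUT_DESCRIBE), not something that exists today:

# Hypothetical: the caller points CHAOS_SCRIPT at any script (Kubernetes chaos or otherwise);
# kcb.sh only invokes it and passes context such as the log directory
CHAOS_SCRIPT="${CHAOS_SCRIPT:-}"
if [ -n "$CHAOS_SCRIPT" ]; then
  "$CHAOS_SCRIPT" "$LOGS_DIR" &
  CHAOS_SCRIPT_PID=$!
fi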

Contributor:

I also don't think that kubectl should be called from anywhere in kcb.sh, as the benchmark is supposed to be agnostic of the system under test's provisioning layer. Failure testing, and any other kind of testing which requires a specific provisioning platform (kubectl/oc/docker-compose), should be separated from the benchmark itself, IMO.

I think the same applies to the stress-testing loop which runs multiple iterations of Gatling runs. In the context of distributed execution, this loop would need to live outside the script that runs the benchmark in distributed mode.
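For illustration, a rough sketch of such an outer loop once it lives outside the benchmark script; ITERATIONS and the kcb.sh invocation are assumptions:

# Hypothetical outer stress-testing loop, kept outside of kcb.sh and the distributed runner
ITERATIONS="${ITERATIONS:-10}"
for i in $(seq 1 "$ITERATIONS"); do
  echo "Stress-test iteration $i of $ITERATIONS"
  ./kcb.sh            # each iteration triggers one (possibly distributed) Gatling run, arguments omitted
done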

@tkyjovsk (Contributor) commented Jun 29, 2023:

In the context of PR #295 I've placed the startup-time measuring scripts inside provisioning/minikube/measure but I wonder whether we should instead place them in a separate test directory like test/k8s/measure or something like that.

Then we could also move the chaos testing scripts to this directory. WDYT?

Contributor:

So I would suggest creating a separate script which would call both kubectl (or whatever else is needed for the test) and kcb.sh from the outside, either locally or remotely (via the ansible layer).
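A minimal sketch of such a wrapper, with hypothetical paths and variable names (not part of this PR):

#!/usr/bin/env bash
# Hypothetical wrapper: keeps kubectl/oc usage outside of kcb.sh so the benchmark
# stays agnostic of the provisioning layer
set -euo pipefail

LOGS_DIR="${LOGS_DIR:-results/logs}"     # assumed variable name
mkdir -p "$LOGS_DIR"

./kc-chaos.sh "$LOGS_DIR" &              # platform-specific failure injection (uses kubectl internally)
CHAOS_PID=$!

./kcb.sh "$@"                            # run the benchmark itself, locally or via the ansible layer

kill "$CHAOS_PID" 2>/dev/null || true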

Contributor (Author):

Thanks for the review comments, everyone. Given that the majority of the feedback leans towards modularizing this, I shall work with Michal tomorrow to come up with a better design and implement it.

Contributor (Author):

In order to keep the scope intact and not let the design decisions delay this PR from getting merged, we are going to merge this as is. I have created a ticket to tackle the modularization of the script and to solve the puzzle with distributed load tests in mind as the next increment.

@ahus1 (Contributor) left a comment:

See my review below - it is a good step in the right direction. All comments can be addressed in future PRs.


kubectl get pods -n "${PROJECT}" -l app=keycloak -o wide
kubectl logs -f -n "${PROJECT}" "${RANDOM_KC_POD}" > "$LOGS_DIR/${ATTEMPT}-${RANDOM_KC_POD}.log" 2>&1 &
kubectl describe -n "${PROJECT}" pod "${RANDOM_KC_POD}" > "$LOGS_DIR/${ATTEMPT}-${RANDOM_KC_POD}-complete-resource.log" 2>&1
Contributor:

Instead of retrieving the text format, I'd suggest also pulling the JSON format via

kubectl get -n ... pod -o json

as we can analyze that better using jq.
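For example, something along these lines (the jq fields are illustrative, reusing the variables from the snippet above):

# Assumed example: store the pod definition as JSON next to the describe output,
# then query it with jq, e.g. restart counts and last container state
kubectl get -n "${PROJECT}" pod "${RANDOM_KC_POD}" -o json \
  > "$LOGS_DIR/${ATTEMPT}-${RANDOM_KC_POD}.json"
jq '.status.containerStatuses[] | {name, restartCount, lastState}' \
  "$LOGS_DIR/${ATTEMPT}-${RANDOM_KC_POD}.json"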

Contributor (Author):

Yup, we had it in mind to parse the information and create a JSON file akin to what you did for the SUT information, but as a first pass we are storing it in the log files. I would suggest that at least this part be done in a subsequent PR to speed things up.


benchmark/src/main/content/bin/kc-chaos.sh (outdated review thread, resolved)
Development

Successfully merging this pull request may close these issues: Optimise failure scenario

4 participants