JPERF-831 Migrate CI from CircleCI to Github Actions. Change the pipeline config to use IAM role and OIDC token for assuming an identity on AWS. #89

Merged (1 commit), Jan 31, 2023

ewefie commented Jan 25, 2023

These changes were initially motivated by the need to migrate from long-term credentials to an IAM role for authentication on AWS.
Additionally, we decided to migrate the CI from CircleCI to GitHub Actions in one go.
After rewriting the pipeline, it turned out that not everything worked as expected: I observed many failures, and as a result the scope of the migration and the time spent on it increased significantly.
The team decided to split the task into smaller parts, and this is the first part, which consists of:

Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.lang.Exception: Error while executing sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal
   stable". Exit status code SshResult(exitStatus=100, output=Hit:1 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu focal InRelease
  Hit:2 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates InRelease
  Hit:3 http://eu-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports InRelease
  Get:4 https://download.docker.com/linux/ubuntu focal InRelease [57.7 kB]
  Get:5 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
  Err:4 https://download.docker.com/linux/ubuntu focal InRelease
   The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 7EA0A9C3F273FCD8
  Reading package lists...
  , errorOutput=W: GPG error: https://download.docker.com/linux/ubuntu focal InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 7EA0A9C3F273FCD8
  E: The repository 'https://download.docker.com/linux/ubuntu focal InRelease' is not signed.

Note: CopiedDocker.kt was, as the name suggests, a copy of Docker.kt located in the infrastructure lib. The aforementioned error had already been addressed in infrastructure, but not here.
In this PR I followed the previous approach and just copied the implementation from infrastructure, but IMO keeping those two classes doesn’t make sense in the long term. I’d personally opt for getting rid of both CopiedDocker.kt and CopiedDockerImage.kt and using the original classes from infrastructure. I’m aware, though, that this would require changes in the API (currently they are only for internal use). We can discuss it.
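
For reference, a NO_PUBKEY error like the one above usually means apt has no copy of the repository's signing key. Below is a minimal sketch of one common remedy, assuming the key is imported before the repository is registered. The function name `docker_repo_commands` and the keyring path are illustrative assumptions, and this is not necessarily how Docker.kt in infrastructure solves it:

```shell
#!/bin/sh
# Sketch only: prints the shell commands that register Docker's apt repository
# together with its signing key. Nothing is executed by this function itself.
# Assumptions (not from the PR): the function name and the keyring location.

docker_repo_commands() {
    release="$1"   # Ubuntu codename, e.g. "focal"
    keyring="/usr/share/keyrings/docker-archive-keyring.gpg"
    repo="deb [arch=amd64 signed-by=${keyring}] https://download.docker.com/linux/ubuntu ${release} stable"

    # 1. Import Docker's public key so apt can verify the InRelease signature.
    printf '%s\n' "curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o ${keyring}"
    # 2. Register the repository, pointing apt at that keyring.
    printf '%s\n' "echo '${repo}' | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null"
    # 3. Refresh the package index; the Docker repository should now verify.
    printf '%s\n' "sudo apt-get update"
}
```

Piping the output to `sh` on the target host (for example `docker_repo_commands focal | sh`) would run the three steps in order.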

For now, tests fail, and in most cases the reason is insufficient AWS resources. It manifests either as an error with the message “We currently do not have sufficient xxxxx capacity in the Availability Zone you requested (eu-west-1a). Our system will be working on provisioning additional capacity.” or as “ResourceStatus: CREATE_FAILED,ResourceStatusReason: Request limit exceeded.” In general, though, we can see that the new auth method works: we are allowed to provision instances, and some of them are created successfully.

The most recent error I encountered is this one:

Caused by: java.lang.Exception: There's no max action error
  	at com.atlassian.performance.tools.lib.ErrorGauge.measureMaxAction(ErrorGauge.kt:20) ~[test/:?]
  	at com.atlassian.performance.tools.hardware.HardwareMetric.score(HardwareMetric.kt:40) ~[test/:?]
  	at com.atlassian.performance.tools.hardware.HardwareExploration$testHardware$1.apply(HardwareExploration.kt:296) ~[test/:?]
  	at com.atlassian.performance.tools.hardware.HardwareExploration$testHardware$1.apply(HardwareExploration.kt:48) ~[test/:?]
  	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) ~[?:?]
  	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
  	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705) ~[?:?]

I found that it is caused by errors in the "log in" action, but I haven't investigated it further yet.

I don’t know whether there are any other issues obscured by the errors above; it is possible. I tried running those tests at night, and for some time I also reduced the list of instances provisioned during the execution, but everything I found is either mentioned above or already addressed. Maybe other issues will appear after running those tests regularly (there’s a cron schedule for that) for a longer time.

cc/ @Nubzor

ewefie requested review from a team and mminns as code owners January 25, 2023 12:27
pczuj previously approved these changes Jan 25, 2023
.github/workflows/build-and-test.yml (resolved)
.github/workflows/build-and-test.yml (resolved)
.github/workflows/build-and-test.yml (outdated, resolved)
ewefie dismissed stale reviews from pczuj and mgrzaslewicz via 785453e January 27, 2023 09:22
ewefie force-pushed the issue/JPERF-831-migrate-to-iam-roles-v3 branch from 02b6128 to 785453e January 27, 2023 09:22
dagguh commented Jan 27, 2023

I'm running testIntegration locally to debug the "Caused by: java.lang.Exception: There's no max action error" failure.

ewefie force-pushed the issue/JPERF-831-migrate-to-iam-roles-v3 branch from 785453e to 4904eef January 30, 2023 07:10
…line config to use IAM role and OIDC token for assuming an identity on AWS.

This change allows getting rid of the access key and secret. While applying the changes I ran into a problem where some Ubuntu images were not available in a particular AWS region. To fix that I had to bump both aws-resources and aws-infrastructure, along with some other required dependencies.
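
To illustrate what replacing the access key and secret with an OIDC token involves, here is a rough sketch of the exchange a GitHub Actions job can perform by hand. It assumes the job has the `id-token: write` permission and that `curl`, `jq`, and the AWS CLI are available on the runner. The function names, the session name, and the role ARN passed in are hypothetical, and the actual workflow in this PR may well rely on a ready-made action instead:

```shell
#!/bin/sh
# Sketch only: trade the job's GitHub OIDC token for temporary AWS credentials.
# Assumptions (not from the PR): function names, session name, token lifetime.

# Build the token request URL for a given audience. GitHub's request URL
# already carries a query string, so the audience is appended with '&'.
oidc_token_url() {
    printf '%s&audience=%s\n' "$1" "$2"
}

assume_role_with_oidc() {
    role_arn="$1"   # ARN of the IAM role that trusts GitHub's OIDC provider

    # ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN are set
    # by GitHub Actions when the job is granted the id-token: write permission.
    token="$(curl -sS \
        -H "Authorization: bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" \
        "$(oidc_token_url "${ACTIONS_ID_TOKEN_REQUEST_URL}" sts.amazonaws.com)" \
        | jq -r '.value')"

    # STS validates the token against the role's trust policy and returns
    # short-lived credentials, so no long-term key is stored in the CI.
    aws sts assume-role-with-web-identity \
        --role-arn "${role_arn}" \
        --role-session-name "github-actions" \
        --web-identity-token "${token}" \
        --duration-seconds 3600
}
```

Calling `assume_role_with_oidc` with a role ARN inside a job prints the temporary credentials as JSON; outside GitHub Actions the two `ACTIONS_ID_TOKEN_*` variables are unset and the request fails.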
ewefie force-pushed the issue/JPERF-831-migrate-to-iam-roles-v3 branch from 4904eef to 21dec1c January 30, 2023 09:59
ewefie merged commit 30e2b27 into master Jan 31, 2023
ewefie deleted the issue/JPERF-831-migrate-to-iam-roles-v3 branch January 31, 2023 12:00