Apologies for any merge conflicts this causes ;).
Everything looks fine to me; I've added some minor comments.
The service account creation is something that will be handled by the common part. However, you can create one with the same name and features and import it into the TF state before applying the common part, in order to avoid conflicts. (There are many already-created resources that must be imported as a preliminary step before the first execution of the common part.)
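For reference, importing a pre-existing service account into Terraform state looks roughly like this. The resource address and account email below are assumptions for illustration, not the real names from the common part:

```shell
# Hypothetical sketch: adopt an existing service account into TF state so
# the common part does not try to (re)create it and conflict.
terraform import google_service_account.projectowner \
  projects/gpii-gcp-stg/serviceAccounts/projectowner@gpii-gcp-stg.iam.gserviceaccount.com
```

After the import, a `terraform plan` should show no creation for that resource.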
About merging the file `entrypoint.rake`: that is something I foresee being painful anyway.
```yaml
stage: setup
tags:
- aws
```
It's only a cosmetic change, but is there a particular reason why the `tags` indentation is different from the `script` section?
¯\(°_o)/¯
(This is fixed in my new PR, #92.)
```yaml
- git tag "deploy-aws-stg-$DATESTAMP"
# gitlab is not clever enough to clean up an added remote and git complains
# if we add a remote that already exists.
- git remote | grep -q "^origin-rw" || git remote add origin-rw [email protected]:gpii-ops/gpii-infra
```
Although this works, another option could be setting the origin URL using `git remote set-url origin [email protected]:gpii-ops/gpii-infra`. It's just an idea.
I couldn't find it in my notes, but I think I considered this idea when I wrote this task originally. I decided it was safer to create a separate origin rather than fight with Gitlab for control of `origin`, since Gitlab may have certain expectations about the origin URL that we might break.

I checked, and the origin URL Gitlab uses is `https://gitlab-ci-token:<PASSWORD>@gitlab.com/gpii-ops/gpii-infra.git`. So I feel like there might be consequences if we change this to SSH-based authentication. :)
I propose we keep this as-is.
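For the record, the "add the remote only if missing" guard can be exercised in isolation. This sketch uses a throwaway repo and a placeholder remote URL (both are assumptions, not the real CI setup):

```shell
# Sketch: idempotently ensure a remote named "origin-rw" exists.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Add the remote only if it is not already present, so re-runs are safe:
git remote | grep -q "^origin-rw" || git remote add origin-rw https://example.invalid/repo.git
# Running the same guard again is a no-op rather than an error:
git remote | grep -q "^origin-rw" || git remote add origin-rw https://example.invalid/repo.git
git remote
```

The second invocation short-circuits at the `grep`, which is what keeps repeated CI runs from failing with "remote origin-rw already exists".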
.gitlab-ci.yml
```yaml
script:
  - docker version
  - docker-compose version
  - docker pull gpii/exekube:0.3.1-google
```
if #81 is merged first, perhaps this should be updated.
It is merged, and I updated to 0.4.0.
```
@@ -46,6 +46,18 @@ Initial instructions based on [exekube's Getting Started](https://exekube.github
* @mrtyler requested a quota bump to 100 Projects.
* He only authorized his own email for now, to see what it did. But it's possible other Ops team members will need to go through this step.

## One-time CI Setup
```
Don't we have anything to manage the CI runner box? (I'm not a big fan of any manual steps :))
Indeed! We have https://github.com/gpii-ops/ansible-gpii-ci-worker.
I share your preference for automation. There are a few reasons I have this as a manual step:
- It was simple :). Also it was a record of what I had done to facilitate my manual testing while I figured things out.
- Laziness :). Installing credentials on the CI machine is currently a rare event and there are other things to do.
- I forgot that I had already solved a similar problem (docker hub creds) with ansible-gpii-ci-worker.
- I knew that some details about auth/creds would change with Alfredo's work in #60 (GPII-3125: Init GCP organization), so I postponed a more robust solution.
- This step currently has a manual component regardless, since a human must use their credentials to obtain `owner.json`.
All of that said, you are certainly wise to raise the question! I was going to respond by adding `owner.json` to the ansible vault and deploying it automatically with ansible-gpii-ci-worker. However, now that I'm pursuing your suggestion of using Volumes instead of Bind Mounts, I'm not sure exactly how I'm going to provide `owner.json`. Let's talk about it in my new PR, #92.
Looks good. I'm slightly unhappy about secrets on the host - effectively anyone with access to that machine, or with permission to run jobs there, can get complete access to our infra. Also, we're using one big fat account for all the environments (I'm not sure what would be a good strategy to mitigate this, but ideally, running under the stage env, you wouldn't be able to impact the prod one). Re. permission issues - this might help (I'm not convinced it's a good one, just an option :)): https://docs.docker.com/engine/security/userns-remap/
This is so that Volumes mounted from the host don't get a bunch of root-owned files.
The docker Volumes/permissions thing is kind of a mess[1]. There are ~3 problems:

I don't have everything working yet, but this at least produces files with the correct permissions: (I'll add these to

Note that this PR is dependent on this one in (our fork of) exekube.

[1] Some issues I read while researching this issue:
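One common way to get correctly-owned files out of a container is to run it with the invoking user's uid/gid. This is only a sketch of that general technique, not necessarily what this PR does; the image tag and mount path are assumptions:

```shell
# Hypothetical sketch: run the container as the calling user so files
# written to the mounted volume land on the host with that user's ownership,
# instead of root.root.
docker run --rm \
  --user "$(id -u):$(id -g)" \
  -v "$PWD/.config:/work/.config" \
  gpii/exekube:0.4.0 \
  touch /work/.config/example
```

The catch is that many images assume `USER root` (home directory, package paths, etc.), which is part of why this gets messy.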
Aha, this is where you left that link! I forgot to read this before. It isn't clear to me whether userns remapping helps with our Volume ownership problems or not. It looks like using it requires some special handling (especially on CentOS 7). Let's talk about how this solution compares with the problems I've identified (my previous comment).
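For anyone following along: userns remapping is enabled on the Docker daemon, not per container. A minimal `/etc/docker/daemon.json` fragment looks like this (the `"default"` value tells dockerd to create and use a `dockremap` user; note all containers on the host are affected):

```json
{
  "userns-remap": "default"
}
```

On CentOS 7 this additionally requires user namespaces to be enabled in the kernel, which is part of the "special handling" mentioned above.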
I definitely agree. To some extent this is unavoidable -- something has to both execute code and have administrative production credentials. But there are mitigations, some existing, some planned, and some possible in the future, that can help:
I am closing this PR (and gpii-ops/exekube#8) in favor of #92.
This PR adds:

* `prd`, a la GPII-3199

I think 1 and 2 should be mostly uncontroversial. See https://issues.gpii.net/browse/GPII-2996?focusedCommentId=33804&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-33804 for a little more about the "tagged runner" strategy.
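In `.gitlab-ci.yml` terms, the "tagged runner" strategy pins sensitive jobs to a dedicated, trusted runner. A minimal sketch (the job name and tag value here are assumptions for illustration):

```yaml
# Hypothetical sketch: only runners registered with the "aws" tag
# (i.e. our dedicated CI worker holding the credentials) will pick
# up this job; shared runners never see it.
deploy-stg:
  stage: deploy
  tags:
    - aws
  script:
    - rake deploy
```

This keeps credentials off shared runners, at the cost of making that one tagged machine a high-value target (per the secrets-on-the-host concern above).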
2 and 3 may warrant further discussion, particularly in light of recent conversations about secrets handling.
For now, I'm letting CI use the credentials for `projectowner@` in each Project. A dedicated IAM would be better. I'd prefer to wait for @amatas's work in #60 and/or for https://issues.gpii.net/browse/GPII-2947 so that we have a place to put Terraform code to manage IAMs, but I can whip up something by hand if the team thinks it's worth doing.

BTW @amatas I was unable to create credentials for stg and prd because:

* There is no `gpii-gcp-stg`. Instead there is a `gpii-stg`.
* There is a `gpii-gcp-prd`, but it doesn't have a `projectowner@` IAM.

Perhaps these are expected until your work in #60 is complete?
4 may cause merge conflicts for in-flight branches (mostly @amatas I think). Sorry about that, but it helped me reason about the changes I was making.
Mostly I tried to reduce the number of verbs in task names. Let me know if I made the names better or worse :).
The next problem

Directories created inside the exekube container (even those created implicitly, like volume mounts for `.config/<env>/gcloud`) are created with ownership `root.root`. This prevents `rake clobber` from cleaning up these directories (https://gitlab.com/gpii-ops/gpii-infra/-/jobs/85893841), and prevents `secrets.rb` from writing secrets files (https://gitlab.com/gpii-ops/gpii-infra/-/jobs/85889990).

I do not see this behavior on my machine / MacOS. The CI worker is CentOS 7.
My guess is this is because commands like `gcloud` and `secrets-fetch` run as `root` inside the exekube container, so files created on mounted volumes inside the container "leak" back to the host with ownership `root`. This may be fixable by adding and then using a role user inside the container instead of defaulting to `USER root`, but that approach can get complicated so I'm stopping here to ask for advice.
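The "role user" idea would look something like the following Dockerfile sketch. The base image tag, user name, and uid/gid values are all assumptions for illustration, and this glosses over the complications mentioned above (tools in the image that expect root, pre-existing root-owned paths, etc.):

```dockerfile
# Hypothetical sketch: add a non-root role user to (our fork of) the
# exekube image so files created on mounted volumes are not root-owned.
FROM gpii/exekube:0.4.0
ARG UID=1000
ARG GID=1000
RUN groupadd --gid "$GID" runner \
 && useradd --uid "$UID" --gid "$GID" --create-home runner
USER runner
```

Building with `--build-arg UID=$(id -u) --build-arg GID=$(id -g)` would match the CI worker's user, at the cost of baking a host-specific uid into the image.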