Grant a workflow permission to upload files to S3
This short wiki explains the steps to grant a workflow permission to upload files to S3. A common use case is uploading JSON records to an S3 bucket so that they can be ingested into a ClickHouse table for querying.
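As a rough illustration of the use case above, here is a minimal sketch of serializing records and uploading them. It assumes newline-delimited JSON (which ClickHouse can read as `JSONEachRow`); the helper names `to_ndjson` and `upload_records`, and the bucket and key used with them, are hypothetical and not taken from the PyTorch repositories. The client is passed in so the sketch does not depend on how credentials are obtained.

```python
import json
from typing import Any, Iterable, Protocol


class S3ClientLike(Protocol):
    """Anything exposing put_object like a boto3 S3 client does."""

    def put_object(self, *, Bucket: str, Key: str, Body: bytes) -> Any: ...


def to_ndjson(records: Iterable[dict]) -> bytes:
    """Serialize records as newline-delimited JSON, one object per line."""
    lines = (json.dumps(record, sort_keys=True) + "\n" for record in records)
    return "".join(lines).encode("utf-8")


def upload_records(
    client: S3ClientLike, bucket: str, key: str, records: Iterable[dict]
) -> int:
    """Upload the records as a single object and return the number of bytes sent.

    The bucket and key are supplied by the caller; nothing here is specific
    to any real bucket.
    """
    body = to_ndjson(records)
    client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)
```

With `boto3` installed, `boto3.client("s3")` can be passed as `client`; the upload succeeds only once the job has been granted write access by one of the two approaches described below.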
First, check with PyTorch Dev Infra whether you need a new S3 bucket or can reuse an existing one. If a new bucket is needed, submit a PR to add it to https://github.com/pytorch-labs/pytorch-gha-infra/blob/main/runners/s3_bucket.tf. After the change is merged, run the Terraform deployment workflow https://github.com/pytorch-labs/pytorch-gha-infra/actions/workflows/runners-on-dispatch-release.yml to deploy the bucket.
There are two ways to grant a workflow permission to upload files to S3. The permission can be granted at the runner level, where every job handled by the runner can upload, or at the workflow level, where only specific workflows holding the permission can do so.
The runner-level setup can be skipped if the target is an existing bucket and the runner already has write access to it.
You will need to submit two pull requests, one each for the runners on the Meta and LF AWS fleets:
- Meta runners, e.g. https://github.com/pytorch-labs/pytorch-gha-infra/pull/533
- LF runners, e.g. https://github.com/pytorch/ci-infra/pull/296
If the bucket is in the Meta account (the default setup), you will also need to update the bucket policy there to grant permission to the LF account, for example https://github.com/pytorch-labs/pytorch-gha-infra/pull/537.
Again, run the Terraform deployment workflow https://github.com/pytorch-labs/pytorch-gha-infra/actions/workflows/runners-on-dispatch-release.yml after these PRs land to deploy the changes.
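To make the cross-account step more concrete, the sketch below builds the general shape of an S3 bucket policy that lets a role from another AWS account upload objects. The real change is made in Terraform in the PRs linked above; this Python version only illustrates the JSON document involved. The function name, the `Sid`, and the choice of `s3:PutObject` as the only action are assumptions, and the actual policy may grant more.

```python
import json


def cross_account_write_policy(bucket: str, writer_role_arn: str) -> dict:
    """Return a bucket policy allowing one external role to upload objects.

    Both arguments are placeholders supplied by the caller: the bucket name
    and the ARN of the runner role in the other account.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountUpload",  # hypothetical statement id
                "Effect": "Allow",
                "Principal": {"AWS": writer_role_arn},
                "Action": ["s3:PutObject"],
                # Object-level actions apply to keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account id and role name, for illustration only.
    policy = cross_account_write_policy(
        "example-bucket", "arn:aws:iam::111111111111:role/example-runner-role"
    )
    print(json.dumps(policy, indent=2))
```

Note that the role in the other account still needs its own IAM policy allowing the same action on the bucket; the bucket policy alone is not sufficient for cross-account access.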
If you have not done so yet, please read https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services to understand how OIDC works in GitHub workflows.
This approach is mainly used for non-AWS runners such as ROCm or GitHub-hosted runners. It can also be used to selectively grant write access to specific workflows only.
- Create a new OIDC role (or edit an existing one) to grant the same permission, e.g. https://github.com/pytorch-labs/pytorch-gha-infra/pull/358.
- Use the new role in your workflow, e.g. https://github.com/pytorch/executorch/pull/2449.
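For orientation, an OIDC role is an IAM role whose trust policy lets GitHub's OIDC provider assume it for selected repositories. The sketch below builds such a trust policy following the pattern in the GitHub documentation linked above. The function name and its parameters are hypothetical, and the real roles are defined in Terraform in the linked PR, so treat this only as an outline of the document's shape.

```python
GITHUB_OIDC_PROVIDER = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(
    account_id: str, repo: str, ref_pattern: str = "*"
) -> dict:
    """Return an IAM trust policy for a role assumed via GitHub OIDC.

    account_id:  AWS account that hosts the OIDC provider and the role.
    repo:        "owner/name" of the repository allowed to assume the role.
    ref_pattern: suffix matched against the token's subject claim; "*"
                 allows any branch, tag, or environment in that repository.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": (
                        f"arn:aws:iam::{account_id}:oidc-provider/"
                        f"{GITHUB_OIDC_PROVIDER}"
                    )
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The audience must be AWS STS.
                    "StringEquals": {
                        f"{GITHUB_OIDC_PROVIDER}:aud": "sts.amazonaws.com"
                    },
                    # Restrict which repository (and refs) may assume the role.
                    "StringLike": {
                        f"{GITHUB_OIDC_PROVIDER}:sub": f"repo:{repo}:{ref_pattern}"
                    },
                },
            }
        ],
    }
```

On the workflow side, the job needs the `id-token: write` permission and typically assumes the role through the `aws-actions/configure-aws-credentials` action with `role-to-assume` set to the role's ARN.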
ClickHouse (CH) uses the clickhouse_role to read from our buckets, so a new bucket must be added explicitly to that role, e.g. https://github.com/pytorch-labs/pytorch-gha-infra/pull/536. This is not needed if the bucket is public.