Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use changed-files instead of jitterbit for deployment workflow [DE-231] #147

Merged
merged 52 commits into from
May 16, 2023

Conversation

emily-flambe
Copy link
Contributor

@emily-flambe emily-flambe commented May 12, 2023

Our current list new/modified/deleted files comes from a repo, jitterbit, that is apparently abandoned. This means its functionality will degrade as GHA makes new releases.

This PR implements a different tool, changed-files, to do the same thing. Notably, this resolves a limitation of jitterbit - namely that there was no way to delete old filenames for renamed files, which would cause duplicate files to appear in GCS, resulting in chaos, heartbreak, etc. (which was the impetus for this journey of discovery).

Assigning to @jitoquinto and @kanelouise , and cc @michaelefisher since we'll likely want to do the same for our automatic airflow deployment as well.

Here is an example of a workflow run showing this doing the stuff it's supposed to do: https://github.com/community-tech-alliance/dbt-cta/actions/runs/4962581795/jobs/8880816533

One more note: I made this deliberately WET to make it super clear what's happening at every step. (Surely you could rewrite this with about 1/3 as many lines of code, but I personally think the readability is more important, at the expense of maybe failing a technical interview)

@emily-flambe emily-flambe added the dev_deploy Using this label causes the code to automatically deploy to dev! Neat! label May 12, 2023
@emily-flambe emily-flambe temporarily deployed to dev May 12, 2023 17:24 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 12, 2023 17:26 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 12, 2023 18:23 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 12, 2023 18:28 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 12, 2023 18:28 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 16, 2023 13:20 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 16, 2023 13:22 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 16, 2023 13:24 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 16, 2023 13:26 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 16, 2023 13:27 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 16, 2023 13:27 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 16, 2023 13:29 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 16, 2023 13:33 — with GitHub Actions Inactive
@emily-flambe emily-flambe temporarily deployed to dev May 16, 2023 13:33 — with GitHub Actions Inactive
@emily-flambe
Copy link
Contributor Author

emily-flambe commented May 16, 2023

@emily-flambe taking a look at the linked run for this it looks like something weird is going on or the logs for List all changed files step arent correct?? It looks like theres files being renamed/removed that arent in there

so one annoying thing about this strat in general, which I'm not quite sure how to avoid, is that it can cause weird behavior if you run the workflow multiple times on a branch and make multiple rounds of changes to the files on that branch. The workflow is always comparing your local files with the main branch, so if you add a file, then rename it, I think the workflow would deploy both versions of that file, because it's comparing against main rather than the most recent commit on your branch...

Actually, would that logic make more sense? I think that's what happens if you configure fetch_depth = 1. Maybe we should contemplate that before merging this 🤔 🤔 🤔 🤔 🤔

anyway it's hard to say for sure but it seems likely the weirdness you're seeing has something to do with that!

I did a little science and I think this is just a quirk of the workflow the way it's set up. Maybe there's a way to fix this behavior, but I think it would require some additional logic.

For awareness @jitoquinto @kanelouise @jackwelty, this means: After you deploy a branch to dev, the workflow will always be comparing your branch to main - so if you do something like, say, create a file and then delete it, the workflow will deploy it but then it won't subsequently delete it. The run triggered by the commit deleting the file will compare main to your branch, neither of which contain the file you added and deleted, so that file will continue to linger in dev. (This shouldn't be a problem in prod deploys since the workflow only runs when you merge the PR.)

I'll add a comment in the workflow explaining the above - just requested re-review @jitoquinto, lmk if this sounds ok!

@emily-flambe emily-flambe temporarily deployed to dev May 16, 2023 13:41 — with GitHub Actions Inactive
@jitoquinto
Copy link
Contributor

@emily-flambe gotcha, thanks for doing the science! That sounds good to me, especially since this really only applies to dev environment right? Prod will always just run when merged to main so we shouldnt run into that weirdness

@emily-flambe
Copy link
Contributor Author

@emily-flambe gotcha, thanks for doing the science! That sounds good to me, especially since this really only applies to dev environment right? Prod will always just run when merged to main so we shouldnt run into that weirdness

yep that's correct! a bit of weirdness is par for the course in dev anyway lol

@emily-flambe emily-flambe merged commit 364989d into main May 16, 2023
@emily-flambe emily-flambe deleted the ec/dbt-deploy-renamed-files branch May 16, 2023 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev_deploy Using this label causes the code to automatically deploy to dev! Neat!
Development

Successfully merging this pull request may close these issues.

3 participants