Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Airflow variable for Google Drive API key in analytics DAG #125

Merged
merged 2 commits into from
Jan 16, 2024

Conversation

antidipyramid
Copy link
Collaborator

@antidipyramid antidipyramid commented Jan 5, 2024

Overview

This PR transitions to using an Airflow variable (set in the Airflow web GUI) to store the Google Service Account API Key used in the analytics DAG.

Metro-Records/la-metro-councilmatic#1054 should be merged in first.

Checklist

Handles #123

  • PR has a descriptive enough title to be useful in changelogs

Testing Instructions

  • Create an Airflow variable (located in Admin > Variables in the toolbar) called google_service_acct_api_key (the key is in Bitwarden under LA Metro Google Service Account Key)
  • Run the analytics DAG (will fail on upload unless you've set the REMOTE_ANALYTICS_FOLDER env var but should authenticate to Google Drive successfully)

@antidipyramid antidipyramid marked this pull request as ready for review January 5, 2024 20:05
@antidipyramid antidipyramid requested a review from fgregg January 5, 2024 20:06
Copy link
Collaborator

@fgregg fgregg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a surprising way to access a variable. I understand that it's possible because of jinja templating, but that's not a normal way that i think about these files as templates.

could you access it the other way the docs suggest:

from airflow.models import Variable
...
"GOOGLE_SERVICE_ACCT_API_KEY": Variable.get("google_service_account_key")

@antidipyramid
Copy link
Collaborator Author

@fgregg I agree it's odd! But on the Best Practices page, Airflow recommends against calling Variable from top-level DAG code.

Using Airflow Variables yields network calls and database access, so their usage in top-level Python code for DAGs should be avoided as much as possible...

In top-level code, variables using jinja templates do not produce a request until a task is running, whereas, Variable.get() produces a request every time the dag file is parsed by the scheduler if caching is not enabled.

My understanding is that that means we should use the Jinja template syntax.

@fgregg
Copy link
Collaborator

fgregg commented Jan 5, 2024

okay, then let's add a section in the README explaining the use of this pattern.

@fgregg
Copy link
Collaborator

fgregg commented Jan 5, 2024

Actually, no. let's do it the not-recommended way for now. the network call is to it's own self, and the overhead of that call worth avoiding this surprising pattern.

@antidipyramid antidipyramid requested a review from fgregg January 8, 2024 19:18
@antidipyramid
Copy link
Collaborator Author

@fgregg Any more comments on this one?

@antidipyramid antidipyramid merged commit e77c7f5 into main Jan 16, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants