Jon colorado data (boulder) #2

Merged
merged 14 commits into from
Nov 14, 2023

Contributor

jonbig commented Nov 1, 2023

  • I first created an intermediate table where I did most of the transformations, then created tables for Boulder races, offices, politicians, and race candidates. I used the existing tables as a guide, and the new ones appear to almost match. The one area where I see differences is the ID fields. I left those fields in the models, but commented out.

  • I was under the impression that those UUID fields are generated when we insert the rows into the tables. Maybe the rows just need to be inserted in a specific order, so that the first UUID field is generated and the rest of the UUID fields are based on that?

  • I commented out the insert statements.

  • I used SQLFluff to lint the models.

Contributor

wileymc commented Nov 2, 2023

Some issues need to be addressed before merging this in:

  • Let's remove the source .csv files from source control.
  • The models need to be linted (the SQLFluff CI job is broken, so it will pass regardless).
  • dbt run doesn't currently work for all models.



--fields for politician table
----Where is the id field coming from?
Contributor

IDs are generated on INSERT for pretty much every table (they are UUIDs, which aren't sequential or dependent on one another).

Contributor Author

Got it, that has been confusing me (and is why I had a few fields commented out). If the IDs are generated on insert, they don't exist in my table yet, so I won't be able to add them to the select, right?

Contributor

Correct, they aren't needed for these staging SELECTs
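
To make the point in this thread concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the project's actual database. The `politician` and `stg_politicians` layouts, the sample names, and the `hex(randomblob(16))` default are all assumptions made for illustration (the real tables presumably use a proper UUID default). It only tries to show that the staging SELECT carries no `id` column and that each row receives its own independent value at INSERT time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical target table: the id column is filled in by a DEFAULT
# expression at INSERT time. hex(randomblob(16)) stands in for a real
# UUID generator, which SQLite does not ship with.
conn.execute(
    """
    CREATE TABLE politician (
        id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
        full_name TEXT NOT NULL
    )
    """
)

# Hypothetical staging table: note that it has no id column at all.
conn.execute("CREATE TABLE stg_politicians (full_name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO stg_politicians (full_name) VALUES (?)",
    [("Candidate A",), ("Candidate B",)],
)

# The INSERT ... SELECT lists only the staged columns; an id is generated
# for each row and does not depend on the other rows or on insert order.
conn.execute(
    "INSERT INTO politician (full_name) "
    "SELECT full_name FROM stg_politicians"
)

rows = conn.execute("SELECT id, full_name FROM politician").fetchall()
for row_id, full_name in rows:
    print(row_id, full_name)
```

The generated values differ on every run, which is why no expected output is shown.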

Contributor

Add this file to .gitignore.

models/staging/stg_co_boulder_city_offices.sql (outdated)
Contributor Author

jonbig commented Nov 2, 2023

I created the boulder_updated_filings table from boulder_updated_filings.csv; that should be the only file needed to run the models.

ELSE 'district'
END AS election_scope,
CASE
WHEN office ILIKE '%Mayor%' THEN 'Mayor'
Contributor

I'm not sure we want to set seat as "Mayor" in this case

Contributor

You'll also need to join this model to our existing public.politician table so that we can deduplicate politicians and insert the race_candidate records properly. Look at what I did in the MN intermediate model.
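
The MN intermediate model is not shown in this thread, so the following is only a guess at the general shape of such a deduplicating join, run against an in-memory SQLite database through Python's sqlite3 module. The `stg_filings` table, the `slug` match key, and the sample values are hypothetical; the real model may match on different columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for public.politician (assumed to be populated already).
    CREATE TABLE politician (id TEXT PRIMARY KEY, slug TEXT UNIQUE);
    INSERT INTO politician VALUES ('existing-id-1', 'candidate-a');

    -- Stand-in for the staged Boulder filings.
    CREATE TABLE stg_filings (slug TEXT, office TEXT);
    INSERT INTO stg_filings VALUES
        ('candidate-a', 'Mayor'),
        ('candidate-b', 'City Council');
    """
)

# LEFT JOIN keeps every staged candidate; politician_id is NULL only for
# people who are not in the politician table yet.
matched = conn.execute(
    """
    SELECT f.slug, f.office, p.id AS politician_id
    FROM stg_filings AS f
    LEFT JOIN politician AS p ON f.slug = p.slug
    ORDER BY f.slug
    """
).fetchall()

# Rows with a NULL politician_id would need a new politician record; the
# rest can reuse the existing id when building race_candidate rows.
new_people = [slug for slug, _office, pid in matched if pid is None]
known_people = {slug: pid for slug, _office, pid in matched if pid is not None}
print(new_people)
print(known_people)
```

With this split, only the unmatched candidates would be inserted as new politicians, which is one way to avoid creating duplicates.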

Comment on lines +66 to +67
LEFT JOIN transformed_filings AS tf ON f.email = tf.email
LEFT JOIN transformed_filings_1 AS tf1 ON tf.email = tf1.email
Contributor

I don't think these joins are needed, as you can select from either of the CTEs in this final select statement to get exactly what you need.
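
As a rough illustration of this suggestion, the sketch below compares selecting directly from a CTE with joining the source back to it on email. It uses Python's sqlite3 module with made-up data; the `filings` columns and the `upper(office)` transformation are assumptions, and only the CTE name is taken from the diff. Whether the real filings contain repeated emails is not known from this thread, but if they do, the join would also multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE filings (email TEXT, office TEXT);
    INSERT INTO filings VALUES
        ('a@example.com', 'Mayor'),
        ('a@example.com', 'City Council'),
        ('b@example.com', 'City Council');
    """
)

# A simplified stand-in for the model's CTE.
cte = """
    WITH transformed_filings AS (
        SELECT email, upper(office) AS office_name FROM filings
    )
"""

# Selecting straight from the CTE returns one row per filing.
direct = conn.execute(
    cte + "SELECT email, office_name FROM transformed_filings"
).fetchall()

# Joining the source back to the CTE on a non-unique email pairs every
# filing with every transformed row that shares that email.
joined = conn.execute(
    cte
    + """
    SELECT tf.email, tf.office_name
    FROM filings AS f
    LEFT JOIN transformed_filings AS tf ON f.email = tf.email
    """
).fetchall()

print(len(direct), len(joined))  # prints "3 5"
```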

Contributor

wileymc left a comment

I've pushed up a bunch of changes and dbt run is looking good so far! Let's get this merged, then we can get the data into the public schema. You can click into my commits above to see what changes were needed to get this working.

@jonbig jonbig merged commit cb07699 into main Nov 14, 2023
1 check passed