Update DB with TY2022 data #25

Merged
dfsnow merged 21 commits into master from ty2022-update on Jan 19, 2024

Conversation

dfsnow
Member

@dfsnow dfsnow commented Jan 18, 2024

This PR adds Tax Year 2022 data to the PTAXSIM database. It also slightly refactors the raw data processing for simplicity. Thanks @erhla for all the help on this PR. Closes #21.

Database Release Checklist

  • Make your code updates and commit them in a branch
  • Make any necessary updates to the raw data. If necessary, force add the raw data files if they are ignored by git. Be sure to update .gitattributes such that the raw data files are tracked by git LFS
  • Run the raw data scripts (anything in data-raw/) that prepare and clean the data. These scripts will save the cleaned data to a staging area in S3. Ensure that the relevant S3 keys in the PTAXSIM bucket are updated using the AWS console or API
  • Inside data-raw/create_db.R, increment the db_version variable following the schema outlined above
  • If necessary, also increment the requires_pkg_version variable in data-raw/create_db.R
  • Increment the database versions in DESCRIPTION file:
    • Config/Requires_DB_Version: This is the minimum database version required for this version of the package. It should be incremented whenever there is a breaking change
    • Config/Wants_DB_Version: This is the maximum database version required for this version of the package. It is the version of the database pulled from S3 during CI/testing on GitHub
  • If necessary, be sure to update the SQL statements in data-raw/create_db.sql. These statements define the structure of the database
  • Run the database generation script data-raw/create_db.R. This will create the SQLite database file by pulling data from S3. The file will be generated in a temporary directory (usually /tmp/Rtmp...), then compressed using pbzip2 (required for this script)
  • Using the command line, grab the final compressed database file from the temporary directory (found at db_path after running data-raw/create_db.R) and move it to the project directory. Rename the file ptaxsim-<TAX_YEAR>.<MAJOR VERSION>.<MINOR VERSION>.db.bz2
  • Decompress the database file for local testing using pbzip2. The typical command will be something like pbzip2 -d -k ptaxsim-2021.0.2.db.bz2
  • Rename the decompressed local database file to ptaxsim.db for local testing. This is the file name that the unit tests and vignettes expect
  • Restart R. Then run the unit tests (devtools::test() in the console) and vignettes (pkgdown::build_site() in the console) locally
  • Knit the README.Rmd file to update the database link at the top of the README. The link is pulled from the ptaxsim.db file's metadata table
  • If necessary, update the database diagrams in the README with any new fields or tables
  • Move the compressed database file to S3 for public distribution. The typical command will be something like aws s3 mv ptaxsim-2021.0.2.db.bz2 s3://ccao-data-public-us-east-1/ptaxsim/ptaxsim-2021.0.2.db.bz2
  • Use the S3 console (or API) to make the database file public via an ACL
  • Push the code updates on GitHub. Wait for the resulting CI pipeline to finish
  • If there are no pipeline errors, merge the branch to master
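
The naming and versioning steps in the checklist can be sketched in base R. This is only an illustration: `db_file_name()`, `db_s3_uri()`, and `db_version_ok()` are hypothetical helpers that do not exist in PTAXSIM, the S3 prefix is copied from the checklist's example command, and the range check assumes that `Config/Requires_DB_Version` and `Config/Wants_DB_Version` act as lower and upper bounds on the database version.

```r
# Hypothetical helper (not part of PTAXSIM): build the release file name
# ptaxsim-<TAX_YEAR>.<MAJOR VERSION>.<MINOR VERSION>.db.bz2
db_file_name <- function(tax_year, major, minor) {
  sprintf(
    "ptaxsim-%d.%d.%d.db.bz2",
    as.integer(tax_year), as.integer(major), as.integer(minor)
  )
}

# Hypothetical helper: S3 destination for the compressed file. The default
# prefix is taken from the example `aws s3 mv` command in the checklist.
db_s3_uri <- function(file_name,
                      prefix = "s3://ccao-data-public-us-east-1/ptaxsim") {
  paste(prefix, file_name, sep = "/")
}

# Hypothetical check (an assumption about how the two DESCRIPTION fields
# relate): is db_version within [requires, wants]?
db_version_ok <- function(db_version, requires, wants) {
  v <- numeric_version(db_version)
  v >= numeric_version(requires) && v <= numeric_version(wants)
}

file_name <- db_file_name(2021, 0, 2)
print(file_name)
print(db_s3_uri(file_name))
print(db_version_ok("2021.0.2", requires = "2021.0.1", wants = "2021.0.2"))
```

Using `numeric_version()` rather than string comparison keeps the ordering correct once a component reaches two digits (for example, 2021.0.10 sorts after 2021.0.9).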

erhla and others added 4 commits January 18, 2024 14:39
* 2022 has different column name for levy_plus_loss "levy+loss"

* update agency

make sheet explicit, update across syntax, add 2022 column names

* Update cpihistory.pdf

* Switching to pdftools

Pretty sure this is the same but didn't want to use noncran tabulizer

* from press release

* remove tabulizer

* add 2022

* add excel conversions

* update 2006 to 2012 to excel versions

* add 2022 tax code

* sample 2022 bills

* update with pdftools

* lint /style

codecov bot commented Jan 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (84a46d3) 100.00% compared to head (26638ae) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##            master       #25   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines          439       439           
=========================================
  Hits           439       439           


Give priority to certain names on bills in the detail output and add
names for TY2022
@dfsnow dfsnow self-assigned this Jan 18, 2024
Member Author

@erhla I had to revert your xlsx TIF changes since there were a bunch of errors in the conversions. The xlsx files were missing ~100 rows.

Collaborator

Ah oops. I'll look into that.

@dfsnow dfsnow requested a review from erhla January 19, 2024 08:13
@@ -193,7 +193,7 @@ tax_bill <- function(year_vec,

   # Calculate the exemption effect by subtracting the exempt amount from
   # the total taxable EAV
-  dt[, agency_tax_rate := agency_total_ext / agency_total_eav]
+  dt[, agency_tax_rate := agency_total_ext / as.numeric(agency_total_eav)]
Member Author

Fix for a weird integer overflow using the int64 type and 0 values. Coercing to numeric solves it fine 🤷

Collaborator

I've encountered this before with ptaxsim; it's an annoying int64 quirk.
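
For readers unfamiliar with the pattern, here is a minimal sketch of the coercion used in the fix, run on toy data. It assumes the `data.table` and `bit64` packages are installed and that `agency_total_eav` is stored as `integer64`, as the comments in this thread suggest. The values are made up, and the sketch does not try to reproduce the original overflow; it only shows that dividing by the coerced column produces ordinary doubles.

```r
library(data.table)
library(bit64)

# Toy data, not real PTAXSIM values: EAV stored as integer64, with one
# zero-valued row like the case mentioned in the review comment
dt <- data.table(
  agency_total_ext = c(5000, 0),
  agency_total_eav = as.integer64(c(100000, 0))
)

# Coerce the integer64 denominator to double before dividing, as in the fix
dt[, agency_tax_rate := agency_total_ext / as.numeric(agency_total_eav)]

# The rate column is a plain double; 0 / 0 follows ordinary floating-point
# rules and yields NaN
print(dt)
```

Coercing to double gives up exact integer representation above 2^53, which is acceptable here only on the assumption that EAV totals stay far below that bound.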

Member Author

dfsnow commented Jan 19, 2024

@erhla If you have the time, can you give this a quick skim? Otherwise I'll merge it by EOD today (1/19).

Collaborator

erhla commented Jan 19, 2024

Confirming this looks good.

I also hit the int64 overflow in tax_bill for 2022 unsimplified bills (e.g. tax_bill(2022, all_pins, simplify = FALSE)). That is fixed here too, so I'm flagging it. EAV/AV inflation likely caused some numbers to overflow.

@erhla erhla closed this Jan 19, 2024
@erhla erhla reopened this Jan 19, 2024
Collaborator

@erhla erhla left a comment

Looks good

@dfsnow dfsnow merged commit 9971e4e into master Jan 19, 2024
24 checks passed
@dfsnow dfsnow deleted the ty2022-update branch January 19, 2024 18:35
Successfully merging this pull request may close these issues.

Add Tax Year 2022 data
3 participants