Skip to content

covid case datamart fixes#770

Open
ericbuckley wants to merge 76 commits intomainfrom
eb/app-460/covid-case-datamart-fixes
Open

covid case datamart fixes#770
ericbuckley wants to merge 76 commits intomainfrom
eb/app-460/covid-case-datamart-fixes

Conversation

@ericbuckley
Copy link
Copy Markdown
Contributor

@ericbuckley ericbuckley commented Apr 28, 2026

Description

Fixing a bug that prevents covid case datamart from running cleanly and converting some columns in those tables to display codes rather than descriptions.

Related Issue

APP-460

Additional Notes

  • Removing the use of TestContainers to orchestrate the database and liquibase migrations (this was causing timing issues with the newly added test case of data not being available at assertion time).
  • Leveraging the Java Liquibase SDK to apply migrations before the stored proc unit tests begin.
  • Converted as many descriptions to codes, per alignment with the Covid19ETL process, that was possible based on the data available in the RDB_Modern.
  • Updated 004-nrt_lab_test_result_group_key-001.sql onboarding script to fix a potential primary key violation that could occur during database updates or re-onboarding by strictly enforcing uniqueness based on the table's primary identity column (this allows us to re-run the migration on the database without failure).
  • Updated DataDrivenUnitTests.java to use a transaction block to make the tests more robust by guaranteeing a clean database state for every iteration. This allows us to reuse the same primary keys from test to test.
  • Added a dotenv plugin to automatically load variables from devs .env file into the application-test.yaml. This allows developers to override testing settings before running gradle tests locally.

Checklist

  • I have ensured that the pull request is of a manageable size, allowing it to be reviewed within a single session.
  • I have reviewed my changes to ensure they are clear, concise, and well-documented.
  • I have updated the documentation, if applicable.
  • I have added or updated test cases to cover my changes, if applicable.

…datamart

Refactored sp_covid_case_datamart_postprocessing to ensure reliable execution and data integrity:
- Resolved 'Column specified multiple times' errors by adding DISTINCT to all metadata selection and PIVOT subqueries.
- Fixed database name mismatch by pointing INFORMATION_SCHEMA queries to the local context instead of the legacy rdb database.
- Prevented silent failures in the final INSERT by wrapping dynamic column concatenations in ISNULL to stop NULL propagation.
- Improved schema compatibility by implementing dynamic CAST logic for VARCHAR columns based on actual table metadata.
- Optimized performance by replacing the row-by-row PIVOT logic with set-based distinct selections.

These changes ensure that the RTR pipeline correctly populates the COVID_CASE_DATAMART table when triggered by incoming Kafka events.
@ericbuckley ericbuckley self-assigned this Apr 28, 2026
Update the SQL setup script for the COVID-19 case datamart unit test to
include population of the shared_ind, case_class_cd, and
investigation_status_cd columns. This ensures the test state accurately
reflects required database fields for the reporting pipeline.
Update the `nrt_page_case_answer` setup script for the
`covidCaseDatamart` unit test to include `nbs_ui_metadata_uid` and
`nbs_rdb_metadata_uid`.

This aligns the unit test seed data with recent schema changes in the
modernized reporting database to prevent insertion failures during
test execution.
Update the unit test query for covidCaseDatamart to explicitly reference
the rdb_modern database instead of the default schema. This ensures the
test aligns with the project's modern reporting database structure.
Update the covidCaseDatamart unit test query to execute the
sp_covid_case_datamart_postprocessing stored procedure. This ensures
the target datamart table is populated before the test performs its
assertions.
…data

Expand the `covidCaseDatamart` unit test to include comprehensive seed data
and verify a wider range of fields populated by the post-processing
stored procedure.

Changes include:
- Updating `setup.sql` with extensive seed data for ODSE, NRT, and D
  tables, including symptoms, demographics, and notifications.
- Modifying `query.sql` to execute the
  `sp_covid_case_datamart_postprocessing` stored procedure.
- Expanding `expected.json` to assert the correctness of the additional
  fields, such as transmission mode, MMWR data, and clinical symptoms.
@mpeels mpeels mentioned this pull request May 5, 2026
@ericbuckley ericbuckley changed the title Eb/app 460/covid case datamart fixes covid case datamart fixes May 6, 2026
@ericbuckley ericbuckley marked this pull request as ready for review May 6, 2026 16:08
@ericbuckley ericbuckley requested a review from a team as a code owner May 6, 2026 16:08
@mpeels
Copy link
Copy Markdown
Contributor

mpeels commented May 6, 2026

Github will not allow me to add a comment to the code -- but regarding the removal of the TestContainers usage for the UnitTest.java:

With the complete removal of this, it is now required to have a database running locally prior to kicking off the unit tests and a database must not be running prior to starting the functional tests.

Thoughts on having an environment variable to control wether or not to use TestContainers for the database?

@ericbuckley
Copy link
Copy Markdown
Contributor Author

Github will not allow me to add a comment to the code -- but regarding the removal of the TestContainers usage for the UnitTest.java:

With the complete removal of this, it is now required to have a database running locally prior to kicking off the unit tests and a database must not be running prior to starting the functional tests.

Thoughts on having an environment variable to control wether or not to use TestContainers for the database?

I didn't fully think through how difference between functional and unit setup will impact development workflows. Let me sit on that for a bit, and experiment with bringing just the nbs-mssql container back in a TestContainer

cast(pat.PATIENT_DECEASED_INDICATOR as varchar(20)) AS 'PATIENT_DECEASED_IND',
cast(nrtPat.DECEASED_IND_CD as varchar(20)) AS 'PATIENT_DECEASED_IND',
pat.PATIENT_DECEASED_DATE AS 'PATIENT_DECEASED_DT',
pat.PATIENT_MARITAL_STATUS AS 'PATIENT_MARITAL_STS',
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CDCgov/nbs-dragon When the covid19 ETL runs in NBS6, PATIENT_MARITAL_STATUS is populated with a code and not a description (eg 'M' vs 'Married').

From what I can tell of the postprocessing stored procs, there seems to be a deliberate decision to only pull data from other RDB_MODERN tables (eg nrt_* and d_*). If we stick with that design decision, we can't change this column to display the code, as it doesn't exist in D_PATIENT or NRT_PATIENT (both of those tables only have the full marital status value, eg 'Married').

For discrepancies like these, should we be changing the person service and sp_patient_event stored proc to capture new data to make it possible?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need to bring it up with Kasey. If the answer is "the table needs to show the code, make it happen", then I do think that modifying the upstream sp_patient_event to pull the data makes the most sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants