covid case datamart fixes #770
…datamart
Refactored sp_covid_case_datamart_postprocessing to ensure reliable execution and data integrity:
- Resolved 'Column specified multiple times' errors by adding DISTINCT to all metadata selection and PIVOT subqueries.
- Fixed database name mismatch by pointing INFORMATION_SCHEMA queries to the local context instead of the legacy rdb database.
- Prevented silent failures in the final INSERT by wrapping dynamic column concatenations in ISNULL to stop NULL propagation.
- Improved schema compatibility by implementing dynamic CAST logic for VARCHAR columns based on actual table metadata.
- Optimized performance by replacing the row-by-row PIVOT logic with set-based distinct selections.
These changes ensure that the RTR pipeline correctly populates the COVID_CASE_DATAMART table when triggered by incoming Kafka events.
Update the SQL setup script for the COVID-19 case datamart unit test to include population of the shared_ind, case_class_cd, and investigation_status_cd columns. This ensures the test state accurately reflects required database fields for the reporting pipeline.
Update the `nrt_page_case_answer` setup script for the `covidCaseDatamart` unit test to include `nbs_ui_metadata_uid` and `nbs_rdb_metadata_uid`. This aligns the unit test seed data with recent schema changes in the modernized reporting database to prevent insertion failures during test execution.
Update the unit test query for covidCaseDatamart to explicitly reference the rdb_modern database instead of the default schema. This ensures the test aligns with the project's modern reporting database structure.
Update the covidCaseDatamart unit test query to execute the sp_covid_case_datamart_postprocessing stored procedure. This ensures the target datamart table is populated before the test performs its assertions.
…data
Expand the `covidCaseDatamart` unit test to include comprehensive seed data and verify a wider range of fields populated by the post-processing stored procedure. Changes include:
- Updating `setup.sql` with extensive seed data for ODSE, NRT, and D tables, including symptoms, demographics, and notifications.
- Modifying `query.sql` to execute the `sp_covid_case_datamart_postprocessing` stored procedure.
- Expanding `expected.json` to assert the correctness of the additional fields, such as transmission mode, MMWR data, and clinical symptoms.
…d modern covid_case_datamart stored procs
GitHub will not allow me to add a comment to the code, but regarding the removal of the TestContainers usage for the UnitTest.java: With the complete removal of this, it is now
Thoughts on having an environment variable to control whether or not to use TestContainers for the database?
I didn't fully think through how the difference between the functional and unit setups will impact development workflows. Let me sit on that for a bit and experiment with bringing just the nbs-mssql container back in a TestContainer.
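As a rough illustration of the environment-variable idea raised above, the toggle could be a small helper that the test setup consults before starting a container. This is only a sketch: the variable name `USE_TESTCONTAINERS` and the class `TestDbMode` are hypothetical and do not come from this PR, and the default (off when unset) is an assumption.

```java
import java.util.Map;

/** Hypothetical helper: decides whether tests should start a TestContainers database. */
final class TestDbMode {
    // Assumed variable name; it is not defined anywhere in this PR.
    static final String ENV_VAR = "USE_TESTCONTAINERS";

    private TestDbMode() {}

    /**
     * Returns true only when the variable is set to "true" (case-insensitive).
     * An unset or unrecognized value falls back to false, i.e. use an
     * already-running database instead of starting a container.
     */
    static boolean useTestContainers(Map<String, String> env) {
        String raw = env.get(ENV_VAR);
        return raw != null && Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        // In real test setup the map would come from System.getenv().
        System.out.println(useTestContainers(System.getenv()));
    }
}
```

Passing the environment in as a map keeps the decision easy to unit test without mutating the real process environment.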
cast(pat.PATIENT_DECEASED_INDICATOR as varchar(20)) AS 'PATIENT_DECEASED_IND',
cast(nrtPat.DECEASED_IND_CD as varchar(20)) AS 'PATIENT_DECEASED_IND',
pat.PATIENT_DECEASED_DATE AS 'PATIENT_DECEASED_DT',
pat.PATIENT_MARITAL_STATUS AS 'PATIENT_MARITAL_STS',
@CDCgov/nbs-dragon When the covid19 ETL runs in NBS6, PATIENT_MARITAL_STATUS is populated with a code and not a description (e.g. 'M' vs 'Married').
From what I can tell of the postprocessing stored procs, there seems to be a deliberate decision to only pull data from other RDB_MODERN tables (e.g. nrt_* and d_*). If we stick with that design decision, we can't change this column to display the code, as the code doesn't exist in D_PATIENT or NRT_PATIENT (both of those tables only have the full marital status value, e.g. 'Married').
For discrepancies like these, should we be changing the person service and sp_patient_event stored proc to capture new data to make it possible?
Might need to bring it up with Kasey. If the answer is "the table needs to show the code, make it happen", then I do think that modifying the upstream sp_patient_event to pull the data makes the most sense.
Description
Fixes a bug that prevents the covid case datamart from running cleanly, and converts some columns in those tables to display codes rather than descriptions.
Related Issue
APP-460
Additional Notes
- `004-nrt_lab_test_result_group_key-001.sql` onboarding script to fix a potential primary key violation that could occur during database updates or re-onboarding by strictly enforcing uniqueness based on the table's primary identity column (this allows us to re-run the migration on the database without failure).
- `DataDrivenUnitTests.java` to use a transaction block to make the tests more robust by guaranteeing a clean database state for every iteration. This allows us to reuse the same primary keys from test to test.
- `dotenv` plugin to automatically load variables from the dev's `.env` file into the `application-test.yaml`. This allows developers to override testing settings before running gradle tests locally.
Checklist