APP-356 Convert POTNTL_DUP_INV_SUM (Potential Duplicate Investigations)#3131
krista-skylight wants to merge 50 commits into `main` from `kc/convert_POTNTL_DUP_INV_SUM`
Conversation
```python
for col in self.expected_columns:
    data.get_column(col)


def test_execute_report_with_small_days_value(self):
```
what happens if a negative number of days is passed?
Then I believe no data is returned since, in my query, I filter for anything less than the days_value. Should I have the library throw an error instead?
the front end doesn't let you enter negatives, so I don't think it's high risk in practice. We probably just want to make sure it doesn't explode in an opaque way and if it does make it error more clearly - add a test for this case?
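A guard along these lines would make the failure explicit rather than opaque; this is a sketch under assumptions (the helper name and error message are illustrative, not the PR's actual code):

```python
def validate_days_value(days_value):
    """Reject negative day windows up front with a clear error, instead of
    silently returning an empty result set from the date filter."""
    if days_value is not None and days_value < 0:
        raise ValueError(f"days_value must be non-negative, got {days_value}")
    return days_value
```

A companion test could then assert that executing the report with a negative `days_value` surfaces this `ValueError` instead of quietly returning nothing.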
```python
assert result.subheader == 'Duplicate Investigations Time Frame: 30 Days'

# Just verify it runs and returns data
assert len(result.content.data) >= 0
```
is this test functionally different than the first one?
No, it's just testing a different days_value. delete?
```python
result_30 = execute_report(spec_30)

# 3650 days should return more or equal rows than 30 days
assert len(result.content.data) >= len(result_30.content.data)
```
I'd expect more always for that difference - do we ever see equal?
I think it could be equal if all potential duplicate investigations in the data are already less than 30 days.
given the data is largely a fixed data set, is that true of our data? I'd expect not given the randomness at play
That's not true of our data. I can modify it to > and add a little comment about this.
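The agreed change might read as follows (a sketch with stand-in row counts; in the real test both values come from `execute_report` results):

```python
# Stand-in result sizes; in the real test these are the row counts returned
# for days_value=3650 and days_value=30 respectively.
rows_3650 = 5
rows_30 = 2

# Strict inequality: the seeded data always contains potential duplicates
# older than 30 days, so equal counts would mean the days filter is broken.
assert rows_3650 > rows_30
```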
```python
assert result.content_type == 'table'

# Verify only selected diseases appear
disease_cds = result.content.get_column('Disease Code')
```
this test seems to be testing the subset sql not really the library - remove?
Co-authored-by: Mary McGrath <m.c.mcgrath13@gmail.com>
JordanGuinn left a comment:
This is looking good!! 🔥
```python
library_name: str = Field(min_length=1)
data_source_name: str = Field(min_length=1)
subset_query: str = Field(min_length=1)
days_value: int | None = None  # Specific to potntl_dup_inv_sum
```
(thought, nb): Probably not worth addressing at this moment, but if there end up being multiple reports that require unique properties like this, maybe we end up sketching out a custom_props Object field or similar to capture them all? Just in the spirit of minimizing the amount of library-specific bits on the ReportSpec model.
Yea def a good idea to revisit as we work through more translations!
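A `custom_props` field could look something like this (a pydantic sketch of the idea; the field name and the example values are hypothetical, not part of this PR):

```python
from typing import Any

from pydantic import BaseModel, Field


class ReportSpec(BaseModel):
    library_name: str = Field(min_length=1)
    data_source_name: str = Field(min_length=1)
    subset_query: str = Field(min_length=1)
    # Catch-all for library-specific knobs, so per-report fields like
    # days_value don't accumulate on the shared model.
    custom_props: dict[str, Any] = Field(default_factory=dict)


spec = ReportSpec(
    library_name="potntl_dup_inv_sum",
    data_source_name="example_source",  # illustrative value
    subset_query="SELECT 1",
    custom_props={"days_value": 30},
)
```

Each library would then pull what it needs out of `custom_props` and validate it itself.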
```yaml
sas:
  image: ghcr.io/cdcent/nbs7-sas-linux:v1.0.4
  platform: linux/amd64
  container_name: "${COMPOSE_PROJECT_NAME}-sas"
```
(q, nb): Any particular reason for this addition?
I do not know enough to understand why but when I was troubleshooting issues with @mcmcgrath13 doing sas translations, she suggested making this change to get around an error message that I can no longer recall lol.
Mary, do you think I should push this for everyone or leave it as something only I modify?
I think it's fine to push - makes working with the sas container easier via compose
```diff
  db_tables = [t['table_name'] for t in schema['tables']]
- fk_tables = schema['config']['nbs']['fk_tables']
+ fk_tables = schema['config'].get('nbs', {}).get('fk_tables', [])
```
(q, nb): What's this change for?
so if there are no fk tables specified (not always relevant), then we default to empty list
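The behavior of the chained `.get` calls, illustrated (table names here are illustrative):

```python
with_fks = {'config': {'nbs': {'fk_tables': ['person', 'organization']}}}
without_nbs = {'config': {}}

# Each .get supplies a fallback: a missing 'nbs' section (or a missing
# 'fk_tables' key within it) now yields [] instead of raising KeyError.
assert with_fks['config'].get('nbs', {}).get('fk_tables', []) == ['person', 'organization']
assert without_nbs['config'].get('nbs', {}).get('fk_tables', []) == []
```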
```python
]


def test_execute_report_with_days_value(self, snapshot):
    """Test with a specific days value (e.g., 365 days)."""
```
(q, nb): Can/should we do any actual date comparisons of the underlying results returned by the report lib, to ensure they actually match the days value provided?
No, unfortunately, the report doesn't retain data about the two event dates for the potential duplicate or how close in days they are to each other. That's why I did a comparison between 3650 and 30 days to check that 3650 has more potential duplicates than 30.
```yaml
- column_name: ILLNESS_ONSET_DATE
  type: string
  data: None
  null_percentage: 1.0
```
(q, nb): why include the column, but have everything be null?
```yaml
data: f"{fake.date_between(start_date='-1y', end_date='today').strftime('%Y-%m-%d')} 12:00:00.000"
null_percentage: 0.05

- column_name: EVENT_DATE_TYPE
```
should this be linked off of the non-null ness of EVENT_DATE? (similar to how in phdc the state is dependent on state cd)
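If the generator config can't express that dependency directly, a post-processing pass could enforce it (a sketch; `link_event_date_type` and the `'SYMP'` value are hypothetical, and the real fix may belong in the config itself):

```python
def link_event_date_type(rows):
    """Null out EVENT_DATE_TYPE wherever EVENT_DATE is null, so the type
    column is only populated when there is a date to describe."""
    for row in rows:
        if row.get('EVENT_DATE') is None:
            row['EVENT_DATE_TYPE'] = None
    return rows
```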
```python
# KLUDGE: NULL writing is not always correct
result = result.replace(' nan,', ' NULL,')
result = result.replace('nan', ' NULL')
```
(q, nb): why did we need to add this one? There's a risk that a valid part of a string containing 'nan' will now be turned into NULL. Is it for the opening-paren case, '(nan,'?
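If the blanket replacement proves too risky, a word-boundary match would only touch standalone `nan` tokens (a sketch of an alternative, not what the PR does):

```python
import re


def null_out_nan(sql: str) -> str:
    # \b anchors leave 'nan' inside larger words (e.g. 'financial') intact,
    # though a quoted literal containing a standalone 'nan' would still match.
    return re.sub(r'\bnan\b', 'NULL', sql)


print(null_out_nan("VALUES (nan, 'financial', nan)"))
# → VALUES (NULL, 'financial', NULL)
```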
```python
with db_transaction(conn_string) as trx:
    # Tables with foreign keys pointing to the table we want to replace need to
    # be backed up and cleared out to avoid FK constraint violations
```




Description
Please include a summary of the changes and any key information a reviewer may need.
Tickets
Checklist for adding a library:
- `apps/modernization-api/src/main/resources/db/changelog/report/execution/03_ODSE_Data_Report_Library_Init.sql`
- `apps/modernization-api/src/main/resources/db/report/execution/libraries` — named `<your library name>.sql`
- `apps/modernization-api/src/main/resources/db/report-execution-changelog.yml`. It should be added to the latest `changeSet` since the last release - this could require a new `changeSet` if there isn't one since the last release
- `apps/report-execution/src/libraries` — named in lowercase, but generally following the naming convention of SAS (needs to be human recognizable as the same library)
- `execute` function is the required method and its signature will (someday) be checked for validity
- `apps/report-execution/tests/libraries/<your_library_here>` — `py` file following the conventions established for other libraries
- `subset_sql` can be assumed to be well tested by the modernization-api, so focus on any logic and additional joins/queries/analysis that is added in the library