Commits
50 commits
60e01c0
chore: initial copies of new library modules
Apr 1, 2026
4970f48
Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …
Apr 6, 2026
a8399b9
feat: tablefaker schema for dup_inv_sum and initial conversion and tests
Apr 6, 2026
a62cbfe
chore: add changelog and sql file for potntl dup inv
Apr 7, 2026
c73d401
chore: merge main branch
Apr 8, 2026
e7ad140
Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …
Apr 9, 2026
fdce598
chore: switch to rdb database
Apr 9, 2026
fae8225
chore: more nan handling and add INVESTIGATION_KEY
Apr 9, 2026
fad7088
chore: new snapshot
Apr 10, 2026
ba5b29b
chore: changes from main
Apr 13, 2026
b01e674
chore: try getting rid of datetime to pass tests
Apr 13, 2026
a7a9b99
Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …
Apr 15, 2026
1aabe0d
tests: data type fixes and typo fixes to make tests run
Apr 15, 2026
44e6f1c
chore: fixes to subheader, add all needed columns, handle days_value …
Apr 15, 2026
0679b42
tests: fix 30 day assertions. switch to triple quotes
Apr 16, 2026
09ffc64
chore: if no fktables
Apr 16, 2026
3fab55c
tests: rewrite without the extra fields
Apr 16, 2026
ecd93e6
chore: change fk_table logic and go back to 3650 days default
Apr 16, 2026
501788f
chore: pull main merge changes
Apr 17, 2026
f6610c0
Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …
Apr 21, 2026
f2fd4bb
chore: changes to get sas to work
Apr 21, 2026
3349cd9
chore: rework without TimeRange
Apr 21, 2026
22764c7
chore: rename files
Apr 21, 2026
f64f9a9
chore: more renaming
Apr 21, 2026
e375a87
tests: tablefaker schema matches actual data
Apr 22, 2026
e6359e9
chore: date formats that actually work
Apr 22, 2026
ca04489
chore: change migration filenames
Apr 22, 2026
7f3f99d
chore: bring in changes from main
Apr 23, 2026
d239bc3
Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …
Apr 24, 2026
9ba225d
Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …
Apr 29, 2026
bf6e1a5
Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …
May 4, 2026
688dc51
feat: new report sort
May 4, 2026
287be73
chore: linter fixes
May 4, 2026
3ecaf99
chore: a way to handle kwargs
May 4, 2026
a4a9dce
feat: get sql to ascii sort
May 4, 2026
db9e350
chore: linter fixes
May 4, 2026
7ad22fe
Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …
May 4, 2026
aee7399
chore: another linter fix
May 4, 2026
5201f3e
Update apps/report-execution/src/libraries/potntl_dup_inv_sum.py
krista-skylight May 5, 2026
9ea33ba
Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …
May 5, 2026
6c5fa6b
chore: remove execute_kwargs, add kwargs to all the lambda functions
May 5, 2026
95d38b6
tests: no days defaults to 3650
May 5, 2026
da5bf06
chore: remove disease code check
May 5, 2026
d814b1d
chore: linter fixes
May 5, 2026
1293fe3
chore: linter fixes
May 5, 2026
86f46a3
Merge branch 'main' of https://github.com/CDCgov/NEDSS-Modernization …
May 6, 2026
6bb51ec
tests: add a negative days value test
May 6, 2026
3a0e74d
tests: remove small days value test
May 6, 2026
955a6b8
tests: remove disease filter test, change to >
May 6, 2026
756f107
chore: linter fixes
May 6, 2026
@@ -36,6 +36,9 @@ databaseChangeLog:
- sqlFile:
path: db/report/execution/libraries/nbs_sr_13.sql
splitStatements: false
- sqlFile:
path: db/report/execution/libraries/potntl_dup_inv_sum.sql
splitStatements: false
- sqlFile:
path: db/report/execution/libraries/nbs_sr_12.sql
splitStatements: false
@@ -0,0 +1,44 @@
-- Migrate the POTNTL_DUP_INV_SUM.SAS library to the potntl_dup_inv_sum python library

USE [NBS_ODSE]

DECLARE @pyLib VARCHAR(50) = 'potntl_dup_inv_sum'
DECLARE @sasLib VARCHAR(50) = 'POTNTL_DUP_INV_SUM.SAS'
DECLARE @desc VARCHAR(300) = 'Potential Duplicate Investigations - Identifies potential duplicate investigations for the same patient with the
same disease within a user-specified number of days.'

IF EXISTS (SELECT * FROM [dbo].[Report_Library] WHERE UPPER(library_name) = @sasLib)
BEGIN
UPDATE [dbo].[Report_Library]
SET
library_name = @pyLib,
runner = 'python',
desc_txt = @desc,
last_chg_time = CURRENT_TIMESTAMP,
last_chg_user_id = 99999999
WHERE
UPPER(library_name) = @sasLib;
END
ELSE
BEGIN
-- Create a row for this library
INSERT INTO [dbo].[Report_Library] (
library_name,
desc_txt,
runner,
is_builtin_ind,
add_time,
add_user_id,
last_chg_time,
last_chg_user_id
) VALUES (
@pyLib,
@desc,
'python',
'Y',
CURRENT_TIMESTAMP,
99999999,
CURRENT_TIMESTAMP,
99999999
);
END
1 change: 1 addition & 0 deletions apps/report-execution/src/execute_report.py
@@ -23,6 +23,7 @@ def execute_report(report_spec: models.ReportSpec):
trx,
subset_query=report_spec.subset_query,
data_source_name=report_spec.data_source_name,
days_value=report_spec.days_value,
)

check_valid_result(result, report_spec)
121 changes: 121 additions & 0 deletions apps/report-execution/src/libraries/potntl_dup_inv_sum.py
@@ -0,0 +1,121 @@
from src.db_transaction import Transaction
from src.models import ReportResult


def execute(
trx: Transaction,
subset_query: str,
data_source_name: str,
    days_value: int | None,
**kwargs,
):
"""Potential Duplicate Investigations.

Identifies potential duplicate investigations for the same patient,
with the same disease, within a user-specified number of days.
"""
    # Default only when days_value is None (not provided); an explicit 0 is honored as 0
if days_value is None:
days_value = 3650

full_query = f"""
WITH subset AS ({subset_query})
-- Capture SQL Server's physical row order
, source_order AS (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS sas_row_num
FROM subset
)
, clean_data AS (
SELECT
PATIENT_LOCAL_ID,
PATIENT_FIRST_NAME,
PATIENT_LAST_NAME,
PATIENT_DOB,
INVESTIGATION_LOCAL_ID,
DISEASE,
CASE_STATUS,
EVENT_DATE,
EVENT_DATE_TYPE,
MMWR_YEAR,
NOTIFICATION_STATUS,
DISEASE_CD,
sas_row_num
FROM source_order
WHERE EVENT_DATE IS NOT NULL
AND PATIENT_LOCAL_ID IS NOT NULL
AND DISEASE_CD IS NOT NULL
)
-- Calculate days since previous and until next event
, datediff_calc AS (
SELECT
*,
DATEDIFF(day,
LAG(EVENT_DATE) OVER (
PARTITION BY
PATIENT_LOCAL_ID,
DISEASE_CD
ORDER BY EVENT_DATE, sas_row_num
),
EVENT_DATE
) AS days_since_prev,
DATEDIFF(day,
EVENT_DATE,
LEAD(EVENT_DATE) OVER (
PARTITION BY PATIENT_LOCAL_ID,
DISEASE_CD
ORDER BY EVENT_DATE, sas_row_num
)
) AS days_until_next
FROM clean_data
)
-- Count events for each patient and disease to identify potential duplicates
, event_counts AS (
SELECT
PATIENT_LOCAL_ID,
DISEASE_CD,
COUNT(*) AS event_count
FROM clean_data
GROUP BY PATIENT_LOCAL_ID, DISEASE_CD
)
-- Final selection of potential duplicates based on days thresholds
SELECT
d.PATIENT_LOCAL_ID AS [Patient Local ID],
d.PATIENT_FIRST_NAME AS [Patient First Name],
d.PATIENT_LAST_NAME AS [Patient Last Name],
d.PATIENT_DOB AS DOB,
d.INVESTIGATION_LOCAL_ID AS [Investigation Local ID],
d.DISEASE AS Disease,
d.CASE_STATUS AS [Case Status],
d.EVENT_DATE AS [Event Date],
d.EVENT_DATE_TYPE AS [Event Date Type],
d.MMWR_YEAR AS [MMWR Year],
d.NOTIFICATION_STATUS AS [Notification Record Status],
d.DISEASE_CD AS [Disease Code]
FROM datediff_calc d
JOIN event_counts c
ON d.PATIENT_LOCAL_ID = c.PATIENT_LOCAL_ID
AND d.DISEASE_CD = c.DISEASE_CD
WHERE c.event_count > 1
AND (
(d.days_since_prev IS NOT NULL AND d.days_since_prev <= {days_value})
OR (d.days_until_next IS NOT NULL AND d.days_until_next <= {days_value})
)
ORDER BY
d.PATIENT_LOCAL_ID COLLATE Latin1_General_BIN,
d.DISEASE_CD COLLATE Latin1_General_BIN,
d.EVENT_DATE,
d.sas_row_num
"""

content = trx.query(full_query)

header = 'Potential Duplicate Investigations'
subheader = f'Duplicate Investigations Time Frame: {days_value} Days'

return ReportResult(
content_type='table', content=content, header=header, subheader=subheader
)
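The LAG/LEAD window logic above can be sketched in plain Python. This is a hypothetical, simplified re-implementation for illustration only (the dict keys and sample rows are made up, not the report's actual row shape): within each (patient, disease) group ordered by event date, a row is flagged when the gap to its previous or next event is within `days_value`, defaulting to 3650 when no value is supplied.

```python
from datetime import date
from itertools import groupby

def flag_potential_duplicates(rows, days_value=None):
    """Simplified sketch of the query's LAG/LEAD logic (not the shipped SQL):
    within each (patient, disease) group ordered by event date, keep a row
    when the gap to its previous or next event is <= days_value."""
    if days_value is None:
        days_value = 3650  # same default the library applies

    def group_key(row):
        return (row['patient'], row['disease_cd'])

    ordered = sorted(rows, key=lambda r: (r['patient'], r['disease_cd'], r['event_date']))
    flagged = []
    for _, grp in groupby(ordered, key=group_key):
        grp = list(grp)
        if len(grp) < 2:  # mirrors the event_count > 1 filter
            continue
        for i, row in enumerate(grp):
            prev_gap = (row['event_date'] - grp[i - 1]['event_date']).days if i else None
            next_gap = (grp[i + 1]['event_date'] - row['event_date']).days if i < len(grp) - 1 else None
            if (prev_gap is not None and prev_gap <= days_value) or (
                next_gap is not None and next_gap <= days_value
            ):
                flagged.append(row)
    return flagged

rows = [
    {'patient': 'P1', 'disease_cd': '10110', 'event_date': date(2026, 1, 1)},
    {'patient': 'P1', 'disease_cd': '10110', 'event_date': date(2026, 1, 10)},
    {'patient': 'P2', 'disease_cd': '10110', 'event_date': date(2026, 1, 5)},
]
print(len(flag_potential_duplicates(rows, days_value=30)))  # 2: both P1 rows, 9 days apart
```

Note that a negative `days_value` flags nothing, which matches the negative-days test added in this PR.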
1 change: 1 addition & 0 deletions apps/report-execution/src/models.py
@@ -13,6 +13,7 @@ class ReportSpec(BaseModel):
library_name: str = Field(min_length=1)
data_source_name: str = Field(min_length=1)
subset_query: str = Field(min_length=1)
days_value: int | None = None # Specific to potntl_dup_inv_sum

Contributor:

(thought, nb): Probably not worth addressing at this moment, but if there end up being multiple reports that require unique properties like this, maybe we end up sketching out a custom_props Object field or similar to capture them all? Just in the spirit of minimizing the amount of library-specific bits on the ReportSpec model.

Contributor Author:

Yea def a good idea to revisit as we work through more translations!

# column names and values
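The custom_props idea floated in the thread above could look roughly like this — a hypothetical sketch only (shown with stdlib dataclasses rather than the project's Pydantic models; `ReportSpecSketch` and the field values are invented), where library-specific settings like `days_value` live in one bag instead of as top-level `ReportSpec` fields:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ReportSpecSketch:
    """Hypothetical variant of ReportSpec: library-specific settings live
    in one custom_props bag instead of as top-level fields."""
    library_name: str
    data_source_name: str
    subset_query: str
    custom_props: dict[str, Any] = field(default_factory=dict)

spec = ReportSpecSketch(
    library_name='potntl_dup_inv_sum',
    data_source_name='rdb',
    subset_query='SELECT 1',
    custom_props={'days_value': 30},
)
# A library pulls out its own setting; absence keeps the None-means-default semantics.
print(spec.custom_props.get('days_value'))  # 30
```

The trade-off is losing per-field validation, which is presumably why the thread left it for a later revisit.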
4 changes: 3 additions & 1 deletion apps/report-execution/tests/conftest.py
@@ -153,6 +153,7 @@ def get_faker_sql(schema_name: str) -> str:

# KLUDGE: NULL writing is not always correct
result = result.replace(' nan,', ' NULL,')
result = result.replace('nan', ' NULL')

Contributor:

(q, nb): Why did we need to add this one? There's a risk that a valid part of a string with "nan" in it will now be turned into NULL. Is it the opening paren case of (nan,?

result = result.replace(' nan)', ' NULL)')
result = result.replace(' <NA>,', ' NULL,')
result = result.replace(' <NA>)', ' NULL)')
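The reviewer's concern above is easy to reproduce: a bare `replace('nan', ...)` rewrites any substring match, not just standalone NaN tokens. A minimal illustration (the surname is invented test data):

```python
value = "('Fernandez', nan)"

# The broad replacement added in this diff rewrites any substring match:
broad = value.replace('nan', ' NULL')
print(broad)  # ('Fer NULLdez',  NULL) -- the surname is mangled too

# The delimiter-anchored replacements only touch standalone tokens:
narrow = value.replace(' nan,', ' NULL,').replace(' nan)', ' NULL)')
print(narrow)  # ('Fernandez', NULL)
```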
@@ -167,7 +168,7 @@ def get_tables_from_faker(schema_name: str) -> tuple[list[str], list[str]]:
schema = yaml.safe_load(f.read())

db_tables = [t['table_name'] for t in schema['tables']]
fk_tables = schema['config']['nbs']['fk_tables']
fk_tables = schema['config'].get('nbs', {}).get('fk_tables', [])

Contributor:

(q, nb): What's this change for?

Contributor:

so if there are no fk tables specified (not always relevant), then we default to empty list

return (db_tables, fk_tables)
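The chained `.get` calls make each level of the schema config optional. A quick sketch of the shapes the conftest change now tolerates (the config dicts and table name are made-up examples):

```python
def fk_tables_from(config):
    # Same pattern as the change above: a missing 'nbs' section or a
    # missing 'fk_tables' key both fall back to an empty list.
    return config.get('nbs', {}).get('fk_tables', [])

print(fk_tables_from({'nbs': {'fk_tables': ['case_management']}}))  # ['case_management']
print(fk_tables_from({'nbs': {}}))  # []
print(fk_tables_from({}))  # []
```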

@@ -234,6 +235,7 @@ def insert_fake_data(
with db_transaction(conn_string) as trx:
# Tables with foreign keys pointing to the table we want to replace need to
# be backed up and cleared out to avoid FK constraint violations

for fk_table in fk_tables:
temp_fk_table = temp_name(fk_table)
trx.execute(