Info on script 00 and best practices/known issues #33

Open: wants to merge 4 commits into `main`

**CureID Cohort Creation Guide.md** (33 additions, 0 deletions)

@@ -5,6 +5,8 @@ The following scripts are to be run on a site’s full OMOP dataset in order to
## Instructions
Replace the database name and schema in each of these scripts with your own, then run the cohort creation and deidentification scripts in the following sequence:

0. Create Concept Table (Filename: 00_CURE_ID_create_concept_table.sql)

1. Cohort Creation (Filename: 01_CURE_ID_Cohort.sql)

2. Generate CURE ID Tables (Filename: 02_CURE_ID_All_Tables.sql)
@@ -26,6 +28,37 @@ Replace the database name and schema in each of these scripts with your own, the

## OMOP Cohort Creation and Deidentification Process

### 0. Create Concept Table Script

**Filename**: 00_CURE_ID_create_concept_table.sql

**Purpose**: This script creates a table of standard concepts required for the CureID Registry project. It is used in conjunction with the OMOP CONCEPT_ANCESTOR table in the 02_CURE_ID_All_Tables.sql script.

**Description**: Two fields are particularly important to the process: "is_standard" and "include_descendants".

"is_standard" records the standard status of the concept as one of "C", "S", or "N":
- C is for Classification. This concept will not be used, but it may have usable descendants.
- S is for Standard. These codes will be used. They may or may not have descendants.
- N is for Non-standard. These codes will not be used. If they have descendants …

"include_descendants" determines whether the script should also match the concept's descendants:
- Values are either TRUE or FALSE.

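The concept table and its "include_descendants" field are used in the FROM and WHERE clauses of 02_CURE_ID_All_Tables.sql. Measurement example:

```sql
-- Excerpt: m is the alias of the MEASUREMENT table in the enclosing query
INNER JOIN omop.CONCEPT_ANCESTOR
    ON descendant_concept_id = m.measurement_concept_id
INNER JOIN [Results].[cure_id_concepts]
    ON ancestor_concept_id = concept_id
WHERE
    domain = 'Measurement'
    AND (include_descendants = 'TRUE' OR ancestor_concept_id = descendant_concept_id)
```

A row is kept if "include_descendants" is 'TRUE' (the listed concept and all of its descendants match) or if ancestor_concept_id equals descendant_concept_id (only the listed concept itself matches). The second condition relies on CONCEPT_ANCESTOR containing a row that links each standard concept to itself. The "is_standard" field is informational only and is not used by the script.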
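The filter can be tried out on toy data. This minimal sketch runs the same join with Python's built-in sqlite3 module against tiny in-memory stand-ins for MEASUREMENT, CONCEPT_ANCESTOR, and the concept table. The concept IDs and rows are hypothetical, only the columns the join needs are modeled, and the `omop.` and `[Results].` schema prefixes are dropped because SQLite has no such schemas.

```python
import sqlite3


def matching_measurements(conn):
    """Return the measurement_ids kept by the include_descendants filter."""
    sql = """
        SELECT m.measurement_id
        FROM measurement m
        INNER JOIN concept_ancestor
            ON descendant_concept_id = m.measurement_concept_id
        INNER JOIN cure_id_concepts
            ON ancestor_concept_id = concept_id
        WHERE domain = 'Measurement'
          AND (include_descendants = 'TRUE' OR ancestor_concept_id = descendant_concept_id)
        ORDER BY m.measurement_id
    """
    return [row[0] for row in conn.execute(sql)]


def build_demo_db():
    """Build hypothetical toy tables (not real OMOP vocabulary content)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
        CREATE TABLE cure_id_concepts (concept_id INTEGER, domain TEXT, include_descendants TEXT);
        CREATE TABLE measurement (measurement_id INTEGER, measurement_concept_id INTEGER);

        -- Hypothetical hierarchy: 100 is the parent of 101, 200 is the parent of 201.
        -- Each concept also has a self-row, as in CONCEPT_ANCESTOR.
        INSERT INTO concept_ancestor VALUES (100, 100), (100, 101), (101, 101),
                                            (200, 200), (200, 201), (201, 201);

        -- 100 includes its descendants; 200 is listed on its own.
        INSERT INTO cure_id_concepts VALUES (100, 'Measurement', 'TRUE'),
                                            (200, 'Measurement', 'FALSE');

        INSERT INTO measurement VALUES (1, 100), (2, 101), (3, 200), (4, 201);
    """)
    return conn


if __name__ == "__main__":
    print(matching_measurements(build_demo_db()))  # [1, 2, 3]
```

Measurement 2 is kept through its parent (concept 100 has include_descendants = 'TRUE'), while measurement 4 is dropped because concept 200 is listed with include_descendants = 'FALSE', so only rows coded with 200 itself pass the filter.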

**Dependencies**: None

### 1. Cohort Creation Script

**Filename**: 01_CURE_ID_Cohort.sql

**README.md** (37 additions, 0 deletions)

@@ -10,6 +10,28 @@ The Cohort is comprised of the anonymized person_id, birthdate, and first date o

## Explanation of the Curation Script Files:

**00 - Create Concept Table**
- Creates a table of standard concepts required for the CureID Registry project.
- It is used in conjunction with the OMOP CONCEPT_ANCESTOR table in the 02_CURE_ID_All_Tables.sql script.
- Fields particularly important to the process are "is_standard" and "include_descendants".
- "is_standard" records the standard status of the concept as one of "C", "S", or "N":
  - C is for Classification. This concept will not be used, but it may have usable descendants.
  - S is for Standard. These codes will be used. They may or may not have descendants.
  - N is for Non-standard. These codes will not be used. If they have descendants …
- "include_descendants" determines whether the script should also match the concept's descendants.
  - Values are either TRUE or FALSE.
- The concept table and its "include_descendants" field are used in the FROM and WHERE clauses of 02_CURE_ID_All_Tables.sql. Measurement example:

  ```sql
  -- Excerpt: m is the alias of the MEASUREMENT table in the enclosing query
  INNER JOIN omop.CONCEPT_ANCESTOR
      ON descendant_concept_id = m.measurement_concept_id
  INNER JOIN [Results].[cure_id_concepts]
      ON ancestor_concept_id = concept_id
  WHERE
      domain = 'Measurement'
      AND (include_descendants = 'TRUE' OR ancestor_concept_id = descendant_concept_id)
  ```

- A row is kept if "include_descendants" is 'TRUE' (the listed concept and all of its descendants match) or if ancestor_concept_id equals descendant_concept_id (only the listed concept itself matches). The "is_standard" field is informational only and is not used by the script.

**01 - Create Cohort**
- Identifies all patients with a positive lab result measurement, patient_id and first positive lab result
- Identifies all patients with a "strong" or "weak" COVID diagnosis based on condition codes
@@ -85,3 +107,18 @@ Say we want to add a concept X into the set.
6. Click the "CSV" button. This will download these concepts to a CSV file.
7. Override the corresponding "cure_id_{domain}.csv" file in the repo.
8. Save the concept set in ATLAS, push changes to repo.

--------------------------------------------------------------------------------------------------

## Best Practices/Guidance

Do install the Data Quality Dashboard early in the process.

Don’t include source values (the `*_source_value` columns) in your export; they might contain PHI.

Don’t forget to check GitHub for the most recent version of the script before you run it.

Don’t send your data to the coordinating center until the tech team has had a chance to review it with you in a live session.
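As a minimal sketch of the source-value advice, the Python snippet below copies a CSV export while dropping every column whose name ends in `_source_value`, the suffix OMOP CDM uses for source-value fields. It assumes the export is a CSV file with a header row of OMOP column names; the function name `strip_source_values` and the sample row are hypothetical, and a name-based filter is not a substitute for a full PHI review.

```python
import csv
import io


def strip_source_values(src, dst):
    """Copy CSV text from src to dst, dropping columns ending in '_source_value'.

    Returns the list of column names that were kept.
    """
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames or []  # empty input has no header row
    kept = [name for name in fieldnames if not name.lower().endswith("_source_value")]
    # extrasaction="ignore" silently skips the dropped columns in each row
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


if __name__ == "__main__":
    # Hypothetical sample: 12345 is a placeholder concept ID.
    src = io.StringIO(
        "measurement_id,measurement_concept_id,measurement_source_value\n"
        "1,12345,HGB free text\n"
    )
    dst = io.StringIO()
    print(strip_source_values(src, dst))  # ['measurement_id', 'measurement_concept_id']
    print(dst.getvalue())
```

Filtering by column-name suffix keeps the rule easy to audit, but free-text fields under other names would still need to be reviewed by hand.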

Duplicated results have been reported at some sites when running script 02_CURE_ID_All_Tables.sql, but no root cause has been identified yet. One suggestion is that adding DISTINCT to the various SELECT statements could resolve the issue. There is also some concern about the latest updates to script 00_CURE_ID_create_concept_table.sql, because it lists standard concepts as well as their standard descendants, which could let the same record match more than one concept-table entry. Review your data carefully and report any signs of duplication so that they can be investigated.
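One possible mechanism consistent with the concern about script 00 can be reproduced on toy data. This minimal sketch (Python's built-in sqlite3, hypothetical concept IDs, schema prefixes dropped) lists a parent concept with include_descendants = 'TRUE' and also lists its child. A measurement coded with the child then satisfies the join once through the parent's CONCEPT_ANCESTOR row and once through the child's self-row, so it is returned twice. This illustrates the hypothesis under those assumptions; it is not a confirmed root cause for the reported duplicates.

```python
import sqlite3

# Same join shape as the measurement example; {distinct} toggles SELECT DISTINCT.
JOIN_SQL = """
    SELECT {distinct} m.measurement_id, m.measurement_concept_id
    FROM measurement m
    INNER JOIN concept_ancestor
        ON descendant_concept_id = m.measurement_concept_id
    INNER JOIN cure_id_concepts
        ON ancestor_concept_id = concept_id
    WHERE domain = 'Measurement'
      AND (include_descendants = 'TRUE' OR ancestor_concept_id = descendant_concept_id)
    ORDER BY m.measurement_id
"""

# Lists concept-table entries whose descendants are also listed in the same domain.
OVERLAP_SQL = """
    SELECT a.concept_id AS ancestor_id, d.concept_id AS descendant_id
    FROM cure_id_concepts a
    INNER JOIN concept_ancestor ca
        ON ca.ancestor_concept_id = a.concept_id
       AND ca.ancestor_concept_id <> ca.descendant_concept_id
    INNER JOIN cure_id_concepts d
        ON d.concept_id = ca.descendant_concept_id
       AND d.domain = a.domain
    WHERE a.include_descendants = 'TRUE'
"""


def filtered_measurements(conn, distinct):
    sql = JOIN_SQL.format(distinct="DISTINCT" if distinct else "")
    return conn.execute(sql).fetchall()


def build_overlapping_db():
    """Hypothetical toy tables where a parent and its child are both listed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept_ancestor (ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
        CREATE TABLE cure_id_concepts (concept_id INTEGER, domain TEXT, include_descendants TEXT);
        CREATE TABLE measurement (measurement_id INTEGER, measurement_concept_id INTEGER);

        -- 100 is the parent of 101; each concept also has a self-row.
        INSERT INTO concept_ancestor VALUES (100, 100), (100, 101), (101, 101);

        -- Overlap: the parent (with descendants) AND the child are both listed.
        INSERT INTO cure_id_concepts VALUES (100, 'Measurement', 'TRUE'),
                                            (101, 'Measurement', 'FALSE');

        INSERT INTO measurement VALUES (1, 101);
    """)
    return conn


if __name__ == "__main__":
    conn = build_overlapping_db()
    print(filtered_measurements(conn, distinct=False))  # [(1, 101), (1, 101)]
    print(filtered_measurements(conn, distinct=True))   # [(1, 101)]
    print(conn.execute(OVERLAP_SQL).fetchall())         # [(100, 101)]
```

In this toy case DISTINCT removes the duplicate because each row carries measurement_id; without a unique key in the select list, DISTINCT could also collapse rows that are legitimately identical. A query along the lines of OVERLAP_SQL is one way to list parent/child overlaps in the concept table for review.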