diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml
new file mode 100644
index 0000000..83fa4ef
--- /dev/null
+++ b/.github/workflows/pkgdown.yaml
@@ -0,0 +1,51 @@
+# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
+# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
+on:
+  push:
+    branches: [main, develop]
+  release:
+    types: [published]
+  workflow_dispatch:
+
+name: pkgdown
+
+jobs:
+  pkgdown:
+    runs-on: ubuntu-latest
+    # Only restrict concurrency for non-PR jobs
+    concurrency:
+      group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
+    env:
+      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: r-lib/actions/setup-pandoc@v2
+
+      - uses: r-lib/actions/setup-r@v2
+        with:
+          use-public-rspm: true
+
+      - uses: r-lib/actions/setup-r-dependencies@v2
+        with:
+          cache: always
+          extra-packages: any::pkgdown, ohdsi/OhdsiRTools
+          needs: website
+
+      - uses: lycheeverse/lychee-action@v2
+        with:
+          args: --base . --verbose --no-progress --accept '100..=103, 200..=299, 403' './**/*.md' './**/*.Rmd'
+
+      - name: Build site
+        run: Rscript -e 'pkgdown::build_site_github_pages(new_process = FALSE, install = TRUE)'
+
+      - name: Fix Hades Logo
+        run: Rscript -e 'OhdsiRTools::fixHadesLogo()'
+
+      - name: Deploy to GitHub pages 🚀
+        if: github.event_name != 'pull_request'
+        uses: JamesIves/github-pages-deploy-action@v4
+        with:
+          clean: false
+          branch: gh-pages
+          folder: docs
diff --git a/_pkgdown.yml b/_pkgdown.yml
index 5e3ef84..4735fda 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -1,6 +1,12 @@
 template:
+  bootstrap: 5
   params:
     bootswatch: cosmo
+  light-switch: false
+
+development:
+  mode: auto
+  development: docs/dev
 
 home:
   links:
@@ -9,29 +15,27 @@ home:
 navbar:
   structure:
-    left:
-    - home
-    - intro
-    - reference
-    - articles
-    - news
+    left: [home, intro, reference, articles, news]
     right: [hades, github]
   components:
     home:
       icon: fa-home fa-lg
       href: index.html
     reference:
+      icon: fa-info-circle fa-lg
       text: Reference
       href: reference/index.html
     intro:
+      icon: fa-download fa-lg
      text: Get started
       href: articles/InstallationGuide.html
     news:
+      icon: fa-newspaper-o fa-lg
       text: Changelog
       href: news/index.html
     github:
       icon: fa-github fa-lg
-      href: https://github.com/OHDSI/PatientLevelPrediction
+      href: https://github.com/OHDSI/Characterization
     hades:
       text: hadesLogo
       href: https://ohdsi.github.io/Hades
diff --git a/docs/404.html b/docs/404.html
deleted file mode 100644
index 47c5d24..0000000
--- a/docs/404.html
+++ /dev/null
@@ -1,133 +0,0 @@
vignettes/InstallationGuide.Rmd
This vignette describes how to install the Observational Health Data Sciences and Informatics (OHDSI) Characterization package on Windows, Mac, and Linux.
Under Windows the OHDSI Characterization package requires -installing:
-Under Mac and Linux the OHDSI Characterization package requires -installing:
-The preferred way to install the package is by using
-remotes
, which will automatically install the latest
-release and all the latest dependencies.
If you do not want the official release you can install the bleeding-edge version of the package (latest develop branch). Note that the develop branch may contain bugs; please report them to us if you experience problems.
-To install using remotes
run:
-install.packages("remotes")
-remotes::install_github("OHDSI/Characterization")
When installing, make sure to close any other RStudio sessions that are using Characterization or any of its dependencies. Open RStudio sessions can hold locks that prevent the package from installing.
Installation issues should be posted in our issue tracker: https://github.com/OHDSI/Characterization/issues
-The list below provides solutions for some common issues:
If you get an error when trying to install a package in R saying ‘Dependency X not available …’, this can sometimes be fixed by running install.packages('X') and, once that completes, reinstalling the package that had the error.
Installing packages from GitHub with `remotes` can fail if you have multiple R sessions open: a session that has a library loaded can lock that library, which prevents installing any package that depends on it.
Considerable work has been dedicated to providing the Characterization package.
-citation("Characterization")
##
-## To cite package 'Characterization' in publications use:
-##
-## Reps J, Ryan P, Knoll C (2024). _Characterization: Characterizations
-## of Cohorts_. https://ohdsi.github.io/Characterization,
-## https://github.com/OHDSI/Characterization.
-##
-## A BibTeX entry for LaTeX users is
-##
-## @Manual{,
-## title = {Characterization: Characterizations of Cohorts},
-## author = {Jenna Reps and Patrick Ryan and Chris Knoll},
-## year = {2024},
-## note = {https://ohdsi.github.io/Characterization, https://github.com/OHDSI/Characterization},
-## }
vignettes/Specification.Rmd
A summary data.frame with the counts of how often an outcome occurred within a time period relative to the first target index date, for each combination of target and outcome. The counts are stratified by whether the outcome was the first event or a subsequent one, and by the timing category of when the outcome occurred (before first target exposure, during first target exposure, during a subsequent target exposure, between target exposures, and after last target exposure).
-Here we consider the inputs are:
Consider we have five patients; the target cohort (the dates each of the five patients is exposed to a drug) is in Table 1 and the outcome cohort (the dates each of the five patients has the outcome) is in Table 2. This is also illustrated in Figures 1 and 2.
-patientId | -cohortDefinitionId | -cohortStartDate | -cohortEndDate | -
---|---|---|---|
1 | -1 | -2001-01-20 | -2001-01-25 | -
1 | -1 | -2001-10-20 | -2001-12-05 | -
2 | -1 | -2005-09-10 | -2005-09-15 | -
3 | -1 | -2004-04-02 | -2004-05-17 | -
4 | -1 | -2002-03-03 | -2002-06-12 | -
4 | -1 | -2003-02-01 | -2003-02-28 | -
4 | -1 | -2003-08-04 | -2003-08-24 | -
5 | -1 | -2005-02-01 | -2005-10-08 | -
5 | -1 | -2007-04-03 | -2007-05-03 | -
patientId | -cohortDefinitionId | -cohortStartDate | -cohortEndDate | -
---|---|---|---|
1 | -2 | -1999-10-03 | -1999-10-08 | -
1 | -2 | -2001-10-30 | -2001-11-07 | -
3 | -2 | -2004-05-16 | -2004-05-18 | -
4 | -2 | -2002-06-03 | -2002-06-14 | -
4 | -2 | -2003-02-20 | -2003-03-01 | -
5 | -2 | -2006-07-21 | -2006-08-03 | -
5 | -2 | -2008-01-01 | -2008-01-09 | -
For all rows in the outcome table, we calculate the time between the patient's first exposure in the target cohort and the outcome date (time-to-event), and we classify the ‘type’ as the timing of when the outcome occurs: ‘before first exposure’ means the outcome occurs before the patient is observed in the target cohort, ‘during first’ means the outcome occurs during the first target cohort exposure, ‘between eras’ means the outcome occurs between target exposures, ‘during subsequent’ means the outcome occurs during a non-first target exposure, and ‘after last exposure’ means the outcome occurs after the end date of the patient's last exposure in the target cohort. The outcome type records whether the outcome is the ‘first occurrence’ or a ‘subsequent occurrence’. Let's consider patient 1, who has the outcome twice. The first outcome occurs 475 days before his first target exposure and his second outcome occurs 283 days after his first target exposure. The second outcome for patient 1 occurs during a subsequent target exposure era (not the first). Patient 2 does not have the outcome, so does not contribute to the time-to-event. Patient 3 has her first (and only) outcome during the first exposure to the drug, 44 days after she started the drug for the first time. Patient 4 has the outcome twice, 92 days after the first exposure to the drug and 354 days after. The first time she has the outcome is during the first exposure to the drug and the subsequent time is during her second exposure (a subsequent exposure). Patient 5 has the outcome twice, 535 days after he is first exposed to the drug and 1064 days after. His first outcome occurs between drug exposure eras and the subsequent outcome occurs after the last exposure era. This is summarized in Table 3.
-patientId | -outcomeDate | -firstExposureDate | -timeToEvent | -type | -outcomeType | -
---|---|---|---|---|---|
1 | -1999-10-03 | -2001-01-20 | --475 | -Before first exposure | -First | -
1 | -2001-10-30 | -2001-01-20 | -283 | -During subsequent | -Subsequent | -
3 | -2004-05-16 | -2004-04-02 | -44 | -During first | -First | -
4 | -2002-06-03 | -2002-03-03 | -92 | -During first | -First | -
4 | -2003-02-20 | -2002-03-03 | -354 | -During subsequent | -Subsequent | -
5 | -2006-07-21 | -2005-02-01 | -535 | -Between eras | -First | -
5 | -2008-01-01 | -2005-02-01 | -1064 | -After last exposure | -Subsequent | -
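The timeToEvent column in Table 3 is simply the day difference between the outcome date and the patient's first target exposure. A minimal sketch in R (the helper name is illustrative, not the package's API):

```r
# Days from the first target exposure to the outcome; negative values
# mean the outcome happened before the first exposure.
timeToEvent <- function(outcomeDate, firstExposureDate) {
  as.numeric(outcomeDate - firstExposureDate)
}

# Patient 1's two outcomes relative to the 2001-01-20 first exposure:
timeToEvent(as.Date(c("1999-10-03", "2001-10-30")), as.Date("2001-01-20"))
# -475  283
```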
The time-to-event output aggregates the summary table into three -different perspectives:
-1-day aggregate – this calculates the total number of patients -that have the outcome at each time-to-event day grouped by type and -outcome type. Only looks at outcomes between -100 days and 100 days for -the time-to-event.
30-day aggregate – this calculates the total number of patients that have the outcome in each 30-day window for time-to-event (e.g., 1-30, 31-60, 61-90, etc.) grouped by type and outcome type. Only looks at outcomes between -1095 days and 1095 days for the time-to-event.
365-day aggregate – this calculates the total number of patients that have the outcome in each 365-day window for time-to-event (e.g., 1-365, 366-730, 731-1095, etc.) grouped by type and outcome type. Only looks at outcomes between -1095 days and 1095 days for the time-to-event.
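As a sketch, the 30-day windows can be reproduced by binning each time-to-event into 1-based 30-day intervals (a convention inferred from the example output in Table 4; the package's exact handling of negative values may differ):

```r
# Bin a positive time-to-event (in days) into a 1-based 30-day window,
# e.g. day 44 falls in the 31-60 window. Illustrative helper only.
binTimeToEvent30 <- function(days) {
  start <- floor((days - 1) / 30) * 30 + 1
  data.frame(timeStart = start, timeEnd = start + 29)
}

binTimeToEvent30(c(44, 92, 283, 354, 535, 1064))
```

For the positive times in Table 3 this yields the 31-60, 91-120, 271-300, 331-360, 511-540 and 1051-1080 windows.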
The summary results that would be output by time-to-event are -displayed in Table 4.
-timeType | -Type | -outcomeType | -timeStart | -timeEnd | -count | -
---|---|---|---|---|---|
1-day | -During first | -First | -44 | -44 | -1 | -
1-day | -During first | -First | -92 | -92 | -1 | -
30-day | -Before first exposure | -First | --481 | --450 | -1 | -
30-day | -During first | -First | -31 | -60 | -1 | -
30-day | -During first | -First | -91 | -120 | -1 | -
30-day | -During subsequent | -Subsequent | -271 | -300 | -1 | -
30-day | -During subsequent | -Subsequent | -331 | -360 | -1 | -
30-day | -Between eras | -First | -511 | -540 | -1 | -
30-day | -After last exposure | -Subsequent | -1051 | -1080 | -1 | -
365-day | -Before first exposure | -First | --731 | --365 | -1 | -
365-day | -During first | -First | -1 | -365 | -2 | -
365-day | -During subsequent | -Subsequent | -1 | -365 | -2 | -
365-day | -Between eras | -First | -366 | -730 | -1 | -
365-day | -After last exposure | -Subsequent | -731 | -1095 | -1 | -
A vector of targetIds, a vector of outcomeIds, an integer -dechallengeStopInterval and an integer dechallengeEvaluationWindow.
A summary data.frame with the number of dechallenge and rechallenge attempts per target and outcome combination, plus the number of dechallenge/rechallenge attempts that were successes and failures.
The dechallenge-rechallenge analysis determines how often a patient stops the drug due to the occurrence of an outcome and whether the outcome then stops, and then looks at whether people who are re-exposed have the outcome start again. In observational data we infer these situations by finding cases where a patient has the outcome recorded during a drug exposure and appears to stop the drug within <dechallenge stop interval days – default 30 days> after the outcome occurs. For patients who have a dechallenge, we then determine whether it is a success (the outcome stops) or a failure (the outcome continues). This is determined by seeing whether the outcome starts within <dechallenge evaluation window days – default 30 days> after the exposure ends (the outcome starting is a dechallenge failure, otherwise it is a success). For patients who had a dechallenge, we then look at whether they have another exposure (more than dechallenge evaluation window days after the first exposure end), which is a rechallenge; this is classed as a failure if the outcome does not start during the rechallenge exposure era and a success if the outcome does occur during the rechallenge exposure era.
- - -patientId | -cohortDefinitionId | -cohortStartDate | -cohortEndDate | -
---|---|---|---|
1 | -1 | -2001-01-20 | -2001-01-25 | -
1 | -1 | -2001-10-20 | -2001-12-05 | -
2 | -1 | -2005-09-10 | -2005-09-15 | -
2 | -1 | -2006-03-04 | -2006-03-21 | -
2 | -1 | -2006-05-03 | -2006-05-05 | -
3 | -1 | -2004-04-02 | -2004-05-17 | -
4 | -1 | -2002-03-03 | -2002-06-12 | -
4 | -1 | -2003-02-01 | -2003-02-28 | -
4 | -1 | -2003-08-04 | -2003-08-24 | -
5 | -1 | -2005-02-01 | -2005-10-08 | -
5 | -1 | -2007-04-03 | -2007-05-03 | -
patientId | -cohortDefinitionId | -cohortStartDate | -cohortEndDate | -
---|---|---|---|
1 | -2 | -1999-10-03 | -1999-10-08 | -
1 | -2 | -2001-10-30 | -2001-11-07 | -
3 | -2 | -2004-05-16 | -2004-05-18 | -
4 | -2 | -2002-06-03 | -2002-06-14 | -
4 | -2 | -2003-02-20 | -2003-03-01 | -
5 | -2 | -2006-07-21 | -2006-08-03 | -
5 | -2 | -2008-01-01 | -2008-01-09 | -
Let’s consider the five patients in Table 5 and Table 6, with 30 days for the dechallenge stop interval and 31 days for the dechallenge evaluation window. First, find all cases where the outcome occurs during any exposure era and the exposure then ends within 30 days after the outcome start. These are the dechallenges. Then investigate whether a new outcome starts within 31 days of the exposure era ending. These are the failed dechallenges; otherwise the dechallenge is a success. Next, for the dechallenges, find any drug exposures that occur more than 31 days after the dechallenge exposure era end. These are rechallenges. For each rechallenge, determine whether the outcome starts within 31 days of the rechallenge exposure era start. If an outcome occurs, the rechallenge is a success, otherwise it is a failure.
-patientId | -outcomeDate | -exposureEnd | -outcomeAfter | -futureExposure | -futureOutcome | -dechallengeType | -rechallengeType | -
---|---|---|---|---|---|---|---|
1 | -2001-11-30 | -2001-12-05 | -- | -- | -- | -Success | -- | -
2 | -2006-03-10 | -2006-03-21 | -- | -2006-05-03 | -- | -Success | -Success | -
3 | -2004-05-16 | -2004-05-17 | -2004-01-12 | -- | -- | -Fail | -- | -
4 | -2002-06-03 | -2002-06-12 | -- | -2003-02-01 | -2003-02-20 | -Success | -Fail | -
We would then summarize the results by saying there were 4 -dechallenges, 3 of which were a success and 1 of which was a fail. 2 -patients had rechallenges with 1 being a fail and 1 being a success, see -Table 8 as the example output for one target and outcome.
dechallengeAttempts | -dechallengeSuccess | -dechallengeFailure | -rechallengeAttempts | -rechallengeSuccess | -rechallengeFailure | -
---|---|---|---|---|---|
4 | -3 | -1 | -2 | -1 | -1 | -
Note: The way an outcome and exposure phenotype are designed can make it impossible or unlikely to see a dechallenge failure. For example, if an outcome is designed with a 365-day washout window, then another outcome cannot occur within 365 days of a previous outcome. As a dechallenge failure is the outcome occurring within dechallenge evaluation window days after the exposure ends (and the exposure must end within stop interval days of the outcome to be a dechallenge), using the defaults for these values means a dechallenge failure requires an outcome to be possible within 60 days of the dechallenge outcome, which is impossible with a 365-day washout window.
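The dechallenge criterion described above can be sketched as follows (assumed logic for illustration, not the package's internal implementation):

```r
# An outcome qualifies as a dechallenge when it starts during an
# exposure era and that era ends within dechallengeStopInterval days
# of the outcome start. Illustrative helper only.
isDechallenge <- function(outcomeDate, exposureStart, exposureEnd,
                          dechallengeStopInterval = 30) {
  outcomeDate >= exposureStart &
    outcomeDate <= exposureEnd &
    as.numeric(exposureEnd - outcomeDate) <= dechallengeStopInterval
}

# Patient 1: outcome 2001-11-30 during the 2001-10-20 to 2001-12-05 era,
# which ends 5 days later, so this is a dechallenge:
isDechallenge(as.Date("2001-11-30"), as.Date("2001-10-20"), as.Date("2001-12-05"))
# TRUE
```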
A vector of targetIds, the minimum prior observation required for the target cohorts, and the covariate settings (covariateSettings) specifying which features to extract.
For each target cohort, restricted to the first occurrence for patients with the minimum prior observation at index, the mean value of each feature of interest is extracted into a data.frame.
The aggregate covariates analysis calculates the mean value of each feature within a cohort of patients. In this analysis we restrict to the first occurrence in the cohort with a minimum prior observation in days specified by the user (default 365 days). This restriction is implemented because otherwise a patient could contribute multiple times to the mean value, which makes interpretation difficult.
-Here we consider the inputs are:
-
minPriorObservation <- 365
covariateSettings <- FeatureExtraction::createCovariateSettings(
  useDemographicsAge = TRUE,
  useDemographicsGender = TRUE,
  useConditionOccurrenceAnyTimePrior = TRUE,
  includedCovariateConceptIds = c(201820)
)
Let’s assume we have two cohorts (ids 1 and 2); the first cohort contains five patients who have >365 days prior observation at index and the second contains three patients who have >365 days prior observation at index.
The patients’ features are displayed in Table 7, containing each patient’s age at index, whether they have diabetes any time prior to index, and their sex.
-patientId | -cohortId | -feature | -value | -
---|---|---|---|
1 | -1 | -age | -50 | -
1 | -1 | -sex | -Male | -
1 | -1 | -diabetes | -Yes | -
2 | -1 | -age | -18 | -
2 | -1 | -sex | -Female | -
2 | -1 | -diabetes | -No | -
3 | -1 | -age | -22 | -
3 | -1 | -sex | -Male | -
3 | -1 | -diabetes | -No | -
4 | -1 | -age | -40 | -
4 | -1 | -sex | -Male | -
4 | -1 | -diabetes | -No | -
5 | -1 | -age | -70 | -
5 | -1 | -sex | -Female | -
5 | -1 | -diabetes | -Yes | -
1 | -2 | -age | -24 | -
1 | -2 | -sex | -Female | -
1 | -2 | -diabetes | -No | -
2 | -2 | -age | -35 | -
2 | -2 | -sex | -Female | -
2 | -2 | -diabetes | -No | -
3 | -2 | -age | -31 | -
3 | -2 | -sex | -Female | -
3 | -2 | -diabetes | -No | -
We calculate the mean values for each feature per cohort:
-cohortId | -feature | -mean | -
---|---|---|
1 | -Age | -40.0 | -
1 | -Sex: Male | -0.6 | -
1 | -Diabetes: Yes | -0.4 | -
2 | -Age | -30.0 | -
2 | -Sex: Male | -0.0 | -
2 | -Diabetes: Yes | -0.0 | -
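The per-cohort means above can be reproduced with a few lines of base R, coding the binary features as 0/1 so that the mean is a proportion (the data layout here is illustrative):

```r
# Example feature data for the two cohorts; binary features coded 0/1.
features <- data.frame(
  cohortId = c(rep(1, 5), rep(2, 3)),
  age      = c(50, 18, 22, 40, 70, 24, 35, 31),
  male     = c(1, 0, 1, 1, 0, 0, 0, 0),
  diabetes = c(1, 0, 0, 0, 1, 0, 0, 0)
)

# Mean of each feature per cohort: age 40/30, male 0.6/0, diabetes 0.4/0
aggregate(cbind(age, male, diabetes) ~ cohortId, data = features, FUN = mean)
```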
The database and cohort comparison applies the aggregate covariate analysis for all target and outcome ids fed into Characterization across all available OMOP CDM databases, and then lets users compare the mean values of the features between databases for the same cohort or across different cohorts within the same database. The standardized mean difference is calculated between two cohorts when possible; per feature this is: abs(mean value in cohort 1 - mean value in cohort 2) divided by the square root of ((standard deviation of values in cohort 1 squared plus standard deviation of values in cohort 2 squared)/2).
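The standardized mean difference can be sketched as a small helper (for a binary feature with proportion p, the standard deviation is sqrt(p * (1 - p))):

```r
# Standardized mean difference between two cohorts for one feature:
# the absolute difference in means divided by the square root of the
# average of the two squared standard deviations.
smd <- function(mean1, sd1, mean2, sd2) {
  abs(mean1 - mean2) / sqrt((sd1^2 + sd2^2) / 2)
}

# Example: a binary feature with proportions 0.6 and 0.0
smd(0.6, sqrt(0.6 * 0.4), 0.0, 0)
```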
-A vector of targetIds and outcomeIds plus the minimum prior -observation required for the target cohorts, the outcome washout days -for the outcomes, settings for the time-at-risk and covariate settings -specifying which features to extract.
-For each target and outcome combination we run aggregate covariate -analysis for the special case of comparing patients in cohort 1 -(patients in the target cohort for the first time with 365 days prior -observation who go on to have the first occurrence of the outcome in -washout days during some time-at-risk relative to the target cohort -index) vs cohort 2 (patients in the target cohort for the first time -with 365 days prior observation who do not go on to have the first -occurrence of the outcome in washout days during some time-at-risk -relative to the target cohort index).
Let's consider an example with a time-at-risk of target cohort start + 1 to target cohort start + 180.
-
targetId <- 1
outcomeId <- 2
minPriorObservation <- 365
outcomeWashoutDays <- 365
riskWindowStart <- 1
startAnchor <- 'cohort start'
riskWindowEnd <- 180
endAnchor <- 'cohort start'
covariateSettings <- FeatureExtraction::createCovariateSettings(
  useDemographicsAge = TRUE,
  useDemographicsGender = TRUE,
  useConditionOccurrenceAnyTimePrior = TRUE,
  includedCovariateConceptIds = c(201820)
)
patientId | -targetCohortId | -cohortStartDate | -cohortEndDate | -observationStart | -
---|---|---|---|---|
1 | -1 | -2001-01-20 | -2001-01-25 | -2000-02-01 | -
1 | -1 | -2001-10-20 | -2001-12-05 | -2000-02-01 | -
2 | -1 | -2005-09-10 | -2005-09-15 | -2001-02-01 | -
3 | -1 | -2004-04-02 | -2004-05-17 | -2001-02-01 | -
4 | -1 | -2002-03-03 | -2002-06-12 | -2001-02-01 | -
4 | -1 | -2003-02-01 | -2003-02-28 | -2001-02-01 | -
4 | -1 | -2003-08-04 | -2003-08-24 | -2001-02-01 | -
5 | -1 | -2005-02-01 | -2005-10-08 | -2001-02-01 | -
5 | -1 | -2007-04-03 | -2007-05-03 | -2001-02-01 | -
patientId | -targetCohortId | -cohortStartDate | -cohortEndDate | -
---|---|---|---|
1 | -2 | -1999-10-03 | -1999-10-08 | -
1 | -2 | -2001-10-30 | -2001-11-07 | -
3 | -2 | -2004-05-16 | -2004-05-18 | -
4 | -2 | -2002-06-03 | -2002-06-14 | -
4 | -2 | -2003-02-20 | -2003-03-01 | -
5 | -2 | -2006-07-21 | -2006-08-03 | -
5 | -2 | -2008-01-01 | -2008-01-09 | -
First, we restrict to each patient's first target exposure with at least 365 days of prior observation. Patient 1 is removed, as their first exposure has less than 365 days of prior observation. The non-first exposures of patients 4 and 5 are removed. This leaves:
-patientId | -targetCohortId | -cohortStartDate | -cohortEndDate | -
---|---|---|---|
2 | -1 | -2005-09-10 | -2005-09-15 | -
3 | -1 | -2004-04-02 | -2004-05-17 | -
4 | -1 | -2002-03-03 | -2002-06-12 | -
5 | -1 | -2005-02-01 | -2005-10-08 | -
We then label the patients in the target cohort by whether or not the outcome occurs during 1 day to 180 days after index:
-patientId | -targetCohortId | -cohortStartDate | -cohortEndDate | -labels | -
---|---|---|---|---|
2 | -1 | -2005-09-10 | -2005-09-15 | -Non-outcome | -
3 | -1 | -2004-04-02 | -2004-05-17 | -Outcome | -
4 | -1 | -2002-03-03 | -2002-06-12 | -Outcome | -
5 | -1 | -2005-02-01 | -2005-10-08 | -Non-outcome | -
Note: we also remove patients in the target who have -the outcome during outcome washout days prior to target index. In the -example, nobody had the outcome prior, so this was not observed.
-If the features for these four patients are:
-patientId | -cohortId | -feature | -value | -
---|---|---|---|
2 | -Non-outcome | -age | -50 | -
2 | -Non-outcome | -sex | -Male | -
2 | -Non-outcome | -diabetes | -Yes | -
3 | -Outcome | -age | -18 | -
3 | -Outcome | -sex | -Female | -
3 | -Outcome | -diabetes | -No | -
4 | -Outcome | -age | -22 | -
4 | -Outcome | -sex | -Male | -
4 | -Outcome | -diabetes | -No | -
5 | -Non-outcome | -age | -40 | -
5 | -Non-outcome | -sex | -Male | -
5 | -Non-outcome | -diabetes | -No | -
We calculate the mean values for each feature per non-outcome and -outcome cohort:
-cohortId | -feature | -mean | -
---|---|---|
Outcome | -Age | -20.0 | -
Outcome | -Sex: Male | -0.5 | -
Outcome | -Diabetes: Yes | -0.0 | -
Non-outcome | -Age | -45.0 | -
Non-outcome | -Sex: Male | -1.0 | -
Non-outcome | -Diabetes: Yes | -0.5 | -
We can then calculate the standardized mean difference between the outcome and non-outcome cohorts; per feature this is: abs(mean value in outcome cohort - mean value in non-outcome cohort) divided by the square root of ((standard deviation of values in outcome cohort squared plus standard deviation of values in non-outcome cohort squared)/2).
The case series looks at the patients in a target cohort who have the outcome during a specified time-at-risk and calculates the aggregate covariates at three different time periods: shortly before target index, between target index and outcome index, and shortly after outcome index.
-A vector of targetIds and outcomeIds plus the minimum prior -observation required for the target cohorts, the outcome washout days -for the outcomes, settings for the time-at-risk and covariate settings -specifying which features to extract.
-In addition you need to specify how long before target index to -extract before index features (preTargetIndexDays) and how long after -outcome index to extract after index features -(postOutcomeIndexDays).
For each target and outcome combination we run the aggregate covariate analysis on the patients in the target cohort (with the minimum prior observation days before index) who have the outcome (for the first time in outcome washout days). We use three different time periods for feature extraction:
-In this example we look at how often diabetes is recorded for the -cases (people with the target cohort who have the outcome within 180 -days of target index) in the year before target index, between target -index and outcome index and the 1 year after outcome index.
-Here we consider the inputs are:
-
targetId <- 1
outcomeId <- 2
minPriorObservation <- 365
outcomeWashoutDays <- 365
preTargetIndexDays <- 365
postOutcomeIndexDays <- 365
riskWindowStart <- 1
startAnchor <- 'cohort start'
riskWindowEnd <- 180
endAnchor <- 'cohort start'
covariateSettings <- FeatureExtraction::createCovariateSettings(
  useConditionOccurrenceAnyTimePrior = TRUE,
  includedCovariateConceptIds = c(201820)
)
patientId | -targetCohortId | -cohortStartDate | -cohortEndDate | -observationStart | -
---|---|---|---|---|
1 | -1 | -2001-01-20 | -2001-01-25 | -2000-02-01 | -
1 | -1 | -2001-10-20 | -2001-12-05 | -2000-02-01 | -
2 | -1 | -2005-09-10 | -2005-09-15 | -2001-02-01 | -
3 | -1 | -2004-04-02 | -2004-05-17 | -2001-02-01 | -
4 | -1 | -2002-03-03 | -2002-06-12 | -2001-02-01 | -
4 | -1 | -2003-02-01 | -2003-02-28 | -2001-02-01 | -
4 | -1 | -2003-08-04 | -2003-08-24 | -2001-02-01 | -
5 | -1 | -2005-02-01 | -2005-10-08 | -2001-02-01 | -
5 | -1 | -2007-04-03 | -2007-05-03 | -2001-02-01 | -
patientId | -targetCohortId | -cohortStartDate | -cohortEndDate | -
---|---|---|---|
1 | -2 | -1999-10-03 | -1999-10-08 | -
1 | -2 | -2001-10-30 | -2001-11-07 | -
3 | -2 | -2004-05-16 | -2004-05-18 | -
4 | -2 | -2002-06-03 | -2002-06-14 | -
4 | -2 | -2003-02-20 | -2003-03-01 | -
5 | -2 | -2006-07-21 | -2006-08-03 | -
5 | -2 | -2008-01-01 | -2008-01-09 | -
First, we restrict to each patient's first target exposure with at least 365 days of prior observation. Patient 1 is removed, as their first exposure has less than 365 days of prior observation. The non-first exposures of patients 4 and 5 are removed. This leaves:
-patientId | -targetCohortId | -cohortStartDate | -cohortEndDate | -
---|---|---|---|
2 | -1 | -2005-09-10 | -2005-09-15 | -
3 | -1 | -2004-04-02 | -2004-05-17 | -
4 | -1 | -2002-03-03 | -2002-06-12 | -
5 | -1 | -2005-02-01 | -2005-10-08 | -
We then find the patients in the target cohort with the outcome -occurring during 1 day to 180 days after index:
-patientId | -targetCohortId | -cohortStartDate | -cohortEndDate | -labels | -
---|---|---|---|---|
3 | -1 | -2004-04-02 | -2004-05-17 | -Outcome | -
4 | -1 | -2002-03-03 | -2002-06-12 | -Outcome | -
Note: we also remove patients in the target who have -the outcome during outcome washout days prior to target index. In the -example, nobody had the outcome prior, so this was not observed.
-Now we define the before( 365 days before target index up to target -index), between (target index plus 1 and outcome) and after (outcome -index plus 1 to outcome index plus 365):
-patientId | -targetCohortId | -targetStartDate | -outcomeStartDate | -beforeStartDate | -beforeEndDate | -duringStartDate | -duringEndDate | -afterStartDate | -afterEndDate | -
---|---|---|---|---|---|---|---|---|---|
3 | -1 | -2004-04-02 | -2004-05-16 | -2003-04-03 | -2004-04-02 | -2004-04-03 | -2004-05-16 | -2004-05-17 | -2005-05-16 | -
4 | -1 | -2002-03-03 | -2002-06-03 | -2001-03-03 | -2002-03-03 | -2002-03-04 | -2002-06-03 | -2002-06-04 | -2003-06-03 | -
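The three case-series windows in the table above can be derived per patient from the target index and outcome index dates; a sketch (assumed anchoring, matching the example rows):

```r
# Build the before/during/after windows for one case, given the target
# index date and the outcome index date. Illustrative helper only.
caseSeriesWindows <- function(targetStart, outcomeStart,
                              preTargetIndexDays = 365,
                              postOutcomeIndexDays = 365) {
  data.frame(
    beforeStartDate = targetStart - preTargetIndexDays,
    beforeEndDate   = targetStart,
    duringStartDate = targetStart + 1,
    duringEndDate   = outcomeStart,
    afterStartDate  = outcomeStart + 1,
    afterEndDate    = outcomeStart + postOutcomeIndexDays
  )
}

# Patient 3: target index 2004-04-02, outcome index 2004-05-16
caseSeriesWindows(as.Date("2004-04-02"), as.Date("2004-05-16"))
```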
If the features for these two patients at the three time periods -are:
-patientId | -feature | -timePeriod | -value | -
---|---|---|---|
3 | -diabetes | -before | -No | -
3 | -diabetes | -during | -No | -
3 | -diabetes | -after | -Yes | -
4 | -diabetes | -before | -Yes | -
4 | -diabetes | -during | -Yes | -
4 | -diabetes | -after | -Yes | -
vignettes/UsingCharacterizationPackage.Rmd
This vignette describes how you can use the Characterization package -for various descriptive studies using OMOP CDM data. The -Characterization package currently contains three different types of -analyses:
-In this vignette we will show working examples using the
-Eunomia
R package that contains simulated data. Run the
-following code to install the Eunomia
R package:
-install.packages("remotes")
-remotes::install_github("ohdsi/Eunomia")
Eunomia can be used to create a temporary SQLite database with the simulated data. The function getEunomiaConnectionDetails creates connection details for a SQLite database in a temporary location. The function createCohorts then populates the temporary SQLite database with the simulated cohorts, ready to be used.
-Eunomia::createCohorts(connectionDetails = connectionDetails)
## Connecting using SQLite driver
-## Creating cohort: Celecoxib
-##
-## Executing SQL took 0.00557 secs
-## Creating cohort: Diclofenac
-##
-## Executing SQL took 0.00487 secs
-## Creating cohort: GiBleed
-##
-## Executing SQL took 0.00951 secs
-## Creating cohort: NSAIDs
-##
-## Executing SQL took 0.0507 secs
-## Cohorts created in table main.cohort
-## cohortId name
-## 1 1 Celecoxib
-## 2 2 Diclofenac
-## 3 3 GiBleed
-## 4 4 NSAIDs
-## description
-## 1 A simplified cohort definition for new users of celecoxib, designed specifically for Eunomia.
-## 2 A simplified cohort definition for new users ofdiclofenac, designed specifically for Eunomia.
-## 3 A simplified cohort definition for gastrointestinal bleeding, designed specifically for Eunomia.
-## 4 A simplified cohort definition for new users of NSAIDs, designed specifically for Eunomia.
-## count
-## 1 1844
-## 2 850
-## 3 479
-## 4 2694
We also need to have the Characterization package installed and loaded:

remotes::install_github("ohdsi/FeatureExtraction")
remotes::install_github("ohdsi/Characterization")
library(Characterization)
##
-## Attaching package: 'dplyr'
-## The following objects are masked from 'package:stats':
-##
-## filter, lag
-## The following objects are masked from 'package:base':
-##
-## intersect, setdiff, setequal, union
To run an ‘Aggregate Covariate’ analysis you need to create a settings object using createAggregateCovariateSettings. This requires specifying:

the target cohort ids (targetIds) and outcome cohort ids (outcomeIds)
the time-at-risk window (riskWindowStart/startAnchor and riskWindowEnd/endAnchor)
the covariate settings, created using FeatureExtraction::createCovariateSettings or by creating your own custom feature extraction code
-your own custom feature extraction code.Using the Eunomia data were we previous generated four cohorts, we -can use cohort ids 1,2 and 4 as the targetIds and cohort id 3 as the -outcomeIds:
-
-exampleTargetIds <- c(1, 2, 4)
-exampleOutcomeIds <- 3
If we want to get information on the sex assigned at birth, age at
-index and Charlson Comorbidity index we can create the settings using
-FeatureExtraction::createCovariateSettings
:
exampleCovariateSettings <- FeatureExtraction::createCovariateSettings(
  useDemographicsGender = TRUE,
  useDemographicsAge = TRUE,
  useCharlsonIndex = TRUE
)
If we want to create the aggregate features for all our target cohorts, our outcome cohort, and each target cohort restricted to those with a record of the outcome from 1 day after target cohort start date until 365 days after target cohort start date, excluding mean values below 0.01, we can run:
-
-exampleAggregateCovariateSettings <- createAggregateCovariateSettings(
- targetIds = exampleTargetIds,
- outcomeIds = exampleOutcomeIds,
- riskWindowStart = 1, startAnchor = "cohort start",
- riskWindowEnd = 365, endAnchor = "cohort start",
- covariateSettings = exampleCovariateSettings,
- minCharacterizationMean = 0.01
-)
Next we need to use the
-exampleAggregateCovariateSettings
as the settings to
-computeAggregateCovariateAnalyses
, we need to use the
-Eunomia connectionDetails and in Eunomia the OMOP CDM data and cohort
-table are in the ‘main’ schema. The cohort table name is ‘cohort’. The
-following code will apply the aggregated covariates analysis using the
-previously specified settings on the simulated Eunomia data:
-agc <- computeAggregateCovariateAnalyses(
- connectionDetails = connectionDetails,
- cdmDatabaseSchema = "main",
- cdmVersion = 5,
- targetDatabaseSchema = "main",
- targetTable = "cohort",
- aggregateCovariateSettings = exampleAggregateCovariateSettings,
- databaseId = "Eunomia",
- runId = 1
-)
If you would like to save the results you can use the function saveAggregateCovariateAnalyses; they can then be loaded using loadAggregateCovariateAnalyses.
The results are Andromeda objects that can be viewed using dplyr. There are four tables:
| databaseId | runId | cohortDefinitionId | covariateId | sumValue | averageValue |
|---|---|---|---|---|---|
| Eunomia | 1 | 170096739 | 8507001 | 57 | 0.4596774 |
| Eunomia | 1 | 170096739 | 8532001 | 67 | 0.5403226 |
| Eunomia | 1 | 1421450349 | 8507001 | 237 | 0.4947808 |
| Eunomia | 1 | 1421450349 | 8532001 | 242 | 0.5052192 |
| Eunomia | 1 | 1498088760 | 8507001 | 1289 | 0.4901141 |
| Eunomia | 1 | 1498088760 | 8532001 | 1341 | 0.5098859 |
| Eunomia | 1 | 2038349030 | 8507001 | 237 | 0.4947808 |
| Eunomia | 1 | 2038349030 | 8532001 | 242 | 0.5052192 |
| Eunomia | 1 | 2038795861 | 8507001 | 894 | 0.4966667 |
| Eunomia | 1 | 2038795861 | 8532001 | 906 | 0.5033333 |
| Eunomia | 1 | 2246615035 | 8507001 | 395 | 0.4759036 |
| Eunomia | 1 | 2246615035 | 8532001 | 435 | 0.5240964 |
| Eunomia | 1 | 3945088378 | 8507001 | 180 | 0.5070423 |
| Eunomia | 1 | 3945088378 | 8532001 | 175 | 0.4929577 |
| Eunomia | 1 | 1810054421 | 8507001 | 237 | 0.4947808 |
| Eunomia | 1 | 1810054421 | 8532001 | 242 | 0.5052192 |
| Eunomia | 1 | 3451257159 | 8507001 | 57 | 0.4596774 |
| Eunomia | 1 | 3451257159 | 8532001 | 67 | 0.5403226 |
| Eunomia | 1 | 4205858076 | 8507001 | 180 | 0.5070423 |
| Eunomia | 1 | 4205858076 | 8532001 | 175 | 0.4929577 |
| Eunomia | 1 | 1639865637 | 8507001 | 57 | 0.4596774 |
| Eunomia | 1 | 1639865637 | 8532001 | 67 | 0.5403226 |
| Eunomia | 1 | 2896235937 | 8507001 | 237 | 0.4947808 |
| Eunomia | 1 | 2896235937 | 8532001 | 242 | 0.5052192 |
| Eunomia | 1 | 3648795733 | 8507001 | 180 | 0.5070423 |
| Eunomia | 1 | 3648795733 | 8532001 | 175 | 0.4929577 |
| databaseId | runId | cohortDefinitionId | covariateId | countValue | minValue | maxValue | averageValue | standardDeviation | medianValue | p10Value | p25Value | p75Value | p90Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Eunomia | 1 | 170096739 | 1901 | 41 | 0 | 2 | 0.4274194 | 0.4606464 | 0 | 0 | 0 | 1 | 1 |
| Eunomia | 1 | 3945088378 | 1901 | 275 | 0 | 2 | 0.9549296 | 0.4233403 | 1 | 0 | 1 | 1 | 2 |
| Eunomia | 1 | 2246615035 | 1901 | 296 | 0 | 3 | 0.4024096 | 0.3450454 | 0 | 0 | 0 | 1 | 1 |
| Eunomia | 1 | 2038349030 | 1901 | 316 | 0 | 2 | 0.8183716 | 0.4280688 | 1 | 0 | 0 | 1 | 2 |
| Eunomia | 1 | 1421450349 | 1901 | 316 | 0 | 2 | 0.8204593 | 0.4299773 | 1 | 0 | 0 | 1 | 2 |
| Eunomia | 1 | 2038795861 | 1901 | 935 | 0 | 2 | 0.6144444 | 0.3867813 | 1 | 0 | 0 | 1 | 1 |
| Eunomia | 1 | 1498088760 | 1901 | 1231 | 0 | 3 | 0.5475285 | 0.3777510 | 0 | 0 | 0 | 1 | 1 |
| Eunomia | 1 | 170096739 | 1002 | 124 | 32 | 46 | 38.8709677 | 3.4000663 | 39 | 34 | 36 | 41 | 44 |
| Eunomia | 1 | 3945088378 | 1002 | 355 | 32 | 46 | 38.7746479 | 3.2746121 | 39 | 35 | 36 | 41 | 43 |
| Eunomia | 1 | 2038349030 | 1002 | 479 | 32 | 46 | 38.7995825 | 3.3042257 | 39 | 34 | 36 | 41 | 44 |
| Eunomia | 1 | 1421450349 | 1002 | 479 | 32 | 47 | 38.9206681 | 3.2884308 | 39 | 35 | 36 | 41 | 44 |
| Eunomia | 1 | 2246615035 | 1002 | 830 | 31 | 46 | 38.5746988 | 3.2910429 | 39 | 34 | 36 | 41 | 43 |
| Eunomia | 1 | 2038795861 | 1002 | 1800 | 31 | 47 | 38.6450000 | 3.3212435 | 39 | 34 | 36 | 41 | 43 |
| Eunomia | 1 | 1498088760 | 1002 | 2630 | 31 | 47 | 38.6228137 | 3.3112779 | 39 | 34 | 36 | 41 | 43 |
| Eunomia | 1 | 3451257159 | 1901 | 41 | 0 | 2 | 0.4274194 | 0.4606464 | 0 | 0 | 0 | 1 | 1 |
| Eunomia | 1 | 4205858076 | 1901 | 275 | 0 | 2 | 0.9549296 | 0.4233403 | 1 | 0 | 1 | 1 | 2 |
| Eunomia | 1 | 1810054421 | 1901 | 316 | 0 | 2 | 0.8183716 | 0.4280688 | 1 | 0 | 0 | 1 | 2 |
| Eunomia | 1 | 3451257159 | 1002 | 124 | 32 | 46 | 38.8709677 | 3.4000663 | 39 | 34 | 36 | 41 | 44 |
| Eunomia | 1 | 4205858076 | 1002 | 355 | 32 | 46 | 38.7746479 | 3.2746121 | 39 | 35 | 36 | 41 | 43 |
| Eunomia | 1 | 1810054421 | 1002 | 479 | 32 | 46 | 38.7995825 | 3.3042257 | 39 | 34 | 36 | 41 | 44 |
| Eunomia | 1 | 1639865637 | 1901 | 41 | 0 | 2 | 0.4274194 | 0.4606464 | 0 | 0 | 0 | 1 | 1 |
| Eunomia | 1 | 3648795733 | 1901 | 277 | 0 | 2 | 0.9633803 | 0.4245513 | 1 | 0 | 1 | 1 | 2 |
| Eunomia | 1 | 2896235937 | 1901 | 318 | 0 | 2 | 0.8246347 | 0.4290528 | 1 | 0 | 0 | 1 | 2 |
| Eunomia | 1 | 1639865637 | 1002 | 124 | 32 | 47 | 38.9758065 | 3.4226973 | 39 | 34 | 36 | 41 | 44 |
| Eunomia | 1 | 3648795733 | 1002 | 355 | 32 | 46 | 38.9014085 | 3.2449654 | 39 | 35 | 36 | 41 | 43 |
| Eunomia | 1 | 2896235937 | 1002 | 479 | 32 | 47 | 38.9206681 | 3.2884308 | 39 | 35 | 36 | 41 | 44 |
| databaseId | runId | covariateId | covariateName | analysisId | conceptId |
|---|---|---|---|---|---|
| Eunomia | 1 | 8507001 | gender = MALE | 1 | 8507 |
| Eunomia | 1 | 8532001 | gender = FEMALE | 1 | 8532 |
| Eunomia | 1 | 1901 | Charlson index - Romano adaptation | 901 | 0 |
| Eunomia | 1 | 1002 | age in years | 2 | 0 |
| databaseId | runId | analysisId | analysisName | domainId | startDay | endDay | isBinary | missingMeansZero |
|---|---|---|---|---|---|---|---|---|
| Eunomia | 1 | 1 | DemographicsGender | Demographics | NA | NA | Y | NA |
| Eunomia | 1 | 901 | CharlsonIndex | Condition | NA | 0 | N | Y |
| Eunomia | 1 | 2 | DemographicsAge | Demographics | NA | NA | N | Y |
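Because the results are Andromeda objects, these tables can be queried lazily with dplyr verbs and pulled into memory with collect(). A minimal sketch (the table name `covariates` is an assumption based on the columns shown above; print the result object to list its actual table names):

```r
library(dplyr)

# agc is the Andromeda object returned by computeAggregateCovariateAnalyses().
# The table name 'covariates' is assumed here; print agc to see its tables.
agc$covariates %>%
  filter(averageValue > 0.5) %>%  # e.g. covariates present in over half of a cohort
  collect()                       # materialize the query as an in-memory tibble
```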
To run a ‘Dechallenge Rechallenge’ analysis you need to create a setting object using createDechallengeRechallengeSettings. This requires specifying:

- the targetIds (cohort definition ids for the target cohorts)
- the outcomeIds (cohort definition ids for the outcome cohorts)
- the dechallengeStopInterval
- the dechallengeEvaluationWindow

Using the Eunomia data where we previously generated four cohorts, we can use cohort ids 1, 2 and 4 as the targetIds and cohort id 3 as the outcomeIds:
-
-exampleTargetIds <- c(1, 2, 4)
-exampleOutcomeIds <- 3
If we want to create the dechallenge rechallenge for all our target -cohorts and our outcome cohort with a 30 day dechallengeStopInterval and -31 day dechallengeEvaluationWindow:
-
-exampleDechallengeRechallengeSettings <- createDechallengeRechallengeSettings(
- targetIds = exampleTargetIds,
- outcomeIds = exampleOutcomeIds,
- dechallengeStopInterval = 30,
- dechallengeEvaluationWindow = 31
-)
We can then run the analysis on the Eunomia data using
-computeDechallengeRechallengeAnalyses
and the settings
-previously specified:
-dc <- computeDechallengeRechallengeAnalyses(
- connectionDetails = connectionDetails,
- targetDatabaseSchema = "main",
- targetTable = "cohort",
- dechallengeRechallengeSettings = exampleDechallengeRechallengeSettings,
- databaseId = "Eunomia"
-)
## Inputs checked
## Connecting using SQLite driver
## Computing dechallenge rechallenge results
##   |======================================================================| 100%
## Executing SQL took 0.00665 secs
## Computing dechallenge rechallenge for 3 target ids and 1 outcome ids took 0.0583 secs
If you would like to save the results you can use the function saveDechallengeRechallengeAnalyses; they can then be loaded using loadDechallengeRechallengeAnalyses.

The results are Andromeda objects that can be viewed using dplyr. There is just one table, named dechallengeRechallenge:
| databaseId | dechallengeStopInterval | dechallengeEvaluationWindow | targetCohortDefinitionId | outcomeCohortDefinitionId | numExposureEras | numPersonsExposed | numCases | dechallengeAttempt | dechallengeFail | dechallengeSuccess | rechallengeAttempt | rechallengeFail | rechallengeSuccess | pctDechallengeAttempt | pctDechallengeSuccess | pctDechallengeFail | pctRechallengeAttempt | pctRechallengeSuccess | pctRechallengeFail |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

(no rows shown in this example output)
Next it is possible to compute and extract the failed rechallenge cases:
-failed <- computeRechallengeFailCaseSeriesAnalyses(
- connectionDetails = connectionDetails,
- targetDatabaseSchema = "main",
- targetTable = "cohort",
- dechallengeRechallengeSettings = exampleDechallengeRechallengeSettings,
- outcomeDatabaseSchema = "main",
- outcomeTable = "cohort",
- databaseId = "Eunomia"
-)
## Inputs checked
## Connecting using SQLite driver
## Computing dechallenge rechallenge results
##   |======================================================================| 100%
## Executing SQL took 0.0953 secs
## Computing dechallenge failed case series for 3 target IDs and 1 outcome IDs took 0.141 secs
The results are Andromeda objects that can be viewed using dplyr. There is just one table, named rechallengeFailCaseSeries:
| databaseId | dechallengeStopInterval | dechallengeEvaluationWindow | targetCohortDefinitionId | outcomeCohortDefinitionId | personKey | subjectId | dechallengeExposureNumber | dechallengeExposureStartDateOffset | dechallengeExposureEndDateOffset | dechallengeOutcomeNumber | dechallengeOutcomeStartDateOffset | rechallengeExposureNumber | rechallengeExposureStartDateOffset | rechallengeExposureEndDateOffset | rechallengeOutcomeNumber | rechallengeOutcomeStartDateOffset |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

(no rows shown in this example output)
To run a ‘Time-to-event’ analysis you need to create a setting object using createTimeToEventSettings. This requires specifying the targetIds and the outcomeIds:
-exampleTimeToEventSettings <- createTimeToEventSettings(
- targetIds = exampleTargetIds,
- outcomeIds = exampleOutcomeIds
-)
We can then run the analysis on the Eunomia data using
-computeTimeToEventAnalyses
and the settings previously
-specified:
-tte <- computeTimeToEventAnalyses(
- connectionDetails = connectionDetails,
- cdmDatabaseSchema = "main",
- targetDatabaseSchema = "main",
- targetTable = "cohort",
- timeToEventSettings = exampleTimeToEventSettings,
- databaseId = "Eunomia"
-)
## Connecting using SQLite driver
## Uploading #cohort_settings
## Inserting data took 0.00319 secs
## Computing time to event results
##   |======================================================================| 100%
## Executing SQL took 0.0416 secs
## Computing time-to-event for T-O pairs took 0.146 secs
If you would like to save the results you can use the function saveTimeToEventAnalyses; they can then be loaded using loadTimeToEventAnalyses.

The results are Andromeda objects that can be viewed using dplyr. There is just one table, named timeToEvent:
## Selecting by timeScale
| databaseId | targetCohortDefinitionId | outcomeCohortDefinitionId | outcomeType | targetOutcomeType | timeToEvent | numEvents | timeScale |
|---|---|---|---|---|---|---|---|
| Eunomia | 1 | 3 | first | After last target end | 30 | 109 | per 30-day |
| Eunomia | 1 | 3 | first | After last target end | 60 | 114 | per 30-day |
| Eunomia | 1 | 3 | first | After last target end | 90 | 132 | per 30-day |
| Eunomia | 2 | 3 | first | After last target end | 30 | 46 | per 30-day |
| Eunomia | 2 | 3 | first | After last target end | 60 | 39 | per 30-day |
| Eunomia | 2 | 3 | first | After last target end | 90 | 39 | per 30-day |
| Eunomia | 4 | 3 | first | After last target end | 30 | 155 | per 30-day |
| Eunomia | 4 | 3 | first | After last target end | 60 | 153 | per 30-day |
| Eunomia | 4 | 3 | first | After last target end | 90 | 171 | per 30-day |
| Eunomia | 1 | 3 | first | After last target end | 365 | 355 | per 365-day |
| Eunomia | 2 | 3 | first | After last target end | 365 | 124 | per 365-day |
| Eunomia | 4 | 3 | first | After last target end | 365 | 479 | per 365-day |
If you want to run multiple analyses (of the three previously shown)
-you can use createCharacterizationSettings
. You need to
-input a list of each of the settings (or NULL if you do not want to run
-one type of analysis). To run all the analyses previously shown in one
-function:
-characterizationSettings <- createCharacterizationSettings(
- timeToEventSettings = list(
- exampleTimeToEventSettings
- ),
- dechallengeRechallengeSettings = list(
- exampleDechallengeRechallengeSettings
- ),
- aggregateCovariateSettings = list(
- exampleAggregateCovariateSettings
- )
-)
-
-# save the settings using
-saveCharacterizationSettings(
- settings = characterizationSettings,
- saveDirectory = file.path(tempdir(), "saveSettings")
-)
-
-# the settings can be loaded
-characterizationSettings <- loadCharacterizationSettings(
- saveDirectory = file.path(tempdir(), "saveSettings")
-)
-
-runCharacterizationAnalyses(
- connectionDetails = connectionDetails,
- cdmDatabaseSchema = "main",
- targetDatabaseSchema = "main",
- targetTable = "cohort",
- outcomeDatabaseSchema = "main",
- outcomeTable = "cohort",
- characterizationSettings = characterizationSettings,
- saveDirectory = file.path(tempdir(), "example"),
- tablePrefix = "c_",
- databaseId = "1"
-)
This will create an SQLite database with all the analyses saved into the saveDirectory. You can export the results as csv files using:
-
-connectionDetailsT <- DatabaseConnector::createConnectionDetails(
- dbms = "sqlite",
- server = file.path(tempdir(), "example", "sqliteCharacterization", "sqlite.sqlite")
-)
-
-exportDatabaseToCsv(
- connectionDetails = connectionDetailsT,
- resultSchema = "main",
- targetDialect = "sqlite",
- tablePrefix = "c_",
- saveDirectory = file.path(tempdir(), "csv")
-)
Source: vignettes/UsingPackage.Rmd
This vignette describes how you can use the Characterization package -for various descriptive studies using OMOP CDM data. The -Characterization package currently contains three different types of -analyses:
-In this vignette we will show working examples using the
-Eunomia
R package that contains simulated data. Run the
-following code to install the Eunomia
R package:
-install.packages("remotes")
-remotes::install_github("ohdsi/Eunomia")
Eunomia can be used to create a temporary SQLite database with the simulated data. The function getEunomiaConnectionDetails creates a SQLite connection to a temporary location. The function createCohorts then populates the temporary SQLite database with the simulated data, ready to be used.
-connectionDetails <- Eunomia::getEunomiaConnectionDetails()
-Eunomia::createCohorts(connectionDetails = connectionDetails)
## Connecting using SQLite driver
## Creating cohort: Celecoxib
##   |======================================================================| 100%
## Executing SQL took 0.0129 secs
## Creating cohort: Diclofenac
##   |======================================================================| 100%
## Executing SQL took 0.00521 secs
## Creating cohort: GiBleed
##   |======================================================================| 100%
## Executing SQL took 0.00938 secs
## Creating cohort: NSAIDs
##   |======================================================================| 100%
## Executing SQL took 0.05 secs
## Cohorts created in table main.cohort
-## cohortId name
-## 1 1 Celecoxib
-## 2 2 Diclofenac
-## 3 3 GiBleed
-## 4 4 NSAIDs
-## description
-## 1 A simplified cohort definition for new users of celecoxib, designed specifically for Eunomia.
## 2 A simplified cohort definition for new users of diclofenac, designed specifically for Eunomia.
-## 3 A simplified cohort definition for gastrointestinal bleeding, designed specifically for Eunomia.
-## 4 A simplified cohort definition for new users of NSAIDs, designed specifically for Eunomia.
-## count
-## 1 1844
-## 2 850
-## 3 479
-## 4 2694
We also need to have the Characterization package installed and loaded:
-remotes::install_github("ohdsi/FeatureExtraction")
-remotes::install_github("ohdsi/Characterization", ref = "new_approach")
##
-## Attaching package: 'dplyr'
-## The following objects are masked from 'package:stats':
-##
-## filter, lag
-## The following objects are masked from 'package:base':
-##
-## intersect, setdiff, setequal, union
To run an ‘Aggregate Covariate’ analysis you need to create a settings object using createAggregateCovariateSettings. This requires specifying:

- the targetIds (cohort definition ids for the target cohorts)
- the outcomeIds (cohort definition ids for the outcome cohorts)
- the time-at-risk window (riskWindowStart/startAnchor and riskWindowEnd/endAnchor)
- the covariateSettings, created using FeatureExtraction::createCovariateSettings or by creating your own custom feature extraction code.

Using the Eunomia data where we previously generated four cohorts, we can use cohort ids 1, 2 and 4 as the targetIds and cohort id 3 as the outcomeIds:
-
-exampleTargetIds <- c(1, 2, 4)
-exampleOutcomeIds <- 3
If we want to get information on the sex, age at index and Charlson
-Comorbidity index we can create the settings using
-FeatureExtraction::createCovariateSettings
:
-exampleCovariateSettings <- FeatureExtraction::createCovariateSettings(
- useDemographicsGender = T,
- useDemographicsAge = T,
- useCharlsonIndex = T
-)
There is an additional covariate setting required that is calculated for the cases (patients in the target cohort who have the outcome during the time-at-risk). This is called caseCovariateSettings and should be created using the createDuringCovariateSettings function. The user can pick conditions, drugs, measurements, procedures and observations. In this example, we just include condition era groups by vocabulary hierarchy. We also need to specify two related variables: casePreTargetDuration, the number of days before the target index to extract features for the cases (answering what happens shortly before the target index), and casePostOutcomeDuration, the number of days after the outcome date to extract features for the cases (answering what happens after the outcome). The case covariates are also extracted between target index and outcome (answering what happens during target exposure).
-caseCovariateSettings <- Characterization::createDuringCovariateSettings(
- useConditionGroupEraDuring = T
-)
If we want to create the aggregate features for all our target cohorts, our outcome cohort, and each target cohort restricted to those with a record of the outcome (from 1 day after the target cohort start date until 365 days after the target cohort start date), with an outcome washout of 9999 days (meaning we only include outcomes that are the first occurrence in the past 9999 days), and only including targets or outcomes where the patient was observed for 365 days or more prior, we can run:
-
-exampleAggregateCovariateSettings <- createAggregateCovariateSettings(
- targetIds = exampleTargetIds,
- outcomeIds = exampleOutcomeIds,
- riskWindowStart = 1, startAnchor = "cohort start",
- riskWindowEnd = 365, endAnchor = "cohort start",
- outcomeWashoutDays = 9999,
- minPriorObservation = 365,
- covariateSettings = exampleCovariateSettings,
- caseCovariateSettings = caseCovariateSettings,
- casePreTargetDuration = 90,
- casePostOutcomeDuration = 90
-)
Next we need to use the exampleAggregateCovariateSettings as the settings for runCharacterizationAnalyses. We need to use the Eunomia connectionDetails; in Eunomia the OMOP CDM data and cohort table are in the ‘main’ schema, and the cohort table name is ‘cohort’. The following code will apply the aggregate covariates analysis using the previously specified settings on the simulated Eunomia data. We can specify minCharacterizationMean to exclude covariates with mean values below 0.01, and we must specify the outputDirectory where the csv results will be written.
-runCharacterizationAnalyses(
- connectionDetails = connectionDetails,
- cdmDatabaseSchema = "main",
- targetDatabaseSchema = "main",
- targetTable = "cohort",
- outcomeDatabaseSchema = "main",
- outcomeTable = "cohort",
- characterizationSettings = createCharacterizationSettings(
- aggregateCovariateSettings = exampleAggregateCovariateSettings
- ),
- databaseId = "Eunomia",
- runId = 1,
- minCharacterizationMean = 0.01,
- outputDirectory = file.path(getwd(), "example_char", "results"),
- executionPath = file.path(getwd(), "example_char", "execution"),
- minCellCount = 10,
- incremental = F,
- threads = 1
-)
You can then see the results in the location file.path(getwd(), 'example_char', 'results'), where you will find csv files.
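To confirm the run produced output, you can list the csv files in that folder. This is a minimal sketch; the individual file names written by the package are not specified here:

```r
resultsFolder <- file.path(getwd(), "example_char", "results")

# List all csv result files written by runCharacterizationAnalyses()
list.files(resultsFolder, pattern = "\\.csv$", recursive = TRUE)

# Each file can be read as a plain data frame with read.csv() if you
# want to inspect results without the shiny viewer.
```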
To run a ‘Dechallenge Rechallenge’ analysis you need to create a setting object using createDechallengeRechallengeSettings. This requires specifying:

- the targetIds (cohort definition ids for the target cohorts)
- the outcomeIds (cohort definition ids for the outcome cohorts)
- the dechallengeStopInterval
- the dechallengeEvaluationWindow

Using the Eunomia data where we previously generated four cohorts, we can use cohort ids 1, 2 and 4 as the targetIds and cohort id 3 as the outcomeIds:
-
-exampleTargetIds <- c(1, 2, 4)
-exampleOutcomeIds <- 3
If we want to create the dechallenge rechallenge for all our target -cohorts and our outcome cohort with a 30 day dechallengeStopInterval and -31 day dechallengeEvaluationWindow:
-
-exampleDechallengeRechallengeSettings <- createDechallengeRechallengeSettings(
- targetIds = exampleTargetIds,
- outcomeIds = exampleOutcomeIds,
- dechallengeStopInterval = 30,
- dechallengeEvaluationWindow = 31
-)
We can then run the analysis on the Eunomia data using computeDechallengeRechallengeAnalyses and the settings previously specified, with minCellCount censoring any counts below the specified value:
dc <- computeDechallengeRechallengeAnalyses(
  connectionDetails = connectionDetails,
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  settings = exampleDechallengeRechallengeSettings,
  databaseId = "Eunomia",
  # results are written as csv files to this folder
  outputFolder = file.path(getwd(), "example_char", "results"),
  minCellCount = 5
)
Next it is possible to compute and extract the failed rechallenge cases:
-
failed <- computeRechallengeFailCaseSeriesAnalyses(
  connectionDetails = connectionDetails,
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  settings = exampleDechallengeRechallengeSettings,
  outcomeDatabaseSchema = "main",
  outcomeTable = "cohort",
  databaseId = "Eunomia",
  # results are written as csv files to this folder
  outputFolder = file.path(getwd(), "example_char", "results"),
  minCellCount = 5
)
To run a ‘Time-to-event’ analysis you need to create a setting object using createTimeToEventSettings. This requires specifying the targetIds and the outcomeIds:
-exampleTimeToEventSettings <- createTimeToEventSettings(
- targetIds = exampleTargetIds,
- outcomeIds = exampleOutcomeIds
-)
We can then run the analysis on the Eunomia data using
-computeTimeToEventAnalyses
and the settings previously
-specified:
tte <- computeTimeToEventAnalyses(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",
  targetDatabaseSchema = "main",
  targetTable = "cohort",
  settings = exampleTimeToEventSettings,
  databaseId = "Eunomia",
  # results are written as csv files to this folder
  outputFolder = file.path(getwd(), "example_char", "results"),
  minCellCount = 5
)
If you want to run multiple analyses (of the three previously shown)
-you can use createCharacterizationSettings
. You need to
-input a list of each of the settings (or NULL if you do not want to run
-one type of analysis). To run all the analyses previously shown in one
-function:
-characterizationSettings <- createCharacterizationSettings(
- timeToEventSettings = list(
- exampleTimeToEventSettings
- ),
- dechallengeRechallengeSettings = list(
- exampleDechallengeRechallengeSettings
- ),
- aggregateCovariateSettings = exampleAggregateCovariateSettings
-)
-
-# save the settings using
-saveCharacterizationSettings(
- settings = characterizationSettings,
- saveDirectory = file.path(tempdir(), "saveSettings")
-)
-
-# the settings can be loaded
-characterizationSettings <- loadCharacterizationSettings(
- saveDirectory = file.path(tempdir(), "saveSettings")
-)
-
-runCharacterizationAnalyses(
- connectionDetails = connectionDetails,
- cdmDatabaseSchema = "main",
- targetDatabaseSchema = "main",
- targetTable = "cohort",
- outcomeDatabaseSchema = "main",
- outcomeTable = "cohort",
- characterizationSettings = characterizationSettings,
- outputDirectory = file.path(tempdir(), "example", "results"),
- executionPath = file.path(tempdir(), "example", "execution"),
- csvFilePrefix = "c_",
- databaseId = "1",
- incremental = F,
- minCharacterizationMean = 0.01,
- minCellCount = 5
-)
This will create csv files with the results in the outputDirectory. You can run the following code to view the results in a shiny app:
-
-viewCharacterization(
- resultFolder = file.path(tempdir(), "example", "results"),
- cohortDefinitionSet = NULL
-)
Characterization is an R package for performing characterization of a target and a comparator cohort.
-
-
-library(Eunomia)
-library(Characterization)
-
-connectionDetails <- Eunomia::getEunomiaConnectionDetails()
-Eunomia::createCohorts(connectionDetails = connectionDetails)
-
-targetIds <- c(1,2,4)
- outcomeIds <- c(3)
-
- timeToEventSettings1 <- createTimeToEventSettings(
- targetIds = 1,
- outcomeIds = c(3,4)
- )
- timeToEventSettings2 <- createTimeToEventSettings(
- targetIds = 2,
- outcomeIds = c(3,4)
- )
-
- dechallengeRechallengeSettings <- createDechallengeRechallengeSettings(
- targetIds = targetIds,
- outcomeIds = outcomeIds,
- dechallengeStopInterval = 30,
- dechallengeEvaluationWindow = 31
- )
-
- aggregateCovariateSettings1 <- createAggregateCovariateSettings(
- targetIds = targetIds,
- outcomeIds = outcomeIds,
- riskWindowStart = 1,
- startAnchor = 'cohort start',
- riskWindowEnd = 365,
- endAnchor = 'cohort start',
- covariateSettings = FeatureExtraction::createCovariateSettings(
- useDemographicsGender = T,
- useDemographicsAge = T,
- useDemographicsRace = T
- )
- )
-
- aggregateCovariateSettings2 <- createAggregateCovariateSettings(
- targetIds = targetIds,
- outcomeIds = outcomeIds,
- riskWindowStart = 1,
- startAnchor = 'cohort start',
- riskWindowEnd = 365,
- endAnchor = 'cohort start',
- covariateSettings = FeatureExtraction::createCovariateSettings(
- useConditionOccurrenceLongTerm = T
- )
- )
-
- characterizationSettings <- createCharacterizationSettings(
- timeToEventSettings = list(
- timeToEventSettings1,
- timeToEventSettings2
- ),
- dechallengeRechallengeSettings = list(
- dechallengeRechallengeSettings
- ),
- aggregateCovariateSettings = list(
- aggregateCovariateSettings1,
- aggregateCovariateSettings2
- )
- )
-
-runCharacterizationAnalyses(
- connectionDetails = connectionDetails,
- cdmDatabaseSchema = 'main',
- targetDatabaseSchema = 'main',
- targetTable = 'cohort',
- outcomeDatabaseSchema = 'main',
- outcomeTable = 'cohort',
- characterizationSettings = characterizationSettings,
- outputDirectory = file.path(tempdir(), 'example', 'results'),
- executionPath = file.path(tempdir(), 'example', 'execution'),
- csvFilePrefix = 'c_',
- databaseId = 'Eunomia'
-)
Requires R (version 4.0.0 or higher). Libraries used in Characterization require Java.
-See the instructions here for configuring your R environment, including Java.
In R, use the following commands to download and install Characterization:
-install.packages("remotes")
-remotes::install_github("ohdsi/Characterization")
Documentation can be found on the package website.
-Read here how you can contribute to this package.
-NEWS.md
Updated dependency to FeatureExtraction (>= 3.5.0) to support the minCharacterizationMean parameter.
-Changed export to csv approach to use batch export from SQLite (#41)
-Optimized aggregate features to remove the 'T and not O' results (as these can be calculated from the 'T' and 'T and O' results) - requires the latest shiny app. Optimized database extraction to csv
-Fixed a bug where the first outcome was still all outcomes. Updated the shiny app to work with old and new ShinyAppBuilder
-Before you do a pull request, you should always file an issue and make sure the package maintainer agrees that it’s a problem, and is happy with your basic proposal for fixing it. We don’t want you to spend a bunch of time on something that we don’t think is a good idea.
-Additional requirements for pull requests:
-Adhere to the Developer Guidelines as well as the OHDSI Code Style.
If possible, add unit tests for new functionality you add.
Restrict your pull request to solving the issue at hand. Do not try to ‘improve’ parts of the code that are not related to the issue. If you feel other parts of the code need better organization, create a separate issue for that.
Make sure you pass R check without errors and warnings before submitting.
Always target the develop
branch, and make sure you are up-to-date with the develop branch.
R/Characterization.R
- Characterization-package.Rd
Various characterizations of target and outcome cohorts.
-Useful links:
R/Incremental.R
- cleanIncremental.Rd
Removes csv files from folders that have not been marked as completed and removes the record of the execution file
-cleanIncremental(executionFolder)
The folder that has the execution files
A list with the settings
-Other Incremental:
-cleanNonIncremental()
R/Incremental.R
- cleanNonIncremental.Rd
Removes csv files from the execution folder as there should be no csv files when running in non-incremental mode
-cleanNonIncremental(executionFolder)
The folder that has the execution files
A list with the settings
-Other Incremental:
-cleanIncremental()
R/AggregateCovariates.R
- computeAggregateCovariateAnalyses.Rd
Compute aggregate covariate study
-computeAggregateCovariateAnalyses(
- connectionDetails = NULL,
- cdmDatabaseSchema,
- cdmVersion = 5,
- targetDatabaseSchema,
- targetTable,
- outcomeDatabaseSchema = targetDatabaseSchema,
- outcomeTable = targetTable,
- tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
- aggregateCovariateSettings,
- databaseId = "database 1",
- runId = 1
-)
An object of type `connectionDetails` as created using the [DatabaseConnector::createConnectionDetails()] function.
The schema with the OMOP CDM data
The version of the OMOP CDM
Schema name where your target cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
Name of the target cohort table.
Schema name where your outcome cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
Name of the outcome cohort table.
Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created
The settings for the AggregateCovariate study
Unique identifier for the database (string)
Unique identifier for the time-at-risk (TAR) and covariate setting
The descriptive results for each target cohort in the settings.
-R/DechallengeRechallenge.R
- computeDechallengeRechallengeAnalyses.Rd
Compute dechallenge rechallenge study
-computeDechallengeRechallengeAnalyses(
- connectionDetails = NULL,
- targetDatabaseSchema,
- targetTable,
- outcomeDatabaseSchema = targetDatabaseSchema,
- outcomeTable = targetTable,
- tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
- settings,
- databaseId = "database 1",
- outputFolder = file.path(getwd(), "results"),
- minCellCount = 0,
- ...
-)
An object of type `connectionDetails` as created using the [DatabaseConnector::createConnectionDetails()] function.
Schema name where your target cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
Name of the target cohort table.
Schema name where your outcome cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
Name of the outcome cohort table.
Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created
The settings for the DechallengeRechallenge study
An identifier for the database (string)
A directory to save the results as csv files
The minimum cell value to display, values less than this will be replaced by -1
extra inputs
An Andromeda::andromeda()
object containing the dechallenge rechallenge results
Other DechallengeRechallenge:
-computeRechallengeFailCaseSeriesAnalyses()
,
-createDechallengeRechallengeSettings()
R/DechallengeRechallenge.R
- computeRechallengeFailCaseSeriesAnalyses.Rd
Find the subjects that fail the dechallenge rechallenge study
-computeRechallengeFailCaseSeriesAnalyses(
- connectionDetails = NULL,
- targetDatabaseSchema,
- targetTable,
- outcomeDatabaseSchema = targetDatabaseSchema,
- outcomeTable = targetTable,
- tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
- settings,
- databaseId = "database 1",
- showSubjectId = F,
- outputFolder = file.path(getwd(), "results"),
- minCellCount = 0,
- ...
-)
An object of type `connectionDetails` as created using the [DatabaseConnector::createConnectionDetails()] function.
Schema name where your target cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
Name of the target cohort table.
Schema name where your outcome cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
Name of the outcome cohort table.
Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created
The settings for the DechallengeRechallenge study
An identifier for the database (string)
If FALSE then subject_ids are hidden (recommended if sharing results)
A directory to save the results as csv files
The minimum cell value to display, values less than this will be replaced by -1
extra inputs
An Andromeda::andromeda()
object with the case series details of the failed rechallenge
Other DechallengeRechallenge:
-computeDechallengeRechallengeAnalyses()
,
-createDechallengeRechallengeSettings()
Compute time to event study
-computeTimeToEventAnalyses(
- connectionDetails = NULL,
- targetDatabaseSchema,
- targetTable,
- outcomeDatabaseSchema = targetDatabaseSchema,
- outcomeTable = targetTable,
- tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
- cdmDatabaseSchema,
- settings,
- databaseId = "database 1",
- outputFolder = file.path(getwd(), "results"),
- minCellCount = 0,
- ...
-)
An object of type `connectionDetails` as created using the [DatabaseConnector::createConnectionDetails()] function.
Schema name where your target cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
Name of the target cohort table.
Schema name where your outcome cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
Name of the outcome cohort table.
Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created
The database schema containing the OMOP CDM data
The settings for the timeToEvent study
An identifier for the database (string)
A directory to save the results as csv files
The minimum cell value to display, values less than this will be replaced by -1
extra inputs
An Andromeda::andromeda()
object containing the time to event results.
Other TimeToEvent:
-createTimeToEventSettings()
R/AggregateCovariates.R
- createAggregateCovariateSettings.Rd
Create aggregate covariate study settings
-createAggregateCovariateSettings(
- targetIds,
- outcomeIds,
- minPriorObservation = 0,
- outcomeWashoutDays = 0,
- riskWindowStart = 1,
- startAnchor = "cohort start",
- riskWindowEnd = 365,
- endAnchor = "cohort start",
- covariateSettings = FeatureExtraction::createCovariateSettings(useDemographicsGender =
- T, useDemographicsAge = T, useDemographicsAgeGroup = T, useDemographicsRace = T,
- useDemographicsEthnicity = T, useDemographicsIndexYear = T, useDemographicsIndexMonth
- = T, useDemographicsTimeInCohort = T, useDemographicsPriorObservationTime = T,
- useDemographicsPostObservationTime = T, useConditionGroupEraLongTerm = T,
- useDrugGroupEraOverlapping = T, useDrugGroupEraLongTerm = T,
- useProcedureOccurrenceLongTerm = T, useMeasurementLongTerm = T,
- useObservationLongTerm = T, useDeviceExposureLongTerm = T,
- useVisitConceptCountLongTerm = T, useConditionGroupEraShortTerm = T,
- useDrugGroupEraShortTerm = T, useProcedureOccurrenceShortTerm = T,
- useMeasurementShortTerm = T, useObservationShortTerm = T, useDeviceExposureShortTerm
- = T, useVisitConceptCountShortTerm = T, endDays = 0, longTermStartDays = -365,
- shortTermStartDays = -30),
- caseCovariateSettings = createDuringCovariateSettings(useConditionGroupEraDuring = T,
- useDrugGroupEraDuring = T, useProcedureOccurrenceDuring = T, useDeviceExposureDuring
- = T, useMeasurementDuring = T, useObservationDuring = T, useVisitConceptCountDuring =
- T),
- casePreTargetDuration = 365,
- casePostOutcomeDuration = 365,
- extractNonCaseCovariates = T
-)
A list of cohortIds for the target cohorts
A list of cohortIds for the outcome cohorts
The minimum time (in days) in the database a patient in the target cohorts must be observed prior to index
Patients with the outcome within outcomeWashout days prior to index are excluded from the risk factor analysis
The start of the risk window (in days) relative to the `startAnchor`.
The anchor point for the start of the risk window. Can be `"cohort start"` or `"cohort end"`.
The end of the risk window (in days) relative to the `endAnchor`.
The anchor point for the end of the risk window. Can be `"cohort start"` or `"cohort end"`.
An object created using FeatureExtraction::createCovariateSettings
An object created using createDuringCovariateSettings
The number of days prior to case index we use for FeatureExtraction
The number of days after the case's outcome date we use for FeatureExtraction
Whether to extract aggregate covariates and counts for patients in the targets and outcomes in addition to the cases
A list with the settings
-R/RunCharacterization.R
- createCharacterizationSettings.Rd
This function creates a list of settings for different characterization studies
-createCharacterizationSettings(
- timeToEventSettings = NULL,
- dechallengeRechallengeSettings = NULL,
- aggregateCovariateSettings = NULL
-)
A list of timeToEvent settings
A list of dechallengeRechallenge settings
A list of aggregateCovariate settings
Returns an object of class characterizationSettings that can be used in runCharacterizationAnalyses
-Specify one or more timeToEvent, dechallengeRechallenge and aggregateCovariate settings
-Other LargeScale:
-loadCharacterizationSettings()
,
-runCharacterizationAnalyses()
,
-saveCharacterizationSettings()
R/Database.R
- createCharacterizationTables.Rd
This function executes a large set of SQL statements to create tables that can store results
-createCharacterizationTables(
- connectionDetails,
- resultSchema,
- targetDialect = "postgresql",
- deleteExistingTables = T,
- createTables = T,
- tablePrefix = "c_",
- tempEmulationSchema = getOption("sqlRenderTempEmulationSchema")
-)
The connectionDetails to a database created by using the createConnectionDetails function in the DatabaseConnector package.
The name of the database schema that the result tables will be created in.
The database management system being used
If true any existing tables matching the Characterization result tables names will be deleted
If true the Characterization result tables will be created
A string appended to the Characterization result tables
The temp schema used when the database management system is oracle
Returns NULL but creates the required tables into the specified database schema.
-This function can be used to create (or delete) Characterization result tables
-Other Database:
-createSqliteDatabase()
,
-insertResultsToDatabase()
R/DechallengeRechallenge.R
- createDechallengeRechallengeSettings.Rd
Create dechallenge rechallenge study settings
-createDechallengeRechallengeSettings(
- targetIds,
- outcomeIds,
- dechallengeStopInterval = 30,
- dechallengeEvaluationWindow = 30
-)
A list of cohortIds for the target cohorts
A list of cohortIds for the outcome cohorts
An integer specifying how much time to add to the cohort_end when determining whether the event starts during cohort and ends after
An integer specifying the period of time after the cohort_end when you cannot see an outcome for a dechallenge success
A list with the settings
-Other DechallengeRechallenge:
-computeDechallengeRechallengeAnalyses()
,
-computeRechallengeFailCaseSeriesAnalyses()
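A minimal sketch of how these settings might be created, using hypothetical cohort ids (1 and 2 as targets, 3 as the outcome; the ids are assumptions, not taken from the package):

```r
# Hypothetical cohort ids - substitute the ids from your own cohort table.
drSettings <- createDechallengeRechallengeSettings(
  targetIds = c(1, 2),
  outcomeIds = c(3),
  dechallengeStopInterval = 30,    # days added to cohort_end when checking whether the event spans the cohort end
  dechallengeEvaluationWindow = 30 # outcome-free days after cohort_end required for a dechallenge success
)
```

The resulting list can be passed as the `settings` argument of computeDechallengeRechallengeAnalyses() or collected into createCharacterizationSettings().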
R/CustomCovariates.R
- createDuringCovariateSettings.Rd
Create during covariate settings
-createDuringCovariateSettings(
- useConditionOccurrenceDuring = F,
- useConditionOccurrencePrimaryInpatientDuring = F,
- useConditionEraDuring = F,
- useConditionGroupEraDuring = F,
- useDrugExposureDuring = F,
- useDrugEraDuring = F,
- useDrugGroupEraDuring = F,
- useProcedureOccurrenceDuring = F,
- useDeviceExposureDuring = F,
- useMeasurementDuring = F,
- useObservationDuring = F,
- useVisitCountDuring = F,
- useVisitConceptCountDuring = F,
- includedCovariateConceptIds = c(),
- addDescendantsToInclude = F,
- excludedCovariateConceptIds = c(),
- addDescendantsToExclude = F,
- includedCovariateIds = c()
-)
One covariate per condition in the condition_occurrence table starting between cohort start and cohort end. (analysis ID 109)
One covariate per condition observed as a primary diagnosis in an inpatient setting in the condition_occurrence table starting between cohort start and cohort end. (analysis ID 110)
One covariate per condition in the condition_era table starting between cohort start and cohort end. (analysis ID 217)
One covariate per condition era rolled up to groups in the condition_era table starting between cohort start and cohort end. (analysis ID 218)
One covariate per drug in the drug_exposure table between cohort start and end. (analysis ID 305)
One covariate per drug in the drug_era table between cohort start and end. (analysis ID 417)
One covariate per drug rolled up to ATC groups in the drug_era table between cohort start and end. (analysis ID 418)
One covariate per procedure in the procedure_occurrence table between cohort start and end. (analysis ID 505)
One covariate per device in the device_exposure table starting between cohort start and end. (analysis ID 605)
One covariate per measurement in the measurement table between cohort start and end. (analysis ID 713)
One covariate per observation in the observation table between cohort start and end. (analysis ID 805)
The number of visits observed between cohort start and end. (analysis ID 926)
The number of visits observed between cohort start and end, stratified by visit concept ID. (analysis ID 927)
A list of concept IDs that should be used to construct covariates.
Should descendant concept IDs be added to the list of concepts to include?
A list of concept IDs that should NOT be used to construct covariates.
Should descendant concept IDs be added to the list of concepts to exclude?
A list of covariate IDs that should be restricted to.
An object of type covariateSettings, to be used in other functions.
creates an object specifying how during covariates should be constructed from data in the CDM model.
-Other CovariateSetting:
-getDbDuringCovariateData()
settings <- createDuringCovariateSettings(
  useConditionOccurrenceDuring = TRUE,
  useConditionOccurrencePrimaryInpatientDuring = FALSE,
  useConditionEraDuring = FALSE,
  useConditionGroupEraDuring = FALSE
)
This function creates a connection to an sqlite database
-createSqliteDatabase(sqliteLocation = tempdir())
The location of the sqlite database
Returns the connection to the sqlite database
-This function creates a sqlite database and connection
-Other Database:
-createCharacterizationTables()
,
-insertResultsToDatabase()
Create time to event study settings
-createTimeToEventSettings(targetIds, outcomeIds)
A list of cohortIds for the target cohorts
A list of cohortIds for the outcome cohorts
A list with the time to event settings
-Other TimeToEvent:
-computeTimeToEventAnalyses()
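A minimal sketch with hypothetical cohort ids (the ids are assumptions for illustration only):

```r
# Hypothetical cohort ids - substitute the ids from your own cohort table.
tteSettings <- createTimeToEventSettings(
  targetIds = c(1, 2),
  outcomeIds = c(3, 4)
)
```

The returned list is used as the `settings` argument of computeTimeToEventAnalyses(), or supplied to createCharacterizationSettings() via timeToEventSettings.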
R/SaveLoad.R
- exportAggregateCovariateToCsv.Rd
export the AggregateCovariate results as csv
-exportAggregateCovariateToCsv(result, saveDirectory, minCellCount = 0)
The output of running computeAggregateCovariateAnalyses()
A directory location to save the results into
The minimum value that will be displayed in count columns
A string specifying the directory the csv results are saved to
-R/Database.R
- exportDatabaseToCsv.Rd
This function extracts the database tables into csv files
-exportDatabaseToCsv(
- connectionDetails,
- resultSchema,
- targetDialect = NULL,
- tablePrefix = "c_",
- filePrefix = NULL,
- tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
- saveDirectory
-)
The connection details to input into the connect function in the DatabaseConnector package.
The name of the database schema that the result tables were created in.
DEPRECATED: derived from connectionDetails.
The table prefix to apply to the characterization result tables
The prefix to apply to the files
The temp schema used when the database management system is oracle
The directory to save the csv results
A csv file per table written into the saveDirectory
-This function extracts the database tables into csv files
-R/SaveLoad.R
- exportDechallengeRechallengeToCsv.Rd
export the DechallengeRechallenge results as csv
-exportDechallengeRechallengeToCsv(result, saveDirectory, minCellCount = 0)
The output of running computeDechallengeRechallengeAnalyses()
A directory location to save the results into
The minimum value that will be displayed in count columns
A string specifying the directory the csv results are saved to
-Other SaveLoad:
-exportRechallengeFailCaseSeriesToCsv()
,
-exportTimeToEventToCsv()
R/SaveLoad.R
- exportRechallengeFailCaseSeriesToCsv.Rd
export the RechallengeFailCaseSeries results as csv
-exportRechallengeFailCaseSeriesToCsv(result, saveDirectory)
The output of running computeRechallengeFailCaseSeriesAnalyses()
A directory location to save the results into
A string specifying the directory the csv results are saved to
-Other SaveLoad:
-exportDechallengeRechallengeToCsv()
,
-exportTimeToEventToCsv()
export the TimeToEvent results as csv
-exportTimeToEventToCsv(result, saveDirectory, minCellCount = 0)
The output of running computeTimeToEventAnalyses()
A directory location to save the results into
The minimum value that will be displayed in count columns
A string specifying the directory the csv results are saved to
-Other SaveLoad:
-exportDechallengeRechallengeToCsv()
,
-exportRechallengeFailCaseSeriesToCsv()
R/CustomCovariates.R
- getDbDuringCovariateData.Rd
Extracts covariates that occur during a cohort
-getDbDuringCovariateData(
- connection,
- oracleTempSchema = NULL,
- cdmDatabaseSchema,
- cdmVersion = "5",
- cohortTable = "#cohort_person",
- rowIdField = "subject_id",
- aggregated = T,
- cohortIds = c(-1),
- covariateSettings,
- minCharacterizationMean = 0,
- ...
-)
The database connection
The temp schema if using oracle
The schema of the OMOP CDM data
version of the OMOP CDM data
the table name that contains the target population cohort
string representing the unique identifier in the target population cohort
whether the covariate should be aggregated
cohort id for the target cohort
settings for the covariate cohorts and time periods
the minimum value for a covariate to be extracted
additional arguments from FeatureExtraction
The during covariates based on user settings
-The user specifies which during covariates they want and this extracts them using FeatureExtraction
-Other CovariateSetting:
-createDuringCovariateSettings()
Aggregate Covariate Analysis
This analysis calculates the aggregate characteristics for a Target cohort (T), an Outcome cohort (O) and combinations of T with O during time at risk and T without O during time at risk.
createAggregateCovariateSettings() - Create aggregate covariate study settings

Dechallenge Rechallenge Analysis
For a given Target cohort (T) and Outcome cohort (O), find any occurrences of a dechallenge (when the T cohort stops close to when O started) and a rechallenge (when T restarts and O starts again). This is useful for investigating causality between drugs and events.
computeDechallengeRechallengeAnalyses() - Compute dechallenge rechallenge study
computeRechallengeFailCaseSeriesAnalyses() - Find the subjects that fail the dechallenge rechallenge study
createDechallengeRechallengeSettings() - Create dechallenge rechallenge study settings

Time to Event Analysis
This analysis calculates the timing between the Target cohort (T) and an Outcome cohort (O).
computeTimeToEventAnalyses() - Compute time to event study
createTimeToEventSettings() - Create time to event study settings

Run Large Scale Characterization Study
Run multiple aggregate covariate analysis, time to event and dechallenge/rechallenge studies.
createCharacterizationSettings() - Create the settings for a large scale characterization study
loadCharacterizationSettings() - Load the characterization settings previously saved as a json file
runCharacterizationAnalyses() - Execute a large-scale characterization study
saveCharacterizationSettings() - Save the characterization settings as a json

Save Load
Functions to save the analysis settings and the results (as sqlite or csv files).
exportDechallengeRechallengeToCsv() - Export the DechallengeRechallenge results as csv
exportRechallengeFailCaseSeriesToCsv() - Export the RechallengeFailCaseSeries results as csv
exportTimeToEventToCsv() - Export the TimeToEvent results as csv

Insert into Database
Functions to insert the results into a database.
createCharacterizationTables() - Create the results tables to store characterization results into a database
createSqliteDatabase() - Create an sqlite database connection
insertResultsToDatabase() - Upload the results into a result database

Shiny App
Functions to interactively explore the results from runCharacterizationAnalyses().
viewCharacterization() - Interactively view the characterization results

Custom covariates
Code to create covariates during cohort start and end.
createDuringCovariateSettings() - Create during covariate settings
getDbDuringCovariateData() - Extracts covariates that occur during a cohort

Incremental
Code to run incremental mode.
cleanIncremental() - Removes csv files from folders that have not been marked as completed and removes the record of the execution file
cleanNonIncremental() - Removes csv files from the execution folder as there should be no csv files when running in non-incremental mode
This function uploads results in csv format into a result database
-insertResultsToDatabase(
- connectionDetails,
- schema,
- resultsFolder,
- tablePrefix = "",
- csvTablePrefix = "c_"
-)
The connection details to the result database
The schema for the result database
The folder containing the csv results
A prefix to append to the result tables for the characterization results
The prefix added to the csv results - default is 'c_'
Returns the connection to the sqlite database
-Calls ResultModelManager uploadResults function to upload the csv files
-Other Database:
-createCharacterizationTables()
,
-createSqliteDatabase()
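A sketch of the upload workflow, assuming the csv results were produced by runCharacterizationAnalyses() with csvFilePrefix = 'c_' into the example outputDirectory shown earlier (the sqlite file name is an assumption):

```r
# Connection details for a hypothetical sqlite result database.
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = 'sqlite',
  server = file.path(tempdir(), 'results.sqlite')
)

# Upload the csv files; csvTablePrefix must match the csvFilePrefix used when running the analyses.
insertResultsToDatabase(
  connectionDetails = connectionDetails,
  schema = 'main',
  resultsFolder = file.path(tempdir(), 'example', 'results'),
  csvTablePrefix = 'c_'
)
```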
Load the AggregateCovariate results
-loadAggregateCovariateAnalyses(fileName)
The file to load the results from.
A list of data.frames with the AggregateCovariate results
-R/RunCharacterization.R
- loadCharacterizationSettings.Rd
This function converts the json file back into an R object
-loadCharacterizationSettings(fileName)
The location of the json settings
Returns the json settings as an R object
-Input the directory containing the 'characterizationSettings.json' file and load the settings into R
-Other LargeScale:
-createCharacterizationSettings()
,
-runCharacterizationAnalyses()
,
-saveCharacterizationSettings()
R/SaveLoad.R
- loadDechallengeRechallengeAnalyses.Rd
Load the DechallengeRechallenge results
-loadDechallengeRechallengeAnalyses(fileName)
The file to load the results from.
A data.frame with the DechallengeRechallenge results
-R/SaveLoad.R
- loadRechallengeFailCaseSeriesAnalyses.Rd
Load the RechallengeFailCaseSeries results
-loadRechallengeFailCaseSeriesAnalyses(fileName)
The file to load the results from.
A data.frame with the RechallengeFailCaseSeries results
-Load the TimeToEvent results
-loadTimeToEventAnalyses(fileName)
The file to load the results from.
A data.frame with the TimeToEvent results
-R/RunCharacterization.R
- runCharacterizationAnalyses.Rd
Specify the database connection containing the CDM data, the cohort database schemas/tables, the characterization settings and the directory to save the results to
-runCharacterizationAnalyses(
- connectionDetails,
- targetDatabaseSchema,
- targetTable,
- outcomeDatabaseSchema,
- outcomeTable,
- tempEmulationSchema = getOption("sqlRenderTempEmulationSchema"),
- cdmDatabaseSchema,
- characterizationSettings,
- outputDirectory,
- executionPath = file.path(outputDirectory, "execution"),
- csvFilePrefix = "c_",
- databaseId = "1",
- showSubjectId = F,
- minCellCount = 0,
- incremental = T,
- threads = 1,
- minCharacterizationMean = 0.01
-)
The connection details to the database containing the OMOP CDM data
Schema name where your target cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
Name of the target cohort table.
Schema name where your outcome cohort table resides. Note that for SQL Server, this should include both the database and schema name, for example 'scratch.dbo'.
Name of the outcome cohort table.
Some database platforms like Oracle and Impala do not truly support temp tables. To emulate temp tables, provide a schema with write privileges where temp tables can be created
The schema with the OMOP CDM data
The study settings created using createCharacterizationSettings
The location to save the final csv files to
The location where intermediate results are saved to
A string to append the csv files in the outputDirectory
The unique identifier for the cdm database
Whether to include the subjectId of failed rechallenge case series or hide it
The minimum count value that is calculated
If TRUE then skip previously executed analyses that completed
The number of threads to use when running aggregate covariates
The minimum mean threshold to extract when running aggregate covariates
Multiple csv files in the outputDirectory.
-The results of the characterization will be saved into an sqlite database inside the specified saveDirectory
-Other LargeScale:
-createCharacterizationSettings()
,
-loadCharacterizationSettings()
,
-saveCharacterizationSettings()
Save the AggregateCovariate results
-saveAggregateCovariateAnalyses(result, fileName)
The output of running computeAggregateCovariateAnalyses()
The file to save the results into.
A string specifying the directory the results are saved to
-R/RunCharacterization.R
- saveCharacterizationSettings.Rd
This function converts the settings into a json object and saves it
-saveCharacterizationSettings(settings, fileName)
An object of class characterizationSettings created using createCharacterizationSettings
The location to save the json settings
Returns the location of the directory containing the json settings
-Input the characterization settings and output a json file to a file named 'characterizationSettings.json' inside the saveDirectory
-Other LargeScale:
-createCharacterizationSettings()
,
-loadCharacterizationSettings()
,
-runCharacterizationAnalyses()
R/SaveLoad.R
- saveDechallengeRechallengeAnalyses.Rd
Save the DechallengeRechallenge results
-saveDechallengeRechallengeAnalyses(result, fileName)
The output of running computeDechallengeRechallengeAnalyses()
The file to save the results into.
A string specifying the directory the results are saved to
-R/SaveLoad.R
- saveRechallengeFailCaseSeriesAnalyses.Rd
Save the RechallengeFailCaseSeries results
-saveRechallengeFailCaseSeriesAnalyses(result, fileName)
The output of running computeRechallengeFailCaseSeriesAnalyses()
The file to save the results into.
A string specifying the directory the results are saved to
-Save the TimeToEvent results
-saveTimeToEventAnalyses(result, fileName)
The output of running computeTimeToEventAnalyses()
The file to save the results into.
A string specifying the directory the results are saved to
-R/ViewShiny.R
- viewCharacterization.Rd
This is a shiny app for viewing interactive plots and tables
-viewCharacterization(resultFolder, cohortDefinitionSet = NULL)
The location of the csv results
The cohortDefinitionSet extracted using webAPI
Opens a shiny app for interactively viewing the results
-Input is the output of ...
-
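A sketch of launching the viewer, assuming the csv results were written to the example outputDirectory used earlier by runCharacterizationAnalyses():

```r
# Opens the shiny app on the csv results in the given folder.
viewCharacterization(
  resultFolder = file.path(tempdir(), 'example', 'results')
)
```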