---
title: "Development and validation of an EHR-based frailty index for predicting falls, adverse drug reactions, and heart failure"
author: 'Alex Bokov ^1,✉^, Sara Espinoza ^1^, Chandana Tripathy ^1^, and Kathleen R. Stevens ^1^'
css: "production.css"
abstract: |
  We conducted a retrospective cohort study on structured data from 14,843
  patients randomly sampled from all adults in the Epic EHR system of a large
  South Texas academic health center and its teaching hospital partner to
  determine whether our implementation of the Rockwood frailty index (eFI) can
  predict patient outcomes using only codes and numeric values routinely
  collected in EHR systems. We found that eFI performs well as a predictor of
  falls, adverse drug reactions, and heart failure.
documentclass: article
description: 'Manuscript'
clean: false
bibliography: efi_paper.bib
csl: harvard-cite-them-right.csl
always_allow_html: true
output:
  bookdown::html_document2:
    fig_caption: true
    self_contained: true
    number_sections: false
    keep_md: true
    clean: false
  bookdown::word_document2:
    reference_docx: template_AnnIntMed_native.docx
    fig_caption: true
    self_contained: true
    keep_md: true
    number_sections: false
---
```{r load_deps, echo=FALSE, message=FALSE, warning=FALSE, results='hide'}
.projpackages <- c('GGally','pander','dplyr','ggplot2','data.table'
,'survival','broom','forcats','table1','english','bookdown'
,'DiagrammeR','DiagrammeRsvg','rsvg','survminer');
.deps <- c( 'analysis.R' );
.debug <- 0;
message(paste0('\n\n working dir: ',getwd(),'\n\n'));
cat('\n\nAvailable Files:',list.files(),sep='\n','\n');
.junk<-capture.output(source(file.path('./scripts/global.R'),chdir=TRUE,echo=!1
,local=TRUE));
# Set some formatting options for this document
panderOptions('table.alignment.default','right');
panderOptions('table.alignment.rownames','right');
panderOptions('table.split.table',Inf);
panderOptions('p.wrap','');
panderOptions('p.copula',', and ');
panderOptions('graph.fontfamily','serif');
# theme_get() gives default theme
theme_set(theme_bw(base_family = 'serif',base_size=14) +
theme(strip.background = element_rect(fill=NA,color=NA)
,strip.text = element_text(size=15)));
knitr::opts_chunk$set(echo=.debug>0, warning=.debug>0, message=.debug>0);
options(digits=5); # default is 7
knitr::knit_hooks$set(inline = nb);
# knitr::knit_hooks$set(inline = function(x) {
# knitr:::format_sci(x, 'md')
# })
# detect current output format
.outfmt <- knitr::opts_knit$get('rmarkdown.pandoc.to');
if(is.null(.outfmt)){
.outfmt <- if(knitr::is_html_output()) 'html' else {
if(knitr::is_latex_output()) 'latex' else {
if(interactive()) 'html' else {
'unknown';
}}}};
message('.outfmt = ',.outfmt);
.currentscript <- current_scriptname('TRIPOD_AnnIntMed.Rmd');
# .outcomenames <- c('vi_c_hospinfect','vi_c_hosptrauma'
# ,'vi_c_cardiac','vi_c_psi');
# outcomenames ----
.outcomenames <- c('vi_c_falls','vi_c_drugadverse','v157_unspcfd__ptnts_tf')
snip_outcomes <- rname(.outcomenames);
snip_todates <- dat03[,.(xx=max(start_date)),by=patient_num] %>%
with(range(xx)) %>% paste0(collapse=' and ');
snip_fromdates <- dat03[,.(xx=min(start_date)),by=patient_num] %>%
with(range(xx)) %>% paste0(collapse=' and ');
# limit the data to items of interest
# snippets ----
```
# Introduction/Background
`r trinote('...','
*[TRIPOD 3.a] Explain the medical context (including whether diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models.*
*[TRIPOD 3.b] Specify the objectives, including whether the study describes the development or validation of the model or both.* ')`
The project was set in the context of the learning health system (LHS). The LHS
is a strategy that leverages health information technology, clinical data
infrastructure (the EHR), and data analytic techniques to achieve improved, more
affordable care [@etheredge07; @friedman14]. It is characterized as a rapid
bidirectional cycle of learning in which practice informs research evidence and
evidence informs practice [@greene12]. Improving patient safety is a key goal
for every health system, and a simple, rapid frailty assessment would help
clinicians tailor interventions to avoid poor patient outcomes. Our objective
was to evaluate a deficit-accumulation electronic frailty index (eFI) as a
predictor of falls, adverse reactions to medications, and heart failure, and
then to validate it against hold-out data.
Frailty is the lifelong erosion of stress resistance and accumulation of
impairments across multiple physiological systems. Among older
community-dwelling adults, 32% have been classified as pre-frail and 24% as
frail [@hoover13]. Frailty predicts disability, falls, and mortality
[@pajewski19]; emergency room visits and hospitalizations [@fried01a]; and
long-term care admissions [@pajewski19; @rockwood06; @rockwood05]. The Fried
phenotype [@fried01a] and the Rockwood deficit accumulation index [@rockwood07;
@mitnitski01a] are two commonly used measures of frailty. There is reasonable
convergence between these two approaches [@li15; @malmstrom14], but the latter
is better suited to large-scale deployment in EHR systems because it requires
neither patient questionnaires nor dedicated physical assessments. @kulminski08
found evidence that the Rockwood index outperforms the Fried phenotype.
Our version of the eFI improves on previous EHR adaptations of the Rockwood
index [@clegg16a; @pajewski19] by eliminating the need for manually curated lab
reference ranges and by calculating the eFI over a rolling two-year window for
every available patient encounter rather than at one fixed time point. The
scripts, code-mappings, and usage notes are freely available online.
# Methods
Using EHR data, we employed an LHS approach to the identification and analysis
of frailty, with the goal of supporting better decisions about preventive care
and therapeutic interventions. The eFI may also inform prospective study design
and recruitment.
## Source of data
`r trinote('...','*[TRIPOD 4.a] Describe the study design or source of data (e.g., randomized trial, cohort, or registry data), separately for the development and validation data sets, if applicable.*
*[TRIPOD 4.b] Specify the key study dates, including start of accrual; end of accrual; and, if applicable, end of follow-up. *')`
A random 1% sample (`r nb(subset(consort,node=='start')$patients)` patients,
`r nb(subset(consort,node=='start')$patdays)` visit-days) was drawn from
deidentified i2b2 data warehouse records representing the entire patient
population in the EHR system of a large academic health center and its teaching
hospital partner.
The nominal dates of index encounters ranged between `r snip_fromdates` while
those of final encounters ranged between `r snip_todates`. The actual dates
were obscured by deidentification of the data prior to delivery to the authors,
but this date-shifting was constrained to a one-year window around the actual
dates and was done in a manner that preserved the chronological order of each
patient's encounters and the durations between them.
## Participants
`r trinote('...','*[TRIPOD 5.a] OK Specify key elements of the study setting (e.g., primary care, secondary care, general population) including number and location of centres.*
*[TRIPOD 5.b] OK Describe eligibility criteria for participants. *
*[TRIPOD 5.c] OK Give details of treatments received, if relevant. *
*[TRIPOD 13.a] OK Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful. * ')`
The patient-visits in the data are a representative sample of the mix of all
primary care, specialist, and inpatient encounters of UT Health San Antonio and
the University Health System. As summarized in Figure \@ref(fig:consortcap),
encounters during which the patient was under the age of 18 were excluded, as
were patients who had fewer than three qualifying encounters. To avoid the bias
that would accompany always using the first valid visit as a starting point, an
index date was randomly selected for each patient from among all their visit
dates, and only data recorded on or after that visit were used in the analysis
(not counting the eFI values, which were pre-calculated for every patient-visit
in the EHR system, obtained as a separate file, and linked to the main dataset).
After this step, patients who had fewer than two visits remaining were also
excluded.
Some patients whose eFI is zero are genuinely non-frail, but many have zero eFIs
for artifactual reasons, such as incomplete records and sparse encounter
histories. Furthermore, at the tail ends of their histories most patients have
date-only records during which no diagnoses, procedures, or lab results were
recorded. These may have been missed appointments or accesses to patient files
without actual encounters. We therefore excluded patients who never had an eFI
greater than zero, and for those who remained we excluded all visits after the
final procedure, diagnosis, lab, or eFI (whichever happened last) was
documented.
Finally, the patients were split into development (
`r subset(consort,node=='dev')$patients` patients,
`r subset(consort,node=='dev')$patdays` visit-days) and validation (
`r subset(consort,node=='test')$patients` patients,
`r subset(consort,node=='test')$patdays` visit-days) subsets. The incidence of
the outcomes is reported in the [Results] section, below.
```{r consort,fig.cap=NULL,fig.width=5}
#grViz(consortgv);
grViz(consortgv) %>% DiagrammeRsvg::export_svg() %>% charToRaw %>% rsvg::rsvg_png(file='fig1_consort.png');
knitr::include_graphics('fig1_consort.png');
```
```{r consortcap, fig.cap='CONSORT diagram.',fig.height=0.1}
.oldmar <- par('mar'); par(mar=c(0,0,0,0));
plot(0,type='n',axes=F,xlab=NA,ylab=NA,cex=0.01);
par(mar=.oldmar);
```
This study was performed using only retrospective EHR data with no
treatment, intervention or any other interaction with patients. The exposure
groups were determined by patient eFI as will be discussed below.
## Outcomes
*[TRIPOD 6.a] Clearly define the outcome that is predicted by the prediction model, including how and when assessed. *
The three outcomes we predicted with eFI were falls, adverse reactions to drugs,
and heart failure. Following principles of implementation science
[@meissner20a], we drew on the expertise of the clinical stakeholders on our
team (CT and SE) to select outcomes that are meaningful, relevant, and useful in
guiding clinical care planning.
`r trinote('All decisions about processing and statistical analysis were made
using only the development data, blinded to the validation data. After the
analysis scripts were finalized for publication,
they were re-run on the validation data to generate the results reported here,
where appropriate showing the development data for comparison. The development
version of each table and figure is available in the
supplemental materials.
','*[TRIPOD 6.b] Report any actions to blind assessment of the outcome to be predicted. *
*[TRIPOD 7.b] Report any actions to blind assessment of predictors for the outcome and other predictors. *
')`
## eFI, a deficit accumulation index
`r trinote('The standard procedure for creating a deficit accumulation frailty
index','*[TRIPOD 7.a] Clearly define all predictors used in developing or validating the multivariable prediction model, including how and when they were measured.*')` [@mitnitski01a;@song10; @searle08a] is to select at least 30-40 functional
deficits (e.g. diagnoses or abnormal lab results) that span multiple
physiological systems, dichotomize or rescale them such that 0 indicates absence
and 1 indicates presence, and then for a given time window, calculate an
unweighted proportion of deficits that are present in a patient. This proportion
is the eFI. Our selection of functional deficits is based on that of @clegg16a
for UK health systems as modified for ICD10 codes by @pajewski19. We built upon
their work by mapping laboratory tests to LOINC codes and omitting data elements
which might not be available in some EHR systems (e.g. patient questionnaires).
The resulting crosswalk tables map systolic and diastolic blood pressure,
smoking status, BMI, 32 LOINC codes, 626 ICD10 codes, and 503 ICD9 codes (for
compatibility with older data sources) to 48 functional deficits. These tables
are available in S5 and online [@bokov20]. For every visit date of every patient
in our site's i2b2 data warehouse, the above tables were used to aggregate all
matching codes documented during the preceding two-year window into a count of
distinct functional deficits, which was then divided by the total number of
defined deficits (48). Thus, if a patient had six distinct functional deficits
during the two-year period preceding a given visit, their eFI for that visit
would be 6/48 = 0.125.
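
To make this calculation concrete, the following non-executed sketch shows one
way the two-year rolling-window aggregation could be expressed with
`data.table`. It is illustrative only (the released scripts [@bokov20] contain
the actual implementation), and the object and column names (`codes`, `xwalk`,
`visits`, `deficit`) are hypothetical.

```{r efisketch, eval=FALSE}
# Illustrative sketch only; not the production eFI implementation.
# codes:  one row per documented fact (patient_num, start_date, concept_cd)
# xwalk:  crosswalk mapping each concept_cd to one of the 48 functional deficits
# visits: one row per patient-visit (patient_num, visit_date)
library(data.table);
deficits <- merge(codes, xwalk, by='concept_cd');   # keep only mapped codes

efi_at_visit <- function(pat, vdate, window_days=730, n_deficits=48){
  hits <- deficits[patient_num==pat &
                     start_date > vdate - window_days & start_date <= vdate];
  # unweighted proportion of distinct deficits documented in the window
  uniqueN(hits$deficit) / n_deficits;
}

visits[, efi := mapply(efi_at_visit, patient_num, visit_date)];
```
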
`r trinote('Though eFI is a continuous variable, for purposes of visualization
and interpretation we use a cutpoint of 0.19, established by @stow18. Patients
above this cut-point are described as frail while those below as non-frail.'
,'*[TRIPOD 11] Provide details on how risk groups were created, if done. *')`.
The above deployment of the eFI was done prior to and separately from the design
of this study. The eFI scores were provided to the authors as a separate file,
which the authors linked to the rest of the study data using shared but
irreversibly de-identified keys. For this reason, when we omitted all visits
prior to a randomly selected index visit for each patient (see [Participants]
section, above), this did not interfere with the eFI calculation, which was the
only piece of information retained about a patient's history prior to the index
visit.
## Statistical analysis methods
`r trinote('
For each of the three outcomes, we used a Cox proportional hazards model [@cox72]
with eFI as the predictor of the first occurrence of
the respective outcome after the patient\'s index visit. Unlike most previous
studies, we did not select a single time point to represent each patient. In
real-world scenarios patient data are updated at every encounter, so we designed eFI
as a time-varying numeric predictor updated at every visit (encompassing all
relevant codes documented during a two-year rolling window preceding that
visit). Follow-up for each patient ended at the first occurrence of the outcome
being modeled. Harrell\'s c-statistic [@harrell84; @harrell96] was used to
assess model performance ',
'*[TRIPOD 10.a, development only] Describe how predictors were handled in the analyses.*
*[TRIPOD 10.b, development only] Specify type of model, all model-building procedures (including any predictor selection), and method for internal validation.*
*[TRIPOD 10.d] Specify all measures used to assess model performance and, if relevant, to compare multiple models.*
')`
and the proportionality assumption was tested for each model as per @grambsch94
(Table \@ref(tab:tb3)).
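
For a fitted `coxph` object, both of these measures are available directly from
the `survival` package; a minimal sketch, assuming `fit` is one of the models
described above:

```{r perfsketch, eval=FALSE}
library(survival);
concordance(fit);   # Harrell's c-statistic and its standard error
cox.zph(fit);       # Grambsch-Therneau test of the proportional hazards assumption
```
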
`r trinote('
We used the R statistical language [@rteam21] and the Survival package for R
[@therneau00] for all analysis and visualizations. We fit simple univariate Cox
models that have the following specification, in the R syntax:','
*[TRIPOD 15.a, development only] Present the full prediction model to allow predictions for individuals (i.e., all regression coefficients, and model intercept or baseline survival at a given time point).* ')`
::: {.example #modelspec}
`coxph( Surv(t0,t1,xx) ~ I(10*efi), data=DATA)`
:::
Here `t0` and `t1` are the start and stop of an observation period (i.e. the
time elapsed between one encounter and the next). These are integers
representing the number of days relative to each patient's index visit. `xx` is
an indicator variable for the outcome, with a value of `TRUE` if that outcome
occurred during the observation period and `FALSE` if it did not. The predictor
eFI is represented by `efi`, a continuous variable with a range of 0 to 1. The
expression `I(10*efi)` scales `efi` by 10 so that the reported hazard ratio
corresponds to a 0.1 increase in eFI. All model coefficients are shown below the
corresponding Kaplan-Meier curves in Figures
\@ref(fig:kmfallscap)-\@ref(fig:kmhfcap).
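
As a self-contained illustration of this specification, the following
non-executed example fits the same model form to simulated data; the variable
names mirror the text but the values are fabricated and the resulting
coefficients have no relation to the study results.

```{r coxsketch, eval=FALSE}
# Toy illustration of the model specification above, on simulated data.
library(survival);
set.seed(1);
toy <- data.frame(t0 = rep(c(0, 30, 90), 200)      # interval start (days)
                  ,t1 = rep(c(30, 90, 365), 200)   # interval stop (days)
                  ,efi = round(runif(600, 0, 0.4), 2));
# make the simulated outcome more likely at higher eFI
toy$xx <- rbinom(600, 1, plogis(-4 + 8*toy$efi)) == 1;
fit <- coxph(Surv(t0, t1, xx) ~ I(10*efi), data=toy);
summary(fit);   # exp(coef) is the hazard ratio per 0.1 increase in eFI
```
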
`r trinote('
Our missing data strategy approximates a complete-case analysis in
that we excluded patients with too few visits or missing eFI. No imputation was
done on the remaining records; absence of ICD10 codes for the respective
outcomes was interpreted as non-occurrence.'
,'*[TRIPOD 9] Describe how missing data were handled (e.g., complete-case analysis, single imputation, multiple imputation) with details of any imputation method. * ')`
`r trinote('
We limited ourselves to a 1% initial sample for two reasons. First, given the
anticipated utility of the eFI for predicting other clinically important
outcomes in future studies, and given that our approach uses information from a
large time-span for each sampled patient, we needed to avoid depleting the
un-analyzed patient population in these early stages. Second, we needed to
stay within the processing-time and file-size constraints of our research
data warehouse team. Extrapolating from the effect sizes in @pajewski19, events
with an overall prevalence of 5% and a hazard ratio of 1.33 between frail and
non-frail groups should be detectable with power well over 0.80.'
,'*[TRIPOD 8] Explain how the study size was arrived at.* ')`
## Development vs. validation
`r trinote('We used the following validation methods recommended by @royston13: '
,'*[TRIPOD 10.c, validation only] For validation, describe how the predictions were calculated.*
*[TRIPOD 10.e, validation only] Describe any model updating (e.g., recalibration) arising from the validation, if done.*
*[TRIPOD 12, validation only] For validation, identify any differences from the development data in setting, eligibility criteria, outcome, and predictors. *')`
regressing out-of-sample linear predictors against the in-sample linear
predictors (Table \@ref(tab:coxgof)), re-fitting the validation models using the
development models' predictions as the sole explanatory variables
(Table \@ref(tab:lmdiscrim)), and comparing the cumulative
distributions of the out-of-sample and in-sample linear predictors (supplement).
There were no differences in the choice of predictors, outcomes, inclusion
criteria, or clinical setting between the development and validation data.
The two were mutually exclusive, randomly assigned subgroups of patients
from the same sample. No updates or model recalibration were needed after
analyzing the validation data.
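
A non-executed sketch of how these checks can be expressed with the `survival`
package follows; `fit_dev` (a Cox model fitted to the development data) and
`val` (the validation data set) are hypothetical names, and the exact
formulation behind Tables \@ref(tab:coxgof) and \@ref(tab:lmdiscrim) may differ
in detail.

```{r valsketch, eval=FALSE}
# Illustrative sketch of the validation checks; object names are hypothetical.
library(survival);

# out-of-sample linear predictor from the development model
val$lp_dev <- predict(fit_dev, newdata=val, type='lp');

# calibration/goodness of fit: with the development prediction as an offset, a
# near-zero coefficient means re-estimating on the validation data adds little
coxgof <- coxph(Surv(t0, t1, xx) ~ I(10*efi) + offset(lp_dev), data=val);

# discrimination: regress the in-sample linear predictor on the out-of-sample
# one; close agreement gives an intercept near 0 and a slope near 1
fit_val <- coxph(Surv(t0, t1, xx) ~ I(10*efi), data=val);
val$lp_val <- predict(fit_val, type='lp');
lmdiscrim <- lm(lp_val ~ lp_dev, data=val);
```
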
# Results
## Participants
`r trinote('...','*[TRIPOD 13.b] Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and outcome. *
*[TRIPOD 13.c, validation only] For validation, show a comparison with the development data of the distribution of important variables (demographics, predictors and outcome). *
*[TRIPOD 14.a, development only] Specify the number of participants and outcome events in each analysis. *
*~~[TRIPOD 14.b, development only] If done, report the unadjusted association between each candidate predictor and outcome.~~*')`
```{r tb1prep,results='hide'}
.dat04keep <- c('sex_cd','race_cd','BLANK0',.outcomenames ,'BLANK1'
,'Frailty'
,'Patient age (years)');
.tb1formula <- wrap(.dat04keep,'`') %>% paste0(collapse=' + ') %>%
paste('~',.,'|frl+Dataset') %>% formula;
tb1 <- bind_rows(dat04[,.SD,.SDcols=.dat04keep]
,dat04v[,.SD,.SDcols=.dat04keep]
,.id='Dataset') %>%
mutate(frl=ifelse(Frailty>0.19,'Frail (eFI > 0.19)'
,'Non-Frail (eFI <= 0.19)')
,Dataset=ifelse(Dataset==1,'Development','Validation')) %>%
mutate(across(everything(),~({
label(.)<-paste0('<b>',rnameshort(cur_column()),'</b>'
,if(is.numeric(.)) ': Mean(SD) ' else ': N(%) ');
if(grepl('^BLANK',cur_column())) label(.) <- ' ';
.}))) %>%
table1(.tb1formula,data=.,render=proj_render,render.strat=proj_render_strat
,overall='Total',digits=2);
tb1df <- as.data.frame(tb1);
```
There were `r tb1df[3,6]` females and `r tb1df[4,6]` males in the development
data and `r tb1df[3,7]` females and `r tb1df[4,7]` males in the validation data.
In the development data `r tb1df[11,6]` of the patients were white, and in the
validation data `r tb1df[11,7]` were. As expected, most patients had
an eFI below 0.19; those with a higher eFI were moderately older
(`r gsub(' .*$','',tb1df[21,2])` vs `r gsub(' .*$','',tb1df[21,4])` in the
development data and `r gsub(' .*$','',tb1df[21,3])` vs
`r gsub(' .*$','',tb1df[21,5])` in the validation data). Patients in the high
eFI groups had a mean (SD) eFI of `r tb1df[20,2]` and `r tb1df[20,3]` in the
development and validation data, respectively. Overall, the development and
validation cohorts were very similar to each other in both demographics and
outcome incidence.
```{r table1header}
pander(rbind(c('.','','','','','','',''))
,caption='(\\#tab:table1) Cohort demographics\n\n');
```
```{r tb1,results='asis'}
if(.outfmt == 'html'){
#rname(method='partial') %>% gsub('BLANK.','</br>',.) %>%
# gsub('Patient age','<br/>Patient age',.);
print(tb1);
};
```
***
## Prediction of Outcomes
Kaplan-Meier plots of the validation data are shown in
Figures \@ref(fig:kmfallscap)-\@ref(fig:kmhfcap), split on eFI
(Frail if above 0.19, non-Frail otherwise). Because these are all relatively
rare outcomes, the y-range of each plot is limited to 70%-100%, which fully
encompasses all events over a 3-year period. Each plot is followed by
Cox model results with p-values adjusted for multiple comparisons
[@holm79] because we are testing three separate hypotheses on the same data.
Below that, each figure has a risk table for a sample of time points
(30 days, 60 days, 90 days, one year, and 3 years). The count of censored events
is the number of patient-visits during which the outcome has not yet been
observed at that time point. The estimated survival function is also shown.
Kaplan-Meier plots from the development data are superimposed as dashed lines on
each plot.
`r valnote('For each of the outcomes, eFI was found to be a statistically and practically significant predictor.')`
`r valnote('For each 0.1 increase in eFI, we found approximately a doubling of the hazard.')`
Roughly approximating these hazards as probabilities, if a hypothetical patient
with an eFI of 0 has a
`r scales::label_percent(accuracy=0.1)(predict(fitsval$vi_c_falls$models$Frailty,newdata=list(a_t0=0,a_t1=30,xx=NA,a_efi=0),type='expected'))`
baseline chance of experiencing a falling injury during a 30-day period,
then if their eFI were 0.1 their chance would rise to
`r scales::label_percent(accuracy=0.1)(predict(fitsval$vi_c_falls$models$Frailty,newdata=list(a_t0=0,a_t1=30,xx=NA,a_efi=0.1),type='expected'))`,
and if their eFI were 0.2, just above our operational definition of frailty,
it would be
`r scales::label_percent(accuracy=0.1)(predict(fitsval$vi_c_falls$models$Frailty,newdata=list(a_t0=0,a_t1=30,xx=NA,a_efi=0.2),type='expected'))`.
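
These inline values come from `predict(..., type='expected')`, which returns the
expected number of events (the cumulative hazard) over the stated interval. For
rare events this is close to the corresponding probability; a minimal,
non-executed sketch of the conversion, reusing the same call for an eFI of 0.1:

```{r hazprobsketch, eval=FALSE}
# predict(type='expected') returns the cumulative hazard H over the interval;
# the corresponding event probability is 1 - exp(-H), which is approximately H
# when H is small (i.e. for rare events).
H <- predict(fitsval$vi_c_falls$models$Frailty
             ,newdata=list(a_t0=0,a_t1=30,xx=NA,a_efi=0.1),type='expected');
prob_30day <- 1 - exp(-H);
```
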
```{r kmfalls, fig.cap=NULL, message=FALSE,collapse=TRUE,results='asis'}
panderOptions('knitr.auto.asis', FALSE);
with(fitsval$vi_c_falls,{
message(dispname);
plotsurv01(survfit(Surv(a_t0,a_t1,xx)~Frail,id=patient_num,data=data)
,survfit(Surv(a_t0,a_t1,xx)~Frail,id=patient_num
,data=fits$vi_c_falls$data)
,ylim=c(0.7,1.0)
,labs=c('Non-Frail','Frail',
'\nNon-Frail\n(development)','\nFrail\n(development)')
,title=dispname) %>% print;
pander(cphunivar(models$Frailty,3,3),row.names=F);
risktable00(survfit(Surv(a_t0, a_t1, xx) ~ Frail,data,id = patient_num)) %>%
pander(row.names=F);
});
```
```{r kmfallscap, fig.cap='Falls',fig.height=0.1}
.oldmar <- par('mar'); par(mar=c(0,0,0,0));
plot(0,type='n',axes=F,xlab=NA,ylab=NA,cex=0.01);
par(mar=.oldmar);
```
```{r kmdrug, fig.cap=NULL, message=FALSE,collapse=TRUE,results='asis'}
panderOptions('knitr.auto.asis', FALSE);
.subfig <- 0;
with(fitsval$vi_c_drugadverse,{
message(dispname);
plotsurv01(survfit(Surv(a_t0,a_t1,xx)~Frail,id=patient_num,data=data)
,survfit(Surv(a_t0,a_t1,xx)~Frail,id=patient_num
,data=fits$vi_c_drugadverse$data)
,ylim=c(0.7,1.0)
,labs=c('Non-Frail','Frail',
'\nNon-Frail\n(development)','\nFrail\n(development)')
,title=dispname) %>% print;
pander(cphunivar(models$Frailty,3,3),row.names=F);
risktable00(survfit(Surv(a_t0, a_t1, xx) ~ Frail,data,id = patient_num)) %>%
pander(row.names=F);
});
```
```{r kmdrugcap, fig.cap='Adverse drug reactions.',fig.height=0.1}
.oldmar <- par('mar'); par(mar=c(0,0,0,0));
plot(0,type='n',axes=F,xlab=NA,ylab=NA,cex=0.01);
par(mar=.oldmar);
```
```{r kmhf, fig.cap=NULL, message=FALSE,collapse=TRUE,results='asis'}
panderOptions('knitr.auto.asis', FALSE);
with(fitsval$v157_unspcfd__ptnts_tf,{
message(dispname);
plotsurv01(survfit(Surv(a_t0,a_t1,xx)~Frail,id=patient_num,data=data)
,survfit(Surv(a_t0,a_t1,xx)~Frail,id=patient_num
,data=fits$v157_unspcfd__ptnts_tf$data)
,ylim=c(0.7,1.0)
,labs=c('Non-Frail','Frail',
'\nNon-Frail\n(development)','\nFrail\n(development)')
,title=dispname) %>% print;
pander(cphunivar(models$Frailty,3,3),row.names=F);
risktable00(survfit(Surv(a_t0, a_t1, xx) ~ Frail,data,id = patient_num)) %>%
pander(row.names=F);
});
```
```{r kmhfcap, fig.cap='Heart failure.',fig.height=0.1}
.oldmar <- par('mar'); par(mar=c(0,0,0,0));
plot(0,type='n',axes=F,xlab=NA,ylab=NA,cex=0.01);
par(mar=.oldmar);
```
## Model performance
`r trinote('...','*[TRIPOD 16] Report performance measures (with CIs) for the prediction model.*
*~~[TRIPOD 17, validation only] If done, report the results from any model updating (i.e., model specification, model performance).~~*')`
`r valnote("All models had a C-statistic above 0.7. The proportional hazards assumption was satisfied for all outcomes except adverse drug events
(Table \\@ref(tab:tb3)). For adverse drug events, the deviation from
proportionality is time-dependent and results in some loss of sensitivity with
increasing follow-up time (Figure S1).") `
```{r tb3}
panderOptions('knitr.auto.asis', TRUE);
subset(tb3v,Outcome %in% .outcomenames & Predictor == 'Frailty') %>%
mutate(Outcome = rnameshort(Outcome)
,`Concordance (95% CI)`=paste0(nb(Concordance,2)
,' ('
,nb(Concordance-1.96*`SE Concordance`,2)
,', '
,nb(Concordance+1.96*`SE Concordance`,2)
,')')
,`PH: χ^2^~df=1~`=nb(`χ² (ZPH)`,2),`PH: P`=nb(`P (ZPH)`,3)
) %>%
select('Outcome','Concordance (95% CI)','PH: χ^2^~df=1~','PH: P') %>%
pander(caption="(\\#tab:tb3) Performance of eFI models.
Concordance shows Harrell's c-statistic for each model. The remaining two
columns are the χ^2^ statistic and p-value of the proportional hazards test. A
low p-value indicates a violation of the proportionality assumption.",row.names=F);
```
`r valnote('In Table \\@ref(tab:coxgof) we report the calibration of the models by
using the development models to make out-of-sample predictions on the validation
data and using those predictions as offsets in Cox models fitted to the
validation data [@royston13]. The "Mismatch (CI)" column represents the amount
of remaining variation explained by the updated (validation) model but not
explained by the out-of-sample prediction of the development model. Here a
non-significant result means that the model predicted out-of-sample results
without there being any benefit to updating on the new data. The models for
all the outcomes passed this test.')`
```{r coxgof}
lapply(valresults[.outcomenames]
,function(xx) tidy(xx$Frailty$coxgof,conf.int=T)[-1]) %>%
bind_rows(.id='outcome') %>%
mutate(outcome=rnameshort(outcome)
, `Mismatch (CI)`=paste0(nb(estimate,3),' (',nb(conf.low,3),', '
,nb(conf.high,3),')')
,SE=nb(std.error,3),t=nb(statistic,3),P=nb(p.value)) %>%
tibble::column_to_rownames(var='outcome') %>%
select(c('Mismatch (CI)','SE','t','P')) %>%
pander(caption='(\\#tab:coxgof) Goodness of fit and calibration.');
```
`r valnote('In Table \\@ref(tab:lmdiscrim), the agreement between the development
and validation models is evaluated by regressing their linear predictors against
each other [@royston13]. Exact agreement would result in an intercept of zero
and a slope of one. A slope greater than one indicates that the validation model
is more sensitive than implied by the development results, a slope less than one
indicates lowered sensitivity, and a slope of zero would imply no correlation
between the development and validation results. For all outcomes the slopes were
significantly lower than 1 with very small standard errors, so relying on the
development data alone would have over-estimated the sensitivity of eFI as a
predictor. All the slopes were above 0.9, and the adverse drug events validation
model was slightly more sensitive than the development model.')`
```{r lmdiscrim}
valresultsout <- lapply(valresults[.outcomenames]
,function(xx){
tidy(xx$Frailty$lmdiscrim,conf.int=T)[-1]}) %>%
bind_rows(.id='outcome') %>%
mutate(Outcome=paste0(rnameshort(ifelse(lag(outcome,1,'(Intercept)')==outcome
,outcome,'(Intercept)'))
,ifelse(lag(outcome,1,'(Intercept)')==outcome,'\\\n \\\n ',''))
,Coefficient=nb(estimate,3),SE=nb(std.error,2)
,t=nb(statistic,2),P=nb(p.value,2));
select(valresultsout,'Outcome','Coefficient','SE','t') %>%
pander(row.names=F
,caption='(\\#tab:lmdiscrim) Validation against out-of sample predictions from the development model.'
,keep.line.breaks=T);
```
`r valnote('For all outcomes, distributions of their linear predictors were
similar whether they were generated out-of-sample by the development models or
in-sample by the validation models (Figure S2).')`
# Discussion
We implemented an electronic frailty index (eFI) using the Rockwood
deficit-accumulation framework and used it to predict a diverse set of poor
patient outcomes in real-world EHR data. This LHS approach offers a low-effort,
non-invasive risk assessment method for informing clinical and utilization
decisions that scales to large patient populations.
## Limitations
*[TRIPOD 18] Discuss any limitations of the study (such as nonrepresentative sample, few events per predictor, missing data). *
Our study sample is representative of patients at one health system, but not of
the general population. Furthermore, since patients with zero eFIs were
excluded, our findings may not apply to patients who completely lack functional
deficits. Our data also share the fundamental limitation of the EHR system from
which they were obtained: like all EHR systems, it only contains information
that providers and coders put into it. Events taking place outside the health
system or at unconnected health systems are not visible to our analysis. On the
other hand, providers who rely on EHR systems at the point of care work under
the same limitations, and the patients whose outcomes they most need to predict
are those who do in fact have functional deficits. The data we used are
representative of this scenario, and despite these limitations eFI provides
accurate predictions of poor patient outcomes. Because it is based on real EHR
data, eFI is more directly transferable to clinical use than implementations
based on curated registries and is more immediate than claims data. eFI is most
accurate for patients who have accumulated a reasonable in-system visit history,
and further work is needed to characterize more precisely the relationship
between the length of a patient's visit history and the accuracy of eFI.
## Interpretation
*[TRIPOD 19.b] Give an overall interpretation of the results, considering objectives, limitations, results from similar studies, and other relevant evidence. *
This study only used outcome data for each patient after a randomly selected
index date between that patient's first and last available visit. This design
mimics a scenario where an existing patient is seen for the first time after
clinical deployment of eFI and the goal is to determine how well eFI by itself
will predict the outcomes without any additional information about prior
occurrences. eFI predicted falls, adverse drug reactions, and heart failure with
high statistical significance. In all cases the effect size was large enough to
be clinically significant and the models had high concordances
(Table \@ref(tab:tb3)).
`r trinote('Adverse drug reactions were the one outcome for which the Cox model\'s
proportionality assumption was not met. This may be because adverse drug events
are rarer, or encompass a more heterogeneous set of diagnoses, than the other
outcomes, and some of these diagnoses may be more prevalent with age, thus
introducing a time interaction which the univariate model does not include. This
may also be why adverse drug events were the outcome for which there was a
significant amount of variation in the validation data that was not explained by
out-of-sample predictions (Table \\@ref(tab:coxgof)). For all outcomes the
out-of-sample prediction was not significantly worse than the in-sample
prediction.'
,'*[TRIPOD 19.a, validation only] For validation, discuss the results with reference to performance in the development data, and any other validation data. * ')`
There was generally a high correlation between the out-of-sample and
in-sample predictions (`r subset(valresultsout,Outcome!='(Intercept)')$estimate %>% range %>% nb(digits=2) %>% paste(collapse=' - ')`
, Table \@ref(tab:lmdiscrim)).
## Implications
*[TRIPOD 20] Discuss the potential clinical use of the model and implications for future research. *
The LHS approach of harnessing EHR data to rapidly generate clinical knowledge
can be transformative for primary and specialist care. Assessing frailty helps
clinicians identify high-risk patients and tailor interventions to prevent
health decline and poor outcomes. There is no gold-standard method for
assessing frailty in clinical practice. Currently available frailty assessment
tools used in geriatric practice have good validity (for example @fried01a),
but they are time-intensive and often difficult to implement in a busy general
practice. Because of the simplicity of the Rockwood frailty index, it is more
likely to be adopted by clinicians in a busy practice.
The statistical analysis we used to _validate_ eFI as a predictor is not
a prerequisite for _deploying_ eFI in the clinic. Cox models can be useful
initially for finding optimal thresholds tuned to specific populations and
outcomes instead of relying on the 0.19 cutoff we used here, though that cutoff
was effective at partitioning patients into risk groups for very different
types of events. Nor is eFI limited to any specific EHR system. As long as
it is possible to extract a table of dates, patient IDs, ICD-10 codes, and LOINC
codes for labs with accompanying value-flags, such a result-set can be
cross-walked to eFI values at patient-visit granularity.
# Conclusions
The LHS approach enabled our study to successfully draw knowledge from
real-world care-delivery data to promote improvement and health system change
based on rigorous, generalizable research.
Our results contribute to a growing body of evidence that
risk scores built using the Rockwood framework [@mitnitski01a] will be a
valuable tool for clinical decision support not restricted to any one illness or
specialty. The variables used to calculate eFI are available in some form in
every EMR system. Even in EMR systems where a few of these variables are not
available, Rockwood deficit accumulation indexes have been shown to give
consistent results despite variations in which individual codes are available
[@rockwood06], as long as there is a sufficiently large and representative
collection of deficits into which these variables can be binned
[@searle08a; @mitnitski05].
Our future work will include promoting the adoption of the eFI by Learning
Health Systems, bridging gaps between clinical data and improved care through
data analytics and informatics tools, and connecting eFI-informed decisions with
patient and health system outcomes.
`r trinote('
To facilitate adoption and refinement of eFI, in addition to the crosswalk
tables (S5 and S6) we released mapping tables and example SQL scripts [@bokov20]
as a starting point for researchers or enterprise analytics teams that wish to
deploy eFI at their own health systems. With input from colleagues at other
learning health systems we intend to evolve these scripts into a self-contained
app that interfaces with EMR systems (via FHIR) to provide real-time frailty
assessment at the clinical point of care and assist clinicians in developing
care plans to mitigate the risks of frailty.'
,'*[TRIPOD 21] Provide information about the availability of supplementary resources, such as study protocol, Web calculator, and data sets. *
*[TRIPOD 15.b, development only] Explain how to use the prediction model.*
')`.
# Acknowledgments
`r note("This work was supported by NIH/NCATS UL1TR001120 and UL1TR002645 (AFB,
KRS), the Long School of Medicine KL2 Award (AFB), the Claude D. Pepper Older
Americans Independence Center P30 AG044271 (SER), and the Berneice Castella
Distinguished Professorship in Aging Research in Nursing (KRS)."
,"Did I get everybody's correctly?",date='2020-12-04T14:20')` `r trinote("The
funders of this study had no influence on the design, data collection, analysis,
interpretation, or manuscript preparation and submission.",
"*[TRIPOD 22] Give the source of funding and the role of the funders for the present study. * ")`
We would also like to thank Dr. Meredith Zozus, director
of the Informatics core at the Institute for Integration of Medicine and Science
and division chief for research informatics, for helping us and countless other
researchers leverage clinical data in their studies.
# References
---
date: '2021-08-08'
---