Evaluation and Improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study

Ewan Carr^*, Rebecca Bendayan^*, Daniel Bean, Matthew Stammers, Wenjuan Wang, Huayu Zhang, Thomas Searle, Zeljko Kraljevic, Anthony Shek, Hang T T Phan, Walter Muruet, Anthony J Shinton, Ting Shi, Xin Zhang, Andrew Pickles, Daniel Stahl, Rosita Zakeri, Kevin O'Gallagher, Amos Folarin, Lukasz Roguski, Florina Borca, James Batchelor, Xiaodong Wu, Jiaxing Sun, Ashwin Pinto, Bruce Guthrie, Cormac Breen, Abdel Douiri, Honghan Wu, Vasa Curcin, James T Teo^#, Ajay Shah^#, Richard Dobson^#

^*, ^# Joint authors

doi: 10.1186/s12916-020-01893-3

Overview

This repository provides the pre-trained models needed to validate the models described in the medRxiv pre-print.
Please get in touch if you would like to collaborate on this replication.
- Clinical contact: [email protected]
- Technical contact: [email protected]
If you use code/trained models from this repository, please cite the pre-print as a condition of use.

About the models

The script replicate.py validates four models that were originally trained on the King's College Hospital (KCH) sample. The four models are:

| Endpoint | Model | Included features |
|---|---|---|
| 3-day ICU/death | 1 | NEWS2 only |
| 3-day ICU/death | 2 | NEWS2, oxygen litres |
| 14-day ICU/death | 3 | NEWS2 only |
| 14-day ICU/death | 4 | NEWS2, oxygen litres, urea, age, oxygen saturation, CRP, estimated GFR, neutrophils, neutrophil/lymphocyte ratio |

The script loads a validation dataset (validation.csv), or generates a simulated dataset if this file is missing. For each model (1-4), we:

1. Evaluate discrimination of the pre-trained models (loaded from pretrained.joblib) in the validation dataset.
2. Generate the estimates needed for calibration plots.
3. Test re-calibrated models, based on:
   - shrinkage factors derived from internal validation;
   - recalibration in the validation sample, using Platt's method.
4. Save the estimates (via joblib.dump).
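The discrimination, calibration and Platt recalibration steps can be sketched with scikit-learn as below. This is a minimal illustration, not the code in replicate.py: it assumes you already have predicted probabilities `p` from one pre-trained model and binary outcomes `y`, the helper names `evaluate` and `platt_recalibrate` are hypothetical, and the shrinkage-based recalibration is not shown.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def logit(p, eps=1e-12):
    """Log-odds of a probability, clipped away from exactly 0 and 1."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p / (1 - p))


def evaluate(y, p, n_bins=10):
    """Discrimination (AUC) plus binned estimates for a calibration plot."""
    auc = roc_auc_score(y, p)
    # prob_true: observed event rate per bin; prob_pred: mean prediction per bin
    prob_true, prob_pred = calibration_curve(y, p, n_bins=n_bins)
    return auc, prob_true, prob_pred


def platt_recalibrate(y, p):
    """Platt-style recalibration: refit intercept and slope on logit(p)."""
    x = logit(p).reshape(-1, 1)
    # A very large C makes the L2 penalty negligible (close to an unpenalized fit).
    model = LogisticRegression(C=1e6, max_iter=1000)
    model.fit(x, y)
    return model.predict_proba(x)[:, 1], model
```

Because the recalibration is a monotone transformation of the original score (for a positive slope), it changes calibration but leaves the AUC unchanged.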

Some notes:

- The code does not perform any training or cross-validation, with the exception of KNN imputation (see Missing data below).
- Some code for data cleaning is provided (cleaning.py), but it is specific to the structure of the source data. It shows how we prepared the training and validation datasets, but will likely require modification before running on replication samples.

How to use this repository

1. Prepare your validation dataset (named validation.csv) according to the specification below.
2. Run replicate.py.
3. Email the resulting replication.joblib file to [email protected].

Please note: some variables require transformation, and replicate.py provides code to perform it. If your validation dataset has already been transformed (as per the specification below), you will need to disable this part of the script by switching True to False:

```
if True:
    # NOTE: set to False if transformations have already been applied.
    ...  # transformation code in replicate.py
```

Measures needed to validate these models

Outcomes

We use a combined outcome of transfer to ICU or death (levels 6-8 of the WHO COVID-19 outcomes scale) within 3 or 14 days of the index date. The index date is defined as follows:
- For non-nosocomial patients (i.e. community-acquired COVID infection), the index date is the date of hospital admission.
- For nosocomial patients, the index date is the date of symptom onset. If onset is unavailable, the date of diagnosis (positive SARS-CoV-2 RT-PCR) minus 4 days can be used instead.

Each outcome is coded as 1 if the patient was transferred to ICU or died within the period (3 or 14 days, respectively), and 0 otherwise. All patients who did not experience the outcome must have been followed up until the respective endpoint.
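A minimal pandas sketch of deriving the index date and the 3- and 14-day outcomes, assuming one row per patient with hypothetical columns `nosoc`, `admission_date`, `symptom_onset`, `pcr_date`, `icu_date` and `death_date` (missing dates stored as `NaT`). Counting an event exactly on day 3 or day 14 as inside the window, and ignoring events before the index date, are assumptions.

```python
import pandas as pd


def add_index_date(df: pd.DataFrame) -> pd.DataFrame:
    """Add an `index_date` column (input column names are hypothetical)."""
    out = df.copy()
    # Nosocomial: symptom onset, or positive RT-PCR date minus 4 days if onset is missing.
    nosocomial_index = out["symptom_onset"].fillna(out["pcr_date"] - pd.Timedelta(days=4))
    # Community-acquired (nosoc == 0): hospital admission.
    out["index_date"] = out["admission_date"].where(out["nosoc"] == 0, nosocomial_index)
    return out


def outcome_within(df: pd.DataFrame, days: int) -> pd.Series:
    """1 if ICU transfer or death occurred within `days` of the index date, else 0."""
    # Earliest of ICU transfer and death; NaT when neither occurred.
    first_event = df[["icu_date", "death_date"]].min(axis=1)
    delta = first_event - df["index_date"]
    # Comparisons involving NaT evaluate to False, so patients without events get 0.
    in_window = (delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(days=days))
    return in_window.astype(int)
```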

In the KCH training sample (n=1276), the event rates were as follows:

| Outcome | N (%) |
|---|---|
| 3-day ICU/death | 163 (12.8%) |
| 14-day ICU/death | 389 (30.5%) |

The time-to-event distribution for the training sample is shown below.

Required variables

The 10 required variables are listed below:

| Category | Variable | Measure | Transformation | Range in KCH (after transformation) |
|---|---|---|---|---|
| Demographics | `age` | Age at admission (years) | None | 20 to 99 |
| Blood parameters | `crp` | C-reactive protein (CRP; mg/L) | `np.sqrt` | 1.3 to 19.0 |
| | `estimatedgfr` | Estimated glomerular filtration rate (mL/min) | None | 4 to 90 |
| | `neutrophils` | Neutrophil count (× 10⁹/L) | `np.sqrt` | 0.8 to 4.6 |
| | `nlr` | Neutrophil-to-lymphocyte ratio | `np.log` | -0.3 to 3.6 |
| | `urea` | Urea (mmol/L) | `np.sqrt` | 1.3 to 6.2 |
| Physiological parameters | `news2` | NEWS2 total score | None | 0 to 10 |
| | `oxsat` | Oxygen saturation (%) | None | 87 to 100 |
| | `oxlt` | Oxygen litres (L/min) | None | 0 to 15 |
| Other | `nosoc` | Nosocomial patient (0/1) | None | 0 to 1 |
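Before running replicate.py, it may help to check that validation.csv contains the ten variables above. A minimal sketch; `check_validation_data` is a hypothetical helper, not part of the repository.

```python
import pandas as pd

# Variable names taken from the required-variables table.
REQUIRED = ["age", "crp", "estimatedgfr", "neutrophils", "nlr",
            "urea", "news2", "oxsat", "oxlt", "nosoc"]


def check_validation_data(df: pd.DataFrame) -> None:
    """Raise ValueError if required columns are missing or nosoc is not 0/1."""
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # nosoc must be a binary indicator (missing values are ignored here).
    bad = set(df["nosoc"].dropna().unique()) - {0, 1}
    if bad:
        raise ValueError(f"nosoc must be 0 or 1; found {sorted(bad)}")
```

For example: `check_validation_data(pd.read_csv("validation.csv"))`.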

Transformations

Some features must be transformed before use:

- All continuous features must be winsorized: values below the 1st percentile are set to the 1st percentile, and values above the 99th percentile are set to the 99th percentile.
- Some features (crp, neutrophils, nlr, urea) require transformation as listed in the table.

The provided script carries out these transformations on the validation dataset. If your data are already transformed, you will need to disable this section of the script by setting True to False.
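The winsorizing and transformation steps can be sketched as follows. This is an illustration, not the code in replicate.py; it assumes that every variable except `nosoc` counts as continuous, that percentiles are computed on the validation dataset itself, and that winsorizing happens before the transformation (the order in which the steps are listed). `prepare` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Assumption: all required variables except the binary `nosoc` flag are continuous.
CONTINUOUS = ["age", "crp", "estimatedgfr", "neutrophils", "nlr",
              "urea", "news2", "oxsat", "oxlt"]
# Transformations listed in the required-variables table.
TRANSFORMS = {"crp": np.sqrt, "neutrophils": np.sqrt, "nlr": np.log, "urea": np.sqrt}


def winsorize(s: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip values to the given percentiles; NaNs are left untouched."""
    lo, hi = s.quantile([lower, upper])
    return s.clip(lower=lo, upper=hi)


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Winsorize continuous features, then apply the listed transformations."""
    out = df.copy()
    for col in CONTINUOUS:
        out[col] = winsorize(out[col])
    for col, fn in TRANSFORMS.items():
        out[col] = fn(out[col])
    return out
```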

Important

- All features must be measured within 48 hours of the index date (hospital admission or symptom onset). Where multiple measures are available, use the first available value after the index date.
- Oxygen litres (oxlt) is the supplemental O₂ flow rate, measured in L/min. This should be scored as 0 for patients not on supplemental oxygen.
- The variable nosoc identifies nosocomial patients (i.e. those developing COVID infection in hospital) and is used to stratify the models. Set it to 1 for patients who developed COVID infection after hospital admission, and 0 for all other patients. If all patients in your validation sample have community-acquired COVID infection, set nosoc to 0 for all patients (this will skip running the relevant models).
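If your source data hold repeated measurements in long format, the "first value within 48 hours after the index date" rule could be applied as below. A sketch assuming hypothetical columns `patient_id`, `variable`, `value` and `measured_at`, plus a Series of index dates keyed by `patient_id`; measurements taken before the index date are dropped.

```python
import pandas as pd


def first_within_48h(measures: pd.DataFrame, index_dates: pd.Series) -> pd.DataFrame:
    """Return one row per patient and one column per variable.

    measures: long format with hypothetical columns
        patient_id, variable, value, measured_at (datetime)
    index_dates: index datetime per patient, indexed by patient_id
    """
    m = measures.copy()
    m["index_date"] = m["patient_id"].map(index_dates)
    delta = m["measured_at"] - m["index_date"]
    # Keep measurements from the index date up to 48 hours afterwards.
    window = (delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(hours=48))
    return (
        m[window]
        .sort_values("measured_at")
        .groupby(["patient_id", "variable"])["value"]
        .first()  # earliest value per patient and variable
        .unstack("variable")
    )
```

Afterwards, `oxlt` can be set to 0 for patients known not to be on supplemental oxygen; take care not to treat a missing flow rate as "no oxygen" without checking.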

Cohort selection

- The training sample was defined as all adult inpatients testing positive for SARS-CoV-2 by reverse transcription polymerase chain reaction (RT-PCR).
- All patients included in the study had symptoms consistent with COVID-19 disease (e.g. cough, fever, dyspnoea, myalgia, delirium).
- We excluded subjects who were seen in the emergency department but not admitted.
- The training sample included patients testing positive for SARS-CoV-2 between 1st and 30th April 2020.

Software environment

Data cleaning and model training were performed in Python 3.8.2 using scikit-learn. Only a minimal set of packages is required (pandas, numpy, scikit-learn, statsmodels); the pinned versions in requirements.txt are:
```
joblib==0.14.1
matplotlib==3.1.3
numpy==1.18.1
pandas==1.0.3
scikit-learn==0.23.1
scipy==1.4.1
statsmodels==0.11.1
```
For testing purposes, replicate.py will generate simulated data if a validation dataset is not provided. These values are randomly generated and are not representative of the training dataset.

To test all models on the simulated dataset:

```
git clone https://github.com/ewancarr/NEWS2-COVID-19
cd NEWS2-COVID-19
pip install -r requirements.txt
python replicate.py
```

Missing data

Missing feature values in the training sample were imputed using KNN imputation (sklearn.impute.KNNImputer).
However, because fitted KNN imputers contain a copy of the training data, which cannot be shared publicly, this repository does not provide pre-trained KNN imputers.
Instead, replicate.py fits the KNN imputation model on the provided validation dataset. This applies only to imputation; pre-trained models are used for everything else.
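A minimal sketch of fitting the imputer on the validation data, assuming the nine continuous features from the table and the scikit-learn default of 5 neighbours. The number of neighbours, any feature scaling, and whether imputation is stratified by nosoc in replicate.py are not stated here; `impute_validation` is a hypothetical name.

```python
import pandas as pd
from sklearn.impute import KNNImputer

# Assumption: impute the nine continuous model features.
FEATURES = ["age", "crp", "estimatedgfr", "neutrophils", "nlr",
            "urea", "news2", "oxsat", "oxlt"]


def impute_validation(df: pd.DataFrame, n_neighbors: int = 5) -> pd.DataFrame:
    """Fit a KNN imputer on the validation data and fill missing feature values."""
    imputer = KNNImputer(n_neighbors=n_neighbors)
    out = df.copy()
    # Each missing value becomes the mean of that feature among the n_neighbors
    # nearest rows (NaN-aware Euclidean distance) that have it observed.
    out[FEATURES] = imputer.fit_transform(out[FEATURES])
    return out
```

Because KNN distances are scale-dependent, features on very different scales may dominate the neighbour search; this sketch does not rescale them.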
