Source code repo for the analysis and experiments in the paper: Summarisation of Electronic Health Records with Clinical Concept Guidance

Dataset Prep

./dataset_prep : Scripts and notebooks for all MIMIC-III (M3) dataset prep. This covers running MedCAT, the pre-trained NER+L model, and aligning its output to the raw input text.
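The alignment step can be illustrated with a minimal sketch. It assumes the annotations arrive as a dictionary with an `entities` mapping whose values carry `start`, `end`, `cui` and `source_value` fields; the function name `align_entities` and the sample note are hypothetical, and the scripts in ./dataset_prep may do this differently.

```python
from typing import Dict, List, Tuple


def align_entities(text: str, annotations: Dict) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, cui, surface) for entities whose offsets match the raw text."""
    aligned = []
    for ent in annotations.get("entities", {}).values():
        start, end = ent["start"], ent["end"]
        surface = text[start:end]
        # Keep only spans whose character offsets reproduce the annotated surface form.
        if surface == ent.get("source_value", surface):
            aligned.append((start, end, ent["cui"], surface))
    return sorted(aligned)


# Hand-written example standing in for real annotation output (assumed shape).
note = "Patient admitted with chest pain and hypertension."
sample_annotations = {
    "entities": {
        0: {"start": 22, "end": 32, "cui": "C0008031", "source_value": "chest pain"},
        1: {"start": 37, "end": 49, "cui": "C0020538", "source_value": "hypertension"},
    }
}
spans = align_entities(note, sample_annotations)
```

Checking the surface form against the raw text is a cheap guard against offsets drifting after any cleaning applied between annotation and training.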

Extractive Model (Top-K) Sentence Ranking

./extractive_approach : The extractive approach, including model training and run scripts.
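As a generic illustration of top-K sentence ranking (not the trained model in ./extractive_approach), the sketch below keeps the K highest-scoring sentences and returns them in their original order. The function name `top_k_sentences` and the scores are hypothetical; in practice the scores would come from whatever ranking model is used.

```python
from typing import List, Sequence


def top_k_sentences(sentences: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Select the k highest-scoring sentences, preserving document order."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    # Rank sentence indices by score, highest first, and keep the first k.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sort the kept indices so the summary reads in source order.
    return [sentences[i] for i in sorted(ranked)]


summary = top_k_sentences(
    ["Admitted with chest pain.", "Family visited.", "Started on aspirin.", "Discharged home."],
    [0.9, 0.1, 0.8, 0.3],
    k=2,
)
```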

Abstractive Model Fine-Tuning

Our compute for CogStack data requires pre-built containers. The required Dockerfiles are included in the repo. To rebuild the containers, first use:

$ docker build -f Dockerfile.builder -t tsearle/summ_exp-base:latest .

This builds the base image. Then build the experiment image with:

$ docker build . -t tsearle/summ_exp:latest

Once the build has finished, run the container on all available GPU compute:

$ bash run_container.sh

If you just want to test the container on a CPU machine (e.g. a laptop), use:

$ bash run_container_cpu.sh

These run scripts mount two dirs, ./mimic_summ_data and ./cg_summ_data/.
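To make the mounts concrete, here is a hedged sketch of the kind of `docker run` invocation such a script might issue. It only assembles the argument list and does not start a container. The container-side paths under `/workspace` and the helper name `docker_run_command` are assumptions; the actual run_container.sh and run_container_cpu.sh may use different options.

```python
import os
from typing import List

IMAGE = "tsearle/summ_exp:latest"  # image tag from the build command above


def docker_run_command(use_gpu: bool = True, workdir: str = ".") -> List[str]:
    """Assemble a docker run command that mounts the two data directories."""
    cmd = ["docker", "run", "--rm", "-it"]
    if use_gpu:
        # Expose every available GPU to the container.
        cmd += ["--gpus", "all"]
    for name in ("mimic_summ_data", "cg_summ_data"):
        host = os.path.abspath(os.path.join(workdir, name))
        # The container-side mount point is an assumption for illustration.
        cmd += ["-v", f"{host}:/workspace/{name}"]
    cmd.append(IMAGE)
    return cmd


# Print the CPU-only variant for inspection.
print(" ".join(docker_run_command(use_gpu=False)))
```

Dropping `--gpus all` is the only difference between the GPU and CPU variants in this sketch.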

Abstractive Model Fine-Tuning with Guidance Signal

You can use the pre-built container again here. Open the guidance_experiment_cfg/<mim3 or cg>/<.json> file, and edit the ds_path to point to your huggingface
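The config edit can also be scripted. A minimal sketch, assuming ds_path is a top-level key in the JSON config; the helper name `set_ds_path` and any paths passed to it are hypothetical.

```python
import json
from pathlib import Path


def set_ds_path(cfg_file: str, new_ds_path: str) -> dict:
    """Rewrite the ds_path entry of a JSON experiment config in place."""
    path = Path(cfg_file)
    cfg = json.loads(path.read_text())
    # Assumes ds_path sits at the top level of the config.
    cfg["ds_path"] = new_ds_path
    path.write_text(json.dumps(cfg, indent=2) + "\n")
    return cfg
```

Call it with the path to one of the config files and the location of your data, for example `set_ds_path(cfg_file, "/path/to/your/data")`. All other keys in the config are left unchanged.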

Acknowledgements

Libraries from Hugging Face, Meta and NVIDIA enable this research.

MedCAT: docs, downloads and more on our clinical NER+L framework here.

Get in touch with the CogStack team here: [email protected]

Citation

@ARTICLE{Searle2022-bg,
  title = "Summarisation of Electronic Health Records with Clinical Concept Guidance",
  author = "Searle, Thomas and Ibrahim, Zina and Teo, James and Dobson, Richard",
  month = nov,
  year = 2022,
  archivePrefix = "arXiv",
  eprint = "2211.07126",
  primaryClass = "cs.CL",
  arxivid = "2211.07126"
}
