Please refer to the short paper we wrote regarding this work: Paper
This is the final project for an NLP course.
We combined all of the annotated data (train1, train2, test) into a single set and then randomly split it into new train and test sets.
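The combine-then-split step can be sketched as follows. This is only an illustration and not the code in preprocessing.py: the function name `combine_and_split`, the 20% test fraction, the seed, and the toy records are all assumptions, since the README does not state the split ratio or the data layout.

```python
import random


def combine_and_split(parts, test_fraction=0.2, seed=42):
    """Pool several annotated subsets and randomly split them into train and test.

    test_fraction and seed are illustrative assumptions, not values from the project.
    """
    # Pool every record from every subset into one list.
    combined = [record for part in parts for record in part]
    # Shuffle with a dedicated, seeded generator so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(combined)
    n_test = round(len(combined) * test_fraction)
    # The first n_test shuffled records become the test set, the rest the train set.
    return combined[n_test:], combined[:n_test]


# Toy placeholder records (text, label); the real i2b2 data is not public.
train1 = [("note a", 1), ("note b", 0), ("note c", 1), ("note d", 0)]
train2 = [("note e", 1), ("note f", 0), ("note g", 1)]
test = [("note h", 0), ("note i", 1), ("note j", 0)]

train_set, test_set = combine_and_split([train1, train2, test])
print(len(train_set), len(test_set))  # 8 2
```

Fixing the seed keeps the split stable between runs, which matters when later steps are compared on the same test set.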
**NOTE: The data for this project is not publicly available. For more details, see https://www.i2b2.org/NLP/Obesity/**
Preprocess the data - run preprocessing.py
Train and evaluate a classifier on the natural clinical notes - run classifier_train.py
Train a T5 model and generate new synthetic clinical notes - run generate_synthetic_clinical_notes.py
Train and evaluate a classifier on the natural and synthetic clinical notes combined - run combined_calassifier_train.py
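The four steps above could be chained from a small driver script. The sketch below is hypothetical and not part of the repository: `PIPELINE`, `build_commands`, and `run_pipeline` are illustrative names, and it assumes each script runs from the repository root without command-line arguments, which the README does not confirm.

```python
import subprocess
import sys

# Script names as listed in the steps above, in execution order.
PIPELINE = [
    "preprocessing.py",
    "classifier_train.py",
    "generate_synthetic_clinical_notes.py",
    "combined_calassifier_train.py",
]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one command (as an argument list) per script, in order."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE):
    """Run each step in order; check=True stops at the first failing script."""
    for command in build_commands(scripts):
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` would then execute the steps one after another, and a failure in an early step (for example, missing data) would prevent the later training steps from starting.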