# Dialog Generation with a Dynamic Human Preference Predictor
## 1 File Structure
The repository structure is as follows:
- [data/dailydialog](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/data): The DailyDialog dataset, used by the user chatbot to randomly sample an utterance at the beginning of the conversation.
- [dyme_reward](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/dyme_reward):
  - [config.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/config.py): Defines the model directories.
  - [dyme_wrapper.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/dyme_wrapper.py): Wraps DYME in order to 1) predict metrics and 2) compute metrics for a given utterance.
  - [environment.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/environment.py): RL environment class based on the Gym interface.
  - [external_metrics_api.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/external_metrics_api.py): Computes metrics that require external models (Empathy, InferSent and DeepMoji).
  - [metric_helpers.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/metric_helpers.py): Helper code for metric calculation, from the original DYME repository.
  - [metrics.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/metrics.py): Metric definitions, adapted from the original DYME repository.
  - [rewards.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/rewards.py): Reward function definitions for the RL.
  - [models](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/dyme_reward/models): External models and configurations (from [Empathy-Mental-Health](https://github.com/behavioral-data/Empathy-Mental-Health), [InferSent](https://github.com/facebookresearch/InferSent) and [DYME](https://github.com/florianvonunold/DYME)).
  - [torchmoji](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/dyme_reward/torchmoji): Code for the external TorchMoji model (from [TorchMoji](https://github.com/natashamjaques/neural_chat/tree/master/torchMoji/torchmoji)).
- [seq2seq_models](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/seq2seq_models):
  - [/blenderbot-400M-distill/fine-tuning.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/blenderbot-400M-distill/fine-tuning.ipynb): Code for fine-tuning Blenderbot-400M-distill.
  - [/blenderbot-400M-distill/example_run.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/blenderbot-400M-distill/example_run.ipynb): Example human interaction with the fine-tuned Blenderbot-400M-distill.
  - [conversation.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/conversation.py): Python class for conversations with the fine-tuned Blenderbot, with an augmented tokenizer and other model-specific processing.
- [rl](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/rl):
  - [/RLmain.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/RLmain.py): Code for training the RL agent.
  - [/RLinfer.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/RLinfer.py): Multiple turns of interaction with the RL agent, using fixed networks in the environment.
  - [/RLinfer_single.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/RLinfer_single.py): Interface for using the RL agent with fixed networks for a single turn of inference, without the environment.
  - [/model.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/model.py): Definitions of the RL models.
  - [/sac.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/sac.py): Implementation of the SAC algorithm.
  - [/rl.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/rl.ipynb): Example notebook for installing and training the RL agent and running interaction or inference with it.
- [evaluation_and_results](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/evaluation_and_results):
  - [/blenderbot_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/blenderbot_responses.ipynb): Generates responses of the baseline for evaluation.
  - [/RLPA_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/RLPA_responses.ipynb): Generates responses of RLPA for evaluation.
  - [/generated_responses.csv](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/generated_responses.csv): Summary of the generated responses.
  - [/automatic_evaluation.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/automatic_evaluation.ipynb): Code for the automatic evaluation.
  - [/automatic_evaluation_results.csv](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/automatic_evaluation_results.csv): Results of the automatic evaluation.
  - [/Human_evaluation_Krippendorff_s_Alpha_2.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/Human_evaluation_Krippendorff_s_Alpha_2.ipynb): Calculates Krippendorff's alpha and adjacency matrices for inter-rater agreement in the human evaluation.
  - [/Human_evaluation_randomizing_samples.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/Human_evaluation_randomizing_samples.ipynb): Randomizes samples for blind evaluation in the human evaluation.
  - [/Human_evaluation_samples_to_be_evaluated_randomly_switched_2.csv](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/Human_evaluation_samples_to_be_evaluated_randomly_switched_2.csv): The randomized samples used in the human evaluation.
- [_report](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/_report): Contains the final report of our project.
- [_slides](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/_slides): Contains our milestone and final presentation slides.
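
As noted above, `environment.py` implements the RL environment against the Gym interface. The sketch below illustrates only the shape of that interface; the observation size, action size, and placeholder reward are assumptions for illustration, while the real environment builds observations from the chat history and computes rewards with DYME.

```python
import random

class DialogEnvSketch:
    """Minimal Gym-style environment sketch (reset/step API only).

    The real environment in dyme_reward/environment.py builds observations
    from the chat history and computes DYME-based rewards; the dimensions
    and placeholder values here are illustrative assumptions.
    """

    def __init__(self, obs_dim=768, act_dim=16, max_turns=10):
        self.obs_dim = obs_dim    # e.g. an embedding of the chat history
        self.act_dim = act_dim    # e.g. a behaviour/control vector
        self.max_turns = max_turns
        self.turn = 0

    def reset(self):
        # Start a new conversation and return the initial observation.
        self.turn = 0
        return [0.0] * self.obs_dim

    def step(self, action):
        # Apply the agent's action, advance one dialog turn, and return
        # (observation, reward, done, info) as in the classic Gym API.
        assert len(action) == self.act_dim
        self.turn += 1
        obs = [random.random() for _ in range(self.obs_dim)]
        reward = 0.0              # placeholder for the DYME-based reward
        done = self.turn >= self.max_turns
        return obs, reward, done, {}
```

A rollout then follows the usual loop: call `reset()` once, then `step(action)` repeatedly until `done` is returned.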

## 2 Installation <a name="sec2"></a>
Run the following commands to install the requirements. Attention: due to the outdated libraries used by the Conversational Sentence Encoder, you may want to install our frozen requirements with `--no-dependencies`:
```commandline
conda create -n dialogueGeneration python=3.7.13
conda activate dialogueGeneration
pip install -r requirements.txt --no-dependencies
```

Download the models used for the reward calculation. The `dyme_models` directory has to be placed in the root of the project.\
The download link (valid until 08.09.2022):
```
https://syncandshare.lrz.de/getlink/fi7H1aJhZwK9Zn2Qh3Gss9LT/dyme_models
```

## 3 Reproduce
You can run the code either on Colab or locally. To run locally, first install the dependencies as described in [2 Installation](#sec2).
1. Fine-tune Blenderbot: run [fine-tuning.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/blenderbot-400M-distill/fine-tuning.ipynb)
2. Train the policy network:
   ```commandline
   python rl/RLmain.py
   ```
   Attention: the script loads saved models from `./savedmodels`; if no saved model exists, it trains one from scratch. When training from a saved checkpoint, make sure you have enough capacity to load all models and data, including the very large ones used by DYME. If using Colab, Pro+ is recommended. \
   Default settings of the arguments:

   - total-timesteps: 1,000,000
   - batch-size: 256
   - learning-starts: 5e3
   - autosaving-per: 100
   - IO: False (set to True to enable interaction with a human through standard IO) \

   All other arguments can be found in `rl/RLmain.py`.

3. Automatic evaluation:
   - Generate sample responses: run [blenderbot_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/blenderbot_responses.ipynb) and [RLPA_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/RLPA_responses.ipynb)
   - Run [automatic_evaluation.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/automatic_evaluation.ipynb)
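
Alongside the automatic evaluation, the human-evaluation notebook in `evaluation_and_results` computes Krippendorff's alpha for inter-rater agreement. For reference, a minimal self-contained sketch of the nominal-data variant (an illustration, not the notebook's actual code) looks like this:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of per-item rating lists, e.g. one inner list per
    evaluated response; items with fewer than two ratings are skipped.
    """
    # Build the coincidence matrix: each ordered pair of ratings within a
    # unit of m ratings contributes weight 1/(m-1).
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per category and overall.
    n_c = Counter()
    for (a, _b), w in coincidences.items():
        n_c[a] += w
    n = sum(n_c.values())

    # alpha = (observed agreement - expected agreement) / (n - expected),
    # which is algebraically equivalent to 1 - Do/De for nominal data.
    observed = sum(w for (a, b), w in coincidences.items() if a == b)
    expected = sum(v * (v - 1) for v in n_c.values()) / (n - 1)
    return (observed - expected) / (n - expected)
```

Perfect agreement across mixed categories yields 1.0; data with a single category is degenerate (division by zero) and needs special handling in practice.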

## 4 Usage
We have uploaded the following models and dataset to Hugging Face:
- [fine-tuned Blenderbot-400M-distill](https://huggingface.co/Adapting/dialogue_agent_nlplab2022)
- [sentiment classifier for augmenting the EmpatheticDialogues dataset](https://huggingface.co/Adapting/comfort_congratulations_neutral-classifier)
- [augmented EmpatheticDialogues dataset](https://huggingface.co/datasets/Adapting/empathetic_dialogues_with_special_tokens)

For inference/human interaction with RLPA-ODDS:\
**Attention**: make sure the trained model is in `./savedmodels`
- Multiple turns of interaction with the fixed networks in the environment:
  ```commandline
  python rl/RLinfer.py
  ```
- Inference API without the environment:
  ```commandline
  python rl/RLinfer_single.py
  ```

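As described above, `RLinfer.py` runs whole conversations inside the environment, while `RLinfer_single.py` exposes a single-turn inference interface without it. A typical way to drive multi-turn interaction on top of such a single-turn interface is a loop like the following sketch; the `infer` and `get_input` callables are hypothetical stand-ins, not functions from this repository:

```python
def chat_loop(infer, get_input, max_turns=5):
    """Drive multi-turn interaction over a single-turn inference function.

    `infer` (hypothetical name) maps the chat history so far to the agent's
    next utterance; `get_input` supplies the user's next utterance. An empty
    user input ends the conversation early.
    """
    history = []
    for _ in range(max_turns):
        user = get_input()
        if not user:
            break
        history.append(user)
        reply = infer(history)   # one call per turn to the single-turn interface
        history.append(reply)
    return history
```

Plugging the trained agent's single-turn interface in as `infer` and the built-in `input` as `get_input` would give a simple console chat.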