# Dialog Generation with a Dynamic Human Preference Predictor
## 1 File Structure
The repository structure is as follows:
- [data/dailydialog](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/data): The DailyDialog dataset, used by the user chatbot to randomly sample an utterance at the beginning of the conversation.
- [dyme_reward](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/dyme_reward):
  - [config.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/config.py): Defines the model directories.
  - [dyme_wrapper.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/dyme_wrapper.py): Wraps DYME in order to 1) predict metrics and 2) compute metrics for a given utterance.
  - [environment.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/environment.py): RL environment class based on the Gym interface.
  - [external_metrics_api.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/external_metrics_api.py): Computes metrics that require external models (Empathy, InferSent and DeepMoji).
  - [metric_helpers.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/metric_helpers.py): Helper code for metric calculation, from the original DYME repository.
  - [metrics.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/metrics.py): Metric definitions, adapted from the original DYME repository.
  - [rewards.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/rewards.py): Reward function definitions for the RL.
  - [models](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/dyme_reward/models): External models and configurations (from [Empathy-Mental-Health](https://github.com/behavioral-data/Empathy-Mental-Health), [InferSent](https://github.com/facebookresearch/InferSent) and [DYME](https://github.com/florianvonunold/DYME)).
  - [torchmoji](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/dyme_reward/torchmoji): Code for the external TorchMoji model (from [TorchMoji](https://github.com/natashamjaques/neural_chat/tree/master/torchMoji/torchmoji)).
- [seq2seq_models](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/seq2seq_models):
  - [/blenderbot-400M-distill/fine-tuning.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/blenderbot-400M-distill/fine-tuning.ipynb): Code for fine-tuning Blenderbot-400M-distill.
  - [/blenderbot-400M-distill/example_run.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/blenderbot-400M-distill/example_run.ipynb): Example human interaction with the fine-tuned Blenderbot-400M-distill.
  - [conversation.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/conversation.py): Python class for conversations with the fine-tuned Blenderbot, with an augmented tokenizer and other model-specific processing.
- [rl](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/rl):
  - [/RLmain.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/RLmain.py): Code for training the RL agent.
  - [/RLinfer.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/RLinfer.py): Multiple turns of interaction with the RL agent, using fixed networks in the environment.
  - [/RLinfer_single.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/RLinfer_single.py): Interface for using the RL agent with fixed networks for a single turn of inference, without the environment.
  - [/model.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/model.py): Definitions of the RL models.
  - [/sac.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/sac.py): Implementation of the SAC algorithm.
  - [/rl.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/rl.ipynb): Example notebook for installing and training the RL agent and running interaction or inference with it.
- [evaluation_and_results](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/evaluation_and_results):
  - [/blenderbot_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/blenderbot_responses.ipynb): Generates responses of the baseline for evaluation.
  - [/RLPA_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/RLPA_responses.ipynb): Generates responses of RLPA for evaluation.
  - [/generated_responses.csv](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/generated_responses.csv): Summary of the generated responses.
  - [/automatic_evaluation.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/automatic_evaluation.ipynb): Code for the automatic evaluation.
  - [/automatic_evaluation_results.csv](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/automatic_evaluation_results.csv): Results of the automatic evaluation.
  - [/Human_evaluation_Krippendorff_s_Alpha_2.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/Human_evaluation_Krippendorff_s_Alpha_2.ipynb): Calculates Krippendorff's alpha and adjacency matrices for inter-rater agreement in the human evaluation.
  - [/Human_evaluation_randomizing_samples.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/Human_evaluation_randomizing_samples.ipynb): Randomizes samples for blind evaluation in the human evaluation.
  - [/Human_evaluation_samples_to_be_evaluated_randomly_switched_2.csv](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/Human_evaluation_samples_to_be_evaluated_randomly_switched_2.csv): The randomized samples used in the human evaluation.
- [_report](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/_report): Contains the final report of our project.
- [_slides](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/_slides): Contains our milestone and final presentation slides.
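
As noted above, `environment.py` implements the RL environment against the Gym interface. The sketch below illustrates only the shape of that interface; the observation size, action size, and placeholder reward are assumptions for illustration, while the real environment builds observations from the chat history and computes rewards with DYME.

```python
import random

class DialogEnvSketch:
    """Minimal Gym-style environment sketch (reset/step API only).

    The real environment in dyme_reward/environment.py builds observations
    from the chat history and computes DYME-based rewards; the dimensions
    and placeholder values here are illustrative assumptions.
    """

    def __init__(self, obs_dim=768, act_dim=16, max_turns=10):
        self.obs_dim = obs_dim    # e.g. an embedding of the chat history
        self.act_dim = act_dim    # e.g. a behaviour/control vector
        self.max_turns = max_turns
        self.turn = 0

    def reset(self):
        # Start a new conversation and return the initial observation.
        self.turn = 0
        return [0.0] * self.obs_dim

    def step(self, action):
        # Apply the agent's action, advance one dialog turn, and return
        # (observation, reward, done, info) as in the classic Gym API.
        assert len(action) == self.act_dim
        self.turn += 1
        obs = [random.random() for _ in range(self.obs_dim)]
        reward = 0.0              # placeholder for the DYME-based reward
        done = self.turn >= self.max_turns
        return obs, reward, done, {}
```

A rollout then follows the usual loop: call `reset()` once, then `step(action)` repeatedly until `done` is returned.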

## 2 Installation <a name="sec2"></a>
Run the following commands to install the requirements. Attention: due to the outdated libraries used by the Conversational Sentence Encoder, you may want to install our frozen requirements with `--no-dependencies`:
```commandline
conda create -n dialogueGeneration python=3.7.13
conda activate dialogueGeneration
pip install -r requirements.txt --no-dependencies
```

Download the models used for the reward calculation. The `dyme_models` directory has to be placed in the root of the project.\
The download link (valid until 08.09.2022):
```
https://syncandshare.lrz.de/getlink/fi7H1aJhZwK9Zn2Qh3Gss9LT/dyme_models
```

## 3 Reproduce
You can run the code either on Colab or locally. To run locally, first install the dependencies as described in [2 Installation](#sec2).
1. Fine-tune Blenderbot: run [fine-tuning.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/blenderbot-400M-distill/fine-tuning.ipynb)
2. Train the policy network:
   ```commandline
   python rl/RLmain.py
   ```
   Attention: the script loads saved models from `./savedmodels`; if no saved model exists, it trains one from scratch. When training from a saved checkpoint, make sure you have enough capacity to load all models and data, including the very large ones used by DYME. If using Colab, Pro+ is recommended. \
   Default settings of the arguments:

   - total-timesteps: 1,000,000
   - batch-size: 256
   - learning-starts: 5e3
   - autosaving-per: 100
   - IO: False (set to True to enable interaction with a human through standard IO) \

   All other arguments can be found in `rl/RLmain.py`.

3. Automatic evaluation:
   - Generate sample responses: run [blenderbot_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/blenderbot_responses.ipynb) and [RLPA_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/RLPA_responses.ipynb)
   - Run [automatic_evaluation.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/automatic_evaluation.ipynb)
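
Alongside the automatic evaluation, the human-evaluation notebook in `evaluation_and_results` computes Krippendorff's alpha for inter-rater agreement. For reference, a minimal self-contained sketch of the nominal-data variant (an illustration, not the notebook's actual code) looks like this:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of per-item rating lists, e.g. one inner list per
    evaluated response; items with fewer than two ratings are skipped.
    """
    # Build the coincidence matrix: each ordered pair of ratings within a
    # unit of m ratings contributes weight 1/(m-1).
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per category and overall.
    n_c = Counter()
    for (a, _b), w in coincidences.items():
        n_c[a] += w
    n = sum(n_c.values())

    # alpha = (observed agreement - expected agreement) / (n - expected),
    # which is algebraically equivalent to 1 - Do/De for nominal data.
    observed = sum(w for (a, b), w in coincidences.items() if a == b)
    expected = sum(v * (v - 1) for v in n_c.values()) / (n - 1)
    return (observed - expected) / (n - expected)
```

Perfect agreement across mixed categories yields 1.0; data with a single category is degenerate (division by zero) and needs special handling in practice.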

## 4 Usage
We have uploaded the following models and dataset to Hugging Face:
- [fine-tuned Blenderbot-400M-distill](https://huggingface.co/Adapting/dialogue_agent_nlplab2022)
- [sentiment classifier for augmenting the EmpatheticDialogues dataset](https://huggingface.co/Adapting/comfort_congratulations_neutral-classifier)
- [augmented EmpatheticDialogues dataset](https://huggingface.co/datasets/Adapting/empathetic_dialogues_with_special_tokens)

For inference/human interaction with RLPA-ODDS:\
**Attention**: make sure the trained model is in `./savedmodels`
- Multiple turns of interaction with the fixed networks in the environment:
  ```commandline
  python rl/RLinfer.py
  ```
- Inference API without the environment:
  ```commandline
  python rl/RLinfer_single.py
  ```

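As described above, `RLinfer.py` runs whole conversations inside the environment, while `RLinfer_single.py` exposes a single-turn inference interface without it. A typical way to drive multi-turn interaction on top of such a single-turn interface is a loop like the following sketch; the `infer` and `get_input` callables are hypothetical stand-ins, not functions from this repository:

```python
def chat_loop(infer, get_input, max_turns=5):
    """Drive multi-turn interaction over a single-turn inference function.

    `infer` (hypothetical name) maps the chat history so far to the agent's
    next utterance; `get_input` supplies the user's next utterance. An empty
    user input ends the conversation early.
    """
    history = []
    for _ in range(max_turns):
        user = get_input()
        if not user:
            break
        history.append(user)
        reply = infer(history)   # one call per turn to the single-turn interface
        history.append(reply)
    return history
```

Plugging the trained agent's single-turn interface in as `infer` and the built-in `input` as `get_input` would give a simple console chat.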