Commit e213c20 (authored Dec 28, 2022): "Add files via upload". 1 parent 5cbde12; 72 files changed, +124389 −0 lines.

README.md (+100 lines)

# Dialog Generation with a Dynamic Human Preference Predictor
## 1 File Structure
The repository structure is as follows:

- [data/dailydialog](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/data): The DailyDialog dataset, used by the user chatbot to randomly sample an utterance at the beginning of a conversation.
- [dyme_reward](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/dyme_reward):
  - [config.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/config.py): Defines the directories of the models.
  - [dyme_wrapper.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/dyme_wrapper.py): Wraps DYME in order to 1) predict metrics and 2) compute metrics for a given utterance.
  - [environment.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/environment.py): RL environment class based on the Gym interface.
  - [external_metrics_api.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/external_metrics_api.py): Computes the metrics that require external models (Empathy, InferSent and DeepMoji).
  - [metric_helpers.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/metric_helpers.py): Helper code for calculating metrics, from the original DYME repository.
  - [metrics.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/metrics.py): Definitions of the metric calculations, adapted from the original DYME repository.
  - [rewards.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/dyme_reward/rewards.py): Definitions of the reward functions for RL.
  - [models](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/dyme_reward/models): External models and configurations (from [Empathy-Mental-Health](https://github.com/behavioral-data/Empathy-Mental-Health), [InferSent](https://github.com/facebookresearch/InferSent) and [DYME](https://github.com/florianvonunold/DYME)).
  - [torchmoji](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/dyme_reward/torchmoji): Code for the external TorchMoji model (from [TorchMoji](https://github.com/natashamjaques/neural_chat/tree/master/torchMoji/torchmoji)).
- [seq2seq_models](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/seq2seq_models):
  - [/blenderbot-400M-distill/fine-tuning.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/blenderbot-400M-distill/fine-tuning.ipynb): Code for fine-tuning Blenderbot-400M-distill.
  - [/blenderbot-400M-distill/example_run.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/blenderbot-400M-distill/example_run.ipynb): Example of human interaction with the fine-tuned Blenderbot-400M-distill.
  - [conversation.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/conversation.py): Python class for conversations with the fine-tuned Blenderbot, with an augmented tokenizer and other model-specific processing.
- [rl](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/rl):
  - [/RLmain.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/RLmain.py): Code for training the RL agent.
  - [/RLinfer.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/RLinfer.py): Multi-turn interaction with the RL agent, using fixed networks in the environment.
  - [/RLinfer_single.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/RLinfer_single.py): Interface for single-turn inference with the RL agent (fixed networks), without the environment.
  - [/model.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/model.py): Definitions of the RL models.
  - [/sac.py](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/sac.py): Implementation of the SAC algorithm.
  - [/rl.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/rl/rl.ipynb): Example notebook for installing and training the RL agent and interacting with it or running inference.
- [evaluation_and_results](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/evaluation_and_results):
  - [/blenderbot_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/blenderbot_responses.ipynb): Generates the baseline responses for evaluation.
  - [/RLPA_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/RLPA_responses.ipynb): Generates the RLPA responses for evaluation.
  - [/generated_responses.csv](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/generated_responses.csv): Summary of the generated responses.
  - [/automatic_evaluation.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/automatic_evaluation.ipynb): Code for the automatic evaluation.
  - [/automatic_evaluation_results.csv](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/automatic_evaluation_results.csv): Results of the automatic evaluation.
  - [/Human_evaluation_Krippendorff_s_Alpha_2.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/Human_evaluation_Krippendorff_s_Alpha_2.ipynb): Calculates Krippendorff's Alpha and adjacency matrices for inter-rater agreement in the human evaluation.
  - [/Human_evaluation_randomizing_samples.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/Human_evaluation_randomizing_samples.ipynb): Randomizes the samples for blind evaluation in the human evaluation.
  - [/Human_evaluation_samples_to_be_evaluated_randomly_switched_2.csv](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/Human_evaluation_samples_to_be_evaluated_randomly_switched_2.csv): The randomized samples used in the human evaluation.
- [_report](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/_report): The final report of our project.
- [_slides](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/tree/main/_slides): Our milestone and final presentation slides.
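
Since `environment.py` is described as following the Gym interface, its surface can be sketched roughly as below. This is an illustrative skeleton under assumed names, not the actual implementation: the real observation, action, and reward logic live in `dyme_reward/environment.py`, and the placeholder reward stands in for the DYME-based reward computed in `rewards.py`. The sketch deliberately avoids importing Gym so it stays self-contained.

```python
# Hypothetical sketch of a Gym-style dialog environment, mirroring the
# reset()/step() contract of dyme_reward/environment.py. All names and
# the dummy reward are illustrative assumptions, not the real code.
from typing import Any, Dict, List, Tuple


class DialogEnvSketch:
    """Minimal Gym-like environment: reset() starts a dialog, step() takes
    an agent utterance (the action) and returns (obs, reward, done, info)."""

    def __init__(self, max_turns: int = 5) -> None:
        self.max_turns = max_turns
        self.history: List[str] = []

    def reset(self) -> List[str]:
        # In the real environment, the opening utterance is randomly
        # sampled from DailyDialog; here we use a fixed placeholder.
        self.history = ["Hello, how are you today?"]
        return self.history

    def step(self, action: str) -> Tuple[List[str], float, bool, Dict[str, Any]]:
        self.history.append(action)
        # Placeholder reward; the real one is derived from DYME's
        # predicted vs. measured dialog metrics (see rewards.py).
        reward = 0.0
        done = len(self.history) >= 2 * self.max_turns
        return self.history, reward, done, {}
```

Calling `reset()` once and then `step()` per agent turn mirrors the standard Gym training loop.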
## 2 Installation <a name="sec2"></a>

Run the following commands to install the requirements. Attention: because the Conversational Sentence Encoder relies on outdated libraries, you may want to install our frozen requirements with `--no-dependencies`:

```commandline
conda create -n dialogueGeneration python=3.7.13
conda activate dialogueGeneration
pip install -r requirements.txt --no-dependencies
```

Download the models used for the reward calculation; the `dyme_models` directory has to be placed in the root of the project.
The download link (valid until 08.09.2022):

```
https://syncandshare.lrz.de/getlink/fi7H1aJhZwK9Zn2Qh3Gss9LT/dyme_models
```

## 3 Reproduce

You can run the code either on Colab or locally. To run locally, first install the dependencies as described in [2 Installation](#sec2).

1. Fine-tune Blenderbot: run [fine-tuning.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/seq2seq_models/blenderbot-400M-distill/fine-tuning.ipynb)
2. Train the policy network:

```commandline
python rl/RLmain.py
```

Attention: the script loads saved models from `./savedmodels`; if no saved model exists, it trains one from scratch. When training from a saved checkpoint, make sure you have enough memory to load all models and data, including the large ones used by DYME. If using Colab, Pro+ is recommended.

Default argument settings:

- total-timesteps: 1,000,000
- batch-size: 256
- learning-starts: 5e3
- autosaving-per: 100
- IO: False (set to True to enable interaction with a human through IO)

All other arguments can be found in rl/RLmain.py.
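
As a sketch of how these defaults could be declared with `argparse` (the flag names are taken from the list above, but the parser in `rl/RLmain.py` is the authoritative source and may differ):

```python
# Illustrative argparse sketch of the defaults listed above; see
# rl/RLmain.py for the actual, complete argument definitions.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train the RL agent (sketch)")
    parser.add_argument("--total-timesteps", type=int, default=1_000_000)
    parser.add_argument("--batch-size", type=int, default=256)
    parser.add_argument("--learning-starts", type=float, default=5e3)
    parser.add_argument("--autosaving-per", type=int, default=100)
    # Boolean flag: absent -> False, present -> True
    parser.add_argument("--IO", action="store_true",
                        help="enable interaction with a human through IO")
    return parser
```

For example, `python rl/RLmain.py --batch-size 128 --IO` would override the batch size and enable interactive IO, assuming these flags match the real parser.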

3. Automatic evaluation:
- Generate sample responses: run [blenderbot_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/blenderbot_responses.ipynb) and [RLPA_responses.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/RLPA_responses.ipynb)
- Run [automatic_evaluation.ipynb](https://gitlab.lrz.de/virtual-dietary-advisor/nlp-lab-ss22/dialog-generation-with-a-dynamic-human-preference-predictor/-/blob/main/evaluation_and_results/automatic_evaluation.ipynb)

## 4 Usage

We have uploaded the following models and dataset to the Hugging Face Hub:

- [fine-tuned Blenderbot-400M-distill](https://huggingface.co/Adapting/dialogue_agent_nlplab2022)
- [sentiment classifier for augmenting the EmpatheticDialogues dataset](https://huggingface.co/Adapting/comfort_congratulations_neutral-classifier)
- [augmented EmpatheticDialogues dataset](https://huggingface.co/datasets/Adapting/empathetic_dialogues_with_special_tokens)
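
As an example, the fine-tuned Blenderbot can be loaded directly from the Hub with the `transformers` library. This is a minimal sketch: the helper name and generation settings are illustrative, and `seq2seq_models/conversation.py` in this repository handles the full tokenizer augmentation and model-specific processing.

```python
# Minimal sketch: load the fine-tuned Blenderbot from the Hugging Face Hub
# and generate one reply. Helper name and generation settings are
# illustrative; conversation.py implements the project's real pipeline.
MODEL_ID = "Adapting/dialogue_agent_nlplab2022"


def generate_reply(utterance: str, max_new_tokens: int = 60) -> str:
    # Local import so the sketch is readable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(utterance, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate_reply("Hello, how was your day?")` would download the model on first use and return a single generated response.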

For inference / human interaction with RLPA-ODDS:

**Attention**: Make sure the trained model is in `./savedmodels`.

- Multiple turns of interaction with the fixed networks in the environment:

```commandline
python rl/RLinfer.py
```

- Inference API without the environment:

```commandline
python rl/RLinfer_single.py
```
