diff --git a/notebooks/en/_toctree.yml b/notebooks/en/_toctree.yml index 16054153..84125fd3 100644 --- a/notebooks/en/_toctree.yml +++ b/notebooks/en/_toctree.yml @@ -66,4 +66,6 @@ - title: Enterprise Hub Cookbook sections: - local: enterprise_cookbook_overview - title: Overview \ No newline at end of file + title: Overview + - local: enterprise_cookbook_argilla + title: Data annotation with Argilla Spaces \ No newline at end of file diff --git a/notebooks/en/enterprise_cookbook_argilla.ipynb b/notebooks/en/enterprise_cookbook_argilla.ipynb new file mode 100644 index 00000000..fbe10886 --- /dev/null +++ b/notebooks/en/enterprise_cookbook_argilla.ipynb @@ -0,0 +1,1413 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "c9a872bb-d364-4939-865e-6f01b16ca1f4", + "metadata": {}, + "source": [ + "# Data Annotation with Argilla Spaces\n", + "This notebook illustrates the workflow for systematically evaluating LLM outputs and creating LLM training data. You can start by using this notebook for evaluating the zeroshot performance of your favourite LLM on your task without any fine-tuning. If you want to improve performance, you can then easily reuse this workflow to create training data.\n", + "\n", + "**Example use-case: code generation.** For this tutorial we demonstrate how to create high quality test & train data for *code generation tasks*. The same workflow can, however, be adapted to any other task that's relevant for your specific use-case. \n", + "\n", + "**In this notebook, we:**\n", + "1. Download data for the example task.\n", + "2. Prompt two LLMs to respond to these tasks. This results in \"synthetic data\" to speed up manual data creation. \n", + "3. Create an Argilla annotation interface on HF Spaces to compare and evaluate the outputs from the two LLMs.\n", + "4. Upload the example data and the zeroshot LLM responses into the Argilla annotation interface.\n", + "5. Download the annotated data.\n", + "\n", + "You can adapt this notebook to your needs, e.g. by using a different LLM and API provider for step (2) or by adapting the annotation interface in step (3)." + ] + }, + { + "cell_type": "markdown", + "id": "a482a2f5-9f0d-4117-a606-6d6bf80c4c14", + "metadata": {}, + "source": [ + "## Install required packages and connect to HF Hub" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "972076ae-2ad4-4afa-b9be-e3146ffbfe69", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "!pip install \"argilla[server]~=1.27.0\"\n", + "!pip install transformers~=4.40.0\n", + "!pip install datasets~=2.19.0\n", + "!pip install huggingface_hub~=0.23.2" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "dbc6293c-4f10-4cd3-b009-664929a3cbb9", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "08d5e2d3ab4644c9b2e31ca0649b43ec", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "VBox(children=(HTML(value='
= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.\n", + "/home/user/miniconda/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n", + " warnings.warn(\n", + "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "First prompt formatted for mistralai/Mixtral-8x7B-Instruct-v0.1:\n", + "\n", + " [INST] Write a Python function named `get_value` that takes a matrix (represented by a list of lists) and a tuple of indices, and returns the value at that index in the matrix. The function should handle index out of range errors by returning None. [/INST] \n", + "\n", + "\n", + "First prompt formatted for meta-llama/Meta-Llama-3-70B-Instruct:\n", + "\n", + " <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n", + "\n", + "Write a Python function named `get_value` that takes a matrix (represented by a list of lists) and a tuple of indices, and returns the value at that index in the matrix. The function should handle index out of range errors by returning None.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n", + "\n", + " \n", + "\n", + "\n" + ] + } + ], + "source": [ + "# apply correct chat formatting to instructions from dataset \n", + "from transformers import AutoTokenizer\n", + "\n", + "models_to_compare = [\"mistralai/Mixtral-8x7B-Instruct-v0.1\", \"meta-llama/Meta-Llama-3-70B-Instruct\"]\n", + "\n", + "def format_prompt(prompt, tokenizer):\n", + " messages = [{\"role\": \"user\", \"content\": prompt}]\n", + " messages_tokenized = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, return_tensors=\"pt\")\n", + " return messages_tokenized\n", + "\n", + "\n", + "prompts_formatted_dic = {}\n", + "for model in models_to_compare:\n", + " tokenizer = AutoTokenizer.from_pretrained(model)\n", + "\n", + " prompt_formatted = []\n", + " for instruction in instructions_lst: \n", + " prompt_formatted.append(format_prompt(instruction, tokenizer))\n", + " \n", + " prompts_formatted_dic.update({model: prompt_formatted})\n", + "\n", + "\n", + "print(f\"\\nFirst prompt formatted for {models_to_compare[0]}:\\n\\n\", prompts_formatted_dic[models_to_compare[0]][0], \"\\n\\n\")\n", + "print(f\"First prompt formatted for {models_to_compare[1]}:\\n\\n\", prompts_formatted_dic[models_to_compare[1]][0], \"\\n\\n\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "e161a9ae-680c-4daa-99fa-ca9d75d07bdc", + "metadata": {}, + "source": [ + "#### Sending the instructions to the HF Inference API\n", + "Now we can send the instructions to the APIs for both LLMs to get outputs we can evaluate. We first define some parameters for generating the responses correctly. Hugging Face's LLM APIs are powered by [Text Generation Inference (TGI)](https://huggingface.co/docs/text-generation-inference/index) containers. See the TGI OpenAPI specifications [here](https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/generate) and the explanations of different parameters in the Transformers Generation Parameters [docs](https://huggingface.co/docs/transformers/v4.30.0/main_classes/text_generation#transformers.GenerationConfig). 
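+    "\n",
+    "As a quick, illustrative sketch (not the exact code used for the full runs below), a single formatted prompt could be sent to the Serverless Inference API with the `huggingface_hub` `InferenceClient`, mirroring the generation parameters defined in the next cell:\n",
+    "\n",
+    "```python\n",
+    "from huggingface_hub import InferenceClient\n",
+    "\n",
+    "# sketch only: reuses the prompts formatted above for the first model\n",
+    "client = InferenceClient(model=models_to_compare[0])\n",
+    "\n",
+    "output = client.text_generation(\n",
+    "    prompts_formatted_dic[models_to_compare[0]][0],\n",
+    "    temperature=0.2,\n",
+    "    top_p=0.6,\n",
+    "    max_new_tokens=1024,\n",
+    "    return_full_text=False,\n",
+    "    seed=42,\n",
+    ")\n",
+    "print(output)\n",
+    "```\n",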
" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "a9dc6397-fc06-4b94-9bef-c7138d86f0e6", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "generation_params = dict(\n", + " # we use low temperatur and top_p to reduce creativity and increase likelihood of highly probable tokens\n", + " temperature=0.2,\n", + " top_p=0.60,\n", + " top_k=None,\n", + " repetition_penalty=1.0,\n", + " do_sample=True,\n", + " max_new_tokens=512*2,\n", + " return_full_text=False,\n", + " seed=42,\n", + " #details=True,\n", + " #stop=[\"<|END_OF_TURN_TOKEN|>\"],\n", + " #grammar={\"type\": \"json\"}\n", + " max_time=None, \n", + " stream=False,\n", + " use_cache=False,\n", + " wait_for_model=False,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "b2fee8a4-91e9-4cf4-8d5a-414ad0b17daa", + "metadata": {}, + "source": [ + "Now we can make a standard API request to the Serverless Inference API ([docs](https://huggingface.co/docs/api-inference/index)). Note that the Serverless Inference API is mostly for testing and is rate limited. For testing without rate limits, you can create your own API via the HF Dedicated Endpoints ([docs](https://huggingface.co/docs/inference-endpoints/index)). See also our corresponding tutorials in the [Open Source AI Cookbook](https://huggingface.co/learn/cookbook/index)." + ] + }, + { + "cell_type": "markdown", + "id": "1b072231", + "metadata": {}, + "source": [ + "> [!TIP]\n", + "> The code below will be updated once the Inference API recipe is finished." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "40e03f80-16d4-41a6-9df8-4a22d7197936", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "655a7bc50f41468fb55ab507769dcd2c", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + " 0%| | 0/3 [00:00 \n", + "\n", + "If you leave these fields blank, the following default values will be used. \n", + "\n", + "| **Item** | **Default Value** |\n", + "|---------------------------|------------------------------------------|\n", + "| OWNER_USERNAME | `owner` |\n", + "| ADMIN_USERNAME | `admin` |\n", + "| ANNOTATOR_USERNAME | `annotator` |\n", + "| PASSWORD (all) | `12345678` |\n", + "| OWNER_API_KEY | `owner.apikey` |\n", + "| ADMIN_API_KEY | `admin.apikey` |\n", + "| ARGILLA_WORKSPACE | `admin` |\n", + "\n", + "\n", + "For more details on user management with Argilla see the [docs here](https://docs.argilla.io/en/latest/getting_started/installation/configurations/user_management.html). \n", + "\n", + "Once you've created the Argilla Space, you should see the following login screen in your browser and you can login with the password and username specified above. \n", + "\n", + "\"image \n" + ] + }, + { + "cell_type": "markdown", + "id": "7ad026ef-909c-4756-a301-e9883c492407", + "metadata": {}, + "source": [ + "#### Programmatically interact with Argilla\n", + "\n", + "Before we can tailor the interface to our specific task and upload data, we need to first set up a few things.\n", + "\n", + "**Connecting this notebook to Argilla:** We can now connect this notebook to Argilla to programmatically configure the interface and upload/download data. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "8e765940-9518-49ce-ac23-d45cada12ff2", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# After starting the Argilla Space (or local docker container) you can connect to the Space with the code below.\n", + "# Here we use Argilla as the \"owner\" user\n", + "import argilla as rg\n", + "\n", + "rg.init(\n", + " # The `api_url` to the space follows the pattern \"https://username-spacename.hf.space\"\n", + " api_url=\"https://moritzlaurer-argilla-00.hf.space\", # Ff you run Argilla locally: \"http://localhost:6900\"\n", + " api_key=\"owner.apikey\", # \"owner.apikey\", \"admin.apikey\"\n", + " # To use a private HF Argilla Space, also pass your HF token\n", + " extra_headers={\"Authorization\": f\"Bearer {huggingface_hub.get_token()}\"},\n", + " workspace=\"admin\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "525c8790-bc0a-4089-b254-e064cc90f201", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "User(id=df9e99aa-861c-46da-a0f5-a381bdcd078d, username=owner, role=owner, api_key=owner.apikey, first_name=Owner, last_name=None, inserted_at=2024-05-02 11:40:30.819056, updated_at=2024-05-02 11:40:30.819056)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "user = rg.User.me()\n", + "user" + ] + }, + { + "cell_type": "markdown", + "id": "5b58564d-0711-4a9c-a7e0-5bab080f5ebe", + "metadata": {}, + "source": [ + "#### Write good annotator guidelines \n", + "Writing good guidelines for your human annotators is just as important (and difficult) as writing good training code. Good instructions should fulfill the following criteria: \n", + "- **Simple and clear**: The guidelines should be simple and clear to understand for people who do not know anything about your task yet. Always ask at least one colleague to reread the guidelines to make sure that there are no ambiguities. \n", + "- **Reproducible and explicit**: All information for doing the annotation task should be contained in the guidelines. A common mistake is to create informal interpretations of the guidelines during conversations with selected annotators. Future annotators will not have this information and might do the task differently than intended if it is not made explicit in the guidelines.\n", + "- **Short and comprehensive**: The guidelines should as short as possible, while containing all necessary information. Annotators tend not to read long guidelines properly, so try to keep them as short as possible, while remaining comprehensive.\n", + "\n", + "Note that creating annotator guidelines is an iterative process. It is good practice to do a few dozen annotations yourself and refine the guidelines based on your learnings from the data before assigning the task to others. Versioning the guidelines can also help as the task evolves over time. See further tips in this [blog post](https://argilla.io/blog/annotation-guidelines-practices/)." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "40e10eb0-f04e-4b52-be80-604f7f18615d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "annotator_guidelines = \"\"\"\\\n", + "Your task is to evaluate the responses of two LLMs to code generation tasks. \n", + "\n", + "First, you need to score each response on a scale from 0 to 7. 
You add points to your final score based on the following criteria:\n", + "- Add up to +2 points, if the code is properly commented, with inline comments and doc strings for functions.\n", + "- Add up to +2 points, if the code contains a good example for testing. \n", + "- Add up to +3 points, if the code runs and works correctly. Copy the code into an IDE and test it with at least two different inputs. Attribute one point if the code is overall correct, but has some issues. Attribute three points if the code is fully correct and robust against different scenarios. \n", + "Your resulting final score can be any value between 0 to 7. \n", + "\n", + "If both responses have a final score of <= 4, select one response and correct it manually in the text field. \n", + "The corrected response must fulfill all criteria from above. \n", + "\"\"\"\n", + "\n", + "rating_tooltip = \"\"\"\\\n", + "- Add up to +2 points, if the code is properly commented, with inline comments and doc strings for functions.\n", + "- Add up to +2 points, if the code contains a good example for testing. \n", + "- Add up to +3 points, if the code runs and works correctly. Copy the code into an IDE and test it with at least two different inputs. Attribute one point if the code works mostly correctly, but has some issues. Attribute three points if the code is fully correct and robust against different scenarios. \n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "cc2fd8b1-5025-432f-aaad-81741c97b862", + "metadata": {}, + "source": [ + "**Cumulative ratings vs. Likert scales:** Note that the guidelines above ask the annotators to do cumulative ratings by adding points for explicit criteria. An alternative approach are \"Likert scales\", where annotators are asked to rate responses on a continuous scale e.g. from 1 (very bad) to 3 (mediocre) to 5 (very good). We generally recommend cumulative ratings, because they force you and the annotators to make quality criteria explicit, while just rating a response as \"4\" (good) is ambiguous and will be interpreted differently by different annotators. " + ] + }, + { + "cell_type": "markdown", + "id": "e7e99b32-e481-46cf-88b6-942c5e05fb2d", + "metadata": {}, + "source": [ + "#### Tailor the Argilla interface to your specific task\n", + "\n", + "We can now can create our own `code-llm` task with it's own interface tailored to the specific task.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "d5e01421-7416-415c-89e9-a0d6834b1994", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
[05/29/24 12:51:18] INFO     INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully  mixins.py:271\n",
+       "                             pushed to Argilla                                                                     \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[2;36m[05/29/24 12:51:18]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m INFO:argilla.client.feedback.dataset.local.mixins:✓ Dataset succesfully \u001b]8;id=789408;file:///home/user/miniconda/lib/python3.9/site-packages/argilla/client/feedback/dataset/local/mixins.py\u001b\\\u001b[2mmixins.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=322126;file:///home/user/miniconda/lib/python3.9/site-packages/argilla/client/feedback/dataset/local/mixins.py#271\u001b\\\u001b[2m271\u001b[0m\u001b]8;;\u001b\\\n", + "\u001b[2;36m \u001b[0m pushed to Argilla \u001b[2m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
                    INFO     INFO:argilla.client.feedback.dataset.local.mixins:RemoteFeedbackDataset( mixins.py:272\n",
+       "                                id=b3a9098f-25a9-4b59-9fb6-739e885c5ab3                                            \n",
+       "                                name=code-llm                                                                      \n",
+       "                                workspace=Workspace(id=d9ec781d-8505-430c-ab87-0deac2951f00,                       \n",
+       "                             name=admin, inserted_at=2024-05-02 11:40:30.831848,                                   \n",
+       "                             updated_at=2024-05-02 11:40:30.831848)                                                \n",
+       "                                url=https://moritzlaurer-argilla-00.hf.space/dataset/b3a9098f-25a9-4b              \n",
+       "                             59-9fb6-739e885c5ab3/annotation-mode                                                  \n",
+       "                                fields=[RemoteTextField(id=UUID('22667685-d826-4281-82ec-fa2d8334aff2              \n",
+       "                             '), client=None, name='instruction', title='Instruction:',                            \n",
+       "                             required=True, type='text', use_markdown=True),                                       \n",
+       "                             RemoteTextField(id=UUID('17140de6-63c8-43a9-ad04-10606c85d70b'),                      \n",
+       "                             client=None, name='generation_1', title='Response model 1:',                          \n",
+       "                             required=True, type='text', use_markdown=True),                                       \n",
+       "                             RemoteTextField(id=UUID('f095d45a-53b3-4531-b541-29c8bbfa0156'),                      \n",
+       "                             client=None, name='generation_2', title='Response model 2:',                          \n",
+       "                             required=True, type='text', use_markdown=True)]                                       \n",
+       "                                questions=[RemoteRatingQuestion(id=UUID('7714ab9f-5297-461b-9b9c-53de              \n",
+       "                             0bc1633a'), client=None, name='score_response_1', title='Your score for               \n",
+       "                             the response of model 1:', description='- Add up to +2 points, if the                 \n",
+       "                             code is properly commented, with inline comments and doc strings for                  \n",
+       "                             functions.\\n- Add up to +2 points, if the code contains a good example                \n",
+       "                             for testing. \\n- Add up to +3 points, if the code runs and works                      \n",
+       "                             correctly. Copy the code into an IDE and test it with at least two                    \n",
+       "                             different inputs. Attribute one point if the code works mostly                        \n",
+       "                             correctly, but has some issues. Attribute three points if the code is                 \n",
+       "                             fully correct and robust against different scenarios. \\n',                            \n",
+       "                             required=True, type='rating', values=[1, 2, 3, 4, 5, 6, 7]),                          \n",
+       "                             RemoteRatingQuestion(id=UUID('43db47a7-1d83-4d2d-b5e2-3992262afb97'),                 \n",
+       "                             client=None, name='score_response_2', title='Your score for the response              \n",
+       "                             of model 2:', description='- Add up to +2 points, if the code is                      \n",
+       "                             properly commented, with inline comments and doc strings for                          \n",
+       "                             functions.\\n- Add up to +2 points, if the code contains a good example                \n",
+       "                             for testing. \\n- Add up to +3 points, if the code runs and works                      \n",
+       "                             correctly. Copy the code into an IDE and test it with at least two                    \n",
+       "                             different inputs. Attribute one point if the code works mostly                        \n",
+       "                             correctly, but has some issues. Attribute three points if the code is                 \n",
+       "                             fully correct and robust against different scenarios. \\n',                            \n",
+       "                             required=True, type='rating', values=[1, 2, 3, 4, 5, 6, 7]),                          \n",
+       "                             RemoteLabelQuestion(id=UUID('dd2d916c-f0d7-4d2b-aa27-fc1e60575d37'),                  \n",
+       "                             client=None, name='which_response_corrected', title='If both responses                \n",
+       "                             score below 4, select a response to correct:', description='Select the                \n",
+       "                             response you will correct in the text field below.', required=False,                  \n",
+       "                             type='label_selection', labels=['Response 1', 'Response 2', 'Combination              \n",
+       "                             of both', 'Neither'], visible_labels=None),                                           \n",
+       "                             RemoteTextQuestion(id=UUID('700649a7-a3f9-4a58-a3b9-3b40b2090a3d'),                   \n",
+       "                             client=None, name='correction', title='Paste the selected response below              \n",
+       "                             and correct it manually:', description='Your corrected response must                  \n",
+       "                             fulfill all criteria from the annotation guidelines.', required=False,                \n",
+       "                             type='text', use_markdown=True),                                                      \n",
+       "                             RemoteTextQuestion(id=UUID('a10e164c-655b-434a-ba8a-fdf8f4b572e6'),                   \n",
+       "                             client=None, name='comments', title='Annotator Comments',                             \n",
+       "                             description='Add any additional comments here. E.g.: edge cases, issues               \n",
+       "                             with the interface etc.', required=False, type='text',                                \n",
+       "                             use_markdown=True)]                                                                   \n",
+       "                                guidelines=Your task is to evaluate the responses of two LLMs to code              \n",
+       "                             generation tasks.                                                                     \n",
+       "                                                                                                                   \n",
+       "                                First, you need to score each response on a scale from 0 to 7. You                 \n",
+       "                             add points to your final score based on the following criteria:                       \n",
+       "                                - Add up to +2 points, if the code is properly commented, with inline              \n",
+       "                             comments and doc strings for functions.                                               \n",
+       "                                - Add up to +2 points, if the code contains a good example for                     \n",
+       "                             testing.                                                                              \n",
+       "                                - Add up to +3 points, if the code runs and works correctly. Copy the              \n",
+       "                             code into an IDE and test it with at least two different inputs.                      \n",
+       "                             Attribute one point if the code is overall correct, but has some issues.              \n",
+       "                             Attribute three points if the code is fully correct and robust against                \n",
+       "                             different scenarios.                                                                  \n",
+       "                                Your resulting final score can be any value between 0 to 7.                        \n",
+       "                                                                                                                   \n",
+       "                                If both responses have a final score of <= 4, select one response and              \n",
+       "                             correct it manually in the text field.                                                \n",
+       "                                The corrected response must fulfill all criteria from above.                       \n",
+       "                                                                                                                   \n",
+       "                                metadata_properties=[RemoteTermsMetadataProperty(id=UUID('77d28aae-77              \n",
+       "                             44-431d-ada9-817e49b55ae3'), client=<httpx.Client object at                           \n",
+       "                             0x7f10042af790>, name='annotator-groups', title='Annotator groups',                   \n",
+       "                             visible_for_annotators=True, type='terms', values=['annotator-1',                     \n",
+       "                             'annotator-2', 'annotator-3']),                                                       \n",
+       "                             RemoteTermsMetadataProperty(id=UUID('ebeed9fe-23df-4f07-b29e-501e2e42c93              \n",
+       "                             f'), client=<httpx.Client object at 0x7f10042af790>,                                  \n",
+       "                             name='source-dataset', title='Original dataset source',                               \n",
+       "                             visible_for_annotators=True, type='terms', values=None)]                              \n",
+       "                                vectors_settings=[]                                                                \n",
+       "                             )                                                                                     \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[2;36m \u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m INFO:argilla.client.feedback.dataset.local.mixins:\u001b[1;35mRemoteFeedbackDataset\u001b[0m\u001b[1m(\u001b[0m \u001b]8;id=289816;file:///home/user/miniconda/lib/python3.9/site-packages/argilla/client/feedback/dataset/local/mixins.py\u001b\\\u001b[2mmixins.py\u001b[0m\u001b]8;;\u001b\\\u001b[2m:\u001b[0m\u001b]8;id=960637;file:///home/user/miniconda/lib/python3.9/site-packages/argilla/client/feedback/dataset/local/mixins.py#272\u001b\\\u001b[2m272\u001b[0m\u001b]8;;\u001b\\\n", + "\u001b[2;36m \u001b[0m \u001b[33mid\u001b[0m=\u001b[93mb3a9098f\u001b[0m\u001b[93m-25a9-4b59-9fb6-739e885c5ab3\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mname\u001b[0m=\u001b[35mcode\u001b[0m-llm \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mworkspace\u001b[0m=\u001b[1;35mWorkspace\u001b[0m\u001b[1m(\u001b[0m\u001b[33mid\u001b[0m=\u001b[93md9ec781d\u001b[0m\u001b[93m-8505-430c-ab87-0deac2951f00\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mname\u001b[0m=\u001b[35madmin\u001b[0m, \u001b[33minserted_at\u001b[0m=\u001b[1;36m2024\u001b[0m-\u001b[1;36m05\u001b[0m-\u001b[1;36m02\u001b[0m \u001b[1;92m11:40:30\u001b[0m.\u001b[1;36m831848\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mupdated_at\u001b[0m=\u001b[1;36m2024\u001b[0m-\u001b[1;36m05\u001b[0m-\u001b[1;36m02\u001b[0m \u001b[1;92m11:40:30\u001b[0m.\u001b[1;36m831848\u001b[0m\u001b[1m)\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33murl\u001b[0m=\u001b[4;94mhttps\u001b[0m\u001b[4;94m://moritzlaurer-argilla-00.hf.space/dataset/b3a9098f-25a9-4b\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[4;94m59-9fb6-739e885c5ab3/annotation-mode\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mfields\u001b[0m=\u001b[1m[\u001b[0m\u001b[1;35mRemoteTextField\u001b[0m\u001b[1m(\u001b[0m\u001b[33mid\u001b[0m=\u001b[1;35mUUID\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'22667685-d826-4281-82ec-fa2d8334aff2\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32m'\u001b[0m\u001b[1m)\u001b[0m, \u001b[33mclient\u001b[0m=\u001b[3;35mNone\u001b[0m, \u001b[33mname\u001b[0m=\u001b[32m'instruction'\u001b[0m, \u001b[33mtitle\u001b[0m=\u001b[32m'Instruction:'\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mrequired\u001b[0m=\u001b[3;92mTrue\u001b[0m, \u001b[33mtype\u001b[0m=\u001b[32m'text'\u001b[0m, \u001b[33muse_markdown\u001b[0m=\u001b[3;92mTrue\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[1;35mRemoteTextField\u001b[0m\u001b[1m(\u001b[0m\u001b[33mid\u001b[0m=\u001b[1;35mUUID\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'17140de6-63c8-43a9-ad04-10606c85d70b'\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mclient\u001b[0m=\u001b[3;35mNone\u001b[0m, \u001b[33mname\u001b[0m=\u001b[32m'generation_1'\u001b[0m, \u001b[33mtitle\u001b[0m=\u001b[32m'Response model 1:'\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mrequired\u001b[0m=\u001b[3;92mTrue\u001b[0m, \u001b[33mtype\u001b[0m=\u001b[32m'text'\u001b[0m, \u001b[33muse_markdown\u001b[0m=\u001b[3;92mTrue\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[1;35mRemoteTextField\u001b[0m\u001b[1m(\u001b[0m\u001b[33mid\u001b[0m=\u001b[1;35mUUID\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'f095d45a-53b3-4531-b541-29c8bbfa0156'\u001b[0m\u001b[1m)\u001b[0m, 
\u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mclient\u001b[0m=\u001b[3;35mNone\u001b[0m, \u001b[33mname\u001b[0m=\u001b[32m'generation_2'\u001b[0m, \u001b[33mtitle\u001b[0m=\u001b[32m'Response model 2:'\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mrequired\u001b[0m=\u001b[3;92mTrue\u001b[0m, \u001b[33mtype\u001b[0m=\u001b[32m'text'\u001b[0m, \u001b[33muse_markdown\u001b[0m=\u001b[3;92mTrue\u001b[0m\u001b[1m)\u001b[0m\u001b[1m]\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mquestions\u001b[0m=\u001b[1m[\u001b[0m\u001b[1;35mRemoteRatingQuestion\u001b[0m\u001b[1m(\u001b[0m\u001b[33mid\u001b[0m=\u001b[1;35mUUID\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'7714ab9f-5297-461b-9b9c-53de\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32m0bc1633a'\u001b[0m\u001b[1m)\u001b[0m, \u001b[33mclient\u001b[0m=\u001b[3;35mNone\u001b[0m, \u001b[33mname\u001b[0m=\u001b[32m'score_response_1'\u001b[0m, \u001b[33mtitle\u001b[0m=\u001b[32m'Your score for \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mthe response of model 1:'\u001b[0m, \u001b[33mdescription\u001b[0m=\u001b[32m'- Add up to +2 points, if the \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mcode is properly commented, with inline comments and doc strings for \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mfunctions.\\n- Add up to +2 points, if the code contains a good example \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mfor testing. \\n- Add up to +3 points, if the code runs and works \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mcorrectly. Copy the code into an IDE and test it with at least two \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mdifferent inputs. Attribute one point if the code works mostly \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mcorrectly, but has some issues. Attribute three points if the code is \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mfully correct and robust against different scenarios. 
\\n'\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mrequired\u001b[0m=\u001b[3;92mTrue\u001b[0m, \u001b[33mtype\u001b[0m=\u001b[32m'rating'\u001b[0m, \u001b[33mvalues\u001b[0m=\u001b[1m[\u001b[0m\u001b[1;36m1\u001b[0m, \u001b[1;36m2\u001b[0m, \u001b[1;36m3\u001b[0m, \u001b[1;36m4\u001b[0m, \u001b[1;36m5\u001b[0m, \u001b[1;36m6\u001b[0m, \u001b[1;36m7\u001b[0m\u001b[1m]\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[1;35mRemoteRatingQuestion\u001b[0m\u001b[1m(\u001b[0m\u001b[33mid\u001b[0m=\u001b[1;35mUUID\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'43db47a7-1d83-4d2d-b5e2-3992262afb97'\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mclient\u001b[0m=\u001b[3;35mNone\u001b[0m, \u001b[33mname\u001b[0m=\u001b[32m'score_response_2'\u001b[0m, \u001b[33mtitle\u001b[0m=\u001b[32m'Your score for the response\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mof model 2:'\u001b[0m, \u001b[33mdescription\u001b[0m=\u001b[32m'- Add up to +2 points, if the code is \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mproperly commented, with inline comments and doc strings for \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mfunctions.\\n- Add up to +2 points, if the code contains a good example \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mfor testing. \\n- Add up to +3 points, if the code runs and works \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mcorrectly. Copy the code into an IDE and test it with at least two \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mdifferent inputs. Attribute one point if the code works mostly \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mcorrectly, but has some issues. Attribute three points if the code is \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mfully correct and robust against different scenarios. 
\\n'\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mrequired\u001b[0m=\u001b[3;92mTrue\u001b[0m, \u001b[33mtype\u001b[0m=\u001b[32m'rating'\u001b[0m, \u001b[33mvalues\u001b[0m=\u001b[1m[\u001b[0m\u001b[1;36m1\u001b[0m, \u001b[1;36m2\u001b[0m, \u001b[1;36m3\u001b[0m, \u001b[1;36m4\u001b[0m, \u001b[1;36m5\u001b[0m, \u001b[1;36m6\u001b[0m, \u001b[1;36m7\u001b[0m\u001b[1m]\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[1;35mRemoteLabelQuestion\u001b[0m\u001b[1m(\u001b[0m\u001b[33mid\u001b[0m=\u001b[1;35mUUID\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'dd2d916c-f0d7-4d2b-aa27-fc1e60575d37'\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mclient\u001b[0m=\u001b[3;35mNone\u001b[0m, \u001b[33mname\u001b[0m=\u001b[32m'which_response_corrected'\u001b[0m, \u001b[33mtitle\u001b[0m=\u001b[32m'If both responses \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mscore below 4, select a response to correct:'\u001b[0m, \u001b[33mdescription\u001b[0m=\u001b[32m'Select the \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mresponse you will correct in the text field below.'\u001b[0m, \u001b[33mrequired\u001b[0m=\u001b[3;91mFalse\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mtype\u001b[0m=\u001b[32m'label_selection'\u001b[0m, \u001b[33mlabels\u001b[0m=\u001b[1m[\u001b[0m\u001b[32m'Response 1'\u001b[0m, \u001b[32m'Response 2'\u001b[0m, \u001b[32m'Combination\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mof both'\u001b[0m, \u001b[32m'Neither'\u001b[0m\u001b[1m]\u001b[0m, \u001b[33mvisible_labels\u001b[0m=\u001b[3;35mNone\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[1;35mRemoteTextQuestion\u001b[0m\u001b[1m(\u001b[0m\u001b[33mid\u001b[0m=\u001b[1;35mUUID\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'700649a7-a3f9-4a58-a3b9-3b40b2090a3d'\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mclient\u001b[0m=\u001b[3;35mNone\u001b[0m, \u001b[33mname\u001b[0m=\u001b[32m'correction'\u001b[0m, \u001b[33mtitle\u001b[0m=\u001b[32m'Paste the selected response below\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mand correct it manually:'\u001b[0m, \u001b[33mdescription\u001b[0m=\u001b[32m'Your corrected response must \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mfulfill all criteria from the annotation guidelines.'\u001b[0m, \u001b[33mrequired\u001b[0m=\u001b[3;91mFalse\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mtype\u001b[0m=\u001b[32m'text'\u001b[0m, \u001b[33muse_markdown\u001b[0m=\u001b[3;92mTrue\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[1;35mRemoteTextQuestion\u001b[0m\u001b[1m(\u001b[0m\u001b[33mid\u001b[0m=\u001b[1;35mUUID\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'a10e164c-655b-434a-ba8a-fdf8f4b572e6'\u001b[0m\u001b[1m)\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mclient\u001b[0m=\u001b[3;35mNone\u001b[0m, \u001b[33mname\u001b[0m=\u001b[32m'comments'\u001b[0m, \u001b[33mtitle\u001b[0m=\u001b[32m'Annotator Comments'\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mdescription\u001b[0m=\u001b[32m'Add any additional comments here. 
E.g.: edge cases, issues \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mwith the interface etc.'\u001b[0m, \u001b[33mrequired\u001b[0m=\u001b[3;91mFalse\u001b[0m, \u001b[33mtype\u001b[0m=\u001b[32m'text'\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33muse_markdown\u001b[0m=\u001b[3;92mTrue\u001b[0m\u001b[1m)\u001b[0m\u001b[1m]\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mguidelines\u001b[0m=\u001b[35mYour\u001b[0m task is to evaluate the responses of two LLMs to code \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m generation tasks. \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m First, you need to score each response on a scale from \u001b[1;36m0\u001b[0m to \u001b[1;36m7\u001b[0m. You \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m add points to your final score based on the following criteria: \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m - Add up to +\u001b[1;36m2\u001b[0m points, if the code is properly commented, with inline \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m comments and doc strings for functions. \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m - Add up to +\u001b[1;36m2\u001b[0m points, if the code contains a good example for \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m testing. \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m - Add up to +\u001b[1;36m3\u001b[0m points, if the code runs and works correctly. Copy the \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m code into an IDE and test it with at least two different inputs. \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m Attribute one point if the code is overall correct, but has some issues. \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m Attribute three points if the code is fully correct and robust against \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m different scenarios. \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m Your resulting final score can be any value between \u001b[1;36m0\u001b[0m to \u001b[1;36m7\u001b[0m. \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m If both responses have a final score of \u001b[1m<\u001b[0m\u001b[39m= \u001b[0m\u001b[1;36m4\u001b[0m\u001b[39m, select one response and\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[39mcorrect it manually in the text field. \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[39m The corrected response must fulfill all criteria from above. 
\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[39m \u001b[0m\u001b[33mmetadata_properties\u001b[0m\u001b[39m=\u001b[0m\u001b[1;39m[\u001b[0m\u001b[1;35mRemoteTermsMetadataProperty\u001b[0m\u001b[1;39m(\u001b[0m\u001b[33mid\u001b[0m\u001b[39m=\u001b[0m\u001b[1;35mUUID\u001b[0m\u001b[1;39m(\u001b[0m\u001b[32m'77d28aae-77\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32m44-431d-ada9-817e49b55ae3'\u001b[0m\u001b[1;39m)\u001b[0m\u001b[39m, \u001b[0m\u001b[33mclient\u001b[0m\u001b[39m=, \u001b[0m\u001b[33mname\u001b[0m\u001b[39m=\u001b[0m\u001b[32m'annotator-groups'\u001b[0m\u001b[39m, \u001b[0m\u001b[33mtitle\u001b[0m\u001b[39m=\u001b[0m\u001b[32m'Annotator groups'\u001b[0m\u001b[39m, \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mvisible_for_annotators\u001b[0m\u001b[39m=\u001b[0m\u001b[3;92mTrue\u001b[0m\u001b[39m, \u001b[0m\u001b[33mtype\u001b[0m\u001b[39m=\u001b[0m\u001b[32m'terms'\u001b[0m\u001b[39m, \u001b[0m\u001b[33mvalues\u001b[0m\u001b[39m=\u001b[0m\u001b[1;39m[\u001b[0m\u001b[32m'annotator-1'\u001b[0m\u001b[39m, \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32m'annotator-2'\u001b[0m\u001b[39m, \u001b[0m\u001b[32m'annotator-3'\u001b[0m\u001b[1;39m]\u001b[0m\u001b[1;39m)\u001b[0m\u001b[39m, \u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[1;35mRemoteTermsMetadataProperty\u001b[0m\u001b[1;39m(\u001b[0m\u001b[33mid\u001b[0m\u001b[39m=\u001b[0m\u001b[1;35mUUID\u001b[0m\u001b[1;39m(\u001b[0m\u001b[32m'ebeed9fe-23df-4f07-b29e-501e2e42c93\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32mf'\u001b[0m\u001b[1;39m)\u001b[0m\u001b[39m, \u001b[0m\u001b[33mclient\u001b[0m\u001b[39m=\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mname\u001b[0m=\u001b[32m'source-dataset'\u001b[0m, \u001b[33mtitle\u001b[0m=\u001b[32m'Original dataset source'\u001b[0m, \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mvisible_for_annotators\u001b[0m=\u001b[3;92mTrue\u001b[0m, \u001b[33mtype\u001b[0m=\u001b[32m'terms'\u001b[0m, \u001b[33mvalues\u001b[0m=\u001b[3;35mNone\u001b[0m\u001b[1m)\u001b[0m\u001b[1m]\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[33mvectors_settings\u001b[0m=\u001b[1m[\u001b[0m\u001b[1m]\u001b[0m \u001b[2m \u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[1m)\u001b[0m \u001b[2m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "dataset_argilla_name = \"code-llm\"\n", + "reuse_existing_dataset = False # for easier iterative testing\n", + "\n", + "# Create annotator groups. 
Used for task asignment via meta data filtering.\n", + "# See explanations on annotator task assignment further below\n", + "annotators = [\"annotator-1\", \"annotator-2\", \"annotator-3\"]\n", + "\n", + "# Create the interface structure via an Argilla FeedbackDataset\n", + "dataset_argilla = rg.FeedbackDataset(\n", + " # The overall annotation guidelines, which human annotators can refer back to inside of the interface\n", + " guidelines=annotator_guidelines,\n", + " # The fields on the left side of the interface\n", + " fields=[\n", + " rg.TextField(name=\"instruction\", title=\"Instruction:\", use_markdown=True, required=True),\n", + " rg.TextField(name=\"generation_1\", title=\"Response model 1:\", use_markdown=True, required=True),\n", + " rg.TextField(name=\"generation_2\", title=\"Response model 2:\", use_markdown=True, required=True),\n", + " ],\n", + " # The different questions on the right side of the interface \n", + " # These are the questions we ask annotators about the fields on the left of the interface\n", + " # The available question types are documented here: https://docs.argilla.io/en/latest/getting_started/cheatsheet.html#configure-datasets\n", + " questions=[\n", + " rg.RatingQuestion(\n", + " name=\"score_response_1\",\n", + " title=\"Your score for the response of model 1:\",\n", + " description=rating_tooltip, #\"1 = very bad\\n2 = bad\\n3 = mediocre\\n4 = good\\n5 = very good\",\n", + " # Note: Argilla version <= 1.28 does not yet support rating values of 0. \n", + " # This will be possible starting version >= 1.29 \n", + " values=[1, 2, 3, 4, 5, 6, 7],\n", + " required=True,\n", + " ),\n", + " rg.RatingQuestion(\n", + " name=\"score_response_2\",\n", + " title=\"Your score for the response of model 2:\",\n", + " description=rating_tooltip, #\"1 = very bad\\n2 = bad\\n3 = mediocre\\n4 = good\\n5 = very good\",\n", + " values=[1, 2, 3, 4, 5, 6, 7],\n", + " required=True,\n", + " ),\n", + " rg.LabelQuestion(\n", + " name=\"which_response_corrected\",\n", + " title=\"If both responses score below 4, select a response to correct:\",\n", + " description=\"Select the response you will correct in the text field below.\",\n", + " labels = [\"Response 1\", \"Response 2\", \"Combination of both\", \"Neither\"],\n", + " required=False,\n", + " ),\n", + " rg.TextQuestion(\n", + " name=\"correction\",\n", + " title=\"Paste the selected response below and correct it manually:\",\n", + " description=\"Your corrected response must fulfill all criteria from the annotation guidelines.\",\n", + " use_markdown=True,\n", + " required=False\n", + " ),\n", + " rg.TextQuestion(\n", + " name=\"comments\",\n", + " title=\"Annotator Comments\",\n", + " description=\"Add any additional comments here. 
E.g.: edge cases, issues with the interface etc.\",\n", + " use_markdown=True,\n", + " required=False\n", + " ),\n", + " ],\n", + " metadata_properties = [\n", + " rg.TermsMetadataProperty(\n", + " name=\"annotator-groups\",\n", + " title=\"Annotator groups\",\n", + " values=annotators,\n", + " ),\n", + " rg.TermsMetadataProperty(\n", + " name=\"source-dataset\",\n", + " title=\"Original dataset source\",\n", + " ),\n", + " ],\n", + " allow_extra_metadata = False\n", + ")\n", + "\n", + "\n", + "if reuse_existing_dataset:\n", + " dataset_argilla = rg.FeedbackDataset.from_argilla(dataset_argilla_name, workspace=\"admin\")\n", + "else:\n", + " # check if dataset already exists\n", + " dataset_existing = [dataset for dataset in rg.list_datasets() if dataset.name == dataset_argilla_name]\n", + " # if it already exists, delete it\n", + " if len(dataset_existing) > 0: \n", + " rg.FeedbackDataset.from_argilla(name=dataset_argilla_name, workspace='admin').delete()\n", + " # push (updated) dataset to argilla\n", + " dataset_argilla = dataset_argilla.push_to_argilla(dataset_argilla_name, workspace=\"admin\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "cadbd691-fe8d-422e-aaa0-2b3dd1b981b6", + "metadata": {}, + "source": [ + "After running the code above, you will see the new custom `code-llm` task in Argilla (and any other tasks you might have created before, see image).\n", + "\n", + "\"image " + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "280c742f-0a1c-4176-8c51-90a858de7a72", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# The final argilla dataset\n", + "#print(dataset_argilla)" + ] + }, + { + "cell_type": "markdown", + "id": "5d15572c-bdc8-414c-8173-542510124959", + "metadata": {}, + "source": [ + "You can also read the [detailed guide](https://docs.argilla.io/en/latest/conceptual_guides/llm/llm.html) on working with LLM data in Argilla for more guidance on creating different interfaces for different tasks." + ] + }, + { + "cell_type": "markdown", + "id": "0efbf95d-5ab5-4369-a53e-5a5be353ef83", + "metadata": {}, + "source": [ + "#### Upload data to Argilla for our task\n", + "\n", + "At this point, the task is still empty. Let's upload some data into the task interface with the code below." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "0b722748-dce4-48cf-99b1-e7217ea09dc0", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "520645412f474756bf4753395f75fd59", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Output()" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n"
+      ],
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cbcdf844d9334395af6bf494b35d9b2a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Output()" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n"
+      ],
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7ca6bbf2d60b4b1ca27502790cdf4593", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Output()" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n"
+      ],
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/html": [
+       "
\n",
+       "
\n" + ], + "text/plain": [ + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import random\n", + "\n", + "# Iterate over the samples in the dataset\n", + "records = []\n", + "for example in dataset:\n", + " \n", + " # Add the records to the FeedbackDataset\n", + " record = rg.FeedbackRecord(\n", + " fields={\n", + " \"instruction\": example[\"instructions\"],\n", + " \"generation_1\": example[\"response_model_1\"],\n", + " \"generation_2\": example[\"response_model_2\"]\n", + " },\n", + " metadata={\n", + " # we randomly assign a record/task to the annotators\n", + " \"annotator-groups\": random.choice(annotators), \n", + " \"source-dataset\": \"bigcode/self-oss-instruct-sc2-exec-filter-50k\"\n", + " }\n", + " )\n", + " \n", + " # Optional: add prefilled suggestion\n", + " # you can use this to fill Questions with suggestions from an LLM-as-a-judge system\n", + " # to further speed up manual annotation\n", + " #record.suggestions = [\n", + " # {\n", + " # \"question_name\": \"score_response_1\",\n", + " # \"value\": example[\"llm_judge_rating\"],\n", + " # \"agent\": \"llama-3-70b-instruct\"\n", + " # },\n", + " #]\n", + " \n", + " try:\n", + " dataset_argilla.add_records(record, show_progress=True)\n", + " except Exception as e:\n", + " print(\"Exception:\", e)\n" + ] + }, + { + "cell_type": "markdown", + "id": "e6488c2f-d30c-46ad-af7f-15cfc8b2baee", + "metadata": {}, + "source": [ + "**The final annotation interface** will look similar to this:\n", + "\n", + "\"image " + ] + }, + { + "cell_type": "markdown", + "id": "56980744-2394-41e1-b004-89c137afdf5d", + "metadata": { + "tags": [] + }, + "source": [ + "**Assign tasks to annotators**: Argilla supports assigning tasks to multiple users/annotators. There are different ways of implementing task assignments, [documented here](https://docs.argilla.io/en/latest/practical_guides/assign_records.html). For this tutorial, we use the simplest metadata method, where everyone has access to the same full dataset and all annotations (via the `annotators` variable created above). To access the annotations assigned to them, an annotator then needs to use the `Metadata` filter in the interface to filter the data to only see records assigned to them (see image below). For larger teams and to get multiple annotations for the same record, it is better to use other task assignment methods. \n", + "\n", + "\"image" + ] + }, + { + "cell_type": "markdown", + "id": "c998cd39-9a5e-4554-b577-fac62bd3bfe6", + "metadata": {}, + "source": [ + "## Annotate" + ] + }, + { + "cell_type": "markdown", + "id": "1501558a-0c96-4b01-9a25-b0c6d6903d68", + "metadata": {}, + "source": [ + "That's it, we've created our custom data annotation interface with Argilla and we can now start annotating! \n", + "\n", + "\n", + "**Important**: If you use Argilla in a HF Space, you need to activate persistent storage so that your data is safely stored and not automatically deleted after a while. For production settings, make sure that persistent storage is activated **before** making any annotations to avoid data loss. " + ] + }, + { + "cell_type": "markdown", + "id": "a34e3e51-f68f-4980-89e6-d7fb6435109f", + "metadata": {}, + "source": [ + "## Download annotated data\n", + "After annotating, you can pull the data from Argilla and simply store and process them locally in any tabular format (see [docs here](https://docs.argilla.io/en/latest/practical_guides/export_dataset.html)). 
You can also download filtered version of the dataset ([docs](https://docs.argilla.io/en/latest/tutorials_and_integrations/tutorials/feedback/end2end_examples/filter-and-query-008.html))." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "c12858d4-c1bc-4750-bed2-b84f9ed3afe9", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
instructiongeneration_1generation_2score_response_1score_response_1-suggestionscore_response_1-suggestion-metadatascore_response_2score_response_2-suggestionscore_response_2-suggestion-metadatawhich_response_correctedwhich_response_corrected-suggestionwhich_response_corrected-suggestion-metadatacorrectioncorrection-suggestioncorrection-suggestion-metadatacommentscomments-suggestioncomments-suggestion-metadataexternal_idmetadata
0Write a Python function named `get_value` that...Here's a Python function that meets your requi...Here is a Python function that does what you d...[]NaN{'type': None, 'score': None, 'agent': None}[]NaN{'type': None, 'score': None, 'agent': None}[]None{'type': None, 'score': None, 'agent': None}[]None{'type': None, 'score': None, 'agent': None}[]None{'type': None, 'score': None, 'agent': None}None{\"annotator-groups\": \"annotator-2\", \"source-da...
1Write a Python function `check_collision` that...Here's a Python function `check_collision` tha...Here is a Python function that checks for coll...[]NaN{'type': None, 'score': None, 'agent': None}[]NaN{'type': None, 'score': None, 'agent': None}[]None{'type': None, 'score': None, 'agent': None}[]None{'type': None, 'score': None, 'agent': None}[]None{'type': None, 'score': None, 'agent': None}None{\"annotator-groups\": \"annotator-3\", \"source-da...
2Create a Python function to serialize and dese...Here's a Python function that serializes and d...Here is an example of a Python function that s...[]NaN{'type': None, 'score': None, 'agent': None}[]NaN{'type': None, 'score': None, 'agent': None}[]None{'type': None, 'score': None, 'agent': None}[]None{'type': None, 'score': None, 'agent': None}[]None{'type': None, 'score': None, 'agent': None}None{\"annotator-groups\": \"annotator-1\", \"source-da...
\n", + "
" + ], + "text/plain": [ + " instruction \\\n", + "0 Write a Python function named `get_value` that... \n", + "1 Write a Python function `check_collision` that... \n", + "2 Create a Python function to serialize and dese... \n", + "\n", + " generation_1 \\\n", + "0 Here's a Python function that meets your requi... \n", + "1 Here's a Python function `check_collision` tha... \n", + "2 Here's a Python function that serializes and d... \n", + "\n", + " generation_2 score_response_1 \\\n", + "0 Here is a Python function that does what you d... [] \n", + "1 Here is a Python function that checks for coll... [] \n", + "2 Here is an example of a Python function that s... [] \n", + "\n", + " score_response_1-suggestion score_response_1-suggestion-metadata \\\n", + "0 NaN {'type': None, 'score': None, 'agent': None} \n", + "1 NaN {'type': None, 'score': None, 'agent': None} \n", + "2 NaN {'type': None, 'score': None, 'agent': None} \n", + "\n", + " score_response_2 score_response_2-suggestion \\\n", + "0 [] NaN \n", + "1 [] NaN \n", + "2 [] NaN \n", + "\n", + " score_response_2-suggestion-metadata which_response_corrected \\\n", + "0 {'type': None, 'score': None, 'agent': None} [] \n", + "1 {'type': None, 'score': None, 'agent': None} [] \n", + "2 {'type': None, 'score': None, 'agent': None} [] \n", + "\n", + " which_response_corrected-suggestion \\\n", + "0 None \n", + "1 None \n", + "2 None \n", + "\n", + " which_response_corrected-suggestion-metadata correction \\\n", + "0 {'type': None, 'score': None, 'agent': None} [] \n", + "1 {'type': None, 'score': None, 'agent': None} [] \n", + "2 {'type': None, 'score': None, 'agent': None} [] \n", + "\n", + " correction-suggestion correction-suggestion-metadata \\\n", + "0 None {'type': None, 'score': None, 'agent': None} \n", + "1 None {'type': None, 'score': None, 'agent': None} \n", + "2 None {'type': None, 'score': None, 'agent': None} \n", + "\n", + " comments comments-suggestion comments-suggestion-metadata \\\n", + "0 [] None {'type': None, 'score': None, 'agent': None} \n", + "1 [] None {'type': None, 'score': None, 'agent': None} \n", + "2 [] None {'type': None, 'score': None, 'agent': None} \n", + "\n", + " external_id metadata \n", + "0 None {\"annotator-groups\": \"annotator-2\", \"source-da... \n", + "1 None {\"annotator-groups\": \"annotator-3\", \"source-da... \n", + "2 None {\"annotator-groups\": \"annotator-1\", \"source-da... " + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "remote_dataset = rg.FeedbackDataset.from_argilla(dataset_argilla_name, workspace=\"admin\")\n", + "\n", + "# pull the first N records from the remote dataset\n", + "local_dataset = remote_dataset.pull(max_records=100) \n", + "\n", + "# transform Argilla dataset to HF dataset\n", + "hf_dataset = local_dataset.format_as(\"datasets\")\n", + "\n", + "# This HF dataset can then be formatted, stored and processed into any tabular data format\n", + "# Display the annotated dataset:\n", + "hf_dataset.to_pandas()" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "740c62c5-d7d5-41f4-b957-8b7f1c49d4a5", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "34e25f328ca74b89ba31949271f2b276", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Creating CSV from Arrow format: 0%| | 0/1 [00:00