Commit 106bcbb

committed: added documentation, explanations
1 parent 6778d08 · commit 106bcbb

File tree: 1 file changed (+41 -12 lines)


docs/47_vision/40_vlms_guessing_segmentation_alg.ipynb

Lines changed: 41 additions & 12 deletions
@@ -5,7 +5,9 @@
    "id": "665e6753-9c9c-4a16-98da-68ac9b783bd4",
    "metadata": {},
    "source": [
-    "# VLM prompt engineering\n"
+    "# VLMs guessing image segmentation strategies\n",
+    "\n",
+    "In this notebook we present images to VLMs and ask them which algorithm to use for segmenting the image. One could expect that, depending on the image, the VLM suggests different strategies. In a second example, we demonstrate how a list of rules can be used to guide the VLM's suggestions."
    ]
   },
   {
@@ -116,7 +118,7 @@
    "id": "5e55fea8-31ae-420f-8056-b41c815145d8",
    "metadata": {},
    "source": [
-    "This is the example image we will be using."
+    "These are the example images we will be using."
    ]
   },
   {
@@ -357,12 +359,10 @@
   },
   {
    "cell_type": "markdown",
-   "id": "5be7cd84-b868-48fe-8bdb-413c6b731ff1",
-   "metadata": {
-    "tags": []
-   },
+   "id": "b191f4e7-63f6-408c-a269-3b80af47e1d4",
+   "metadata": {},
    "source": [
-    "This is the prompt we submit to the server."
+    "This helper function will send the image together with a prompt to the LLM service provider and display a word cloud of the suggested algorithms."
    ]
   },
   {
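The body of this helper is not part of the diff (only its trailing `plt.show()` appears as context in the next hunk). A minimal sketch of what such a helper could look like, assuming an OpenAI-compatible vision model and the third-party `wordcloud` package; the model name, the sampling loop, and the image encoding are illustrative assumptions, not the notebook's actual code:

```python
import base64
import io

import matplotlib.pyplot as plt
from openai import OpenAI
from PIL import Image
from wordcloud import WordCloud


def determine_algorithm(prompt, image, num_samples=10):
    """Ask a vision model several times which segmentation algorithm
    suits the given image and show the answers as a word cloud."""
    # Encode the numpy image as a base64 PNG data URL for the vision API.
    buffer = io.BytesIO()
    Image.fromarray(image).save(buffer, format="PNG")
    data_url = "data:image/png;base64," + base64.b64encode(buffer.getvalue()).decode()

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    answers = []
    for _ in range(num_samples):
        response = client.chat.completions.create(
            model="gpt-4o",  # assumption: any vision-capable model would do
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }],
        )
        answers.append(response.choices[0].message.content)

    # Summarize the (possibly varying) suggestions as a word cloud.
    wordcloud = WordCloud(background_color="white").generate(" ".join(answers))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```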
@@ -388,6 +388,16 @@
     " plt.show()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "5be7cd84-b868-48fe-8bdb-413c6b731ff1",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "This is the simple prompt we submit to the server."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 8,
@@ -490,12 +500,12 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "64e56213-be89-45c9-9960-b1be7674712d",
+   "cell_type": "markdown",
+   "id": "66f1d7ec-1cdd-4891-8033-fb0995a2a428",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "Next, we try the same strategy using a more complex prompt containing a list of rules to guide the VLM."
+   ]
   },
   {
    "cell_type": "code",
@@ -613,6 +623,25 @@
    "source": [
     "determine_algorithm(prompt, hela_cells)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "87c99776-1601-45d2-8a3a-2a61d9d670ae",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## Exercise\n",
+    "Load a natural picture, e.g. one showing a cat, and ask the LLM how to process the image using both prompts above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "97842c08-6e13-429f-a9dc-26564197637e",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
