Commit 106bcbb

committed: added documentation, explanations
1 parent 6778d08 · commit 106bcbb

File tree: 1 file changed (+41 -12 lines)


docs/47_vision/40_vlms_guessing_segmentation_alg.ipynb

Lines changed: 41 additions & 12 deletions
@@ -5,7 +5,9 @@
    "id": "665e6753-9c9c-4a16-98da-68ac9b783bd4",
    "metadata": {},
    "source": [
-    "# VLM prompt engineering\n"
+    "# VLMs guessing image segmentation strategies\n",
+    "\n",
+    "In this notebook we present images to VLMs and ask them which algorithm to use for segmenting the image. One could expect that, depending on the image, the VLM suggests different strategies. In a second example, we demonstrate how a list of rules can be used to guide the VLM's suggestions."
    ]
   },
   {
@@ -116,7 +118,7 @@
    "id": "5e55fea8-31ae-420f-8056-b41c815145d8",
    "metadata": {},
    "source": [
-    "This is the example image we will be using."
+    "These are the example images we will be using."
    ]
   },
   {
@@ -357,12 +359,10 @@
   },
   {
    "cell_type": "markdown",
-   "id": "5be7cd84-b868-48fe-8bdb-413c6b731ff1",
-   "metadata": {
-    "tags": []
-   },
+   "id": "b191f4e7-63f6-408c-a269-3b80af47e1d4",
+   "metadata": {},
    "source": [
-    "This is the prompt we submit to the server."
+    "This helper function will send the image together with a prompt to the LLM service provider and display a word cloud of the suggested algorithms."
    ]
   },
   {
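The body of this helper is not part of the diff (only its trailing `plt.show()` appears as context in the next hunk). A minimal sketch of what such a helper could look like, assuming an OpenAI-compatible vision model and the third-party `wordcloud` package; the model name, the sampling loop, and the image encoding are illustrative assumptions, not the notebook's actual code:

```python
import base64
import io

import matplotlib.pyplot as plt
from openai import OpenAI
from PIL import Image
from wordcloud import WordCloud


def determine_algorithm(prompt, image, num_samples=10):
    """Ask a vision model several times which segmentation algorithm
    suits the given image and show the answers as a word cloud."""
    # Encode the numpy image as a base64 PNG data URL for the vision API.
    buffer = io.BytesIO()
    Image.fromarray(image).save(buffer, format="PNG")
    data_url = "data:image/png;base64," + base64.b64encode(buffer.getvalue()).decode()

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    answers = []
    for _ in range(num_samples):
        response = client.chat.completions.create(
            model="gpt-4o",  # assumption: any vision-capable model would do
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }],
        )
        answers.append(response.choices[0].message.content)

    # Summarize the (possibly varying) suggestions as a word cloud.
    wordcloud = WordCloud(background_color="white").generate(" ".join(answers))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```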
@@ -388,6 +388,16 @@
     " plt.show()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "5be7cd84-b868-48fe-8bdb-413c6b731ff1",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "This is the simple prompt we submit to the server."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 8,
@@ -490,12 +500,12 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "64e56213-be89-45c9-9960-b1be7674712d",
+   "cell_type": "markdown",
+   "id": "66f1d7ec-1cdd-4891-8033-fb0995a2a428",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "Next, we try the same strategy using a more complex prompt containing a list of rules to guide the VLM."
+   ]
   },
   {
    "cell_type": "code",
@@ -613,6 +623,25 @@
    "source": [
     "determine_algorithm(prompt, hela_cells)"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "87c99776-1601-45d2-8a3a-2a61d9d670ae",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## Exercise\n",
+    "Load a natural picture, e.g. one showing a cat, and ask the LLM how to process the image using both prompts above."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "97842c08-6e13-429f-a9dc-26564197637e",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
