From 75569f289ed275b99d7db2bd1fafec8d650da66e Mon Sep 17 00:00:00 2001 From: Jacob Marks Date: Tue, 30 Apr 2024 16:44:32 -0400 Subject: [PATCH 1/3] Adds FiftyOne Art Analysis Recipe to Cookbook --- notebooks/en/_toctree.yml | 2 + .../analyzing_art_with_hf_and_fiftyone.ipynb | 851 ++++++++++++++++++ notebooks/en/index.md | 1 + 3 files changed, 854 insertions(+) create mode 100644 notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb diff --git a/notebooks/en/_toctree.yml b/notebooks/en/_toctree.yml index 72d9c2da..81725521 100644 --- a/notebooks/en/_toctree.yml +++ b/notebooks/en/_toctree.yml @@ -29,6 +29,8 @@ - title: Multimodal Recipes sections: + - local: analyzing_art_with_hf_and_fiftyone + title: Analyzing Artistic Styles with πŸ€— Transformers, πŸ€— Hub, and FiftyOne - local: faiss_with_hf_datasets_and_clip title: Embedding multimodal data for similarity search diff --git a/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb b/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb new file mode 100644 index 00000000..15655617 --- /dev/null +++ b/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb @@ -0,0 +1,851 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Analyzing Artistic Styles with πŸ€— Transformers, πŸ€— Hub, and FiftyOne" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Authored by: [Jacob Marks](https://huggingface.co/jamarks)*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Art Analysis Cover Image](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_cover_image.jpg)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Visual data like images is incredibly information-rich, but the unstructured nature of that data makes it difficult to analyze. \n", + "\n", + "In this notebook, we'll explore how to use multimodal embeddings and computed attributes to analyze artistic styles in images. We'll use the [WikiArt dataset](https://huggingface.co/datasets/huggan/wikiart) from πŸ€— Hub, which we will load into FiftyOne for data analysis and visualization. We'll dive into the data in a variety of ways:\n", + "\n", + "- **Image Similarity Search and Semantic Search**: We'll generate multimodal embeddings for the images in the dataset using a pre-trained [CLIP](https://huggingface.co/openai/clip-vit-base-patch32) model from πŸ€— Transformers and index the data to allow for unstructured searches.\n", + "\n", + "- **Clustering and Visualization**: We'll cluster the images based on their artistic style using the embeddings and visualize the results using UMAP dimensionality reduction.\n", + "\n", + "- **Uniqueness Analysis**: We'll use our embeddings to assign a uniqueness score to each image based on how similar it is to other images in the dataset.\n", + "\n", + "- **Image Quality Analysis**: We'll compute image quality metrics like brightness, contrast, and saturation for each image and see how these metrics correlate with the artistic style of the images." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Let's get started! 
πŸš€" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To run this notebook, you'll need to install the following libraries:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -U transformers huggingface_hub fiftyone umap-learn" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Note: This notebook was tested with transformers==4.40.0, huggingface_hub==0.22.2, and fiftyone==0.23.8.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's import the modules that we'll need for this notebook:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import fiftyone as fo # base library and app\n", + "import fiftyone.zoo as foz # zoo datasets and models\n", + "import fiftyone.brain as fob # ML routines\n", + "from fiftyone import ViewField as F # for defining custom views\n", + "import fiftyone.utils.huggingface as fouh # for loading datasets from Hugging Face" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll start by loading the WikiArt dataset from πŸ€— Hub into FiftyOne. This dataset can also be loaded through Hugging Face's `datasets` library, but we'll use [FiftyOne's πŸ€— Hub integration](https://docs.voxel51.com/integrations/huggingface.html#huggingface-hub) to get the data directly from the Datasets server. To make the computations fast, we'll just download the first $1,000$ samples." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "dataset = fouh.load_from_hub(\n", + " \"huggan/wikiart\", ## repo_id\n", + " format=\"parquet\", ## for Parquet format\n", + " classification_fields=[\"artist\", \"style\", \"genre\"], # columns to store as classification fields\n", + " max_samples=1000, # number of samples to load\n", + " name=\"wikiart\", # name of the dataset in FiftyOne\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To make downloads lightning-fast, install [HF Transfer](https://pypi.org/project/hf-transfer/):\n", + "\n", + "```bash\n", + "pip install hf-transfer\n", + "```\n", + "\n", + "And enable by setting the environment variable `HF_HUB_ENABLE_HF_TRANSFER`:\n", + "\n", + "```bash\n", + "import os\n", + "os.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\"\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Print out a summary of the dataset to see what it contains:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Name: wikiart\n", + "Media type: image\n", + "Num samples: 1000\n", + "Persistent: False\n", + "Tags: []\n", + "Sample fields:\n", + " id: fiftyone.core.fields.ObjectIdField\n", + " filepath: fiftyone.core.fields.StringField\n", + " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", + " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)\n", + " artist: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", + " style: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", + " genre: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", + " row_idx: fiftyone.core.fields.IntField\n" + ] + } + ], + "source": [ + "print(dataset)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Visualize the dataset in the [FiftyOne App](https://docs.voxel51.com/user_guide/app.html):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "session = fo.launch_app(dataset)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![WikiArt Dataset](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_wikiart_dataset.jpg)" + ] + }, + { 
+ "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's list out the names of the artists whose styles we'll be analyzing:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['Unknown Artist', 'albrecht-durer', 'boris-kustodiev', 'camille-pissarro', 'childe-hassam', 'claude-monet', 'edgar-degas', 'eugene-boudin', 'gustave-dore', 'ilya-repin', 'ivan-aivazovsky', 'ivan-shishkin', 'john-singer-sargent', 'marc-chagall', 'martiros-saryan', 'nicholas-roerich', 'pablo-picasso', 'paul-cezanne', 'pierre-auguste-renoir', 'pyotr-konchalovsky', 'raphael-kirchner', 'rembrandt', 'salvador-dali', 'vincent-van-gogh']\n" + ] + } + ], + "source": [ + "artists = dataset.distinct(\"artist.label\")\n", + "print(artists)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Finding Similar Artwork" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When you find a piece of art that you like, it's natural to want to find similar pieces. We can do this with vector embeddings! What's more, by using multimodal embeddings, we will unlock the ability to find paintings that closely resemble a given text query, which could be a description of a painting or even a poem.\n", + "\n", + "Let's generate multimodal embeddings for the images using a pre-trained CLIP Vision Transformer (ViT) model from πŸ€— Transformers. Running `compute_similarity()` from the [FiftyOne Brain](https://docs.voxel51.com/user_guide/brain.html) will compute these embeddings and use them to generate a similarity index on the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Computing embeddings...\n", + " 100% |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1000/1000 [5.0m elapsed, 0s remaining, 3.3 samples/s] \n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fob.compute_similarity(\n", + " dataset, \n", + " model=\"zero-shot-classification-transformer-torch\", ## type of model to load from model zoo\n", + " name_or_path=\"openai/clip-vit-base-patch32\", ## repo_id of checkpoint\n", + " embeddings=\"clip_embeddings\", ## name of the field to store embeddings\n", + " brain_key=\"clip_sim\", ## key to store similarity index info\n", + " batch_size=32, ## batch size for inference\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "Alternatively, you could load the model directly from the πŸ€— Transformers library and pass the model in directly:\n", + "\n", + "```python\n", + "from transformers import CLIPModel\n", + "model = CLIPModel.from_pretrained(\"openai/clip-vit-base-patch32\")\n", + "fob.compute_similarity(\n", + " dataset, \n", + " model=model,\n", + " embeddings=\"clip_embeddings\", ## name of the field to store embeddings\n", + " brain_key=\"clip_sim\" ## key to store similarity index info\n", + ")\n", + "```\n", + "\n", + "For a comprehensive guide to this and more, check out FiftyOne's πŸ€— Transformers integration.\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refresh the FiftyOne App, select the checkbox for an image in the sample grid, and click the photo icon to see the most similar images in the dataset. On the backend, clicking this button triggers a query to the similarity index to find the most similar images to the selected image, based on the pre-computed embeddings, and displays them in the App." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Image Similarity Search](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_image_search.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use this to see what art pieces are most similar to a given art piece. This can be useful for finding similar art pieces (to recommend to users or add to a collection) or getting inspiration for a new piece." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But there's more! Because CLIP is multimodal, we can also use it to perform semantic searches. This means we can search for images based on text queries. For example, we can search for \"pastel trees\" and see all the images in the dataset that are similar to that query. To do this, click on the search icon in the FiftyOne App and enter a text query:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Semantic Search](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_semantic_search.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Behind the scenes, the text is tokenized, embedded with CLIP's text encoder, and then used to query the similarity index to find the most similar images in the dataset. This is a powerful way to search for images based on text queries and can be useful for finding images that match a particular theme or style. And this is not limited to CLIP; you can use any multimodal model from πŸ€— Transformers that can generate embeddings for images and text!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "πŸ’‘ For efficient vector search and indexing over large datasets, FiftyOne has native integrations with open source vector databases:\n", + "\n", + "- [Milvus Integration](https://docs.voxel51.com/integrations/milvus.html)\n", + "- [LanceDB Integration](https://docs.voxel51.com/integrations/lancedb.html)\n", + "- [Qdrant Integration](https://docs.voxel51.com/integrations/qdrant.html)\n", + "- [Redis Integration](https://docs.voxel51.com/integrations/redis.html)\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Uncovering Artistic Motifs with Clustering and Visualization" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "By performing similarity and semantic searches, we can begin to interact with the data more effectively. But we can also take this a step further and add some unsupervised learning into the mix. This will help us identify artistic patterns in the WikiArt dataset, from stylistic, to topical, and even motifs that are hard to put into words. \n", + "\n", + "We will do this in two ways:\n", + "\n", + "1. **Dimensionality Reduction**: We'll use UMAP to reduce the dimensionality of the embeddings to 2D and visualize the data in a scatter plot. This will allow us to see how the images cluster based on their stlye, genre, and artist.\n", + "2. **Clustering**: We'll use K-Means clustering to cluster the images based on their embeddings and see what groups emerge." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For dimensionality reduction, we will run `compute_visualization()` from the FiftyOne Brain, passing in the previously computed embeddings. We specify `method=\"umap\"` to use UMAP for dimensionality reduction, but we could also use PCA or t-SNE:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Generating visualization...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/homebrew/Caskroom/miniforge/base/envs/fdev/lib/python3.9/site-packages/numba/cpython/hashing.py:482: UserWarning: FNV hashing is not implemented in Numba. See PEP 456 https://www.python.org/dev/peps/pep-0456/ for rationale over not using FNV. Numba will continue to work, but hashes for built in types will be computed using siphash24. This will permit e.g. 
dictionaries to continue to behave as expected, however anything relying on the value of the hash opposed to hash as a derived property is likely to not work as expected.\n", + " warnings.warn(msg)\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "UMAP( verbose=True)\n", + "Tue Apr 30 11:51:45 2024 Construct fuzzy simplicial set\n", + "Tue Apr 30 11:51:46 2024 Finding Nearest Neighbors\n", + "Tue Apr 30 11:51:47 2024 Finished Nearest Neighbor Search\n", + "Tue Apr 30 11:51:48 2024 Construct embedding\n" + ] + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "98dde3df324249df91f3336c913b409a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Epochs completed: 0%| 0/500 [00:00]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\tcompleted 0 / 500 epochs\n", + "\tcompleted 50 / 500 epochs\n", + "\tcompleted 100 / 500 epochs\n", + "\tcompleted 150 / 500 epochs\n", + "\tcompleted 200 / 500 epochs\n", + "\tcompleted 250 / 500 epochs\n", + "\tcompleted 300 / 500 epochs\n", + "\tcompleted 350 / 500 epochs\n", + "\tcompleted 400 / 500 epochs\n", + "\tcompleted 450 / 500 epochs\n", + "Tue Apr 30 11:51:49 2024 Finished embedding\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fob.compute_visualization(dataset, embeddings=\"clip_embeddings\", method=\"umap\", brain_key=\"clip_vis\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can open a panel in the FiftyOne App, where we will see one 2D point for each image in the dataset. We can color the points by any field in the dataset, such as the artist or genre, to see how strongly these attributes are captured by our image features:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![UMAP Visualization](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_visualize_embeddings.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
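The 2D coordinates are also available programmatically through the brain results object, which is useful if you want to plot or post-process them outside the App. A short sketch (coloring by style is just one choice of label field):

```python
# Load the UMAP results computed above and plot them, colored by artistic style
results = dataset.load_brain_results("clip_vis")
print(results.points.shape)  # one 2D point per sample, e.g. (1000, 2)

plot = results.visualize(labels="style.label")
plot.show()  # interactive scatterplot (requires plotly in a notebook environment)
```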
\n", + "πŸ“š For a comprehensive tutorial on dimensionality reduction techniques for visual data, check out Visualizing Data with Dimensionality Reduction Techniques.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also run clustering on the embeddings to group similar images together β€”Β perhaps the dominant features of these works of art are not captured by the existing labels, or maybe there are distinct sub-genres that we want to identify. To cluster our data, we will need to download the [FiftyOne Clustering Plugin](https://github.com/jacobmarks/clustering-plugin):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!fiftyone plugins download https://github.com/jacobmarks/clustering-plugin" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refreshing the app again, we can then access the clustering functionality via an operator in the app. Hit the backtick key to open the operator list, type \"cluster\" and select the operator from the dropdown. This will open an interactive panel where we can specify the clustering algorithm, hyperparameters, and the field to cluster on. To keep it simple, we'll use K-Means clustering with $10$ clusters:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can then visualize the clusters in the app and see how the images group together based on their embeddings:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![K-means Clustering](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_clustering.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can see that some of the clusters select for artist; others select for genre or style. Others are more abstract and may represent sub-genres or other groupings that are not immediately obvious from the data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "πŸ“š For a comprehensive tutorial on clustering for visual data, check out Clustering Images with Embeddings.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Identifying the Most Unique Works of Art" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One interesting question we can ask about our dataset is how *unique* each image is. This question is important for many applications, such as recommending similar images, detecting duplicates, or identifying outliers. In the context of art, how unique a painting is could be an important factor in determining its value.\n", + "\n", + "While there are a million ways to characterize uniqueness, our image embeddings allow us to quantitatively assign each sample a uniqueness score based on how similar it is to other samples in the dataset. Explicitly, the FiftyOne Brain's `compute_uniqueness()` function looks at the distance between each sample's embedding and its nearest neighbors, and computes a score between $0$ and $1$ based on this distance. A score of $0$ means the sample is nondescript or very similar to others, while a score of $1$ means the sample is very unique." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Computing uniqueness...\n", + "Uniqueness computation complete\n" + ] + } + ], + "source": [ + "fob.compute_uniqueness(dataset, embeddings=\"clip_embeddings\") # compute uniqueness using CLIP embeddings" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can then color by this in the embeddings panel, filter by uniqueness score, or even sort by it to see the most unique images in the dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "most_unique_view = dataset.sort_by(\"uniqueness\", reverse=True)\n", + "session.view = most_unique_view.view() # Most unique images" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Most Unique Images](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_most_unique.jpg)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "least_unique_view = dataset.sort_by(\"uniqueness\", reverse=False)\n", + "session.view = least_unique_view.view() # Least unique images" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Least Unique Images](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_least_unique.jpg)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Going a step further, we can also answer the question of which artist tends to produce the most unique works. 
We can compute the average uniqueness score for each artist across all of their works of art:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Unknown Artist: 0.7932221632002723\n", + "boris-kustodiev: 0.7480731948424676\n", + "salvador-dali: 0.7368807620414014\n", + "raphael-kirchner: 0.7315448102204755\n", + "ilya-repin: 0.7204744626806383\n", + "marc-chagall: 0.7169373812321908\n", + "rembrandt: 0.715205220292227\n", + "martiros-saryan: 0.708560775790436\n", + "childe-hassam: 0.7018343391132756\n", + "edgar-degas: 0.699912746806587\n", + "albrecht-durer: 0.6969358680800216\n", + "john-singer-sargent: 0.6839955708720844\n", + "pablo-picasso: 0.6835137858302969\n", + "pyotr-konchalovsky: 0.6780653000855895\n", + "nicholas-roerich: 0.6676504687452387\n", + "ivan-aivazovsky: 0.6484361530090199\n", + "vincent-van-gogh: 0.6472004520699081\n", + "gustave-dore: 0.6307283287457358\n", + "pierre-auguste-renoir: 0.6271467146993583\n", + "paul-cezanne: 0.6251076007168186\n", + "eugene-boudin: 0.6103397516167454\n", + "camille-pissarro: 0.6046182609119615\n", + "claude-monet: 0.5998234558947573\n", + "ivan-shishkin: 0.589796389836674\n" + ] + } + ], + "source": [ + "artist_unique_scores = {\n", + " artist: dataset.match(F(\"artist.label\") == artist).mean(\"uniqueness\")\n", + " for artist in artists\n", + "}\n", + "\n", + "sorted_artists = sorted(\n", + " artist_unique_scores, key=artist_unique_scores.get, reverse=True\n", + ")\n", + "\n", + "for artist in sorted_artists:\n", + " print(f\"{artist}: {artist_unique_scores[artist]}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It would seem that the artist with the most unique works in our dataset is Boris Kustodiev! Let's take a look at some of his works:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "kustodiev_view = dataset.match(F(\"artist.label\") == \"boris-kustodiev\")\n", + "session.view = kustodiev_view.view()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Boris Kustodiev Artwork](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_kustodiev_view.jpg)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
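You can also carve out a view containing only the most distinctive works, for instance to prioritize them when curating a collection. The 0.75 cutoff below is an arbitrary threshold for illustration, not a recommendation:

```python
# Filter to samples with an (arbitrarily chosen) high uniqueness score
very_unique_view = dataset.match(F("uniqueness") > 0.75)
print(f"{len(very_unique_view)} samples with uniqueness > 0.75")
session.view = very_unique_view
```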
\n", + "πŸ“š For a comprehensive tutorial on uniqueness for visual data, check out Exploring Image Uniqueness with FiftyOne.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Characterizing Art with Visual Qualities" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To round things out, let's go back to the basics and analyze some core qualities of the images in our dataset. We'll compute standard metrics like brightness, contrast, and saturation for each image and see how these metrics correlate with the artistic style and genre of the art pieces." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To run these analyses, we will need to download the [FiftyOne Image Quality Plugin](https://github.com/jacobmarks/image-quality-issues):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!fiftyone plugins download https://github.com/jacobmarks/image-quality-issues/" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Refresh the app and open the operators list again. This time type `compute` and select one of the image quality operators. We'll start with brightness:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Compute Brightness](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_compute_brightness.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When the operator finishes running, we will have a new field in our dataset that contains the brightness score for each image. We can then visualize this data in the app:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Brightness](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_brightness.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also color by brightness, and even see how it correlates with other fields in the dataset like style:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Style by Brightness](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_style_by_brightness.gif)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now do the same for contrast and saturation. Here are the results for saturation:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Filter by Saturation](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_filter_by_saturation.jpg)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Hopefully this illustrates how not everything boils down to applying deep neural networks to your data. Sometimes, simple metrics can be just as informative and can provide a different perspective on your data πŸ€“!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "πŸ“š For larger datasets, you may want to delegate the operations for later execution.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What's Next?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this notebook, we've explored how to use multimodal embeddings, unsupervised learning, and traditional image processing techniques to analyze artistic styles in images. We've seen how to perform image similarity and semantic searches, cluster images based on their style, analyze the uniqueness of images, and compute image quality metrics. These techniques can be applied to a wide range of visual datasets, from art collections to medical images to satellite imagery. Try [loading a different dataset from the Hugging Face Hub](https://docs.voxel51.com/integrations/huggingface.html#loading-datasets-from-the-hub) and see what insights you can uncover!\n", + "\n", + "If you want to go even further, here are some additional analyses you could try:\n", + "\n", + "- **Zero-Shot Classification**: Use a pre-trained vision-language model from πŸ€— Transformers to categorize images in the dataset by topic or subject, without any training data. Check out this [Zero-Shot Classification tutorial](https://docs.voxel51.com/tutorials/zero_shot_classification.html) for more info.\n", + "- **Image Captioning**: Use a pre-trained vision-language model from πŸ€— Transformers to generate captions for the images in the dataset. Then use this for topic modeling or cluster artwork based on embeddings for these captions. Check out FiftyOne's [Image Captioning Plugin](https://github.com/jacobmarks/fiftyone-image-captioning-plugin) for more info." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## FiftyOne Open Source Project" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[FiftyOne](https://github.com/voxel51/fiftyone/) is the leading open source toolkit for building high-quality datasets and computer vision models. With over 2M downloads, FiftyOne is trusted by developers and researchers across the globe.\n", + "\n", + "πŸ’ͺ The FiftyOne team welcomes contributions from the open source community! If you're interested in contributing to FiftyOne, check out the [contributing guide](https://github.com/voxel51/fiftyone/blob/develop/CONTRIBUTING.md)." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "fdev", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/en/index.md b/notebooks/en/index.md index b22cc465..40c3b994 100644 --- a/notebooks/en/index.md +++ b/notebooks/en/index.md @@ -7,6 +7,7 @@ applications and solving various machine learning tasks using open-source tools Check out the recently added notebooks: +- [Analyzing Artistic Styles with πŸ€— Transformers, πŸ€— Hub, and FiftyOne](analyzing_art_with_hf_and_fiftyone) - [Using LLM-as-a-judge πŸ§‘β€βš–οΈ for an automated and versatile evaluation](llm_judge) - [Create a legal preference dataset](pipeline_notus_instructions_preferences_legal) - [Suggestions for Data Annotation with SetFit in Zero-shot Text Classification](labelling_feedback_setfit) From a5c3a837c258452811413cd4c29349e8ac7d4b5a Mon Sep 17 00:00:00 2001 From: Jacob Marks Date: Tue, 7 May 2024 09:26:22 -0400 Subject: [PATCH 2/3] making requested changes --- notebooks/en/_toctree.yml | 2 +- .../analyzing_art_with_hf_and_fiftyone.ipynb | 95 ++++++++----------- 2 files changed, 42 insertions(+), 55 deletions(-) diff --git a/notebooks/en/_toctree.yml b/notebooks/en/_toctree.yml index 81725521..a75c3601 100644 --- a/notebooks/en/_toctree.yml +++ b/notebooks/en/_toctree.yml @@ -30,7 +30,7 @@ - title: Multimodal Recipes sections: - local: analyzing_art_with_hf_and_fiftyone - title: Analyzing Artistic Styles with πŸ€— Transformers, πŸ€— Hub, and FiftyOne + title: Analyzing Artistic Styles with Multimodal Embeddings - local: faiss_with_hf_datasets_and_clip title: Embedding multimodal data for similarity search diff --git a/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb b/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb index 15655617..33cedf39 100644 --- a/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb +++ b/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Analyzing Artistic Styles with πŸ€— Transformers, πŸ€— Hub, and FiftyOne" + "# Analyzing Artistic Styles with Multimodal Embeddings" ] }, { @@ -61,6 +61,24 @@ "!pip install -U transformers huggingface_hub fiftyone umap-learn" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To make downloads lightning-fast, install [HF Transfer](https://pypi.org/project/hf-transfer/):\n", + "\n", + "```bash\n", + "pip install hf-transfer\n", + "```\n", + "\n", + "And enable by setting the environment variable `HF_HUB_ENABLE_HF_TRANSFER`:\n", + "\n", + "```bash\n", + "import os\n", + "os.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\"\n", + "```" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -112,24 +130,6 @@ ")" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To make downloads lightning-fast, install [HF Transfer](https://pypi.org/project/hf-transfer/):\n", - "\n", - "```bash\n", - "pip install hf-transfer\n", - "```\n", - "\n", - "And enable by setting the environment variable `HF_HUB_ENABLE_HF_TRANSFER`:\n", - "\n", - "```bash\n", - "import os\n", - "os.environ[\"HF_HUB_ENABLE_HF_TRANSFER\"] = \"1\"\n", - "```" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -328,7 +328,7 @@ "cell_type": "markdown", 
"metadata": {}, "source": [ - "Behind the scenes, the text is tokenized, embedded with CLIP's text encoder, and then used to query the similarity index to find the most similar images in the dataset. This is a powerful way to search for images based on text queries and can be useful for finding images that match a particular theme or style. And this is not limited to CLIP; you can use any multimodal model from πŸ€— Transformers that can generate embeddings for images and text!" + "Behind the scenes, the text is tokenized, embedded with CLIP's text encoder, and then used to query the similarity index to find the most similar images in the dataset. This is a powerful way to search for images based on text queries and can be useful for finding images that match a particular theme or style. And this is not limited to CLIP; you can use any CLIP-like model from πŸ€— Transformers that can generate embeddings for images and text!" ] }, { @@ -336,12 +336,7 @@ "metadata": {}, "source": [ "
\n", - "πŸ’‘ For efficient vector search and indexing over large datasets, FiftyOne has native integrations with open source vector databases:\n", - "\n", - "- [Milvus Integration](https://docs.voxel51.com/integrations/milvus.html)\n", - "- [LanceDB Integration](https://docs.voxel51.com/integrations/lancedb.html)\n", - "- [Qdrant Integration](https://docs.voxel51.com/integrations/qdrant.html)\n", - "- [Redis Integration](https://docs.voxel51.com/integrations/redis.html)\n", + "πŸ’‘ For efficient vector search and indexing over large datasets, FiftyOne has native integrations with open source vector databases.\n", "
\n" ] }, @@ -462,15 +457,6 @@ "![UMAP Visualization](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_visualize_embeddings.gif)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "πŸ“š For a comprehensive tutorial on dimensionality reduction techniques for visual data, check out Visualizing Data with Dimensionality Reduction Techniques.\n", - "
" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -491,7 +477,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Refreshing the app again, we can then access the clustering functionality via an operator in the app. Hit the backtick key to open the operator list, type \"cluster\" and select the operator from the dropdown. This will open an interactive panel where we can specify the clustering algorithm, hyperparameters, and the field to cluster on. To keep it simple, we'll use K-Means clustering with $10$ clusters:" + "Refreshing the app again, we can then access the clustering functionality via an operator in the app. Hit the backtick key to open the operator list, type \"cluster\" and select the operator from the dropdown. This will open an interactive panel where we can specify the clustering algorithm, hyperparameters, and the field to cluster on. To keep it simple, we'll use K-Means clustering with $10$ clusters." ] }, { @@ -515,15 +501,6 @@ "We can see that some of the clusters select for artist; others select for genre or style. Others are more abstract and may represent sub-genres or other groupings that are not immediately obvious from the data." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "πŸ“š For a comprehensive tutorial on clustering for visual data, check out Clustering Images with Embeddings.\n", - "
" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -680,15 +657,6 @@ "![Boris Kustodiev Artwork](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/art_analysis_kustodiev_view.jpg)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "πŸ“š For a comprehensive tutorial on uniqueness for visual data, check out Exploring Image Uniqueness with FiftyOne.\n", - "
" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -810,6 +778,25 @@ "- **Image Captioning**: Use a pre-trained vision-language model from πŸ€— Transformers to generate captions for the images in the dataset. Then use this for topic modeling or cluster artwork based on embeddings for these captions. Check out FiftyOne's [Image Captioning Plugin](https://github.com/jacobmarks/fiftyone-image-captioning-plugin) for more info." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### πŸ“š Resources" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- [FiftyOne 🀝 πŸ€— Hub Integration](https://docs.voxel51.com/integrations/huggingface.html#huggingface-hub)\n", + "- [FiftyOne 🀝 πŸ€— Transformers Integration](https://docs.voxel51.com/integrations/huggingface.html#transformers-library)\n", + "- [FiftyOne Vector Search Integrations](https://voxel51.com/vector-search/)\n", + "- [Visualizing Data with Dimensionality Reduction Techniques](https://docs.voxel51.com/tutorials/dimension_reduction.html)\n", + "- [Clustering Images with Embeddings](https://docs.voxel51.com/tutorials/clustering.html)\n", + "- [Exploring Image Uniqueness with FiftyOne](https://docs.voxel51.com/tutorials/uniqueness.html)" + ] + }, { "cell_type": "markdown", "metadata": {}, From 8a43854690db96c4c1e6539d45f13331d1f39522 Mon Sep 17 00:00:00 2001 From: Jacob Marks Date: Tue, 7 May 2024 12:55:51 -0400 Subject: [PATCH 3/3] typo fix --- notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb b/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb index 33cedf39..3114da95 100644 --- a/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb +++ b/notebooks/en/analyzing_art_with_hf_and_fiftyone.ipynb @@ -355,7 +355,7 @@ "\n", "We will do this in two ways:\n", "\n", - "1. **Dimensionality Reduction**: We'll use UMAP to reduce the dimensionality of the embeddings to 2D and visualize the data in a scatter plot. This will allow us to see how the images cluster based on their stlye, genre, and artist.\n", + "1. **Dimensionality Reduction**: We'll use UMAP to reduce the dimensionality of the embeddings to 2D and visualize the data in a scatter plot. This will allow us to see how the images cluster based on their style, genre, and artist.\n", "2. **Clustering**: We'll use K-Means clustering to cluster the images based on their embeddings and see what groups emerge." ] },