A powerful image search engine that uses state-of-the-art AI models (SAM + CLIP + FAISS) to enable semantic search through images using text queries or image-based queries. The system can detect and match both entire images and specific objects within images.
- Text-to-Image Search: Find images using natural language descriptions
- Image-to-Image Search: Upload an image to find visually similar images
- Object Detection & Matching: Automatically detect objects in images and match specific regions
- Semantic Understanding: Goes beyond keyword matching using CLIP embeddings
- Fast Retrieval: FAISS-powered vector search for instant results
- Modern Web UI: Clean, responsive interface with drag-and-drop support
- Batch Upload: Process multiple images at once
- Visual Bounding Boxes: See exactly which objects matched your query
- Frontend: Modern HTML/CSS/JavaScript interface
- Backend: FastAPI server handling uploads and search requests
- AI Models:
  - SAM (Segment Anything Model): Class-agnostic object segmentation
  - CLIP: Vision-language model for semantic embeddings
- Vector Database: FAISS for efficient similarity search
- Storage: Local file system for images and metadata
Ingestion:

```
Image Upload → Global CLIP Embedding → SAM Segmentation →
Object Cropping → Object CLIP Embeddings → FAISS Index + Metadata
```

Search:

```
Text/Image Query → CLIP Embedding → FAISS Search →
Metadata Lookup → Result Ranking → Display Results
```
- Python 3.8 or higher
- CUDA-capable GPU (recommended) or CPU
- 8GB+ RAM recommended
- Git
1. Clone the repository

   ```bash
   git clone https://github.com/Prit44421/semantic-image-retrieval.git
   cd semantic-image-retrieval
   ```

2. Create a virtual environment

   ```powershell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   ```

3. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Download SAM model weights (if not already present)

   The SAM checkpoint `sam_vit_b_01ec64.pth` should be in the root directory. If missing, download it:

   ```bash
   # Download from https://github.com/facebookresearch/segment-anything#model-checkpoints
   # Place sam_vit_b_01ec64.pth in the project root
   ```
1. Start the server

   ```bash
   uvicorn app.main:app --reload
   ```

2. Open your browser

   Navigate to `http://localhost:8000`

3. Upload images

   Use the "Upload Images" section to add images to your database

4. Search

   Try searching with text like "a cat sitting" or upload a query image
```
Image_retrival/
├── app/
│   ├── main.py              # FastAPI application
│   └── __pycache__/
├── static/
│   ├── index.html           # Web interface
│   ├── app.js               # Frontend JavaScript
│   ├── styles.css           # Styling
│   └── uploads/             # Served result images
├── images/                  # Stored uploaded images
├── ingest.py                # Image ingestion pipeline
├── search.py                # Search pipeline
├── faiss_index.faiss        # FAISS vector index
├── metadata.json            # Image metadata mapping
├── requirements.txt         # Python dependencies
├── sam_vit_b_01ec64.pth     # SAM model checkpoint
└── README.md                # This file
```
Key configuration variables in `app/main.py`:

```python
IMAGES_DIR = "images"              # Image storage directory
INDEX_PATH = "faiss_index.faiss"   # FAISS index file
METADATA_PATH = "metadata.json"    # Metadata file
MIN_SIMILARITY = 0.2               # Minimum similarity threshold
FETCH_K = 100                      # Number of candidates to fetch
SCORE_TIE_EPS = 0.02               # Tie-breaking epsilon
```

When you upload an image:
- Global Embedding: The entire image is encoded using CLIP
- Segmentation: SAM detects all objects/regions in the image
- Object Embeddings: Each detected region is cropped and encoded with CLIP
- Indexing: All embeddings are added to the FAISS index
- Metadata: Mappings between index IDs and image paths/bounding boxes are stored
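The indexing steps above can be sketched in plain NumPy. This is a simplified stand-in for the actual `ingest.py` pipeline (not shown here): the `embed` function below is a hypothetical placeholder for a CLIP encoder, and a plain list of vectors stands in for the FAISS index. The key ideas it illustrates are real, though: one global vector plus one vector per detected region, each L2-normalized, with every row ID mapped to a metadata entry.

```python
import numpy as np

def embed(item) -> np.ndarray:
    """Hypothetical stand-in for a CLIP encoder: returns a 512-dim vector."""
    rng = np.random.default_rng(abs(hash(item)) % (2**32))
    return rng.standard_normal(512).astype(np.float32)

def l2_normalize(v: np.ndarray) -> np.ndarray:
    # Normalizing makes inner product equal to cosine similarity at search time
    return v / np.linalg.norm(v)

index_vectors = []   # rows of the "FAISS" index
metadata = {}        # index ID -> image info, mirroring metadata.json

def ingest(image_path: str, boxes: list) -> None:
    # 1. Global embedding for the whole image
    metadata[str(len(index_vectors))] = {"image_path": image_path, "type": "global", "box": None}
    index_vectors.append(l2_normalize(embed(image_path)))
    # 2. One embedding per SAM-detected region (boxes would come from SAM)
    for box in boxes:
        metadata[str(len(index_vectors))] = {"image_path": image_path, "type": "object", "box": box}
        index_vectors.append(l2_normalize(embed((image_path, tuple(box)))))

ingest("images/photo.jpg", [[120, 50, 300, 400]])
print(len(index_vectors))        # 2 vectors: one global, one object
print(metadata["1"]["type"])     # object
```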
When you search:
- Query Encoding: Your text/image is converted to a CLIP embedding
- Vector Search: FAISS finds the most similar embeddings
- Ranking: Results are ranked by similarity score
- Deduplication: Best match per image is selected
- Display: Images with bounding boxes (for object matches) are shown
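The core of the vector-search step can be shown in a few lines of NumPy (a sketch, not the project's actual `search.py`): with L2-normalized vectors, a top-k inner-product search is exactly a cosine-similarity ranking, which is what a flat inner-product FAISS index computes.

```python
import numpy as np

def top_k(index: np.ndarray, query: np.ndarray, k: int):
    """Return (index_id, score) pairs ranked by cosine similarity.

    Rows of `index` and `query` are assumed L2-normalized, so the inner
    product equals cosine similarity.
    """
    scores = index @ query
    order = np.argsort(-scores)[:k]   # highest similarity first
    return [(int(i), float(scores[i])) for i in order]

# Toy 4-dim unit vectors standing in for 512-dim CLIP embeddings
index = np.array([
    [1.0, 0.0, 0.0, 0.0],   # id 0
    [0.0, 1.0, 0.0, 0.0],   # id 1
    [0.6, 0.8, 0.0, 0.0],   # id 2
], dtype=np.float32)
query = np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32)

results = top_k(index, query, 2)
print(results)  # id 0 first (score 1.0), then id 2 (score 0.6)
```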
- YOLO limitation: YOLO only detects ~80 predefined classes (person, car, dog, etc.)
- SAM advantage: Detects ANY object, even if never seen during training (class-agnostic)
- CLIP advantage: Understands semantic relationships between text and images
- Together: Unlimited object detection + semantic understanding = powerful search
Each entry in `metadata.json` maps a FAISS index ID to image information:

```json
{
  "0": {
    "image_path": "images/photo.jpg",
    "type": "global",
    "box": null
  },
  "1": {
    "image_path": "images/photo.jpg",
    "type": "object",
    "box": [120, 50, 300, 400]
  }
}
```

- `type`: `global` (whole image) or `object` (detected region)
- `box`: Bounding box coordinates `[x1, y1, x2, y2]` for objects (`null` for global matches)
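A script can interpret these entries directly. The snippet below inlines the sample structure so it runs without the file being present; reading the real file with `json.load(open("metadata.json"))` works the same way.

```python
import json

# Same structure as the metadata.json sample above, inlined for the example
raw = '''{
  "0": {"image_path": "images/photo.jpg", "type": "global", "box": null},
  "1": {"image_path": "images/photo.jpg", "type": "object", "box": [120, 50, 300, 400]}
}'''

metadata = json.loads(raw)
for idx, entry in sorted(metadata.items(), key=lambda kv: int(kv[0])):
    if entry["type"] == "global":
        print(f"{idx}: whole image {entry['image_path']}")
    else:
        x1, y1, x2, y2 = entry["box"]
        print(f"{idx}: region {x1},{y1}-{x2},{y2} of {entry['image_path']}")
```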
The system automatically:
- Filters results by minimum similarity threshold
- Deduplicates: shows only the best match per image
- Prefers global matches over object matches when scores are similar
- Ranks by similarity score
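These post-processing rules can be sketched as follows. This is a simplified stand-in for the logic in `app/main.py` (not shown here), using the `MIN_SIMILARITY` and `SCORE_TIE_EPS` values from the configuration section:

```python
MIN_SIMILARITY = 0.2   # values from the configuration section
SCORE_TIE_EPS = 0.02

def postprocess(hits):
    """hits: dicts with image_path, type ('global'/'object'), and score."""
    best = {}
    for hit in hits:
        if hit["score"] < MIN_SIMILARITY:
            continue  # below the similarity threshold
        current = best.get(hit["image_path"])
        if current is None:
            best[hit["image_path"]] = hit
        elif hit["score"] > current["score"] + SCORE_TIE_EPS:
            best[hit["image_path"]] = hit  # clearly better match wins
        elif (abs(hit["score"] - current["score"]) <= SCORE_TIE_EPS
              and hit["type"] == "global" and current["type"] == "object"):
            best[hit["image_path"]] = hit  # prefer global on near-ties
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)

hits = [
    {"image_path": "a.jpg", "type": "object", "score": 0.50},
    {"image_path": "a.jpg", "type": "global", "score": 0.49},  # tie -> global wins
    {"image_path": "b.jpg", "type": "object", "score": 0.10},  # filtered out
    {"image_path": "c.jpg", "type": "global", "score": 0.70},
]
results = postprocess(hits)
print([(h["image_path"], h["type"]) for h in results])
# [('c.jpg', 'global'), ('a.jpg', 'global')]
```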
When an object match is returned, the UI draws a green bounding box highlighting the matched region in the image.
- Ensure SAM checkpoint file exists in the root directory
- Check CUDA availability with `torch.cuda.is_available()`
- Reduce batch size in ingestion
- Use CPU instead of GPU
- Close other applications
- Check that images have been uploaded and ingested
- Verify that `faiss_index.faiss` and `metadata.json` exist
- Lower the `MIN_SIMILARITY` threshold
- Check file permissions on the `images/` directory
- Ensure images are valid formats (JPG, PNG)
- Check available disk space
- Ingestion: ~10-30 seconds per image (GPU) / ~30-90 seconds (CPU)
- Search: <1 second for most queries
- Index Size: ~2KB per embedding (512-dim float32)
- Scalability: Tested with 1000+ images
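The per-embedding figure is a quick sanity check: 512 float32 values at 4 bytes each.

```python
# Back-of-the-envelope index sizing for 512-dim float32 CLIP embeddings
DIM = 512
BYTES_PER_FLOAT32 = 4

per_embedding = DIM * BYTES_PER_FLOAT32
print(per_embedding)  # 2048 bytes, i.e. ~2 KB per embedding

# Rough total for 10,000 embeddings (index vectors only, not metadata)
total_mib = per_embedding * 10_000 / 2**20
print(round(total_mib, 1))  # ~19.5 MiB
```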
- FastAPI: Modern Python web framework
- PyTorch: Deep learning framework
- CLIP: OpenAI's vision-language model
- SAM: Meta's Segment Anything Model
- FAISS: Facebook's similarity search library
- Pillow: Image processing
- Uvicorn: ASGI server
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- Segment Anything Model (SAM) by Meta AI
- CLIP by OpenAI
- FAISS by Facebook Research
- FastAPI by Sebastián Ramírez
Note: This project is designed for educational and research purposes. For production use, consider additional optimizations, security measures, and scalability improvements.