Implementation of the “Show, Attend and Tell” paper (Xu et al., 2015) using PyTorch.
The model generates natural language captions for images via an encoder–decoder architecture with an attention mechanism.
This repository demonstrates deep learning-based image captioning. Given an image, the model produces a natural language description, suitable for accessibility, image retrieval, and automatic content generation.
- Encoder: ResNet50 (default) or VGG19 backbone extracts spatial image features.
- Attention: Additive attention mechanism highlights informative image regions during captioning (a minimal sketch follows the feature list below).
- Decoder: LSTM with attention and gating generates captions step by step.
- Data Preprocessing: Converts Flickr-style `captions.txt` to `captions.json` and splits the dataset into train/test sets.
- Training Pipeline: Automated preprocessing, training, and checkpointing via `run_pipeline.py`.
- Evaluation: `eval.py` computes BLEU-1 to BLEU-4 scores; `test.py` generates captions on the test split.
- Visualization: `visualize_captions.py` displays images with predicted captions.
- Model Summary: `print_modelsummary.py` prints the architecture of the encoder and decoder.
- Results: All outputs and metrics are saved in `experiments/results/`.
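To make the attention and gating steps concrete, here is a minimal PyTorch sketch of Bahdanau-style additive attention as described in the paper. It is an illustration only, not this repository's actual module; the dimension names (`encoder_dim`, `decoder_dim`, `attention_dim`) and the gate layer `f_beta` are assumptions.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Soft additive attention over spatial image features (sketch, not the repo's exact code)."""

    def __init__(self, encoder_dim: int, decoder_dim: int, attention_dim: int):
        super().__init__()
        self.encoder_att = nn.Linear(encoder_dim, attention_dim)  # project image features
        self.decoder_att = nn.Linear(decoder_dim, attention_dim)  # project LSTM hidden state
        self.full_att = nn.Linear(attention_dim, 1)               # one score per image region
        self.f_beta = nn.Linear(decoder_dim, encoder_dim)         # gate for the context vector

    def forward(self, encoder_out, decoder_hidden):
        # encoder_out: (batch, num_regions, encoder_dim), e.g. 7x7 = 49 regions from ResNet50
        # decoder_hidden: (batch, decoder_dim), current hidden state of the LSTM decoder
        att1 = self.encoder_att(encoder_out)                        # (batch, regions, attn)
        att2 = self.decoder_att(decoder_hidden).unsqueeze(1)        # (batch, 1, attn)
        scores = self.full_att(torch.tanh(att1 + att2)).squeeze(2)  # (batch, regions)
        alpha = torch.softmax(scores, dim=1)                        # attention weights
        context = (encoder_out * alpha.unsqueeze(2)).sum(dim=1)     # (batch, encoder_dim)
        gate = torch.sigmoid(self.f_beta(decoder_hidden))           # per-channel gate
        return gate * context, alpha                                # gated context + weights


# Tiny smoke test with random tensors
if __name__ == "__main__":
    attn = AdditiveAttention(encoder_dim=2048, decoder_dim=512, attention_dim=256)
    feats = torch.randn(4, 49, 2048)    # 4 images, 49 spatial regions
    hidden = torch.randn(4, 512)
    context, alpha = attn(feats, hidden)
    print(context.shape, alpha.shape)   # torch.Size([4, 2048]) torch.Size([4, 49])
```

At each decoding step the gated context vector is combined with the previous word embedding and fed to the LSTM cell; the sigmoid gate over the hidden state is the "gating" mentioned in the decoder bullet above.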
```bash
# Clone repository
git clone <repo_url>
cd Image_captioning

# Create virtual environment
python3 -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
```

1. Preprocess data and train the model

   ```bash
   python3.12 run_pipeline.py
   ```

2. Evaluate (BLEU scores on the test set)

   ```bash
   python3.12 eval.py --checkpoint experiments/checkpoints/latest.pth --split test
   ```

3. Generate captions for the test split

   ```bash
   python3 test.py
   ```

4. Visualize model predictions

   ```bash
   python3 visualize_captions.py
   ```

5. Print a summary of the model

   ```bash
   python3 print_modelsummary.py
   ```

Final Test BLEU Scores:

| Metric | Score |
|---|---|
| BLEU-1 | 0.7941 |
| BLEU-2 | 0.6684 |
| BLEU-3 | 0.5700 |
| BLEU-4 | 0.4850 |
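For reference, corpus-level BLEU-1 through BLEU-4 scores of this kind are commonly computed with NLTK. The snippet below is a hedged sketch of that computation and may not match what `eval.py` does internally; the function name `bleu_scores` is made up for illustration.

```python
from nltk.translate.bleu_score import corpus_bleu

def bleu_scores(references, hypotheses):
    """references: list of lists of reference token lists (several captions per image).
    hypotheses: list of predicted token lists, aligned with references."""
    weights = {
        "BLEU-1": (1.0, 0.0, 0.0, 0.0),
        "BLEU-2": (0.5, 0.5, 0.0, 0.0),
        "BLEU-3": (1 / 3, 1 / 3, 1 / 3, 0.0),
        "BLEU-4": (0.25, 0.25, 0.25, 0.25),
    }
    return {name: corpus_bleu(references, hypotheses, weights=w)
            for name, w in weights.items()}
```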
- Supports Flickr8k/Flickr30k-style datasets.
- Place images and captions under `data/` (see the preprocessing script for details).
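As a rough sketch of what the preprocessing step does with a Flickr-style caption file, the snippet below groups `image,caption` lines into a JSON mapping. The paths, the header line, and the exact output structure are assumptions; defer to the repository's preprocessing script for the real format.

```python
import json
from collections import defaultdict

def captions_txt_to_json(txt_path="data/captions.txt", json_path="data/captions.json"):
    """Group Flickr-style 'image,caption' lines into {image_name: [caption, ...]} (sketch)."""
    grouped = defaultdict(list)
    with open(txt_path, encoding="utf-8") as f:
        next(f)  # skip the 'image,caption' header line, if present
        for line in f:
            image, caption = line.rstrip("\n").split(",", 1)
            grouped[image].append(caption.strip())
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump(grouped, out, indent=2)
```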
Below: Example of a test image and a generated caption.
Pull requests and issues are welcome!
This project is licensed under the MIT License — see the LICENSE file for details.
