🖼️ Image Captioning with Attention (PyTorch)

Implementation of the “Show, Attend and Tell” paper (Xu et al., 2015) using PyTorch.
The model generates natural language captions for images via an encoder–decoder architecture with an attention mechanism.


📝 Overview

This repository demonstrates deep learning-based image captioning. Given an image, the model produces a natural language description, suitable for accessibility, image retrieval, and automatic content generation.


⚡ Features

  • Encoder: ResNet50 (default) or VGG19 backbone extracts spatial image features.
  • Attention: Additive attention mechanism highlights informative image regions during captioning (see the sketch after this list).
  • Decoder: LSTM with attention and gating generates captions step by step.
  • Data Preprocessing:
    • Converts Flickr-style captions.txt to captions.json.
    • Splits dataset into train/test sets.
  • Training Pipeline: Automated preprocessing, training, and checkpointing via run_pipeline.py.
  • Evaluation:
    • eval.py computes BLEU-1 to BLEU-4 scores.
    • test.py generates captions on the test split.
  • Visualization: visualize_captions.py displays images with predicted captions.
  • Model Summary: print_modelsummary.py prints the architecture of encoder/decoder.
  • Results: All outputs and metrics saved in experiments/results/.
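
The attention module referenced above can be implemented as a small additive (Bahdanau-style) scoring network over the encoder's spatial features. The sketch below is illustrative only; the class and argument names (`AdditiveAttention`, `encoder_dim`, `decoder_dim`, `attention_dim`) are assumptions, not the repo's actual API.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Additive (Bahdanau-style) attention over spatial encoder features.

    Minimal sketch; dimension and attribute names are illustrative.
    """
    def __init__(self, encoder_dim, decoder_dim, attention_dim):
        super().__init__()
        self.enc_proj = nn.Linear(encoder_dim, attention_dim)  # project encoder features
        self.dec_proj = nn.Linear(decoder_dim, attention_dim)  # project decoder hidden state
        self.score = nn.Linear(attention_dim, 1)               # scalar score per image region

    def forward(self, features, hidden):
        # features: (batch, num_regions, encoder_dim), hidden: (batch, decoder_dim)
        att = self.score(torch.tanh(self.enc_proj(features) + self.dec_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(att.squeeze(-1), dim=1)           # attention weights over regions
        context = (features * alpha.unsqueeze(-1)).sum(dim=1)   # weighted sum: (batch, encoder_dim)
        return context, alpha
```

At each decoding step the LSTM would consume the returned context vector together with the previous word embedding, as in the soft-attention variant of the paper.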

🛠 Installation

# Clone repository
git clone <repo_url>
cd Image_captioning

# Create virtual environment
python3 -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
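
After installing, a quick sanity check confirms the environment works (this assumes torch is listed in requirements.txt, as any PyTorch project requires):

```python
# Quick sanity check that PyTorch imports and detects a GPU if one is present.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```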

🚀 Usage

1. Preprocess data and train the model

python3.12 run_pipeline.py
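
As part of preprocessing, the pipeline converts the raw caption file into JSON. The sketch below is illustrative only; the file paths, the Kaggle-style `image,caption` CSV layout, and the output schema are assumptions, and the repo's actual preprocessing script may differ.

```python
import csv
import json
from collections import defaultdict

# Illustrative captions.txt -> captions.json conversion (hypothetical paths/format;
# see the repo's preprocessing script for the real logic).
def convert_captions(txt_path="data/captions.txt", json_path="data/captions.json"):
    captions = defaultdict(list)
    with open(txt_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the "image,caption" header line (Kaggle Flickr8k layout)
        for row in reader:
            image, caption = row[0], ",".join(row[1:]).strip()
            captions[image].append(caption)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(captions, f, indent=2)

if __name__ == "__main__":
    convert_captions()
```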

2. Evaluate (BLEU scores on test set)

python3.12 eval.py --checkpoint experiments/checkpoints/latest.pth --split test
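
For reference, BLEU-1 through BLEU-4 can be computed with NLTK's `corpus_bleu`, as sketched below. Whether eval.py uses NLTK or its own implementation is an assumption, and the tokenised references and hypotheses here are placeholders.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Placeholder data: one hypothesis with one reference caption, both pre-tokenised.
references = [[["a", "dog", "runs", "on", "the", "grass"]]]
hypotheses = [["a", "dog", "is", "running", "on", "grass"]]

smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))  # uniform n-gram weights for BLEU-n
    score = corpus_bleu(references, hypotheses, weights=weights, smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.4f}")
```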

3. Generate captions for the test split

python3 test.py
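
Caption generation is typically a greedy decode loop: attend over the encoder features, feed the context vector and previous word into the LSTM cell, and pick the most likely next word until an end token appears. The sketch below illustrates the idea only; the decoder attributes (`embedding`, `attention`, `lstm_cell`, `fc`, `init_hidden_state`) and the vocabulary interface (`stoi`/`itos`, `<start>`/`<end>` tokens) are assumptions about this repo's classes.

```python
import torch

# Illustrative greedy decoding loop; names of decoder/vocab members are hypothetical.
@torch.no_grad()
def greedy_caption(encoder, decoder, image, vocab, max_len=20, device="cpu"):
    features = encoder(image.unsqueeze(0).to(device))   # (1, num_regions, encoder_dim)
    h, c = decoder.init_hidden_state(features)          # assumed helper on the decoder
    word = torch.tensor([vocab.stoi["<start>"]], device=device)
    tokens = []
    for _ in range(max_len):
        emb = decoder.embedding(word)                    # (1, embed_dim)
        context, _ = decoder.attention(features, h)      # attend over image regions
        h, c = decoder.lstm_cell(torch.cat([emb, context], dim=1), (h, c))
        logits = decoder.fc(h)                           # (1, vocab_size)
        word = logits.argmax(dim=1)
        if word.item() == vocab.stoi["<end>"]:
            break
        tokens.append(vocab.itos[word.item()])
    return " ".join(tokens)
```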

4. Visualize model predictions

python3 visualize_captions.py
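
A minimal version of such a visualization, assuming matplotlib and Pillow are available; the image path and caption below are placeholders:

```python
import matplotlib.pyplot as plt
from PIL import Image

# Show one image with its predicted caption (placeholder path and caption).
image = Image.open("data/images/example.jpg")
plt.imshow(image)
plt.axis("off")
plt.title("A dog is running on the grass")
plt.show()
```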

5. Print a summary of the model

python3 print_modelsummary.py
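
A comparable summary can be produced with a plain `print(module)` or with the third-party `torchinfo` package. The snippet below summarizes only a bare ResNet50 backbone and does not reflect the repo's exact encoder/decoder classes.

```python
import torchvision.models as models
from torchinfo import summary  # assumes torchinfo is installed (pip install torchinfo)

# Summarize a ResNet50 backbone as an illustration; the repo's encoder wraps a
# similar backbone, and its decoder is a separate attention-LSTM module.
backbone = models.resnet50(weights=None)
summary(backbone, input_size=(1, 3, 224, 224))
```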

📊 Example Results

Final Test BLEU Scores:

| Metric | Score  |
|--------|--------|
| BLEU-1 | 0.7941 |
| BLEU-2 | 0.6684 |
| BLEU-3 | 0.5700 |
| BLEU-4 | 0.4850 |

📁 Dataset

  • Supports Flickr8k/Flickr30k style datasets.
  • Place images and captions as described in data/ (see preprocessing script for details).

👀 Example

Below: Example of a test image and a generated caption.

[example test image]
Predicted caption: "A dog is running on the grass"


🤝 Contributing

Pull requests and issues are welcome!


🧩 License

This project is licensed under the MIT License — see the LICENSE file for details.
