
Reverse-Engineered Stable Diffusion: Image-Based Prompt Generation

This repository showcases a project that reverse-engineers the Stable Diffusion text-to-image process: given an image, it generates a text prompt that could plausibly have produced it. By integrating deep learning models such as ViT, CLIP, and BLIP, the project shows how these models can be combined into a single pipeline, and it explores loss computation techniques to refine the generated prompts.

Features

  • Reverse Engineering Stable Diffusion: Explores the mechanics of Stable Diffusion and attempts to invert it to generate prompts from images.
  • Model Integration: Combines Vision Transformer (ViT), CLIP, and BLIP for feature extraction and semantic analysis.
  • Loss Computation: Implements loss functions to optimize the reverse-generation process.
  • Tech Stack: Built with Python and PyTorch for tight integration with the models above.

Tech Stack

  • Programming Language: Python
  • Framework: PyTorch
  • Models:
    • BLIP: Bootstrapping Language-Image Pre-training, used for image-captioning tasks.
    • ViT: Vision Transformer for image recognition.
    • CLIP: Contrastive Language-Image Pre-training for linking text and images.
  • Libraries: Deep learning utilities and Stable Diffusion libraries (see requirements.txt).

Requirements

  • Python 3.8+
  • PyTorch 1.10+
  • BLIP, ViT, and CLIP model weights (see Setup below)
  • Other dependencies:
    pip install -r requirements.txt

Setup

  1. Clone the Repository:

    git clone https://github.com/Nancyjikadra/img_to_prompt.git  
    cd img_to_prompt  
  2. Install Dependencies:

    pip install -r requirements.txt  
  3. Download Pretrained Models (a loading sketch follows at the end of this section):

  4. Run the Project:

    python main.py  
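
Step 3 above is left open in this README, so here is a minimal loading sketch, assuming the three pretrained models come from the Hugging Face hub via the transformers library; the checkpoint IDs below are illustrative assumptions, not weights pinned by this project.

    # Illustrative model loading via the Hugging Face hub. The checkpoint IDs
    # are assumptions; substitute the weights this project actually uses.
    from transformers import (
        ViTModel, ViTImageProcessor,
        CLIPModel, CLIPProcessor,
        BlipProcessor, BlipForConditionalGeneration,
    )

    vit_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
    vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

    clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

    blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")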

Project Structure

img_to_prompt/  
│  
├── models/                 # Contains model integration scripts  
├── utils/                  # Utility scripts for data preprocessing and loss computation  
├── main.py                 # Entry point for the project  
├── requirements.txt        # List of dependencies  
├── README.md               # Project documentation  
└── results/                # Outputs and visualizations  

How It Works

  1. Input Image Processing: The image is preprocessed and passed through the ViT model to extract features.
  2. Language-Image Alignment: CLIP links the extracted image features to text embeddings.
  3. Prompt Generation: BLIP generates text-based prompts from the image features.
  4. Loss Computation: The generated prompts are evaluated against the image, and the outputs are adjusted to minimize the loss (see the sketches after this list).
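
To make steps 1-3 concrete, the sketch below strings the models together, reusing the handles from the loading sketch in Setup. The flow is assumed from the description above, and the image path is hypothetical; main.py may differ in detail.

    # Sketch of steps 1-3, reusing the models loaded in the Setup sketch.
    import torch
    from PIL import Image

    image = Image.open("results/example.jpg").convert("RGB")  # hypothetical path

    # 1. Input image processing: ViT extracts patch-level features
    #    (available for downstream feature analysis).
    vit_inputs = vit_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        vit_features = vit(**vit_inputs).last_hidden_state  # (1, patches + 1, hidden)

    # 2. Language-image alignment: CLIP embeds the image into its shared
    #    text-image space, where prompts can later be compared against it.
    clip_inputs = clip_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        image_embedding = clip.get_image_features(**clip_inputs)

    # 3. Prompt generation: BLIP captions the image to produce a candidate prompt.
    blip_inputs = blip_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        candidate_ids = blip.generate(**blip_inputs, max_new_tokens=40)
    prompt = blip_processor.decode(candidate_ids[0], skip_special_tokens=True)
    print(prompt)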
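Step 4 does not state a loss formula. One plausible reading, sketched below under that assumption, uses one minus the CLIP image-text cosine similarity as the loss; since text generation is not differentiable, the refinement shown simply samples several BLIP candidates and keeps the lowest-loss one.

    # Hedged sketch of step 4: CLIP similarity as a loss over candidate prompts.
    # This is one plausible interpretation, not necessarily the project's loss.
    import torch

    def clip_prompt_loss(text, image, clip, clip_processor):
        """1 - cosine similarity between CLIP's image and text embeddings."""
        inputs = clip_processor(text=[text], images=image,
                                return_tensors="pt", padding=True)
        with torch.no_grad():
            out = clip(**inputs)
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
        return 1.0 - (img * txt).sum(dim=-1).item()

    # Sample a few BLIP candidates and keep the one CLIP aligns with best.
    blip_inputs = blip_processor(images=image, return_tensors="pt")
    candidates = []
    for _ in range(5):
        ids = blip.generate(**blip_inputs, do_sample=True, top_p=0.9,
                            max_new_tokens=40)
        candidates.append(blip_processor.decode(ids[0], skip_special_tokens=True))
    best_prompt = min(candidates,
                      key=lambda t: clip_prompt_loss(t, image, clip, clip_processor))

A gradient-based alternative would optimize a soft prompt embedding directly against the CLIP image embedding; candidate ranking is used here only because it keeps the sketch simple.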

Results

The project demonstrates prompt generation from images using the reverse-engineered Stable Diffusion pipeline. Further experiments with the loss functions could improve the accuracy of the generated prompts.

Future Work

  • Experiment with alternative diffusion models.
  • Optimize the loss function for improved prompt accuracy.
  • Expand the dataset for better generalization.

Contributions

Contributions are welcome! Please fork the repository, create a new branch, and submit a pull request.

License

This project is licensed under the MIT License.
