ExtendedMGIE - MLLM-Guided Image Editing with Progressive Feature Blending and Cross-Attention Masking

Welcome to the official implementation of the extended MGIE framework, which integrates Progressive Feature Blending (PFB), Cross-Attention Masking (CAM), Identity Embeddings (IE), and Gaussian Blurring (GB) for text-driven image editing. Building on the capabilities of Multimodal Large Language Models (MLLMs), the enhanced framework produces realistic and coherent image modifications while preserving the identity and spatial consistency of edited regions. Together, these improvements enable detailed, semantically aligned, and controllable edits guided by visual-aware instructions.

Figure: Architecture of the ExtendedMGIE framework (Identity Encoding Network).

Table of Contents

  • Introduction
  • Key Enhancements
  • Paper
  • Code
  • Results
  • Getting Started
  • Usage
  • Contributing
  • Acknowledgements

Introduction

The MLLM-Guided Image Editing (MGIE) framework is designed to revolutionize text-driven image editing by leveraging Multimodal Large Language Models (MLLMs) to generate detailed, visually-aware instructions. While the original MGIE framework was impressive, our enhancements with Progressive Feature Blending (PFB), Cross-Attention Masking (CAM), Identity Embeddings (IE), and Gaussian Blurring (GB) elevate its capabilities to a new level of precision and realism.

Key Enhancements

  • Progressive Feature Blending (PFB): Seamlessly integrates MLLM-generated content with the original image across multiple feature levels, ensuring visual coherence and consistency.
  • Cross-Attention Masking (CAM): Provides precise control over the editing process by restricting the influence of specific text tokens to the desired image regions.
  • Identity Embeddings (IE): Preserves the identity and key characteristics of objects and individuals in the image, maintaining their distinctive features throughout the editing process.
  • Gaussian Blurring (GB): Enhances spatial coherence and natural blending of edited regions with the original image through spatially-varying Gaussian blur (illustrative sketches of GB/PFB and CAM follow this list).
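
To make these mechanisms concrete, here is a minimal PyTorch sketch of how GB and PFB can work together: a binary edit mask is softened with a Gaussian kernel and then used to mix edited and original feature maps level by level. This is an illustrative sketch, not the repository's actual implementation; the names gaussian_kernel, blur_mask, progressive_blend, and alphas are assumptions.

    import torch
    import torch.nn.functional as F

    def gaussian_kernel(size=15, sigma=4.0):
        """Build a normalized 2-D Gaussian kernel for softening edit masks."""
        coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
        g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
        kernel = torch.outer(g, g)
        return kernel / kernel.sum()

    def blur_mask(mask, size=15, sigma=4.0):
        """Soften a binary edit mask of shape (B, 1, H, W) so edited
        regions fade smoothly into their surroundings (the GB step)."""
        kernel = gaussian_kernel(size, sigma).to(mask).view(1, 1, size, size)
        return F.conv2d(mask, kernel, padding=size // 2)

    def progressive_blend(edited_feats, original_feats, soft_mask, alphas):
        """Blend edited and original feature maps at several resolutions
        (the PFB step). edited_feats / original_feats are lists of
        (B, C, H_i, W_i) tensors; alphas are per-level strengths in [0, 1]."""
        blended = []
        for f_edit, f_orig, alpha in zip(edited_feats, original_feats, alphas):
            # Resize the softened mask to this feature level's resolution.
            m = F.interpolate(soft_mask, size=f_edit.shape[-2:],
                              mode="bilinear", align_corners=False)
            # Inside the soft mask, move toward the edited features by alpha;
            # outside it, keep the original features untouched.
            blended.append(m * (alpha * f_edit + (1 - alpha) * f_orig)
                           + (1 - m) * f_orig)
        return blended

In the same spirit, the following sketch shows one way to realize cross-attention masking on a standard scaled-dot-product cross-attention layer: prompt tokens that describe the edit are prevented from attending to pixels outside the edit region. Again, this is a hedged illustration of the general idea, not the notebook's actual attention code.

    import torch

    def masked_cross_attention(q, k, v, edit_mask, edit_token_ids):
        """q: (B, N_pix, D) image queries; k, v: (B, N_txt, D) text keys/values.
        edit_mask: (B, N_pix) soft mask, 1 inside the region to edit.
        edit_token_ids: indices of the prompt tokens describing the edit."""
        scale = q.shape[-1] ** -0.5
        scores = torch.einsum("bnd,bmd->bnm", q, k) * scale  # (B, N_pix, N_txt)

        # Suppress edit tokens outside the edit region with a -inf bias,
        # so they cannot influence pixels that should stay unchanged.
        outside = (edit_mask < 0.5).unsqueeze(-1)            # (B, N_pix, 1)
        token_sel = torch.zeros(scores.shape[-1], dtype=torch.bool,
                                device=q.device)
        token_sel[edit_token_ids] = True                     # (N_txt,)
        bias = torch.zeros_like(scores).masked_fill(outside & token_sel,
                                                    float("-inf"))

        attn = (scores + bias).softmax(dim=-1)
        return torch.einsum("bnm,bmd->bnd", attn, v)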

Extensive experiments and analyses demonstrate that our enhanced MGIE framework outperforms previous methods in terms of visual quality, semantic alignment, and faithfulness to the original image.

Paper

For a deep dive into our methodology, experiments, and results, check out our paper titled "Enhancing MLLM-Guided Image Editing with Progressive Feature Blending and Cross-Attention Masking." The paper is available in the paper/ directory and offers comprehensive insights into our techniques and their contributions.

Code

Explore the implementation of the enhanced MGIE framework in the code/ directory. Our primary implementation is provided in the mgie_implementation.ipynb Jupyter notebook, which includes step-by-step instructions for training and testing the framework on various datasets. Ensure you have all dependencies listed in the requirements.txt file.

Results

Check out our results/ directory to see sample input images and their corresponding edited outputs generated by the enhanced MGIE framework. The input_images/ subdirectory contains the original images, while the output_images/ subdirectory showcases the edited versions, highlighting the effectiveness of our framework.

Getting Started

Ready to dive in? Follow these steps to set up and start using the enhanced MGIE framework.

Prerequisites

  • Python 3.6 or above
  • PyTorch 1.9 or above
  • CUDA 11.0 or above (for GPU acceleration)

Installation

  1. Clone the repository:

     git clone https://github.com/your-username/ml-mgie-implementation.git
     cd ml-mgie-implementation

  2. Install the required dependencies:

     pip install -r code/requirements.txt

  3. Download the pretrained models and datasets:

    • Place the pretrained MLLM model (e.g., LLaVA-7B) in the models/ directory.
    • Place the desired datasets (e.g., COCO, CUB, Oxford-102 Flowers) in the data/ directory.

Usage

  1. Open the code/mgie_implementation.ipynb Jupyter notebook (inference).
  2. Follow the instructions to train and test the framework on your chosen datasets.
  3. Modify the notebook as needed to experiment with different settings and hyperparameters.
  4. Provide input images, text prompts, and binary masks (if applicable) to generate edited images; a small mask-preparation sketch follows this list.
  5. Find your edited images saved in the results/output_images/ directory.
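
If you need to create a binary mask for step 4, a small self-contained helper like the one below works; the file paths and the 0.5 threshold are assumptions for illustration, not values taken from the notebook.

    # Convert a rough grayscale mask (e.g., painted in any image editor)
    # into a binary mask. Paths and threshold are illustrative.
    import numpy as np
    from PIL import Image

    mask = Image.open("data/my_mask.png").convert("L")             # grayscale
    binary = (np.asarray(mask, dtype=np.float32) / 255.0) > 0.5    # boolean array
    Image.fromarray((binary * 255).astype(np.uint8)).save("data/my_mask_binary.png")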

Contributing

We welcome contributions! If you encounter issues or have suggestions for improvement, please open an issue or submit a pull request. Adhere to the existing code style and provide detailed explanations of your changes.

Acknowledgements

We extend our gratitude to the original MGIE framework and PFB-Diff method authors for their foundational work in text-driven image editing. Thanks also to the open-source community for providing essential tools and libraries used in this implementation.
