ExtendedMGIE - MLLM-Guided Image Editing with Progressive Feature Blending and Cross-Attention Masking
Welcome to the official implementation of the extended MGIE framework, which integrates Progressive Feature Blending (PFB), Cross-Attention Masking (CAM), Identity Embeddings (IE), and Gaussian Blurring (GB) for text-driven image editing. Built on the capabilities of Multimodal Large Language Models (MLLMs), the framework produces realistic and coherent image modifications while preserving the identity and spatial consistency of the edited regions. Together, these components enable detailed, semantically aligned, and controllable edits guided by visual-aware instructions.
Architecture of the xMGIE framework
The MLLM-Guided Image Editing (MGIE) framework is designed to revolutionize text-driven image editing by leveraging Multimodal Large Language Models (MLLMs) to generate detailed, visually-aware instructions. While the original MGIE framework was impressive, our enhancements with Progressive Feature Blending (PFB), Cross-Attention Masking (CAM), Identity Embeddings (IE), and Gaussian Blurring (GB) elevate its capabilities to a new level of precision and realism.
- Progressive Feature Blending (PFB): Seamlessly integrates MLLM-generated content with the original image across multiple feature levels, ensuring visual coherence and consistency (see the PFB/GB sketch after this list).
- Cross-Attention Masking (CAM): Provides precise control over the editing process by restricting the influence of specific text tokens to the desired image regions (see the CAM sketch after this list).
- Identity Embeddings (IE): Preserves the identity and key characteristics of objects and individuals in the image, maintaining their distinctive features throughout the editing process.
- Gaussian Blurring (GB): Enhances spatial coherence and natural blending of edited regions with the original image through spatially-varying Gaussian blur of the edit mask (also illustrated in the PFB/GB sketch below).
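To make the PFB and GB ideas concrete, here is a minimal, self-contained PyTorch sketch: a binary edit mask is softened with a Gaussian blur and then used to blend edited and original feature maps at several resolutions. The function names (`soften_mask`, `progressive_feature_blend`), tensor shapes, and blur parameters are illustrative assumptions, not code taken from `code/mgie_implementation.ipynb`.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF


def soften_mask(mask, kernel_size=15, sigma=4.0):
    """GB step (illustrative): blur a binary edit mask so its boundary is smooth.

    mask: (B, 1, H, W) tensor with values in [0, 1].
    """
    return TF.gaussian_blur(mask, kernel_size=[kernel_size, kernel_size],
                            sigma=[sigma, sigma])


def progressive_feature_blend(edited_feats, original_feats, mask):
    """PFB step (illustrative): blend edited and original features at every level.

    edited_feats / original_feats: lists of (B, C_i, H_i, W_i) tensors.
    mask: (B, 1, H, W) soft mask, resized to each feature resolution.
    """
    blended = []
    for f_edit, f_orig in zip(edited_feats, original_feats):
        m = F.interpolate(mask, size=f_edit.shape[-2:],
                          mode="bilinear", align_corners=False)
        # Inside the mask keep the edited features; outside keep the original.
        blended.append(m * f_edit + (1.0 - m) * f_orig)
    return blended


# Toy usage with two feature levels.
mask = torch.zeros(1, 1, 64, 64)
mask[:, :, 16:48, 16:48] = 1.0                 # region to edit
soft_mask = soften_mask(mask)                  # GB: smooth the mask boundary
edited = [torch.randn(1, 320, 32, 32), torch.randn(1, 640, 16, 16)]
original = [torch.randn(1, 320, 32, 32), torch.randn(1, 640, 16, 16)]
print([f.shape for f in progressive_feature_blend(edited, original, soft_mask)])
```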
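In the same spirit, the sketch below illustrates the CAM idea: selected text tokens are prevented from influencing pixels outside the edit region by adding a large negative value to the disallowed cross-attention scores before the softmax. The function name, tensor shapes, and masking scheme are assumptions chosen for illustration rather than the notebook's actual API.

```python
import torch


def masked_cross_attention(q, k, v, region_mask, token_mask, neg=-1e9):
    """Cross-attention in which selected text tokens only influence a given region.

    q:           (B, N_pix, D) image-query tokens (N_pix = H * W)
    k, v:        (B, N_txt, D) text keys / values
    region_mask: (B, N_pix)    1 inside the editable region, 0 outside
    token_mask:  (B, N_txt)    1 for text tokens whose influence is restricted
    """
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("bqd,bkd->bqk", q, k) * scale        # (B, N_pix, N_txt)

    # Block pixels outside the region from attending to the restricted tokens.
    disallowed = (1.0 - region_mask)[:, :, None] * token_mask[:, None, :]
    scores = scores + neg * disallowed

    attn = scores.softmax(dim=-1)
    return torch.einsum("bqk,bkd->bqd", attn, v)


# Toy usage: restrict the last two prompt tokens to a 4x4 patch of an 8x8 map.
q = torch.randn(1, 64, 32)
k = torch.randn(1, 6, 32)
v = torch.randn(1, 6, 32)
region = torch.zeros(1, 1, 8, 8)
region[:, :, 2:6, 2:6] = 1.0
region_mask = region.flatten(1)                                # (1, 64)
token_mask = torch.tensor([[0.0, 0.0, 0.0, 0.0, 1.0, 1.0]])
print(masked_cross_attention(q, k, v, region_mask, token_mask).shape)
```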
Extensive experiments and analyses demonstrate that our enhanced MGIE framework outperforms previous methods in terms of visual quality, semantic alignment, and faithfulness to the original image.
For a deep dive into our methodology, experiments, and results, check out our paper, "Enhancing MLLM-Guided Image Editing with Progressive Feature Blending and Cross-Attention Masking." The paper is available in the `paper/` directory and offers comprehensive insights into our techniques and their contributions.

Explore the implementation of the enhanced MGIE framework in the `code/` directory. Our primary implementation is provided in the `mgie_implementation.ipynb` Jupyter notebook, which includes step-by-step instructions for training and testing the framework on various datasets. Ensure you have all dependencies listed in the `requirements.txt` file.
Check out our `results/` directory to see sample input images and their corresponding edited outputs generated by the enhanced MGIE framework. The `input_images/` subdirectory contains the original images, while the `output_images/` subdirectory showcases the edited versions, highlighting the effectiveness of our framework.
Ready to dive in? Follow these steps to set up and start using the enhanced MGIE framework.
- Python 3.6 or above
- PyTorch 1.9 or above
- CUDA 11.0 or above (for GPU acceleration)
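You can quickly sanity-check these prerequisites with a short snippet like the one below (illustrative only; not part of the repository):

```python
import sys
import torch

print("Python:", sys.version.split()[0])          # expect 3.6 or above
print("PyTorch:", torch.__version__)              # expect 1.9 or above
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)    # expect 11.0 or above
```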
- Clone the repository:
  `git clone https://github.com/your-username/ml-mgie-implementation.git`
  `cd ml-mgie-implementation`
- Install the required dependencies:
  `pip install -r code/requirements.txt`
- Download the pretrained models and datasets:
  - Place the pretrained MLLM model (e.g., LLaVA-7B) in the `models/` directory.
  - Place the desired datasets (e.g., COCO, CUB, Oxford-102 Flowers) in the `data/` directory.
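For reference, the directory layout assumed by the steps above (and by the rest of this README) looks roughly like this:

```
ml-mgie-implementation/
├── code/
│   ├── mgie_implementation.ipynb
│   └── requirements.txt
├── data/          # datasets (e.g., COCO, CUB, Oxford-102 Flowers)
├── models/        # pretrained MLLM (e.g., LLaVA-7B)
├── paper/
└── results/
    ├── input_images/
    └── output_images/
```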
- Open the `code/mgie_implementation.ipynb` Jupyter notebook (Inference).
- Follow the instructions to train and test the framework on your chosen datasets.
- Modify the notebook as needed to experiment with different settings and hyperparameters.
- Provide input images, text prompts, and binary masks (if applicable) to generate edited images; a minimal input-preparation sketch follows this list.
- Find your edited images saved in the `results/output_images/` directory.
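As a starting point for the "provide input images, text prompts, and binary masks" step above, here is a minimal, illustrative sketch of loading those inputs as tensors; the file names are placeholders, and the exact format expected by the cells in `code/mgie_implementation.ipynb` may differ.

```python
from PIL import Image
import torchvision.transforms.functional as TF

# Placeholder file names; substitute your own image and edit mask.
image = Image.open("results/input_images/example.jpg").convert("RGB")
mask = Image.open("results/input_images/example_mask.png").convert("L")

image = TF.resize(TF.to_tensor(image), [512, 512]).unsqueeze(0)  # (1, 3, 512, 512)
mask = TF.resize(TF.to_tensor(mask), [512, 512]).unsqueeze(0)    # (1, 1, 512, 512)
mask = (mask > 0.5).float()        # binarize the edit mask after resizing

prompt = "make the sky look like a sunset"  # example text instruction
```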
We welcome contributions! If you encounter issues or have suggestions for improvement, please open an issue or submit a pull request. Adhere to the existing code style and provide detailed explanations of your changes.
We extend our gratitude to the original MGIE framework and PFB-Diff method authors for their foundational work in text-driven image editing. Thanks also to the open-source community for providing essential tools and libraries used in this implementation.