Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community
Arman Isajanyan1, Artur Shatveryan1, David Kocharyan1, Zhangyang Wang1,2, Humphrey Shi1,3
1Picsart AI Research (PAIR), 2UT Austin, 3Georgia Tech
Abstract:
Social reward as a form of community recognition provides a strong source of motivation for users of online platforms to engage and contribute content. The recent progress of text-conditioned image synthesis has ushered in a collaborative era where AI empowers users to craft original visual artworks seeking community validation. Nevertheless, assessing these models in the context of collective community preference introduces distinct challenges. Existing evaluation methods predominantly center on limited-size user studies guided by image quality and prompt alignment. This work pioneers a paradigm shift, unveiling Social Reward - an innovative reward modeling framework that leverages implicit feedback from social network users engaged in creative editing of generated images. We embark on an extensive journey of dataset curation and refinement, drawing from Picsart: an online visual creation and editing platform, yielding a first million-user-scale dataset of implicit human preferences for user-generated visual art named Picsart Image-Social. Our analysis exposes the shortcomings of current metrics in modeling community creative preference of text-to-image models' outputs, compelling us to introduce a novel predictive model explicitly tailored to address these limitations. Rigorous quantitative experiments and a user study show that our Social Reward model aligns better with social popularity than existing metrics. Furthermore, we utilize Social Reward to fine-tune text-to-image models, yielding images that are more favored by not only Social Reward, but also other established metrics. These findings highlight the relevance and effectiveness of Social Reward in assessing community appreciation for AI-generated artworks, establishing a closer alignment with users' creative goals: creating popular visual art.
Set up the environment for training and validation
$ git clone https://github.com/Picsart-AI-Research/Social-Reward
$ cd Social-Reward
$ python -m venv venv
$ source venv/bin/activate
$ pip install pip --upgrade
$ pip install -r requirements.txt
- PromptImagePair Dataset (`data_set.py`):
  - PyTorch Dataset for loading paired data (text prompts, positive images, negative images).
  - Supports loading data from Parquet files or a Pandas DataFrame.
  - Applies a specified preprocessing function to the images.

  Example:

      import pandas as pd
      import torchvision
      from data_set import PromptImagePair

      df = pd.read_parquet('data.parquet')
      # Resize each image and convert it to a tensor
      preprocess_fn = torchvision.transforms.Compose([
          torchvision.transforms.Resize(224),
          torchvision.transforms.ToTensor(),
      ])
      dataset = PromptImagePair(df, preprocess_fn)
      # Each item is a (prompt, positive image, negative image) triple
      item = dataset[0]
      text, pos_img, neg_img = item
- Triplet Loss Module (`losses.py`):
  - PyTorch module for computing the triplet loss.
  - Used for training deep embeddings for similarity learning.

  Example:

      import torch
      from losses import TripletLoss

      loss_fn = TripletLoss(margin=0.2)
      # Batch of 16 embeddings with 256 dimensions each
      anchor = torch.randn(16, 256)
      positive = torch.randn(16, 256)
      negative = torch.randn(16, 256)
      loss = loss_fn(anchor, positive, negative)
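To make the idea behind the triplet loss concrete, here is a minimal pure-Python sketch of the standard hinge formulation, `max(d(anchor, positive) - d(anchor, negative) + margin, 0)`. It is an illustration only: the Euclidean distance, the mean reduction over the batch, and the function names `euclidean`, `triplet_loss`, and `batch_triplet_loss` are assumptions, and the actual `TripletLoss` in `losses.py` may use a different distance or reduction.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


def batch_triplet_loss(anchors, positives, negatives, margin=0.2):
    """Mean triplet loss over a batch (mean reduction is an assumption)."""
    losses = [
        triplet_loss(a, p, n, margin)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


# One well-separated triple (loss 0) and one violated triple (loss > 0)
easy = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

In this setting the anchor would presumably be the prompt embedding and the positive and negative would be the two image embeddings, but that mapping is inferred from the dataset's output rather than stated in this README.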
- Data Preparation:
  - Prepare a Parquet file (`train_data.parquet`) containing paired data (prompt, positive image path, negative image path).
  - Similarly, prepare a validation Parquet file (`validation_data.parquet`).

  Example Parquet file structure (`train_data.parquet` and `validation_data.parquet`):

  | prompt | pos_path | neg_path |
  | --- | --- | --- |
  | "Prompt 1" | "/path/to/remixable/image1.jpg" | "/path/to/non-remixable/image2.jpg" |
  | "Prompt 2" | "/path/to/remixable/image3.jpg" | "/path/to/non-remixable/image4.jpg" |
  | ... | ... | ... |

  - `prompt`: Text prompt corresponding to each pair of positive and negative images.
  - `pos_path`: Path to the positive image.
  - `neg_path`: Path to the negative image.
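As a rough sketch of how rows with the three documented columns could be assembled and checked before writing them to Parquet, consider the snippet below. The helpers `make_pairs` and `validate_rows` are hypothetical and not part of this repository, and pairing every positive image with every negative image for a prompt is only one possible pairing strategy, assumed here for illustration.

```python
from itertools import product

# Column names taken from the documented Parquet structure
COLUMNS = ("prompt", "pos_path", "neg_path")


def make_pairs(prompt, pos_paths, neg_paths):
    """Pair each positive image with each negative image for one prompt.

    Hypothetical helper: the all-pairs strategy is an assumption.
    """
    return [
        {"prompt": prompt, "pos_path": pos, "neg_path": neg}
        for pos, neg in product(pos_paths, neg_paths)
    ]


def validate_rows(rows):
    """Check that every row has exactly the documented columns, all non-empty strings."""
    for i, row in enumerate(rows):
        if set(row) != set(COLUMNS):
            raise ValueError(f"row {i}: expected columns {COLUMNS}, got {tuple(row)}")
        for name in COLUMNS:
            if not isinstance(row[name], str) or not row[name]:
                raise ValueError(f"row {i}: column {name!r} must be a non-empty string")
    return rows


rows = validate_rows(
    make_pairs(
        "Prompt 1",
        ["/path/to/remixable/image1.jpg"],
        ["/path/to/non-remixable/image2.jpg", "/path/to/non-remixable/image4.jpg"],
    )
)
```

The resulting list of dictionaries can then be written with `pandas.DataFrame(rows).to_parquet("train_data.parquet")`, which requires a Parquet engine such as `pyarrow` to be installed.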
- Training:

  The `finetune_model` script provides various training options through command-line arguments for fine-tuning the CLIP model. Here's a description of each training option.

  - `--training_file`: Path to the training file (Parquet file) containing the training data.
  - `--training_mode`: Specifies the training mode, which determines which parts of the model are trained. Available options are:
    - `"all"`: Fine-tunes the entire CLIP model.
    - `"visual"`: Fine-tunes only the visual (image) features of the model.
    - `"visual_upper_layers"`: Fine-tunes the upper layers of the visual transformer.
    - `"visual_upper_layers_textual_upper_layers"`: Fine-tunes both the upper layers of the visual transformer and upper layers of the textual transformer.
    - `"visual_upper_layers_textual_upper_layers_deeper"`: Fine-tunes deeper layers of both the visual and textual transformers.
    - `"visual_last_layer"`: Fine-tunes only the last layer of the visual transformer.
  - `--batch_size`: Batch size for training.
  - `--n_epochs`: Number of training epochs.
  - `--save_folder`: Path to the folder where fine-tuned model checkpoints will be saved.
  - `--loss_name`: Name of the loss function used for training. Default is `"triplet"`.
  - `--checkout_path`: Path to a pre-trained model checkpoint for fine-tuning. Default is `None`, indicating training from CLIP weights.
  - `--learning_rate`: Learning rate for optimization. Default is `0.00003`.
  These are the hyperparameters used in the paper:

      accelerate launch \
          train_pair_pos_neg.py \
          --training_file training_file.parquet \
          --training_mode visual_upper_layers_textual_upper_layers \
          --batch_size 32 \
          --n_epochs 10 \
          --save_folder ./clip_model \
          --loss_name triplet
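The documented options can be mirrored with a small `argparse` parser, which is handy for checking a command line before launching a long run. This is a sketch built only from the option list above, not the repository's actual argument handling: the flag names, training modes, and the three defaults come from this README, while the value types and the choice to mark the remaining options as required are assumptions.

```python
import argparse

# Training modes as listed in this README
TRAINING_MODES = [
    "all",
    "visual",
    "visual_upper_layers",
    "visual_upper_layers_textual_upper_layers",
    "visual_upper_layers_textual_upper_layers_deeper",
    "visual_last_layer",
]


def build_parser():
    """Build a parser mirroring the documented training options."""
    parser = argparse.ArgumentParser(description="Sketch of the training options")
    # No defaults are documented for these, so they are assumed to be required
    parser.add_argument("--training_file", required=True)
    parser.add_argument("--training_mode", required=True, choices=TRAINING_MODES)
    parser.add_argument("--batch_size", type=int, required=True)
    parser.add_argument("--n_epochs", type=int, required=True)
    parser.add_argument("--save_folder", required=True)
    # Defaults below are the ones stated in this README
    parser.add_argument("--loss_name", default="triplet")
    parser.add_argument("--checkout_path", default=None)
    parser.add_argument("--learning_rate", type=float, default=0.00003)
    return parser


# Parse the arguments of the paper's configuration shown above
args = build_parser().parse_args([
    "--training_file", "training_file.parquet",
    "--training_mode", "visual_upper_layers_textual_upper_layers",
    "--batch_size", "32",
    "--n_epochs", "10",
    "--save_folder", "./clip_model",
    "--loss_name", "triplet",
])
```

With this sketch, an unknown `--training_mode` value is rejected by `argparse` before any training code runs, and omitting `--checkout_path` leaves it as `None`, matching the documented behavior of starting from CLIP weights.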
- Validation:
  - Validate the fine-tuned model using the validation script.

        python validate.py 'validation_data.parquet' \
            --checkout_path 'classifier_checkpoint.pth' \
            --device 'cuda' \
            --batch_size 1024 \
            --num_workers 9
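This README does not state which metric `validate.py` reports. For data organized as positive/negative pairs, one natural check is pairwise accuracy: the fraction of pairs in which the model scores the positive image above the negative one. The sketch below illustrates that metric under this assumption; `pairwise_accuracy` is a hypothetical helper, and the scores are made-up numbers rather than model outputs.

```python
def pairwise_accuracy(pos_scores, neg_scores):
    """Fraction of pairs where the positive image outscores the negative one.

    Ties count as incorrect, since the model failed to prefer the positive.
    """
    if len(pos_scores) != len(neg_scores):
        raise ValueError("pos_scores and neg_scores must have the same length")
    if not pos_scores:
        raise ValueError("at least one pair is required")
    correct = sum(1 for p, n in zip(pos_scores, neg_scores) if p > n)
    return correct / len(pos_scores)


# Illustrative scores for four pairs: two ranked correctly, one wrong, one tie
accuracy = pairwise_accuracy([0.9, 0.4, 0.7, 0.2], [0.1, 0.6, 0.3, 0.2])
```

A value of 0.5 corresponds to chance level on balanced pairs, so a useful reward model should score well above it on the validation file.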
If you use our work in your research, please cite our publication:
@misc{isajanyan2024social,
title={Social Reward: Evaluating and Enhancing Generative AI through Million-User Feedback from an Online Creative Community},
author={Arman Isajanyan and Artur Shatveryan and David Kocharyan and Zhangyang Wang and Humphrey Shi},
year={2024},
eprint={2402.09872},
archivePrefix={arXiv},
primaryClass={cs.CV}
}