04. Scoring
imi edited this page Aug 16, 2025
The quality of the generated images is evaluated using one or more scoring models. The final score guides the optimization process.
All scoring-related settings are configured in your config.yaml file.
- scorer_method: A list of scorer identifiers to use (e.g., [cityaes, clip, manual]).
- scorer_average_type: How to average scores from different scorers for the same image (arithmetic, geometric, or quadratic).
- scorer_weight: (Optional) A dictionary assigning custom weights to scorers (e.g., {cityaes: 1.2, clip: 0.8}). The default weight is 1.0.
- scorer_default_device: The default device (cpu or cuda) for running scorer models.
- scorer_device: (Optional) A dictionary that overrides the device for specific scorers (e.g., {imagereward: cuda}).
- scorer_alt_location: (Optional) A dictionary specifying custom paths for scorer models that are not in the main scorer_model_dir.
- scorer_filters: (Optional) A dictionary to exclude specific scorers from running on certain payloads by name.
- scorer_print_individual: If True, prints each individual scorer's score in the console.
- hpsv3_uncertainty_penalty: A multiplier controlling how much the hpsv3 scorer's uncertainty (sigma) penalizes its final score. Default is 0.5.
- forensicnoise_detection_method: The detection method for the forensicnoise scorer. Can be "structural" (default) or "colored".
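The three scorer_average_type options can be illustrated with a small sketch. It assumes that scorer_weight values act as weights in a weighted mean, that scorers missing from scorer_weight get the default weight of 1.0, and that every scorer's output is already on a comparable scale. The function name combine_scores is hypothetical and is not taken from sd_optim/scorer.py.

```python
import math


def combine_scores(scores, weights=None, average_type="arithmetic"):
    """Combine per-scorer scores for one image into a single value.

    Hypothetical sketch: scorer_weight entries are treated as weights in a
    weighted mean; scorers without an entry default to a weight of 1.0.
    """
    weights = weights or {}
    w = {name: weights.get(name, 1.0) for name in scores}
    total = sum(w.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    if average_type == "arithmetic":
        # Weighted arithmetic mean: sum(w_i * s_i) / sum(w_i)
        return sum(w[n] * s for n, s in scores.items()) / total
    if average_type == "geometric":
        # Weighted geometric mean: exp(sum(w_i * ln s_i) / sum(w_i)); needs s_i > 0
        if any(s <= 0 for s in scores.values()):
            raise ValueError("geometric mean requires positive scores")
        return math.exp(sum(w[n] * math.log(s) for n, s in scores.items()) / total)
    if average_type == "quadratic":
        # Weighted quadratic mean (RMS): sqrt(sum(w_i * s_i^2) / sum(w_i))
        return math.sqrt(sum(w[n] * s * s for n, s in scores.items()) / total)
    raise ValueError(f"unknown average type: {average_type}")
```

With scores {cityaes: 8.0, clip: 2.0} and equal weights, the arithmetic mean is 5.0, the geometric mean is 4.0, and the quadratic mean is about 5.83. The geometric mean is pulled hardest toward the weak scorer, while the quadratic mean leans toward the strong one. Adding weights {cityaes: 1.2, clip: 0.8} moves the arithmetic result to 5.6.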
Prompt-Image Alignment (PIA): Measure how well the image matches the text prompt.
- clip: Uses OpenAI's CLIP model (ViT-L/14). General purpose and widely used.
- blip: Uses Salesforce's BLIP model. Often good at capturing finer details described in the prompt.
Aesthetic Quality: Measure the general visual appeal or predicted human preference, often independent of the prompt.
- laion: Based on the LAION Aesthetic dataset predictor. A common baseline for general aesthetics.
- chad: Originally trained by Discord users on preferred generations. Can be opinionated, but often aligns with popular styles.
Hybrid (PIA + Aesthetic): Aim to capture both prompt alignment and visual appeal.
- imagereward: Trained by THUDM to predict human preferences based on prompt-image pairs.
- hpsv21: Human Preference Score v2.1.
- hpsv3: Human Preference Score v3. Returns both a score (mu) and an uncertainty value (sigma).
- pick: Based on the Pick-a-Pic dataset and model, trained on human choices between images generated from the same prompt.
Anime/Illustration Focused: Specifically trained or tuned for anime, manga, or illustrated styles.
- cityaes: CityAesthetics model (Anime variant v1.8). Often very effective for anime styles and good at identifying generation artifacts. (recommended)
- aestheticv25: Based on the improved LAION predictor v2.5. (from Euge)
- shadowv2: Aesthetic predictor from the "shadow" model series.
- cafe: Aesthetic predictor from the "cafe" model series.
- wdaes: Aesthetic predictor from the Waifu Diffusion project.
Anatomy & Composition:
- luminaflex, lumidinov2l, lumidinov2g: A family of scorers trained to detect anatomical flaws and composition issues.
Technical & Artifact Analysis:
- simplequality: A fast, model-free scorer that measures basic image quality metrics such as brightness and contrast.
- gammanoise: Measures the level of gamma noise in an image.
- forensicnoise: Analyzes structural noise patterns to detect AI-generated artifacts. Requires background removal (rembg).
- backgroundblackness: Measures the percentage of an image's background that is pure black. Requires background removal (rembg).
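To make two of these metrics concrete, here is a rough, hypothetical approximation in plain Python. It assumes brightness is the mean intensity and contrast is RMS contrast (the standard deviation), and it takes a precomputed background mask in place of actual rembg output. The real simplequality and backgroundblackness scorers may use different formulas, thresholds, and weights; the function names here are made up.

```python
from statistics import mean, pstdev


def brightness_contrast(gray):
    """Return (brightness, contrast) for 8-bit grayscale values, scaled to 0..1.

    Brightness = mean intensity; contrast = RMS contrast (population std dev).
    """
    if not gray:
        raise ValueError("empty image")
    values = [v / 255.0 for v in gray]
    return mean(values), pstdev(values)


def background_blackness(pixels, background_mask):
    """Percentage of background pixels that are pure black (0, 0, 0).

    background_mask stands in for a rembg-style segmentation:
    True marks a background pixel, False a foreground pixel.
    """
    background = [p for p, is_bg in zip(pixels, background_mask) if is_bg]
    if not background:
        return 0.0
    black = sum(1 for p in background if tuple(p) == (0, 0, 0))
    return 100.0 * black / len(background)
```

For example, a background with two pure black pixels and one (10, 0, 0) pixel scores about 66.7%: a near-black pixel still counts against a model that is expected to produce exact #000000 backgrounds.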
Special & Utility:
- manual: Enables interactive scoring via the console. You are prompted to enter a score (0-10) for each generated image.
- noai: Attempts to classify whether an image is AI-generated or real. (Experimental; results may vary.)
manual Mode:
- When an image is shown for manual scoring, you can type OVERRIDE_SCORE in the console. This interrupts the current iteration and prompts you to enter a final average score for the entire iteration, bypassing all other scorers.
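The OVERRIDE_SCORE flow can be sketched roughly as follows. The exception class, the function names, and the prompt text are hypothetical, and the ask parameter stands in for input() so the loop can be exercised without a console. Only the manual scorer is modeled here; in sd_optim the override also bypasses every other configured scorer.

```python
class IterationOverride(Exception):
    """Raised when the user types OVERRIDE_SCORE (hypothetical name)."""

    def __init__(self, score):
        super().__init__(f"iteration score overridden with {score}")
        self.score = score


def ask_manual_score(ask=input):
    """Prompt until the user enters a score from 0 to 10 or OVERRIDE_SCORE."""
    while True:
        reply = ask("Score (0-10): ").strip()
        if reply == "OVERRIDE_SCORE":
            # Ask for one final average score for the whole iteration.
            final = float(ask("Final average score for this iteration: "))
            raise IterationOverride(final)
        try:
            score = float(reply)
        except ValueError:
            continue  # not a number: ask again
        if 0 <= score <= 10:
            return score


def score_iteration(images, ask=input):
    """Return the mean manual score, or the override value if one was given."""
    try:
        scores = [ask_manual_score(ask) for _ in images]
    except IterationOverride as override:
        # The remaining images are skipped once the override is entered.
        return override.score
    if not scores:
        raise ValueError("no images to score")
    return sum(scores) / len(scores)
```

Using an exception for the override lets it escape the per-image loop in one step, which mirrors the described behavior of interrupting the whole iteration rather than just the current image.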
hpsv3 Scorer:
- This scorer returns both a score (mu) and an uncertainty value (sigma). The final score is calculated as mu - (k * sigma), where k is the hpsv3_uncertainty_penalty value.
- You can control the uncertainty penalty with the hpsv3_uncertainty_penalty setting in config.yaml. A higher value means a stronger penalty for uncertainty.
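The hpsv3 penalty formula, written out as code. The function name is hypothetical; k corresponds to hpsv3_uncertainty_penalty (default 0.5). The example numbers are illustrative, not real hpsv3 outputs.

```python
def hpsv3_penalized(mu, sigma, k=0.5):
    """Final hpsv3 score: mu - k * sigma, with k = hpsv3_uncertainty_penalty."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return mu - k * sigma


# Two hypothetical images: a higher but less certain score vs. a confident one.
uncertain = hpsv3_penalized(10.0, 2.0)   # 10.0 - 0.5 * 2.0
confident = hpsv3_penalized(9.5, 0.2)    # 9.5 - 0.5 * 0.2
```

With k = 0.5 the uncertain image scores 9.0 and the confident one 9.4, so the more confident prediction ranks higher despite its lower mu. Setting k to 0 restores ranking by mu alone.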
forensicnoise Scorer:
- This scorer has a configurable detection_method. You can set it in your config.yaml via the forensicnoise_detection_method key to change its analysis mode.
Note
Some scorers, particularly the technical ones like simplequality, have internal parameters (e.g., sharpness thresholds, weights) that are not currently exposed in config.yaml. To adjust these, you would need to modify the _load_all_models method in sd_optim/scorer.py to pass them during instantiation.
- General Purpose: A good starting point is a mix of aesthetic and prompt-alignment scorers, such as [cityaes, hpsv3, clip].
- Anime Focus: Prioritize anime-specific scorers. [cityaes, shadowv2, aestheticv25] is a strong combination.
- Colors: Consider adding backgroundblackness to penalize models that don't generate pure black (#000000) backgrounds.
- Artifacts: [simplequality, gammanoise, forensicnoise] can help penalize models whose outputs contain noise or generation artifacts.
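If you switch between these combinations, a small helper can merge them into a single scorer_method list and flag which selected scorers need rembg. This is a hypothetical convenience script, not part of sd_optim: the preset names are made up, and the resulting dict would still need to be copied into config.yaml (by hand or with a YAML library).

```python
# Hypothetical presets mirroring the recommended scorer_method lists.
PRESETS = {
    "general": ["cityaes", "hpsv3", "clip"],
    "anime": ["cityaes", "shadowv2", "aestheticv25"],
    "artifacts": ["simplequality", "gammanoise", "forensicnoise"],
}

# Scorers documented as requiring background removal (rembg).
NEEDS_REMBG = {"forensicnoise", "backgroundblackness"}


def build_scorer_config(*preset_names, extra=(), weights=None):
    """Merge presets into config.yaml-style settings, keeping first-seen order.

    Returns (config_dict, scorers_needing_rembg).
    """
    methods = []
    for name in preset_names:
        for scorer in PRESETS[name]:
            if scorer not in methods:
                methods.append(scorer)
    for scorer in extra:
        if scorer not in methods:
            methods.append(scorer)

    weights = weights or {}
    # Catch weights for scorers that are not actually selected.
    unknown = set(weights) - set(methods)
    if unknown:
        raise ValueError(f"weights for unused scorers: {sorted(unknown)}")

    config = {"scorer_method": methods, "scorer_weight": weights}
    return config, sorted(NEEDS_REMBG.intersection(methods))
```

For example, combining the general and anime presets with backgroundblackness yields six scorers (cityaes appears only once) and reports backgroundblackness as the one that needs rembg.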
General Tips:
- Choose scorers relevant to your goal. If merging anime models, prioritize anime-focused scorers. If aiming for photorealism matching a complex prompt, prioritize PIA scorers.
- Start with fewer scorers (1-3) to understand their individual impact before combining many.
- Adjust weights in scorer_weight to emphasize the scorers that best reflect your desired outcome.
- Use manual scoring for a few iterations at first to get a feel for the model's output and provide direct feedback, even if you switch to automatic scoring later.