Significant Resolution Difference Between Front and Back Sides in Fine-Tuned Model GLB Outputs #48

hayoung-jeremy opened this issue Jun 4, 2024 · 0 comments

Summary

  • fine-tuned the openlrm-mix-large-1.1 model
  • dataset : 1000 GLB files, all processed by blender-script.py into rgba, pose, and intrinsics.npy (see the pose-coverage sketch below)
  • trained on : Runpod, an A100 SXM 80GB VRAM x8 instance
  • the purpose of the fine-tuning was overfitting, since there is not enough data for now
  • the resulting GLB outputs' resolution differs significantly between the front and back sides
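
Since the asymmetry is view-dependent, it may be worth first checking that the rendered poses actually cover the back of each object. A minimal sketch, assuming each pose/*.npy stores a camera-to-world matrix with the camera position in the last column (adjust if blender-script.py uses a different convention; `<uid>` is a placeholder for one object's folder):

```python
import glob
import numpy as np

positions = []
for path in sorted(glob.glob("/root/OpenLRM/views/<uid>/pose/*.npy")):
    c2w = np.load(path).reshape(-1, 4)  # accepts 3x4 or 4x4 layouts
    positions.append(c2w[:3, 3])        # camera position in world space

positions = np.stack(positions)
azimuths = np.degrees(np.arctan2(positions[:, 1], positions[:, 0]))
hist, _ = np.histogram(azimuths, bins=12, range=(-180, 180))
print("views per 30-degree azimuth bin:", hist)  # empty bins = blind spots
```

If some azimuth bins are empty or sparse, the model never (or rarely) sees those sides during training, which would explain a front/back quality gap.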

Configuration

train-sample.yaml

experiment:
    type: lrm
    seed: 42
    parent: lrm-objaverse
    child: small-dummyrun

model:
    camera_embed_dim: 1024 # modified to match openlrm-mix-large-1.1's config.json
    rendering_samples_per_ray: 128 # modified to match openlrm-mix-large-1.1's config.json
    transformer_dim: 1024 # modified to match openlrm-mix-large-1.1's config.json
    transformer_layers: 16 # modified to match openlrm-mix-large-1.1's config.json
    transformer_heads: 16 # modified to match openlrm-mix-large-1.1's config.json
    triplane_low_res: 32
    triplane_high_res: 64
    triplane_dim: 80 # modified to match openlrm-mix-large-1.1's config.json
    encoder_type: dinov2
    encoder_model_name: dinov2_vitb14_reg # modified to match openlrm-mix-large-1.1's config.json
    encoder_feat_dim: 768 # modified to match openlrm-mix-large-1.1's config.json
    encoder_freeze: false

dataset:
    subsets:
        -   name: objaverse
            root_dirs:
                - "/root/OpenLRM/views" # my processed data directory
            meta_path:
                train: "/root/OpenLRM/train_uids.json"
                val: "/root/OpenLRM/val_uids.json"
            sample_rate: 1.0
    sample_side_views: 3
    source_image_res: 448 # modified for the higher resolution
    render_image:
        low: 128 # modified for the higher resolution
        high: 384 # modified for the higher resolution
        region: 128 # modified for the higher resolution
    normalize_camera: true
    normed_dist_to_center: auto
    num_train_workers: 4
    num_val_workers: 2
    pin_mem: true

train:
    mixed_precision: bf16
    find_unused_parameters: false
    loss:
        pixel_weight: 1.0
        perceptual_weight: 1.0
        tv_weight: 5e-4
    optim:
        lr: 4e-4
        weight_decay: 0.05
        beta1: 0.9
        beta2: 0.95
        clip_grad_norm: 1.0
    scheduler:
        type: cosine
        warmup_real_iters: 3000
    batch_size: 2  # modified since using higher resolution
    accum_steps: 8  # modified since using higher resolution
    epochs: 1000  # modified from 60 to 1000 to overfit, given the limited data
    debug_global_steps: null

val:
    batch_size: 2 # modified since using higher resolution
    global_step_period: 1000
    debug_batches: null

saver:
    auto_resume: true
    load_model: "/root/OpenLRM/model.safetensors" # this refers to openlrm-mix-large-1.1
    checkpoint_root: ./exps/checkpoints
    checkpoint_global_steps: 1000
    checkpoint_keep_level: 5

logger:
    stream_level: WARNING
    log_level: INFO
    log_root: ./exps/logs
    tracker_root: ./exps/trackers
    enable_profiler: false
    trackers:
        - tensorboard
    image_monitor:
        train_global_steps: 100
        samples_per_log: 4

compile:
    suppress_errors: true
    print_specializations: true
    disable: true

Result

Training result

[TRAIN STEP]loss=0.112, loss_pixel=0.00603, loss_perceptual=0.105, loss_tv=0.544, lr=9.87e-12: 100%|| 13000/13000 [12:37:44<00:00,  3.50it/s]
  • loss value : 0.112
  • duration : 12:37:44
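
Note that the final learning rate in the log (lr=9.87e-12) is exactly what a linear-warmup plus cosine-decay-to-zero schedule predicts at the last of 13000 steps, so the vanishing lr is expected behavior at the end of training rather than a sign of a broken optimizer. A minimal sketch (the schedule shape is my assumption, but it reproduces the logged value):

```python
import math

def lr_at(step, base_lr=4e-4, warmup=3000, total=13000):
    """Linear warmup followed by cosine decay to zero."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(12999))  # ~9.87e-12, matching the [TRAIN STEP] log above
```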

infer-l.yaml

source_size: 448 # modified to fit the fine-tuned model's source_image_res
source_cam_dist: 2.0
render_size: 384 # modified to fit the fine-tuned model's render_image.high
render_views: 160
render_fps: 40
frame_size: 2
mesh_size: 384 # modified to fit the fine-tuned model's render_image.high
mesh_thres: 3.0
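
For reference on what the last two knobs do: mesh extraction in this family of models typically samples the density field on a mesh_size^3 grid and runs marching cubes at the iso-level mesh_thres. An illustrative sketch (not OpenLRM's exact extraction code; scikit-image stands in for the marching-cubes backend, and the grid is a synthetic stand-in for queried densities):

```python
import numpy as np
from skimage import measure

# 128^3 here just to keep the sketch light; the config above uses 384^3.
mesh_size, mesh_thres = 128, 3.0
coords = np.linspace(-1.0, 1.0, mesh_size, dtype=np.float32)
x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
grid = 10.0 * np.exp(-4.0 * (x**2 + y**2 + z**2))  # soft sphere crossing the threshold

# mesh_thres is the density level at which the surface is extracted;
# raising it tightens the mesh, lowering it inflates it.
verts, faces, normals, values = measure.marching_cubes(grid, level=mesh_thres)
print(verts.shape, faces.shape)
```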

Inference result

input image

result video

result mesh (front)

result mesh (back)

  • As you can see above, there is not much resolution difference between views in the generated video. However, when the model is imported into Blender, as shown in the images, there is a significant resolution difference exactly between the front and back sides: the front side comes out at a noticeably lower resolution, while the back side shows higher-resolution inference results. A quick way to quantify this is sketched below.
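
One way to put a number on the asymmetry instead of eyeballing it in Blender is to count vertices in each half of the exported mesh. A minimal sketch using trimesh ("output.glb" is a hypothetical filename, and which axis separates front from back depends on the export orientation, so the axis index is an assumption to adjust):

```python
import numpy as np
import trimesh

mesh = trimesh.load("output.glb", force="mesh")
verts = np.asarray(mesh.vertices)

axis = 1  # assumption: the y axis separates front from back; adjust as needed
split = verts[:, axis].mean()
front = int((verts[:, axis] < split).sum())
back = len(verts) - front
print(f"front-half vertices: {front}, back-half vertices: {back}")
```

A large imbalance here would confirm that the geometry itself, not just the texture, is reconstructed at different effective resolutions on the two sides.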

Hi @ZexinHe, I’ve tagged you since you're the owner. Sorry for the inconvenience.
I would greatly appreciate it if you could let me know what I might be doing wrong and how I can fix this issue.
