Significant Resolution Difference Between Front and Back Sides in Fine-Tuned Model GLB Outputs #48

hayoung-jeremy opened this issue Jun 4, 2024 · 0 comments

Summary

  • fine-tuned the openlrm-mix-large-1.1 model
  • dataset : 1000 GLB files, all processed by blender-script.py into rgba, pose, and intrinsics.npy (see the pose-coverage sketch below)
  • trained on : Runpod, an A100 SXM 80GB VRAM x8 instance
  • the purpose of the fine-tuning was overfitting, since there is not enough data for now
  • the resulting GLB outputs' resolution differs significantly between the front and back sides
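
Since the asymmetry is view-dependent, it may be worth first checking that the rendered poses actually cover the back of each object. A minimal sketch, assuming each pose/*.npy stores a camera-to-world matrix with the camera position in the last column (adjust if blender-script.py uses a different convention; `<uid>` is a placeholder for one object's folder):

```python
import glob
import numpy as np

positions = []
for path in sorted(glob.glob("/root/OpenLRM/views/<uid>/pose/*.npy")):
    c2w = np.load(path).reshape(-1, 4)  # accepts 3x4 or 4x4 layouts
    positions.append(c2w[:3, 3])        # camera position in world space

positions = np.stack(positions)
azimuths = np.degrees(np.arctan2(positions[:, 1], positions[:, 0]))
hist, _ = np.histogram(azimuths, bins=12, range=(-180, 180))
print("views per 30-degree azimuth bin:", hist)  # empty bins = blind spots
```

If some azimuth bins are empty or sparse, the model never (or rarely) sees those sides during training, which would explain a front/back quality gap.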

Configuration

train-sample.yaml

experiment:
    type: lrm
    seed: 42
    parent: lrm-objaverse
    child: small-dummyrun

model:
    camera_embed_dim: 1024 # modified to match openlrm-mix-large-1.1's config.json
    rendering_samples_per_ray: 128 # modified to match openlrm-mix-large-1.1's config.json
    transformer_dim: 1024 # modified to match openlrm-mix-large-1.1's config.json
    transformer_layers: 16 # modified to match openlrm-mix-large-1.1's config.json
    transformer_heads: 16 # modified to match openlrm-mix-large-1.1's config.json
    triplane_low_res: 32
    triplane_high_res: 64
    triplane_dim: 80 # modified to match openlrm-mix-large-1.1's config.json
    encoder_type: dinov2
    encoder_model_name: dinov2_vitb14_reg # modified to match openlrm-mix-large-1.1's config.json
    encoder_feat_dim: 768 # modified to match openlrm-mix-large-1.1's config.json
    encoder_freeze: false

dataset:
    subsets:
        -   name: objaverse
            root_dirs:
                - "/root/OpenLRM/views" # my processed data directory
            meta_path:
                train: "/root/OpenLRM/train_uids.json"
                val: "/root/OpenLRM/val_uids.json"
            sample_rate: 1.0
    sample_side_views: 3
    source_image_res: 448 # modified for the higher resolution
    render_image:
        low: 128 # modified for the higher resolution
        high: 384 # modified for the higher resolution
        region: 128 # modified for the higher resolution
    normalize_camera: true
    normed_dist_to_center: auto
    num_train_workers: 4
    num_val_workers: 2
    pin_mem: true

train:
    mixed_precision: bf16
    find_unused_parameters: false
    loss:
        pixel_weight: 1.0
        perceptual_weight: 1.0
        tv_weight: 5e-4
    optim:
        lr: 4e-4
        weight_decay: 0.05
        beta1: 0.9
        beta2: 0.95
        clip_grad_norm: 1.0
    scheduler:
        type: cosine
        warmup_real_iters: 3000
    batch_size: 2  # modified since using higher resolution
    accum_steps: 8  # modified since using higher resolution
    epochs: 1000  # modified from 60 to 1000 to overfit, given the limited data
    debug_global_steps: null

val:
    batch_size: 2 # modified since using higher resolution
    global_step_period: 1000
    debug_batches: null

saver:
    auto_resume: true
    load_model: "/root/OpenLRM/model.safetensors" # this refers to openlrm-mix-large-1.1
    checkpoint_root: ./exps/checkpoints
    checkpoint_global_steps: 1000
    checkpoint_keep_level: 5

logger:
    stream_level: WARNING
    log_level: INFO
    log_root: ./exps/logs
    tracker_root: ./exps/trackers
    enable_profiler: false
    trackers:
        - tensorboard
    image_monitor:
        train_global_steps: 100
        samples_per_log: 4

compile:
    suppress_errors: true
    print_specializations: true
    disable: true

Result

Training result

[TRAIN STEP]loss=0.112, loss_pixel=0.00603, loss_perceptual=0.105, loss_tv=0.544, lr=9.87e-12: 100%|| 13000/13000 [12:37:44<00:00,  3.50it/s]
  • loss value : 0.112
  • duration : 12:37:44
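
Note that the final learning rate in the log (lr=9.87e-12) is exactly what a linear-warmup plus cosine-decay-to-zero schedule predicts at the last of 13000 steps, so the vanishing lr is expected behavior at the end of training rather than a sign of a broken optimizer. A minimal sketch (the schedule shape is my assumption, but it reproduces the logged value):

```python
import math

def lr_at(step, base_lr=4e-4, warmup=3000, total=13000):
    """Linear warmup followed by cosine decay to zero."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / (total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(12999))  # ~9.87e-12, matching the [TRAIN STEP] log above
```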

infer-l.yaml

source_size: 448 # modified to fit the fine-tuned model's source_image_res
source_cam_dist: 2.0
render_size: 384 # modified to fit the fine-tuned model's render_image.high
render_views: 160
render_fps: 40
frame_size: 2
mesh_size: 384 # modified to fit the fine-tuned model's render_image.high
mesh_thres: 3.0
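
For reference on what the last two knobs do: mesh extraction in this family of models typically samples the density field on a mesh_size^3 grid and runs marching cubes at the iso-level mesh_thres. An illustrative sketch (not OpenLRM's exact extraction code; scikit-image stands in for the marching-cubes backend, and the grid is a synthetic stand-in for queried densities):

```python
import numpy as np
from skimage import measure

# 128^3 here just to keep the sketch light; the config above uses 384^3.
mesh_size, mesh_thres = 128, 3.0
coords = np.linspace(-1.0, 1.0, mesh_size, dtype=np.float32)
x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
grid = 10.0 * np.exp(-4.0 * (x**2 + y**2 + z**2))  # soft sphere crossing the threshold

# mesh_thres is the density level at which the surface is extracted;
# raising it tightens the mesh, lowering it inflates it.
verts, faces, normals, values = measure.marching_cubes(grid, level=mesh_thres)
print(verts.shape, faces.shape)
```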

Inference result

input image

result video

result mesh (front)

result mesh (back)

  • As you can see above, there is not much resolution difference between views in the generated video. However, when the model is imported into Blender, as shown in the images, there is a significant resolution difference exactly between the front and back sides: the front side comes out at a noticeably lower resolution, while the back side shows higher-resolution inference results. A quick way to quantify this is sketched below.
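
One way to put a number on the asymmetry instead of eyeballing it in Blender is to count vertices in each half of the exported mesh. A minimal sketch using trimesh ("output.glb" is a hypothetical filename, and which axis separates front from back depends on the export orientation, so the axis index is an assumption to adjust):

```python
import numpy as np
import trimesh

mesh = trimesh.load("output.glb", force="mesh")
verts = np.asarray(mesh.vertices)

axis = 1  # assumption: the y axis separates front from back; adjust as needed
split = verts[:, axis].mean()
front = int((verts[:, axis] < split).sum())
back = len(verts) - front
print(f"front-half vertices: {front}, back-half vertices: {back}")
```

A large imbalance here would confirm that the geometry itself, not just the texture, is reconstructed at different effective resolutions on the two sides.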

Hi @ZexinHe, I’ve tagged you since you're the owner. Sorry for the inconvenience.
I would greatly appreciate it if you could let me know what I might be doing wrong and how I can fix this issue.
