[windows 10, cuda 11.8] TypeError: grid_encode_forward() Incompatible Function Arguments:, FileNotFoundError: chkpnt_face_latest.pth, ZeroDivisionError: division by zero #42

Open
linhcentrio opened this issue Sep 19, 2024 · 2 comments


@linhcentrio

(D:\Talking_head\SyncTalk\venv) D:\Gaussian\TalkingGaussian>.\scripts\train_xx.bat data\may output\may_project 0
Optimizing output\may_project
Output folder: output\may_project [19/09 17:49:49]
Found transforms_train.json file, assuming Blender data set! [19/09 17:49:49]
Reading Training Transforms [19/09 17:49:49]
5520it [00:03, 1738.82it/s]
5520it [05:35, 16.45it/s]
Reading Test Transforms [19/09 17:55:29]
553it [00:00, 1818.82it/s]
553it [00:37, 14.56it/s]
Generating random point cloud (10000)... [19/09 17:56:11]
Loading Training Cameras [19/09 17:56:13]
Loading Test Cameras [19/09 17:56:37]
Number of points at initialisation : 10000 [19/09 17:56:39]
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off] [19/09 17:56:45]
D:\Talking_head\SyncTalk\venv\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
D:\Talking_head\SyncTalk\venv\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1. You can also use weights=AlexNet_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: D:\Talking_head\SyncTalk\venv\lib\site-packages\lpips\weights\v0.1\alex.pth [19/09 17:56:45]
Training progress: 4%|####4 | 2000/50000 [00:28<10:42, 74.71it/s, Loss=nan, AU25=1.2-1.3]
D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\utils\tensorboard\summary.py:444: RuntimeWarning: invalid value encountered in cast
tensor = (tensor * scale_factor).clip(0, 255).astype(np.uint8)

[ITER 2000] Evaluating test: L1 0.11171912252902985 PSNR 13.904804420471192 [19/09 17:57:16]

[ITER 2000] Evaluating train: L1 0.11286026984453201 PSNR 13.85939292907715 [19/09 17:57:18]
Training progress: 6%|######6 | 2990/50000 [00:46<10:55, 71.73it/s, Loss=nan, AU25=1.2-1.3]
Traceback (most recent call last):
File "D:\Gaussian\TalkingGaussian\train_mouth.py", line 328, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "D:\Gaussian\TalkingGaussian\train_mouth.py", line 141, in training
render_pkg = render_motion_mouth(viewpoint_cam, gaussians, motion_net, pipe, background)
File "D:\Gaussian\TalkingGaussian\gaussian_renderer\__init__.py", line 238, in render_motion_mouth
motion_preds = motion_net(pc.get_xyz, audio_feat)
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Gaussian\TalkingGaussian\scene\motion_net.py", line 321, in forward
enc_x = self.encode_x(x, bound=self.bound)
File "D:\Gaussian\TalkingGaussian\scene\motion_net.py", line 312, in encode_x
feat_xy = self.encoder_xy(xy, bound=bound)
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "D:\Gaussian\TalkingGaussian\gridencoder\grid.py", line 156, in forward
outputs = grid_encode(inputs, self.embeddings, self.offsets, self.per_level_scale, self.base_resolution, inputs.requires_grad, self.gridtype_id, self.align_corners, self.interp_id)
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\autograd\function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\cuda\amp\autocast_mode.py", line 98, in decorate_fwd
return fwd(*args, **kwargs)
File "D:\Gaussian\TalkingGaussian\gridencoder\grid.py", line 54, in forward
_backend.grid_encode_forward(inputs, embeddings, offsets, outputs, B, D, C, L, S, H, dy_dx, gridtype, align_corners, interpolation)
TypeError: grid_encode_forward(): incompatible function arguments. The following argument types are supported:

  1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: int, arg5: int, arg6: int, arg7: int, arg8: float, arg9: int, arg10: Optional[torch.Tensor], arg11: int, arg12: bool) -> None

Invoked with: tensor([[nan, nan],
[nan, nan],
[nan, nan],
...,
[nan, nan],
[nan, nan],
[nan, nan]], device='cuda:0', grad_fn=), Parameter containing:
tensor([[ 8.5519e-05],
[ 6.9541e-05],
[ 6.5501e-05],
...,
[-6.5308e-05],
[ 9.4824e-05],
[-7.8563e-06]], device='cuda:0', requires_grad=True), tensor([ 0, 4232, 8464, 12560, 16656, 20632, 24608, 28456, 32184, 35912,
39512, 43112, 46600], device='cuda:0', dtype=torch.int32), tensor([[[nan],
[nan],
[nan],
...,
[nan],
[nan],
[nan]],

    [[nan],
     [nan],
     [nan],
     ...,
     [nan],
     [nan],
     [nan]],

    [[nan],
     [nan],
     [nan],
     ...,
     [nan],
     [nan],
     [nan]],

    ...,

    [[0.],
     [0.],
     [0.],
     ...,
     [0.],
     [0.],
     [0.]],

    [[0.],
     [0.],
     [0.],
     ...,
     [nan],
     [nan],
     [nan]],

    [[nan],
     [nan],
     [nan],
     ...,
     [nan],
     [nan],
     [nan]]], device='cuda:0'), 10000, 2, 1, 12, -0.013818463040458927, 64, tensor([[0., 0., 0.,  ..., 0., 0., 0.],
    [0., 0., 0.,  ..., 0., 0., 0.],
    [0., 0., 0.,  ..., 0., 0., 0.],
    ...,
    [0., 0., 0.,  ..., 0., 0., 0.],
    [0., 0., 0.,  ..., 0., 0., 0.],
    [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0'), 0, False, 0

Training progress: 6%|######6 | 2990/50000 [00:57<15:04, 51.98it/s, Loss=nan, AU25=1.2-1.3]
Optimizing output\may_project
Output folder: output\may_project [19/09 17:57:53]
Found transforms_train.json file, assuming Blender data set! [19/09 17:57:53]
Reading Training Transforms [19/09 17:57:53]
5520it [00:03, 1691.19it/s]
5520it [05:29, 16.76it/s]
Reading Test Transforms [19/09 18:03:27]
553it [00:00, 1663.55it/s]
553it [00:36, 15.14it/s]
Generating random point cloud (2000)... [19/09 18:04:09]
Loading Training Cameras [19/09 18:04:10]
Loading Test Cameras [19/09 18:04:33]
Number of points at initialisation : 2000 [19/09 18:04:34]
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off] [19/09 18:04:40]
D:\Talking_head\SyncTalk\venv\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
D:\Talking_head\SyncTalk\venv\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1. You can also use weights=AlexNet_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: D:\Talking_head\SyncTalk\venv\lib\site-packages\lpips\weights\v0.1\alex.pth [19/09 18:04:41]
Training progress: 4%|####3 | 2000/50000 [00:27<10:13, 78.22it/s, Loss=nan, Mouth=5.7-16.6]
[ITER 2000] Evaluating test: L1 0.1120919130350414 PSNR 13.891403951142962 [19/09 18:05:13]

[ITER 2000] Evaluating train: L1 0.11286026984453201 PSNR 13.85939292907715 [19/09 18:05:17]
Traceback (most recent call last):
File "D:\Gaussian\TalkingGaussian\train_face.py", line 394, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "D:\Gaussian\TalkingGaussian\train_face.py", line 241, in training
training_report(tb_writer, iteration, Ll1, loss, l1_loss, iter_start.elapsed_time(iter_end), testing_iterations, scene, motion_net, render if iteration < warm_step else render_motion, (pipe, background))
File "D:\Gaussian\TalkingGaussian\train_face.py", line 365, in training_report
tb_writer.add_histogram("scene/opacity_histogram", scene.gaussians.get_opacity, iteration)
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\utils\tensorboard\writer.py", line 485, in add_histogram
histogram(tag, values, bins, max_bins=max_bins), global_step, walltime
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\utils\tensorboard\summary.py", line 355, in histogram
hist = make_histogram(values.astype(float), bins, max_bins)
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\utils\tensorboard\summary.py", line 399, in make_histogram
raise ValueError("The histogram is empty, please file a bug report.")
ValueError: The histogram is empty, please file a bug report.
Training progress: 4%|####3 | 2000/50000 [00:46<18:46, 42.63it/s, Loss=nan, Mouth=5.7-16.6]
Optimizing output\may_project
Output folder: output\may_project [19/09 18:05:38]
Found transforms_train.json file, assuming Blender data set! [19/09 18:05:38]
Reading Training Transforms [19/09 18:05:38]
5520it [00:03, 1666.87it/s]
5520it [05:47, 15.90it/s]
Reading Test Transforms [19/09 18:11:30]
553it [00:00, 1797.20it/s]
553it [00:36, 14.95it/s]
Generating random point cloud (10000)... [19/09 18:12:12]
Loading Training Cameras [19/09 18:12:14]
Loading Test Cameras [19/09 18:12:39]
Number of points at initialisation : 10000 [19/09 18:12:41]
Traceback (most recent call last):
File "D:\Gaussian\TalkingGaussian\train_fuse.py", line 261, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "D:\Gaussian\TalkingGaussian\train_fuse.py", line 57, in training
(model_params, motion_params, _, _) = torch.load(os.path.join(scene.model_path, "chkpnt_face_latest.pth"))
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\serialization.py", line 252, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'output\may_project\chkpnt_face_latest.pth'
Looking for config file in output\may_project\cfg_args
Config file found: output\may_project\cfg_args
Rendering output\may_project
Found transforms_train.json file, assuming Blender data set! [19/09 18:13:04]
Reading Test Transforms [19/09 18:13:04]
553it [00:00, 1716.39it/s]
553it [00:30, 18.11it/s]
Generating random point cloud (10000)... [19/09 18:13:36]
Loading Training Cameras [19/09 18:13:36]
Loading Test Cameras [19/09 18:13:40]
Number of points at initialisation : 10000 [19/09 18:13:41]
Traceback (most recent call last):
File "D:\Gaussian\TalkingGaussian\synthesize_fuse.py", line 125, in <module>
render_sets(model.extract(args), args.iteration, pipeline.extract(args), args.use_train, args.fast, args.dilate)
File "D:\Gaussian\TalkingGaussian\synthesize_fuse.py", line 93, in render_sets
(model_params, motion_params, model_mouth_params, motion_mouth_params) = torch.load(os.path.join(dataset.model_path, "chkpnt_fuse_latest.pth"))
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "D:\Talking_head\SyncTalk\venv\lib\site-packages\torch\serialization.py", line 252, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'output\may_project\chkpnt_fuse_latest.pth'
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
D:\Talking_head\SyncTalk\venv\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
D:\Talking_head\SyncTalk\venv\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1. You can also use weights=AlexNet_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: D:\Talking_head\SyncTalk\venv\lib\site-packages\lpips\weights\v0.1\alex.pth
Traceback (most recent call last):
File "D:\Gaussian\TalkingGaussian\metrics.py", line 215, in <module>
print(lmd_meter.report())
File "D:\Gaussian\TalkingGaussian\metrics.py", line 102, in report
return f'LMD ({self.backend}) = {self.measure():.6f}'
File "D:\Gaussian\TalkingGaussian\metrics.py", line 96, in measure
return self.V / self.N
ZeroDivisionError: division by zero
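
Reading the log top to bottom, the last three errors look like follow-on failures rather than separate bugs (this is an interpretation of the log, not something confirmed in the thread): train_face.py stopped at iteration 2000, train_fuse.py then could not find chkpnt_face_latest.pth, synthesize_fuse.py could not find chkpnt_fuse_latest.pth, and metrics.py divided by self.N, which is presumably zero when no frames were rendered. A runner that stops at the first failing stage makes the real error easier to spot. The sketch below is a minimal illustration; `run_stages` and the stage commands are hypothetical placeholders, not the contents of scripts\train_xx.bat.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def run_stages(stages: Sequence[List[str]]) -> Optional[int]:
    """Run each command in order and stop at the first non-zero exit code.

    Returns the index of the failing stage, or None if every stage succeeded.
    """
    for index, command in enumerate(stages):
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"stage {index} failed with exit code {result.returncode}; stopping")
            return index
    return None


# Placeholder commands standing in for the real training scripts (assumption).
demo_stages = [
    [sys.executable, "-c", "print('stage 0 ok')"],
    [sys.executable, "-c", "raise SystemExit(1)"],
    [sys.executable, "-c", "print('never reached')"],
]
failed_at = run_stages(demo_stages)
```

With the placeholder stages above, the third command never runs because the second one exits with a non-zero code.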

Repository owner deleted a comment Sep 19, 2024
Repository owner deleted a comment Sep 19, 2024
@Fictionarry
Owner

The traceback points to the gridencoder. You appear to be using the SyncTalk environment, but the gridencoder implementation in our repo differs slightly from SyncTalk's. Try reinstalling it from our code, or align the gridencoder in our repo with the one in SyncTalk.
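
The mismatch can be read off the log itself: the TypeError lists one supported signature with 13 positional arguments (arg0 to arg12), while the call at gridencoder\grid.py, line 54 passes 14, ending with `interpolation`. The sketch below only counts both sides. The signature string and argument names are copied from the traceback; `count_pybind_args` is a hypothetical helper, not part of either repo.

```python
import re

# Supported signature, copied from the TypeError message in the log.
SUPPORTED = (
    "(arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, "
    "arg3: torch.Tensor, arg4: int, arg5: int, arg6: int, arg7: int, "
    "arg8: float, arg9: int, arg10: Optional[torch.Tensor], arg11: int, "
    "arg12: bool) -> None"
)

# Argument names from the failing call in grid.py, line 54 of the traceback.
CALL_ARGS = [
    "inputs", "embeddings", "offsets", "outputs", "B", "D", "C", "L",
    "S", "H", "dy_dx", "gridtype", "align_corners", "interpolation",
]


def count_pybind_args(signature: str) -> int:
    """Count the 'argN:' entries in a pybind11-style signature string."""
    return len(re.findall(r"\barg\d+:", signature))


accepted = count_pybind_args(SUPPORTED)
passed = len(CALL_ARGS)
print(f"compiled backend accepts {accepted} arguments, Python call passes {passed}")
```

A count that differs by one, as here, is consistent with the Python wrapper and the compiled extension coming from different versions of the gridencoder.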

There is another problem here:

raise ValueError("The histogram is empty, please file a bug report.")
ValueError: The histogram is empty, please file a bug report.

Try replacing tensorboard with tensorboardX or another version.
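
Separately from swapping the logging package, the log already shows Loss=nan before this error, so one possibility (an assumption, not confirmed in this thread) is that the opacity values passed to add_histogram were no longer finite. A small guard that skips the histogram when nothing finite is left can be sketched as follows; `finite_only` and `should_log_histogram` are hypothetical helpers working on plain Python floats.

```python
import math
from typing import Iterable, List


def finite_only(values: Iterable[float]) -> List[float]:
    """Return only the values that are neither NaN nor infinite."""
    return [v for v in values if math.isfinite(v)]


def should_log_histogram(values: Iterable[float]) -> bool:
    """True when at least one finite value is left to put in a histogram."""
    return len(finite_only(values)) > 0


# Illustrative values only; real opacities would come from the model.
opacities = [float("nan"), 0.25, float("inf"), 0.75]
print(finite_only(opacities))                    # [0.25, 0.75]
print(should_log_histogram([float("nan")] * 3))  # False
```

In the training script, the check would wrap the existing tb_writer.add_histogram call, for example after converting the tensor with `.flatten().tolist()`. It only hides the symptom, so the NaN loss still needs to be traced to its source.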

@linhcentrio
Author

Thank you, I'll try!
