bad_alloc when using Conv2d after torchvision.io.read_video #8458

Open
Inforeon opened this issue May 29, 2024 · 0 comments

🐛 Describe the bug

Short:

Running a forward pass through Conv2d on a CUDA device after loading an unrelated video with torchvision.io.read_video terminates the process with an uncaught std::bad_alloc exception.

Code:

import torch
import torchvision

if __name__ == '__main__':
    print(f"Pytorch Version: {torch.__version__}")
    print(f"Torchvision Version: {torchvision.__version__}")

    mp4 = torchvision.io.read_video("testpov_video.mp4", pts_unit='sec')
    print(f"Video shape: {mp4[0].shape}\tdevice: {mp4[0].device}\tdtype:{mp4[0].dtype}")

    print(f"Cuda available: {torch.cuda.is_available()}")
    print(f"Cuda device: {torch.cuda.get_device_name()}")

    test_conv = torch.nn.Conv2d(3, 8, kernel_size=1).to("cuda")
    test_in = torch.rand((1, 3, 32, 32)).to("cuda")
    test_conv(test_in)

Output

Pytorch Version: 2.3.0+cu121
Torchvision Version: 0.18.0+cu121
Video shape: torch.Size([240, 256, 256, 3])	device: cpu	dtype:torch.uint8
Cuda available: True
Cuda device: NVIDIA RTX A4000
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Long/Additional information:

  • My system is Ubuntu 22.04.4 running in a virtual machine with ~96 GB of RAM allocated. It has two GPUs, an A4000 and an A5000, and the error occurs with either of them.
  • The error does not occur when running on the CPU.
  • The error does not occur if the mp4 video is not loaded.
  • The error still occurs with other mp4 files.
  • Running pip install --force-reinstall torch torchvision torchaudio did not resolve the problem.
  • I have not tested other torchvision.io functions.
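
The observations above could be re-checked one at a time with a small harness that runs each variant in a fresh interpreter, so that a SIGABRT in one variant does not take down the others. This is only a sketch: run_isolated, compare_variants and the two snippet templates are hypothetical names introduced here, and the snippets reuse only the calls from the reproduction script.

```python
import signal
import subprocess
import sys

# Hypothetical snippet templates, assembled from the calls in the report.
VIDEO_SNIPPET = (
    "import torch\n"
    "import torchvision\n"
    "torchvision.io.read_video({path!r}, pts_unit='sec')\n"
)
CONV_SNIPPET = (
    "import torch\n"
    "conv = torch.nn.Conv2d(3, 8, kernel_size=1).to({device!r})\n"
    "conv(torch.rand((1, 3, 32, 32)).to({device!r}))\n"
)


def run_isolated(snippet, timeout=300.0):
    """Run a Python snippet in a child interpreter and report how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        # On POSIX, a child killed by signal N has returncode -N.
        "returncode": proc.returncode,
        "aborted": proc.returncode == -signal.SIGABRT,
        # Last few stderr lines, e.g. the "terminate called ..." message.
        "stderr_tail": proc.stderr.strip().splitlines()[-3:],
    }


def compare_variants(video_path):
    """Run the three variants described in the report and collect results."""
    video = VIDEO_SNIPPET.format(path=video_path)
    variants = {
        "conv only, cuda": CONV_SNIPPET.format(device="cuda"),
        "video then conv, cuda": video + CONV_SNIPPET.format(device="cuda"),
        "video then conv, cpu": video + CONV_SNIPPET.format(device="cpu"),
    }
    return {name: run_isolated(code) for name, code in variants.items()}
```

Calling compare_variants("testpov_video.mp4") on the affected machine would, if the observations above hold, be expected to flag only the "video then conv, cuda" variant as aborted. Note that subprocess reports the abort as return code -6, whereas a shell reports the same event as 134 (128 + 6), which matches the exit code in the output above.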

Versions

Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.5.40
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA RTX A5000
GPU 1: NVIDIA RTX A4000

Nvidia driver version: 535.171.04
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: AuthenticAMD
Model name: Common KVM processor
CPU family: 15
Model: 6
Thread(s) per core: 1
Core(s) per socket: 20
Socket(s): 1
Stepping: 1
BogoMIPS: 5599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl cpuid extd_apicid tsc_known_freq pni cx16 x2apic hypervisor cmp_legacy 3dnowprefetch vmmcall
Hypervisor vendor: KVM
Virtualisation type: full
L1d cache: 1.3 MiB (20 instances)
L1i cache: 1.3 MiB (20 instances)
L2 cache: 10 MiB (20 instances)
L3 cache: 320 MiB (20 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-19
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pytorch-fast-transformers==0.4.0
[pip3] pytorch-lightning==2.2.4
[pip3] torch==2.3.0
[pip3] torchaudio==2.3.0
[pip3] torchdata==0.7.1
[pip3] torchinfo==1.8.0
[pip3] torchmetrics==1.3.2
[pip3] torchvision==0.18.0
[pip3] triton==2.3.0
[conda] No relevant packages

@ezyang transferred this issue from pytorch/pytorch on May 30, 2024