VLOGroup/LVVCP_Op_sample_cv

Sample CV

Overview

This library provides a TorchScript-compatible cost-volume sample operator with an additional offset input.

This simple version assumes integer offsets and rounds any floating-point offset to the nearest integer. In its current form it does not provide a backward operation and is intended for inference only.
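To make the rounding behaviour concrete, here is a minimal pure-Python sketch of sampling a cost volume around a per-pixel offset. It is an illustration only and not the operator's actual interface: the function name `sample_cost_volume`, the flat `pixels x labels` layout, the `radius` window and the `fill` value for out-of-range labels are all assumptions made for this example. Python's built-in `round` is used for the nearest-integer step; how the real operator treats exact `.5` ties is not stated here.

```python
def sample_cost_volume(cost_volume, offsets, radius=1, fill=0.0):
    """Sample a window of costs around a rounded per-pixel offset.

    cost_volume: list of N lists, each holding D costs (assumed layout).
    offsets:     list of N floats, one offset per pixel.
    radius:      hypothetical half-width of the sampled window.
    fill:        hypothetical value used for labels outside [0, D).
    """
    sampled = []
    for costs, offset in zip(cost_volume, offsets):
        center = round(offset)  # floating-point offset -> nearest integer
        window = []
        for k in range(-radius, radius + 1):
            idx = center + k
            # Out-of-range labels get the fill value (an assumption).
            window.append(costs[idx] if 0 <= idx < len(costs) else fill)
        sampled.append(window)
    return sampled


cv = [[0.0, 1.0, 2.0, 3.0, 4.0],
      [5.0, 6.0, 7.0, 8.0, 9.0]]
offs = [1.2, 3.7]  # rounded to 1 and 4
print(sample_cost_volume(cv, offs))
```

Since no gradients are involved, a forward-only lookup like this matches the inference-only scope described above.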

Useful commands for debugging:

  • The syntax for building CUDA with CMake has changed quite a few times.
    • Before CMake 3.8, custom versions were common.
    • With CMake 3.8, CUDA became standard; macros and variables such as FindCUDA, cuda_select_nvcc_arch_flags and CUDA_NVCC_FLAGS became standard.
    • With CMake 3.18, a new standard was introduced and FindCUDA and CUDA_NVCC_FLAGS became deprecated. Use CMAKE_CUDA_FLAGS instead and declare CUDA as a project language.
  • Problems with CUDA architecture / ARCH flags (simplified):
    • NVCC can generate both PTX (a virtual intermediate representation/assembly) and SASS (real machine code). Because PTX is an intermediate representation, it can be JIT-compiled into SASS machine code, including for newer GPU generations, but this requires extra startup time. Therefore one can generate fat binaries that already contain PTX and SASS for several architectures at once.

      • Explicitly force the build system to use specific CUDA ARCH and CODE flags through PyTorch's version of setuptools. This means the variable is only recognized when building via setup.py. Some examples:
        export TORCH_CUDA_ARCH_LIST=7.5
        export TORCH_CUDA_ARCH_LIST="6.5 7.5"   (using more than one value seems not to be possible with older CMake versions)
        export TORCH_CUDA_ARCH_LIST=ALL
      • Check which flags were used to build your precompiled PyTorch:
        torch.__config__.show()
        torch.cuda.get_arch_list()
      • Inspect the library's binary file to see for which architectures PTX/ELF code was embedded:
        cuobjdump <objfilename>
        cuobjdump <objfilename> -lelf -lptx
    • Seeing the calls to g++ and nvcc:

      • with Python distutils:
        python setup.py --verbose
      • with CMake:
        make VERBOSE=1
    • The message "CUDA error: no kernel image is available for execution on the device" indicates that the CUDA kernel was not built for your graphics card.
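The PTX/SASS rules above can be turned into a small check that predicts the "no kernel image" error. The sketch below is pure Python and uses a hypothetical helper name, `kernel_image_available`. It assumes entries in the format reported by `torch.cuda.get_arch_list()` (for example `sm_75` or `compute_75`) and a `(major, minor)` tuple as returned by `torch.cuda.get_device_capability()`. The simplified rule it applies: SASS (`sm_XY`) runs on devices with the same major version and an equal or higher minor version, while PTX (`compute_XY`) can be JIT-compiled for any device with an equal or higher compute capability. Architecture-specific targets are not modelled.

```python
import re


def kernel_image_available(arch_list, capability):
    """Return True if some entry in arch_list can run on the given device.

    arch_list:  strings such as "sm_75" (SASS) or "compute_75" (PTX).
    capability: (major, minor) compute capability of the device.
    """
    dev_major, dev_minor = capability
    for entry in arch_list:
        match = re.fullmatch(r"(sm|compute)_(\d+)(\d)[a-z]?", entry)
        if match is None:
            continue  # ignore entries this sketch does not understand
        kind = match.group(1)
        major, minor = int(match.group(2)), int(match.group(3))
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True  # matching machine code is embedded
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True  # PTX can be JIT-compiled at load time
    return False


# Built only for Volta/Turing SASS, run on an 8.6 device: no usable image.
print(kernel_image_available(["sm_70", "sm_75"], (8, 6)))
# Adding PTX for 7.5 makes the same binary usable through JIT compilation.
print(kernel_image_available(["sm_70", "compute_75"], (8, 6)))
```

On a machine with PyTorch and a GPU, the two arguments could be taken from `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`. For a custom extension, the `cuobjdump` output mentioned above is the more direct source of the embedded architectures.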

Linklist

This demo was built using information of these very good web sources:

EXTENDING TORCHSCRIPT WITH CUSTOM C++ OPERATORS
REGISTERING A DISPATCHED OPERATOR IN C++
Custom C++ Autograd
(old) Source Code for this tutorial
TorchScript intro
TorchScript JIT info
PyTorch C++ API
OptoX - our previous framework - currently not TorchScriptable

PyTorch c10::ArrayRef Reference
PyTorch c10::IValue Reference
CUDA NVCC Compiler Documentation (PDF)
