# Sample CV

## Overview

This library provides a TorchScript-compatible cost-volume sampling operator with an additional offset input.

This simple version assumes integer offsets and rounds any floating-point offset to the nearest integer. It also does not provide a backward operation in its current form and is intended for inference only.
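
A minimal usage sketch follows; the library path, the registered operator name (`sample_cv::sample`), and the tensor layout are assumptions for illustration, so check the operator registration in the C++ sources for the actual names:

```python
import torch

# Load the compiled extension into the TorchScript operator registry.
# The library path is an assumption; point it at your build output.
torch.ops.load_library("build/libsample_cv.so")

# Hypothetical layout: a cost volume of shape (N, D, H, W) and per-pixel
# offsets of shape (N, 2, H, W). Floating-point offsets are rounded to
# the nearest integer by the operator.
cost_volume = torch.randn(1, 32, 48, 64, device="cuda")
offsets = torch.full((1, 2, 48, 64), 1.4, device="cuda")  # rounds to 1

# No backward is implemented, so use it for inference only.
with torch.no_grad():
    sampled = torch.ops.sample_cv.sample(cost_volume, offsets)
```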

## Useful commands for debugging

- Building CUDA with CMake: the syntax has changed quite a few times.
  - Before CMake 3.8, custom solutions were common.
  - With CMake 3.8, CUDA became standard; macros like `FindCUDA`, `cuda_select_nvcc_arch_flags`, and `CUDA_NVCC_FLAGS` became the norm.
  - With CMake 3.18, a new standard was introduced and `FindCUDA`/`CUDA_NVCC_FLAGS` became deprecated; use `CMAKE_CUDA_FLAGS` instead and set the project language to CUDA.
- Problems with CUDA architecture / ARCH flags (simplified):
  - NVCC can generate PTX (a virtual intermediate representation/assembly) and SASS (real machine code). Since PTX is an intermediate representation, it can be JIT-compiled into SASS machine code even for newer GPU generations, but this requires extra startup time. Therefore one can generate fatbinaries that already contain both PTX and SASS for several architectures at once.
    - You can explicitly force the build system to use specific CUDA ARCH and CODE flags within Torch's version of setuptools; this flag is only recognized by `setup.py` (see the sketch after this list). Some examples:

      ```bash
      export TORCH_CUDA_ARCH_LIST=7.5
      # using more than one value seems not to be possible with older CMake versions:
      export TORCH_CUDA_ARCH_LIST="6.5 7.5"
      export TORCH_CUDA_ARCH_LIST=ALL
      ```
    - Check which flags were used to build your precompiled PyTorch:

      ```python
      torch.__config__.show()
      torch.cuda.get_arch_list()
      ```
  - Inspect the library's binary file to see which architectures (PTX/ELF) were integrated:

    ```bash
    cuobjdump <objfilename>
    cuobjdump <objfilename> -lelf -lptx
    ```
  - Seeing the calls to g++ and nvcc:
    - with Python distutils:

      ```bash
      python setup.py --verbose
      ```

    - with CMake:

      ```bash
      make VERBOSE=1
      ```
  - `CUDA error: no kernel image is available for execution on the device` indicates that the CUDA kernel was not built for your graphics card (a quick check follows below).
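
The `setup.py` sketch referenced above shows where `TORCH_CUDA_ARCH_LIST` takes effect. The module name and source files here are assumptions for illustration, not this repository's actual build script:

```python
# setup.py -- minimal sketch; names and source files are assumptions.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="sample_cv",
    ext_modules=[
        CUDAExtension(
            name="sample_cv",
            sources=["sample_cv.cpp", "sample_cv_kernel.cu"],
        )
    ],
    # BuildExtension reads TORCH_CUDA_ARCH_LIST from the environment
    # (e.g. "7.5" or "7.5+PTX") and expands it into the corresponding
    # -gencode arch=compute_XY,code=sm_XY nvcc flags.
    cmdclass={"build_ext": BuildExtension},
)
```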
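
For the "no kernel image" error, a small check compares the running GPU against the architectures your PyTorch build ships kernels for; the same idea applies to the extension's fatbinary, which `cuobjdump` lists as shown above:

```python
import torch

# Compute capability of the current GPU, e.g. (7, 5) -> "sm_75".
major, minor = torch.cuda.get_device_capability(0)
device_arch = f"sm_{major}{minor}"

# Architectures (SASS and PTX entries) baked into this PyTorch build.
built_archs = torch.cuda.get_arch_list()

if device_arch not in built_archs:
    print(f"{device_arch} is missing from {built_archs}; rebuild with a "
          "matching TORCH_CUDA_ARCH_LIST or rely on a compute_XY PTX "
          "entry that can be JIT-compiled for this GPU.")
```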

## Linklist

This demo was built using information from these very good web sources:

- Extending TorchScript with Custom C++ Operators
- Registering a Dispatched Operator in C++
- Custom C++ Autograd
- (old) Source code for this tutorial
- TorchScript intro
- TorchScript JIT info
- PyTorch C++ API
- OptoX (our previous framework; currently not TorchScriptable)
- PyTorch c10::ArrayRef reference
- PyTorch c10::IValue reference
- CUDA NVCC compiler documentation (PDF)