
Configuration

FindHao edited this page Aug 31, 2025 · 4 revisions

CUTracer is configured by environment variables.

  • CUTRACER_INSTRUMENT: comma-separated instrumentation types to enable

    • opcode_only (lightest)
    • reg_trace
    • mem_trace
  • CUTRACER_ANALYSIS: comma-separated analysis types to enable

    • proton_instr_histogram (auto-enables opcode_only if not set)
    • deadlock_detection (auto-enables reg_trace)
  • KERNEL_FILTERS: comma-separated substrings matched against kernel names (mangled or unmangled)

    • Example: KERNEL_FILTERS=add,_Z2_gemm,reduce
  • INSTR_BEGIN, INSTR_END: bounds of the static instruction index interval; instrumentation is applied only to instructions whose index falls within it

    • Example: INSTR_BEGIN=0 INSTR_END=1000
  • TOOL_VERBOSE: verbosity of tool logs (0/1/2)
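
The variables above can be assembled programmatically before launching a CUDA application. The following is a minimal sketch, not part of CUTracer: the helper name `build_cutracer_env` and its parameters are hypothetical, and only the environment variable names and value formats come from the list above. How the CUTracer library itself is loaded into the process is not covered here.

```python
import os


def build_cutracer_env(instrument=(), analysis=(), kernel_filters=(),
                       instr_begin=None, instr_end=None, verbose=None,
                       base=None):
    """Return a copy of the environment with CUTracer variables set.

    Hypothetical helper: only the variable names and the comma-separated
    value format are taken from the documentation.
    """
    env = dict(os.environ if base is None else base)
    if instrument:
        env["CUTRACER_INSTRUMENT"] = ",".join(instrument)
    if analysis:
        env["CUTRACER_ANALYSIS"] = ",".join(analysis)
    if kernel_filters:
        env["KERNEL_FILTERS"] = ",".join(kernel_filters)
    if instr_begin is not None:
        env["INSTR_BEGIN"] = str(instr_begin)
    if instr_end is not None:
        env["INSTR_END"] = str(instr_end)
    if verbose is not None:
        env["TOOL_VERBOSE"] = str(verbose)
    return env


env = build_cutracer_env(
    instrument=["reg_trace", "mem_trace"],
    kernel_filters=["add", "_Z2_gemm", "reduce"],
    instr_begin=0,
    instr_end=1000,
    verbose=1,
    base={},  # start from an empty environment for this illustration
)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting the traced application.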

Other environment considerations:

  • CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 is set by the tool to simplify channel memory handling.
  • The CUDA, NVBit, and driver versions must all be compatible with your GPU.

Notes 📝:

  • When proton_instr_histogram is enabled, opcode_only is forced internally to minimize overhead and ensure required data is available.
  • When deadlock_detection is enabled, reg_trace is forced internally because loop detection relies on PC and opcode correlation per warp.
  • KERNEL_FILTERS uses substring matching against both unmangled and mangled names; any match enables instrumentation for that function and related device functions.
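
The rules in these notes can be modeled in a few lines. This is an illustrative sketch of the described behavior, not CUTracer's actual implementation: the function names are hypothetical, the kernel names in the usage lines are made up, and treating "forced" as "added to the enabled set if missing" is an assumption. Behavior when KERNEL_FILTERS is unset is not modeled.

```python
# Analysis type -> instrumentation type it requires (from the notes above).
REQUIRED_INSTRUMENTATION = {
    "proton_instr_histogram": "opcode_only",
    "deadlock_detection": "reg_trace",
}


def resolve_instrumentation(instrument, analysis):
    """Return the instrumentation types in effect once analyses are enabled.

    Assumption: a required type is appended when it is not already present.
    """
    enabled = list(instrument)
    for name in analysis:
        required = REQUIRED_INSTRUMENTATION.get(name)
        if required is not None and required not in enabled:
            enabled.append(required)
    return enabled


def kernel_matches(filters, mangled, unmangled):
    """True if any filter is a substring of the mangled or unmangled name."""
    return any(f in mangled or f in unmangled for f in filters)


# Hypothetical kernel names, for illustration only.
filters = "add,_Z2_gemm,reduce".split(",")
hit = kernel_matches(filters, "_Z10vector_addPfS_S_i",
                     "vector_add(float*, float*, float*, int)")
miss = kernel_matches(["gemm"], "_Z6reducePfi", "reduce(float*, int)")
```

Because matching is by substring, a short filter such as `add` also selects any kernel whose name merely contains it, so longer or mangled substrings give tighter control.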
