FAQ
FindHao edited this page Aug 31, 2025 · 5 revisions
No. CUTracer attaches via CUDA_INJECTION64_PATH.
Prefer proton_instr_histogram (which auto-enables opcode_only), filter kernels, and narrow the instruction interval.
Yes. Use CUTRACER_INSTRUMENT=reg_trace,mem_trace. Expect higher overhead and larger outputs.
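A minimal sketch of how the two variables mentioned so far could be combined when launching a traced process from Python. The helper name `cutracer_env` and the library path `/path/to/cutracer.so` are hypothetical placeholders, not part of CUTracer; only `CUDA_INJECTION64_PATH`, `CUTRACER_INSTRUMENT`, and the mode names `reg_trace` and `mem_trace` come from this page.

```python
import os


def cutracer_env(injection_lib, modes, base_env=None):
    """Return a copy of the environment with the attach and instrument variables set.

    injection_lib: path to the CUTracer shared library (placeholder in this sketch).
    modes: instrumentation modes, joined with commas as in the answer above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_INJECTION64_PATH"] = injection_lib
    env["CUTRACER_INSTRUMENT"] = ",".join(modes)
    return env


# Hypothetical path; substitute the location of your own build.
env = cutracer_env("/path/to/cutracer.so", ["reg_trace", "mem_trace"], base_env={})
print(env["CUTRACER_INSTRUMENT"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` together with the unmodified command line of the application being traced.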
Your kernel may not execute clock instructions. Insert scopes (e.g., Triton pl.scope) or ensure clock reads occur.
Current working directory of the traced process.
Yes. Both instrumentation and data flushing handle graph capture paths. For captured graphs, flushing occurs at cuGraphLaunch exit, so ensure the stream is synchronized appropriately.
deadlock_detection auto-enables reg_trace to capture per-warp PCs and opcodes; use KERNEL_FILTERS and INSTR_BEGIN/INSTR_END to reduce the impact.
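The narrowing settings named above could be assembled as in the following sketch. It assumes that `KERNEL_FILTERS`, `INSTR_BEGIN`, and `INSTR_END` are environment variables spelled exactly as written on this page, that filters are comma-separated, and that the instruction bounds are non-negative integers; check the CUTracer documentation for the exact names and formats. The helper `narrowing_env` and the kernel name `matmul_kernel` are hypothetical.

```python
def narrowing_env(kernel_filters, instr_begin, instr_end):
    """Build the variables that restrict tracing to some kernels and an instruction range.

    Assumption: filters are comma-separated and the bounds are integer indices.
    """
    if instr_begin < 0 or instr_end < instr_begin:
        raise ValueError("expected 0 <= instr_begin <= instr_end")
    return {
        "KERNEL_FILTERS": ",".join(kernel_filters),
        "INSTR_BEGIN": str(instr_begin),
        "INSTR_END": str(instr_end),
    }


# Hypothetical kernel name and range, for illustration only.
settings = narrowing_env(["matmul_kernel"], 0, 200)
for name, value in settings.items():
    print(f"{name}={value}")
```

Validating the range before launch avoids a long traced run that records nothing because the interval was empty.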