[Issue]: SMI event entry abruptly null terminated #190

@briankoco

Problem Description

The kernel driver reports SVM/SMI events through a kfifo that is exported for userspace profiler consumption. Each SMI event is formatted as a newline-terminated string.

KFD_SMI_EVENT_QUEUE_RESTORE events are currently formatted incorrectly: a NUL character is written into the event string before the trailing newline. This terminates the string prematurely and breaks userspace parsing that expects newline-delimited events.
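To make the failure mode concrete, here is a minimal userspace-side sketch. It is not the driver or ROCm runtime code. The buffer contents and both helper names (c_string_view, split_events) are hypothetical and only imitate a read buffer in which one event carries a NUL byte in front of its newline.

```python
def c_string_view(raw: bytes) -> bytes:
    """Return what a consumer sees if it treats the buffer as a C string.

    Everything from the first NUL byte onward is invisible to such a consumer.
    """
    return raw.split(b"\x00", 1)[0]


def split_events(raw: bytes) -> list[str]:
    """Split a raw buffer into newline-delimited events, tolerating stray NULs.

    This is a defensive workaround sketch, not a substitute for fixing the
    formatting in the driver.
    """
    events = []
    for line in raw.split(b"\n"):
        line = line.replace(b"\x00", b"").strip()
        if line:
            events.append(line.decode("ascii", errors="replace"))
    return events


# Made-up buffer: the middle event has a NUL byte before its newline.
raw = b"1 start\n2 restore \x00\n3 end\n"

truncated = c_string_view(raw)    # ends at the NUL, so the newline is lost
recovered = split_events(raw)     # all three events survive
```

A consumer that handles the buffer as a C string stops at the NUL, so the affected event loses its newline and anything after it in the same buffer. Splitting on the newline byte and discarding NULs recovers every event in this made-up input.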

Operating System

Ubuntu 24.04.2 LTS (Noble Numbat)

CPU

Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz

GPU

AMD Radeon Graphics gfx906

ROCm Version

6.4.0

ROCm Component

ROCK-Kernel-Driver

Steps to Reproduce

  1. Install ROCm with an affected kernel driver version
  2. Set HSA_SVM_PROFILE=svm.txt
  3. Set HSA_XNACK=0
  4. Run a HIP application that experiences a queue eviction. One example is the hmmstress workload
  5. Observe that the SMI profile dumped to svm.txt has improperly formatted events, owing to the queue restore events which do not contain newlines.
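The final step can be partly automated. The sketch below scans the dumped profile for NUL bytes. It assumes the NUL bytes actually reach svm.txt. If the runtime writes each event as a C string, the symptom would instead be events merged onto one line, which this check would not detect. The helper names are hypothetical.

```python
from pathlib import Path


def find_nul_offsets(data: bytes) -> list[int]:
    """Return the byte offsets of every NUL character in the data."""
    return [i for i, byte in enumerate(data) if byte == 0]


def check_profile(path: str) -> list[int]:
    """Read a dumped profile (e.g. svm.txt) and report NUL byte offsets."""
    return find_nul_offsets(Path(path).read_bytes())
```

An empty result from check_profile("svm.txt") means no NUL bytes were found. Because of the assumption above, it does not by itself show that the events are well formed.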

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
