@christopherbate
Collaborator

  • [mlir-tensorrt] NFC: fix build when NCCL is ON but MPI is OFF
    GitOrigin-RevId: 8c365f7839bc39903c21a6fffe370bcff0f3f074

  • [mlir-tensorrt][cmake] Refactor compilation options into dedicated CMake module, add extra checks mode
    This change consolidates handling of global compilation options (that are
    specific to MTRT and not handled as part of HandleLLVMOptions) into
    a dedicated CMake module. It adds additional CMake cache options for
    enabling '-Werror' globally and for adding additional warnings and
    runtime checks recommended by
    https://best.openssf.org/Compiler-Hardening-Guides/Compiler-Options-Hardening-Guide-for-C-and-C++.html.

    By default, MLIR_TRT_ENABLE_EXTRA_CHECKS and MLIR_TRT_ENABLE_WERROR are
    set to OFF, but the idea is to enable them in CI in the future.

    In addition, fix some lingering warnings and build issues discovered
    when testing with the extra checks mode and Werror enabled:

    • Suppress deprecated declaration warnings related to TensorRT.
    • Fix some warnings when NCCL is enabled.
    • Fix test flags when ORTools is OFF but Shardy is ON.

    GitOrigin-RevId: 39f5e91697a20a65da2a5374b90366aba005f889

  • [mlir-tensorrt] Fix const fold execution aliasing and multi-device runtime issues
    Fixes two issues that manifested as non-fatal error logs related to
    CUDA runtime function call failures during compilation or execution
    of a test program. Such errors are logged when, e.g., there is no
    path to propagate the error outwards (such as when calling cudaFree
    in a destructor).

    Despite the similar symptoms, there were two different root causes:

    1. In the execute-constant-foldable-subgraphs pass, a function
      returning multiple results could return two values that alias each
      other, which violates the compiler-runtime ABI contract. We fixed
      this for ABI v1 in ee26682 but forgot to update the actual pipeline
      in the "execute-constant-foldable-subgraphs" pass. This is now
      fixed. In addition, we now give each outlined sub-module a unique
      name so that IR printing doesn't clobber debug artifacts for
      different constant-foldable modules.

    2. The second issue is that in multi-device setups, we need to ensure
      that CUDA events created as part of the pinned memory pool allocator
      are associated with the correct device. This change simplifies the
      existing infrastructure by making the pinned memory allocator part
      of the runtime client only, not the runtime session. The allocator
      now has a CUDA event pool for each device.

    GitOrigin-RevId: 2c5b5e7026d04150ebae2f40e25faf42d2af70df

  • [mlir-tensorrt] NFC: bump version to 0.4.3
    GitOrigin-RevId: 9eb762aaf5edcc4e24220a67cc96c25b94c36173
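
To illustrate the first root cause above (two results of one function aliasing each other), here is a minimal sketch of an overlap check over returned buffers. It is not the project's actual code: `BufferView`, `overlaps`, and `anyResultsAlias` are hypothetical names, and modeling a result as a base address plus a byte size is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a buffer returned as a function result.
struct BufferView {
  std::uintptr_t base;   // start address of the buffer
  std::size_t sizeBytes; // extent of the buffer in bytes
};

// Returns true if the half-open byte ranges [base, base + sizeBytes)
// of the two buffers intersect. Empty buffers never overlap anything.
inline bool overlaps(const BufferView &a, const BufferView &b) {
  if (a.sizeBytes == 0 || b.sizeBytes == 0)
    return false;
  return a.base < b.base + b.sizeBytes && b.base < a.base + a.sizeBytes;
}

// Returns true if any pair of results aliases. This is the condition
// that the commit message describes as an ABI contract violation.
inline bool anyResultsAlias(const std::vector<BufferView> &results) {
  for (std::size_t i = 0; i < results.size(); ++i)
    for (std::size_t j = i + 1; j < results.size(); ++j)
      if (overlaps(results[i], results[j]))
        return true;
  return false;
}
```

A check of this kind could run as a debug assertion on the results of a folded function before they are handed back to the caller. The pairwise scan is quadratic, which is usually acceptable for the small number of results a function returns.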

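The second root cause above (events needing to belong to the correct device) can be sketched as a pool that keeps a separate free list per device. This is a simplified model under stated assumptions, not the runtime's implementation: `Event` and `PerDeviceEventPool` are hypothetical names, and a plain struct stands in for a real CUDA event so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a CUDA event. It records the device the
// event was created on so the pool can return it to the right list.
struct Event {
  int device;
  std::uint64_t id;
};

// Keeps one free list of events per device, so an event is only ever
// reused on the device it was created for.
class PerDeviceEventPool {
public:
  // Returns an event bound to `device`, reusing a released one if any.
  Event acquire(int device) {
    std::vector<Event> &freeList = freeEvents_[device];
    if (!freeList.empty()) {
      Event e = freeList.back();
      freeList.pop_back();
      return e;
    }
    // Assumption: a real implementation would make `device` current
    // (e.g. cudaSetDevice) before creating the event
    // (e.g. cudaEventCreateWithFlags).
    return Event{device, nextId_++};
  }

  // Returns the event to the free list of its own device.
  void release(Event e) { freeEvents_[e.device].push_back(e); }

  // Number of events currently available for reuse on `device`.
  std::size_t numFree(int device) const {
    auto it = freeEvents_.find(device);
    return it == freeEvents_.end() ? 0 : it->second.size();
  }

private:
  std::unordered_map<int, std::vector<Event>> freeEvents_;
  std::uint64_t nextId_ = 0;
};
```

Keying the free lists by device means an event released on device 0 is never handed to a caller asking for device 1. The sketch is not thread-safe; a shared allocator would need a mutex or similar around `acquire` and `release`.
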
@copy-pr-bot

copy-pr-bot bot commented Dec 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@christopherbate christopherbate merged commit 6f43bf2 into main Dec 2, 2025
2 checks passed
@christopherbate christopherbate deleted the migrate-internal-changes branch December 2, 2025 14:50