Extracted the precompiled header part of #5747 to see what it would do in isolation.
Precompiled Header (PCH) Build Optimization for nvFuser
What It Does
Precompiled Headers (PCH) pre-parse frequently-included header files once and cache the result, eliminating redundant parsing across hundreds of source files.
This branch adds:
PCH for 10 key nvFuser headers (polymorphic_value.h, type_traits.h, ir/base_nodes.h, etc.)
Shared PCH across 20+ test targets (prevents redundant PCH compilation)
Build Time Results
| Compiler | Baseline | With PCH | Wall-clock Improvement |
| GCC      | 20m 51s  | 17m 6s   | 18% faster             |
| Clang    | 20m 43s  | 8m 48s   | 57% faster             |
CPU Time Results
| Compiler | Baseline | With PCH | CPU Time Reduction |
| GCC      | 231 min  | 185 min  | 20% less work      |
| Clang    | 232 min  | 97 min   | 58% less work      |
Key Takeaway
PCH is a low-risk, high-impact optimization that can be merged independently. Clang builds see the largest benefit (wall-clock time down 57%, from 20m 43s to 8m 48s), while GCC builds still gain a meaningful 18% (from 20m 51s to 17m 6s).
Precompile polymorphic_value.h to eliminate ~4000s of redundant header parsing. Enabled by default for Release builds. Disable with -DNVFUSER_USE_POLYMORPHIC_PCH=OFF.
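One way the "enabled by default for Release builds" behavior could be wired up is sketched below. This is an illustration, not the code from the PR: the helper variable NVFUSER_PCH_DEFAULT and the help string are assumptions, and the sketch assumes a single-config generator where CMAKE_BUILD_TYPE is known at configure time.

```cmake
# Hypothetical sketch: pick the default from the build type.
# NVFUSER_PCH_DEFAULT is an illustrative name, not taken from the PR.
if(CMAKE_BUILD_TYPE STREQUAL "Release")
  set(NVFUSER_PCH_DEFAULT ON)
else()
  set(NVFUSER_PCH_DEFAULT OFF)
endif()

# option() only supplies a default; a value passed on the command line
# (for example -DNVFUSER_USE_POLYMORPHIC_PCH=OFF) is already in the cache
# and is left untouched.
option(NVFUSER_USE_POLYMORPHIC_PCH
  "Precompile frequently included nvFuser headers"
  ${NVFUSER_PCH_DEFAULT})
```

With multi-config generators (Visual Studio, Ninja Multi-Config) CMAKE_BUILD_TYPE is not set at configure time, so a sketch like this would leave the option OFF unless it is passed explicitly.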
The [[maybe_unused]] attributes were added to member variables (mcast_handle_, cu_dev_, mc_base_ptr_, exporter_rank_, peer_fd_) but there's no explanation or validation that these changes don't affect functionality. These appear to be CUDA-related variables whose usage should be verified.
The PCH implementation relies on global property tracking for test targets and assumes consistent header availability. Need to verify that all 10 specified header files exist and are accessible in all build environments, and that the global property mechanism works correctly across different build configurations.
if(NVFUSER_USE_POLYMORPHIC_PCH)
  get_property(NVFUSER_TEST_PCH_TARGET GLOBAL PROPERTY NVFUSER_TEST_PCH_TARGET)
  if(NOT NVFUSER_TEST_PCH_TARGET)
    # First test target: create the PCH with top 10 nvFuser headers
    message(STATUS "Creating shared test PCH on target: ${TEST_NAME}")
    target_precompile_headers(${TEST_NAME} PRIVATE
      "${NVFUSER_SRCS_DIR}/polymorphic_value.h"
      "${NVFUSER_ROOT}/lib/dynamic_type/src/dynamic_type/type_traits.h"
      "${NVFUSER_SRCS_DIR}/ir/base_nodes.h"
      "${NVFUSER_SRCS_DIR}/scheduler/tools/abstract_tensor.h"
      "${NVFUSER_SRCS_DIR}/type.h"
      "${NVFUSER_SRCS_DIR}/ir/container.h"
      "${NVFUSER_SRCS_DIR}/serde/fusion_cache_generated.h"
      "${NVFUSER_SRCS_DIR}/iter_visitor.h"
      "${NVFUSER_SRCS_DIR}/ir/internal_nodes.h"
      "${NVFUSER_SRCS_DIR}/ir/interface_nodes.h"
    )
    set_property(GLOBAL PROPERTY NVFUSER_TEST_PCH_TARGET ${TEST_NAME})
  else()
    # Subsequent test targets: reuse existing PCH
    target_precompile_headers(${TEST_NAME} REUSE_FROM ${NVFUSER_TEST_PCH_TARGET})
  endif()
endif()
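The review concern about header availability could be addressed with a configure-time check along these lines. This is a minimal sketch, not part of the PR: the function name nvfuser_check_pch_headers and the list variable NVFUSER_PCH_HEADERS are hypothetical, and only two of the ten headers are listed for brevity.

```cmake
# Hypothetical helper (not in the PR): stop the configure step with a clear
# message if any header intended for the PCH is missing from the source tree.
function(nvfuser_check_pch_headers)
  foreach(header IN LISTS ARGN)
    if(NOT EXISTS "${header}")
      message(FATAL_ERROR "PCH header not found: ${header}")
    endif()
  endforeach()
endfunction()

# Illustrative subset of the header list used above.
set(NVFUSER_PCH_HEADERS
  "${NVFUSER_SRCS_DIR}/polymorphic_value.h"
  "${NVFUSER_SRCS_DIR}/type.h"
)
nvfuser_check_pch_headers(${NVFUSER_PCH_HEADERS})
```

Two caveats apply. A check like this only works for headers present at configure time. If serde/fusion_cache_generated.h is produced during the build, as its name suggests, it would need to be left out of the check. Separately, CMake's REUSE_FROM expects the reusing target to be compiled with the same options, flags, and definitions as the target that owns the PCH, so test targets with different compile settings may not be able to share it.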