@tianhaox
Contributor

@tianhaox tianhaox commented Nov 24, 2025

Overview:

  • Rename the database's sol mode to silicon mode.
  • Add empirical mode, where latency = sol / empirical factor. The empirical factor is currently a constant; it could become more sophisticated in the future, and each op should define its own empirical factors.
  • Add hybrid mode, which falls back to empirical mode when silicon data is missing.
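
The mode selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `DatabaseMode`, `estimate_latency`, and `silicon_table`, and the default factor value, are assumptions made for the example. Only the relation latency = sol / empirical factor and the hybrid fallback come from the description.

```python
from enum import Enum
from typing import Mapping, Optional


class DatabaseMode(Enum):
    """Illustrative mode names; the real enum in the repo may differ."""
    SILICON = "silicon"      # measured data only
    SOL = "sol"              # speed-of-light (theoretical) estimate
    EMPIRICAL = "empirical"  # sol scaled by an empirical factor
    HYBRID = "hybrid"        # silicon if available, else empirical


def estimate_latency(
    op: str,
    mode: DatabaseMode,
    sol_latency: float,
    silicon_table: Mapping[str, float],
    empirical_factor: float = 0.8,  # assumed constant, not the PR's value
) -> float:
    """Return a latency estimate for `op` under the given mode."""
    measured: Optional[float] = silicon_table.get(op)
    empirical = sol_latency / empirical_factor

    if mode is DatabaseMode.SOL:
        return sol_latency
    if mode is DatabaseMode.EMPIRICAL:
        return empirical
    if mode is DatabaseMode.SILICON:
        if measured is None:
            # Silicon mode has no fallback when data is missing.
            raise KeyError(f"no silicon data for op {op!r}")
        return measured
    # HYBRID: prefer measured data, fall back to the empirical estimate.
    return measured if measured is not None else empirical
```

With this shape, a per-op factor would only require replacing the single `empirical_factor` argument with a lookup keyed by op.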

So in the future, when you add a new op, you can support running a configuration without actual data by using sol/empirical/hybrid mode. Hybrid mode should give you the best guess of the overall model performance.

[Image] This is the agg mode of running qwen3 32b on h200. Compared with silicon mode, pure empirical mode shows a much smaller diff than sol mode does. If part of the model is missing silicon data, hybrid mode can still give a very close result, unless the missing part accounts for the major portion of the total.
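
The reasoning about the missing portion can be made concrete with a small sketch. It assumes the end-to-end latency is a plain sum of per-op latencies, which is a simplification for illustration; the function name and parameters are hypothetical and not part of this PR.

```python
def hybrid_relative_error(missing_share: float, empirical_rel_error: float) -> float:
    """Relative error of a hybrid total versus an all-silicon total.

    missing_share: fraction of the true end-to-end latency spent in ops
        that lack silicon data (and so use the empirical estimate).
    empirical_rel_error: relative error of the empirical estimate on
        those ops, taken together.

    Ops with silicon data contribute no error, so under the additive
    assumption the error of the total is just the product of the two.
    """
    if not 0.0 <= missing_share <= 1.0:
        raise ValueError("missing_share must be in [0, 1]")
    return missing_share * empirical_rel_error
```

With made-up numbers: if the ops missing silicon data cover 10% of the runtime and the empirical estimate is off by 20% on them, the hybrid total is off by about 2%. The error only approaches that of pure empirical mode as the missing share approaches 100%.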

@copy-pr-bot

copy-pr-bot bot commented Nov 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the feat label Nov 24, 2025
@Arsene12358
Contributor

Also, it seems we only enable this via webapp, what about users using the CLI?

@tianhaox
Contributor Author

Get the hybrid time, hybrid math and hybrid mem

This should be enabled later, once we do some refactoring of the config.

Signed-off-by: Tianhao Xu <[email protected]>
Contributor

@davilu-nvidia davilu-nvidia left a comment

I'll approve first to unblock merging.

Signed-off-by: Tianhao Xu <[email protected]>
@tianhaox
Contributor Author

tianhaox commented Dec 3, 2025

Also, it seems we only enable this via webapp, what about users using the CLI?

Added CLI support:
aiconfigurator cli default --model QWEN3_480B --total_gpus 32 --system b200_sxm --database_mode HYBRID
This command should work in hybrid mode, whereas silicon mode cannot run it.

@tianhaox tianhaox merged commit 8c399e5 into ai-dynamo:main Dec 3, 2025
6 checks passed
3 participants