feat: GPU Telemetry Realtime Dashboard Display #370
Merged
Changes from 14 commits

Commits (21):
- 6a82ab9 feat: add realtime GPU telemetry dashboard (ilana-n)
- dbede5b fix: add more test coverage and remove redundant command acknowledge … (ilana-n)
- baa3818 fix: use dcgm faker for gpu telemetry unit tests (ilana-n)
- eb0d18b fix: address feedback (ilana-n)
- 6df2996 fix: unit test and optimization for checking past values (ilana-n)
- 403dc83 fix: unecessary logging causing errors (ilana-n)
- afa41e1 fix: update docs (ilana-n)
- 9f73494 fix: minor doc fix (ilana-n)
- 7a67a4c feat: add realtime GPU telemetry dashboard (ilana-n)
- 3bf876d fix: address feedback (ilana-n)
- 9214e3a fix: update docs (ilana-n)
- fa34211 fix: minor doc fix (ilana-n)
- da2983d fix: remove manual command acknowledgement (ilana-n)
- 7929399 fix: telemetry manager unit tests (ilana-n)
- 3351090 fix: increase test coverage? (ilana-n)
- d81ca24 fix: extra code outside loop, unused var, and extraneous comments (ilana-n)
- 168ab27 fix: address comments (ilana-n)
- fad5c82 Merge branch 'main' into ilana/gpu-telemetry-dashboard (ilana-n)
- 875383e fix: default dict lambda in records manager (ilana-n)
- f3a7b19 fix unit tests (ilana-n)
- 71cbc6d fix: unit test (ilana-n)
```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

from aiperf.common.enums.base_enums import CaseInsensitiveStrEnum


class GPUTelemetryMode(CaseInsensitiveStrEnum):
    """GPU telemetry display mode."""

    SUMMARY = "summary"
    REALTIME_DASHBOARD = "realtime_dashboard"
```
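For readers unfamiliar with the pattern, here is a minimal sketch of how a case-insensitive string enum like the one above might behave. The implementation of `CaseInsensitiveStrEnum` is not shown in this diff, so `CaseInsensitiveStrEnumSketch`, `GPUTelemetryModeSketch`, and `parse_mode` are hypothetical stand-ins built on the standard library's `Enum._missing_` hook, not aiperf's actual code.

```python
from enum import Enum


class CaseInsensitiveStrEnumSketch(str, Enum):
    """Hypothetical stand-in for aiperf's CaseInsensitiveStrEnum (not its real code)."""

    @classmethod
    def _missing_(cls, value):
        # Enum calls this when an exact value lookup fails; retry ignoring case.
        if isinstance(value, str):
            for member in cls:
                if member.value.lower() == value.lower():
                    return member
        # Returning None makes Enum raise ValueError for unknown values.
        return None


class GPUTelemetryModeSketch(CaseInsensitiveStrEnumSketch):
    """Mirrors the two display modes added in the diff."""

    SUMMARY = "summary"
    REALTIME_DASHBOARD = "realtime_dashboard"


def parse_mode(raw: str) -> GPUTelemetryModeSketch:
    """Convert user input (any casing) into a display mode."""
    return GPUTelemetryModeSketch(raw)
```

Under these assumptions, `parse_mode("Realtime_Dashboard")` resolves to the `REALTIME_DASHBOARD` member, while an unknown string raises `ValueError`.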
40 changes: 40 additions & 0 deletions
src/aiperf/common/mixins/realtime_telemetry_metrics_mixin.py
```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
import asyncio

from aiperf.common.config import ServiceConfig
from aiperf.common.enums import MessageType
from aiperf.common.hooks import AIPerfHook, on_message, provides_hooks
from aiperf.common.messages import RealtimeTelemetryMetricsMessage
from aiperf.common.mixins.message_bus_mixin import MessageBusClientMixin
from aiperf.common.models import MetricResult
from aiperf.controller.system_controller import SystemController


@provides_hooks(AIPerfHook.ON_REALTIME_TELEMETRY_METRICS)
class RealtimeTelemetryMetricsMixin(MessageBusClientMixin):
    """A mixin that provides a hook for real-time GPU telemetry metrics."""

    def __init__(
        self, service_config: ServiceConfig, controller: SystemController, **kwargs
    ):
        super().__init__(service_config=service_config, controller=controller, **kwargs)
        self._controller = controller
        self._telemetry_metrics: list[MetricResult] = []
        self._telemetry_metrics_lock = asyncio.Lock()

    @on_message(MessageType.REALTIME_TELEMETRY_METRICS)
    async def _on_realtime_telemetry_metrics(
        self, message: RealtimeTelemetryMetricsMessage
    ):
        """Update the telemetry metrics from a real-time telemetry metrics message."""
        self.debug(
            f"Mixin received telemetry message with {len(message.metrics)} metrics, triggering hook"
        )

        async with self._telemetry_metrics_lock:
            self._telemetry_metrics = message.metrics
        await self.run_hooks(
            AIPerfHook.ON_REALTIME_TELEMETRY_METRICS,
            metrics=message.metrics,
        )
```