[Data] Additional metrics for tracking block times #46254

Open
scottjlee opened this issue Jun 25, 2024 · 0 comments
Labels
data Ray Data-related issues enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks

Comments

@scottjlee
Contributor

Description

To gain more insight into internal Ray Data operations, we need metrics with finer granularity and a histogram-like view. Currently, all Ray Data metrics are defined as Gauges so that the Ray Dashboard can display the last known value.

Other useful metrics include the time taken to read a block, deserialize a block, move blocks between workers, and generate batches.
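
As a rough illustration of how such per-stage timings could be collected before being exported, here is a minimal stdlib-only sketch. The `timed` helper, the `stage_durations_s` dictionary, and the stage names are hypothetical and are not part of Ray Data.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical container: stage name -> list of observed durations in seconds.
stage_durations_s = defaultdict(list)


@contextmanager
def timed(stage):
    """Record the wall-clock duration of one stage, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_durations_s[stage].append(time.perf_counter() - start)


# Stand-ins for the real work; the stage names are assumptions.
with timed("read_block"):
    data = bytes(1024)
with timed("deserialize_block"):
    rows = list(data)
```

Keeping every observation per stage, rather than a running total, is what makes a histogram view possible later.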

  • General overview of the stats / metrics code path:
    • op_runtime_metrics.py defines the metrics that are observed during dataset execution.
    • stats.py contains the classes that collect and summarize stats from the runtime metrics.
    • _StatsActor._create_prometheus_metrics_for_execution_metrics() converts the raw metrics from op_runtime_metrics.py into Prometheus Gauge metrics.
    • Currently, a major restriction is that all of our metrics are defined as Gauge so that the Ray Dashboard can display the last known value.
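
To make the Gauge restriction concrete, here is a minimal stdlib-only sketch of the difference between the two metric types. `LastValueGauge` and `BucketHistogram` are hypothetical stand-ins, not Ray classes, and the bucket boundaries are made up. In Ray itself, `ray.util.metrics` provides `Gauge` and `Histogram` (the latter takes `boundaries` and exposes `observe()`); whether the new metric would be wired up through that API is an assumption.

```python
import bisect


class LastValueGauge:
    """Keeps only the most recent value, like a Gauge."""

    def __init__(self):
        self.value = None

    def set(self, value):
        self.value = value


class BucketHistogram:
    """Counts observations per bucket; boundaries are inclusive upper bounds."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        # One extra slot at the end for values above the last boundary.
        self.counts = [0] * (len(self.boundaries) + 1)

    def observe(self, value):
        # bisect_left returns the index of the first boundary >= value.
        self.counts[bisect.bisect_left(self.boundaries, value)] += 1


gauge = LastValueGauge()
hist = BucketHistogram([0.1, 0.5, 1.0])  # seconds; illustrative boundaries
for seconds in [0.05, 0.3, 0.3, 2.0]:
    gauge.set(seconds)
    hist.observe(seconds)

print(gauge.value)  # 2.0
print(hist.counts)  # [1, 2, 0, 1]
```

The gauge retains only the final 2.0 s sample, while the histogram still shows that most observations fell at or below 0.5 s.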

As an example, let's consider the block generation time, defined as the time taken to generate blocks per task.

  • For block generation time, we have an existing metric, block_generation_time, in op_runtime_metrics.py, which is incremented in on_task_output_generated().
  • We can reuse this existing metric and define a new Histogram metric that records the distribution of observed values (the number of observations per bucket) rather than only the last value.
  • I don't think the Ray Dashboard currently has a way to display an entire histogram, but it should be able to show single-value metrics derived from one, such as avg and p90, if you also want to add a chart.
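
For reference, single-value summaries such as p90 can be estimated from bucket counts alone. The sketch below uses linear interpolation within a bucket, similar in spirit to Prometheus's `histogram_quantile`; the function names, boundaries, and counts are made up for illustration, and the lowest bucket is assumed to start at 0.

```python
def histogram_mean(total, n):
    """Average from the running sum and count of observations."""
    return total / n if n else 0.0


def histogram_quantile(q, boundaries, counts):
    """Estimate the q-quantile from per-bucket counts.

    boundaries are inclusive upper bounds; counts has one extra trailing
    entry for observations above the last boundary.
    """
    n = sum(counts)
    if n == 0:
        return None
    rank = q * n
    cumulative = 0
    for i, count in enumerate(counts):
        if count and cumulative + count >= rank:
            if i == len(boundaries):
                # Overflow bucket has no upper bound; report the last boundary.
                return boundaries[-1]
            lower = boundaries[i - 1] if i > 0 else 0.0
            upper = boundaries[i]
            # Interpolate linearly inside the bucket that contains the rank.
            return lower + (upper - lower) * (rank - cumulative) / count
        cumulative += count
    return boundaries[-1]


boundaries = [0.1, 0.5, 1.0]  # seconds; illustrative
counts = [2, 6, 2, 0]         # ten observations, none above 1.0 s
p90 = histogram_quantile(0.9, boundaries, counts)
avg = histogram_mean(4.2, 10)  # 4.2 s is a made-up running sum
print(p90)  # 0.75
```

The estimate is only as precise as the bucket boundaries, so boundaries would need to be chosen to match the expected range of block generation times.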

Use case

No response

@scottjlee scottjlee added enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks data Ray Data-related issues labels Jun 25, 2024