
[Dev UI] Available datasets should be scoped to the target being tested #1554

Open
MichaelDoyle opened this issue Dec 19, 2024 · 4 comments
Labels
devui evals related to eval

Comments

@MichaelDoyle
Member

Overview

Currently, it is possible to create a "model" dataset and then try to evaluate a "flow" with it. These are fundamentally incompatible and will cause an error.

User goal(s)

Help users choose relevant datasets.

Requirements

  1. At minimum, available datasets should be scoped to the same type as the target under test.
  2. Within the same type (particularly for flows), we should also consider checking schema compatibility.
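Requirement 1 amounts to filtering the dataset list by the target's type. A minimal sketch in TypeScript, assuming hypothetical `Dataset` and `targetType` shapes (the real genkit-tools metadata may differ):

```typescript
// Hypothetical types: the actual genkit-tools dataset metadata may differ.
type TargetType = "model" | "flow";

interface Dataset {
  id: string;
  targetType: TargetType;
}

// Return only datasets whose type matches the target under test,
// so incompatible datasets never appear in the dropdown.
function datasetsForTarget(datasets: Dataset[], target: TargetType): Dataset[] {
  return datasets.filter((d) => d.targetType === target);
}

const all: Dataset[] = [
  { id: "model-qa", targetType: "model" },
  { id: "flow-orders", targetType: "flow" },
];

// Logs only the ids of flow-typed datasets.
console.log(datasetsForTarget(all, "flow").map((d) => d.id));
```

Requirement 2 (schema compatibility within a type) would layer an additional check on top of this filter rather than replace it.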

Acceptance Criteria

  1. When running an evaluation, only relevant/compatible datasets are available.

Designs

Inlined screenshots

Notes

After choosing a flow, model datasets are still available in the dropdown (screenshot omitted):

Running the evaluation shows success, but it is really a failure (screenshots omitted):
@ssbushi
Contributor

ssbushi commented Jan 27, 2025

#1647 is now merged, so we can now support data validation in the Dev UI.

The genkit-tools package must be updated for this change. See https://github.com/FirebasePrivate/genkit-ui/tree/main?tab=readme-ov-file#shipping-the-changes for details.

Scope:

  • Data validation is FYI only; it should not block the user from running an evaluation.
  • This is scoped to the Run evaluations tab only (not the dataset viewer).

Mock: (screenshot omitted)

@ssbushi ssbushi assigned ssbushi and hritan and unassigned ssbushi Jan 27, 2025
@MichaelDoyle
Member Author

MichaelDoyle commented Jan 30, 2025

What happens when we have a lot of datasets? Will it be obvious which ones are suitable, or will the user need to click through each one?

Why non-blocking? Is it a matter of needing "loose" schema validation, or is there another requirement?

@hritan hritan moved this to In Progress in Genkit Backlog Feb 3, 2025
@ssbushi
Contributor

ssbushi commented Feb 3, 2025

It will not be obvious which ones are suitable. Validation needs to run on each (dataset, targetAction) pair and must cover every sample in the dataset to be reliable. It is not feasible (or worth the computation) to do all of that up front just to determine which datasets are suitable.

We can implement a mat-menu with a search filter to help users narrow down to a dataset (or action).

Why non-blocking:

It is primarily about loose schema validation. The motivation originally comes from dataset schema validation, where a copy of the schema is stored in the dataset metadata. Loose validation helps when the dataset schema drifts from the action schema.

In the run-evaluation component there is no technical need to support non-blocking validation, but it reduces user friction: users can proceed without being forced to go back and fix every error in their dataset.
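The non-blocking behavior described above could be sketched as validation that collects warnings instead of throwing. This is a hypothetical illustration (field names and the structural check are assumptions, not the actual Dev UI implementation):

```typescript
// Hypothetical sketch of "loose", non-blocking validation: mismatches are
// surfaced as warnings (FYI) and never prevent the evaluation from running.
interface Sample {
  input: unknown;
}

interface ValidationWarning {
  index: number;
  message: string;
}

// Minimal structural check: each sample's input should contain the keys
// the action's (assumed) input schema requires. Drifted or malformed
// samples produce warnings rather than errors.
function looseValidate(samples: Sample[], requiredKeys: string[]): ValidationWarning[] {
  const warnings: ValidationWarning[] = [];
  samples.forEach((sample, index) => {
    if (typeof sample.input !== "object" || sample.input === null) {
      warnings.push({ index, message: "input is not an object" });
      return;
    }
    const input = sample.input as Record<string, unknown>;
    for (const key of requiredKeys) {
      if (!(key in input)) {
        warnings.push({ index, message: `missing key "${key}"` });
      }
    }
  });
  return warnings; // caller displays these as FYI; the run proceeds regardless
}
```

The key design point is the return type: a list of warnings the UI can render next to the Run button, rather than an exception that blocks the flow.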

@ssbushi
Contributor

ssbushi commented Feb 14, 2025

@shrutip90 for decision on blocking behaviour

@shrutip90 shrutip90 assigned shrutip90 and unassigned shrutip90 and ssbushi Mar 11, 2025
@shrutip90 shrutip90 added the evals related to eval label Mar 11, 2025
Projects
Status: In Progress

4 participants