
[Dev UI] Available datasets should be scoped to the target being tested #1554

Open
MichaelDoyle opened this issue Dec 19, 2024 · 4 comments
Labels
devui evals related to eval

Comments

@MichaelDoyle
Member

Overview

Currently, it is possible to create a "model" dataset and then try to evaluate a "flow" with it. These are fundamentally incompatible and will cause an error.

User goal(s)

Help users choose relevant datasets.

Requirements

  1. At minimum, available datasets should be scoped to the same type as the target under test.
  2. Within the same type (particularly for flows), we should also consider checking schema compatibility.
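Requirement 1 amounts to filtering the dataset list by the target's type. A minimal sketch in TypeScript, assuming hypothetical `Dataset` and `targetType` shapes (the real genkit-tools metadata may differ):

```typescript
// Hypothetical types: the actual genkit-tools dataset metadata may differ.
type TargetType = "model" | "flow";

interface Dataset {
  id: string;
  targetType: TargetType;
}

// Return only datasets whose type matches the target under test,
// so incompatible datasets never appear in the dropdown.
function datasetsForTarget(datasets: Dataset[], target: TargetType): Dataset[] {
  return datasets.filter((d) => d.targetType === target);
}

const all: Dataset[] = [
  { id: "model-qa", targetType: "model" },
  { id: "flow-orders", targetType: "flow" },
];

// Logs only the ids of flow-typed datasets.
console.log(datasetsForTarget(all, "flow").map((d) => d.id));
```

Requirement 2 (schema compatibility within a type) would layer an additional check on top of this filter rather than replace it.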

Acceptance Criteria

  1. When running an evaluation, only relevant/compatible datasets are available.

Designs

Inlined screenshots

Notes

After choosing a flow, model datasets are still available in the dropdown (screenshot omitted):

Running the evaluation shows success, but it is really a failure (screenshots omitted):
@ssbushi
Contributor

ssbushi commented Jan 27, 2025

#1647 is now merged, so we can now support data validation in the Dev UI.

The genkit-tools package must be updated for this change. See https://github.com/FirebasePrivate/genkit-ui/tree/main?tab=readme-ov-file#shipping-the-changes for details.

Scope:

  • Data validation is FYI only; it should not block the user from running an evaluation.
  • This is scoped to the Run evaluations tab only (not the dataset viewer).

Mock: (screenshot omitted)

@ssbushi ssbushi assigned ssbushi and hritan and unassigned ssbushi Jan 27, 2025
@MichaelDoyle
Member Author

MichaelDoyle commented Jan 30, 2025

What happens when we have a lot of datasets? Will it be obvious which ones are suitable, or will the user need to click through each one?

Why non-blocking? Is it a matter of needing "loose" schema validation, or is there another requirement?

@hritan hritan moved this to In Progress in Genkit Backlog Feb 3, 2025
@ssbushi
Contributor

ssbushi commented Feb 3, 2025

It will not be obvious which ones are suitable. Validation needs to run on each (dataset, targetAction) pair and must cover every sample in the dataset to be reliable. It is not feasible (or worth the computation) to do all of that up front just to determine which datasets are suitable.

We can implement a mat-menu with a search filter to help users narrow down to a dataset (or action).

Why non-blocking:

It is primarily about loose schema validation. The motivation originally comes from dataset schema validation, where a copy of the schema is stored in the dataset metadata. Loose validation helps when the dataset schema drifts from the action schema.

In the run-evaluation component there is no technical need to support non-blocking validation, but it reduces user friction: users can proceed without being forced to go back and fix every error in their dataset.
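The non-blocking behavior described above could be sketched as validation that collects warnings instead of throwing. This is a hypothetical illustration (field names and the structural check are assumptions, not the actual Dev UI implementation):

```typescript
// Hypothetical sketch of "loose", non-blocking validation: mismatches are
// surfaced as warnings (FYI) and never prevent the evaluation from running.
interface Sample {
  input: unknown;
}

interface ValidationWarning {
  index: number;
  message: string;
}

// Minimal structural check: each sample's input should contain the keys
// the action's (assumed) input schema requires. Drifted or malformed
// samples produce warnings rather than errors.
function looseValidate(samples: Sample[], requiredKeys: string[]): ValidationWarning[] {
  const warnings: ValidationWarning[] = [];
  samples.forEach((sample, index) => {
    if (typeof sample.input !== "object" || sample.input === null) {
      warnings.push({ index, message: "input is not an object" });
      return;
    }
    const input = sample.input as Record<string, unknown>;
    for (const key of requiredKeys) {
      if (!(key in input)) {
        warnings.push({ index, message: `missing key "${key}"` });
      }
    }
  });
  return warnings; // caller displays these as FYI; the run proceeds regardless
}
```

The key design point is the return type: a list of warnings the UI can render next to the Run button, rather than an exception that blocks the flow.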

@ssbushi
Contributor

ssbushi commented Feb 14, 2025

@shrutip90 for decision on blocking behaviour

@shrutip90 shrutip90 assigned shrutip90 and unassigned shrutip90 and ssbushi Mar 11, 2025
@shrutip90 shrutip90 added the evals related to eval label Mar 11, 2025
Projects
Status: In Progress

4 participants