Custom Evaluation

Expand GenAI evals to support more robust evaluation types.

Including:

- LLM as a judge (LLMJudgeTask)
- Structured output validation (FieldValidators)
- Matching validators (validating specific output types, e.g. string)
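
To make the three evaluation types above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `FieldCheck`, `MatchCheck`, and `JudgeCheck` are hypothetical stand-ins, not the proposed `FieldValidators` or `LLMJudgeTask` APIs, and the judge is a stub callable rather than a real LLM call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass
class FieldCheck:
    """Hypothetical: one field of a structured output must exist with a given type."""
    field: str
    expected_type: type

    def validate(self, output: Mapping[str, Any]) -> bool:
        return self.field in output and isinstance(output[self.field], self.expected_type)


@dataclass
class MatchCheck:
    """Hypothetical: a raw output must have a given type and, optionally, a given value."""
    expected_type: type
    expected_value: Any = None

    def validate(self, output: Any) -> bool:
        if not isinstance(output, self.expected_type):
            return False
        return self.expected_value is None or output == self.expected_value


@dataclass
class JudgeCheck:
    """Hypothetical LLM-as-a-judge check; `judge` maps a rendered prompt to a score."""
    prompt: str                      # template containing an {output} placeholder
    judge: Callable[[str], float]    # assumed to return a score in [0, 1]
    threshold: float = 0.5

    def validate(self, output: Any) -> bool:
        return self.judge(self.prompt.format(output=output)) >= self.threshold


# Stub judge for illustration only; a real one would call a model.
def stub_judge(rendered_prompt: str) -> float:
    return 0.9 if "thanks" in rendered_prompt else 0.1
```

All three share a `validate(output) -> bool` shape, which is one possible way to let them be mixed freely within a single evaluation.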

Functionality:

- Allow users to specify multiple evaluations for a single `event`
- Global test suite validations along with individualized assertions
- GenAI service profile type that allows users to validate multiple GenAI profiles at once
   - Some GenAI applications have more than one prompt. A user may want to build a test suite for each individual task as well as a global service evaluation task.
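
The functionality listed above could look roughly like the following sketch: several checks per task, plus service-wide checks that see the whole event. `TaskSuite`, `ServiceSuite`, the `__global__` result key, and the event layout (a dict keyed by task name) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TaskSuite:
    """Hypothetical: the individual assertions for one task (one prompt)."""
    name: str
    checks: List[Callable[[Any], bool]] = field(default_factory=list)

    def evaluate(self, output: Any) -> bool:
        # Every check registered for this task must pass.
        return all(check(output) for check in self.checks)


@dataclass
class ServiceSuite:
    """Hypothetical: groups per-task suites with global checks over the whole event."""
    tasks: Dict[str, TaskSuite]
    global_checks: List[Callable[[Dict[str, Any]], bool]] = field(default_factory=list)

    def evaluate(self, event: Dict[str, Any]) -> Dict[str, bool]:
        # Run each task suite on its own output, skipping tasks absent from the event.
        results = {
            name: suite.evaluate(event[name])
            for name, suite in self.tasks.items()
            if name in event
        }
        # Global checks receive the full event rather than a single task output.
        results["__global__"] = all(check(event) for check in self.global_checks)
        return results


service = ServiceSuite(
    tasks={
        "summarize": TaskSuite("summarize", [lambda o: isinstance(o, str), lambda o: len(o) < 200]),
        "classify": TaskSuite("classify", [lambda o: o in {"spam", "ham"}]),
    },
    global_checks=[lambda e: set(e) == {"summarize", "classify"}],
)

result = service.evaluate({"summarize": "Short summary.", "classify": "spam"})
```

Returning one result per task plus a separate global entry keeps a failure in a single prompt distinguishable from a failure of the service as a whole.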


Custom Evaluation #148