Add 2 new scorers Exact Match and Contains #275

cantemizyurek · 2025-10-26T17:30:09Z

No description provided.

- Remove unused DB_LOCATION import from test-utils.ts
- Replace FILES_LOCATION import with local constant in files.test.ts

Co-authored-by: Matt Pocock <[email protected]>

- Add dotenv as a dependency
- Create env-setup-file module that imports dotenv/config
- Export env-setup-file as 'evalite/env-setup-file'
- Automatically prepend env-setup-file to setupFiles array
- Update documentation to reflect automatic .env loading
- Update example config to remove manual dotenv setup

Fixes mattpocock#234

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Matt Pocock <[email protected]>

… precedence

- Add loadVitestSetupFiles() to load setupFiles from vitest.config.ts
- Merge setupFiles from both configs with evalite.config.ts taking precedence
- Add tests for vitest.config.ts setupFiles support and precedence
- setupFiles execution order: env-setup-file -> vitest -> evalite

Co-authored-by: Matt Pocock <[email protected]>
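The execution order described in this commit can be pictured with a minimal sketch. This is not the code from the PR: `mergeSetupFiles` and the `ENV_SETUP_FILE` constant are hypothetical names, and the sketch assumes that "taking precedence" simply means the evalite.config.ts files run last, so their side effects override earlier ones.

```typescript
// Hypothetical helper, for illustration only; the PR's actual merge logic may differ.
const ENV_SETUP_FILE = "evalite/env-setup-file";

// Builds the setupFiles array in the order stated in the commit message:
// env-setup-file -> vitest.config.ts entries -> evalite.config.ts entries.
function mergeSetupFiles(
  vitestSetupFiles: string[],
  evaliteSetupFiles: string[]
): string[] {
  return [ENV_SETUP_FILE, ...vitestSetupFiles, ...evaliteSetupFiles];
}
```

Under this assumption, the env-setup-file entry always runs first, so `.env` values are loaded before any user-supplied setup file executes.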

- Export new `evalite/scorers` module with factory functions
- Add `createLLMBasedScorer` for model-dependent scorers
- Add `createEmbeddingBasedScorer` for embedding-dependent scorers
- Introduce `EvaluationSample` type with query, contexts, and reference fields

Part of mattpocock#250

- Added a new `faithfulness` scorer to evaluate model responses against retrieved contexts.
- Introduced utility functions for scoring and context handling.
- Updated `package.json` to include `zod` version 4.1.12 as a dependency.
- Updated `pnpm-lock.yaml` to reflect changes in dependencies and versions.

Part of mattpocock#250

Add Faithfulness

- Introduced a new `answerSimilarity` scorer to assess the semantic similarity between a ground truth answer and a generated answer.
- The scorer utilizes embedding models to compute cosine similarity and includes an optional threshold for binary output.
- Updated the `scorers` module to export the new `answerSimilarity` scorer.

Part of mattpocock#250
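For readers unfamiliar with the metric, here is a minimal sketch of cosine similarity over two embedding vectors. It only illustrates the math; `cosineSimilarity` is a hypothetical helper, not the PR's implementation, and producing the embeddings from a model is out of scope here.

```typescript
// Hypothetical helper, for illustration only; not the scorer's actual code.
// Computes dot(a, b) / (|a| * |b|) for two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Assumption: treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The result ranges from -1 (opposite direction) to 1 (same direction); a scorer would apply this to the embeddings of the ground truth answer and the generated answer.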

Add Answer Similarity Scorer

- Introduced a new `contextRecall` scorer to evaluate how much of a generated answer can be attributed to retrieved contexts.
- Updated the `scorers` module to export the new `contextRecall` scorer.

Part of mattpocock#250

Add Scorers module

- Remove `createBaseScorer`, consolidate to `createLLMScorer`/`createEmbeddingScorer`
- Add generic `TExpected` type for type-safe expected data
- Replace `singleTurn`/`multiTurn` with single `scorer` function
- Rename utils to `isSingleTurnInput`/`isMultiTurnInput`
- Update all scorers (faithfulness, answerSimilarity, contextRecall) to new API
- Fix example.eval.ts: textStream -> text

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Move inline expected data types from answer-similarity, context-recall, and faithfulness scorers to the Evalite.Scorers namespace in types.ts for better type organization and discoverability. Co-authored-by: Matt Pocock <[email protected]>

- Renamed input types in Evalite.Scorers namespace to reflect output handling: SingleTurnInput to SingleTurnOutput, MultiTurnInput to MultiTurnOutput, and updated related types accordingly.
- Modified scorer implementations in context-recall and faithfulness to use new output types.
- Updated utility functions to check for output types instead of input types, enhancing clarity and consistency in the scoring logic.

Related mattpocock#250

- Introduced two new scorers: `exactMatch` checks for exact string matches, while `contains` verifies if the output includes the expected substring.
- Updated the Evalite types to include `ExactMatchExpected` and `ContainsExpected` for better type safety.
- Exported new scorers from the scorers index for accessibility.

Related mattpocock#250
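The behavior described for the two scorers can be sketched as plain functions. This is a sketch under stated assumptions, not the PR's code: the `StringScorerInput` shape and the function names `exactMatchScore` and `containsScore` are hypothetical, and the sketch assumes case-sensitive comparison and a 0/1 score, which the commit message does not confirm.

```typescript
// Hypothetical input shape, for illustration only; Evalite's real scorer
// signatures (and the ExactMatchExpected / ContainsExpected types) may differ.
type StringScorerInput = { output: string; expected: string };

// Returns 1 when the output is identical to the expected string, otherwise 0.
// Assumption: the comparison is case-sensitive and does not trim whitespace.
function exactMatchScore({ output, expected }: StringScorerInput): number {
  return output === expected ? 1 : 0;
}

// Returns 1 when the output includes the expected string as a substring, otherwise 0.
function containsScore({ output, expected }: StringScorerInput): number {
  return output.includes(expected) ? 1 : 0;
}
```

With these definitions, an output of `"The capital is Paris."` scored against an expected value of `"Paris"` fails the exact-match check but passes the contains check, which is the practical difference between the two scorers.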

- Created a new file `string-scorers.eval.ts` to demonstrate the usage of `exactMatch` and `contains` scorers.

changeset-bot · 2025-10-26T17:30:13Z

⚠️ No Changeset found

Latest commit: 0a30d93

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

vercel · 2025-10-26T17:30:14Z

@cantemizyurek is attempting to deploy a commit to the Skill Recordings Team on Vercel.

A member of the Team first needs to authorize it.

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

mattpocock and others added 30 commits October 19, 2025 12:44

Changed default storage to in-memory. SQLite still available via config.

da895ea

Remove problematic backend-only-constants imports

2efa48e

- Remove unused DB_LOCATION import from test-utils.ts
- Replace FILES_LOCATION import with local constant in files.test.ts

Co-authored-by: Matt Pocock <[email protected]>

Fixed CI properly

59677dd

Merge branch 'main' of https://github.com/mattpocock/evalite into v1

9514115

Huge move from evals -> suites, and results -> evals

48a5581

Added changeset

26073ea

Removed streaming text support from tasks.

54c9ccb

Fixes after cherrypick

4f8ec7a

Formatting

c624aca

Docs updates

d135997

Docs updates

15f3dc2

Merge pull request #1 from cantemizyurek/faithfulness

b48f5a7

Add Faithfulness

feat: Add evaluation script for Answer Similarity

b6bac1c

Merge pull request #2 from cantemizyurek/answer-similarity

eafa00c

Add Answer Similarity Scorer

feat: Add Context Recall Scorer

4f788c3

- Introduced a new `contextRecall` scorer to evaluate how much of a generated answer can be attributed to retrieved contexts.
- Updated the `scorers` module to export the new `contextRecall` scorer.

Part of mattpocock#250

feat: Add evaluation script for RAG Context Recall

e6448d2

refactor: Update scorers to use 'expected' instead of 'input.reference'

652591e

refactor: Remove failedToScore utility and replace with error in scorers

156a06d

refactor: Update scoring schemas to use jsonSchema and remove zod dep…

a719796

…endency

refactor: Simplify answerSimilarity scorer by removing threshold logi…

efdffa4

…c and updating metadata format

refactor: rename embedding to embeddingModel clearer

189ef3e

refactor: update embedding property to embeddingModel for clarity

d5f243d

refactor: Introduce Scorers namespace with types for LLM and embeddin…

d031484

…g based scorers, and context recall and faithfulness classifications

refactor: Move SingleTurnSample and EvaluationSample types to Scorers…

68bcde5

… namespace for better organization

refactor: Update Evalite types to support userInput structure. And ad…

3535516

…d Multi Turn Sample Type

cantemizyurek and others added 26 commits October 22, 2025 23:32

refactor: Enhance scorer options structure with SingleTurnFn and Mult…

811ca8a

…iTurnFn types

fix: Fix evaluation input types to fit new format

7abecd4

refactor: Simplify function signatures in contextRecall and faithfuln…

f32d0c9

…ess scorers

Changed default storage to in-memory. SQLite still available via config.

f092e68

Remove problematic backend-only-constants imports

751ed07

- Remove unused DB_LOCATION import from test-utils.ts
- Replace FILES_LOCATION import with local constant in files.test.ts

Co-authored-by: Matt Pocock <[email protected]>

Fixed CI properly

57883fb

Huge move from evals -> suites, and results -> evals

54f9618

Added changeset

07541f6

Removed streaming text support from tasks.

58cb7a6

Fixes after cherrypick

6a25f86

Formatting

938ef45

Docs updates

0519544

Docs updates

43dbbd8

feat: Add sheet overlay backdrop for evaluation routes

53b2cd1

fix: Update layout for ResultComponent to ensure minimum height is ma…

563791d

…intained and grows as necessary

Create real-phones-join.md

df9484b

Merge branch 'v1' of https://github.com/mattpocock/evalite into scorers

a24889f

Merge pull request mattpocock#251 from cantemizyurek/scorers

f67f215

Add Scorers module

feat: Implement Tool Call Accuracy scorer

a731f2d

Related mattpocock#250

feat: Add evaluation examples for Exact Match and Contains scorers

01e9e1b

- Created a new file `string-scorers.eval.ts` to demonstrate the usage of `exactMatch` and `contains` scorers.

fix: normalize indentation in answer-similarity scorer

0a30d93

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

mattpocock force-pushed the v1 branch from f8a0b49 to e4dbe8a on November 2, 2025 13:17
