[Refactor] Change evaluation script path (#1165)
deshraj authored Jan 12, 2024
1 parent 862ff6c commit affe319
Showing 21 changed files with 50 additions and 45 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -1,42 +1,42 @@
---
title: "Pipeline"
title: "App"
---

-Create a RAG pipeline object on Embedchain. This is the main entrypoint for a developer to interact with Embedchain APIs. A pipeline configures the llm, vector database, embedding model, and retrieval strategy of your choice.
+Create a RAG app object on Embedchain. This is the main entrypoint for a developer to interact with Embedchain APIs. An app configures the llm, vector database, embedding model, and retrieval strategy of your choice.
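
For orientation (the concrete examples further down this file are collapsed in this diff), here is a minimal usage sketch of the renamed `App` entrypoint. The data source URL is only a placeholder, and the defaults named in the comment are taken from the imports in `embedchain/app.py` shown later in this commit.

```python
from embedchain import App

# Default setup: OpenAI LLM and embedder with a Chroma vector database,
# matching the default imports in embedchain/app.py below.
app = App()
app.add("https://www.forbes.com/profile/elon-musk")  # placeholder data source
print(app.query("What is the net worth of Elon Musk?"))
```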

### Attributes

<ParamField path="local_id" type="str">
-Pipeline ID
+App ID
</ParamField>
<ParamField path="name" type="str" optional>
-Name of the pipeline
+Name of the app
</ParamField>
<ParamField path="config" type="BaseConfig">
-Configuration of the pipeline
+Configuration of the app
</ParamField>
<ParamField path="llm" type="BaseLlm">
-Configured LLM for the RAG pipeline
+Configured LLM for the RAG app
</ParamField>
<ParamField path="db" type="BaseVectorDB">
-Configured vector database for the RAG pipeline
+Configured vector database for the RAG app
</ParamField>
<ParamField path="embedding_model" type="BaseEmbedder">
-Configured embedding model for the RAG pipeline
+Configured embedding model for the RAG app
</ParamField>
<ParamField path="chunker" type="ChunkerConfig">
Chunker configuration
</ParamField>
<ParamField path="client" type="Client" optional>
-Client object (used to deploy a pipeline to Embedchain platform)
+Client object (used to deploy an app to Embedchain platform)
</ParamField>
<ParamField path="logger" type="logging.Logger">
Logger object
</ParamField>

## Usage

-You can create an embedchain pipeline instance using the following methods:
+You can create an app instance using the following methods:

### Default setting

@@ -127,4 +127,4 @@ app = App.from_config(config_path="config.json")
}
```

</CodeGroup>
</CodeGroup>
File renamed without changes.
File renamed without changes.
File renamed without changes.
33 changes: 17 additions & 16 deletions docs/components/evaluation.mdx
@@ -84,7 +84,7 @@ Once you have created your dataset, you can run evaluation on the dataset by pic
For example, you can run evaluation on context relevancy metric using the following code:

```python
-from embedchain.eval.metrics import ContextRelevance
+from embedchain.evaluation.metrics import ContextRelevance
metric = ContextRelevance()
score = metric.evaluate(dataset)
print(score)
@@ -112,20 +112,21 @@ context_relevance_score = num_relevant_sentences_in_context / num_of_sentences_i
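
To make the formula in the hunk header above concrete, here is a small illustration with made-up counts; the numbers are not from a real run.

```python
# Hypothetical counts produced by the evaluation step for one data point.
num_relevant_sentences_in_context = 7
num_of_sentences_in_context = 25

context_relevance_score = num_relevant_sentences_in_context / num_of_sentences_in_context
print(context_relevance_score)  # 0.28
```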
You can run the context relevancy evaluation with the following simple code:

```python
-from embedchain.eval.metrics import ContextRelevance
+from embedchain.evaluation.metrics import ContextRelevance

metric = ContextRelevance()
score = metric.evaluate(dataset) # 'dataset' is defined in the create dataset section
print(score)
# 0.27975528364849833
```

In the above example, we used sensible defaults for the evaluation. However, you can also configure the evaluation metric as per your needs using the `ContextRelevanceConfig` class.

Here is a more advanced example of how to pass a custom evaluation config for evaluating on context relevance metric:

```python
-from embedchain.config.eval.base import ContextRelevanceConfig
-from embedchain.eval.metrics import ContextRelevance
+from embedchain.config.evaluation.base import ContextRelevanceConfig
+from embedchain.evaluation.metrics import ContextRelevance

eval_config = ContextRelevanceConfig(model="gpt-4", api_key="sk-xxx", language="en")
metric = ContextRelevance(config=eval_config)
@@ -144,7 +145,7 @@ metric.evaluate(dataset)
The language of the dataset being evaluated. We need this to understand the context provided in the dataset. Defaults to `en`.
</ParamField>
<ParamField path="prompt" type="str" optional>
-The prompt to extract the relevant sentences from the context. Defaults to `CONTEXT_RELEVANCY_PROMPT`, which can be found at `embedchain.config.eval.base` path.
+The prompt to extract the relevant sentences from the context. Defaults to `CONTEXT_RELEVANCY_PROMPT`, which can be found at `embedchain.config.evaluation.base` path.
</ParamField>


@@ -161,7 +162,7 @@ answer_relevancy_score = mean(cosine_similarity(generated_questions, original_qu
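
The truncated formula in the hunk header above averages cosine similarities between the original question and the questions generated from the answer. Here is a self-contained sketch of that idea using plain numpy; the embedding vectors are placeholders, not output from Embedchain.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Standard cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings: one original question, two questions generated from the answer.
original_question = np.array([0.1, 0.9, 0.3])
generated_questions = [np.array([0.2, 0.8, 0.4]), np.array([0.0, 1.0, 0.2])]

answer_relevancy_score = float(np.mean(
    [cosine_similarity(original_question, q) for q in generated_questions]
))
print(answer_relevancy_score)
```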
You can run the answer relevancy evaluation with the following simple code:

```python
-from embedchain.eval.metrics import AnswerRelevance
+from embedchain.evaluation.metrics import AnswerRelevance

metric = AnswerRelevance()
score = metric.evaluate(dataset)
@@ -172,8 +173,8 @@ print(score)
In the above example, we used sensible defaults for the evaluation. However, you can also configure the evaluation metric as per your needs using the `AnswerRelevanceConfig` class. Here is a more advanced example where you can provide your own evaluation config:

```python
-from embedchain.config.eval.base import AnswerRelevanceConfig
-from embedchain.eval.metrics import AnswerRelevance
+from embedchain.config.evaluation.base import AnswerRelevanceConfig
+from embedchain.evaluation.metrics import AnswerRelevance

eval_config = AnswerRelevanceConfig(
model='gpt-4',
@@ -200,7 +201,7 @@ score = metric.evaluate(dataset)
The number of questions to generate for each answer. We use the generated questions to compare the similarity with the original question to determine the score. Defaults to `1`.
</ParamField>
<ParamField path="prompt" type="str" optional>
-The prompt to extract the `num_gen_questions` number of questions from the provided answer. Defaults to `ANSWER_RELEVANCY_PROMPT`, which can be found at `embedchain.config.eval.base` path.
+The prompt to extract the `num_gen_questions` number of questions from the provided answer. Defaults to `ANSWER_RELEVANCY_PROMPT`, which can be found at `embedchain.config.evaluation.base` path.
</ParamField>

## Groundedness <a id="groundedness"></a>
@@ -214,7 +215,7 @@ groundedness_score = (sum of all verdicts) / (total # of claims)
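
A minimal illustration of the arithmetic above, with made-up verdicts (1 for a claim supported by the context, 0 otherwise).

```python
verdicts = [1, 0, 1, 1]  # one verdict per claim extracted from the answer
groundedness_score = sum(verdicts) / len(verdicts)
print(groundedness_score)  # 0.75
```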
You can run the groundedness evaluation with the following simple code:

```python
-from embedchain.eval.metrics import Groundedness
+from embedchain.evaluation.metrics import Groundedness
metric = Groundedness()
score = metric.evaluate(dataset) # dataset from above
print(score)
@@ -224,8 +225,8 @@ print(score)
In the above example, we used sensible defaults for the evaluation. However, you can also configure the evaluation metric as per your needs using the `GroundednessConfig` class. Here is a more advanced example where you can configure the evaluation config:

```python
-from embedchain.config.eval.base import GroundednessConfig
-from embedchain.eval.metrics import Groundedness
+from embedchain.config.evaluation.base import GroundednessConfig
+from embedchain.evaluation.metrics import Groundedness

eval_config = GroundednessConfig(model='gpt-4', api_key="sk-xxx")
metric = Groundedness(config=eval_config)
@@ -242,15 +243,15 @@ score = metric.evaluate(dataset)
The openai api key to use for the evaluation. Defaults to `None`. If not provided, we will use the `OPENAI_API_KEY` environment variable.
</ParamField>
<ParamField path="answer_claims_prompt" type="str" optional>
-The prompt to extract the claims from the provided answer. Defaults to `GROUNDEDNESS_ANSWER_CLAIMS_PROMPT`, which can be found at `embedchain.config.eval.base` path.
+The prompt to extract the claims from the provided answer. Defaults to `GROUNDEDNESS_ANSWER_CLAIMS_PROMPT`, which can be found at `embedchain.config.evaluation.base` path.
</ParamField>
<ParamField path="claims_inference_prompt" type="str" optional>
-The prompt to get verdicts on the claims from the answer from the given context. Defaults to `GROUNDEDNESS_CLAIMS_INFERENCE_PROMPT`, which can be found at `embedchain.config.eval.base` path.
+The prompt to get verdicts on the claims from the answer from the given context. Defaults to `GROUNDEDNESS_CLAIMS_INFERENCE_PROMPT`, which can be found at `embedchain.config.evaluation.base` path.
</ParamField>

## Custom <a id="custom_metric"></a>

-You can also create your own evaluation metric by extending the `BaseMetric` class. You can find the source code for the existing metrics at `embedchain.eval.metrics` path.
+You can also create your own evaluation metric by extending the `BaseMetric` class. You can find the source code for the existing metrics at `embedchain.evaluation.metrics` path.

<Note>
You must provide the `name` of your custom metric in the `__init__` method of your class. This name will be used to identify your metric in the evaluation report.
@@ -260,7 +261,7 @@ You must provide the `name` of your custom metric in the `__init__` method of yo
from typing import Optional

from embedchain.config.base_config import BaseConfig
-from embedchain.eval.metrics import BaseMetric
+from embedchain.evaluation.metrics import BaseMetric
from embedchain.utils.eval import EvalData

class MyCustomMetric(BaseMetric):
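
The custom-metric example above is cut off by the collapsed diff. Below is a self-contained sketch of what such a subclass might look like; the body of `evaluate`, the `data.answer` field, and the `name` keyword on the base constructor are assumptions, and the `embedchain.utils.evaluation` import follows the rename applied in `embedchain/app.py` in this commit (the snippet above still shows the older `embedchain.utils.eval` path).

```python
from typing import Optional

from embedchain.config.base_config import BaseConfig
from embedchain.evaluation.metrics import BaseMetric
from embedchain.utils.evaluation import EvalData


class MyCustomMetric(BaseMetric):
    def __init__(self, config: Optional[BaseConfig] = None):
        # The name identifies this metric in the evaluation report, as the note above requires.
        super().__init__(name="my_custom_metric")
        self.config = config

    def evaluate(self, dataset: list[EvalData]) -> float:
        # Assumed scoring logic for illustration only: reward non-empty answers.
        scores = [1.0 if data.answer else 0.0 for data in dataset]
        return sum(scores) / len(scores) if scores else 0.0
```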
18 changes: 11 additions & 7 deletions embedchain/app.py
@@ -11,24 +11,28 @@
import yaml
from tqdm import tqdm

-from embedchain.cache import (Config, ExactMatchEvaluation,
-                              SearchDistanceEvaluation, cache,
-                              gptcache_data_manager, gptcache_pre_function)
+from embedchain.cache import (
+    Config,
+    ExactMatchEvaluation,
+    SearchDistanceEvaluation,
+    cache,
+    gptcache_data_manager,
+    gptcache_pre_function,
+)
from embedchain.client import Client
from embedchain.config import AppConfig, CacheConfig, ChunkerConfig
from embedchain.constants import SQLITE_PATH
from embedchain.embedchain import EmbedChain
from embedchain.embedder.base import BaseEmbedder
from embedchain.embedder.openai import OpenAIEmbedder
-from embedchain.eval.base import BaseMetric
-from embedchain.eval.metrics import (AnswerRelevance, ContextRelevance,
-                                     Groundedness)
+from embedchain.evaluation.base import BaseMetric
+from embedchain.evaluation.metrics import AnswerRelevance, ContextRelevance, Groundedness
from embedchain.factory import EmbedderFactory, LlmFactory, VectorDBFactory
from embedchain.helpers.json_serializable import register_deserializable
from embedchain.llm.base import BaseLlm
from embedchain.llm.openai import OpenAILlm
from embedchain.telemetry.posthog import AnonymousTelemetry
-from embedchain.utils.eval import EvalData, EvalMetric
+from embedchain.utils.evaluation import EvalData, EvalMetric
from embedchain.utils.misc import validate_config
from embedchain.vectordb.base import BaseVectorDB
from embedchain.vectordb.chroma import ChromaDB
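
The evaluation imports above suggest that `App` wires these metrics into an evaluation entry point. A hedged usage sketch follows; the `app.evaluate(...)` call and its parameters are assumptions based on these imports, since the method body is collapsed in this diff.

```python
from embedchain import App
from embedchain.evaluation.metrics import AnswerRelevance, ContextRelevance, Groundedness

app = App()
app.add("https://www.forbes.com/profile/elon-musk")  # placeholder data source

# Assumed entry point: run the imported metrics over a list of questions.
results = app.evaluate(
    questions=["What is the net worth of Elon Musk?"],
    metrics=[ContextRelevance(), AnswerRelevance(), Groundedness()],
)
print(results)
```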
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion embedchain/eval/base.py → embedchain/evaluation/base.py
@@ -1,6 +1,6 @@
from abc import ABC, abstractmethod

-from embedchain.utils.eval import EvalData
+from embedchain.utils.evaluation import EvalData


class BaseMetric(ABC):
File renamed without changes.
@@ -8,9 +8,9 @@
from openai import OpenAI
from tqdm import tqdm

-from embedchain.config.eval.base import AnswerRelevanceConfig
-from embedchain.eval.base import BaseMetric
-from embedchain.utils.eval import EvalData, EvalMetric
+from embedchain.config.evaluation.base import AnswerRelevanceConfig
+from embedchain.evaluation.base import BaseMetric
+from embedchain.utils.evaluation import EvalData, EvalMetric


class AnswerRelevance(BaseMetric):
@@ -8,9 +8,9 @@
from openai import OpenAI
from tqdm import tqdm

-from embedchain.config.eval.base import ContextRelevanceConfig
-from embedchain.eval.base import BaseMetric
-from embedchain.utils.eval import EvalData, EvalMetric
+from embedchain.config.evaluation.base import ContextRelevanceConfig
+from embedchain.evaluation.base import BaseMetric
+from embedchain.utils.evaluation import EvalData, EvalMetric


class ContextRelevance(BaseMetric):
@@ -8,9 +8,9 @@
from openai import OpenAI
from tqdm import tqdm

-from embedchain.config.eval.base import GroundednessConfig
-from embedchain.eval.base import BaseMetric
-from embedchain.utils.eval import EvalData, EvalMetric
+from embedchain.config.evaluation.base import GroundednessConfig
+from embedchain.evaluation.base import BaseMetric
+from embedchain.utils.evaluation import EvalData, EvalMetric


class Groundedness(BaseMetric):
File renamed without changes.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "embedchain"
version = "0.1.63"
version = "0.1.64"
description = "Data platform for LLMs - Load, index, retrieve and sync any unstructured data"
authors = [
"Taranjeet Singh <[email protected]>",
