Open-Source Evaluation for GenAI Application Pipelines
FlicksMAB is a movie recommendation system that uses multi-armed bandits (MAB) to personalize movie suggestions. Built with PyTorch, it learns user preferences from the MovieLens 100K dataset and recommends movies likely to engage each user.
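For context, here is a minimal sketch of the epsilon-greedy bandit idea behind such systems; the arm count, epsilon value, and reward signal are illustrative assumptions, not FlicksMAB's actual implementation.

```python
import random

# Illustrative epsilon-greedy bandit; FlicksMAB's actual policy may differ.
class EpsilonGreedyBandit:
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # times each arm (e.g., movie cluster) was shown
        self.values = [0.0] * n_arms  # running mean reward (e.g., watch rate)

    def select_arm(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))  # explore
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental update of the running mean reward
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = EpsilonGreedyBandit(n_arms=5)
arm = bandit.select_arm()
bandit.update(arm, reward=1.0)  # user engaged with the recommendation
```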
Python client for Kolena's machine learning testing platform
Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks such as CrewAI, LangChain, and AutoGen.
Metrics to evaluate the response quality of your Retrieval-Augmented Generation (RAG) applications.
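As a minimal sketch of what such a metric measures, the toy heuristic below scores how much of an answer is grounded in the retrieved context via token overlap; real RAG evaluators typically use LLM judges or NLI models, so this is illustrative only.

```python
# Toy faithfulness-style metric: fraction of answer tokens that appear in
# the retrieved context. Purely illustrative; not any package's actual metric.
def grounded_token_fraction(answer: str, contexts: list[str]) -> float:
    context_vocab = set(" ".join(contexts).lower().split())
    answer_tokens = answer.lower().split()
    if not answer_tokens:
        return 0.0
    grounded = sum(tok in context_vocab for tok in answer_tokens)
    return grounded / len(answer_tokens)

score = grounded_token_fraction(
    "Paris is the capital of France.",
    ["France's capital city is Paris."],
)
print(f"grounded fraction: {score:.2f}")
```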
The LLM Evaluation Framework
The Evaluator app streamlines evaluation across professional performance and project assessments; its user-friendly interface and customizable criteria simplify scoring and providing feedback.
A curated "100 Days of Machine Learning" repository that guides enthusiasts through fundamental ML concepts, algorithms, and applications, balancing daily theoretical insights with practical coding exercises and real-world projects.
VELOCITI Benchmark Evaluation and Visualisation Code
Counting-Stars (★)
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside its recently released LLM data processing library datatrove and LLM training library nanotron.
Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.
CodSoft internship projects.
Python SDK for running evaluations on LLM generated responses
The most comprehensive Python package for evaluating survival analysis models.
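To illustrate the kind of metric such a package evaluates, here is a hand-rolled sketch of the concordance index (C-index), a standard ranking metric for survival models; the data is made up, and dedicated packages handle censoring and ties more carefully.

```python
import numpy as np

# Toy C-index: fraction of comparable pairs where predicted risk agrees
# with observed survival order. Illustrative only.
def concordance_index(times, risk_scores, events):
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if subject i had an observed event first
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

times = np.array([5.0, 10.0, 12.0, 3.0])
events = np.array([1, 1, 0, 1])         # 1 = event observed, 0 = censored
risk = np.array([0.9, 0.4, 0.2, 0.95])  # higher = predicted higher risk
print(f"C-index: {concordance_index(times, risk, events):.3f}")
```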
The TypeScript LLM Evaluation Library
This repository contains a collection of labs exploring various machine learning algorithms and techniques. Each lab focuses on a specific topic with detailed explanations, code examples, and analysis, covering clustering, classification and regression algorithms, hyperparameter tuning, data preprocessing, and various evaluation metrics; a small example of the evaluation-metrics portion follows.
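A short scikit-learn example of the classification metrics typically covered in such labs; the labels here are made up.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up binary labels for illustration.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```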
A library to calculate similarity scores between two collections of text sequences encoded using transformer models for bitext mining, dense retrieval, retrieval-based classification, and retrieval-augmented generation (RAG).
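A sketch of the underlying technique using the sentence-transformers library: encode both collections with a transformer model and take cosine similarity. The model name and API shown are sentence-transformers', not necessarily those of the library described above.

```python
from sentence_transformers import SentenceTransformer, util

# Generic dense-retrieval similarity; the library above may expose a different API.
model = SentenceTransformer("all-MiniLM-L6-v2")

queries = ["How do I reset my password?"]
passages = [
    "To reset your password, open Settings and choose 'Forgot password'.",
    "Our office is closed on public holidays.",
]

q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)

# Cosine similarity matrix: rows are queries, columns are passages.
scores = util.cos_sim(q_emb, p_emb)
print(scores)  # the first passage should score higher
```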
Official implementation of the Fréchet Radiomics Distance.
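Fréchet distances of this family share the FID form d² = ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½) between Gaussians fitted to two feature sets. Below is a generic sketch of that computation, assuming precomputed feature matrices; the official FRD implementation extracts radiomics features and may differ in details.

```python
import numpy as np
from scipy.linalg import sqrtm

# Generic Frechet distance between Gaussians fitted to two feature sets.
def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))           # stand-in feature vectors
fake = rng.normal(loc=0.5, size=(200, 8))
print(f"Frechet distance: {frechet_distance(real, fake):.3f}")
```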