Open-Source Evaluation for GenAI Application Pipelines
FlicksMAB is a movie recommendation system that uses multi-armed bandits (MAB) to personalize movie suggestions. Built with PyTorch, it learns user preferences from the MovieLens 100K dataset and recommends movies likely to engage each user.
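For context, here is a minimal sketch of the epsilon-greedy bandit idea behind such systems; the arm count, epsilon value, and reward signal are illustrative assumptions, not FlicksMAB's actual implementation.

```python
import random

# Illustrative epsilon-greedy bandit; FlicksMAB's actual policy may differ.
class EpsilonGreedyBandit:
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # times each arm (e.g., movie cluster) was shown
        self.values = [0.0] * n_arms  # running mean reward (e.g., watch rate)

    def select_arm(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))  # explore
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental update of the running mean reward
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = EpsilonGreedyBandit(n_arms=5)
arm = bandit.select_arm()
bandit.update(arm, reward=1.0)  # user engaged with the recommendation
```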
Python client for Kolena's machine learning testing platform
Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks such as CrewAI, LangChain, and AutoGen.
Metrics to evaluate the response quality of your Retrieval-Augmented Generation (RAG) applications.
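As a minimal sketch of what such a metric measures, the toy heuristic below scores how much of an answer is grounded in the retrieved context via token overlap; real RAG evaluators typically use LLM judges or NLI models, so this is illustrative only.

```python
# Toy faithfulness-style metric: fraction of answer tokens that appear in
# the retrieved context. Purely illustrative; not any package's actual metric.
def grounded_token_fraction(answer: str, contexts: list[str]) -> float:
    context_vocab = set(" ".join(contexts).lower().split())
    answer_tokens = answer.lower().split()
    if not answer_tokens:
        return 0.0
    grounded = sum(tok in context_vocab for tok in answer_tokens)
    return grounded / len(answer_tokens)

score = grounded_token_fraction(
    "Paris is the capital of France.",
    ["France's capital city is Paris."],
)
print(f"grounded fraction: {score:.2f}")
```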
The LLM Evaluation Framework
The Evaluator app streamlines evaluation across professional performance and project assessments; its user-friendly interface and customizable criteria simplify scoring and providing feedback.
A curated "100 Days of Machine Learning" repository that guides enthusiasts through fundamental ML concepts, algorithms, and applications, balancing daily theoretical insights with practical coding exercises and real-world projects.
VELOCITI Benchmark Evaluation and Visualisation Code
Counting-Stars (★)
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside its recently released LLM data processing library datatrove and LLM training library nanotron.
Valor is a centralized evaluation store which makes it easy to measure, explore, and rank model performance.
CodSoft internship projects.
Python SDK for running evaluations on LLM generated responses
The most comprehensive Python package for evaluating survival analysis models.
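To illustrate the kind of metric such a package evaluates, here is a hand-rolled sketch of the concordance index (C-index), a standard ranking metric for survival models; the data is made up, and dedicated packages handle censoring and ties more carefully.

```python
import numpy as np

# Toy C-index: fraction of comparable pairs where predicted risk agrees
# with observed survival order. Illustrative only.
def concordance_index(times, risk_scores, events):
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if subject i had an observed event first
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

times = np.array([5.0, 10.0, 12.0, 3.0])
events = np.array([1, 1, 0, 1])         # 1 = event observed, 0 = censored
risk = np.array([0.9, 0.4, 0.2, 0.95])  # higher = predicted higher risk
print(f"C-index: {concordance_index(times, risk, events):.3f}")
```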
The TypeScript LLM Evaluation Library
This repository contains a collection of labs exploring various machine learning algorithms and techniques. Each lab focuses on a specific topic with detailed explanations, code examples, and analysis, covering clustering, classification and regression algorithms, hyperparameter tuning, data preprocessing, and various evaluation metrics; a small example of the evaluation-metrics portion follows.
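A short scikit-learn example of the classification metrics typically covered in such labs; the labels here are made up.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up binary labels for illustration.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```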
A library to calculate similarity scores between two collections of text sequences encoded using transformer models for bitext mining, dense retrieval, retrieval-based classification, and retrieval-augmented generation (RAG).
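A sketch of the underlying technique using the sentence-transformers library: encode both collections with a transformer model and take cosine similarity. The model name and API shown are sentence-transformers', not necessarily those of the library described above.

```python
from sentence_transformers import SentenceTransformer, util

# Generic dense-retrieval similarity; the library above may expose a different API.
model = SentenceTransformer("all-MiniLM-L6-v2")

queries = ["How do I reset my password?"]
passages = [
    "To reset your password, open Settings and choose 'Forgot password'.",
    "Our office is closed on public holidays.",
]

q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)

# Cosine similarity matrix: rows are queries, columns are passages.
scores = util.cos_sim(q_emb, p_emb)
print(scores)  # the first passage should score higher
```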
Official implementation of the Fréchet Radiomics Distance.
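Fréchet distances of this family share the FID form d² = ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½) between Gaussians fitted to two feature sets. Below is a generic sketch of that computation, assuming precomputed feature matrices; the official FRD implementation extracts radiomics features and may differ in details.

```python
import numpy as np
from scipy.linalg import sqrtm

# Generic Frechet distance between Gaussians fitted to two feature sets.
def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))           # stand-in feature vectors
fake = rng.normal(loc=0.5, size=(200, 8))
print(f"Frechet distance: {frechet_distance(real, fake):.3f}")
```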