Effectively Annotate Text Data for Transformers via Active Learning using Cleanlab (#63)

# What does this PR do?

This PR adds a notebook that demonstrates how to annotate text data
efficiently for Transformer models using active learning, with the
open-source Cleanlab package.

* Introduction to active learning and its importance in efficiently
utilizing labeling efforts under budget constraints.

* Implementation of the ActiveLab algorithm, which prioritizes data for
annotation by its potential impact on model performance. This is
particularly useful with noisy annotators, because it helps decide
whether to collect another annotation for previously labeled data or to
label new data.

* A detailed walkthrough on iteratively improving a text classification
model by selecting the most impactful data points for annotation,
retraining the model, and evaluating its performance.
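
The selection step in the walkthrough above can be sketched in a few lines. This is a minimal illustration, not the notebook's code and not the ActiveLab estimator itself: the scoring functions are a simplified, hypothetical stand-in (annotator agreement blended with model confidence), and the names `labeled_score`, `unlabeled_score`, and `select_batch` are assumptions made for this example. Cleanlab provides its own ActiveLab scoring through `cleanlab.multiannotator.get_active_learning_scores`, which this sketch deliberately does not call so that it runs with the standard library alone.

```python
from collections import Counter


def labeled_score(annotations, pred_probs):
    """Hypothetical confidence that the current consensus label is correct.

    Blends annotator agreement with the model's probability for the
    consensus class. A simplified stand-in, not the ActiveLab formula.
    """
    consensus, votes = Counter(annotations).most_common(1)[0]
    agreement = votes / len(annotations)
    return 0.5 * agreement + 0.5 * pred_probs[consensus]


def unlabeled_score(pred_probs):
    """For data with no annotations, fall back to the model's top probability."""
    return max(pred_probs)


def select_batch(labeled, unlabeled, batch_size):
    """Return the `batch_size` lowest-scoring examples, labeled or not.

    labeled:   {example_id: (list of annotator labels, predicted probabilities)}
    unlabeled: {example_id: predicted probabilities}
    """
    scored = [(labeled_score(a, p), ex_id, "relabel")
              for ex_id, (a, p) in labeled.items()]
    scored += [(unlabeled_score(p), ex_id, "new")
               for ex_id, p in unlabeled.items()]
    scored.sort(key=lambda item: item[0])  # lowest confidence first
    return [(ex_id, kind) for _, ex_id, kind in scored[:batch_size]]


# Toy data (made up for illustration): two labeled and two unlabeled examples.
labeled = {
    "a": ([1, 1, 1], [0.05, 0.95]),  # annotators agree, model agrees
    "b": ([0, 1], [0.5, 0.5]),       # annotators disagree, model unsure
}
unlabeled = {
    "c": [0.55, 0.45],               # model unsure
    "d": [0.9, 0.1],                 # model confident
}

print(select_batch(labeled, unlabeled, batch_size=2))
# [('b', 'relabel'), ('c', 'new')]
```

Scoring labeled and unlabeled examples on one scale is what lets a single ranking answer the "relabel or label something new" question. In a full loop, the selected examples would be sent to annotators, the model retrained on the updated labels, and its performance evaluated before the next round.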

## Who can review?

@MKhalusova, I would appreciate your review.

---------

Co-authored-by: Maria Khalusova <[email protected]>
aravindputrevu and MKhalusova committed Apr 8, 2024
1 parent fdb688f commit 7a512d6
Showing 3 changed files with 2,299 additions and 1 deletion.
3 changes: 2 additions & 1 deletion notebooks/en/_toctree.yml
@@ -32,6 +32,7 @@
  title: Create a legal preference dataset
- local: semantic_cache_chroma_vector_database
  title: Implementing semantic cache to improve a RAG system.
- local: annotate_text_data_transformers_via_active_learning
  title: Annotate text data using Active Learning with Cleanlab
- local: llm_judge
  title: Using LLM-as-a-judge for an automated and versatile evaluation