Finalizing MMTEB #784

KennethEnevoldsen · 2024-05-22T08:47:47Z

This issue is to get an overview of what needs to be done before MMTEB can be finalized.

1. Adding the last remaining datasets, notably:
2. Speeding up the benchmark
   - I believe we are only missing: Convert tasks to ClusteringFast #660
   - see also Paper segment: Speeding up retrieval #836
   - see also Paper segment: Speeding up MTEB (English) + co2 impact #838
   - see also Paper segment: Comparing scores of Clustering to ClusteringFast #835
3. Running models #705 (partly depends on 1 and 2, as well as Model which we need to implement #879)
4. Figuring out Score Aggregation for Multilingual benchmark (and in general) #752 (partly depends on 3)
5. Deciding on meaningful benchmark subsets (depends on 3)
   - see Paper segment: Task selection #837
6. Paper Writing: An overview issue #896 (depends on 3, 4 and 5) (see also Paper writing #595)
   - see Paper segment: Score aggregation #839
7. Updating the leaderboard to the new format: Leaderboard changes for massive multilinguality #674 (depends on 3-6)

Is there anything else that is needed?

vaibhavad · 2024-05-22T14:43:00Z

What about constructing an MMTEB-Lite? It would be a faster version of MMTEB. Two approaches come to mind for implementing this:

- Reducing the size of the document set of some retrieval benchmarks.
- Reducing the number of tasks.

dokato · 2024-05-23T07:24:49Z

Hey @KennethEnevoldsen, I'd also like to merge the dataset in #773, for three reasons: a) we don't seem to have the Brazilian dialect represented, b) the multilabel task doesn't have large language coverage, and c) I've had it prepared for a long time, but the multilabel task was only merged last week while I was away. We only need to address a problem with the stratification of the splits there.

KennethEnevoldsen · 2024-05-23T08:28:16Z

@vaibhavad yes, definitely. We need to construct the benchmarks and ideally think about downsampling some of the larger retrieval datasets. One solution might be to implement a downsampling function for retrieval tasks.
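A minimal sketch of what such a downsampling function could look like, assuming a retrieval task exposes its corpus as a `doc_id -> text` dict and its relevance judgements (qrels) as `query_id -> {doc_id: score}`. The name `downsample_retrieval_corpus` and these data layouts are hypothetical and not the current mteb API. The idea is to keep every document judged relevant to some query and fill the remaining budget with randomly sampled distractors, so every query stays answerable while the corpus shrinks.

```python
import random


def downsample_retrieval_corpus(
    corpus: dict[str, str],
    qrels: dict[str, dict[str, int]],
    max_docs: int,
    seed: int = 42,
) -> dict[str, str]:
    """Hypothetical helper: shrink a retrieval corpus to at most `max_docs`
    documents while keeping every document judged relevant (score > 0)."""
    # Documents judged relevant for at least one query (and present in the corpus).
    relevant = {
        doc_id
        for judgements in qrels.values()
        for doc_id, score in judgements.items()
        if score > 0 and doc_id in corpus
    }
    if len(relevant) > max_docs:
        raise ValueError(
            f"max_docs={max_docs} is smaller than the {len(relevant)} relevant documents"
        )
    # Candidate distractors, sorted so sampling is reproducible for a given seed.
    distractors = sorted(doc_id for doc_id in corpus if doc_id not in relevant)
    rng = random.Random(seed)
    n_extra = min(max_docs - len(relevant), len(distractors))
    kept = relevant | set(rng.sample(distractors, n_extra))
    # Preserve the original corpus order in the returned dict.
    return {doc_id: text for doc_id, text in corpus.items() if doc_id in kept}


# Usage example with a toy corpus of 1,000 documents.
corpus = {f"d{i}": f"document {i}" for i in range(1000)}
qrels = {"q1": {"d3": 1}, "q2": {"d500": 1, "d999": 1}}
small = downsample_retrieval_corpus(corpus, qrels, max_docs=100)
print(len(small))  # 100
```

Keeping all judged documents preserves the retrieval targets, but with fewer distractors retrieval becomes easier, so scores on a downsampled corpus are not directly comparable to full-corpus scores.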

Thanks @dokato, let's get it merged as well. It looks to be in a reasonable state.

Ruqyai · 2024-05-24T06:42:28Z

Hey @KennethEnevoldsen, I read the list and I think I can help with Running models #705.

jordiclive · 2024-05-31T11:02:38Z

@KennethEnevoldsen Is there anything meaningful new contributors can help with?

KennethEnevoldsen · 2024-05-31T13:25:25Z

Hi @jordiclive! I believe there are multiple avenues to take: any of the outlined paper segments would be meaningful (see the updated post above), as would implementing models (see e.g. #845, which I will finish up either on Monday or over the weekend), or starting work on item 8.

KennethEnevoldsen pinned this issue May 22, 2024

This was referenced Jun 3, 2024:
- Add Russian tasks (RU-MTEB) #815 (Merged)
- Transitioning to a faster MTEB #482 (Closed)
- WIP: Add abstention metrics #841 (Closed)

This was referenced Jun 10, 2024:
- Paper Writing: An overview issue #896 (Open)
- add Indonesian MT STS dataset #922 (Closed)
