Finalizing MMTEB #784

Open
3 of 4 tasks
KennethEnevoldsen opened this issue May 22, 2024 · 6 comments

@KennethEnevoldsen
Contributor

KennethEnevoldsen commented May 22, 2024

This issue is to get an overview of what needs to be done before MMTEB can be finalized.

  1. Adding the last remaining datasets, notably:
  2. Speeding up the benchmark
  3. Running models #705 (partly depends on 1 and 2, as well as on "Model which we need to implement" #879)
  4. Figuring out score aggregation: "Score Aggregation for Multilingual benchmark (and in general)" #752 (partly depends on 3)
  5. Deciding on meaningful benchmark subsets (depends on 3)
  6. Paper writing: "Paper Writing: An overview issue" #896 (depends on 3, 4 and 5; see also "Paper writing" #595)
  7. Updating the leaderboard to the new format: "Leaderboard changes for massive multilinguality" #674 (depends on 3-6)

Is there anything else that is needed?

@KennethEnevoldsen KennethEnevoldsen pinned this issue May 22, 2024
@vaibhavad
Contributor

Construction of MMTEB-Lite? It would be a faster version of MMTEB. Two approaches that come to mind for implementing it:

  1. Reducing the size of the document set of some retrieval benchmarks.
  2. Reducing the number of tasks.

@dokato
Collaborator

dokato commented May 23, 2024

Hey @KennethEnevoldsen, I'd also like to merge the dataset in #773, for three reasons: a) we don't seem to have the Brazilian dialect represented, b) the multilabel task doesn't have large language coverage, and c) I had it prepared for a long time, but the multilabel task only got merged last week while I was away. We only need to address a problem with the stratification of the splits there.

@KennethEnevoldsen
Contributor Author

@vaibhavad yes, definitely. We need to construct the benchmarks and ideally think about downsampling some of the larger retrieval datasets. One solution might be to implement a downsample function for retrieval tasks.
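To make the downsampling idea concrete, here is a minimal sketch of what such a function could look like. It is not taken from the mteb codebase: the name `downsample_retrieval`, its parameters, and the dictionary layout (`corpus`, `queries`, `qrels` keyed by string IDs) are all assumptions chosen for illustration. The idea is to keep every document that is judged relevant for some query and fill the rest of the budget with randomly sampled documents, so the queries stay answerable.

```python
import random


def downsample_retrieval(corpus, queries, qrels, n_docs, seed=42):
    """Hypothetical helper: shrink a retrieval corpus to roughly n_docs documents.

    corpus:  dict mapping doc_id -> document text
    queries: dict mapping query_id -> query text
    qrels:   dict mapping query_id -> {doc_id: relevance score}
    """
    # Every document judged relevant (score > 0) must survive,
    # otherwise the queries pointing at it become unanswerable.
    relevant = {
        doc_id
        for judged in qrels.values()
        for doc_id, score in judged.items()
        if score > 0 and doc_id in corpus
    }

    # Fill the remaining budget with randomly chosen distractor documents.
    # Sorting first makes the sample reproducible for a given seed.
    rng = random.Random(seed)
    others = sorted(set(corpus) - relevant)
    n_extra = max(0, n_docs - len(relevant))
    sampled = rng.sample(others, min(n_extra, len(others)))
    keep = relevant | set(sampled)

    small_corpus = {doc_id: text for doc_id, text in corpus.items() if doc_id in keep}

    # Restrict the judgements to surviving documents and drop queries
    # that no longer have any relevant document.
    small_qrels = {}
    for query_id, judged in qrels.items():
        kept = {d: s for d, s in judged.items() if d in keep}
        if any(s > 0 for s in kept.values()):
            small_qrels[query_id] = kept
    small_queries = {q: queries[q] for q in small_qrels if q in queries}

    return small_corpus, small_queries, small_qrels
```

Note that if the number of relevant documents already exceeds `n_docs`, this sketch keeps all of them rather than cutting into the relevant set, so the result can be larger than the requested size. Random distractors are also easier than hard negatives, so scores on a corpus downsampled this way would not be directly comparable to scores on the full dataset.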

Thanks @dokato, let us get it merged in as well. It looks to be in a reasonable state.

@Ruqyai
Contributor

Ruqyai commented May 24, 2024

Hey @KennethEnevoldsen, I read the list and I think I can help with running models (#705).

@jordiclive
Contributor

@KennethEnevoldsen Is there anything meaningful that new contributors can help with?

@KennethEnevoldsen
Contributor Author

Hi @jordiclive! I believe there are multiple avenues to take: any of the outlined paper segments would be meaningful (see the updated post above), as would implementing models (see e.g. #845, which I will finish up either Monday or over the weekend), or starting work on 8).

5 participants