Added ULQA benchmark #3340

keramjan · 2025-10-13T18:58:15Z

The benchmark implements multiple tasks to evaluate LLMs in a new language domain: Uyghur. The tasks cover three complexity levels: 1. basic, 2. intermediate, 3. advanced.

1. The configuration file is copied from gsm8k, with initial edits applied.

1. The YAML configuration for ulqa is added.

1. The task configuration files are added and tested.

1. generation_kwargs arguments are updated.

1. More evaluation metrics are added.

added ulqa to task list

… ulqa
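As a rough illustration of how the new tasks might be run once merged, the sketch below assembles an `lm_eval` command line in Python. It assumes the task is registered under the name `ulqa` (taken from this PR's title and commits, not verified against the merged files), and the model name is only a placeholder. The helper `build_lm_eval_command` is hypothetical and not part of the harness.

```python
import shlex


def build_lm_eval_command(tasks, pretrained, batch_size=8, limit=None):
    """Return the argv list for an lm_eval run (hypothetical helper)."""
    cmd = [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),  # comma-separated task names
        "--batch_size", str(batch_size),
    ]
    if limit is not None:
        # Restrict the number of examples per task for a quick smoke test.
        cmd += ["--limit", str(limit)]
    return cmd


# Assumption: "ulqa" is the registered task name; the model is a placeholder.
command = build_lm_eval_command(["ulqa"], "EleutherAI/pythia-160m", limit=10)
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)` in an environment where the harness is installed. A small `--limit` keeps a first run short while checking that the configuration loads.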

CLAassistant · 2025-10-13T18:58:22Z

All committers have signed the CLA.

# Conflicts: # lm_eval/tasks/README.md

baberabb · 2026-01-13T15:26:01Z

Hi! Thank you for the PR, and sorry it took so long for me to get to it. I removed ulut as a tag name, as it's already a group name, so that it isn't duplicated.

keramjan · 2026-01-13T18:52:56Z

thank you!

keramjan and others added 30 commits August 24, 2025 12:35

task yaml configuration added

7da2a56

1. The configuration file is copied from gsm8k and made initial edit.

New task: ULQA

990ca11

1. The YAML configuration for ulqa is added.

celep1/2, uleval, ulqa

a35ce40

1. The task configuration files are added and tested.

Update ulqa.yaml

fff23be

1. generation_kwargs arguments are updated.

Update ulqa.yaml

9628d58

1. More evaluation metrics are added.

Update ulqa.yaml

4109bb2

Update ulqa.yaml

29b65cd

Update ulqa.yaml

9b2b815

Update ulqa.yaml

1e69c4b

Update ulqa.yaml

90e0c9c

Update ulqa.yaml

fa6954a

Update ulqa.yaml

2bae48c

Update ulqa.yaml

63b455f

Update ulqa.yaml

9c9b401

Update ulqa.yaml

6043d84

Update ulqa.yaml

dc55fab

Update ulqa.yaml

997ea41

Update ulqa.yaml

603ff43

Update celep2.yaml

d74e07f

Huggingface Dataset Path Updated.

c02194d

lambada_uyghur task added

d8f04be

lambada_uyghur task config updated

31f65ee

Update lambada_uyghur.yaml

56fe47d

Update lambada_uyghur.yaml

137c5b1

Merge branch 'EleutherAI:main' into ulqa

9d325d9

Update lambada_uyghur.yaml

cd0cd26

lambada Uyghur test

006a9b9

lambada Uyghur test

b375ea3

lambada Uyghur test

15a2945

lambada Uyghur test

56dd845

keramjan and others added 15 commits October 4, 2025 10:32

ulut task group debugged

3671217

ulut task group debugged

811b326

ulut task group debugged

3828a25

All sub-tasks are converted to multiple choice questions.

8a79ddc

All sub-tasks are converted to multiple choice questions.

46a0540

tag added to ulut.yaml

42072e4

Update README.md

ccae01e

added ulqa to task list

added README file

661920a

Update README.md

d4392a1

Update README.md

318bb48

Update README.md

5c70c24

all Uyghur tasks are merged

d6f2ed2

Merge branch 'ulqa' of github.com:keramjan/lm-evaluation-harness into…

9d9cf83

… ulqa

ulqa task updated

2e98394

ulqa task updated

683a6bf

keramjan requested a review from baberabb as a code owner October 13, 2025 18:58

keramjan and others added 9 commits October 14, 2025 17:18

Merge branch 'main' into ulqa

40a1a3c

ulut doc_to_decontamination_query value added

c836e26

doc_to_decontamination_query added to more tasks in ulqa

89ac4c3

metrics bleu and chrf removed from generative tasks in ulqa

a51d14f

Resolve merge conflict: keep ULQA benchmark entry

00b4c52

Merge branch 'main' into ulqa

8f93f22

Merge branch 'main' into ulqa

b3cd096

# Conflicts: # lm_eval/tasks/README.md

fix parsing bug

e313e9f

rm ulut tag as its already a group name

7657f78

baberabb approved these changes Jan 13, 2026

baberabb merged commit 4b74ec1 into EleutherAI:main Jan 13, 2026
5 of 6 checks passed
