The Swahili Classification Task #998

msamwelmollel · 2024-06-27T21:54:36Z

Checklist

Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.

Adding datasets checklist

Reason for dataset addition: ...

I have run the following models on the task (adding the results to the PR). These can be run using the mteb -m {model_name} -t {task_name} command.
- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- intfloat/multilingual-e5-small
I have checked that the performance is neither trivial (both models achieve close to perfect scores) nor random (both models achieve close to random scores).
If the dataset is too big (e.g. >2048 examples), consider using self.stratified_subsampling() under dataset_transform()
I have filled out the metadata object in the dataset file (find documentation on it here).
Run tests locally to make sure nothing is broken using make test.
Run the formatter to format the code using make lint.
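
For readers unfamiliar with the subsampling item in the checklist above, here is a minimal standalone sketch of what stratified subsampling does: it caps the number of examples while keeping the label proportions roughly intact. This is not the mteb implementation of self.stratified_subsampling(); the function name stratified_subsample, its parameters, and the seed are assumptions made for illustration, and only the 2048 threshold comes from the checklist.

```python
import random
from collections import defaultdict


def stratified_subsample(texts, labels, n_samples=2048, seed=42):
    """Hypothetical helper: keep at most n_samples examples,
    preserving the label distribution as closely as possible."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    if len(texts) <= n_samples:
        # Small enough already: nothing to subsample.
        return list(texts), list(labels)

    rng = random.Random(seed)

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    total = len(labels)
    chosen, leftovers = [], []
    for label in sorted(by_label, key=str):
        indices = by_label[label]
        rng.shuffle(indices)
        # Floor of this label's proportional share of n_samples.
        quota = (len(indices) * n_samples) // total
        chosen.extend(indices[:quota])
        leftovers.extend(indices[quota:])

    # Flooring can leave a few slots unfilled; top up at random.
    rng.shuffle(leftovers)
    chosen.extend(leftovers[: n_samples - len(chosen)])

    chosen.sort()  # keep the original ordering of the kept examples
    return [texts[i] for i in chosen], [labels[i] for i in chosen]
```

Called on a dataset with five equally sized classes, this returns 2048 examples with close to 410 per class. In an actual task file the library's own method should be used instead of a hand-rolled helper like this one.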

Adding a model checklist

I have filled out the ModelMeta object to the extent possible
I have ensured that my model can be loaded using
- mteb.get_model(model_name, revision_id) and
- mteb.get_model_meta(model_name, revision_id)
I have tested the implementation works on a representative set of tasks.

KennethEnevoldsen

Please also specify "reason for adding the dataset".

KennethEnevoldsen · 2024-06-28T08:27:06Z

mteb/tasks/Classification/swa/SwahiliNewsClassification.py

+class SwahiliNewsClassification(AbsTaskClassification):
+    metadata = TaskMetadata(
+        name="SwahiliNewsClassification",
+        description="Dataset for Swahili News Classification, categorized with 5 domains. Building and Optimizing Swahili Language Models: Techniques, Embeddings, and Datasets",


which 5 domains?

KennethEnevoldsen · 2024-06-28T08:27:40Z

mteb/tasks/Classification/swa/SwahiliNewsClassification.py

+        socioeconomic_status="mixed",
+        annotations_creators="derived",
+        text_creation="found",
+        bibtex_citation="""


No citation?

KennethEnevoldsen · 2024-06-28T08:27:56Z

mteb/tasks/Classification/swa/SwahiliNewsClassification.py

@@ -0,0 +1,43 @@
+from __future__ import annotations
+
+# from ....abstasks import AbsTaskClassification


Suggested change

# from ....abstasks import AbsTaskClassification

KennethEnevoldsen · 2024-06-28T08:28:59Z

mteb/tasks/Classification/swa/SwahiliNewsClassification.py

+    metadata = TaskMetadata(
+        name="SwahiliNewsClassification",
+        description="Dataset for Swahili News Classification, categorized with 5 domains. Building and Optimizing Swahili Language Models: Techniques, Embeddings, and Datasets",
+        reference="https://huggingface.co/datasets/Mollel/SwahiliNewsClassification",


Seems like the dataset has quite limited documentation?

msamwelmollel added 5 commits June 27, 2024 15:42

swahili_news_classfication

d2932ce

update_two

99bd4e8

Update Swahili addin source and refeence

b774828

Update SwahiliNewsClassification.py

dfaf35a

Update SwahiliNewsClassification.py

affb828

msamwelmollel closed this Jun 27, 2024

msamwelmollel reopened this Jun 27, 2024

KennethEnevoldsen reviewed Jun 28, 2024

KennethEnevoldsen self-assigned this Jun 28, 2024
