This is an Arabic stemming algorithm written in the Snowball framework language. It offers light stemming and text normalization.
@article{Chelli2018,
author = "Assem Chelli",
title = "{Assem's Arabic Stemmer}",
year = "2018",
month = "11",
url = "https://figshare.com/articles/Assem_s_Arabic_Stemmer/7295690",
doi = "10.6084/m9.figshare.7295690.v1"
}
This is a sample of results:
Word | Light Stemmer | Root-Based Stemmer |
---|---|---|
طفل | طفل | طفل |
اطفال | اطفال | طفل |
الاطفال | اطفال | طفل |
اطفالكم | اطفال | طفل |
فأطفالكم | اطفال | طفل |
اطفالهم | اطفال | طفل |
والاطفال | اطفال | طفل |
فاطفالهم | اطفال | طفل |
وطفل | طفل | طفل |
الطفولة | طفول | طفل |
والطفلتين | طفل | طفل |
طفلتان | طفل | طفل |
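
To make the Light Stemmer column easier to read, here is a minimal Python sketch of the general idea behind light stemming: normalize the word, then strip one known prefix and one known suffix. It is not the Snowball implementation from this repository. The names `PREFIXES`, `SUFFIXES`, `normalize`, and `light_stem`, the affix lists, and the minimum length of 3 are all assumptions, chosen only to cover the sample rows above.

```python
import re

# Hypothetical affix lists covering only the sample table above;
# the actual stemmer's rules are more extensive.
PREFIXES = ["وال", "فال", "ال", "و", "ف"]   # longest candidates first
SUFFIXES = ["كم", "هم", "تين", "تان", "ة"]


def normalize(word):
    """Drop diacritics and tatweel, and unify hamza-carrying alef forms."""
    word = re.sub("[\u064B-\u0652\u0640]", "", word)
    return re.sub("[أإآ]", "ا", word)


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    word = normalize(word)
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for sample in ["الاطفال", "فأطفالكم", "والطفلتين", "الطفولة"]:
        print(sample, "->", light_stem(sample))
```

A sketch this small over-stems words that merely begin with a prefix letter, which is one reason a real stemmer needs more careful rules.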
The dependencies are already attached as git submodules, so just run:
$ git submodule update --init --recursive
$ make build
- Light Stemmer
$ make run
الطالب
طالب
- Root-Based Stemmer
$ make run_root
الطالب
طلب
The tests run against the snowball-data Arabic sample and measure speed, grouping factor, and precision.
$ make test
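
As a rough illustration of what such measurements could look like, the Python sketch below computes three numbers for any stemming function. The definitions are assumptions, not taken from this repository's test suite: "grouping factor" is taken to mean distinct input words per distinct stem, and "precision" the share of words whose stem matches a reference output. The names `evaluate` and `strip_article` are hypothetical.

```python
import time


def evaluate(stem, words, expected):
    """Time a stemmer and compare its output with reference stems.

    Assumed definitions (not confirmed by the project):
    - grouping_factor: distinct words divided by distinct stems
    - precision: fraction of words whose stem equals the reference stem
    """
    start = time.perf_counter()
    stems = [stem(w) for w in words]
    elapsed = time.perf_counter() - start

    matches = sum(s == e for s, e in zip(stems, expected))
    return {
        "words_per_second": len(words) / elapsed if elapsed > 0 else float("inf"),
        "grouping_factor": len(set(words)) / len(set(stems)),
        "precision": matches / len(words),
    }


def strip_article(word):
    """Toy stand-in stemmer: remove a leading definite article only."""
    return word[2:] if word.startswith("ال") else word


result = evaluate(
    strip_article,
    words=["الطالب", "طالب", "الطفل"],
    expected=["طالب", "طالب", "طفل"],
)
print(result["grouping_factor"], result["precision"])
```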
- Build distributions of the light stemmer for the available languages:
$ make dist