If you use these data, please cite this dataset using the DOI of the particular released version you used.
If you use UraLex 2.0, please also cite the following papers, which introduce the dataset:
De Heer, Mervi; Blokland, Rogier; Dunn, Michael; Vesakoski, Outi. (submitted manuscript). “Loanwords in basic vocabulary as an indicator of borrowing profiles.”
and
Syrjänen, Kaj; Maurits, Luke; Leino, Unni; Honkola, Terhi; Rota, Jadranka & Vesakoski, Outi. (submitted manuscript). “Crouching TIGER, Hidden Structure: Exploring the nature of linguistic data using TIGER values.”
The dataset is described in uralex_documentation.md.
This dataset is licensed under a CC-BY-4.0 license.
- Varieties: 27 (linked to 26 different Glottocodes)
- Concepts: 313 (linked to 313 different Concepticon concept sets)
- Lexemes: 10,231
- Sources: 42
- Synonymy: 1.22
- Cognacy: 9,751 cognates in 3,792 cognate sets (2,349 singletons)
- Cognate Diversity: 0.35
| Name | GitHub user | Description | Role |
|---|---|---|---|
| Mervi de Heer | @MervideHeer | | author, DataCurator, DataCollector |
| Mikko Heikkilä | | | author |
| Kaj Syrjänen | @kasyrj | data collection | Author, DataCurator |
| Jyri Lehtinen | | | author, DataCollector |
| Outi Vesakoski | | | author |
| Toni Suutari | | | author |
| Michael Dunn | @evoling | | author |
| Urho Määttä | | | author |
| Unni-Päivä Leino | | | author |
| Luke Maurits | @lmaurits | helped with sources | Other |
| Robert Forkel | @xrotwang | patron, code | DataCurator |
The following CLDF datasets are available in cldf:
- CLDF Wordlist at cldf/cldf-metadata.json
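
As a quick way to see what the Wordlist contains, the metadata file can be inspected with the Python standard library alone. This is a minimal sketch, not part of the dataset tooling: it assumes only that `cldf/cldf-metadata.json` follows the usual CLDF/CSVW layout, with a top-level `tables` list whose entries carry `url` and `dc:conformsTo` keys. The helper name `list_cldf_tables` is hypothetical.

```python
import json
from pathlib import Path


def list_cldf_tables(metadata_path):
    """Return (url, conformsTo) pairs for each table declared in a CLDF metadata file.

    Hypothetical helper: assumes the CSVW-style layout used by CLDF,
    i.e. a top-level "tables" list with "url" and "dc:conformsTo" keys.
    """
    metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    tables = []
    for table in metadata.get("tables", []):
        # "dc:conformsTo" may be missing for non-standard tables; None is returned then.
        tables.append((table.get("url"), table.get("dc:conformsTo")))
    return tables


if __name__ == "__main__":
    # Path taken from the list above; run from the repository root.
    path = Path("cldf/cldf-metadata.json")
    if path.exists():
        for url, component in list_cldf_tables(path):
            print(url, component)
```

Each `url` is a CSV file located next to the metadata file, so the listed tables can then be read with `csv.DictReader` or with a dedicated CLDF library.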