CLDF dataset derived from Grollemund et al.'s "Bantu expansion shows habitat alters the route and pace of human dispersals" from 2015
If you use these data please cite
- the original source
Grollemund, Rebecca, Branford, Simon, Bostoen, Koen, Meade, Andrew, Venditti, Chris, & Pagel, Mark (2015) Bantu expansion shows habitat alters the route and pace of human dispersals. Proc Natl Acad Sci USA. doi:10.1073/pnas.1503793112.
- the derived dataset using the DOI of the particular released version you were using
This dataset is licensed under a CC BY-NC 4.0 license (https://creativecommons.org/licenses/by-nc/4.0/)
Available online at https://doi.org/10.1073/pnas.1503793112
Conceptlists in Concepticon:
From Harald Hammarström:
- I don't know what Os and Dj stand for in B71a_Teke_Os, B71a_Teke_Dj but if they are B71A the id should be Tegue of the Alima/Gabon [teg].
- Lega is ambiguous between Lega Shabunda and Lega Mwenda, I have reason to suspect this is Lega Shabunda because that's the one Stappert worked on and I bet they got their wordlist from there.
- The D20B_Vamba vocabulary (it's been discussed a couple of times in the literature) probably did not come from a native speaker, but purports to be Amba [rwm] so I've assigned it that id.
- The D313_Mbuttu_1919 is often taken to be a variant of Vanuma, but that's based on impressionistic comparison, their paper puts it closer to Bodo, so I've id:d it as Bodo.
- Nyiha and Emakhua are not specified for region, but those bare names usually mean Central Nyiha and Central Emakhua, so I've id:d them so.
- Based on Philippson's other publications (the data is from him), JE32_Luyia could only be Masaaba, Isuxa, Logooli, or Saamia. Saamia [lsm] is the largest one and also the one the missionaries tried to use for standardisation so one might as well guess JE32_Luyia is Saamia [lsm].
The orthography profile is only an approximation. In quite a few cases the transcription is ambiguous and we could not decide what the pronunciation is. We left these forms as they are, and kindly ask users to check them before running any kind of analysis in which the phonetic transcriptions of this dataset are important.
- Varieties: 424 (linked to 333 different Glottocodes)
- Concepts: 100 (linked to 100 different Concepticon concept sets)
- Lexemes: 37,730
- Sources: 217
- Synonymy: 1.00
- Cognacy: 37,712 cognates in 3,853 cognate sets (1,794 singletons)
- Cognate Diversity: 0.10
- Invalid lexemes: 0
- Tokens: 183,363
- Segments: 606 (0 BIPA errors, 0 CLTS sound class errors, 600 CLTS modified)
- Inventory size (avg): 40.85
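
The cognacy figures above (cognates, cognate sets, singletons) can in principle be recomputed from the cognate judgements. The following is a minimal sketch, not the procedure used to generate this README. It assumes that the judgements live in a file named `cldf/cognates.csv` and that the cognate set identifier is stored in a column named `Cognateset_ID`; both names are assumptions and should be checked against `cldf/cldf-metadata.json`. The function names are hypothetical.

```python
import csv
from collections import Counter


def cognate_summary(rows):
    """Count cognates, cognate sets and singleton sets.

    `rows` is any iterable of dicts with a 'Cognateset_ID' key
    (column name assumed, see the note above). Rows with an empty
    identifier are skipped.
    """
    sizes = Counter(
        row["Cognateset_ID"] for row in rows if row.get("Cognateset_ID")
    )
    return {
        "cognates": sum(sizes.values()),
        "cognate_sets": len(sizes),
        "singletons": sum(1 for n in sizes.values() if n == 1),
    }


def summarise_file(path="cldf/cognates.csv"):
    """Apply cognate_summary to a CSV file (the default path is an assumption)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return cognate_summary(csv.DictReader(handle))
```

Calling `summarise_file()` from the repository root would then return a dictionary whose values can be compared with the numbers listed above.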
| Name | GitHub user | Description | Role |
|---|---|---|---|
| Robert Forkel | @xrotwang | CLDF conversion | Editor |
| Tiago Tresoldi | @tresoldi | CLDF conversion | Editor |
| Johann-Mattis List | @lingulist | orthography profile | Editor |
| Rebecca Grollemund | | data collection | Distributor |
| Mark Pagel | | data analysis | Distributor |
The following CLDF datasets are available in cldf:
- CLDF Wordlist at cldf/cldf-metadata.json
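
As a quick way to see which tables the wordlist contains, the metadata file can be inspected with the Python standard library alone. This is a minimal sketch: it relies only on the general layout of CSVW/CLDF metadata (a top-level `tables` list whose entries carry a `url` and, for CLDF components, a `dc:conformsTo` property). The function names are hypothetical, and the default path assumes the script is run from the repository root. For real analyses, a dedicated CLDF library such as `pycldf` is likely the more robust choice.

```python
import json
from pathlib import Path


def load_metadata(path="cldf/cldf-metadata.json"):
    """Read the CLDF metadata file as a plain dict (path is an assumption)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def list_tables(metadata):
    """Return (url, component) pairs for every table described in the metadata.

    `component` is the value of 'dc:conformsTo', or None if a table
    does not declare one.
    """
    return [
        (table.get("url"), table.get("dc:conformsTo"))
        for table in metadata.get("tables", [])
    ]
```

For example, `list_tables(load_metadata())` would give one pair per CSV file in `cldf`, which shows where forms, languages, concepts and cognate judgements are stored.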