hlplab/SwehVd-corpus

The SwehVd corpus

This is the repository of the vowel database SwehVd, a phonetically annotated corpus of Swedish h-VOWEL-d word recordings. SwehVd was collected with the goal of characterizing the Central Swedish vowel space, specifically that of the Stockholm variety, within and across talkers. Stockholm Swedish is subsumed under Central Swedish, the regional standard variety of Swedish spoken in an area around and beyond Stockholm (eastern Svealand), including Mälardalssvenska, Sveamål, Uppsvenska, and Mellansvenska [see e.g., @bruce2009; @elert1994; @riad2014]. SwehVd targets the contemporary Central Swedish spoken in the larger Stockholm area. It covers the entire monophthong inventory of Central Swedish, including all nine long vowels (hid, hyd, hud, hed, häd, höd, had, håd, hod), eight short vowels (hidd, hydd, hudd, hedd, hädd, hödd, hadd, hådd, hodd), and four allophones (härd, härr, hörd, hörr). For each talker, SwehVd contains N=10 recordings of each of the 22 different hVd words, for a total of 220 recordings per talker.

The database contains recordings from 48 talkers (N = 24 female), for a total of 10,010 tokens. For each talker, it provides first to third formant (F1-F3) measurements at five time points across each vowel, together with vowel duration and mean F0 over the entire vowel.

The following talker information is available: age (at time of recording) and gender. All talkers were L1 talkers of Stockholm Swedish, born and raised in the larger Stockholm area. All talkers also spoke languages other than Swedish. All were proficient in English, reporting a proficiency of at least 4 on a scale from 1 (=beginner) to 7 (=fluent), as self-rated in a language background questionnaire. Other languages spoken within the population were French (N=16), German (N=17), Spanish (N=11), Norwegian (N=3), Japanese (N=2), Korean (N=2), Italian (N=2), Dutch (N=2), Turkish (N=1), Serbian (N=1), Danish (N=1), Finnish (N=1), Mandarin Chinese (N=1), and Persian (N=1). We do not provide individual-level language proficiency information in this database, since it can be considered sensitive information (in accordance with GDPR; Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) [2016] OJ L 119/1).
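As a quick sanity check on the corpus design, the counts stated above can be worked through in a few lines of Python. This is only an illustrative sketch: the variable names are ours and are not part of the repository, and the numbers are copied from this README rather than read from the data files.

```python
# Word lists copied from the corpus description in this README.
LONG = ["hid", "hyd", "hud", "hed", "häd", "höd", "had", "håd", "hod"]
SHORT = ["hidd", "hydd", "hudd", "hedd", "hädd", "hödd", "hadd", "hådd", "hodd"]
ALLOPHONES = ["härd", "härr", "hörd", "hörr"]

REPETITIONS_PER_WORD = 10  # N=10 recordings of each word per talker
N_TALKERS = 48             # talkers in the database
REPORTED_TOKENS = 10_010   # total token count stated in this README

words = LONG + SHORT + ALLOPHONES
per_talker = len(words) * REPETITIONS_PER_WORD  # recordings per talker by design
design_ceiling = per_talker * N_TALKERS         # upper bound if no token were missing

print(len(words), per_talker, design_ceiling, design_ceiling - REPORTED_TOKENS)
```

The design gives 22 words and 220 recordings per talker, so 48 talkers could yield at most 10,560 tokens. The reported total of 10,010 is 550 below that ceiling. This README does not say which tokens are absent, so do not assume that every talker contributes exactly 220 tokens when working with the data.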

For a more exhaustive description of the recording, processing, and annotation procedures, please see Persson & Jaeger (2023).

This repository contains sound files, TextGrids, and a Praat script from which raw phonetic measurements were obtained, together with a bib file that lists papers that have used the SwehVd corpus, and two Rmd files: a release notes file and a file that reads in manually corrected measurement errors. We encourage all users to initialize Git LFS (see https://git-lfs.github.com/), given that the repository contains large files (sound files in .wav format).
