About the Switchboard corpus
The Switchboard corpus is conversational telephone speech collected as 2-channel data sampled at 8 kHz. We use only the Switchboard-1 Phase 1 training data. We believe the catalog number LDC97S62 (Switchboard-1 Release 2) corresponds to what we have. We also use the Mississippi State transcriptions, which are downloaded separately.
About the Fisher-English corpus
The Fisher-English corpus is conversational telephone speech collected as 2-channel data sampled at 8 kHz. The data is similar to Switchboard, but most of the transcription was done in a "faster", lower-quality way.
Fisher comes in two parts, and the text and speech have separate LDC numbers. This recipe uses both parts. The LDC numbers are:
The speech: **LDC2004S13**, **LDC2005S13**
The text: **LDC2004T19**, **LDC2005T19**
About the eval2000 corpus
We are using the eval2000 (a.k.a. hub5'00) evaluation data. The LDC numbers are:
The speech: **LDC2002S09**
The text: **LDC2002T43**
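
As a quick reference, the catalog numbers above can be collected in one place and checked against a local data directory. The following is a minimal sketch only: the `REQUIRED_CORPORA` mapping, the `missing_corpora` helper, and the one-directory-per-catalog-number layout are assumptions made for illustration and are not part of this recipe.

```python
from pathlib import Path

# LDC catalog numbers as listed in this README.
# The Switchboard entry is the number the README believes matches the data.
REQUIRED_CORPORA = {
    "switchboard": ["LDC97S62"],
    "fisher_speech": ["LDC2004S13", "LDC2005S13"],
    "fisher_text": ["LDC2004T19", "LDC2005T19"],
    "eval2000_speech": ["LDC2002S09"],
    "eval2000_text": ["LDC2002T43"],
}


def missing_corpora(root):
    """Return the catalog numbers that have no directory under `root`.

    Hypothetical layout: one subdirectory per catalog number,
    for example <root>/LDC97S62. Adjust to match your own storage.
    """
    root = Path(root)
    return [
        catalog
        for catalogs in REQUIRED_CORPORA.values()
        for catalog in catalogs
        if not (root / catalog).is_dir()
    ]
```

Calling `missing_corpora("/path/to/ldc")` returns an empty list when every catalog directory is present under that (hypothetical) root, and otherwise lists the numbers still to be obtained.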
Coming soon.