Preliminary recommendations on Spoken Texts
2 Introduction
The document begins by sketching how transcription and representation differ between the two research communities concerned with the analysis of spoken texts: the corpus linguistics community and the speech research community. Whereas the former is, according to Llisterri, mainly concerned with
"acquir[ing] large amounts of data reflecting the natural use of language, [and] therefore emphasis is usually put on the naturalness and spontaneity of the recording, avoiding experimentally controlled situations […]" (p. 4),
the object of the latter is
"to obtain controlled speech data for basic research aimed at modelling and describing the articulatory and acoustic properties of speech, or, in the field of speech technology, to derive data for speech synthesis or to build up material for training and testing speech recognition, speaker recognition/verification or spoken language dialogue systems […]" (ibid.).
The differences are summarised in the following table (p. 5):
| | Corpus linguistics | Speech research |
|---|---|---|
| Materials | Unprepared, unelicited speech | Controlled, elicited speech |
| Scope | Discourse, dialogue | Utterance |
| Recordings | Natural environment | Controlled environment |
| Transcription | Orthographic enriched (transcription) | Phonetic and orthographic aligned with the speech signal |
| Orientation towards | Symbolic, categorical representation | Speech symbol, temporal representation |