Corpora in Speech Research
A spoken corpus is important because of the following useful features:
- It provides a broad sample of speech, extending over a wide selection of variables such as:
  - speaker gender
  - speaker age
  - speaker class
  - genre (e.g. newsreading, poetry, legal proceedings, etc.)
This allows generalisations to be made about spoken language as the corpus is as wide and as representative as possible. It also allows for variations within a given spoken language to be studied.
- It provides a sample of naturalistic speech rather than speech elicited under artificial conditions. The findings from the corpus are therefore more likely to reflect language as it is spoken in "real life", since the data is less likely to be subject to production monitoring by the speaker (such as trying to suppress a regional accent).
- Because the (transcribed) corpus has usually been enhanced with prosodic and other annotations, it is easier to carry out large-scale quantitative analyses than with fresh raw data. Where more than one type of annotation has been used, it is possible to study the interrelationships between, say, phonetic annotations and syntactic structure.
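As a rough illustration of how two annotation layers might be studied together, the following minimal Python sketch counts how often a prosodic boundary follows words of each part of speech. The token format, the tag names and the toy sentence are all assumptions made up for this example; they are not taken from any particular corpus or annotation scheme.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration only:
# (word, part-of-speech tag, does a prosodic boundary follow this word?)
tokens = [
    ("the", "DET", False),
    ("minister", "NOUN", True),
    ("said", "VERB", False),
    ("that", "CONJ", False),
    ("the", "DET", False),
    ("talks", "NOUN", False),
    ("had", "VERB", False),
    ("failed", "VERB", True),
]


def boundary_rate_by_pos(tokens):
    """Return, for each POS tag, the proportion of its tokens followed by a boundary."""
    totals = Counter()
    with_boundary = Counter()
    for _word, pos, boundary in tokens:
        totals[pos] += 1
        if boundary:
            with_boundary[pos] += 1
    return {pos: with_boundary[pos] / totals[pos] for pos in totals}


rates = boundary_rate_by_pos(tokens)
for pos, rate in sorted(rates.items()):
    print(f"{pos}: {rate:.2f}")
```

With a real annotated corpus the same kind of count could be run over many thousands of words, which is where the advantage over fresh raw data becomes apparent.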
Prosodic annotation of spoken corpora
Because much phonetic corpus annotation has been at the level of prosody, this has been the focus of most of the phonetic and phonological research in spoken corpora. This work can be divided roughly into three types:
- How do prosodic elements of speech relate to other linguistic levels?
- How does what is actually perceived and transcribed relate to the actual acoustic reality of speech?
- How does the typology of the text relate to the prosodic patterns in the corpus?
Prosodic annotation in spoken corpora is discussed in more detail in Corpus Linguistics, Chapter 4, pages 89-90.