Phonetic transcription

Spoken language corpora can also be transcribed using a form of phonetic transcription. Few publicly available phonetically transcribed corpora exist at the time of writing. This is possibly because phonetic transcription is a form of annotation that has to be carried out by humans rather than computers, and those humans must be highly skilled in the perception and transcription of speech sounds. Phonetic transcription is therefore a very time-consuming task.

Another problem is that phonetic transcription works on the assumption that the speech signal can be divided into single, clearly demarcated "sounds". In fact, these "sounds" do not have such clear boundaries, so what phonetic transcription treats as the same sound may differ according to context.

Nevertheless, phonetically transcribed corpora are extremely useful to the linguist who lacks the technological tools and expertise for the laboratory analysis of recorded speech. One example is the MARSEC corpus, which is derived from the Lancaster/IBM Spoken English Corpus and has been reworked by the Universities of Lancaster and Leeds. The MARSEC corpus will include a phonetic transcription.

