Discoursal and text linguistic annotation

Annotation of language at the levels of text and discourse is among the least frequently encountered kinds of corpus annotation. Such annotations are nevertheless occasionally applied.

Discourse tags

Stenström (1984) annotated the London-Lund spoken corpus with 16 "discourse tags". They included categories such as:

Despite their potential role in the analysis of discourse, annotations of this kind have never become widely used, possibly because the linguistic categories are context-dependent and identifying them in texts gives rise to more dispute than other forms of linguistic annotation.

Anaphoric annotation

Cohesion is the vehicle by which elements in a text are linked together, through the use of pronouns, repetition, substitution and other devices. Halliday and Hasan's "Cohesion in English" (1976) was considered a turning point in linguistics, as it was the most influential account of cohesion. Anaphoric annotation is the marking of pronoun reference. Our pronoun system can only be realised and understood by reference to large amounts of empirical data, in other words, corpora.

Anaphoric annotation can only be carried out by human analysts, since one of its aims is to provide the data with which computer programs can be trained to carry out the task. Only a few corpora have been anaphorically annotated. One of these is the Lancaster/IBM anaphoric treebank, an example of which is given below:

A039 1 v (1 [N Local_JJ atheists_NN2 N] 1) [V want_VV0 (2 [N the_AT (9 Charlotte_N1 9) Police_NN2 Department_NNJ N] 2) [Ti to_TO get_VV0 rid_VVN of_IO [N 3 <REF=2 its_APP$ chaplain 3) ,_, [N {{3 the_AT Rev._NNSB1 Dennis_NP1 Whitaker_NP1 3} ,_, 38_MC N]N]Ti]V] ._.

The above text has been part-of-speech tagged and skeleton parsed, as well as anaphorically annotated. The following codes explain the annotation:
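As a rough illustration of how the layers in the example line could be separated mechanically, here is a minimal Python sketch. It is not the treebank's own tooling. The function names are hypothetical, and the sketch assumes two things inferred from the example alone: that every part-of-speech tagged token has the form word_TAG, and that a marker such as <REF=2 applies to the next tagged word and points back to the noun phrase numbered 2.

```python
import re

# The example line, copied exactly as printed above.
SAMPLE = (
    "A039 1 v (1 [N Local_JJ atheists_NN2 N] 1) [V want_VV0 "
    "(2 [N the_AT (9 Charlotte_N1 9) Police_NN2 Department_NNJ N] 2) "
    "[Ti to_TO get_VV0 rid_VVN of_IO [N 3 <REF=2 its_APP$ chaplain 3) ,_, "
    "[N {{3 the_AT Rev._NNSB1 Dennis_NP1 Whitaker_NP1 3} ,_, 38_MC N]N]Ti]V] ._."
)

# Assumption: an anaphoric reference marker looks like "<REF=n".
REF_MARKER = re.compile(r"^<REF=(\d+)$")


def split_tagged(token):
    """Return (word, tag) if the token has the form word_TAG, else None."""
    word, sep, tag = token.rpartition("_")
    if sep and word and tag:
        return word, tag
    return None


def tagged_tokens(line):
    """Collect every (word, tag) pair; brackets and indices are skipped."""
    pairs = []
    for token in line.split():
        pair = split_tagged(token)
        if pair is not None:
            pairs.append(pair)
    return pairs


def anaphoric_links(line):
    """Pair each <REF=n marker with the next tagged word that follows it."""
    links = []
    pending = None  # index carried by the most recent unmatched REF marker
    for token in line.split():
        match = REF_MARKER.match(token)
        if match:
            pending = int(match.group(1))
            continue
        pair = split_tagged(token)
        if pending is not None and pair is not None:
            links.append((pair[0], pending))
            pending = None
    return links


if __name__ == "__main__":
    print(tagged_tokens(SAMPLE)[:3])
    print(anaphoric_links(SAMPLE))  # [('its', 2)]
```

On the example line, the only link found is between "its" and noun phrase 2. Splitting on whitespace is a deliberate simplification: it ignores the skeleton parse brackets rather than checking that they balance, so words printed without a tag (such as "chaplain" above) are simply skipped.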

