Semantics

Two types of semantic annotation can be identified:
  1. The marking of semantic relationships between items in the text, for example the agents or patients of particular actions. This type of annotation had scarcely begun to gain wide acceptance at the time of writing, although some forms of parsing capture much of its import.
  2. The marking of semantic features of words in the text, essentially the annotation of word senses in one form or another. This has quite a long history, dating back to the 1960s.

There is no universal agreement about which semantic features ought to be annotated. In fact, much past annotation was motivated by social scientific theories of, for instance, social interaction. By contrast, Sedelow and Sedelow (1969) made use of Roget's Thesaurus, in which words are organised into general semantic categories.

The example below (Wilson, forthcoming) is intended to give the reader an idea of the types of categories used in semantic tagging:

And		00000000
the		00000000
soldiers	23231000
platted		21072000
a		00000000
crown		21110400
of		00000000
thorns		13010000
and		00000000
put		21072000
it		00000000
on		00000000
his		00000000
head		21030000
and		00000000
they		00000000
put		21072000
on		00000000
him		00000000
a		00000000
purple		31241100
robe		21110321

The numeric codes stand for:

00000000	Low content word (and, the, a, of, on, his, they, etc.)
13010000	Plant life in general
21030000	Body and body parts
21072000	Object-oriented physical activity (e.g. put)
21110321	Men's clothing: outer clothing
21110400	Headgear
23231000	War and conflict: general
31241100	Colour
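The word-and-code layout above is straightforward to process mechanically. As a minimal sketch, assuming one word and one code per line separated by whitespace, the legend can be held as a lookup table and used to turn codes back into readable labels. `LEGEND` and `decode` are hypothetical names chosen for illustration, not part of any tagging system described here.

```python
# Category labels copied from the legend above (code -> label).
LEGEND = {
    "00000000": "Low content word",
    "13010000": "Plant life in general",
    "21030000": "Body and body parts",
    "21072000": "Object-oriented physical activity",
    "21110321": "Men's clothing: outer clothing",
    "21110400": "Headgear",
    "23231000": "War and conflict: general",
    "31241100": "Colour",
}


def decode(tagged_text: str) -> list[tuple[str, str]]:
    """Turn 'word<whitespace>code' lines into (word, label) pairs."""
    pairs = []
    for line in tagged_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumes exactly one word and one code per line.
        word, code = line.split()
        # Codes missing from the legend are reported rather than guessed.
        pairs.append((word, LEGEND.get(code, f"unknown code {code}")))
    return pairs


sample = "crown\t21110400\nof\t00000000\nthorns\t13010000"
for word, label in decode(sample):
    print(f"{word}: {label}")
```

Reporting unknown codes explicitly, instead of silently dropping them, makes gaps between a tagged text and its legend easy to spot.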

The semantic categories are represented by 8-digit numbers. The scheme shown here is based on the one used by Schmidt (1993) and has a hierarchical structure: it is made up of three top-level categories, which are themselves subdivided, and so on.
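Because the codes are hierarchical, two tags can be compared by how many leading digits they share. The sketch below rests on assumptions the text does not state: that the first digit names the top-level category (the legend codes do begin with 1, 2 or 3), that each later digit is one further level of subdivision, and that trailing zeros are only padding. `path`, `top_level` and `shared_ancestor` are hypothetical helpers. If Schmidt's scheme groups its digits differently, for example in pairs, the comparison would need adjusting.

```python
def path(code: str) -> str:
    """Significant digits of a code.

    ASSUMPTION: trailing zeros only pad the code to 8 digits and mean
    'not subdivided further'.
    """
    return code.rstrip("0")


def top_level(code: str) -> str:
    # ASSUMPTION: the first digit names one of the three top-level categories.
    return code[0]


def shared_ancestor(a: str, b: str) -> str:
    """Longest common leading run of significant digits ('' if none)."""
    pa, pb = path(a), path(b)
    n = 0
    while n < min(len(pa), len(pb)) and pa[n] == pb[n]:
        n += 1
    return pa[:n]


# Men's outer clothing vs. headgear share a deep prefix.
print(shared_ancestor("21110321", "21110400"))  # 21110
# Headgear vs. body parts are related only at '21'.
print(shared_ancestor("21110400", "21030000"))  # 21
# Headgear vs. plant life fall under different top-level categories.
print(top_level("21110400"), top_level("13010000"))  # 2 1
```

A longer shared prefix means the two words sit closer together in the hierarchy, which is one way such a scheme can support grouping semantically related words.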