Headers and Text

Elements

The TEI annotation of the header and the text is based on two basic devices: tags and entity references. Texts are assumed to be made up of elements. An element is any unit of text, e.g. a word, sentence, paragraph or chapter. Elements are marked in TEI by using SGML tags (which are different from the "tags" used in linguistic annotation, such as "VVZ"). These SGML tags are delimited by a balanced pair of angle brackets (< >).

Start and end tags

A start tag at the beginning of an element is represented by a balanced pair of angle brackets containing an annotation string, thus: <...>. An end tag at the end of the element has the same form, except that a slash character precedes the annotation string, thus: </...>. To give an example, a frequently used TEI tag is the one that indicates a paragraph. This would be represented as follows:

<p>
The actual textual material goes here.
</p>
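
The pairing of start and end tags can be checked mechanically. The following is a minimal sketch in Python, not part of TEI itself: the names TAG_PATTERN and tags_balanced are hypothetical, and the check assumes simple tags with no attributes and no omitted end tags (full SGML permits both).

```python
import re

# Matches a start tag such as <p> or an end tag such as </p>.
# Simplifying assumption: tag names only, no attributes.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)>")


def tags_balanced(text):
    """Return True if every start tag is closed by a matching end tag."""
    open_elements = []  # names of elements that are open but not yet closed
    for match in TAG_PATTERN.finditer(text):
        is_end_tag = match.group(1) == "/"
        name = match.group(2)
        if not is_end_tag:
            open_elements.append(name)
        elif not open_elements or open_elements.pop() != name:
            # An end tag with no matching start tag, or wrongly nested.
            return False
    return not open_elements


sample = "<p>\nThe actual textual material goes here.\n</p>"
print(tags_balanced(sample))
```

A text in which a start tag is never closed, such as "<p>text", would be reported as unbalanced by this sketch.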

Entity references

In contrast to tags, entity references are delimited by the characters & and ;. Put simply, an entity reference is a shorthand way of encoding detailed information within a text. The shorthand form refers outwards to a feature system declaration (FSD) in the document header, which contains all the relevant information in full TEI tag-based markup.

For example, one shorthand code which is used in part-of-speech annotation is "vvd", in which the first v signifies that the word is a verb, the second v signifies that it is a lexical verb, and the d signifies that it is a past tense form. In the example below, this code is used in the form of an entity reference:

polished&vvd;

The entity reference here (&vvd;) might refer to a feature system declaration such as the following, which presents the information contained in the tag "vvd" in full, in terms of its component features:

<fs id=vvd type=word-form>
	<f name=verb-class><sym value=verb>
	<f name=base><sym value=lexical>
	<f name=verb-form><sym value=past>
</fs>
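
To illustrate how a shorthand code stands in for a fuller description, the lookup can be sketched in Python. This is only an illustration under stated assumptions: the dictionary FEATURE_DECLARATIONS is a hypothetical stand-in for the declarations held in the header, and the function name expand_entity_references is likewise invented for this example. A real SGML parser resolves entities through its own declarations rather than a table like this.

```python
import re

# Hypothetical stand-in for the declarations in the document header.
# The feature names and values are copied from the "vvd" example above.
FEATURE_DECLARATIONS = {
    "vvd": {"verb-class": "verb", "base": "lexical", "verb-form": "past"},
}

# A word immediately followed by an entity reference, e.g. polished&vvd;
ENTITY_PATTERN = re.compile(r"(\w+)&(\w+);")


def expand_entity_references(text):
    """Return (word, code, features) for each annotated word in the text.

    features is None when the code has no entry in the lookup table.
    """
    return [
        (word, code, FEATURE_DECLARATIONS.get(code))
        for word, code in ENTITY_PATTERN.findall(text)
    ]


for word, code, features in expand_entity_references("polished&vvd;"):
    print(word, code, features)
```

The design point is that the text itself carries only the short code, while the full feature description is stored once and looked up when needed.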

In a pre-TEI corpus such as the Lancaster-IBM Spoken English Corpus, the same word plus annotation might look something like this:

polished_vvd

Here the tag is attached to the word with an underscore character, and the user could look up the meaning of the code "vvd" in a table listing all the codes used in annotating the corpus:

vvd: past tense form of lexical verb (e.g. looked)
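
Because the two notations differ only in how the code is attached to the word, converting one into the other is largely mechanical. The sketch below, in Python, rewrites word_code pairs as word&code; pairs. The names UNDERSCORE_TAG and underscore_to_entity are hypothetical, and the pattern assumes that codes are purely alphanumeric and that annotated words are separated by whitespace, which may not hold for every corpus.

```python
import re

# A word, an underscore, then an alphanumeric code that runs up to the
# next whitespace or the end of the text, e.g. polished_vvd
UNDERSCORE_TAG = re.compile(r"(\S+?)_([A-Za-z0-9]+)(?=\s|$)")


def underscore_to_entity(text):
    """Rewrite word_code annotation as word&code; entity references."""
    return UNDERSCORE_TAG.sub(r"\1&\2;", text)


print(underscore_to_entity("polished_vvd"))
```

Words that carry no underscore annotation are left unchanged by this sketch.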