The Annotation Scheme

The Annotation Scheme for Hindi

The annotation scheme for Hindi was designed on the basis of Botley’s (2000) scheme for English demonstrative anaphors. A short description of Botley’s scheme is provided here to facilitate an understanding of the annotation scheme for Hindi demonstrative anaphors. Botley’s three data sets were analysed using a set of linguistic labels, typically termed ‘tags’ in the corpus linguistics literature.

A two-step process was followed in Botley’s study:

(1) Each demonstrative case in the three corpora was identified and classified, and distribution statistics were obtained.
(2) The demonstrative types identified according to (1) were represented in terms of distinctive features found to be present or absent in them.

In Botley’s (2000) study, demonstrative pronouns are understood in terms of an unordered paradigmatic set of five distinctive features:

R = Recoverability of antecedent

D = Direction of reference

P = Phoric Type

S = Syntactic function

A = Antecedent type

Each of the features listed in turn constitutes an unordered set consisting of values delineating different categories of demonstrative use. The features, along with their possible values, are schematically represented in a table (Botley 2000: 123), which is reproduced below:

Feature	Value 1	Value 2	Value 3	Value 4	Value 5
Recoverability of Antecedent	(Directly recoverable)	(Indirectly recoverable)	(Non-recoverable)	(Not applicable, e.g. exophora)	None
Direction of Reference	(Anaphoric)	(Cataphoric)	(Not applicable, e.g. exophoric or deictic)	None	None
Phoric Type	(Referential)	(Substitutional)	(Not applicable)	None	None
Syntactic Function	(Noun modifier)	(Noun head)	(Not applicable)	None	None
Antecedent Type	(Nominal antecedent)	(Propositional / factual antecedent)	(Clausal antecedent)	(Adjectival antecedent)	(No antecedent)

Table 1 ‘Demonstrative features and their possible values’ (Botley’s Table 2.3)

Each demonstrative case is assigned a five-character alphanumeric code in which each character is a slot occupied by a feature from the first column in Table 1, with an appropriate value for that feature selected from the other columns.

The demonstrative pronouns in Hindi are understood in terms of an unordered set of seven distinctive features:

DM = Distance-marking (proximal as opposed to distal deixis).

N = Nature of deixis: use of the deictic element as an isolated pronominal or as a demonstrative.

R = Recoverability of antecedent

D = Direction of reference

P = Phoric Type

S = Syntactic function

A = Antecedent type

Each of the features listed in turn constitutes an unordered set associated with a set of values. The features, along with their possible values, are schematically represented in Table 2 below, with the first tag in linear order occurring topmost in the table, followed sequentially by the remaining tags:

Feature	Value 1	Value 2	Value 3	Value 4	Value 5
Distance Marking	P (Proximate)	D (Distal)	None	None	None
Nature of deixis	P (Pronoun)	D (Demonstrative)	None	None	None
Recoverability of Antecedent	D (Directly Recoverable)	In (Inferable)	T (Temporal anchor)	0 (Not applicable, e.g. exophora)	None
Direction of Reference	A (Anaphoric)	C (Cataphoric)	0 (Not applicable, e.g. exophoric)	None	None
Phoric type	R (Referential)	0 (Not applicable)	None	None	None
Syntactic Function	M (Noun modifier/specifier position of DP)	H (Noun head/head of DP)	0 (Not applicable)	None	None
Antecedent Type	N (Nominal antecedent)	P (Direct speech or quotation)	C (Clausal antecedent)	J (Adjectival antecedent)	0 (No antecedent)

Table 2 ‘Demonstrative features and their associated values: the final annotation scheme for Hindi’

Each demonstrative case in the sample Hindi article is assigned a seven-character alphanumeric code in which each character is a slot occupied by a feature from the first column in Table 2, with an appropriate value for that feature selected from the other columns. Where a particular value for a feature is not applicable for a particular case in the data, a zero value may be applied to that slot in the tag.
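The slot-and-value structure of Table 2 can be illustrated with a minimal Python sketch. It is an illustration only, not the tooling used in the study. The names SCHEME and decode_tag are hypothetical, and the example tag is constructed for demonstration rather than drawn from the corpus. Because the value In occupies two letters, the sketch treats a tag as a sequence of seven slot values rather than as a string of seven single characters; this is an assumption about representation.

```python
# Slot order and permitted values follow Table 2.
# SCHEME and decode_tag are illustrative names, not part of the original scheme.
SCHEME = [
    ("Distance marking", {"P": "Proximate", "D": "Distal"}),
    ("Nature of deixis", {"P": "Pronoun", "D": "Demonstrative"}),
    ("Recoverability of antecedent", {
        "D": "Directly recoverable",
        "In": "Inferable",
        "T": "Temporal anchor",
        "0": "Not applicable",
    }),
    ("Direction of reference", {
        "A": "Anaphoric",
        "C": "Cataphoric",
        "0": "Not applicable",
    }),
    ("Phoric type", {"R": "Referential", "0": "Not applicable"}),
    ("Syntactic function", {
        "M": "Noun modifier",
        "H": "Noun head",
        "0": "Not applicable",
    }),
    ("Antecedent type", {
        "N": "Nominal",
        "P": "Direct speech or quotation",
        "C": "Clausal",
        "J": "Adjectival",
        "0": "No antecedent",
    }),
]


def decode_tag(slots):
    """Map a sequence of seven slot values to a {feature: description} dict.

    Raises ValueError if the number of slots is wrong or if a value is not
    listed for its feature in Table 2.
    """
    if len(slots) != len(SCHEME):
        raise ValueError(f"expected {len(SCHEME)} slots, got {len(slots)}")
    decoded = {}
    for (feature, values), code in zip(SCHEME, slots):
        if code not in values:
            raise ValueError(f"{code!r} is not a valid value for {feature}")
        decoded[feature] = values[code]
    return decoded


# Constructed example: a proximate demonstrative in modifier position with a
# directly recoverable, anaphoric, referential, nominal antecedent.
example = decode_tag(["P", "D", "D", "A", "R", "M", "N"])
```

Because the letters P and D recur across features, a value can only be interpreted by its position in the tag, which is why the sketch keeps the slots in a fixed sequence.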

The first feature encountered in the (unordered) list of features for Hindi is distance marking. This feature refers to the binary distinction of proximal as opposed to distal deixis.

The second feature relates to nature of deixis. This feature encodes the pronominal as opposed to purely demonstrative or deictic use of the anaphoric element.

The third feature encountered in the (unordered) list of features for Hindi is recoverability of antecedent. This feature refers to the relationship between a given demonstrative expression and its antecedent or referent. Four values have been posited for this feature. The first value, denoted by D, concerns directly recoverable antecedents. The second value, inferable antecedents (In), has been introduced in the present study to offer an adequate description of the antecedents of Hindi demonstratives, a large number of which are inferred from an extended unit of text. These forms are widely attested in the Hindi corpus. They are similar to the category of indirectly recoverable antecedents in Botley (2000: 125-126), but differ in some crucial respects. The third value, temporal anchor (T), covers anaphors whose antecedents are not overtly present in the text (as in the inferable antecedent cases) but which are anaphoric on a temporal anchorage point. A primary function of demonstrative anaphors with temporal anchor antecedents is to maintain chronological order in the text. The value 0 is assigned to antecedents which are exophoric, i.e., they refer to a fact or an entity, for instance a date, which is not present explicitly or implicitly in the text of the newspaper article.

The fourth feature encodes the direction of reference. Its values distinguish anaphoric (backward-referring) from cataphoric (forward-referring) elements. Cases which involve irrecoverable antecedents are by default assigned the value 0 for this feature.
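The default relating recoverability and direction can be expressed as a simple consistency check on a tag. The sketch below is a hypothetical illustration and assumes that ‘irrecoverable’ corresponds to the value 0 in the recoverability slot; the function name and slot constants are not part of the original scheme.

```python
# Positions of the two relevant slots in the seven-slot tag (0-based),
# following the feature order DM, N, R, D, P, S, A.
RECOVERABILITY_SLOT = 2
DIRECTION_SLOT = 3


def direction_is_consistent(slots):
    """Return True unless recoverability is '0' while direction is not '0'.

    Assumption: an irrecoverable antecedent is one whose recoverability
    slot holds the value '0'.
    """
    if slots[RECOVERABILITY_SLOT] == "0":
        return slots[DIRECTION_SLOT] == "0"
    return True
```

A check of this kind flags tags in which a direction has been assigned although no antecedent is recoverable from the text.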

The fifth feature is termed phoric type. In the Hindi case, this feature involves a binary distinction between referential and non-referential. The category of ‘substitutional’ phoric type (identified in English by Botley) does not appear to be relevant to Hindi. No case of a deictic element valued 0 for phoric type was encountered in the Hindi corpus under investigation.

This brings us to the sixth feature, syntactic function, which distinguishes modifier from head. Demonstratives may differ in syntactic status, and this is captured in the annotation scheme. The full noun phrase (NP) or determiner phrase (DP) is taken into account, and the position of the demonstrative form within it is then examined. For the purposes of this study, the complex internal syntactic structure of the DP is ignored and a basic distinction is made between the specifier or modifier position and the head position.

This brings us to the final (seventh) feature, antecedent type. The principal values under this feature for Hindi are nominal, clausal, and direct speech proposition, together with the value 0 for extra-textual and non-overt antecedents. This feature brings to the fore the principal kinds of antecedents involved in anaphoric usage.

References