The Annotation Scheme for Hindi
The annotation scheme for Hindi is modelled on Botley’s (2000) scheme for English demonstrative anaphors. A short description of Botley’s scheme is given here to make the annotation scheme for Hindi demonstrative anaphors easier to follow. Botley analysed his three data sets using a set of linguistic labels, typically termed ‘tags’ in the corpus linguistics literature.
A two-step process was followed in Botley’s study:
In Botley’s (2000) study, the demonstrative pronouns are understood in terms of an unordered paradigmatic set of five distinctive features:
R = Recoverability of antecedent
D = Direction of reference
P = Phoric Type
S = Syntactic function
A = Antecedent type
Each of the features listed in turn constitutes an unordered set consisting of values delineating different categories of demonstrative use. The features, along with their possible values, are schematically represented in a table (Botley 2000: 123), which is reproduced below:
Feature | Value 1 | Value 2 | Value 3 | Value 4 | Value 5
Recoverability of Antecedent | D (Directly Recoverable) | I (Indirectly Recoverable) | N (Non-recoverable) | 0 (not applicable, e.g. exophora) | None
Direction of Reference | A (anaphoric) | C (cataphoric) | 0 (not applicable, e.g. exophoric or deictic) | None | None
Phoric type | R (referential) | S (substitutional) | 0 (not applicable) | None | None
Syntactic Function | M (noun modifier) | H (noun head) | 0 (not applicable) | None | None
Antecedent Type | N (nominal antecedent) | P (propositional / factual antecedent) | C (Clausal antecedent) | J (Adjectival antecedent) | 0 (no antecedent)
Table 1 ‘Demonstrative features and their possible values’ (Botley’s Table 2.3)
Each demonstrative case is assigned a five-character alphanumeric code in which each character is a slot occupied by a feature from the first column in Table 1, with an appropriate value for that feature selected from the other columns.
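The five-slot code can be illustrated with a short Python sketch. It assumes that the slots follow the row order of Table 1 (recoverability, direction, phoric type, syntactic function, antecedent type), which the description above suggests but does not state outright. The names BOTLEY_SCHEME and decode_botley_tag, and the sample code "DARHN", are hypothetical and are not taken from Botley (2000).

```python
# Illustrative sketch only: slot order is ASSUMED to follow the rows of Table 1.
# Each entry maps a feature to its permitted one-character values.
BOTLEY_SCHEME = {
    "recoverability": {
        "D": "directly recoverable",
        "I": "indirectly recoverable",
        "N": "non-recoverable",
        "0": "not applicable",
    },
    "direction": {"A": "anaphoric", "C": "cataphoric", "0": "not applicable"},
    "phoric_type": {"R": "referential", "S": "substitutional", "0": "not applicable"},
    "syntactic_function": {"M": "noun modifier", "H": "noun head", "0": "not applicable"},
    "antecedent_type": {
        "N": "nominal",
        "P": "propositional/factual",
        "C": "clausal",
        "J": "adjectival",
        "0": "no antecedent",
    },
}


def decode_botley_tag(tag):
    """Expand a five-character code into a feature -> value description mapping."""
    if len(tag) != len(BOTLEY_SCHEME):
        raise ValueError(f"expected {len(BOTLEY_SCHEME)} characters, got {len(tag)}")
    decoded = {}
    # Pair each slot (in assumed order) with the character that fills it.
    for (feature, values), char in zip(BOTLEY_SCHEME.items(), tag):
        if char not in values:
            raise ValueError(f"{char!r} is not a valid value for {feature}")
        decoded[feature] = values[char]
    return decoded


# Hypothetical example: a directly recoverable, anaphoric, referential
# demonstrative used as a noun head with a nominal antecedent.
print(decode_botley_tag("DARHN"))
```

Keeping the scheme in a single mapping means that the permitted values for each slot are checked in one place, so a mistyped code is rejected rather than silently counted.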
The demonstrative pronouns in Hindi are understood in terms of an unordered set of seven distinctive features:
DM = Distance-marking (proximal as opposed to distal deixis).
N = Nature of deixis: use of the deictic element as an isolated pronominal or as a demonstrative.
R = Recoverability of antecedent
D = Direction of reference
P = Phoric Type
S = Syntactic function
A = Antecedent type
Each of the features listed in turn constitutes an unordered set associated with a set of values. The features, along with their possible values, are schematically represented in Table 2 below, with the first tag in linear order occurring topmost in the table, followed sequentially by the remaining tags:
Feature | Value 1 | Value 2 | Value 3 | Value 4 | Value 5
Distance Marking | P (Proximate) | D (Distal) | None | None | None
Nature of deixis | P (Pronoun) | D (Demonstrative) | None | None | None
Recoverability of Antecedent | D (Directly Recoverable) | In (Inferable) | T (Temporal anchor) | 0 (not applicable, e.g. exophora) | None
Direction of Reference | A (Anaphoric) | C (Cataphoric) | 0 (Not applicable, e.g. exophoric) | None | None
Phoric type | R (Referential) | 0 (Not applicable) | None | None | None
Syntactic Function | M (Noun modifier/specifier position of DP) | H (Noun head/head of DP) | 0 (Not applicable) | None | None
Antecedent Type | N (Nominal antecedent) | P (Direct speech or quotation) | C (Clausal antecedent) | J (Adjectival antecedent) | 0 (No antecedent)
Table 2 ‘Demonstrative features and their associated values: the final annotation scheme for Hindi’
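The Hindi scheme in Table 2 can be sketched in Python in the same spirit. Because the value In (Inferable) is two characters long, a fixed-width string would be ambiguous, so this sketch assumes that a tag is written as seven hyphen-separated values in the order of the table rows. The hyphen delimiter, the names HINDI_SCHEME and parse_hindi_tag, and the sample tag are all hypothetical; the source does not specify how the tags were written out.

```python
# Illustrative sketch only. ASSUMPTIONS: slots follow the row order of Table 2,
# and values are joined with a hyphen (the delimiter is not given in the source).
HINDI_SCHEME = [
    ("distance_marking", {"P": "proximate", "D": "distal"}),
    ("nature_of_deixis", {"P": "pronoun", "D": "demonstrative"}),
    ("recoverability", {
        "D": "directly recoverable",
        "In": "inferable",
        "T": "temporal anchor",
        "0": "not applicable",
    }),
    ("direction", {"A": "anaphoric", "C": "cataphoric", "0": "not applicable"}),
    ("phoric_type", {"R": "referential", "0": "not applicable"}),
    ("syntactic_function", {"M": "noun modifier", "H": "noun head", "0": "not applicable"}),
    ("antecedent_type", {
        "N": "nominal",
        "P": "direct speech or quotation",
        "C": "clausal",
        "J": "adjectival",
        "0": "no antecedent",
    }),
]


def parse_hindi_tag(tag, sep="-"):
    """Split a delimited tag and check every slot against the scheme."""
    parts = tag.split(sep)
    if len(parts) != len(HINDI_SCHEME):
        raise ValueError(f"expected {len(HINDI_SCHEME)} values, got {len(parts)}")
    parsed = {}
    for (feature, values), part in zip(HINDI_SCHEME, parts):
        if part not in values:
            raise ValueError(f"{part!r} is not a valid value for {feature}")
        parsed[feature] = values[part]
    return parsed


# Hypothetical example: a distal demonstrative in modifier position whose
# clausal antecedent is inferred from the preceding text.
print(parse_hindi_tag("D-D-In-A-R-M-C"))
```

A check of this kind also makes the differences from Botley's scheme explicit: a value such as I (indirectly recoverable) or S (substitutional) is rejected, since neither belongs to the Hindi value sets.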
The first feature encountered in the (unordered) list of features for Hindi is distance marking. This feature refers to the binary distinction of proximal as opposed to distal deixis.
The second feature relates to the nature of deixis. This feature encodes the pronominal as opposed to the purely demonstrative or deictic use of the anaphoric element.
The third feature in the (unordered) list of features for Hindi is recoverability of antecedent. This feature refers to the relationship between a given demonstrative expression and its antecedent or referent. Four values are posited for it. The first, denoted by D, covers directly recoverable antecedents. The second, inferable antecedents (In), has been introduced in the present study to describe adequately the many Hindi demonstratives whose antecedents are inferred from an extended unit of text. These forms are widely attested in the Hindi corpus. They are similar to the category of indirectly recoverable antecedents in Botley (2000: 125-126), but differ in some crucial respects. The third value, temporal anchor (T), covers anaphors whose antecedents are not overtly present in the text (as with inferable antecedents) but which are anaphoric on a temporal anchorage point. A primary function of demonstrative anaphors with temporal anchor antecedents is to maintain chronological order in the text. The value 0 is assigned where reference is exophoric, i.e., to a fact or an entity, for instance a date, which is not present explicitly or implicitly in the text of the newspaper article.
The fourth feature encodes the direction of reference: anaphoric (backward-referring) or cataphoric (forward-referring). Cases which involve irrecoverable antecedents are by default assigned the value 0 for this feature.
The fifth feature is termed phoric type. In the Hindi case, this feature involves a binary distinction between referential and non-referential. The category of ‘substitutional’ phoric type (identified in English by Botley) does not appear to be relevant to Hindi. No case involving a deictic element valued 0 for phoric type was encountered in the Hindi corpus under investigation.
This brings us to the sixth feature, syntactic function, which distinguishes modifier from head. Demonstratives may differ in syntactic status, and the annotation scheme captures this. The full noun phrase (NP) or determiner phrase (DP) is taken into account, and the position of the demonstrative form within it is then studied. For the purposes of this study, the complex internal syntactic structure of the DP is ignored and a basic distinction is made between the specifier or modifier position and the head position.
This brings us to the final (seventh) feature, antecedent type. The principal values supported by the annotation scheme for Hindi under this feature are nominal, clausal, and direct speech proposition, together with the value 0 for extra-textual and non-overt antecedents. This feature helps to bring to the fore the principal kinds of antecedents involved in anaphoric usage.
For further details on the Hindi annotation scheme, and the results obtained from applying statistical tests on the annotated Hindi corpus, I refer the reader to Sinha (2003).
Botley, 2000. Corpora and Discourse anaphora: using corpus evidence to test theoretical claims. Ph.D. thesis, Lancaster University.
Sinha, 2003. Demonstrative anaphors in Hindi newspaper reportage: a corpus-based study. MA dissertation, Lancaster University.