Corpora in Dialectology and Variation Studies

This section is concerned with geographical variation. Corpora have long been recognised as a valuable source for comparing language varieties as well as for describing those varieties themselves. The compilers of certain corpora have followed the sampling procedures of earlier corpora as closely as possible in order to maximise comparability. For example, the LOB corpus contains roughly the same genres and sample sizes as the Brown corpus and is sampled from the same year (1961). The Kolhapur Indian corpus is also broadly parallel to Brown and LOB, although its sampling year is 1978.

One of the earliest pieces of work using the LOB and Brown corpora in tandem was the production of a word frequency comparison of American and British written English. These corpora have also been used as the basis for studies of more complex aspects of language, such as the use of the subjunctive (Johansson and Norheim 1988).
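The kind of word frequency comparison described above can be illustrated with a minimal sketch. It is not the procedure used in the original Brown and LOB study: the tokeniser, the per-1,000-token normalisation, and the function names (tokenize, relative_frequencies, compare) are all assumptions made for illustration, and the two sample strings are invented stand-ins rather than corpus text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (letters, optional apostrophe part)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def relative_frequencies(tokens, per=1000):
    """Return each word's frequency normalised to `per` tokens."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: count * per / total for word, count in counts.items()}


def compare(text_a, text_b, per=1000):
    """Return (word, freq_a, freq_b) rows, largest absolute difference first."""
    freq_a = relative_frequencies(tokenize(text_a), per)
    freq_b = relative_frequencies(tokenize(text_b), per)
    words = set(freq_a) | set(freq_b)
    rows = [(w, freq_a.get(w, 0.0), freq_b.get(w, 0.0)) for w in words]
    # Sort by descending difference, then alphabetically to keep ties stable.
    return sorted(rows, key=lambda r: (-abs(r[1] - r[2]), r[0]))


# Invented toy samples (assumption): NOT taken from Brown or LOB.
american_sample = "The color of the truck matched the color of the gas station."
british_sample = "The colour of the lorry matched the colour of the petrol station."

if __name__ == "__main__":
    for word, freq_a, freq_b in compare(american_sample, british_sample):
        print(f"{word:10s} {freq_a:8.2f} {freq_b:8.2f}")
```

Normalising by corpus size matters less when the corpora are sampled in parallel, as Brown and LOB are, but it keeps the comparison meaningful if the two texts differ in length. A real study would also test whether the observed differences are statistically significant.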

One role for corpora in national variation studies has been as a testbed for two theories of language variation: Quirk et al.'s (1985) "common core" hypothesis, and Braj Kachru's conception of national varieties as forming many unique "Englishes" which differ in important ways from one another. Most work on lexis and grammar comparing the Kolhapur Indian corpus with Brown and LOB has supported the common core hypothesis (Leitner 1991). However, there is still scope for the extension of such work.

Few dialect corpora exist at present. Two examples are the Helsinki corpus of English dialects and Kirk's Northern Ireland Transcribed Corpus of Speech (NITCS). Both corpora consist of conversations with a fieldworker: in Kirk's corpus the speakers come from Northern Ireland, and in the Helsinki corpus from several English regions. Dialectology is an empirical field of linguistics, although it has tended to rely on elicitation experiments and less controlled sampling rather than on corpora. Such elicitation experiments tend to focus on vocabulary and pronunciation, neglecting other aspects of language such as syntax. Dialect corpora allow these other aspects to be studied and, because the corpora are sampled so as to be representative, both quantitative and qualitative conclusions can be drawn about the target population as a whole.

Read about comparisons using dialect data in Corpus Linguistics, Chapter 4, page 110.