Corpora and Grammar

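Grammatical (or syntactic) studies have, along with lexical studies, been the most frequent types of research to use corpora. Corpora make a useful tool for syntactic research for two main reasons: they allow the grammar of a language to be quantified on a representative basis, and they provide empirical data against which hypotheses derived from grammatical theory can be tested.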

Many smaller-scale studies of grammar using corpora have included quantitative data analysis (for example, Schmied's 1993 study of relative clauses). There is now greater interest in the more systematic study of grammatical frequency; for example, Oostdijk and de Haan (1994a) are aiming to analyse the frequency of the various English clause types.
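Counting grammatical frequencies presupposes that the relevant units have already been identified in the corpus. The following is a minimal sketch, assuming each clause has already been labelled with a type (by a parser or by hand annotation), that computes the absolute and relative frequency of each type. The clause labels and data are invented for illustration and are not the categories used by Oostdijk and de Haan.

```python
from collections import Counter

# Hypothetical toy data: each clause in an annotated corpus reduced to a type label.
# These labels are illustrative only, not any published clause-type scheme.
clause_labels = [
    "main-declarative", "relative", "main-declarative", "adverbial",
    "main-interrogative", "relative", "main-declarative", "nominal",
]


def clause_type_frequencies(labels):
    """Return {label: (absolute count, proportion of all clauses)}, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.most_common()}


for label, (n, share) in clause_type_frequencies(clause_labels).items():
    print(f"{label:20} {n:3d} {share:6.1%}")
```

With real data, counts from texts or genres of different sizes would normally be normalised (for example per 1,000 words) before being compared.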

Since the 1950s, the division in linguistics between rationalist, theory-based approaches and empiricist, descriptive ones (see Session One) has often meant that the two have been viewed as separate and in competition with each other. However, a group of researchers has used corpora to test essentially rationalist grammatical theory, rather than to describe language or to generate theory inductively.

At Nijmegen University, for instance, primarily rationalist formal grammars are tested on real-life language found in computer corpora (Aarts 1991). The formal grammar is first devised by reference to introspective techniques and to existing accounts of the grammar of the language. The grammar is then loaded into a computer parser, which is run over a corpus to test how far the grammar accounts for the data. Finally, the grammar is modified to take account of the sentences it failed to analyse or analysed wrongly.
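The test-and-revise cycle described above can be sketched in a few lines. This is a minimal sketch, not the Nijmegen tools: it assumes a toy grammar in Chomsky normal form whose terminals are part-of-speech tags, uses a CYK recogniser as a stand-in for the parser, and runs it over a hypothetical four-sentence tagged "corpus". The rules, tag names and helper functions are all invented for illustration. The sketch only checks whether each sentence receives some analysis; detecting analyses that the grammar gets wrong would additionally require comparison with hand-checked parses.

```python
# Minimal sketch of testing a formal grammar against a corpus and revising it.
# Everything here (rules, tags, sentences) is a hypothetical toy example.

def cyk_recognises(tags, binary, lexical, start="S"):
    """Return True if the tag sequence can be derived from `start`.

    binary:  list of (A, (B, C)) rules, i.e. A -> B C
    lexical: list of (A, tag) rules, i.e. A -> tag
    """
    n = len(tags)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that can span tags[i..j] inclusive
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tag in enumerate(tags):
        table[i][i] = {lhs for lhs, t in lexical if t == tag}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split point between the two children
                left, right = table[i][k], table[k + 1][j]
                for lhs, (b, c) in binary:
                    if b in left and c in right:
                        table[i][j].add(lhs)
    return start in table[0][n - 1]


def coverage(corpus, binary, lexical):
    """Return the share of sentences the grammar accepts, plus the failures."""
    failures = [s for s in corpus if not cyk_recognises(s, binary, lexical)]
    return 1 - len(failures) / len(corpus), failures


# Step 1: an initial grammar (hypothetical, as if written by introspection).
binary = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP"))]
lexical = [("Det", "DET"), ("N", "NOUN"), ("V", "VERB"),
           ("VP", "VERB"), ("NP", "PRON")]

# Step 2: a tiny tagged "corpus" (hypothetical tag sequences).
corpus = [
    ("DET", "NOUN", "VERB", "DET", "NOUN"),  # e.g. "the dog chased the cat"
    ("PRON", "VERB"),                       # e.g. "she slept"
    ("DET", "ADJ", "NOUN", "VERB"),         # e.g. "the old man slept"
    ("PRON", "VERB", "PRON"),               # e.g. "she saw him"
]

score, missed = coverage(corpus, binary, lexical)
print(f"initial coverage: {score:.0%}, missed: {missed}")

# Step 3: revise the grammar to handle what it missed (adjectives before nouns).
binary2 = binary + [("NP", ("Det", "Nom")), ("Nom", ("Adj", "N"))]
lexical2 = lexical + [("Adj", "ADJ")]

score2, missed2 = coverage(corpus, binary2, lexical2)
print(f"revised coverage: {score2:.0%}, missed: {missed2}")
```

Here the initial grammar accepts three of the four sentences and fails on the one containing an adjective; adding the hypothetical NP -> Det Nom, Nom -> Adj N and Adj -> ADJ rules brings coverage to all four. A real grammar needs a far richer parser than CYK over tags, but the loop of measuring coverage, inspecting failures and revising the grammar mirrors the procedure described above.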