In January 2006, Lancaster was awarded a grant by the British Academy (SG-42148) to develop a digital resource for the investigation of spoken Nepali. The corpus, now complete, is a digitised and re-encoded version of the collection of spoken Nepali discourse made in the 1970s by Professor C. M. Bandhu, Central Department of Linguistics, Tribhuvan University.

In the 1970s, a collection of spoken Nepali texts was gathered by Professor C. M. Bandhu of Tribhuvan University, Kathmandu. The texts, amounting to 96 pages of transcriptions, represent the Nepali of a number of villages in different districts of Nepal. The Bandhu Collection is small by the standards of modern corpora, though large by the standards of the time. However, it is of immense value: consisting largely of narrative, it is a lens through which the effect of narrative content on linguistic form may be examined using quantitative methodologies.

Although the data were originally produced in digital form, their encoding and formatting were not standardised. To make the data more widely usable and available, we initiated a project entitled "A digital resource for the study of spoken Nepali language: the Bandhu Collection", with the support of the British Academy. This work was planned to run alongside, and to complement, ongoing work on the Nepali National Corpus (NNC), which is being constructed as part of the Nelralec project.

This project had the following goals: to convert the Bandhu Collection from its current state as a paper-only resource of limited use into a digital resource; to manipulate the encoding and markup of the text to bring it into line, as far as possible, with the emerging standard practice for the encoding of Nepali text corpora; and, using the NNC as a baseline, to employ the Bandhu Collection to establish basic parameters for an empirical, corpus-based study of textual discourse-type variation in Nepali.

