Lancaster University Graduate School
Faculty of Arts and Social Sciences

UCREL Corpus Research Seminar: Using Life-Logging to Re-Imagine Representativeness in Corpus Design

Date: 21 February 2013 Time: 2:00-3:00 pm

Venue: FASS Meeting Room 3

Stephen Wattam (SCC, Lancaster University)

The composition of general-purpose corpora has been a topic of much debate throughout the history of corpus linguistics. However, the sampling design of many widely used corpora is difficult to defend for many purposes. Of particular note is the somewhat vague approach taken to balancing socio-economic components, production/consumption, and the proportions of speech/writing.

In this talk I'll re-examine the original intent of sampling language in order to produce a strategy that offers scientific and pragmatic advantages over conventional corpus building techniques. Using methods from the life-logging community—who have applied technology to construct and process verbatim recordings of everyday life—I form a sample of language use as a transitive process, rather than a persistent and immutable entity. This is intended to clarify and explicitly outline many of the assumptions made when sampling language using conventional approaches.

The advantages and disadvantages of the method will be compared to conventional general-purpose corpora, and a preliminary study will be presented as a first look into the properties of the resultant data. The practical and ethical issues surrounding such pervasive sampling methods will also be discussed, along with ways in which they may be mitigated.

In addition to this, there will be a discussion of the value such sampling methods may provide to corpus linguistics and NLP techniques from a scientific perspective.

I am particularly interested in feedback, so I hope to reserve some time for questions and comments.

Who can attend: Anyone


Further information

Organising departments and research centres: Computing and Communications, Linguistics and English Language, University Centre for Computer Corpus Research on Language (UCREL)


Graduate School, Faculty of Arts and Social Sciences, Lancaster University, Lancaster LA1 4YD, UK
Tel: +44 (0) 1524 510880