Short courses in Corpus Linguistics

As an alternative to taking a complete Masters course, you can take our individual specialist postgraduate modules for institutional credit. These are ideal if you need to develop knowledge and skills in a specific area. Each course can be taken separately, runs for one term (three months), and is offered online.



Term 1 (Oct–Dec)

Fundamentals of corpus linguistics

Term 1 (Oct–Dec)

Corpus-based grammar and vocabulary of English

Term 2 (Jan–Mar)

Corpus design and data collection

Term 2 (Jan–Mar)

Using corpora in language teaching

Term 2 (Jan–Mar)

Corpus-based discourse analysis

Term 3 (Apr–Jun)

Statistics and data visualization

Fees per course (one term) in 2022/23

UK: £1,640

International: £3,387

Online learning

The courses are delivered via Lancaster’s high-quality virtual learning site. You will have access to the University’s extensive online library resources, and there will be plenty of opportunities to interact with your peers and tutors.

Fundamentals of corpus linguistics

This course provides an overview of corpus linguistic methods and their application in a range of areas, including sociolinguistics, discourse analysis and applied linguistics. It will enable you to acquire theoretical knowledge of the underlying principles of corpus linguistics as well as practical skills. The course introduces key corpus linguistic techniques such as concordance analysis, the analysis of wordlists and n-gram lists, keyword analysis and collocation analysis. It also provides an overview of practical applications of corpus methods in a wide range of areas of linguistic and social research. An indicative outline of topics, not necessarily in this order, includes (please note that the topics in the actual course may vary slightly):

  • The story of corpus linguistics: background and basic terminology
  • Linguistic description and corpus annotation
  • Concordances and frequency information
  • Collocations and n-grams
  • Types of corpora, available corpora, and corpus building
  • Corpus linguistics and society: discourse analysis, sociolinguistics
  • Corpus linguistics and pragmatics
  • Corpora in the classroom
  • Presenting corpus research in research reports
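As a rough illustration of two of the techniques named above, concordance analysis and n-gram lists, here is a minimal Python sketch. It is not taken from the course materials: the sample sentence, the simple tokenisation rule and the function names (`tokenize`, `concordance`, `ngrams`) are assumptions made for this example, and dedicated corpus tools handle tokenisation and display far more carefully.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, window=3):
    """Return key-word-in-context lines: up to `window` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines

def ngrams(tokens, n=2):
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Invented sample text, used only to show the functions working.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)

for line in concordance(tokens, "sat", window=2):
    print(line)

# The most frequent two-word sequences in the sample.
print(ngrams(tokens, 2).most_common(2))
```

The concordance lines place each occurrence of the search word in square brackets with its immediate context on either side, which is the basic layout that concordance analysis works from.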

Concise Bibliography

Biber, D., & Reppen, R. (Eds.). (2015). The Cambridge handbook of English corpus linguistics. Cambridge University Press.

Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge University Press.

Gablasova, D., Brezina, V., & McEnery, A. (2019). The Trinity Lancaster Corpus: Development, description and application. International Journal of Learner Corpus Research, 5(2), 126-160.

McEnery, T., & Hardie, A. (2011). Corpus linguistics: Method, theory and practice. Cambridge University Press.

O'Keeffe, A., & McCarthy, M. (Eds.). (2010). The Routledge handbook of corpus linguistics. Routledge.

Corpus-based grammar and vocabulary of English

This course offers a detailed corpus-based description of the grammar and lexicon of the English language and the methodology used to arrive at this description. You will learn about words (for example, their meanings and the relationships between words as observed in a corpus, namely collocation and colligation), word classes, phrases and clauses. The course discusses the concept of lexicogrammar, a notion that allows us to see language holistically, with attention to patterns that do not fit into older grammar-book and dictionary descriptions. Special attention is paid to spoken language and the ‘grammar of conversation’. In this way, the course has a dual focus: 1) It provides linguistic knowledge and relevant terminology about the English language, guiding you through grammatical, lexical and lexico-grammatical patterns observed in large general corpora. 2) It teaches transferable skills for describing a variety of lexico-grammatical phenomena on the basis of corpus evidence. An indicative outline of topics, not necessarily in this order, includes (please note that the topics in the actual course may vary):

  • Words and word classes
  • Phrases and clauses
  • Grammar of the noun phrase
  • Grammar of the verb phrase
  • The grammar of conversation
  • Words and their meanings
  • Producing (pedagogical) wordlists
  • Special topics in corpus-based analyses of lexicogrammar

Concise Bibliography

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999, 2022). Longman grammar of spoken and written English. Longman.

Biber, D., Conrad, S., & Leech, G. (2002). Longman student grammar of spoken and written English. Longman.

Brezina, V., & Gablasova, D. (2017). How to produce vocabulary lists? Issues of definition, selection and pedagogical aims. A response to Gabriele Stein. Applied Linguistics, 38(5), 764-767.

Carter, R., & McCarthy, M. (2006). Cambridge grammar of English: a comprehensive guide. Cambridge University Press.

Kennedy, G. (2003). Structure and meaning in English: a guide for teachers. Routledge.

Huddleston, R., & Pullum, G. (2002). The Cambridge grammar of the English language. Cambridge

Corpus design and data collection

This course provides essential information about corpus design and data collection, one of the key areas in corpus linguistics. It will equip you with the necessary skills for carrying out research projects that are not dependent on existing corpora; instead, you will be able to collect data from a variety of sources and compile them into a properly sampled dataset. Building on a long tradition of corpus development at Lancaster University and drawing on specific examples from recent projects such as the British National Corpus 2014, the Guangwai Lancaster Corpus of L2 Chinese and the Trinity Lancaster Corpus, the course offers both the theoretical knowledge and the practical skills you need to build your own corpus. An indicative outline of topics, not necessarily in this order, includes (please note that the topics in the actual course may vary):

  • Corpus as a sample: Types of sampling, sampling frame
  • Corpus design: Necessary steps before data collection
  • Corpus development: Recording data and metadata, data cleaning, XML conversion
  • Corpus annotation: Types of annotation, POS tagging and lemmatization, semantic and error tagging
  • Written corpus design
  • Spoken corpus design
  • Learner corpus design
  • Multimodal corpus design
  • Corpus distribution and copyright

Concise Bibliography

Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8(4), 243-257.

Brezina, V., Hawtin, A., & McEnery, T. (2021). The Written British National Corpus 2014–design and comparability. Text & Talk,

Brezina, V., Gablasova, D., & McEnery, T. (2019). Corpus-based approaches to spoken L2 production. International Journal of Learner Corpus Research, 5(2), 119-125.

Čermák, F. (2009). Spoken corpora design: Their constitutive parameters. International Journal of Corpus Linguistics, 14(1), 113-123.

Davies, M. (2009). The 385+ million word Corpus of Contemporary American English (1990–2008+): Design, architecture, and linguistic insights. International journal of corpus linguistics, 14(2), 159-190.

Knight, D. (2011). The future of multimodal corpora. Revista brasileira de linguística aplicada, 11(2), 391-415.

Love, R., Dembry, C., Hardie, A., Brezina, V., & McEnery, T. (2017). The spoken BNC2014.

Corpus-based discourse analysis

This course offers an in-depth exploration of corpus-based discourse analysis, a prominent application of corpus methods with a long tradition at Lancaster. A range of practical examples of corpus-based discourse studies across a variety of discourse domains (e.g. media discourse and healthcare-related discourse) will guide you in developing your skills in this area. The course includes an overview of the different fields in which corpus-based discourse analysis can be employed, a detailed discussion of the linguistic and societal implications of these topics, and relevant social and linguistic theories. An indicative outline of topics, not necessarily in this order, includes (please note that the topics in the actual course may vary):

  • Basic concepts and the role of corpora in discourse analysis
  • News discourse
  • Language on television
  • Social media discourse
  • Healthcare discourse
  • Financial discourse
  • Discourse and gender
  • Discourse and politics
  • Science discourse
  • Writing up discourse analysis

Concise Bibliography

Baker, P. (2019). Fabulosa!: The Story of Polari, Britain's Secret Gay Language. Reaktion Books.

Baker, P. (2014). Using corpora to analyze gender. Continuum.

Baker, P. (2006). Using corpora in discourse analysis. Continuum.

Baker, P., Brookes, G., & Evans, C. (2019). The Language of Patient Feedback: A Corpus Linguistic Study of Online Health Communication. Routledge.

Baker, P., Gabrielatos, C., & McEnery, T. (2013). Discourse analysis and media attitudes: The representation of Islam in the British press. Cambridge University Press.

Baker, P., & Ellece, S. (2011). Key terms in discourse analysis. Continuum.

Baker, P., Gabrielatos, C., Khosravinik, M., Krzyżanowski, M., McEnery, T., & Wodak, R. (2008). A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse & society, 19(3), 273-306.

Bednarek, M. (2018). Language and Television Series: A Linguistic Approach to TV Dialogue. Cambridge University Press.

Partington, A., & Taylor, C. (2017). The language of persuasion in politics: An introduction. Routledge.

Semino, E., Demjén, Z., Hardie, A., Payne, S., & Rayson, P. (2017). Metaphor, cancer and the end of life: A corpus-based study. Routledge.

Taylor, C., & Marchi, A. (Eds.). (2018). Corpus approaches to discourse: A critical review. Routledge.

Using corpora in language teaching

The course is divided into two main parts. The first part (Weeks 1-5) covers major areas related to using corpora in the classroom. It will familiarise you with the main theoretical and practical issues in corpus-based language teaching and raise your awareness of the advantages and limitations of corpus-based approaches. The second part (Weeks 6-10) applies corpus methods to different areas of language teaching. An indicative outline of topics, not necessarily in this order, includes:

  • Corpus-based approaches to language teaching: Key issues
  • Direct use of corpora in the classroom: Data-driven learning
  • Developing corpus-based teaching materials
  • Analysing learner language using corpora
  • Corpora in teaching vocabulary and grammar
  • Corpora in teaching speaking and writing skills
  • Corpora in teaching English for Academic Purposes
  • Corpora and language assessment

Concise Bibliography

Bennett, G. (2010). Using corpora in the language learning classroom: Corpus linguistics for teachers. Ann Arbor: University of Michigan Press/ESL

Campoy-Cubillo, M.C., Belles-Fortuno, B., & Gea-Valor, M.L. (Eds.) (2010). Corpus-based approaches to English language teaching. New York: Continuum.

Granger, S., Gilquin, G., & Meunier, F. (Eds.). (2015). The Cambridge handbook of learner corpus research. Cambridge University Press.

Gilquin, G., & Granger, S. (2010). How can data-driven learning be used in language teaching. The Routledge Handbook of Corpus Linguistics (pp. 359-370).

Hunston, S. (2010). Corpora in Applied Linguistics. Cambridge University Press.

Jones, C., & Waller, D. (2015). Corpus linguistics for grammar: A guide for research. Routledge.

McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. New York: Routledge.

O'Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom: Language use and language teaching. New York: Cambridge University Press.

Reppen, R. (2010). Using corpora in the language classroom. New York: Cambridge University Press.

Sinclair, J. (Ed.). (2004). How to use corpora in language teaching. Amsterdam: John Benjamins Publishing Company.

Timmis, I. (2015). Corpus linguistics for ELT: Research and practice. New York: Routledge.

Statistics and data visualization

This course provides an overview of the main statistical procedures used for the analysis of linguistic data and language corpora, together with examples of how these methods are applied. Since corpus linguistics is an essentially quantitative approach, the module will enable you to acquire theoretical knowledge of the mathematical modelling of linguistic data and of appropriate statistical tests, as well as the practical skills to carry out a range of statistical analyses of linguistic (corpus) data. The course is tailor-made for linguistics students and structured according to linguistic topics and the statistical methods relevant to their analysis. An indicative outline of topics, not necessarily in this order, includes (please note that the topics in the actual course may vary):

  • Measures of frequency, dispersion and diversity
  • Meta-analysis and effect sizes
  • Statistics behind collocations, keywords and reliability of manual coding
  • Contingency tables, the chi-squared test and regression models
  • Correlation, cluster analysis and factor analysis
  • T-test, ANOVA and their non-parametric counterparts
  • Bootstrapping and non-parametric regression
  • Data visualization
  • Presenting statistical information in research reports
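To give a flavour of two items on this list, measures of frequency and the chi-squared test on a contingency table, here is a minimal Python sketch. It is an illustration only and not part of the course materials: the word counts and corpus sizes are invented, the function names (`per_million`, `chi_squared_2x2`) are assumptions made for this example, and the statistic is the plain Pearson chi-squared without any continuity correction.

```python
def per_million(count, corpus_size):
    """Normalised (relative) frequency: occurrences per million tokens."""
    return count * 1_000_000 / corpus_size

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]].

    No continuity correction is applied, and every row and column
    total is assumed to be greater than zero.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the word were equally likely in both corpora.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat

# Invented figures: a word occurring 120 times in a 2,000,000-token corpus
# and 80 times in a 1,000,000-token corpus.
freq_a, size_a = 120, 2_000_000
freq_b, size_b = 80, 1_000_000

print(per_million(freq_a, size_a), per_million(freq_b, size_b))

# Rows are the two corpora; columns are "this word" and "all other tokens".
print(chi_squared_2x2(freq_a, size_a - freq_a, freq_b, size_b - freq_b))
```

Comparing the per-million figures rather than the raw counts matters here because the two invented corpora differ in size: the word is more frequent in raw terms in the first corpus but relatively more frequent in the second.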

Concise Bibliography

Brezina, V. (2018). Statistics in corpus linguistics: A practical guide. Cambridge University Press.

Brezina, V., & Meyerhoff, M. (2014). Significant or random. A critical review of sociolinguistic generalisations based on large corpora. International Journal of Corpus Linguistics, 19(1), 1-28.

Brezina, V., McEnery, T., & Wattam, S. (2015). Collocations in context: A new perspective on collocation networks. International Journal of Corpus Linguistics, 20(2), 139-173.

Everitt, B. S., & Skrondal, A. (2010). The Cambridge dictionary of statistics. Cambridge University Press.

Gries, S. Th. (2013). Statistics for linguistics with R: A practical introduction. Walter de Gruyter.

Oakes, M. P. (1998). Statistics for corpus linguistics. Edinburgh University Press.

Vogt, W. P., & Johnson, B. (2011). Dictionary of statistics & methodology: A nontechnical guide for the social sciences. Sage.

Taster Session: Corpora as samples of languages

Vaclav Brezina will be looking at the process of building corpora and also at the different corpus types.
