Event Information:

  • Wed

    Fall School: Corpus Linguistics

    Norwegian University of Science and Technology

    Costas Gabrielatos will give two courses: 'Keyword analysis' and 'Beyond word frequency'.

    Keyword analysis

    In this session, Gabrielatos will explore definitions of the terms keyword and keyness and discuss appropriate metrics, focusing on the distinction between effect size and statistical significance. He will also show how to derive true keywords (i.e. keywords based on effect size) while still taking statistical significance into account. This matters because all but one of the current corpus tools (the exception being Sketch Engine) use an inappropriate metric, log-likelihood, which only indicates statistical significance.
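
    The distinction between statistical significance and effect size can be illustrated with a minimal Python sketch. It is not taken from the course materials. It assumes the two-term form of the log-likelihood statistic that is commonly used for corpus comparison, and it uses the percentage difference between normalized frequencies as one possible effect-size measure. The function names and the example counts are hypothetical.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-term log-likelihood (G2) for one word in two corpora.

    This value reflects statistical significance: it grows with the
    amount of data even when the relative difference stays the same.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def percent_difference(freq_study, size_study, freq_ref, size_ref):
    """Effect size (assumed measure): percentage difference between the
    normalized frequency in the study corpus and in the reference corpus.
    """
    if freq_ref == 0:
        raise ValueError("reference frequency must be above zero")
    norm_study = freq_study / size_study
    norm_ref = freq_ref / size_ref
    return (norm_study - norm_ref) / norm_ref * 100


# Hypothetical counts: the same relative difference in small and large corpora.
small = (100, 1_000_000, 50, 1_000_000)
large = (10_000, 100_000_000, 5_000, 100_000_000)

for label, counts in (("small", small), ("large", large)):
    print(label, log_likelihood(*counts), percent_difference(*counts))
```

    In both hypothetical cases the word is twice as frequent in the study corpus, so the effect size is the same, while the log-likelihood value is far larger for the bigger corpora. Ranking candidate keywords by log-likelihood alone would therefore favour words with more data rather than words with a larger frequency difference.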

    Beyond word frequency

    Overall, this session will take a more comprehensive view of frequency. It will discuss how the normalized word frequency in a corpus may not always be the best way to count instances of a linguistic feature, and why it is best to view the normalized frequency of a linguistic unit as the number of instances of a feature out of the total number of opportunities for it to appear (Ball, 1994). The session will also show how the total number of instances (however measured) may be misleading on its own, and may need to be supplemented with metrics of dispersion/spread. Regarding word frequency, the session will show how token and type frequencies can be examined in combination – not collapsed into a single type-token ratio metric, but visualised two-dimensionally in a scatterplot.
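
    The three ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method taught in the session: the function names are hypothetical, dispersion is reduced to the simple 'range' measure (the share of corpus parts in which the item occurs at all), and the example figures are invented.

```python
from collections import Counter


def rate_per_opportunity(instances, opportunities):
    """Frequency as instances out of the places where the feature could occur."""
    if opportunities <= 0:
        raise ValueError("opportunities must be above zero")
    return instances / opportunities


def range_dispersion(counts_per_part):
    """Share of corpus parts that contain at least one instance."""
    if not counts_per_part:
        raise ValueError("at least one corpus part is required")
    parts_with_item = sum(1 for count in counts_per_part if count > 0)
    return parts_with_item / len(counts_per_part)


def token_and_type_counts(tokens):
    """Return (token count, type count) as a pair, not as a single ratio."""
    counts = Counter(tokens)
    return sum(counts.values()), len(counts)


# Invented figures: 30 instances of a feature in 120 eligible contexts.
print(rate_per_opportunity(30, 120))

# Same total (40) in four corpus parts, concentrated versus evenly spread.
print(range_dispersion([40, 0, 0, 0]), range_dispersion([10, 10, 10, 10]))

# One (tokens, types) point per text; such pairs could be plotted as a scatterplot.
print(token_and_type_counts("the cat saw the dog".split()))
```

    The two dispersion calls use the same total number of instances, which is why a raw or normalized total cannot distinguish them on its own. Keeping token and type counts as a pair preserves information that a single type-token ratio would discard.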


    Fall School in Corpus Linguistics:

    Fall School full schedule: