Fall School: Corpus Linguistics, Norwegian University of Science and Technology
Costas Gabrielatos will give two courses: 'Keyword analysis' and 'Beyond word frequency'.
Keyword analysis
In this session, Gabrielatos will explore definitions of the terms 'keyword' and 'keyness' and discuss appropriate metrics, focusing on the distinction between effect size and statistical significance. He will also show how to derive true keywords (i.e. keywords identified on the basis of effect size) while still taking statistical significance into account. This matters because all current corpus tools but one use an inappropriate metric, log-likelihood, which indicates only statistical significance; the exception is Sketch Engine.
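The difference between significance and effect size can be illustrated with a minimal Python sketch. It assumes the common two-corpus log-likelihood (G2) as the significance statistic and %DIFF (the percentage difference between normalised frequencies) as the effect-size measure. The function names, the 3.84 cut-off (p < 0.05 at 1 degree of freedom) and all word counts are illustrative assumptions; this is not the procedure presented in the session, nor the behaviour of any particular corpus tool.

```python
import math


def log_likelihood(f_study, size_study, f_ref, size_ref):
    """Two-corpus log-likelihood (G2): a statistical-significance score."""
    total_f = f_study + f_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora
    e_study = size_study * total_f / total_size
    e_ref = size_ref * total_f / total_size
    g2 = 0.0
    for observed, expected in ((f_study, e_study), (f_ref, e_ref)):
        if observed > 0:  # a zero count contributes 0 (0 * ln 0 is taken as 0)
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def percent_diff(f_study, size_study, f_ref, size_ref, per=1_000_000):
    """%DIFF: effect size as the % difference between normalised frequencies."""
    nf_study = f_study * per / size_study
    nf_ref = f_ref * per / size_ref
    if nf_ref == 0:
        return math.inf  # word absent from the reference corpus: undefined
    return (nf_study - nf_ref) * 100 / nf_ref


def keywords(counts_study, size_study, counts_ref, size_ref, min_ll=3.84):
    """Keep words that pass the LL threshold, then rank them by %DIFF."""
    results = []
    for word, f_study in counts_study.items():
        f_ref = counts_ref.get(word, 0)
        ll = log_likelihood(f_study, size_study, f_ref, size_ref)
        diff = percent_diff(f_study, size_study, f_ref, size_ref)
        if ll >= min_ll and diff > 0:  # significant and more frequent in study corpus
            results.append((word, diff, ll))
    # Effect size, not significance, determines the ranking
    return sorted(results, key=lambda r: r[1], reverse=True)


# Same effect size (word twice as frequent), very different LL scores
print(percent_diff(20, 100_000, 10, 100_000),
      round(log_likelihood(20, 100_000, 10, 100_000), 2))  # → 100.0 3.4
print(percent_diff(2000, 10_000_000, 1000, 10_000_000),
      round(log_likelihood(2000, 10_000_000, 1000, 10_000_000), 2))  # → 100.0 339.8

# Hypothetical counts in two corpora of 1,000,000 words each
study = {"alpha": 30, "beta": 3000, "gamma": 500}
ref = {"alpha": 10, "beta": 2500, "gamma": 500}
for word, diff, ll in keywords(study, 1_000_000, ref, 1_000_000):
    print(word, round(diff, 1), round(ll, 2))
```

In the last example, 'beta' has the higher log-likelihood simply because its counts are larger, but 'alpha' shows the larger difference in frequency, so it is ranked first once significance is used only as a filter. Note that %DIFF is infinite when a word does not occur in the reference corpus; the sketch flags this case rather than resolving it.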
Beyond word frequency
Overall, this session will take a more comprehensive view of 'frequency'. It will discuss why the normalized frequency of a word in a corpus may not always be the best way to count instances of a linguistic feature, and why it is best to view the normalized frequency of a linguistic unit as the number of its instances out of the total number of opportunities for it to appear (Ball, 1994). The session will also show how the total number of instances (however measured) can be misleading on its own and may need to be supplemented with measures of dispersion (spread). Regarding word frequency, the session will show how token and type frequencies can be examined in combination: rather than being collapsed into a single type-token ratio, they can be visualised two-dimensionally in a scatterplot.
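The three ideas in this paragraph can be sketched in a few lines of Python. The sketch is illustrative only: the function names, the counts, the choice of range and coefficient of variation as dispersion measures, and the naive whitespace tokenisation are all assumptions, not the measures used in the session.

```python
import statistics


def per_n_words(instances, tokens, n=1000):
    """Conventional normalised frequency: instances per n running words."""
    return instances * n / tokens


def rate_per_opportunity(instances, opportunities):
    """Instances out of the contexts in which the feature could have occurred."""
    return instances / opportunities


def dispersion(counts_per_text, tokens_per_text):
    """Range (number of texts containing the item) and coefficient of
    variation (population SD / mean) of the per-text normalised frequencies."""
    normalised = [per_n_words(c, n) for c, n in zip(counts_per_text, tokens_per_text)]
    text_range = sum(1 for c in counts_per_text if c > 0)
    mean = statistics.mean(normalised)
    # Assumption: CV is reported as 0.0 when the item never occurs
    cv = statistics.pstdev(normalised) / mean if mean else 0.0
    return text_range, cv


def type_token_points(texts):
    """One (tokens, types) pair per text, to be plotted as x/y points
    instead of being collapsed into a single type-token ratio."""
    points = []
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenisation
        points.append((len(tokens), len(set(tokens))))
    return points


# Two hypothetical 1,000-word texts, each with 5 instances of a feature:
# identical per-1,000-word frequencies, but 10 vs 50 opportunities
print(per_n_words(5, 1000), per_n_words(5, 1000))                # → 5.0 5.0
print(rate_per_opportunity(5, 10), rate_per_opportunity(5, 50))  # → 0.5 0.1

# Same total (40) across four equal-sized texts, very different spread
print(dispersion([10, 10, 10, 10], [1000] * 4))  # → (4, 0.0)
print(dispersion([40, 0, 0, 0], [1000] * 4))     # range 1, CV of about 1.73

print(type_token_points(["the cat saw the dog", "a a a"]))  # → [(5, 4), (3, 1)]
```

The (tokens, types) pairs can be passed to any plotting library as x and y coordinates; keeping both values visible shows whether a low type-token ratio reflects a long text or a genuinely repetitive one.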
Fall School in Corpus Linguistics: http://www.ntnu.edu/lingphil/corpus-linguistics
Fall School full schedule: http://www.ntnu.edu/lingphil/schedule