So…new dictionary sheds light on frequency of words in British English

4 February 2024 09:00

‘Yeah’, ‘so’, ‘oh’ and ‘like’ take pride of place in the top 100 most used British English words, according to a new dictionary presented by researchers at Lancaster University.

Not surprisingly, ‘the’, ‘be’, ‘and’, ‘a’ and ‘of’ are the five most used words. Towards the end of the 5,000-strong list come ‘Victorian’, ‘Sydney’ and ‘Belgium’; even these lower-frequency items still appear 13 times per million words.

And, for example, did you know that the adjective ‘hot’ occurs with an average frequency of 115 times per million words and is used most frequently in speech and least frequently in official documents? We often talk about ‘hot water’, ‘hot chocolate’ and ‘hot tub’.

‘A Frequency Dictionary of British English’ (recently published by Routledge), collated and presented by Professor of Corpus Linguistics Vaclav Brezina and Senior Lecturer Dr Dana Gablasova, includes the 5,000 most frequent words in current British English including emerging variations.

The dictionary helps distinguish different uses of words, which, say the authors, is essential for both language learners and researchers interested in words occurring in real contexts.

The new book provides information about the frequency and distribution of words. It is designed to meet the needs of a wide variety of users interested in current British English vocabulary, including students, educators, researchers, journalists, libraries and material developers.

“Words can tell us fascinating stories about how we live, what we find important and how we think about the world,” explains Professor Brezina. “This dictionary provides detailed insights into the use of English words across a number of contexts, a social geography of language, if you like.”

The frequency dictionary is based on extensive research on current British English using the British National Corpus 2014 (BNC2014), a 100-million-word representative corpus, or dataset, of contemporary British English developed at Lancaster University. It uses ‘per one million words’ as its measure of word frequency.
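
For readers curious about the arithmetic behind a ‘per one million words’ figure, the short Python sketch below shows one common way to normalise a raw count by corpus size. It is an illustration only, not the authors' method: the function name `per_million` and the raw counts are hypothetical, and the dictionary's published averages may be calculated differently, for example across registers.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Convert a raw word count into a frequency per one million words.

    Illustrative helper only (hypothetical name, not taken from the
    dictionary or from #LancsBox X).
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    # Multiply first so that round figures stay exact.
    return raw_count * 1_000_000 / corpus_size


# Assumed corpus size: the 100 million words quoted for the BNC2014.
CORPUS_SIZE = 100_000_000

# Hypothetical raw counts chosen to match the rates quoted in the article.
print(per_million(11_500, CORPUS_SIZE))  # a word at 115 per million
print(per_million(1_300, CORPUS_SIZE))   # a word at 13 per million
```

On these assumptions, a word occurring 115 times per million would appear roughly 11,500 times in a 100-million-word corpus, and one occurring 13 times per million roughly 1,300 times.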

The corpus includes a wide range of genres/registers of spoken and written English including informal speech, fiction, newspapers, academic writing and e-language.

The BNC2014 was constructed as a comparable counterpart to the original British National Corpus completed in the early 1990s.

Researchers on that project, who included Professor Brezina, used innovative computational methods to examine huge numbers of words in order to compile and analyse the new balanced dataset (corpus), which covers the period from 2007 to 2020, with 2014 providing the mid-point.

Professor Brezina explained that he and Dr Gablasova had used a new tool called #LancsBox X, developed and custom-made by a team at Lancaster University, which enabled them to produce the analysis.

A free download available to everyone, it would, he added, help researchers produce dictionaries and grammar books much more speedily in the future.

“Dictionaries were a lifetime’s work but now technology, including this cutting-edge tool, will bring together language research and computation to make it so much easier,” explains Professor Brezina.

“This is the future of lexicography – automated and semi-automated processes which will radically alter the way we produce dictionaries, grammar books, and teaching materials.

“Also, not only can we accurately describe the uses and meanings of words, using #LancsBox X we can also easily visualise word associations and the connections between words, allowing us to directly witness the intricate structure of language. As the saying suggests, a picture can sometimes be worth a thousand words.”
