UCREL CRS and the Data Science Group are hosting a joint talk next week. Stéphan Tulkens, from the University of Antwerp, will be joining us to present a paper entitled "Unsupervised Word Sense Disambiguation and Concept Extraction in Clinical and Biomedical Documents".


The automated analysis of clinical text presents us with a set of unique challenges. First, clinical text differs from conventional textual domains, which diminishes the performance of off-the-shelf resources that were not developed with this domain in mind. For example, clinical text is usually written in a relatively informal style, featuring abbreviations and idiomatic language that may be specific to a given caregiver or hospital. Second, training data is scarce due to privacy constraints, which also make it difficult to reuse annotated datasets across projects, hampering progress.

In this talk, I will describe research on the extraction and disambiguation of concepts from patient notes and biomedical texts using unsupervised methods based on distributional semantics.

Specifically, we have shown that domain-specific distributional semantic vectors, when appropriately composed into higher-order context vectors, are powerful enough to distinguish between multiple highly related senses in a biomedical Word Sense Disambiguation (WSD) task. Our unsupervised method performs comparably to knowledge-based and supervised methods on the same task.

Additionally, we apply similar distributional semantic methods to concept extraction on free clinical text. On this task, our approach is outperformed by supervised concept extraction on the same dataset, but significantly outperforms other unsupervised concept extraction methods.

The talk will be held on Thursday 7th December from 12:00pm to 1:00pm, in Charles Carter A17.


UCREL Corpus Research Seminar
