Faculty of Arts and Social Sciences
UCREL CRS: The compilation and annotation of the Reference Corpus of Contemporary Portuguese
Date: 28 November 2013
Time: 2.00-3.00 pm
Venue: FASS Meeting Room 1
In this talk, I will present the Reference Corpus of Contemporary Portuguese (CRPC), which has been developed at the Centre for Linguistics at the University of Lisbon (CLUL) for more than two decades. It is an electronic linguistic corpus of written and spoken materials, with a total of 311 million tokens, covering different varieties of Portuguese around the world. The CRPC is now available for online queries through the CQPweb interface.
After briefly reporting on the processes and tools involved in the automatic annotation of the corpus with lemmas, PoS and NP chunks, I will focus on our annotation scheme for modality. Modality is usually defined as the expression of the speaker's opinion and attitude towards the proposition (Palmer, 1986). It traditionally covers epistemic modality, which relates to the speaker's degree of commitment to the truth of the proposition, but also deontic modality, capacity and volition, among others. Modality detection is therefore also clearly linked to the current trend in NLP towards sentiment analysis and opinion mining. I will report on a corpus sample of approximately 2,000 sentences fully annotated with modal values, which provides insights into the distribution of the types of modality and the validity of our annotation scheme. This manually annotated corpus was recently used as training data for the automatic tagging of modality in Portuguese, with promising results.
Event website: http://ucrel.lancs.ac.uk/crs/presentation.php?id=49
Who can attend: Anyone
Organising departments and research centres: Linguistics and English Language