The Corpus
Corpus release and download details
Scottish Gaelic corpus
A beta version of the Scottish Gaelic corpus can now be downloaded here.
Corpus contents:
- conversation.txt - an informal conversation
- lecture.txt - a university lecture on philosophy
- sermon.txt - a sermon from a Church of Scotland communion service
- service.txt - a second sermon
- talk.txt - an informal educational/historical/religious talk
All files are encoded in UTF-8 format.
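Because the files are UTF-8, they should be opened with that encoding stated explicitly rather than relying on the platform default, which may not be UTF-8 (older Windows setups, for example). The following is a minimal Python sketch for loading the corpus. It assumes the downloaded `.txt` files have been unpacked into a single local directory; the function name `load_corpus` is hypothetical and not part of the release.

```python
from pathlib import Path


def load_corpus(directory):
    """Read every .txt file in `directory` as UTF-8 text.

    Returns a dict mapping each file name (e.g. "sermon.txt") to its
    contents. The one-directory layout is an assumption: point this at
    wherever the downloaded files were unpacked.
    """
    texts = {}
    for path in sorted(Path(directory).glob("*.txt")):
        # Decode explicitly as UTF-8 so accented letters such as "à" or "ò"
        # are read correctly regardless of the platform's default codec.
        texts[path.name] = path.read_text(encoding="utf-8")
    return texts
```

For example, `load_corpus("gaelic")` would return the five files listed above keyed by name, assuming they sit in a directory called `gaelic`.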
Welsh corpus
A beta version of the Welsh corpus can now be downloaded here.
Corpus contents:
- cathedral.txt - a sermon from a cathedral Eucharist
- chapel.txt - a sermon from a chapel service
- chat-1.txt - a television talk/magazine show
- chat-2.txt - a television talk/magazine show
- demog.txt - an informal domestic conversation
- dentist-1.txt - a dental appointment
- dentist-2.txt - a dental appointment
- football.txt - a football magazine show
- rugby.txt - a rugby commentary
- school.txt - a school history lesson
All files are encoded in UTF-8 format.
A part-of-speech tagged version of the Welsh corpus, and resources for Welsh part-of-speech tagging with the Brill Tagger, can be downloaded here.
A set of Welsh resources for tagging with Oliver Mason's QTag part-of-speech tagger can be downloaded here.