Publications

Some of these papers are available in draft form as PDF;
to read them you'll need the free Acrobat Reader from Adobe.

This page is no longer updated - please visit my publications page on the university research portal.

Journal articles

  1. Baron, A., Rayson, P. and Archer, D. (forthcoming). Word frequency and key word statistics in historical corpus linguistics. International Journal of English Studies.
  2. Rayson, P. (2008). From key words to key semantic domains. International Journal of Corpus Linguistics. 13:4 pp. 519-549. DOI: 10.1075/ijcl.13.4.06ray
  3. Koller, V., Hardie, A., Rayson, P. and E. Semino (2008) Using a semantic annotation tool for the analysis of metaphor in discourse. Metaphorik.de 15 http://www.metaphorik.de/15/ ISSN 1618-2006
  4. Smith, N., Hoffmann, S. and Rayson, P. (2008). Corpus Tools and Methods, Today and Tomorrow: Incorporating Linguists' Manual Annotations. Literary and Linguistic Computing, 23 (2), pp. 163-180. doi: 10.1093/llc/fqn004
  5. Gacitua, R., Sawyer, P., Rayson, P. (2008). A flexible framework to experiment with ontology learning techniques. In Knowledge-Based Systems, 21, 3, April 2008, pp. 192-199. DOI: 10.1016/j.knosys.2007.11.009
  6. Pilz, T., Ernst-Gerlach, A., Kempken, S., Rayson, P. and Archer, D. (2008) The identification of spelling variants in English and German historical texts: manual or automatic? Literary and Linguistic Computing, 23, 1, pp. 65-72. doi:10.1093/llc/fqm044
  7. Walkerdine, J., Hughes, D., Rayson, P., Simms, J., Gilleade, K., Mariani, J. and Sommerville, I. (2008) A framework for P2P application development, Computer Communications 31, pp. 387-401. doi: 10.1016/j.comcom.2007.08.004
  8. Beal, J., Corrigan, K., Smith, N. and Rayson, P. (2007) Writing the Vernacular: Transcribing and Tagging the Newcastle Electronic Corpus of Tyneside English. Studies in Variation, Contacts and Change in English. Volume 1. Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki. http://www.helsinki.fi/varieng/journal/volumes/01/beal_et_al/
  9. Sampaio, A., Rashid, A., Chitchyan, R. and Rayson, P. (2007) EA-Miner: Towards Automation in Aspect-Oriented Requirements Engineering. Transactions on AOSD III, LNCS 4620, Springer-Verlag, Berlin Heidelberg, pp. 4-39.
  10. Smith, N. and Rayson, P. (2007). Recent change and variation in the British English use of the progressive passive. ICAME Journal 31, pp. 107-137.
  11. Sawyer, P., Rayson, P. and Cosh, K. (2005) Shallow Knowledge as an Aid to Deep Understanding in Early Phase Requirements Engineering. IEEE Transactions on Software Engineering. Volume 31, number 11, November, 2005, pp. 969 - 981. ISSN 0098-5589.
    doi: http://doi.ieeecomputersociety.org/10.1109/TSE.2005.129
  12. Piao, S., Rayson, P., Archer, D., McEnery, T. (2005) Comparing and combining a semantic tagger and a statistical tool for MWE extraction. Computer Speech and Language, (Special issue on Multiword expressions), Volume 19, issue 4, pp. 378 - 397, Elsevier. doi:10.1016/j.csl.2004.11.002
  13. Ramduny-Ellis, D., Dix, A., Rayson, P., Onditi, V., Sommerville, I., Ransom, J. (2005) Artefacts as designed, Artefacts as used: resources for uncovering activity dynamics. Cognition Technology and Work, Volume 7, number 2, pp. 76-87, Springer. doi:10.1007/s10111-005-0179-1
  14. Sawyer, P., Rayson, P., and Garside, R. (2002) REVERE: support for requirements synthesis from documents. Information Systems Frontiers Journal. Volume 4, Issue 3, Kluwer, Netherlands, pp. 343 - 353. ISSN 1387-3326. PDF version
  15. I. Sommerville, T. Rodden, P. Rayson, A. Kirby, and A. Dix (1998) Supporting information evolution on the WWW. World Wide Web Journal, Volume 1, Number 1. Baltzer Science, Netherlands. pp. 45-54.
    Note: this journal is now published by Kluwer (abstract)
  16. Rayson, P., and Garside, R. (1998). The CLAWS Web Tagger. ICAME Journal, no. 22. The HIT-centre - Norwegian Computing Centre for the Humanities, Bergen, pp. 121-123. PDF version
  17. Rayson, P., Leech, G., and Hodges, M. (1997). Social differentiation in the use of English vocabulary: some analyses of the conversational component of the British National Corpus. International Journal of Corpus Linguistics. Volume 2, number 1. pp 133 - 152. John Benjamins, Amsterdam/Philadelphia. ISSN 1384-6655. (abstract and full version)

Authored books

  1. Xiao, R., P. Rayson & T. McEnery (2009) A Frequency Dictionary of Mandarin Chinese: Core vocabulary for learners. London: Routledge. ISBN: 978-0-415-45586-2.
  2. Leech, G., Rayson, P., and Wilson, A. (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London. ISBN 0582-32007-0
    Note: see the companion website for more details

Edited collections

  1. Davies, M., Rayson, P., Hunston, S. and Danielsson, P. (eds.) (2007). Proceedings of the Corpus Linguistics Conference (CL2007). University of Birmingham, 27-30 July 2007.
  2. Andrew Wilson, Dawn Archer and Paul Rayson (eds.) (2006) Corpus linguistics around the world. Rodopi, Amsterdam, pp. 233. ISBN 90-420-1836-4 (Appears in the series Language and Computers Number 56).
  3. Wilson, A., Rayson, P. and McEnery, T. (eds.) (2003) Corpus Linguistics by the Lune: a festschrift for Geoffrey Leech. Peter Lang, Frankfurt. (Volume 8 in the Lodz studies in Language Series edited by Lewandowska-Tomaszczyk, B. and Melia, P. J.) ISBN 3-631-50952-2. 305 pp. (contents)
  4. Dawn Archer, Paul Rayson, Andrew Wilson and Tony McEnery (eds.) (2003) Proceedings of the Corpus Linguistics 2003 conference. UCREL technical paper number 16. UCREL, Lancaster University. ISBN 1 86220 131 5.
  5. Wilson, A., Rayson, P. and McEnery, T. (eds.) (2003) A Rainbow of Corpora: Corpus Linguistics and the Languages of the World. Lincom-Europa, München. ISBN 3 89586 872 8. Linguistics Edition 40. 174 pp. (contents)
  6. Rayson P., Wilson A., McEnery T., Hardie A. and Khoja S. (eds.) (2001). UCREL Technical Papers Volume 13 Special Issue. Proceedings of the Corpus Linguistics 2001 conference. Linguistics Department, Lancaster University. ISBN 1 86220 107 2. (contents, PDF version)

Conferences and workshops

  1. Alves, V., Schwanninger, C., Barbosa, L., Rashid, A., Sawyer, P., Rayson, P., Pohl, C., Rummler, A. (2008). An Exploratory Study of Information Retrieval Techniques in Domain Analysis. In proceedings of 12th International Software Product Line Conference 2008, Limerick, Ireland, 8-12 September 2008.
  2. Hughes, D., Rayson, P., Walkerdine, J., Lee, K., Greenwood, P., Rashid, A., May-Chahal, C., Brennan, M. (2008) Supporting law enforcement in digital communities through natural language analysis. In proceedings of the 2nd International Workshop on Computational Forensics (IWCF 2008), Washington DC, USA, August 7-8, 2008. Lecture Notes in Computer Science 5158, pp. 122-134. http://dx.doi.org/10.1007/978-3-540-85303-9_12
  3. Rayson, P. (2008). New trends in corpus linguistics for translation studies. In proceedings of the CCID & Lancaster University Workshop on Corpus Linguistics & Machine Translation Applications, August 12-13 2008, CCID, Beijing, China.
  4. Rayson, P., Xu, X., Xiao, J., Wong, A., Yuan, Q. (2008). Quantitative analysis of translation revision: contrastive corpus research on native English and Chinese translationese. In proceedings of XVIII FIT World Congress, Shanghai, China, August 4-7, 2008.
  5. Rayson, P. and Archer, D. (2008). Key domain analysis: mining text in the humanities and social sciences. In proceedings of Workshop on Text Mining Applications in the Social Sciences in conjunction with the 4th International Conference on e-Social Science. 18 June 2008, Manchester, UK.
  6. Hardie, A., Koller, V., Rayson, P., Semino, E. (2008) Computer-assisted metaphor analysis using key semantic domains. Researching and Applying Metaphor conference (RaAM 7), Caceres, Spain, 29-31 May 2008.
  7. Baron, A. and Rayson, P. (2008). VARD2: A tool for dealing with spelling variation in historical corpora. In proceedings of the Postgraduate Conference in Corpus Linguistics, Aston University, Birmingham, 22nd May 2008. PDF version
  8. Pooley, N., Alcock, K., Cain, K., Hardie, A., Hoffmann, S. and Rayson, P. (2008) Variability in child language. Poster presented at ICAME 2008 Conference, May 14-18, Ascona, Switzerland. PDF version
  9. Pooley, N., Cain, K., Hardie, A., Rayson, P., Hoffmann, S. and Alcock, K. (2008) The Child Language Survey. Exploring children's written language production: a developmental and historical perspective. Poster presented at ESRC Seminar Reading Comprehension: from Theory to Practice. January 10-11 2008, Lancaster University.
  10. Gacitua, R., Sawyer, P., Rayson, P. (2007). A flexible framework to experiment with ontology learning techniques. In Research and Development in Intelligent Systems XXIV. Proceedings of AI-2007, the Twenty-seventh SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, Springer, London, pp. 153-166. DOI: 10.1007/978-1-84800-094-0_12
  11. Gacitua, R., Sawyer, P., Piao, S. and Rayson, P. (2007). Ontology acquisition process: A framework for experimenting with different NLP techniques. In proceedings of the UK e-Science All Hands Meeting 2007, Nottingham, UK, 10th-13th September 2007, pp. 561-567. ISBN 978-0-9553988-3-4. PDF version
  12. Smith, N., Hoffmann, S. and Rayson, P. (2007) Corpus tools and methods today and tomorrow: Incorporating user-defined annotations. In proceedings of Corpus Linguistics 2007, July 27-30, University of Birmingham, UK.
  13. Rayson, P., Archer, D., Baron, A., Culpeper, J. and Smith, N. (2007). Tagging the Bard: Evaluating the accuracy of a modern POS tagger on Early Modern English corpora. In proceedings of Corpus Linguistics 2007, July 27-30, University of Birmingham, UK. PDF version
  14. Hardie, A., Koller, V., Rayson, P. and Semino, E. (2007) Exploiting a semantic annotation tool for metaphor analysis. In M. Davies, P. Rayson, S. Hunston and P. Danielsson (eds) Proceedings of Corpus Linguistics 2007, July 27-30, University of Birmingham, UK. PDF version
  15. Chitchyan, R., Rashid, A., Rayson, P. and Waters, R. (2007). Semantics-Based composition for aspect-oriented requirements engineering. In Proceedings of the 6th international Conference on Aspect-Oriented Software Development (Vancouver, British Columbia, Canada, March 12 - 16, 2007). AOSD '07, vol. 208. ACM Press, New York, NY, pp. 36-48. DOI http://doi.acm.org/10.1145/1218563.1218569
  16. Rayson, P., Archer, D., Baron, A. and Smith, N. (2007). Tagging historical corpora - the problem of spelling variation. In proceedings of Digital Historical Corpora, Dagstuhl-Seminar 06491, International Conference and Research Center for Computer Science, Schloss Dagstuhl, Wadern, Germany, December 3rd-8th 2006. PDF version (http://drops.dagstuhl.de/opus/volltexte/2007/1055/) ISSN 1862-4405.
  17. Doherty, N., Lockett, N., Rayson, P. and Riley, S. (2006). Electronic-CRM: a simple sales tool or facilitator of relationship marketing? 29th Institute for Small Business & Entrepreneurship Conference. International Entrepreneurship - from local to global enterprise creation and development. 31 October - 2 November 2006, Cardiff-Caerdydd, UK.
  18. Mudraya, O., Babych, B., Piao, S., Rayson, P., Wilson, A. (2006). Developing a Russian semantic tagger for automatic semantic annotation. In proceedings of Corpus Linguistics 2006, St. Petersburg, Russia, 10-14 October 2006, pp. 282-289 (in Russian), pp. 290-297 (in English). PDF version PDF version (slides)
  19. Smith, N., Leech, G. and Rayson, P. (2006) The expression of obligation and necessity in British English across the twentieth century: developments in matching corpora. 14th International Conference on English Historical Linguistics (14 ICEHL), Bergamo, Italy, 21-25 August 2006.
  20. Piao, S. L., Rayson, P., Mudraya, O., Wilson, A. and Garside, R. (2006) Measuring MWE compositionality using semantic annotation. In proceedings of COLING/ACL workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, July 23, 2006, Sydney, Australia. PDF version (Download data for human ratings)
  21. Dawn Archer, Andrea Ernst-Gerlach, Sebastian Kempken, Thomas Pilz, Paul Rayson (2006). The identification of spelling variants in English and German historical texts: manual or automatic? In proceedings of Digital Humanities 2006, The Sorbonne, Centre Cultures Anglophones et Technologies de l'Information, Paris, France, July 5 - 9, 2006, pp. 3 - 5.
  22. Chitchyan, R., Sampaio, A., Rashid, A. and Rayson, P. (2006). Evaluating EA-Miner: Are Early Aspect Mining Techniques Effective? In proceedings of Towards Evaluation of Aspect Mining (TEAM 2006). Workshop Co-located with ECOOP 2006, European Conference on Object-Oriented Programming, 20th edition, July 3-7, Nantes, France, pp. 5-8. PDF version
  23. Rayson, P. and Smith, N. (2006) The key domain method for the study of language varieties. The Third Inter-Varietal Applied Corpus Studies (IVACS) group International Conference on "LANGUAGE AT THE INTERFACE". University of Nottingham, UK, 23-24 June 2006. PDF version
  24. Smith, N., Leech, G. and Rayson, P. (2006) Exploring grammatical change across the twentieth century: A backward step permits further advance. 27th conference of the International Computer Archive of Modern and Medieval English (ICAME) University of Helsinki, Finland, 24-28 May, 2006.
  25. Joan Beal, Karen Corrigan, Paul Rayson and Nicholas Smith (2006) Writing the Vernacular: Transcribing and Tagging the Newcastle Electronic Corpus of Tyneside English (NECTE). Pre-conference workshop on corpus annotation, ICAME-27, University of Helsinki, Finland, 24 May 2006.
  26. Chitchyan, R., Sampaio, A., Rashid, A. and Rayson, P. (2006). A tool suite for aspect-oriented requirements engineering. In proceedings of Early Aspects at ICSE: Workshop in Aspect-Oriented Requirements Engineering and Architecture Design. In conjunction with the 2006 International Conference on Software Engineering, Shanghai, China, May 21, 2006. ACM Press, New York, NY, pp. 19-26. DOI http://doi.acm.org/10.1145/1137639.1137644
  27. Rayson, P., Walkerdine, J., Fletcher, W.H. and Kilgarriff, A. (2006) Annotated Web as corpus. In proceedings of the 2nd Web as Corpus Workshop held in conjunction with the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), Trento, Italy, April 3, 2006, pp. 27 - 33. PDF version
  28. Piao, S.L., Sun, G., Rayson, P. and Yuan, Q. (2006) Automatic extraction of Chinese multiword expressions with a statistical tool. In proceedings of the Workshop on Multi-word-expressions in a Multilingual Context held in conjunction with the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), Trento, Italy, April 3, 2006, pp. 17-24. PDF version
  29. Sharoff, S., Babych, B., Rayson, P., Mudraya, O. and Piao, S. (2006) ASSIST: Automated Semantic Assistance for Translators. In companion proceedings to the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), Trento, Italy, April 3-7, 2006, pp. 139 - 142. ISBN 1-932432-60-4. PDF version
  30. Sampaio, A., Chitchyan, R., Rashid, A. and Rayson, P. (2005). EA-Miner: a tool for automating aspect-oriented requirements identification. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering, Long Beach, California, USA, November 7-11, 2005, ASE '05. ACM Press, New York, NY, pp. 352-355. DOI http://doi.acm.org/10.1145/1101908.1101967
  31. Mudraya, O., Piao, S.L., Löfberg, L., Rayson, P., Archer, D. (2005). English-Russian-Finnish cross-language comparison of phrasal verb translation equivalents. In Cosme, C., Gouverneur, C., Meunier, F., & Paquot, M. (eds.), Proceedings of the Phraseology 2005 Conference, Louvain-la-Neuve, Belgium, 13-15 October 2005, pp. 277-281. PDF version
  32. Archer, D., Culpeper, J. and Rayson, P. (2005) Love - a familiar or a devil? An exploration of key domains in Shakespeare’s Comedies and Tragedies. Presented at the AHRC ICT Methods Network Expert Seminar on Linguistics. Lancaster University, 8 September 2005. PDF version
  33. Américo Sampaio, Awais Rashid and Paul Rayson (2005). Early-AIM: An Approach for Identifying Aspects in Requirements. Poster in proceedings of the 13th IEEE International Requirements Engineering Conference. August 29th - September 2nd 2005, Paris, France, pp. 487-488. DOI: 10.1109/RE.2005.24 PDF version
  34. Smith, N., Rayson, P., Leech, G., and Wynne, M. (2005). Changing English across the twentieth century: enhancements to an existing family of corpora. Poster presented at the Digital Resources for the Humanities conference (DRH 2005), Lancaster University, UK. (abstract)
  35. Mudraya, O., Rayson, P., Cave, F., and Whitehouse, O. (2005) The Development of a Corpus of Entrepreneurship/Small Business. Presented as a poster at the Corpus Linguistics 2005 conference, July 14-17, Birmingham, UK.
  36. Rayson, P., Tono, Y., Morita, Y., Hoshino, M., Nakamura, T., Aizawa, H., Watanabe, R. (2005) Building a Corpus of Professional English. Presented as a poster at the Corpus Linguistics 2005 conference, July 14-17, Birmingham, UK.
  37. Rayson, P., Archer, D. and Smith, N. (2005) VARD versus Word: A comparison of the UCREL variant detector and modern spell checkers on English historical corpora. In proceedings of the Corpus Linguistics 2005 conference, July 14-17, Birmingham, UK. Proceedings from the Corpus Linguistics Conference Series on-line e-journal, Vol. 1, no. 1, ISSN 1747-9398. PDF version
  38. Laura Löfberg, Scott Piao, Paul Rayson, Jukka-Pekka Juntunen, Asko Nykänen, and Krista Varantola (2005) A semantic tagger for the Finnish language. In proceedings of the Corpus Linguistics 2005 conference, July 14-17, Birmingham, UK. Proceedings from the Corpus Linguistics Conference Series on-line e-journal, Vol. 1, no. 1, ISSN 1747-9398. PDF version
  39. Scott S.L. Piao, Dawn Archer, Olga Mudraya, Paul Rayson, Roger Garside, Tony McEnery, Andrew Wilson (2005) A Large Semantic Lexicon for Corpus Annotation. In proceedings of the Corpus Linguistics 2005 conference, July 14-17, Birmingham, UK. Proceedings from the Corpus Linguistics Conference Series on-line e-journal, Vol. 1, no. 1, ISSN 1747-9398. PDF version
  40. Dawn Archer, Jonathan Culpeper and Paul Rayson (2005) Love - a familiar or a devil? An exploration of key domains in Shakespeare’s Comedies and Tragedies. Presented as part of the Keyword Extraction in Information Retrieval panel at the ACH/ALLC Conference June 15 - 18, 2005, Victoria, BC, Canada.
  41. Américo Sampaio, Neil Loughran, Awais Rashid and Paul Rayson (2005). Mining Aspects in Requirements. Presented at Early Aspects 2005: Aspect-Oriented Requirements Engineering and Architecture Design Workshop. March 15, 2005, Chicago, Illinois, USA. PDF version
  42. Magali Paquot, Sylviane Granger, Paul Rayson and Cédrick Fairon (2004) Extraction of multi-word units from EFL and native English corpora: The phraseology of the verb 'make'. Presented at Europhras, European Society of Phraseology, 26-29 August 2004, Basel, Switzerland.
  43. Walkerdine, J. and Rayson, P. (2004) P2P-4-DL: Digital Library over Peer-to-Peer. In Caronni G., Weiler N., Shahmehri N. (eds.) Proceedings of Fourth IEEE International Conference on Peer-to-Peer Computing (P2P2004) 25-27 August 2004, Zurich, Switzerland. IEEE Computer Society Press, pp. 264-265. ISBN 0-7695-2156-8. PDF version DOI: http://dx.doi.org/10.1109/PTP.2004.1334957
  44. Archer, D. and Rayson, P. (2004) Using an historical semantic tagger as a diagnostic tool for variation in spelling. Presented at Thirteenth International Conference on English Historical Linguistics (ICEHL 13) University of Vienna, Austria 23-29 August, 2004. PDF version
  45. Archer, D., Rayson, P., Piao, S., McEnery, T. (2004). Comparing the UCREL Semantic Annotation Scheme with Lexicographical Taxonomies. In Williams G. and Vessier S. (eds.) Proceedings of the 11th EURALEX (European Association for Lexicography) International Congress (Euralex 2004), Lorient, France, 6-10 July 2004. Université de Bretagne Sud. Volume III, pp. 817-827. ISBN 2-9522-4570-3. PDF version
  46. Löfberg L, Juntunen J-P, Nykänen A, Varantola K, Rayson P, Archer D. (2004). Using a semantic tagger as dictionary search tool. In Williams G. and Vessier S. (eds.) Proceedings of the 11th EURALEX (European Association for Lexicography) International Congress (Euralex 2004), Lorient, France, 6-10 July 2004. Université de Bretagne Sud. Volume I, pp. 127-134. ISBN 2-9522-4570-3. PDF version
  47. Jones, M., Rayson, P. and Leech, G. (2004) Key category analysis of a spoken corpus for EAP. Presented at The 2nd Inter-Varietal Applied Corpus Studies (IVACS) International Conference on "Analyzing Discourse in Context" The Graduate School of Education, Queen’s University, Belfast, Northern Ireland, 25 - 26 June, 2004. PDF version
  48. Onditi V. O., Rayson P., Ransom B., Ramduny D., Sommerville I., Dix A. (2004). Language Resources and Tools for Supporting The System Engineering Process. In Farid Meziane and Elisabeth Métais (eds.) Proceedings of 9th International Conference on Applications of Natural Language to Information Systems (NLDB 2004), Salford, UK, June 2004. LNCS 3136. Springer-Verlag, Berlin Heidelberg, pp. 147 - 158. ISBN 3-540-22564-1.
  49. Rayson, P., Archer, D., Piao, S. L., McEnery, T. (2004). The UCREL semantic analysis system. In proceedings of the workshop on Beyond Named Entity Recognition Semantic labelling for NLP tasks in association with 4th International Conference on Language Resources and Evaluation (LREC 2004), 25th May 2004, Lisbon, Portugal, pp. 7-12. PDF version
  50. Piao, Scott S. L., Paul Rayson, Dawn Archer, Tony McEnery (2004). Evaluating Lexical Resources for A Semantic Tagger. In proceedings of 4th International Conference on Language Resources and Evaluation (LREC 2004), 26-28 May 2004, Lisbon, Portugal, Volume II, pp. 499-502. ISBN 2-9517408-1-6. PDF version
  51. Rayson P., Berridge D. and Francis B. (2004). Extending the Cochran rule for the comparison of word frequencies between corpora. In Volume II of Purnelle G., Fairon C., Dister A. (eds.) Le poids des mots: Proceedings of the 7th International Conference on Statistical analysis of textual data (JADT 2004), Louvain-la-Neuve, Belgium, March 10-12, 2004, Presses universitaires de Louvain, pp. 926 - 936. ISBN 2-930344-50-4. PDF version
  52. S. Sharoff, P. Rayson, O. Mudraya, A. Wilson and T. McEnery (2004). A tool for assisting translators using automatic semantic annotation. Presented (by Serge Sharoff) at Corpus Use and Learning to Translate (CULT-BCN) Barcelona, January 22nd-24th 2004.
  53. Piao, S. L., Rayson, P., Archer, D., Wilson, A. and McEnery, T. (2003). Extracting Multiword Expressions with a Semantic Tagger. In proceedings of the Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, at ACL 2003, 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, July 12, 2003, pp. 49-56. PDF version
  54. Dawn Archer, Tony McEnery, Paul Rayson, Andrew Hardie (2003). Developing an automated semantic analysis system for Early Modern English. In Dawn Archer, Paul Rayson, Andrew Wilson and Tony McEnery (eds.) Proceedings of the Corpus Linguistics 2003 conference. UCREL technical paper number 16. UCREL, Lancaster University, pp. 22 - 31. PDF version
  55. Laura Löfberg, Dawn Archer, Scott Piao, Paul Rayson, Tony McEnery, Krista Varantola, Jukka-Pekka Juntunen (2003). Porting an English semantic tagger to the Finnish language. In Dawn Archer, Paul Rayson, Andrew Wilson and Tony McEnery (eds.) Proceedings of the Corpus Linguistics 2003 conference. UCREL technical paper number 16. UCREL, Lancaster University, pp. 457 - 464. PDF version
  56. Dix A., Ramduny D., Rayson P., Onditi V., Sommerville I. and Mackenzie A. (2003). Finding Decisions Through Artefacts. In Julie Jacko and Constantine Stephanidis (eds.) Human-Computer Interaction, Theory and practice (Part I). Volume 1 of the Proceedings of Human Computer Interaction International 2003. Crete, Greece, June 22-27, 2003. Lawrence Erlbaum Associates, New Jersey, pp. 78-82. ISBN 0-8058-4930-0 (abstract PDF version)
  57. Simon Lock, Jen Allanson, and Paul Rayson (2003). Personality Engineering for Emotional Interactive Avatars. In Constantine Stephanidis and Julie Jacko (eds.) Human-Computer Interaction, Theory and practice (Part II). Volume 2 of the Proceedings of Human Computer Interaction International 2003. Crete, Greece, June 22-27, 2003. Lawrence Erlbaum Associates, New Jersey, pp. 503-507. ISBN 0-8058-4931-9
  58. Paul Rayson, Bernadette Sharp, Albert Alderson, John Cartmell, Caroline Chibelushi, Rodney Clarke, Alan Dix, Victor Onditi, Amanda Quek, Devina Ramduny, Andy Salter, Hanifa Shah, Ian Sommerville, Phil Windridge (2003). Tracker: a framework to support reducing rework through decision management. In Proceedings of 5th International Conference on Enterprise Information Systems ICEIS2003. Angers - France, April 23-26, 2003. Volume 2, pp. 344 - 351. ISBN 972-98816-1-8. PDF version
  59. Rayson, P., Garside, R., and Sawyer, P. (2000). Assisting requirements engineering with semantic document analysis. In Proceedings of Content-based multimedia information access RIAO 2000 (Recherche d'Informations Assistée par Ordinateur, Computer-Assisted Information Retrieval) International Conference, College de France, Paris, France, April 12-14, 2000. C.I.D., Paris, pp. 1363 - 1371. ISBN 2-905450-07-X
    [Also appears as CSEG Technical Report CSEG/6/00] PDF version
  60. Rayson, P., Emmet, L., Garside, R., and Sawyer, P. (2000). The REVERE Project: Experiments with the application of probabilistic NLP to Systems Engineering. In proceedings of 5th International Conference on Applications of Natural Language to Information Systems (NLDB'2000). Versailles, France, June 28-30th, 2000.
    [Also appears in Springer LNCS 1959.] PDF version
  61. Rayson, P. and Garside, R. (2000). Comparing corpora using frequency profiling. In proceedings of the workshop on Comparing Corpora, held in conjunction with the 38th annual meeting of the Association for Computational Linguistics (ACL 2000). 1-8 October 2000, Hong Kong, pp. 1 - 6. PDF version
  62. Rayson, P., Garside, R., and Sawyer, P. (1999). Recovering Legacy Requirements. In Proceedings of REFSQ'99 Fifth International Workshop on Requirements Engineering: Foundations of Software Quality, June 14-15 1999, Heidelberg, Germany. Published by University of Namur, pp. 49-54. ISBN 2 87037 307 4.
    [Also appears as CSEG Technical Report CSEG/5/99] PDF version Postscript version
  63. Andy Kirby, Paul Rayson, Tom Rodden, Ian Sommerville and Alan Dix (1997). Versioning the web. In Reidar Conradi (ed.) Software configuration management supplementary proceedings, 7th International Workshop, SCM7, Boston, USA, May 18-19, 1997. Dept. Computer and Information Science, Norwegian University of Science and Technology, N-7034 Trondheim, Norway. pp 163-173.
    [Also appears as CSEG Technical Report CSEG/21/1996] (abstract) Postscript version
  64. Rayson, P., and Wilson, A. (1996). The ACAMRIT semantic tagging system: progress report. In L. J. Evett, and T. G. Rose (eds) Language Engineering for Document Analysis and Recognition, LEDAR, AISB96 Workshop proceedings, pp 13-20. Brighton, England. Faculty of Engineering and Computing, Nottingham Trent University, UK. ISBN 0 905 488628 PDF version

Book sections and chapters

  1. Archer, D., Culpeper, J. and Rayson, P. (forthcoming) Love - a familiar or a devil? An exploration of key domains in Shakespeare’s Comedies and Tragedies. In Archer, D. (ed.) What's in a word-list? Investigating word frequency and keyword extraction. Ashgate.
  2. Rayson, P. and Stevenson, M. (forthcoming) Sense and semantic tagging, chapter 27 in Lüdeling, A. and Kytö, M. Corpus Linguistics. An international handbook (Handbooks of Linguistics and Communication Science Series), Mouton de Gruyter, Berlin.
  3. Archer, D. and Rayson, P. (forthcoming) Using the UCREL automated semantic analysis system to investigate differing concerns in refugee literature. In M. Deegan, L. Hunyadi, & H. Short (eds.) The Keyword Project: Unlocking Content Through Computational Linguistics. Office for Humanities Communication Publications (18).
  4. Mudraya, O., Piao, S.L., Rayson, P., Sharoff, S., Babych, B. and Löfberg, L. (2008). Automatic Extraction of Translation Equivalents of Phrasal and Light Verbs in English and Russian. In Granger, S. and Meunier, F. (eds.) Phraseology : an interdisciplinary perspective. Benjamins, Amsterdam, pp. 293-309.
  5. Sylviane Granger, Magali Paquot, and Paul Rayson (2006) Extraction of multiword units from EFL and native English corpora. The phraseology of the verb 'make'. In Phraseology in Motion 1: Methoden und Kritik. Proceedings of Europhras Basel 2004. European Society of Phraseology, Schneider Verlag Hohengehren, Erlangen, pp. 57-68.
  6. Barbara Lewandowska-Tomaszczyk, Michael Oakes & Paul Rayson (2003). Annotated Corpora for Assistance with English-Polish Translation. In Wilson, A., Rayson, P. and McEnery, T. (eds.) Corpus Linguistics by the Lune: a festschrift for Geoffrey Leech. Peter Lang, Frankfurt, pp. 107 - 118. ISBN 3-631-50952-2
  7. Rayson, P., Wilson, A. and Leech, G. (2002) Grammatical word class variation within the British National Corpus sampler. In Peters, P., Collins, P., and Smith, A. (eds.) New frontiers of corpus research: Papers from the Twenty First International Conference on English Language Research on Computerized Corpora, Sydney 2000. Rodopi, Amsterdam, pp. 295 - 306. ISBN 90-420-1237-4 PDF version
  8. Rayson, P., Emmet, L., Garside, R., and Sawyer, P. (2001). The REVERE Project: Experiments with the application of probabilistic NLP to Systems Engineering. In Bouzeghoub, M., Kedad, Z., and Métais, E. (eds.) Natural Language Processing and Information Systems. 5th International Conference on Applications of Natural Language to Information Systems (NLDB'2000). Versailles, France, June 2000. Revised papers. LNCS 1959. Springer-Verlag, Berlin Heidelberg, pp. 288 - 300. ISBN 3-540-41943-8. http://dx.doi.org/10.1007/3-540-45399-7_24 PDF version
  9. Rayson, P., Garside, R., and Sawyer, P. (2000). Assisting Requirements Recovery from Legacy Documents. In Henderson, P. (ed.) Systems Engineering for Business Process Change: collected papers from the EPSRC research programme. Springer-Verlag, London, pp. 251 - 263. ISBN 1-85233-2220
    [Also appears as CSEG Technical Report CSEG/8/00] PDF version
  10. Granger, S., and Rayson, P. (1998). Automatic profiling of learner texts. In S. Granger (ed.) Learner English on Computer. Longman, London and New York, pp. 119-131.
  11. Fligelstone, S., Pacey, M., and Rayson, P. (1997). How to generalise the task of annotation. In R. Garside, G. Leech, and A. McEnery (eds.) Corpus Annotation: Linguistic Information from Computer Text Corpora. Longman, London. pp 122 - 136. PDF version
  12. Garside, R., and Rayson, P. (1997). Higher-level annotation tools. In R. Garside, G. Leech, and A. McEnery (eds.) Corpus Annotation: Linguistic Information from Computer Text Corpora. Longman, London. pp 179 - 193.
  13. McEnery, A., and Rayson, P. (1997). A Corpus/annotation toolbox. In R. Garside, G. Leech, and A. McEnery (eds.) Corpus Annotation: Linguistic Information from Computer Text Corpora. Longman, London. pp 194 - 208.
  14. Fligelstone, S., Rayson, P., and Smith, N. (1996). Template analysis: bridging the gap between grammar and the lexicon. In J. Thomas, and M. Short (eds), Using corpora for language research: Studies in Honour of Geoffrey Leech. pp. 181-207. Longman, London.
  15. Wilson, A. and Rayson, P. (1993). Automatic Content Analysis of Spoken Discourse. In: C. Souter and E. Atwell (eds), Corpus Based Computational Linguistics. Amsterdam: Rodopi. pp. 215-226 text

Ph.D.

Rayson, P. (2003). Matrix: A statistical method and software tool for linguistic analysis through corpus comparison. Ph.D. thesis, Lancaster University. (further information on Wmatrix software) (abstract; full text: PDF version, Postscript version)

Misc.

  1. Rayson, P. (2006) AHRC e-Science Scoping Study Final report: Findings of the Expert Seminar for Linguistics. AHRC e-Science Scoping Study (eSSS) project report. PDF version
  2. Marilyn Deegan⁹, Harold Short⁹, Dawn Archer, Paul Baker, Tony McEnery, Paul Rayson (2004) Computational Linguistics Meets Metadata, or the Automatic Extraction of Key Words from Full Text Content. RLG Diginews, Vol. 8, No. 2, Apr 15, 2004. ISSN 1093-5371. PDF version
  3. Dawn Archer, Andrew Wilson, Paul Rayson (2002). Introduction to the USAS category system. Benedict project report, October 2002. PDF version
  4. Alan Dix, Devina Ramduny, Paul Rayson, Victor Onditi, Ian Sommerville and Adrian Mackenzie (2002) Artefacts speak and artefacts to speak. Position paper for "Analyzing Collaborative Activity: Representing Field Research for Understanding Collaboration" - CSCW 2002 workshop (html, PDF version)
  5. Alan Dix, Devina Ramduny, Paul Rayson, Ian Sommerville. (2001) Artefact-centred analysis - transect and archaeological approaches (work in progress). Team-Ethno Online, Issue 1 - Field(work) of Dreams, November 2001, Lancaster University, UK. (html)
  6. Rayson, P., Garside, R., and Sawyer, P. (1999). Language engineering for the recovery of requirements from legacy documents. REVERE project report, Lancaster University, May 1999.
    [Also appears as CSEG Technical Report CSEG/6/99] (abstract, PDF version, Postscript version)
  7. Garside, R.G., Leech, G.N., Thomas, J.A., Wilson, A., and Rayson, P. (1996). Final Report on EPSRC/DTI-Project 'Automatic Content Analysis of Corpora of Spoken Discourse'.
  8. Rayson, P., and Wilson, A. (1996) ACAMRIT: Automatic Content Analysis of Market Research Interview Transcripts. Technical Manual.
  9. Rayson, P. (1994). MATRIX: Automatic linking of key relations in tagged text. ACAMRIT project Technical Report CS-3. Lancaster University, Computing Department.
  10. Garside, R., McEnery, A., and Rayson, P. (1993). Argument Frame Extraction and Term Clustering from an English-French Bilingual Aligned Corpus. ET10-63 Working Paper. Postscript version PDF version
  11. Garside, R.G., Leech, G.N., Thomas, J.A., Wilson, A., and Rayson, P. (1992). Final Report on SERC-DTI Project Nr. GR-F/36385 IED4-1-1143, 'The Automatic Content Analysis of Spoken Discourse'.

Notes

¹ School of Computing, Staffordshire University, UK, ST18 0DG.
² Université catholique de Louvain, Belgium.
³ Adelard, Coborn House, 3 Coborn Road, London UK. E3 2DA. loe@adelard.co.uk
⁴ University of Lodz, Poland.
⁵ University of Tampere, Finland.
⁶ Kielikone Oy, Finland.
⁷ University of Sunderland, UK.
⁸ University of Nottingham, UK.
⁹ King's College London, UK.
¹⁰ University of Central Lancashire, UK.
¹¹ University of Oxford, UK.

Presentations

  1. Rayson, P. (2008). Hands-on session for Wmatrix: bring your own corpus! Invited workshop at International Seminar "New Trends In Corpus Linguistics For Language Teaching And Translation Studies. In Honour Of John Sinclair", University of Granada, Spain, 23rd September 2008.
  2. Rayson, P. (2008). Corpus annotation, keyness statistics and multiword expression extraction. Invited workshop for PhD students at University of Birmingham, UK, 9th September 2008.
  3. Rayson, P. (2008). Identification of multiword expressions with Wmatrix: why I don't like n-grams. Invited talk at Formulaic Language Research Network (FLaRN) 2008: Third International Postgraduate Conference, June 19-20th, 2008, University of Nottingham, UK.
  4. Rayson, P. and Archer, D. (2007). Corpus annotation and retrieval: an introduction. Invited talk at Text Mining for Historians: AHRC ICT Methods Network Workshop. University of Glasgow: 17-18 July, 2007.
  5. Rayson, P. (2007) Video concordancing: challenges and opportunities. Invited talk at the Workshop on multi-modal communication, ESRC e-Social Science Digital Record node programme, Centre for Research in Applied Linguistics (CRAL), University of Nottingham, 9th May 2007.
  6. Rayson, P. (2007). Travelling through time with corpus annotation software. Invited talk at the 10th Practical Applications in Language and Computers (PALC 2007) conference. 19-22 April 2007, Lodz University, Poland. PDF version PDF version
  7. Semino, E. and Rayson, P. (2006) Corpus techniques for metaphor analysis. Presented at the Open training workshop, MetNet: Metaphor Analysis. The Open University, Milton Keynes, 19-20 September 2006.
  8. Archer, D. and Rayson, P. (2006). Teaching a computer to read Shakespeare - the problem of spelling variation. Invited talk at the OED Forum, Oxford University Press and Kellogg College, Oxford, 21 June 2006.
  9. Rayson, P. (2006) Falling foul of multiword expressions. Presented at the Workshop on Chinese Multi-word expressions and MT. China Centre for Information Industry Development (CCID), Beijing, P.R. China. June 9th 2006.
  10. Rayson, P. (2006) Moving from key words to key domains. Invited talk at the Chinese Academy of Social Sciences, Beijing, P.R. China. June 8th, 2006.
  11. Rayson, P. (2006) Automated semantic assistance for human translators. Invited talk at the Department of Chinese, Translation and Linguistics, City University of Hong Kong. June 5th, 2006.
  12. Dawn Archer and Paul Rayson (2006) Dealing with variation: spelling. Invited talk at SCOTS Symposium on Linguistic Variation and Electronic Projects. University of Glasgow, 28th April 2006.
  13. Paul Rayson (2005). Keywords are not enough. Presented at the joint 26th ICAME and 6th AAACL conference, Ann Arbor, Michigan May 12-15, 2005.
  14. Paul Rayson (2005) Right from the word go: identifying multi-word-expressions for semantic tagging. Invited talk at BAAL Corpus Linguistics SIG / OTA Workshop: Identifying and Researching Multi-Word Units. Thursday 21st April 2005, Oxford University Computing Services. (PDF version slides)
  15. Paul Rayson (2004). Keywords are not enough. Invited talk for JAECS (Japan Association for English Corpus Studies) at Chuo University, Tokyo, Japan, 27th November 2004. (PDF version slides)
  16. Paul Rayson, Scott Piao, Dawn Archer (2004). Modern and Historical Aspects of the UCREL Semantic Analysis System. Invited talk at the University of Sheffield, UK, 16th November 2004. (PDF version slides)
  17. Dawn Archer and Paul Rayson (2004). Using the UCREL automated semantic to investigate refugee literature. Invited talk at the Keywords workshop. February 5-6th, 2004, King's College London.
  18. Paul Rayson (2003). USAS: UCREL semantic analysis system. Invited talk at the Prague workshop on lexico-semantic classification and tagging. December 8-9th, 2003, Center for Computational Linguistics, Charles University.
  19. Paul Rayson (2003). UCREL: from LOB to FLOB and beyond. Invited talk at Northwestern University, Evanston, Chicago, USA. September 2003.
  20. Paul Rayson (2002). USAS: UCREL semantic analysis system. Invited talk at Daito Bunka University, Tokyo, Japan. February 2002. (HTML slides)
  21. Barbara Lewandowska-Tomaszczyk, Michael Oakes & Paul Rayson (2001). Annotated Corpora for Assistance with English-Polish Translation. Paper presented at Corpus Linguistics 2001, Lancaster University, UK, March 30-April 2, 2001. (PDF abstract)
  22. Rayson, P. (2001). Wmatrix: a web-based corpus processing environment. Software demonstration presented at ICAME 2001 conference, Université catholique de Louvain, Belgium. May 16-20, 2001. (PDF handout)
  23. Rayson, P., Garside, R., and Sawyer, P. (2000). Assisting requirements engineering with semantic document analysis. Poster presented at RIAO 2000 (Recherche d'Informations Assistée par Ordinateur, Computer-Assisted Information Retrieval) International Conference, Paris, France, April 12-14, 2000. (PNG version)
  24. Rayson, P., Leech, G., and Wilson, A. (2000). Large numbers and indexing the British National Corpus. Poster presented at ICAME 2000, Sydney, Australia, April 21-25, 2000. (PNG poster, Powerpoint slides).
  25. Rayson, P., Emmet, L., Garside, R., and Sawyer, P. (2000). The REVERE Project: Experiments with the application of probabilistic NLP to Systems Engineering. Presented at 5th International Conference on Applications of Natural Language to Information Systems (NLDB'2000). Versailles, France, June 28-30th, 2000. (Powerpoint slides)
  26. Lee, D. and Rayson, P. (2000). Xkwic on Linux: a powerful concordancer for research. Workshop presentation at TALC2000. (Fourth International Conference on Teaching and Language Corpora. English Department, University of Graz, Austria July 19-23, 2000) (Powerpoint slides, Handout Postscript version)
  27. Rayson, P. and Garside, R. (2000). Comparing corpora using frequency profiling. Presented at Comparing Corpora workshop, held in conjunction with the 38th annual meeting of the Association for Computational Linguistics (ACL 2000). 7th October 2000. Hong Kong University of Science and Technology (HKUST). (Powerpoint slides)
  28. Rayson, P. (1999). REVERE project. Corpus Research Group, Linguistics Department, Lancaster University, May 1999.
  29. Rayson, P., Garside, R., and Sawyer, P. (1999). Recovering Legacy Requirements. REFSQ'99 Fifth International Workshop on Requirements Engineering: Foundations of Software Quality, June 14-15 1999, Heidelberg, Germany. (local html version of slides)
  30. Rayson, P. (1999). UCREL: from LOB to REVERE. CSEG Away Day, Windermere, November 1999. (local html version of slides)
  31. Rayson, P. (1998). Xmatrix corpus retrieval program. Corpus Research Group, Linguistics Department, Lancaster University, November 1998.
  32. Rayson, P. (1998). The REVERE project. CSEG Away Day, Windermere, November 1998.
  33. Rayson, P. (1998). The REVERE project. Fourth SEBPC workshop on legacy systems Lincoln, University of Lincolnshire and Humberside, December 1998.
  34. Granger, S., and Rayson, P. (1997). Automatic profiling of learner texts. ICAME97 conference, Chester, 21-25 May 1997. (abstract)
  35. Leech, G., and Rayson, P. (1997). Social differentiation in the use of English vocabulary: some analyses of the conversational component of the British National Corpus. ICAME97 conference, Chester, 21-25 May 1997. (abstract)
  36. Rayson, P. (1997). The DEADA project. CSEG Away Day, Windermere, November 1997.
  37. Rayson, P., and Wilson, A. (1996). The ACAMRIT Semantic Tagging System: Progress Report. AISB Workshop on Language Engineering for Document Analysis and Recognition (LEDAR), Brighton, April 1996.
  38. Rayson, P., and Wilson, A. (1996). Anaphora in Market Research Interview Transcripts. Poster presentation at Discourse Anaphora and Anaphor Resolution Colloquium (DAARC96). Lancaster University, 17-18th July, 1996.
  39. Rayson, P. (1995). The ACAMRIT semantic tagging system. Corpus Research Group, Linguistics Department, Lancaster University. December 1995.
  40. Rayson, P. (1992). Automatic Content Analysis of Spoken Discourse. Invited talk, University of Limerick, Ireland. May 1992.
  41. Wilson, A. and Rayson, P. (1991). The Automatic Content Analysis of Spoken Discourse. 12th International Conference on English Language Research on Computerized Corpora (ICAME), Ilkley, May 1991.

Abstracts


Matrix: A statistical method and software tool for linguistic analysis through corpus comparison

Paul Rayson
Computing Department, Lancaster University

This thesis reports the development of a new kind of method and tool (Matrix) for advancing the statistical analysis of electronic corpora of linguistic data. First, we describe the standard corpus linguistic methodology, which is hypothesis-driven. The standard research process model is 'question - build - annotate - retrieve - interpret', in other words, identifying the research question (and the linguistic features) early in the study. In recent years corpora have been increasingly annotated with linguistic information. From our survey, we find that no tools are available which are data-driven on annotated corpora, in other words, a tool which assists in finding candidate research questions. However, Matrix is such a tool. It allows the macroscopic analysis (the study of the characteristics of whole texts or varieties of language) to inform the microscopic level (focussing on the use of a particular linguistic feature) as to which linguistic features should be investigated further. By integrating part-of-speech tagging and lexical semantic tagging in a profiling tool, the Matrix technique extends the keywords procedure to produce key grammatical categories and key concepts. It has been shown to be applicable in the comparison of UK 2001 general election manifestos of the Labour and Liberal Democratic parties, vocabulary studies in sociolinguistics, studies of language learners, information extraction and content analysis. Currently, it has been tested on restricted levels of annotation and only on English language data.

The Ph.D. was examined on Friday 20th December 2002 by Tony McEnery, Mike Scott and Eric Atwell. Their recommendation was that I be awarded the degree of Ph.D. forthwith.


Social Differentiation in the use of English Vocabulary: some analyses of the Conversational Component of the British National Corpus

Paul Rayson, Geoffrey Leech and Mary Hodges
UCREL (University Centre for Computer Corpus Research on Language)
Lancaster University,
Lancaster LA1 4YT,
United Kingdom.

In this article we undertake selective quantitative analyses of the demographically-sampled spoken English component of the British National Corpus (for brevity, referred to here as the Conversational Corpus). This is a subcorpus of c.4.5 million words, in which speakers and respondents are identified by such factors as gender, age, social group and geographical region. Using a corpus analysis tool developed at Lancaster University, we undertake a comparison of the vocabulary of speakers, highlighting those differences which are marked by a very high chi-squared value of difference between different sectors of the corpus according to gender, age and social group. A fourth variable, that of geographical region of the United Kingdom, is not investigated in this article, although it remains a promising subject for future research. (As background we also briefly examine differences between spoken and written material in the British National Corpus (BNC).) This study is illustrative of the potentiality of the Conversational Corpus for future corpus-based research on social differentiation in the use of language. There are evident limitations, including (a) the reliance on vocabulary frequency lists, and (b) the simplicity of the transcription system employed for the spoken part of the BNC. The conclusion of the article considers future advances in the research paradigm illustrated here.

Keywords: British National Corpus, spoken English vocabulary frequency, chi-squared test
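
As a rough illustration of the kind of chi-squared comparison the abstract above refers to, the sketch below computes a Pearson chi-squared value for a single word from a 2x2 contingency table (the word versus all other words, in two sectors of a corpus). It is not the Lancaster tool itself: the function name and the example counts are hypothetical, and no continuity correction is applied.

```python
def chi_squared_2x2(freq_a: int, total_a: int, freq_b: int, total_b: int) -> float:
    """Pearson chi-squared for one word across two corpus sectors.

    freq_a / freq_b: occurrences of the word in sector A / sector B.
    total_a / total_b: total number of word tokens in each sector.
    """
    # Observed counts: rows are sectors, columns are (word, all other words).
    observed = [
        [freq_a, total_a - freq_a],
        [freq_b, total_b - freq_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the word were used at the same rate in both sectors.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts, for illustration only (not taken from the BNC).
example_value = chi_squared_2x2(freq_a=20, total_a=100, freq_b=10, total_b=100)
```

Ranking every word in the vocabulary by this value, highest first, gives a list of the words whose frequencies differ most between the two sectors; a word used at the same rate in both scores zero.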


Automatic Profiling of Learner Texts

Sylviane Granger, Université Catholique de Louvain;
Paul Rayson, University of Lancaster

One way of characterising a language variety is by drawing up a word category profile. This method has been used in previous studies to bring out the distinctive features of various registers of English: learned and scientific English, American vs British English and spoken vs written English, etc. In this paper the method is applied to a corpus of advanced EFL argumentative writing extracted from the International Corpus of Learner English (ICLE) database. Using a lexical and grammatical frequency program we draw up a profile of EFL writing and compare it to that of comparable native speaker writing taken from the Louvain Corpus of Native English Essays (LOCNESS).

In a second stage, the main patterns of over- and underuse of word categories and lemmas displayed by EFL writing are interpreted stylistically in the light of the results of variability studies. Most of the distinctive features point in the same direction: they highlight the speech-like nature of learner writing. A comparison of our results with those of a corpus of writing by 15-year-old native speakers of English (Shimazumi and Berber Sardinha 1996) suggests that this stylistic deficiency may be common to all novice writers, whether native or non-native.

In the final section, we highlight the value of automatic profiling for ELT materials design and survey some of the avenues for future research opened up by this innovative technique.


Social differentiation in the use of English vocabulary: Some analyses of the conversational component of the British National Corpus

Geoffrey Leech and Paul Rayson
University of Lancaster

We report on a selective quantitative analysis of the demographically-sampled spoken English component of the British National Corpus (BNC). This is a subcorpus of c.4.5 million words, in which speakers are identified by such factors as gender, age, and social group. Using a corpus analysis tool for extraction and statistical analysis of data, we undertook a comparison of the vocabulary of speakers, highlighting those differences which are marked by a very high chi-squared value of difference between different groups of speakers in the corpus. As background we also briefly examined the differences between spoken and written material in the BNC. In spite of its limitations (especially reliance on word-frequency lists), this study illustrates the potentiality of the spoken part of the BNC for future corpus-based research on social differentiation in the use of language.


Versioning the Web

Andy Kirby, Paul Rayson, Tom Rodden, Ian Sommerville and Alan Dix +
Collaborative Systems Engineering Group, Computing Department, Lancaster University, Lancaster, LA1 4YR, UK
+ School of Computing, Staffordshire University, Stafford, ST18 0DG, UK
email: is@comp.lancs.ac.uk

Abstract

Currently the display of web pages centres upon the presentation of a single instance of each page. As the web evolves to become a long-term information store (e.g. with the increasing use of Intranets), there will be a need to provide mechanisms to manage versions of web pages. Users must be able to access predictable information (e.g. the last version which they accessed) and be able to know what page versions are available and the attributes of these versions. This is a report of some initial work in this area where we have explored some of the issues of versioning the web. We have devised a simple system which allows single web pages to be converted to a version set in a transparent way (i.e. all links continue to work). This system (V-Web) maintains and provides public and private access to a set of versions for web pages and includes facilities for both page readers and page authors.


Supporting Information Evolution on the WWW

I. Sommerville, T. Rodden, P. Rayson, A. Kirby and A. Dix +
Collaborative Systems Engineering Group, Computing Department, Lancaster University, Lancaster, LA1 4YR, UK
email: {is, tom, paul, ak}@comp.lancs.ac.uk
+ School of Computing, Staffordshire University, Stafford, ST18 0DG, UK
email: cmtajd@sable.soc.staffs.ac.uk

Abstract

This paper describes a simple versioning system which we have developed to support collaborative authoring on the WWW. The system provides facilities for versioning pages and allowing readers access to different versions of web pages. We discuss why such a system is needed and present the versioning model which we have developed. This model allows existing web pages to be converted to versioned entities without affecting links to those pages. Central to the model is a set of access control facilities which allow authors to provide co-authors with access to versions of pages under development. We describe the instantiation of this model and assess our work against the requirements identified in the first part of the paper.


Rayson, P., Garside, R., and Sawyer, P.

Language engineering for the recovery of requirements from legacy documents.
REVERE project report, Lancaster University, May 1999.

Legacy documents, such as requirements documents or manuals of business procedures, can sometimes offer an important resource for informing what features of legacy software are redundant, need to be retained or can be reused. This situation is particularly acute where business change has resulted in the dissipation of human knowledge through staff turnover or redeployment. Exploiting legacy documents poses formidable problems, however, since they are often incomplete, poorly structured, poorly maintained and voluminous. This report proposes that language engineering, using tools that exploit probabilistic natural language processing (NLP) techniques, offers the potential to ease these problems. Such tools are available, mature and have been proven in other domains. The document provides a review of NLP and a discussion of the components of probabilistic NLP techniques and their potential for requirements recovery from legacy documents. The report concludes with a summary of the preliminary results of the adaptation and application of these techniques in the REVERE project.