Research themes: My research interests are in the area of corpus linguistics (and related subjects within natural language processing and language engineering) and the application of corpus-based methods in word frequency dictionaries, semantic analysis, information extraction, systems engineering and decision management.

Funded research projects

As investigator:
2008-11 Isis: Protecting children in online social networks, EPSRC, Co-investigator (with Prof. Awais Rashid, Dr. Daniel Hughes, Dr. James Walkerdine, Prof. Geoff Coulson, Prof. Corine May-Chahal, Lancaster University; Dr. Matt Jones, Swansea University and Dr. Penny Duquenoy, Middlesex University), 437,162.
2007 Variability in child language: A feasibility and pilot study on the exploitation of the Child Language Survey. Lancaster University small grant scheme, Co-investigator (with Kate Cain and Katie Alcock, Psychology; Andrew Hardie and Sebastian Hoffmann, Linguistics), 9,719.
2005-07 Changing English Across the Twentieth Century: a corpus-based study (Lancaster1901), Leverhulme Trust, Principal Investigator (with G. Leech of Lancaster University & Martin Wynne of Oxford University), 82,111.
2005-07 Automated semantic assistance for translators (ASSIST), EPSRC, Co-investigator (with R. Garside of Lancaster University and Tony Hartley of University of Leeds), 185,161.
2006 Workshop on Historical Text Mining, AHRC ICT Methods Network, Co-organiser (with D. Archer, University of Central Lancashire), 4,996.
2006 Building an English-Chinese Domain-Comparable Corpus, British Academy, Principal Investigator (with Q. Yuan of China Centre for Information Industry Development, Beijing, P.R. China), 7,485.
2005 Scragg revisited: a quantitative investigation of spelling variation across the centuries, British Academy, Acting Co-investigator (with D. Archer of the University of Central Lancashire), 7,321.
2004-05 Unlocking the Word Hoard, Andrew W. Mellon Foundation, Co-applicant (with M. Mueller, Northwestern University, USA), $212,761.
2002-05 Benedict: The New Intelligent Dictionary, European Commission, IST-2001-34237 (with T. McEnery of Lancaster University, Kielikone Oy, Harper Collins Publishers Ltd, Gummerus Kustannus Oy, University of Tampere, Nokia), 402,693.
2004 Collection of the Corpus of Professional English (COPE), Daiwa Anglo-Japanese Foundation, Co-applicant (with Y. Tono of Meikai University, Tokyo), 2000.
2003-04 Extending CLAWS, Lancaster University Small Grant, Co-applicant (with R. Garside, Computing), 5000.
2003-04 Development and Validation of a Linguistic Corpus for Content Analysis of Entrepreneurship / Small-Business Documents, Lancaster University Small Grant, Co-applicant (with F. Cave, Management School), 4000.

2004-05 Corpus of Professional English: a Professional English Research Consortium project (Shogakukan Inc., Tokyo, Japan). This work involved collection, PDF conversion, clean-up and annotation.

As researcher:
Towards an Online Conceptual Database of the Latin Vulgate Bible
Scragg revisited
Tracker (project manager) (May 2001-October 2004)
Benedict (March 2002-2005)
REVERE (May 1998-April 2001)
DEADA (March 1997-April 1998)
Vweb (August 1996-February 1997)

Professional Activities

  1. Production editor of the Corpora Journal published by Edinburgh University Press.
  2. Production editor of the ICAME Journal published with the University of Bergen, Norway.
  3. Member of Advisory board for ICAME (International Computer Archive of Modern and Medieval English).
  4. Professional memberships: IEEE, IEEE Computer Society, ACL (Association for Computational Linguistics), ALLC (Association for Literary & Linguistic Computing).
  5. Co-editor (with Mark Davies, Brigham Young University) of the Routledge Frequency Dictionaries book series.
  6. Co-organiser of: (a) 1st Corpus Linguistics conference CL2001 (March 2001, Lancaster, UK), (b) 2nd Corpus Linguistics conference CL2003 (March 2003, Lancaster, UK), (c) 3rd Corpus Linguistics conference CL2005 (July 2005, Birmingham, UK), (d) Digital Resources for the Humanities conference (DRH2005) (September 2005, Lancaster, UK), (e) EACL06 workshop on Multiword expressions in a multilingual context (April 2006, Trento, Italy), (f) LREC06 workshop on Language Resources for Translation (LR4Trans-III) (May 2006, Genoa, Italy), (g) Workshop on Chinese Multi-word expressions and MT (June 2006, Beijing, P.R. China), (h) Linguistics Expert Seminar for AHDS E-Science Scoping Study (acting as chair, July 2006, London, UK), (i) Workshop on Historical Text Mining (July 2006, Lancaster, UK), (j) 4th Corpus Linguistics conference CL2007 (July 2007, Birmingham, UK), (k) Workshop on Corpus Linguistics & Machine Translation Applications (12-13 August 2008, CCID, Beijing, P.R. China), (l) eLexicography in the 21st century (22-24 October 2009, Louvain-la-Neuve, Belgium), (m) 30th Annual Conference of the International Computer Archive for Modern and Medieval English ICAME30 (May 2008, Lancaster, UK), (n) 5th Corpus Linguistics conference CL2009 (July 2009, Liverpool, UK).
  7. Member of programme (or scientific) committee for (a) International Symposium on Learner Corpora in Asia (March 2004, Tokyo, Japan), (b) 1st International Workshop on Natural Language Understanding and Cognitive Science (NLUCS2004) collocated with ICEIS 2004 (April 2004, Porto, Portugal), (c) 2nd International Workshop on Natural Language Understanding and Cognitive Science (NLUCS2005) collocated with ICEIS 2005 (May 2005, Miami, USA), (d) Phraseology2005 (Louvain-la-Neuve, Belgium, October 2005), (e) TALN 2006 (Leuven, Belgium, April 2006), (f) 3rd International Workshop on Natural Language Understanding and Cognitive Science (NLUCS2006) collocated with ICEIS 2006 (May 2006, Paphos, Cyprus), (g) Association for Computational Linguistics (ACL07) poster/demo session (Prague, Czech Republic, June 2007), (h) 4th International Workshop on Natural Language Processing and Cognitive Science (NLPCS-2007) collocated with ICEIS 2007 (June 2007, Madeira, Portugal), (i) Corpus Linguistics 2007 Colloquium "Towards a reference corpus of web genres" (July 2007, Birmingham, UK), (k) Language Resources and Evaluation Conference, LREC2008 (May 2008, Marrakech, Morocco), (l) International symposium on Using Corpora in Contrastive and Translation Studies (September 2008, Zhejiang University, China).
  8. Proposal Reviewer for ESRC and AHRC research councils in the UK.
  9. Reviewer for the following journals: Computer Speech and Language, Language Resources and Evaluation, IEEE Transactions on Professional Communication.
  10. External examiner for (a) MSc (by research) at University of Leeds, UK (June 2004), (b) PhD at University of Leeds, UK (June 2005).