Andrew Hardie

 

A list of my publications

 

Title links either go to an online version of the text, or to further publication details if the text is not available online.

 


 

Books

 

McEnery, T and Hardie, A (forthcoming) Foundations of Corpus Linguistics.

 

McEnery, T, Hardie, A and Younis, N (eds) (forthcoming) Arabic Corpus Linguistics.

 

McEnery, T and Hardie, A (2012) Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.

 

Baker, P, Hardie, A and McEnery, T (2006) A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press.

 

 

Journal articles

 

Hu, X, Xiao, R and Hardie, A (forthcoming) How do English translations differ from native English writings? A multi-feature statistical model for linguistic variation analysis.

 

Semino, E, Demjén, Z, Demmen, J, Koller, V, Payne, S, Hardie, A and Rayson, P (forthcoming) The online use of ‘Violence’ and ‘Journey’ metaphors by cancer patients, as compared with health professionals: a mixed methods study.

 

Demmen, J, Semino, E, Demjén, Z, Koller, V, Hardie, A, Rayson, P and Payne, S (forthcoming) A computer-assisted study of the use of Violence metaphors for cancer and end of life by patients, family carers and health professionals.

 

Hardie, A and Ibrahim, WMA (forthcoming) Accessible corpus annotation for Arabic.

 

Gregory, I, Baron, A, Murrieta-Flores, P, Hardie, A and Rayson, P (forthcoming) Geographical Text Analysis: Using GIS to map and analyse large volumes of text.

 

Murrieta-Flores, P, Baron, A, Gregory, I, Hardie, A and Rayson, P (in press) Automatically analysing large texts in a GIS environment: The Registrar General’s reports and cholera in the nineteenth century. Transactions in GIS.

 

Hardie, A (2014) Modest XML for Corpora: Not a standard, but a suggestion. ICAME Journal 38: 73-103.

 

Hardie, A (2012) CQPweb – combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics 17 (3): 380-409. [alternative link]

 

Hardie, A, Lohani, R and Yadava, YP (2011) Extending corpus annotation of Nepali: advances in tokenisation and lemmatisation. Himalayan Linguistics 10 (1): 151-165.

 

Gregory, I and Hardie, A (2011) Visual GISting: Bringing together corpus linguistics and Geographical Information Systems. Literary and Linguistic Computing 26 (3): 297-314.

 

Hardie, A and McEnery, T (2010) On two traditions in corpus linguistics, and what they have in common. International Journal of Corpus Linguistics 15 (3): 384-394.

 

Hardie, A and Mudraya, O (2009) Collocational patterning in cross-linguistic perspective: adpositions in English, Nepali, and Russian. Arena Romanistica 4: 138-149.

 

Dunning, A., Gregory, I. and Hardie, A. (2009) Freeing up digital content: new research means new licenses. Serials 22 (2): 166-173.

 

Prentice, S and Hardie, A (2009) Empowerment and disempowerment in the Glencairn uprising: a corpus-based critical analysis of Early Modern English news discourse. Journal of Historical Pragmatics 10(1): 23-55.

 

Yadava, Y.P., Hardie, A., Lohani, R.R., Regmi, B.N., Gurung, S., Gurung, A., McEnery, T., Allwood, J., and Hall, P. (2008). Construction and annotation of a corpus of contemporary Nepali. Corpora 3(2): 213-225.

 

Koller, V., Hardie, A., Rayson, P. and Semino, E. (2008) Using a semantic annotation tool for the analysis of metaphor in discourse. Metaphorik.de 15. http://www.metaphorik.de/15/

 

Hardie, A (2008) A collocation-based approach to Nepali postpositions. Corpus Linguistics and Linguistic Theory 4(1): 19-62.

 

Hardie, A (2007) Part-of-speech ratios in English corpora. International Journal of Corpus Linguistics 12(1): 55-81.

 

Hardie, A (2007) From legacy encodings to Unicode: the graphical and logical principles in the scripts of South Asia. Language Resources and Evaluation 41(1): 1-25.

 

Baker, P, Hardie, A, McEnery, T, Xiao, R, Bontcheva, K, Cunningham, H, Gaizauskas, R, Hamza, O, Maynard, D, Tablan, V, Ursu, C, Jayaram, BD and Leisher, M (2004) Corpus linguistics and South Asian languages: corpus creation and tool development. Literary and Linguistic Computing 19(4): 509-524.

 

Hardie, A and McEnery, T (2003) The were-subjunctive in British rural dialects: marrying corpus and questionnaire data. Computers and the Humanities 37(2): 205-228.

 

 

Chapters in edited volumes

 

Hardie, A (in press) Corpus linguistics. In Allan, K (ed.) The Routledge Handbook of Linguistics. Routledge.

 

McEnery, T and Hardie, A (in press) Neo-Firthian corpus linguistics. In: Waugh, LR, Joseph, JE and Monville-Burston, M (eds) The Cambridge History of Linguistics. Cambridge: Cambridge University Press.

 

Gregory, I, Cooper, D, Hardie, A and Rayson, P (in press). Spatializing and analyzing digital texts: Corpora, GIS and places. In: Bodenhamer, D, Corrigan, J and Harris, TM (eds) Deep Maps and Spatial Narratives. Bloomington: Indiana University Press.

 

Gregory, I, Baron, A, Cooper, D, Hardie, A, Murrieta-Flores, P and Rayson, P (2014) Crossing boundaries: Using GIS in literary studies, history and beyond. In: Hueber, J and Mendes da Silva, A (eds) Keys for architectural history research in the digital era. Institut national d’histoire de l’art Actes de colloques. http://inha.revues.org/4931

 

Hardie, A (2014) XML encoding for spoken learner (and other) corpora: a modest approach. In: Ishikawa, S (ed.) Learner corpus studies in Asia and the world. Vol. 2. Papers from LCSAW2014, pp. 49-62. Kobe, Japan: School of Languages and Communication, Kobe University.

 

McEnery, T and Hardie, A (2013) The history of corpus linguistics. In: Allan, K (ed.) The Oxford Handbook of the History of Linguistics. Oxford University Press, pp. 727-746.

 

Hardie, A, McEnery, T, and Piao, SS (2010) A corpus-based approach to text reuse in the newsbooks of the Commonwealth. In: Dooley, B (ed.) The Dissemination of News and the Emergence of Contemporaneity in Early Modern Europe, pp. 251-286. Ashgate.

 

Hardie, A, Lohani, RR, Regmi, BR, and Yadava, YP (2009) A morphosyntactic categorisation scheme for the automated analysis of Nepali. In: Singh, R. (ed.) Annual Review of South Asian Languages and Linguistics 2009, pp. 171-196. Mouton de Gruyter.

 

Hardie, A and McEnery, T (2009) Corpus linguistics and historical contexts: text reuse and the expression of bias in early modern English journalism. In: Bowen, R, Mobärg, M and Ohlander, S (eds) (2009) Corpora and discourse – and stuff: papers in honour of Karin Aijmer, pp. 59-92. Gothenburg Studies in English 96. Göteborg: Acta Universitatis Gothoburgensis.

 

Hardie, A (2009) First language acquisition. In: Culpeper, J., Katamba, F., Kerswill, P., Wodak, R. and McEnery, T. (eds.) English Language: Description, Variation and Context, pp. 609-624. Houndmills: Palgrave.

 

Hardie, A (2009) Corpus linguistics and the languages of South Asia: some current research directions. In: Baker, P (ed.) Contemporary Approaches to Corpus Linguistics, pp. 262-288. Continuum.

 

Hardie, A, Baker, P, McEnery, T and Jayaram, BD (2006) Corpus-building for South Asian languages. In: Saxena, A and Borin, L (eds.) Lesser-known languages in South Asia: Status and Policies, Case Studies and Applications of Information Technology, pp. 211-242. Mouton de Gruyter.

 

Hardie, A and McEnery, T (2006) Statistics. In: Brown, K (ed.) Encyclopaedia of Language and Linguistics, 2nd edition, vol. 12: 138-146. Oxford: Elsevier.

 

Hardie, A (2005) Automated part-of-speech analysis of Urdu: conceptual and technical issues. In: Yadava, Y, Bhattarai, G, Lohani, RR, Prasain, B and Parajuli, K (eds.) Contemporary issues in Nepalese linguistics, pp. 48-72. Kathmandu: Linguistic Society of Nepal.

 

Hardie, A, Levin, E and Pęzik, P (2005) Analiza morfologiczno-składniowa korpusów (“Part-of-speech tagging”). In: Lewandowska-Tomaszczyk, B (ed.) Podstawy językoznawstwa korpusowego (“Foundations of Corpus Linguistics”). Łódź, Poland: Wydawnictwo Uniwersytetu Łódzkiego.

 

McEnery, T, Baker, JP and Hardie, A (2000a) Swearing and abuse in modern British English. In: Lewandowska-Tomaszczyk, B and Melia, PJ (eds.) PALC ’99: Practical Applications in Language Corpora, pp. 37-48. Peter Lang.

 

McEnery, T, Baker, JP and Hardie, A (2000b) Assessing claims about language use with corpus data – swearing and abuse. In: Kirk, J (ed.) Corpora Galore. Amsterdam: Rodopi. Reprinted in: Sampson, G and McCarthy, D (eds.) (2004) Corpus linguistics: readings in a widening discipline, pp. 45-55. London and New York: Continuum International.

 

 

Papers in peer-reviewed conference proceedings

 

Rupp, CJ, Rayson, P, Baron, A, Donaldson, C, Gregory, I, Hardie, A and Murrieta-Flores, P (2013) Customising geoparsing and georeferencing for historical texts. In: Proceedings of the 2013 IEEE International Conference on Big Data, pp. 59-62. [alternative link]

 

Michaud, A, Guillaume, S, Hardie, A and Toda, M (2012) Combining documentation and research: Ongoing work on an endangered language. In: Xiong, D et al. (eds.) Proceedings of IALP 2012 (2012 International Conference on Asian Language Processing), pp. 169-172. Hanoi, Vietnam: MICA Institute, Hanoi University of Science and Technology. [alternative link]

 

Evert, S and Hardie, A (2011) Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium. In: Proceedings of the Corpus Linguistics 2011 conference. University of Birmingham, UK.

 

Hardie, A (2007) Collocational properties of adpositions in Nepali and English. In: Proceedings of the Corpus Linguistics 2007 conference.

 

Hardie, A, Koller, V, Rayson, P and Semino, E (2007) Exploiting a semantic annotation tool for metaphor analysis. In: Proceedings of the Corpus Linguistics 2007 conference.

 

Semino, E, Koller, V, Hardie, A and Rayson, P (2005) A computer-assisted approach to the analysis of metaphor variation across genres. In: Barnden, J, Lee, M, Littlemore, J, Moon, R, Philip, G and Wallington, A (eds.) Corpus-based approaches to figurative language: a Corpus Linguistics 2005 colloquium, pp. 145-154. Birmingham: University of Birmingham Cognitive Science Research Papers.

 

Xiao, Z, McEnery, T, Baker, P and Hardie, A (2004) Developing Asian language corpora: standards and practice. In: Proceedings of the 4th Workshop on Asian Language Resources, Sanya, China.

 

Hardie, A (2003) Developing a tagset for automated part-of-speech tagging in Urdu. In: Archer, D, Rayson, P, Wilson, A, and McEnery, T (eds.) (2003) Proceedings of the Corpus Linguistics 2003 conference. UCREL Technical Papers Volume 16. Department of Linguistics, Lancaster University.

 

Archer, D, McEnery, T, Rayson, P and Hardie, A (2003) Developing an automated semantic analysis system for Early Modern English. In: Archer, D, Rayson, P, Wilson, A, and McEnery, T (eds.) (2003) Proceedings of the Corpus Linguistics 2003 conference. UCREL Technical Papers Volume 16. Department of Linguistics, Lancaster University.

 

Baker, P, Hardie, A, McEnery, T and Jayaram, BD (2003) Constructing corpora of South Asian languages. In: Archer, D, Rayson, P, Wilson, A, and McEnery, T (eds.) (2003) Proceedings of the Corpus Linguistics 2003 conference. UCREL Technical Papers Volume 16. Department of Linguistics, Lancaster University.

 

Baker, P, Hardie, A, McEnery, T and Jayaram, BD (2003) Corpus data for South Asian language processing. In: Proceedings of the EACL Workshop on South Asian Languages, Budapest.

 

Baker, P, Hardie, A, McEnery, T, Cunningham, H and Gaizauskas, R (2002) EMILLE, a 67-million word corpus of Indic languages: data collection, markup and harmonisation. In: Proceedings of LREC 2002.

 

 

Book reviews

 

Hardie, A (2013) Review of: Vander Viana, Sonia Zyngier and Geoff Barnbrook (eds.). 2011. Perspectives on Corpus Linguistics. Amsterdam and Philadelphia: John Benjamins. ICAME Journal 37: 266-271.

 

Hardie, A (2005) Review of: Lars Borin (ed). 2002. Parallel corpora, parallel worlds. Selected papers from a symposium on parallel and comparable corpora at Uppsala University, Sweden, 22–23 April, 1999. Amsterdam: Rodopi. Languages in Contrast 5(2): 291-296.

 

 

Edited conference proceedings

 

Hardie, A and Love, R (eds.) (2013) Corpus Linguistics 2013: Abstract Book. Lancaster: UCREL.

 

Rayson, P, Wilson, A, McEnery, T, Hardie, A and Khoja, S (eds.) (2001) Proceedings of the Corpus Linguistics 2001 conference. UCREL Technical Papers Volume 13 Special Issue. Department of Linguistics, Lancaster University.

 

Baker, P, Hardie, A, McEnery, T and Siewierska, A (eds.) (2000) Proceedings of the Third Discourse Anaphora and Reference Resolution Colloquium (2000). UCREL Technical Papers Volume 12 Special Issue. Department of Linguistics, Lancaster University.

 

 

Unpublished PhD thesis

 

Hardie, A (2004) The computational analysis of morphosyntactic categories in Urdu. Unpublished PhD thesis, Department of Linguistics and English Language, Lancaster University.

 

 

Talks and conference presentations

 

June 2014: Extending a corpus analysis tool to support the analysis of field data. Talk at the Department of Linguistics, University of Ghana.

 

June 2014: Yesterday, Today, Towards Tomorrow (with Tony McEnery). Plenary presentation at the IVACS 2014 conference, Newcastle University.

 

May 2014: Rethinking basic statistical techniques in corpus analysis. Plenary presentation at the International Symposium on Learner Corpus Studies in Asia and the World (LCSAW) 2014, Kobe University, Japan.

 

May 2014: XML encoding for spoken learner (and other) corpora: a modest approach. Plenary presentation at the International Symposium on Learner Corpus Studies in Asia and the World (LCSAW) 2014, Kobe University, Japan.

 

May 2014: Statistical identification of keywords, lockwords and collocations as a two-step procedure. Presentation at the ICAME 35 conference, University of Nottingham.

 

March 2014: Analysing EEBO-TCP as an annotated corpus. Invited talk at the Sheffield Centre for Early Modern Studies, University of Sheffield.

 

March 2014: The applicocausative voice in dialectal and standard Javanese: a corpus-based analysis (with Noor Malihah). Presentation at the Second Asia Pacific Corpus Linguistics Conference (APCLC 2014), Hong Kong Polytechnic University.

 

February 2014: The affordances of corpus analysis software in approaching EEBO-TCP. Invited presentation at the Northern Renaissance Seminar ‘To set the word against the word’: new directions in early modern textual analysis, Lancaster University.

 

January 2014: Using version control software for corpus construction. ESRC Centre for Corpus Approaches to Social Science Technical Presentation, Lancaster University.

 

September 2013: Transforming EEBO-TCP into a Corpus (with Paul Rayson, Alistair Baron). Presentation at the EEBO-TCP 2013 conference, University of Oxford.

 

June 2013: Annotation and analysis of Early Modern English corpus data. Invited presentation at the Contested Words: The Digital Analysis of Early Modern Texts workshop, University of Warwick.

 

June 2013: The statistics of collocation: basic principles and potential problems. Invited talk at the University of Sheffield.

 

May 2013: Applying cluster analysis to the problem of text-type classification (with Ghada Mohamed). Invited talk at the Institute of the Czech National Corpus, Charles University, Prague.

 

May 2013: Annotation and analysis: an overview of tools and techniques. Invited talk at the Institute of the Czech National Corpus, Charles University, Prague.

 

May 2013: Spatial analysis of corpus data using Geographical Information Systems. Invited talk at the University of Erlangen-Nuremberg.

 

April 2013: Annotation and analysis of Early Modern corpus data. Invited presentation at Giornata di Studi – Corpus Linguistics and Historical Corpora, University of Florence.

 

February 2013: Wrangling large-scale data for specialised corpora. Invited presentation at the BAAL Corpus Linguistics SIG Symposium on Building and Mining Small Specialised Corpora, University of Edinburgh.

 

January 2013: Approaching text typology through cluster analysis in English and Arabic corpora (with Ghada Mohamed). Presentation at the LSB2013 conference, Brussels.

 

September 2012: Prerequisites to a corpus-based analysis of EEBO-TCP (with Alistair Baron). Presentation at the EEBO-TCP 2012 conference, University of Oxford.

 

September 2012: Which ‘Lancaster’ do you mean? Disambiguation challenges in extracting place names for Spatial Humanities (with Paul Rayson and Alistair Baron). Presentation at the Digital Humanities Congress conference 2012, University of Sheffield.

 

January 2012: Modest XML for Corpora. Presentation to the UCREL Corpus Research Seminar, Department of Linguistics, Lancaster University.

 

July 2011: Research ethics in corpus linguistics (with Tony McEnery). Presentation at the CL2011 conference, University of Birmingham.

 

July 2011: Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium (with Stefan Evert). Presentation at the CL2011 conference, University of Birmingham.

 

May 2011: The conceptual convergence of functional-cognitive theory and neo-Firthian linguistics (with Tony McEnery). Presentation at the ICAME 32 conference, Oslo.

 

May 2011: The internal gradience of the adposition category: some evidence from comparable corpora of English, Nepali and Russian. Presentation at the ICAME 32 pre-conference workshop on Corpus-Based Contrastive Analysis, Oslo.

 

November 2010: Extending a corpus analysis tool to support the analysis of field data: CQPweb and minority languages of South Asia. Presentation to the UCREL Corpus Research Seminar, Department of Linguistics, Lancaster University.

 

November 2010: Invited panel presentation at the 5th Chicago Colloquium on Digital Humanities and Computer Science, Northwestern University, Chicago.

 

September 2010: An introduction to CQPweb (and its application to the lesser-studied languages of the world). Invited talk at CNRS, Paris.

 

September 2010: Extending a corpus analysis tool to support the analysis of field data: Bodo and Dimasa data in the CQPweb system. Presentation at the 16th Himalayan Languages Symposium, School of Oriental and African Studies, London.

 

October 2009: Collocational patterning in cross-linguistic perspective: adpositions in English, Nepali, and Russian. Presentation at the 28th International conference on Lexis and Grammar, University of Bergen.

 

July 2009: Corpus evidence and the internal gradience of grammatical categories in Nepali. Presentation at the 15th Himalayan Languages Symposium, University of Oregon.

 

May 2009: CQPweb – combining power, flexibility and usability in a corpus analysis tool. Presentation at the ICAME 30 conference, Lancaster University.

 

September 2008: Visual GISting: Merging Corpus Linguistics and Geographical Information Systems (with Ian Gregory). Presentation at the Digital Resources for the Humanities and Arts conference 2008 (DRHA08), University of Cambridge.

 

June 2008: Text reuse and ideology: tracing duplicates and variants in the news discourse of seventeenth-century England. Presentation at the 4th international IVACS conference, University of Limerick.

 

May 2008: Computer-assisted metaphor analysis using key semantic domains (with Veronika Koller, Paul Rayson, and Elena Semino). Presentation at the Researching and Applying Metaphor conference (RaAM 7), Cáceres, Spain.

 

December 2007: Mentions in time & space: extracting and visualizing report impact from a corpus of newsbook text. Presentation at the Places of News conference, Jacobs University Bremen.

 

July 2007: Collocational properties of adpositions in Nepali and English. Presentation at the CL2007 conference, University of Birmingham.

 

July 2007: Exploiting a semantic annotation tool for metaphor analysis (with Paul Rayson, Veronika Koller, Elena Semino). Presentation at the CL2007 conference, University of Birmingham.

 

June 2007: Historical text mining applied to Early Modern English Literature. Presentation (jointly with Stephen Pumfrey) at the workshop on “The Electronic Revolution in Textual Analysis”, Institute for Advanced Studies, Lancaster University.

 

May 2007: Quantifying syntactic structures for keyness analysis. Presentation at the ICAME-28 conference, Stratford-upon-Avon.

 

May 2007: Collocational patterns around prepositions in English. Presentation at Madan Puraskar Pustakalaya, Kathmandu, Nepal.

 

April 2007: The Lancaster Newsbooks Corpus: construction and analysis. Presentation at the University of Florence, Italy.

 

February 2007: Prepositions in English: some thoughts towards a collocation-based approach to grammatical categorisation. Presentation at the School of English, University of Liverpool.

 

December 2006: Historical text mining: corpus-based approaches to the newsbooks of the Commonwealth. Presentation at the workshop on “Time and Space on the Way to Modernity: The Emergence of Contemporaneity in European Culture”, International University Bremen.

 

December 2006: Corpora and the languages of South Asia. Presentation to Working Group 1 of COST Action A31 on “Stability and adaptation of classification systems in a cross-cultural perspective”, at AKNOA, Humboldt-Universität, Berlin.

 

February 2006: A collocation-based approach to Nepali postpositions. Presentation to the Research Issues in Theoretical Linguistics group, Department of Linguistics, Lancaster University.

 

November 2005: Exploiting the Nepali National Corpus: postpositions and collocational patterns. Presentation at the Conference of the Linguistic Society of Nepal, Kathmandu.

 

November 2005: Automated part-of-speech analysis of Urdu: conceptual and technical issues. Presentation at the Conference of the Linguistic Society of Nepal, Kathmandu.

 

November 2005: Creating and analysing a corpus of Nepali. Presentation to the Corpus Research Group, Department of Linguistics, Lancaster University.

 

September 2005: Foundational issues for corpus linguistics and the languages of South Asia. Presentation to the Department of Linguistics, University of Gothenburg.

 

July 2005: How common is a noun? Part-of-speech ratios in English. Presentation at the CL2005 conference, University of Birmingham.

 

June 2005: Approaching part-of-speech tagging: manual and automatic analysis. Presentation at Madan Puraskar Pustakalaya, Kathmandu, Nepal.

 

February 2005: Written corpora: design and data collection. Unicode, XML and XCES: corpus encoding and mark-up. Corpus annotation. Presentations at Madan Puraskar Pustakalaya, Kathmandu, Nepal.

 

March 2004: Data and software resources for natural language processing in the South Asian languages. Presentation at EuroIndia 2004 conference, New Delhi.

 

March 2004: Tagging a new language: a case study in Urdu. Presentation at the University of Łódź, Poland.

 

March 2003: Developing a tagset for automated part-of-speech tagging in Urdu. Presentation at the CL2003 conference, Lancaster University.

 

October 2002: A part-of-speech tagset for Urdu. Presentation at the BAAL/CUP Seminar on Researching the Indic Languages Diaspora in Britain, University of York.

 

April 2002: A part-of-speech tagset for Urdu. Presentation to the Corpus Research Group, Department of Linguistics, Lancaster University.