The PH Corpus is a corpus of Mandarin Chinese containing about 2.4 million words of newswire text published by Xinhua News Agency in 1990-1991. The corpus was compiled by Guo Jin. The segmented version of the corpus is available at ftp://ftp.cogsci.ed.ac.uk/pub/chinese. The corpus has now been part-of-speech tagged (tagset) and made accessible online through our web-based concordancer, with the support of our ESRC-funded project (Award Reference RES-000-23-0553).


Created and maintained by Richard Xiao, 2004-2007