The People's Daily Corpus is a one-million-word corpus of Mandarin Chinese released by the Institute of Computational Linguistics, Peking University, and available at http://icl.pku.edu.cn/Introduction/corpustagging.htm. It contains one month of data from the People's Daily (January 1998). Thanks to support from the UK ESRC (Award reference RES-000-23-0553), the corpus has now been marked up in XML and converted to Unicode (UTF-8). It could previously be explored online using WebConc, but that online service is no longer available. The tagset used in the corpus was documented on the original page.


Created and maintained by Richard Xiao, 2004-2008