The People's Daily Corpus is a one-million-word corpus of Mandarin Chinese released by the Institute of Computational Linguistics, Peking University. It contains one month of text from the People's Daily (January 1998). The corpus has been marked up in XML and converted to Unicode (UTF-8) with the support of the UK ESRC (award reference RES-000-23-0553). It could formerly be explored online using WebConc, but that service is no longer available. The tagset is documented separately.


Created and maintained by Richard Xiao 2004-2008