The ZJU Corpus of Translational Chinese (ZCTC)

Since the 1990s, the rapid development of the corpus-based approach in linguistic investigation in general, and of multilingual corpora in particular, has brought new vigour to descriptive translation studies. As Laviosa (1998) observes, "the corpus-based approach is evolving, through theoretical elaboration and empirical realisation, into a coherent, composite and rich paradigm that addresses a variety of issues pertaining to theory, description, and the practice of translation." To date, corpus-based Descriptive Translation Studies (DTS) has been concerned primarily with describing translation as a product, by comparing corpora of translated and non-translated native texts in the target language, especially translated and native English. Most product-oriented translation studies attempt to uncover evidence to support or reject the so-called "translation universal" (TU) hypotheses. These hypotheses concern features of translational language as the "third code" of translation (Frawley 1984), which is assumed to differ from both the source and the target language.

To date, much product-oriented translation research has been based on the Translational English Corpus (TEC), which was designed specifically for studying English translated from a range of source languages. It is perhaps the only publicly available corpus of translational language. Most of the pioneering and prominent studies of translational English have been based on this corpus, and these studies have so far focused on the syntactic and lexical features of translated and original English texts. They have provided evidence supporting the hypotheses of "translation universals" (TUs) in translated English, e.g. simplification, explicitation, sanitisation and normalisation.

However, the term "translation universal" is highly debatable in the literature. Research on the features of translational language has so far been confined largely to English and closely related European languages, and the translation universals proposed to date have been identified on the basis of translational English, mostly translated from European languages. It is therefore possible that these linguistic features are not "universal" cross-linguistically but are instead specific to English and/or the genetically related languages that have been investigated. Clearly, if the reported features of translational language are to be generalised as translation "universals", the language pairs involved must not be restricted to English and closely related languages. Evidence from "genetically" distinct language pairs such as English and Chinese is undoubtedly more convincing.

The ZJU Corpus of Translational Chinese (ZCTC) was created with precisely this aim in mind. It is designed as a translational counterpart of the Lancaster Corpus of Mandarin Chinese (LCMC), a one-million-word balanced corpus representing native Mandarin Chinese. The LCMC and ZCTC corpora were built following comparable sampling criteria and the same sampling techniques, and they have been processed with the same tools to ensure maximum comparability.

The ZCTC corpus was created as part of our ongoing project A Corpus-Based Quantitative Study of Translational Chinese in English-Chinese Translation (July 2007 to February 2010), which is funded by the China National Foundation of Social Sciences (grant reference 07BYY011).

Richard Xiao



1. Corpus design

2. Corpus annotation

3. Corpus markup

4. Data sources

5. Credits

6. Availability