Transcription of LCCWP Projects

NOTE: This page is still in draft form!

The markup tags used in the LCCWP

SGML tag              Denotes
<P>...</P>            A "chunk" of text. Roughly corresponds to the notion of a paragraph, but also includes headings, captions, list items and chunks of text of indeterminate function.
<REG>...</REG>        Regularized spelling. In its current form, the corpus does not indicate the original spellings of regularized words, but we intend to add this information in the future.
<SIC>...</SIC>        Cases where there may be a degree of doubt about the accuracy of the words transcribed.
<GAP>                 Material omitted from the transcription, typically textual or visual material imported into the project from an external source.
<GAP desc="figure">   Graphic element (picture, drawing, photograph etc.) produced by the child.
<TABLE>...</TABLE>    Any clear use of a table containing rows and cells.
<ROW>...</ROW>        Row in a table.
<CELL>...</CELL>      Cell in a table.
<NAME key="...">      Anonymized name of a child in the corpus sample.
<CHPB desc="...">     Child's page number.
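
As an illustration only, the tag set above can be inventoried in a transcribed fragment with a short Python sketch. The function name count_tags and the sample text are invented for this example and are not part of the LCCWP tools or the corpus; the regular expression is a simplification that assumes tags are written as in the table.

```python
import re
from collections import Counter

# Matches an opening tag such as <P>, <GAP desc="figure"> or <NAME key="KH">.
# Closing tags such as </P> are not matched, because "/" is not a letter.
OPEN_TAG = re.compile(r"<([A-Za-z]+)\b[^>]*>")


def count_tags(text: str) -> Counter:
    """Count opening tags in `text`, keyed by upper-cased element name."""
    return Counter(m.group(1).upper() for m in OPEN_TAG.finditer(text))


# Invented sample for illustration; NOT taken from the corpus.
SAMPLE = (
    '<CHPB desc="1"><P>My project on <REG>animals</REG></P>'
    '<GAP desc="figure"><P>by <NAME key="KH"></P>'
)

if __name__ == "__main__":
    for tag, n in sorted(count_tags(SAMPLE).items()):
        print(tag, n)
```

Upper-casing the element name means that <name key="KH"> and <NAME key="KH"> are counted together.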

Notes on tag usage

Page numbering. Originally we used the child's own page numbers. This caused problems: children leave many blank pages (and it is generally not clear which of these are intentional), and they sometimes make page 1 the cover page, sometimes the inside cover, and sometimes the first page of 'text proper'. To be consistent, we decided to make page 1 the inside cover page in all instances and, if a cover page exists, to number it page 0.
The child's original page numbering is still included for reference, but it is not the primary label in the project index.
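
The numbering rule described above can be sketched in Python as follows. This is a minimal illustration, assuming the pages of a project are listed in physical order and that n_pages counts the cover page when there is one; the function name editorial_page_numbers is hypothetical.

```python
from typing import List


def editorial_page_numbers(n_pages: int, has_cover: bool) -> List[int]:
    """Return the page number for each physical page of one project.

    The cover page, if present, is numbered 0; the inside cover is
    always numbered 1, and later pages count upwards from there.
    """
    if n_pages < 0:
        raise ValueError("n_pages must not be negative")
    start = 0 if has_cover else 1
    return list(range(start, start + n_pages))


if __name__ == "__main__":
    print(editorial_page_numbers(4, has_cover=True))
    print(editorial_page_numbers(3, has_cover=False))
```

Blank pages receive a number like any other page under this sketch, so the numbering does not depend on deciding whether a blank page was intentional.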

Anonymisation tags. For example, <name key="KH"> refers to the child with the pseudonym Kyrah Hollingsworth. It is thus still possible to cross-reference a name in one project to a name in another project, but the real-life identity of that child is not disclosed.
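
The cross-referencing that the key attribute makes possible can be sketched in Python. This is an illustration under stated assumptions: the project identifiers, the sample texts and the key "ZZ" are invented, and the function name index_name_keys is hypothetical; only the key "KH" comes from the example above.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Matches <NAME key="..."> in either letter case and captures the key.
NAME_TAG = re.compile(r'<NAME\s+key="([^"]+)"\s*>', re.IGNORECASE)


def index_name_keys(projects: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each anonymisation key to the set of projects that mention it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for project_id, text in projects.items():
        for key in NAME_TAG.findall(text):
            index[key].add(project_id)
    return dict(index)


# Invented sample texts for illustration; NOT taken from the corpus.
PROJECTS = {
    "project_a": '<P>Written by <name key="KH"></P>',
    "project_b": '<P>I worked with <NAME key="KH"> and <NAME key="ZZ"></P>',
}

if __name__ == "__main__":
    for key, ids in sorted(index_name_keys(PROJECTS).items()):
        print(key, sorted(ids))
```

Because the index is built from keys alone, it links mentions of the same child across projects without touching any real name.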

Criteria underlying the Transcription

Our main concerns in developing transcription guidelines for LCCWP have been:

The transcription scheme is discussed in more detail in the paper: Smith, N., McEnery, A. and Ivanic, R. (1998) Issues in Transcribing a Corpus of Children's Handwritten Projects. Literary and Linguistic Computing, Vol. 13, No. 4. Oxford: OUP.

Updated: 04 April 2001.