How to Use the LCPW Corpus

There are several ways to access the corpus data. Here we list the four most likely routes.

  1. Navigating by project series

  2. Navigating by child

  3. Downloading text transcriptions

  4. Downloading POS-tagged versions of the transcriptions

  1. Navigating by project series

    1. Select the Corpus Contents Page to see the list of available projects. Choose one of the series: Free Choice projects (4.1 and 4.2), Animals (5.1), Birds (5.3) or Free Choice (6.2). This will open a page containing a list of projects from that series, listed by title (on the right) and author(s) (on the left). See Figure 1 below. Each project is identified by the child's initials, followed by the year number, a full stop and a number indicating the sequence of the project within that year. Note that some of the projects are the work of more than one child. These have identifiers beginning with X.


      Figure 1: the list of Animals projects (series 5.1)

    2. When you follow the link to, say, "Butterflies (5.1)", an index to all the features of the Butterflies project by Kyrah Hollingsworth appears.


      Figure 2: Index page for a specific project (in this case, KH5.1)

      View Scans of the Original Pages: select a page number to go directly to a scanned image of the original version of that page, as produced by the child.[1]

      Browse the Text: opens a transcribed version of the child's original project text. The transcription uses a minimal set of SGML markup tags. These indicate, for example, the location of drawings and other graphics on the page, and regularized spellings.

      Physical characteristics: contains detailed notes on those features of the material page which could not be captured by the scanned images, for example size, texture, luminosity, and the tools and materials used to produce the project.

      Grammatical Tagging: this is a version of the text transcription file in which each word is annotated with a part-of-speech ("POS" or "wordclass") label, called a tag. The tag follows the word, linked to it by an underscore character. Examples of tags:

      NN1 = singular common noun, e.g. pencil, tiger, time, choice
      NN2 = plural common noun, e.g. pencils, tigers, times, choices
      VV0 = base form of lexical verb, e.g. give, play, forget
      VVZ = -s form of lexical verb, e.g. gives, plays, forgets
      CS = subordinating conjunction, e.g. when, because, although

      [ View the entire list of tags used ]

      Additional information: contains any available information relating to the project which did not seem to fit easily under any of the above categories.

    3. Viewing projects page by page.

      When you have selected a particular page to view, typically from the child's project index page (e.g. the KH5.1 index above), the options are as follows:

      The scan is a low-resolution image of the original page (JPEG format, approx. 5-18 KB). A higher-resolution version of the same image can be obtained by clicking on "Enlarged View" (JPEG format, approx. 30-60 KB). It will open automatically in a separate image viewer if your browser is configured to use one for JPEG files.

      The other options on this menu are more or less the same as in the previous menu. The main difference is that the links take you directly to the appropriate page in the data; for example, physical characteristics calls up page 9 of the file describing physical characteristics of Kyrah Hollingsworth's "Butterflies" project.
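      The project identifiers described in step 1 (child's initials, year number, a full stop, then a sequence number, as in KH5.1) are regular enough to be split mechanically. The following Python sketch is one possible way to do so. The function and class names are illustrative rather than part of the corpus, the identifier "XAB5.3" is invented for the example, and the rule that any identifier starting with X marks a joint project is taken at face value from the description above (a single child whose initials began with X would be misclassified).

```python
import re
from typing import NamedTuple


class ProjectId(NamedTuple):
    """Parts of a project identifier such as 'KH5.1' (illustrative type)."""
    initials: str
    year: int
    sequence: int
    multi_author: bool


# Capital letters, then the year number, a full stop, and the sequence number.
_ID_PATTERN = re.compile(r"^([A-Z]+)(\d+)\.(\d+)$")


def parse_project_id(identifier: str) -> ProjectId:
    """Split an identifier into initials, year and sequence number."""
    match = _ID_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a recognised project identifier: {identifier!r}")
    initials, year, sequence = match.groups()
    # Assumption: identifiers beginning with X denote work by several children.
    return ProjectId(initials, int(year), int(sequence), initials.startswith("X"))


if __name__ == "__main__":
    print(parse_project_id("KH5.1"))
    print(parse_project_id("XAB5.3"))  # hypothetical joint-project identifier
```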

  2. Navigating by child

    In the Corpus Contents page, select the name of a child from the list at the bottom of the page. This reveals a "profile page" for that child. Here is the profile of Kyrah Hollingsworth:


    Figure 4: profile of Kyrah Hollingsworth.

    From here you can explore the child's work longitudinally by selecting the names of projects from years 4, 5 and 6. You can return to the child's profile page wherever you see "Other projects by [child's initials]".

    The link "Ages and Years explained" shows how children's ages relate to school years in the British schooling system.

  3. Downloading text transcriptions

    The transcribed versions of the children's project texts can be downloaded individually or by series. The files are in SGML format, and we provide details of the transcription scheme that has been applied. The composite versions of all the projects are called all51text-core.sgm (series 5.1), all53text-core.sgm (5.3) and all62text-core.sgm (6.2).
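    The composite file names follow a visible pattern: "all", the series number without its full stop, the kind of file, then "-core.sgm". As a convenience, that pattern could be captured in a small Python helper like the sketch below. The function name is hypothetical, and the helper deliberately accepts only the three series for which composite files are named in this guide; the "pos" kind corresponds to the POS-tagged composites described in section 4.

```python
# Series for which this guide names a composite file.
KNOWN_SERIES = ("5.1", "5.3", "6.2")


def composite_filename(series: str, kind: str = "text") -> str:
    """Return the composite file name for a series, e.g. '5.1' -> 'all51text-core.sgm'."""
    if series not in KNOWN_SERIES:
        raise ValueError(f"no composite file is documented for series {series!r}")
    if kind not in ("text", "pos"):
        raise ValueError("kind must be 'text' or 'pos'")
    return f"all{series.replace('.', '')}{kind}-core.sgm"


if __name__ == "__main__":
    for series in KNOWN_SERIES:
        print(composite_filename(series), composite_filename(series, "pos"))
```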

  4. Downloading POS-tagged versions of the transcriptions

    The POS-tagged versions of the children's project texts can be downloaded individually or by series. The files are in SGML format, and we provide details of the tagging scheme that has been applied. The composite versions of all the projects are called all51pos-core.sgm (series 5.1), all53pos-core.sgm (5.3) and all62pos-core.sgm (6.2).
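    Because each tag is joined to its word by an underscore, a downloaded POS-tagged file can be broken into (word, tag) pairs with very little code. The Python sketch below shows one possible approach. It assumes that tokens are separated by whitespace and that anything between angle brackets is SGML markup to be skipped; neither assumption is confirmed by this guide, so check them against the tagging scheme documentation. The sample line is invented for illustration and is not taken from the corpus.

```python
import re
from typing import Iterator, Tuple

# Crude assumption: anything between '<' and '>' is SGML markup.
_MARKUP = re.compile(r"<[^>]*>")


def split_tagged_tokens(line: str) -> Iterator[Tuple[str, str]]:
    """Yield (word, tag) pairs from one line of word_TAG text."""
    for token in _MARKUP.sub(" ", line).split():
        if "_" not in token:
            continue  # not a word_TAG token
        # Split on the last underscore, in case the word itself contains one.
        word, tag = token.rsplit("_", 1)
        if word and tag:
            yield word, tag


if __name__ == "__main__":
    sample = "<p> when_CS tigers_NN2 play_VV0 </p>"  # invented example line
    print(list(split_tagged_tokens(sample)))
```

    A list of pairs like this can then be counted or filtered by tag, for example to collect every word labelled NN1 in a project.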


Footnotes

1: Although some of the children have numbered their pages, we have applied our own page numbering scheme to ensure consistent referencing throughout the corpus.