Looking at text re-use in a corpus of seventeenth-century news reportage
Many scholars who have examined the newsbooks of the seventeenth century
have supposed that newswriters
indulged in a great deal of text re-use. This may mean the verbatim
quoting of a source document in more than one newsbook; it may also refer
to less direct forms of duplication. The issue of text re-use in this
period is a complicated one. Many writers were responsible for more than
one periodical. Successful newsbooks might be duplicated or emulated in
a number of ways by other hands - as reprints, as counterfeits, or as
imitations, to follow the classification used in the New Cambridge Bibliography
of English Literature. Furthermore, as with today's newspapers, even totally
independent publications were still reporting the same events; for instance,
speeches in Parliament.
Using a corpus of text stored in machine-readable format (i.e. not as graphical scans), it is possible to compare two documents swiftly and accurately, and to produce very quickly a quantitative evaluation of their similarity - that is, the extent to which one has copied from another, or to which both have utilised the same third source. To this end, a project (funded by the British Academy) was begun here at the University of Lancaster with the following two aims:
The period selected was that from December 1653 to May 1654. This period is of historical as well as linguistic interest, since it corresponds to the beginning of Cromwell's Protectorate. At this time much of England's attention was focussed on happenings in Scotland, where a Royalist uprising under Glencairn was threatening Cromwell's rule. A historical summary of the Glencairn Uprising, prepared for this project by Helen Baker, can be downloaded from this site.
Other items much in the news at this time included a peace treaty being negotiated with the Netherlands, and an embassy to the Queen of Sweden.
The corpus consists of approximately 800,000 words of running text drawn from all the newsbooks present in the Thomason Tracts that were published during the period in question. These documents were typed in an SGML-compatible format by transcribers during the middle part of 2002 (see also here for a description of the encoding scheme). It was then necessary to identify an appropriate method for examining the text re-use in the corpus.
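To make the idea of a quantitative similarity score concrete, here is a minimal sketch in Python of one simple measure: the proportion of word trigrams in one document that also occur in another. This is an illustration only, and not necessarily the method adopted by the project; the function names `word_ngrams` and `containment`, the choice of trigrams, and the sample sentences are all assumptions made for the example. It also ignores the spelling variation typical of seventeenth-century printing, which a real comparison would need to handle.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of word n-grams in a text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(candidate, source, n=3):
    """Proportion of the candidate's n-grams that also appear in the source.

    A score near 1.0 suggests the candidate largely repeats the source;
    a score near 0.0 suggests little shared wording.
    """
    candidate_grams = word_ngrams(candidate, n)
    if not candidate_grams:
        return 0.0  # text too short to contain any n-gram
    shared = candidate_grams & word_ngrams(source, n)
    return len(shared) / len(candidate_grams)


# Invented sample sentences, not drawn from the corpus.
report_a = "The Lord Protector received the Ambassadors at Whitehall this day."
report_b = "This day the Lord Protector received the Ambassadors with great state."

print(round(containment(report_a, report_b), 2))
```

Note that the score is asymmetric: it measures how much of the first text is found in the second, so a short extract lifted from a long newsbook scores highly in one direction and low in the other. When two texts score highly against each other without either being the obvious original, that pattern is compatible with both drawing on the same third source.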