It is easy to explore the LCMC corpus using Xara. We have built two separate servers, one for the character version and one for the Pinyin version of the corpus. The default server is set for the standard character version. To select this server, simply press the OK button after activating the Xara client by running Sara32.exe from the Xara folder (Figure 1). To select the server for the Pinyin version, open the menu, choose "LCMC_Pinyin" and confirm by pressing OK (Figure 2):
Figure 1 Figure 2
All of the following operations are the same for the two versions except for one difference: type characters or Pinyin symbols as the search string depending upon the version of the LCMC you are using. In this section, we will use the default standard version for the purpose of demonstration. When a server is successfully selected, the client shows a list of corpus files available for that server (Figure 3).
Figure 3 Figure 4
There are several ways to explore the corpus with Xara. The simplest is to type a search word in the Quick query text box and press the Enter key. For more complex queries, however, we can use Query builder. Suppose we want to extract all instances of the verbal-final 了 -le (tagged as u) immediately followed (the link type defined as Next) by a noun (tagged as n) in sentence number 0010 in all of the 500 sample files in the 15 text categories. This complicated query can be made using the Query builder of Xara. First, define the scope node (the left node in Query builder that indicates the context to search in) as "0010" using the s element (Figure 4). In the query node (the right node in Query builder), select AddKey (POS) to define the first part of the query as 了 and select the POS tag u, then define the second part as Any and select the POS tag n. Then define the link type as Next (Figure 5). The search result is shown in Figure 6.
Figure 5 Figure 6
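The logic of this query can also be sketched outside Xara. The Python fragment below is only an illustration, not how Xara itself evaluates queries. It assumes that each sentence is an s element with its number in an n attribute, and that each token is a direct child of s carrying its tag in a POS attribute; these names, the file element and the sample sentences are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. The element and attribute names (file, s, w, n, POS)
# and the sentences are assumptions for illustration only.
SAMPLE = """
<file ID="A04">
  <s n="0009"><w POS="v">看</w><w POS="u">了</w><w POS="n">书</w></s>
  <s n="0010"><w POS="r">他</w><w POS="v">买</w><w POS="u">了</w><w POS="n">书</w><w POS="u">了</w><w POS="v">走</w></s>
</file>
"""

def find_le_noun(xml_text, sentence_no="0010"):
    """Return (file, sentence, key, next token) for 了/u directly followed by a noun."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for s in root.iter("s"):
        # Scope node: only sentences with the requested number.
        if s.get("n") != sentence_no:
            continue
        tokens = list(s)  # direct children, in document order
        # Link type "Next": compare each token with the one right after it.
        for first, second in zip(tokens, tokens[1:]):
            if (first.text == "了" and first.get("POS") == "u"
                    and second.get("POS") == "n"):
                hits.append((root.get("ID"), s.get("n"), first.text, second.text))
    return hits

hits = find_le_noun(SAMPLE)
print(hits)
```

Only the first 了 in sentence 0010 of the sample is reported, because the second one is followed by a verb and the match in sentence 0009 lies outside the scope.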
The upper part of the concordance window gives the query text (select Query – Query text from the main menu to display the query text) while the lower window displays the concordances. The status bar of the concordance window shows the name of the corpus, the partition or subcorpus (in this case Null, as we have not defined a partition), the current position of the pointer/mouse (i.e. concordance number 1), the total number of concordances (i.e. 25), the number of files in which the query is matched (10), the file name where the current concordance occurs (i.e. LCMC_A), and the file/sentence number for the current concordance (i.e. File A04 and sentence number sn0010). As we have searched in sentence number 0010 (in 500 sample files), this should be the sentence number for all of the concordances.
Compared with many other corpus tools, one advantage of Xara is that it displays complete sentences while also centering the search query. Users can also choose to display concordances in page mode (giving more context) or line mode (i.e. KWIC, as shown in Figure 6), and in XML or plain text. Additionally, users can define their own style sheet to display selected XML elements. Xara can also compute significant collocates automatically, using a statistic that the user selects from those available.
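To make the idea of line mode concrete, here is a minimal sketch of a KWIC layout in Python. It is not Xara's display code; the function name kwic_line, the width parameter and the sample tokens are all hypothetical. The left context is right-aligned and the right context left-aligned, so that the search key falls in the same column on every line.

```python
def kwic_line(left, key, right, width=20):
    """Format one concordance line with the key in a fixed column."""
    left = left[-width:]    # keep only the context nearest the key
    right = right[:width]
    return f"{left:>{width}} [{key}] {right:<{width}}"

# Hypothetical contexts, for illustration only.
for left, key, right in [("ta mai", "le", "shu"), ("women kan", "le", "dianying")]:
    print(kwic_line(left, key, right, width=10))
```

The alignment relies on every character having the same display width, so with Chinese characters the columns would only line up in a font or terminal that handles double-width characters consistently.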
Note: To explore the corpus using WordSmith version 4, load the concordancer with the corpus files in the folder \LCMC\character\texts or \LCMC\Pinyin\texts, and convert the encoding from UTF-8 to Unicode. Make sure that you have made a copy of your data before you do so, as the original texts will be replaced!
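If you prefer to prepare converted copies yourself, the following Python sketch reads each UTF-8 file and writes a re-encoded copy into a separate folder, leaving the originals untouched. It assumes that "Unicode" in WordSmith means UTF-16 with a byte order mark, which should be checked against the WordSmith documentation; the function name and the folder names in the usage line are hypothetical.

```python
from pathlib import Path

def convert_copy_to_utf16(src_dir, dst_dir):
    """Write a UTF-16 copy of every UTF-8 file in src_dir into dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(src_dir.iterdir()):
        if not src.is_file():
            continue
        # "utf-8-sig" also accepts files that start with a UTF-8 byte order mark.
        text = src.read_text(encoding="utf-8-sig")
        dst = dst_dir / src.name
        # Python's "utf-16" codec writes a byte order mark in native byte order.
        dst.write_text(text, encoding="utf-16")
        converted.append(dst)
    return converted

# Example call with hypothetical folder names:
# convert_copy_to_utf16(r"\LCMC\character\texts", r"\LCMC_utf16\character\texts")
```

Because the copies go to a new folder, the original texts are never overwritten.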