
Pre-Preparation

The Corpus

To prepare the corpus, the user pre-segments the text and places one item on each line of a text file. For example, for a study of the expression of semantic events:

Creating a DASD dataset
This section describes the knowledge required 
to create a DASD dataset.
A DASD dataset can be created 
by specifying NEW in the DISP parameter 
     of a DD statement. 
Alternatively, the DASD dataset can be created 
etc.
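The one-item-per-line layout can be produced with a short script. The following is a minimal sketch, not part of the WAG system: the function names `segment` and `write_corpus` are hypothetical, and splitting at sentence-final punctuation is only an assumed first pass. The appropriate unit of segmentation depends on the study, so the output would normally be corrected by hand.

```python
import re


def segment(text):
    """Split raw text into rough sentence-sized items (naive heuristic)."""
    # Collapse all whitespace, including the original line breaks.
    flat = " ".join(text.split())
    # Split after '.', '!' or '?' when whitespace follows (assumed criterion).
    parts = re.split(r"(?<=[.!?])\s+", flat)
    return [part for part in parts if part]


def write_corpus(items, path):
    """Write one item per line, the layout described for the corpus file."""
    with open(path, "w", encoding="utf-8") as handle:
        for item in items:
            handle.write(item + "\n")
```

A call such as `write_corpus(segment(raw_text), "corpus.txt")` would then give a file ready for hand correction; the file name is illustrative only.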

The Coding Scheme

The user must represent the coding scheme (the features of interest) as a system network. The network is entered in the format used for entering grammars in the WAG system. This input format is similar to that of the Penman text generation system (WAG can in fact read Penman-format systems):

(defsystem
  :name congruency
  :entry-condition semantic-event
  :features (clausal-event
             nominalised-event 
             adjectival-event))

The user provides a set of these systems, which together define a system network. These are read into the coder, which can then be used for semi-automated coding of the text corpus using this coding scheme.
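When a coding scheme contains many systems, the definitions can be generated from a simpler description rather than typed by hand. The sketch below is an illustration under stated assumptions: `format_system` is a hypothetical helper, not part of WAG, and it covers only the case shown in the example above, a single feature as entry condition and a flat list of features.

```python
def format_system(name, entry_condition, features):
    """Render one system in the defsystem layout shown in the example.

    Assumption: the entry condition is a single feature name.
    """
    # Align continuation lines under the first feature.
    indent = " " * len("  :features (")
    feature_block = ("\n" + indent).join(features)
    return (
        "(defsystem\n"
        f"  :name {name}\n"
        f"  :entry-condition {entry_condition}\n"
        f"  :features ({feature_block}))"
    )


# The system from the example, expressed as plain data.
congruency = format_system(
    "congruency",
    "semantic-event",
    ["clausal-event", "nominalised-event", "adjectival-event"],
)
```

Joining the output of several such calls with blank lines would give a file of system definitions.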

The features in the coding scheme can come from any linguistic level: for instance, intonational, grammatical, semantic, speech-function, or contextual (e.g., the gender of the speaker or the source of the text). These levels may be mixed freely within the coding scheme.

The user can use the Systemic Grapher, another module of the WAG system, to check that the coding scheme has been defined as intended. Figure 1 shows a part of a graph of a typical coding scheme.

Figure 1: A Partial Graph of a Coding Scheme
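Before graphing, a simple consistency check can catch misspelt feature names. The following sketch is not the Systemic Grapher and is no substitute for it. It assumes each system is held as a dictionary with hypothetical keys `entry` and `features`, and that every entry condition is a single feature. It reports entry conditions that no system introduces as a feature.

```python
def undefined_entry_conditions(systems, root):
    """Return entry conditions that are neither the root nor a feature
    introduced by some system (sorted, without duplicates)."""
    introduced = {root}
    for system in systems:
        introduced.update(system["features"])
    missing = {s["entry"] for s in systems if s["entry"] not in introduced}
    return sorted(missing)


# Hypothetical scheme: the second system misspells its entry condition.
scheme = [
    {"entry": "semantic-event",
     "features": ["clausal-event", "nominalised-event", "adjectival-event"]},
    {"entry": "clausal-evnt",
     "features": ["finite", "non-finite"]},
]
problems = undefined_entry_conditions(scheme, root="semantic-event")
```

Here `problems` contains the misspelt `clausal-evnt`, which would otherwise appear as a disconnected part of the graph.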

Mick O'Donnell
Thu Jan 25 17:20:03 GMT 1996
