Next: Preserving Coherence in Up: Variable Length On-Line Document Previous: Introduction

Variable-Length Document Presentation

Any document marked up for RST can be used for variable-length document presentation. This section describes the process whereby the RST structure is pruned to produce a document of suitable length.

Assigning Relevance Scores to Text Nodes

As described in the introduction, the basic mechanism involves assigning each structural relation a relevance score between 0.0 and 1.0. For instance, ELABORATION may have a score of 0.40 (low relevance), while PURPOSE might be scored more highly.

By an RST-tree, I mean a tree with the top-nucleus as its root, its satellites hanging off it, and their satellites hanging off them in turn. Our task is then to prune branches from this tree. The top-nucleus has a relevance value of 1.0 (maximum relevance).

Through a process of recursive descent, we assign each node in the tree the relevance level of its parent, multiplied by the relevance score of the relation which connects it to that parent. For instance, an ELABORATION of the top-nucleus would have relevance 0.4 (1.0 * 0.4), while an ELABORATION of that node would have relevance 0.16 (0.4 * 0.4). Nodes lower in the RST-tree (less nuclear) will thus have lower relevance than higher nodes (more nuclear), and will be the first to be pruned.
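The recursive descent can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's actual code: the `Node` class, the `assign_relevance` function and the PURPOSE weight are hypothetical, and only the ELABORATION score of 0.4 is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative relation scores. ELABORATION = 0.4 follows the text;
# the PURPOSE value is an assumption made for this sketch.
RELATION_WEIGHTS: Dict[str, float] = {"ELABORATION": 0.4, "PURPOSE": 0.8}


@dataclass
class Node:
    """A text node in a (hypothetical) RST-tree representation."""
    text: str
    relation: Optional[str] = None          # relation linking node to its parent
    satellites: List["Node"] = field(default_factory=list)
    relevance: float = 0.0


def assign_relevance(node: Node,
                     parent_relevance: float = 1.0,
                     weights: Dict[str, float] = RELATION_WEIGHTS) -> None:
    """Set node.relevance to parent relevance times relation score, recursively."""
    if node.relation is None:
        # The top-nucleus has no parent relation and gets maximum relevance.
        node.relevance = 1.0
    else:
        node.relevance = parent_relevance * weights[node.relation]
    for satellite in node.satellites:
        assign_relevance(satellite, node.relevance, weights)


# The worked example from the text: an elaboration of an elaboration.
top = Node("top-nucleus", satellites=[
    Node("first elaboration", "ELABORATION", [
        Node("second elaboration", "ELABORATION"),
    ]),
])
assign_relevance(top)
```

After the call, the three nodes carry relevance 1.0, 0.4 and (up to floating-point rounding) 0.16, matching the figures given above.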

This is a simple mechanism, but it has shown good results in producing reasonable texts at any degree of verbosity. It is easy to see that an elaboration of an elaboration will in most cases be less essential to a text than the elaboration itself.

However, there are some cases where this method breaks down: nuclearity does not always reflect centrality of information. Sometimes an author introduces information in a rhetorically unimportant place, yet that information may be needed later to understand the argument. One example of this in the summary shown earlier is where the original text had said: "he was faced with constant pressure from Edward to sign. He refused to do so." In the summary, "to sign" was pruned, but it was actually a central concept, and the anaphoric "so" failed because of its pruning.

The text-nodes are then placed in a queue, ordered by their relevance score.

Pruning the RST-tree

When a request is received to display the text at a particular length, the system needs to determine which text-nodes to display. Taking each node in turn from the relevance queue (starting with the most relevant), the program checks whether including this text node would push the word-count over the limit. If not, it adds the node to the nodes-to-be-expressed list and increments the words-so-far count. When a node would push the count over the word-limit, the procedure turns to expressing the selected nodes. The nodes are expressed in the order in which they appeared in the original full text.
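The selection step can be sketched as follows. This is a minimal illustration rather than the system's implementation: the function name `select_nodes`, the tuple layout, and the relevance values in the example are all assumptions. In particular, the sketch assumes that selection stops at the first node that does not fit, and that ties in relevance are broken by original text position.

```python
from typing import List, Tuple

# Each node is (position in the original text, relevance, text).
TextNode = Tuple[int, float, str]


def select_nodes(nodes: List[TextNode], word_limit: int) -> List[str]:
    """Pick the most relevant nodes that fit the word limit, in text order."""
    # Relevance queue: most relevant first, ties broken by text position.
    queue = sorted(nodes, key=lambda n: (-n[1], n[0]))

    selected: List[Tuple[int, str]] = []
    words_so_far = 0
    for position, _relevance, text in queue:
        n_words = len(text.split())
        if words_so_far + n_words > word_limit:
            break  # assumption: stop at the first node that would not fit
        selected.append((position, text))
        words_so_far += n_words

    # Express the chosen nodes in their original order.
    selected.sort()
    return [text for _position, text in selected]


# Hypothetical relevance values for three fragments of one sentence.
example = [
    (0, 0.4, "In 1286,"),
    (1, 1.0, "Alexander III, King of Scots, died"),
    (2, 0.16, "at Kinghorn in Fife."),
]
short = select_nodes(example, word_limit=6)
longer = select_nodes(example, word_limit=8)
```

With a limit of six words only the nucleus survives; with eight words the temporal satellite is added and is placed before the nucleus, as in the source text.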

Note that the satellites of a node always have relevance lower than or equal to that of the node itself, so we never include a satellite in the nodes-to-be-expressed list without its nucleus, a situation which could produce incoherence.

Extensions on Basic RST

The RST Markup Tool, and consequently document presentation, allows markup of more than simple nuclear-satellite relations. This includes:

Allowing the intermixing of story grammars and RST greatly increases the representational power of the formalism, and consequently helps in text pruning. For instance, if we give the INTRODUCTION and CONCLUSIONS relations higher relevance values than BODY, then these sections will be more prominent in any summary.

All of these structures are handled in terms of the relation (role) linking the constituent to the whole, and this relation is handled identically to simple RST relations in text pruning.

User-Variation of Relation Weightings

The actual values associated with each relation are not fixed, but can be varied by the user. The user can select values which reflect their interests, highlighting some types of rhetorical relations, and ignoring others.

The system comes with three inbuilt 'user-models', representing different ranges of interest: standard (average values); how&why, preferring cause, reason, purpose, conditionals, etc.; and when&where, preferring spatial and temporal locations and extents. Figure 3 demonstrates the slight difference in the information (bold font) included in the text when switching between the when&where set and the how&why set. We might also add such sets as naive, preferring definitions, clarifications, restatements, and elaborations, and expert, valuing these less but preferring generalisations, etc. Apart from these built-in values, the user can also assign values to each relation independently.
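One way to picture the weighting sets is as named tables of relation scores, with per-relation user overrides applied on top. The sketch below is purely illustrative: the relation names, every numeric value, and the `weights_for` helper are assumptions made for the example, not the values or interface shipped with the system.

```python
from typing import Dict, Optional

# Hypothetical weighting sets; all scores are invented for illustration.
WEIGHT_SETS: Dict[str, Dict[str, float]] = {
    "standard": {
        "CAUSE": 0.6, "PURPOSE": 0.6,
        "TEMPORAL-LOCATION": 0.6, "SPATIAL-LOCATION": 0.6,
        "ELABORATION": 0.4,
    },
    "how&why": {
        "CAUSE": 0.9, "PURPOSE": 0.9,
        "TEMPORAL-LOCATION": 0.3, "SPATIAL-LOCATION": 0.3,
        "ELABORATION": 0.4,
    },
    "when&where": {
        "CAUSE": 0.3, "PURPOSE": 0.3,
        "TEMPORAL-LOCATION": 0.9, "SPATIAL-LOCATION": 0.9,
        "ELABORATION": 0.4,
    },
}


def weights_for(model: str = "standard",
                overrides: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return a copy of a built-in weighting set with user overrides applied."""
    weights = dict(WEIGHT_SETS[model])  # copy, so the built-in set is untouched
    for relation, score in (overrides or {}).items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {relation} must lie in [0.0, 1.0]")
        weights[relation] = score
    return weights


# A user starts from how&why but plays down elaborations further.
custom = weights_for("how&why", {"ELABORATION": 0.1})
```

Keeping the built-in sets immutable and returning a merged copy means a user's individual adjustments never alter the defaults that other readers see.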

  


How&Why Summary: Alexander III, King of Scots, died. The successor to the Scottish throne was his granddaughter Margaret. The earls and other great magnates had accepted Margaret as the heir to the throne and arrangements were made to bring her to Scotland. Several Guardians were appointed to govern the realm. Discussions were held with Edward I to prevent any instability. A treaty was signed whereby the new queen was to marry Edward's own son. Margaret died. Edward brought out his claims of overlordship. He used the treaty of Falaise. ...
Where&When Summary: In 1286, Alexander III, King of Scots, died at Kinghorn in Fife. The successor to the Scottish throne was his granddaughter Margaret. The earls and other great magnates had accepted Margaret as the heir to the throne and arrangements were made to bring her to Scotland. In the meantime, several Guardians were appointed. Discussions were held with Edward I. A treaty was signed. Margaret died in Orkney. After her death, Edward brought out his claims of overlordship of Scotland. ...

Figure 3: Summaries with different weighting sets






Mick O'Donnell
Mon Nov 18 18:41:07 GMT 1996