Research
My main objective is to allow people to produce high-quality
technical documentation in languages they do not know. For
the foreseeable future there is little prospect that this can
be done through Machine Translation, since the task of
interpreting free text is too difficult. Along with colleagues
at the ITRI, I am therefore pursuing the alternative route of
Multilingual Natural Language Generation.
Symbolic authoring
We have worked in this field since 1993, when two projects
called DRAFTER and GIST started up. The DRAFTER system generates
instructions for using software applications; the GIST system
generates instructions for filling in social security forms.
An innovative feature of both projects was that the content
of the generated documents could be specified by anyone with
knowledge of the domain; expertise in knowledge engineering was
not required. This was done through special-purpose graphical
interfaces to the knowledge base. We have called this approach
symbolic authoring: the author edits a `symbolic source'
from which the program generates documents in several languages.
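As a rough illustration of the idea (not the DRAFTER or GIST implementation), the sketch below treats a symbolic source as a small language-neutral record and renders it through per-language string templates. The names `Step`, `LEXICON`, `TEMPLATES`, and `generate` are assumptions made for this example, and a real generator would use grammars rather than templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """A language-neutral instruction step: an action applied to a target."""
    action: str  # e.g. "click"
    target: str  # e.g. "save_button"


# Hypothetical lexicon: maps symbolic concepts to words in each language.
LEXICON = {
    "en": {"click": "click", "save_button": "the Save button"},
    "fr": {"click": "cliquez sur", "save_button": "le bouton Enregistrer"},
}

# Hypothetical sentence templates, one per language.
TEMPLATES = {
    "en": "To continue, {action} {target}.",
    "fr": "Pour continuer, {action} {target}.",
}


def generate(step: Step, lang: str) -> str:
    """Render one symbolic step as a sentence in the requested language."""
    words = LEXICON[lang]
    return TEMPLATES[lang].format(
        action=words[step.action], target=words[step.target]
    )


# The author edits only the symbolic source; every language is derived from it.
source = Step(action="click", target="save_button")
documents = {lang: generate(source, lang) for lang in TEMPLATES}
```

The point of the sketch is that the author never writes in either language: both sentences are derived from the same `source` value.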
Knowledge editing
The graphical interfaces used in DRAFTER and GIST were only
partly successful. After several hours of training, authors
could define the content of simple documents, but they found
the diagrams difficult to interpret and often made mistakes.
Evaluation studies made it clear that text would be a more accessible
presentation of the knowledge. During 1996, I
developed a new knowledge interface for DRAFTER in which the
same editing options were presented to the author as operations
on a `feedback text', generated by the program from the current
knowledge base, rather than as operations on a diagram. This
new version, DRAFTER-II, proved much easier to use. The
technique of editing knowledge through a feedback text is
called WYSIWYM: What You See Is What You Meant. The feedback
text presented on the screen (What You See) expresses the
sum total of the knowledge-editing decisions that you have
made so far (What You Meant).
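The editing cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DRAFTER-II code: the knowledge base is a hypothetical two-slot dictionary, unspecified slots are shown as bracketed anchors in the feedback text, and `OPTIONS`, `feedback_text`, and `choose` are names invented for the example.

```python
# Hypothetical knowledge base for one "schedule" event.
# None marks a slot the author has not specified yet.
knowledge = {"agent": None, "event": None}

# Hypothetical options offered when the author selects an anchor.
OPTIONS = {
    "agent": ["the user", "the administrator"],
    "event": ["the meeting", "the backup"],
}


def feedback_text(kb):
    """Render the current knowledge base; unfilled slots appear as anchors."""
    agent = kb["agent"] or "[some person]"
    event = kb["event"] or "[some event]"
    sentence = f"{agent} schedules {event}."
    return sentence[0].upper() + sentence[1:]


def choose(kb, slot, option):
    """Apply one editing decision and return the updated knowledge base."""
    if option not in OPTIONS[slot]:
        raise ValueError(f"{option!r} is not a valid choice for {slot!r}")
    return {**kb, slot: option}


# Each decision updates the knowledge base, and the text is regenerated from it.
step1 = choose(knowledge, "agent", "the user")
step2 = choose(step1, "event", "the meeting")
```

The author only ever picks from the offered options; the text on screen is always regenerated from the knowledge base, so it reflects every decision made so far.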
Layout and style
From 1997 to 2000 I worked with Donia Scott and Nadjet
Bouayad-Agha on the ICONOCLAST project (Integrating CONstraints
On Content, Layout And STyle), which sought to generate formatted
documents rather than mere texts, and to allow authors some
control over style and appearance. These objectives are especially
relevant for commercial applications of Multilingual Natural Language
Generation (M-NLG). A company producing
technical documentation is interested not only in the content
expressed by a document, but also in whether its appearance and
linguistic style project the right image: professional, or friendly,
or avant-garde, or whatever. The
formatting of a document cannot be done correctly simply by
adding layout to the generated text, since layout and wording
can interact. In ICONOCLAST we explored these interactions
in the context of Patient Information Leaflets, the inserts
packaged with medicines that tell you how to use the medicine, list
possible side-effects, and so on.
Multiple document types
During 2001 I worked with Donia Scott, Nadjet Bouayad-Agha, Roger Evans,
and Anja Belz on PILLS (Patient Information Language Localisation System),
which demonstrated a potential solution to the problem of product
documentation in the pharmaceutical industry. In essence, the problem is
that documents with multiple purposes, in multiple languages, must be
produced from the same information base, so that for any single product
(e.g., Prozac) there will be literally thousands of documents presenting
similar information in different ways. A small change in the information
base (e.g., the discovery of a new side-effect) might make most of these
documents out of date; at present, this means that technical authors and
translators must be pressed into service to make the necessary changes.
PILLS showed a more effective solution based on NLG: all documents are
generated automatically, in multiple languages, from a single information
model that is maintained through WYSIWYM knowledge editing.
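To make the maintenance argument concrete, the sketch below regenerates every document type in every language after one change to a shared information model. It is an illustration only and does not reflect how PILLS was built: the product name "Examplin", the two document types, and the names `PHRASES`, `EFFECT_NAMES`, and `generate_all` are all hypothetical.

```python
from itertools import product

# Hypothetical information model for one fictional product.
model = {"name": "Examplin", "side_effects": ["nausea"]}

# Hypothetical templates, indexed by language and then by document type.
PHRASES = {
    "en": {
        "leaflet": "{name} may cause: {effects}.",
        "summary": "Known side effects of {name}: {effects}.",
    },
    "fr": {
        "leaflet": "{name} peut provoquer : {effects}.",
        "summary": "Effets indésirables connus de {name} : {effects}.",
    },
}

# Hypothetical lexicon for side-effect names.
EFFECT_NAMES = {
    "en": {"nausea": "nausea", "headache": "headache"},
    "fr": {"nausea": "nausées", "headache": "maux de tête"},
}

DOC_TYPES = ["leaflet", "summary"]


def generate_all(info):
    """Generate every (document type, language) pair from one model."""
    docs = {}
    for lang, doc_type in product(PHRASES, DOC_TYPES):
        effects = ", ".join(EFFECT_NAMES[lang][e] for e in info["side_effects"])
        docs[(doc_type, lang)] = PHRASES[lang][doc_type].format(
            name=info["name"], effects=effects
        )
    return docs


before = generate_all(model)
model["side_effects"].append("headache")  # a newly discovered side-effect
after = generate_all(model)

# Every document that mentions side-effects changes, with no manual rewriting.
changed = [key for key in after if after[key] != before[key]]
```

In this toy setting one edit to the model updates all four documents; the same principle is what lets a single information base keep a much larger document set consistent.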
Generating from patient records
In October 2002 we begin a new project called CLEF (Clinical E-Science
Framework), in which our role will be to generate reports of various kinds
from information extracted from cancer patient records. Among other things,
this project will allow us to extend our work in PILLS on the problem of
how to compose multiple document-types from the same information base.
Last updated 14th February 1999