Research

My main objective is to allow people to produce high-quality technical documentation in languages they do not know. For the foreseeable future there is little prospect that this can be done through Machine Translation, since the task of interpreting free text is too difficult. Along with colleagues at the ITRI, I am therefore pursuing the alternative route of Multilingual Natural Language Generation (M-NLG).

Symbolic Authoring

We have worked in this field since 1993, when two projects called DRAFTER and GIST started up. The DRAFTER system generates instructions for using software applications; the GIST system generates instructions for filling in social security forms. An innovative feature of both projects was that the content of the generated documents could be specified by anyone with knowledge of the domain; expertise in knowledge engineering was not required. This was done through special-purpose graphical interfaces to the knowledge base. We have called this approach symbolic authoring: the author edits a `symbolic source' from which the program generates documents in several languages.

Knowledge editing

The graphical interfaces used in DRAFTER and GIST were only partly successful. After several hours of training, authors could define the content of simple documents, but they found the diagrams difficult to interpret and often made mistakes. Evaluation studies made it clear that text would be a more accessible way of presenting the knowledge. During 1996, I developed a new knowledge interface for DRAFTER in which the same editing options were presented to the author as operations on a `feedback text', generated by the program from the current knowledge base, rather than as operations on a diagram. This new version, DRAFTER-II, proved much easier to use. The technique of editing knowledge through a feedback text is called WYSIWYM: What You See Is What You Meant. The feedback text presented on the screen (What You See) expresses the sum total of the knowledge-editing decisions that you have made so far (What You Meant).
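The feedback-text idea can be illustrated with a very small sketch. This is not the DRAFTER-II implementation: the class KnowledgeBase, the function feedback_text, the slot names, and the sentence pattern are all hypothetical, chosen only to show how a text regenerated from the knowledge base can expose the editing decisions that remain open.

```python
class KnowledgeBase:
    """Hypothetical toy knowledge base: one instruction with two slots."""

    def __init__(self):
        # None means the author has not yet made a decision for this slot.
        self.slots = {"action": None, "object": None}

    def fill(self, slot, value):
        """Record one knowledge-editing decision made by the author."""
        if slot not in self.slots:
            raise KeyError(slot)
        self.slots[slot] = value


def feedback_text(kb):
    """Regenerate the feedback text from the current knowledge base.

    Unfilled slots appear as bracketed placeholders, which stand in for
    the places where the author could choose an editing operation.
    """
    def show(name):
        value = kb.slots[name]
        return value if value is not None else f"[some {name}]"

    return f"The user should {show('action')} {show('object')}."


kb = KnowledgeBase()
print(feedback_text(kb))   # both slots still open
kb.fill("action", "save")
print(feedback_text(kb))   # one decision made
kb.fill("object", "the file")
print(feedback_text(kb))   # fully specified
```

At every step the text shown reflects exactly the decisions made so far, so the author never has to read the underlying representation directly.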

Layout and style

From 1997 to 2000 I worked with Donia Scott and Nadjet Bouayad-Agha on the ICONOCLAST project (Integrating CONstraints On Content, Layout And STyle), which sought to generate formatted documents rather than mere texts, and to allow authors some control over style and appearance. These objectives are especially relevant for commercial applications of M-NLG. A company producing technical documentation is interested not only in the content expressed by a document, but in whether its appearance and linguistic style project the right image: professional, or friendly, or avant-garde, or whatever. The formatting of a document cannot be done correctly simply by adding layout to the generated text, since layout and wording can interact. In ICONOCLAST we explored these interactions in the context of Patient Information Leaflets, the inserts in medicines that tell you how to use the medicine, list possible side-effects, and so on.

Multiple document types

During 2001 I worked with Donia Scott, Nadjet Bouayad-Agha, Roger Evans, and Anja Belz on PILLS (Patient Information Language Localisation System), which demonstrated a potential solution to the problem of product documentation in the pharmaceutical industry. In essence, the problem is that documents with multiple purposes, in multiple languages, must be produced from the same information base, so that for any single product (e.g., Prozac) there will be literally thousands of documents presenting similar information in different ways. A small change in the information base (e.g., the discovery of a new side-effect) might make most of these documents out of date; at present, this means that technical authors and translators must be pressed into service to make the necessary changes. PILLS showed a more effective solution based on NLG: all documents are generated automatically, in multiple languages, from a single information model that is maintained through WYSIWYM knowledge editing.
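The single-source idea can be sketched in a few lines. This is an illustration only and does not describe how PILLS worked internally: fixed string templates stand in for a real generator, and the product name ExampleMed, the LEXICON and TEMPLATES tables, the document types, and the function generate_all are all hypothetical.

```python
# Hypothetical lexicon mapping model concepts to words in each language.
LEXICON = {
    "nausea": {"en": "nausea", "fr": "nausées"},
    "headache": {"en": "headache", "fr": "maux de tête"},
}

# Hypothetical templates, one per (document type, language) pair.
# A real NLG system would plan and realise text rather than fill templates.
TEMPLATES = {
    ("leaflet", "en"): "{product} may cause: {effects}.",
    ("leaflet", "fr"): "{product} peut provoquer : {effects}.",
    ("summary", "en"): "Undesirable effects of {product}: {effects}.",
    ("summary", "fr"): "Effets indésirables de {product} : {effects}.",
}


def generate_all(model):
    """Generate every document type in every language from one model."""
    documents = {}
    for (doc_type, lang), template in TEMPLATES.items():
        effects = ", ".join(LEXICON[e][lang] for e in model["side_effects"])
        documents[(doc_type, lang)] = template.format(
            product=model["product"], effects=effects
        )
    return documents


# The single information model from which all documents are derived.
model = {"product": "ExampleMed", "side_effects": ["nausea"]}
documents = generate_all(model)

# One change to the model (a newly discovered side-effect) ...
model["side_effects"].append("headache")
# ... and regenerating brings every document up to date at once.
documents = generate_all(model)
for key, text in documents.items():
    print(key, text)
```

The point of the design is that the change is made once, in the model, and no individual document has to be edited or retranslated by hand.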

Generating from patient records

In October 2002 we begin a new project called CLEF (Clinical E-Science Framework) in which our role will be to generate reports of various kinds from information extracted from cancer patient records. Among other things, this project will allow us to extend our work in PILLS on the problem of how to compose multiple document-types from the same information base.


Last updated 14th February 1999