SB described the current situation in Germany, briefly outlining the activities at the universities of Hamburg, Bielefeld, Koblenz, Berlin, Magdeburg and Heidelberg, most of them carried out as part of the Verbmobil project. He argued that standardisation is required but was keen to stress the need for more flexibility than a single reference architecture would permit. In his view, the architecture(s) should depend on the task: what does the system do, how does it do it, and is it for research or for an application? He therefore suggested the need for a variety of reference architectures, although presumably a single, flexible architecture that accommodated the variety of system types could serve the same purpose.
DS and CM outlined the purpose of the RAGS project, explaining that it is based on application-oriented systems. They said that the architecture developed would be:
EH then described how the feeling within the speech community (and, perhaps more importantly, within DARPA) is that the speech recognition problem has been solved and that the community now needs to move on to dialog systems. To do this it needs NLG, but it has found that NLG has no standards. DARPA wants standards, and EH's (persuasive) argument is that if the NLG community does not come up with the kind of thing DARPA wants, DARPA will get someone else to do it. He outlined a four-stage process for developing standards: (1) define I/O standards for realisation, text planning and sentence planning, together with data models; (2) adapt existing systems; (3) create benchmark tests; (4) identify major bottlenecks.
RD had a somewhat different perspective, stressing that the field in its broader sense has psycholinguistic and theoretical linguistic considerations as well as engineering ones. He also stressed that he was interested in ``reuse'' rather than ``standardisation''. The two are clearly related, since reuse is impossible without standardisation; his point was that he preferred to treat the development of standards as a means of making resources reusable, not as a goal in itself.
When the discussion was opened up, Owen Rambow asked what was meant by standardisation and what it was for. DS and CM replied, stressing that in their (and the RAGS project's) view its purpose was to facilitate the development of applications and that it was not intended to be prescriptive. RD again stressed the distinction between standards and reuse. David McDonald suggested that it might be possible to have a Penman/FUF-type component higher up (i.e. at an earlier stage of generation), although possibly only for certain narrowly defined tasks such as pronoun choice.
Robert Rubinoff was concerned that RAGS would give a biased picture of the NLG field, since it concentrates only on fully implemented applied systems. DS reassured him that this was simply a way of making the problem more manageable. CM pointed out that no architecture could be expected to cover all of NLG.
RD questioned whether there were indeed comparable architectures for NLU. Johanna Moore pointed out some less than auspicious examples of community-wide efforts there. She mentioned the Penn Treebank, which ``no-one likes, but everyone uses, because it's there'', and SEMEVAL, which caused such big fights that some people involved still don't talk to each other. She added that in NLU, at least, the input to the system is known in advance!
RD also suggested that ``theory-laden'' approaches dictate the architecture, so the aims of working within a standardised architecture and of implementing a theory might conflict.
Christian Matthiessen suggested that we have had an architecture for two decades, namely Penman, with its stratification into ``meta-language'' and ``language''.
Mike Reape pointed out that NLG, unlike other areas of NLP such as NLU, was not a coherent field, and was therefore less amenable to a reference architecture. Stephan agreed but put a more positive face on it, stating that we need to bring our ideas together.
Finally, Jon Oberlander returned to the idea of a broader field encompassing psycholinguistics. As he pointed out, the most complete general-purpose NLG systems are ourselves. To improve the way we develop systems, he suggested, we need more research on how humans perform the task.
The discussion session was disappointing from the RAGS perspective. It did, however, raise general issues of standardisation within NLG. The discussion made it plain that there is no consensus on whether standardisation is even possible, let alone how it should be done. Still, I think it is fair to say that the majority of those present were happy with the idea of a non-prescriptive, consensus-based architecture that would allow resources to be reused and datasets to be developed as resources. There is potential for a split within the community along applied/theoretical lines, with RD, Jon Oberlander and others favouring more psycholinguistic research and modelling, which lends itself much less easily to standardisation. Where the aim is to develop practical applied systems, however, the need for standardisation and reusable resources seems to be broadly welcomed. The prospect of DARPA funding for such work in the US, together with the European efforts already underway (RAGS in the UK and Verbmobil in Germany), makes it seem inevitable that the development of such standards will be discussed much more in the coming months and years.