Information on how to find W107 is available on our contact page.
22 April 2004
No seminar

29 April 2004
Daniel Paiva, ITRI, University of Brighton
Controlling generation with stylistic parameters

6 May 2004
No seminar

13 May 2004
Matthew Purver and Ruth Kempson, King's College London
Incrementality and Context-dependence in Generation

20 May 2004
No seminar

27 May 2004
Maria Lapata, Department of Computer Science, University of Sheffield
Assessing Cohesion via Pattern Mining: Representation and Models

3 June 2004
Paula Buttery, Computer Lab, Cambridge University
Reliable Language Acquisition from Real Data

10 June 2004
Dunstan Brown and Carole Tiberius, Surrey University
Syncretism and textual frequency in Russian

17 June 2004
Lynne Cahill, Sussex University
Phonemes, metaphonemes and features: multilingual lexical representation of phonology

24 June 2004
Gerardo Sierra Martinez, Engineering Institute, Universidad Nacional Autónoma de México (UNAM)
Terminological Information Retrieval

1 July 2004
David Garcia Martul, Department of Information Sciences, University Carlos III, Madrid
An Approximation to a System of Knowledge Organisation through Computational Linguistics: Topic Maps

Additional seminar: Monday, 19 July 2004
Dominique Estival, DSTO, Australian Government Department of Defence
Conversations with Virtual Advisers
Daniel Paiva
Controlling generation with stylistic parameters
In this talk I will describe a framework for stylistic control of the
generation process. The approach correlates stylistic dimensions
obtained from a corpus-based factor analysis with internal generator
decisions, and uses the correlation to direct the generator towards
particular style settings. I will illustrate this with a prototype
generator that produces patient information leaflets. I will also
compare this framework with previous approaches, arguing that it
offers a more generic approach to the problem of stylistic control.
Matthew Purver and Ruth Kempson
Incrementality and Context-dependence in Generation
The study of dialogue has been proposed by Pickering and Garrod (forthcoming) as the major new challenge facing both linguistic and psycholinguistic theory. Two of the phenomena which they highlight as common in dialogue, but as posing a significant challenge to theoretical linguists, are alignment between conversational participants and shared utterances. Alignment is the term used for the way conversational participants mirror each other's patterns at many levels, including lexical, syntactic and semantic choices. Shared utterances are those in which participants exchange parser and producer roles mid-sentence.
Alignment is problematic for theoretical or computational approaches in which parsing and generation are seen as separate, disconnected processes (even if based on the same reversible grammar), as it appears to require a connection between the processes at many levels (not just individual lexical choices, but apparently also lexicalised syntactic rule preferences and word sequences). Shared utterances are even more problematic, in particular for any grammar formalism whose output is the set of well-formed strings. What has to be modelled is how an initial hearer parses an input which may not be a standard constituent, assigns it a (partial) interpretation, and then, on switching to become a speaker, completes that representation and generates an output from it, taking into account previous words and their selected form without reproducing them. The initial speaker (switching to hearer) must also be able to integrate these two fragments, but in the reverse direction. For any generation and parsing system, this means defining an appropriate mechanism for shifting from one system to the other, in either direction.
This talk will introduce a psycholinguistically plausible model of
incremental context-dependent tactical generation within the Dynamic
Syntax framework (Kempson et al. 2001), which directly reflects these
phenomena. In Dynamic Syntax, where the concept of tree growth is
central, parsing is defined in terms of actions on semantic tree
structures, which are built incrementally and monotonically as
dictated by the serial order of words in a string. As shown by
Otsuka and Purver (2003), generation reflecting word-by-word
incrementality can be defined in this framework in terms of the very
same actions as those manipulated by the parser. In this talk we
propose a departure from this model. Parsing and generation are
defined in context-dependent terms, with both utterance processing and
dialogue context defined in terms of (partial) trees: the context
comprises the (partial) trees that have been constructed prior to the
current utterance, together with the associated tree-update actions
used in creating them. We show, on the basis of this model, how
alignment can be seen to result directly from the use of such context
as a generation strategy for minimizing search of the general lexicon,
and how the switch of speaker and hearer roles in shared utterances
can be straightforwardly modelled by virtue of both participants
sharing context and (partial) data structures.
Maria Lapata
Assessing Cohesion via Pattern Mining: Representation and Models
A variety of NLP applications, such as summarization, machine translation and notably concept-to-text generation, are expected to produce natural language texts that are not a random collection of sentences but are organised into a self-contained document. A mechanism for automatically judging whether machine-generated text is cohesive (and therefore readable) would benefit these applications: it could be employed to guide the process of text revision or to select the most cohesive output among possible candidates.
In this talk we introduce a novel representation of cohesion that is
based on the notion of an entity matrix and combines distributional
and syntactic information about text entities. We then present
generative as well as discriminative models that uncover text
connectivity patterns from entity matrices. Finally, we apply these
models to measure the cohesiveness of multidocument summaries. Our
experiments show that the models based on the entity matrix
representation outperform previously proposed models.
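As a rough illustration of what an entity matrix might look like, the sketch below assumes that entities and their grammatical roles have already been identified in each sentence (that annotation step is not shown). Rows are sentences and columns are entities; each cell records the entity's role, using an assumed label set of S (subject), O (object), X (other) and - (absent). The helper `transition_features` is hypothetical: it reads role sequences down each column and turns them into relative frequencies, one simple way of extracting connectivity patterns from the matrix. It is not necessarily the representation or the models presented in the talk.

```python
from collections import Counter
from itertools import product

# Assumed role labels: subject, object, other, absent.
ROLES = ("S", "O", "X", "-")


def entity_matrix(sentences):
    """Build a sentence-by-entity matrix from per-sentence role annotations.

    sentences: list of dicts mapping entity name -> role in that sentence.
    Returns (entities, rows), where rows[i][j] is the role of entity j
    in sentence i, or "-" if the entity does not occur there.
    """
    entities = sorted({e for sent in sentences for e in sent})
    rows = [[sent.get(e, "-") for e in entities] for sent in sentences]
    return entities, rows


def transition_features(rows, length=2):
    """Relative frequency of every role sequence of `length`, read down the columns."""
    counts = Counter()
    n_cols = len(rows[0]) if rows else 0
    for j in range(n_cols):
        column = [row[j] for row in rows]
        for i in range(len(column) - length + 1):
            counts[tuple(column[i:i + length])] += 1
    total = sum(counts.values())
    # Include every possible sequence so that texts map to same-length vectors.
    return {t: (counts[t] / total if total else 0.0)
            for t in product(ROLES, repeat=length)}


# Toy three-sentence text with hand-annotated roles (illustrative only).
text = [
    {"Microsoft": "S", "market": "O"},
    {"Microsoft": "O", "earnings": "S"},
    {"earnings": "X"},
]
entities, rows = entity_matrix(text)
features = transition_features(rows)
```

Fixed-length vectors of this kind could then be passed to a generative or discriminative model of the sort the abstract mentions, for instance to rank candidate summaries by predicted cohesiveness.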
Paula Buttery
Reliable Language Acquisition from Real Data
A child acquires the language of her environment from exposure to example utterances; she has no formal language teaching. Lacking any discerning information, a child is likely to assume that every utterance she hears is grammatical (i.e. a valid example of her target language). This presents the child with two problems:
Dunstan Brown and Carole Tiberius
Syncretism and textual frequency in Russian
We examine the relationship between syncretism (a kind of grammatical
ambiguity) in Russian nouns and the nouns' textual frequency
distribution. Russian is an ideal language for this purpose, as it has
a reasonable number of grammatical distinctions, with syncretism
occurring in different morphological classes. A comparison is made
between the position of a morphological class in a default inheritance
hierarchy, constructed for other purposes, and the frequency of the
grammatical functions involved in syncretism. Our cross-validated
results show that there is an intricate relationship between textual
frequency and inflectional syncretism.
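To make the notion of syncretism concrete, here is a minimal sketch that groups the cells of a single noun paradigm by shared form: any form realising more than one case/number cell is syncretic. The function name and case labels are illustrative assumptions; the paradigm is the transliterated declension of the inanimate masculine noun stol 'table'. Relating such groups to textual frequency, or to a default inheritance hierarchy, as the talk does, would additionally require corpus counts and a hierarchy, neither of which is included here.

```python
from collections import defaultdict


def syncretic_groups(paradigm):
    """Group paradigm cells that share an identical form.

    paradigm: dict mapping (case, number) -> inflected form.
    Returns one sorted list of cells for each form that realises
    more than one cell.
    """
    by_form = defaultdict(list)
    for cell, form in paradigm.items():
        by_form[form].append(cell)
    return [sorted(cells) for cells in by_form.values() if len(cells) > 1]


# Transliterated paradigm of the inanimate masculine noun stol 'table'.
STOL = {
    ("nom", "sg"): "stol", ("acc", "sg"): "stol", ("gen", "sg"): "stola",
    ("dat", "sg"): "stolu", ("ins", "sg"): "stolom", ("loc", "sg"): "stole",
    ("nom", "pl"): "stoly", ("acc", "pl"): "stoly", ("gen", "pl"): "stolov",
    ("dat", "pl"): "stolam", ("ins", "pl"): "stolami", ("loc", "pl"): "stolax",
}

groups = syncretic_groups(STOL)
```

For stol the only syncretisms are nominative = accusative in the singular and in the plural, the pattern expected of inanimate nouns in this class.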
Lynne Cahill
Phonemes, metaphonemes and features: multilingual lexical representation of phonology
The representation of phonology in the lexicon is a problem that has rarely attracted a coherent research effort. Approaches to the issues tend to be task-driven and directed towards speech applications. They have, thus, tended to ignore or at least fail to take full advantage of advances in lexical representation techniques used in other areas of NLP. They also rarely take advantage of possible cross-lingual sharing of information in the way that approaches to the representation of semantics and syntax frequently do.
Much research in this area also falls foul of the perennial problem in phonological representation: segments or features? While many lexical models allow for the efficient representation of phonological information at the segmental level, the transition from this level to a featural level tends to be overly simplistic and reliant on the (totally incorrect) assumption that features are organised into segment-sized chunks.
In a variety of small projects over the past decade (recently together with Carole Tiberius) I have developed some suggestions for bringing together ideas in the field of lexical representation using inheritance networks with ideas in the field of computational phonology and phonetics. In this talk I will present a summary of those ideas, beginning with a proposal to interface an inheritance lexicon to the YorkTalk speech synthesis system, through some initial trial projects using language features and FSAs to represent lexical phonology at a featural level, to the METAPHON project and the notion of metaphonemes.
I will go on to assess the possible ways forward, considering recent ideas
from the speech community about lexical representation and access.
Gerardo Sierra Martinez
Terminological Information Retrieval
David Garcia Martul
An Approximation to a System of Knowledge Organisation through Computational Linguistics: Topic Maps
Topic Maps are an ISO standard which provides a standardised notation for representing information about the structure of information resources, used to define topics and the relationships between topics. A set of one or more interrelated documents that employs the notation and grammar defined by the ISO/IEC 13250 International Standard is called a "topic map". In general, the structural information conveyed by topic maps includes groupings of addressable information objects around topics (occurrences) and relationships between topics (associations).
The possibility of using associations such as meronym, antonym,
hyperonym and hyponym recalls semantic networks such as WordNet. In
fact, WordNet and Topic Maps have the same aim: to show the meaning of
different concepts through the contexts in which they can be found.
The relationships between different concepts identify their meaning,
but Topic Maps go further: WordNet is a semantic dictionary built as a
semantic network, whereas a Topic Map can be used to describe any
domain, with anything described as topics and the associations
between them.
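The structure described in this abstract (topics, occurrences that attach addressable resources to topics, and typed associations between topics) can be sketched as a small in-memory data model. This is a hypothetical illustration, not the ISO/IEC 13250 interchange syntax or the API of any Topic Maps library; the class and method names, the role names and the example URL are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    names: list[str]
    # Occurrences: addresses of information objects grouped around this topic.
    occurrences: list[str] = field(default_factory=list)


@dataclass
class Association:
    assoc_type: str          # e.g. "hyponym", "meronym" (assumed type names)
    members: dict[str, str]  # role name -> topic_id


class TopicMap:
    """Hypothetical in-memory topic map: topics plus typed associations."""

    def __init__(self) -> None:
        self.topics: dict[str, Topic] = {}
        self.associations: list[Association] = []

    def add_topic(self, topic: Topic) -> None:
        self.topics[topic.topic_id] = topic

    def associate(self, assoc_type: str, **members: str) -> None:
        """Link existing topics; keyword names are the roles they play."""
        for topic_id in members.values():
            if topic_id not in self.topics:
                raise KeyError(f"unknown topic: {topic_id}")
        self.associations.append(Association(assoc_type, dict(members)))

    def related(self, topic_id: str, assoc_type: str) -> list[str]:
        """Topics linked to topic_id through associations of the given type."""
        result = []
        for assoc in self.associations:
            if assoc.assoc_type == assoc_type and topic_id in assoc.members.values():
                result.extend(t for t in assoc.members.values() if t != topic_id)
        return result


# Usage: WordNet-style relations expressed as typed associations.
tm = TopicMap()
tm.add_topic(Topic("dog", ["dog"], ["http://example.org/dogs.html"]))
tm.add_topic(Topic("animal", ["animal"]))
tm.add_topic(Topic("tail", ["tail"]))
tm.associate("hyponym", hyponym="dog", hypernym="animal")
tm.associate("meronym", part="tail", whole="dog")
```

Because association types and roles are just data, the same model can hold lexical relations such as hyponymy alongside associations from any other domain, which is the point of contrast with WordNet drawn above.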
Dominique Estival
Conversations with Virtual Advisers
In this talk, I will present the spoken dialogue system designed and implemented for Virtual Advisers in the FOCAL (Future Operations Centre Analysis Laboratory) environment.
The architecture of the system is based on Dialogue Agents using propositional attitudes. The Natural Language Understanding component, which uses a typed unification grammar (Regulus), is linked to a commercial speaker-independent speech recognition system (Nuance). The current application aims to facilitate the multi-media presentation of military planning information in a semi-immersive environment.
I will discuss some of the technical aspects of FOCAL which are
relevant to the dialogue application and show how a fragment of the
scenario has been implemented. I will also describe the improvements
and additions which have been made to the system in the past year and
sketch our next research directions.