ITRI Seminars - Summer 2004

ITRI Seminars usually take place at 12 noon on Thursdays in room W107 on the first floor of the Watts Building, University of Brighton (Moulsecoomb site).

Information on how to find W107 is available on our contact page.

22 April 2004
No Seminar
29 April 2004
Daniel Paiva
ITRI, University of Brighton
Controlling generation with stylistic parameters
Abstract and further information
6 May 2004
No Seminar
13 May 2004
Matthew Purver and Ruth Kempson
King's College London
Incrementality and Context-dependence in Generation
Abstract and further information
20 May 2004
No Seminar
27 May 2004
Maria Lapata
Department of Computer Science, University of Sheffield
Assessing Cohesion via Pattern Mining: Representation and Models
Abstract and further information
3 June 2004
Paula Buttery
Computer Lab, Cambridge University
Reliable Language Acquisition from Real Data
Abstract and further information
10 June 2004
Dunstan Brown and Carole Tiberius
Surrey University
Syncretism and textual frequency in Russian
Abstract and further information
17 June 2004
Lynne Cahill
Sussex University
Phonemes, metaphonemes and features: multilingual lexical representation of phonology
Abstract and further information
24 June 2004
Gerardo Sierra Martinez
Engineering Institute, Universidad Nacional Autónoma de México (UNAM)
Terminological Information Retrieval
Abstract and further information
1 July 2004
David Garcia Martul
Department of Information Sciences, University Carlos III, Madrid
An Approximation to a System of Knowledge Organisation through Computational Linguistics: Topic Maps
Abstract and further information
Additional Seminar: Monday, 19 July 2004
Dominique Estival
DSTO, Australian Government Department of Defence
Conversations with Virtual Advisers
Abstract and further information



Links:
Last term's ITRI seminars
NLP seminars at Informatics, University of Sussex



Abstracts


Daniel Paiva
Controlling generation with stylistic parameters

In this talk I will describe a framework for stylistic control of the generation process. The approach correlates stylistic dimensions obtained from a corpus-based factor analysis with internal generator decisions, and uses the correlation to direct the generator towards particular style settings. I will illustrate this with a prototype generator that produces patient information leaflets. I will also compare this framework with previous approaches, arguing that it offers a more generic approach to the problem of stylistic control.


Matthew Purver and Ruth Kempson
Incrementality and Context-dependence in Generation

Study of dialogue has been proposed by Pickering and Garrod (forthcoming) as the major new challenge facing both linguistic and psycho-linguistic theory. Two of the phenomena which they highlight as common in dialogue, but posing a significant challenge to theoretical linguists, are alignment between conversational participants, and shared utterances. Alignment is the term used for the way conversational participants mirror each other's patterns at many levels including lexical, syntactic and semantic choices. Shared utterances are those in which participants exchange parser and producer roles mid-sentence.

Alignment is problematic for theoretical or computational approaches in which parsing and generation are seen as separate disconnected processes (even if based on the same reversible grammar), as it appears to require a connection between the processes at many levels (not just individual lexical choices, but apparently lexicalised syntactic rule preferences, and word sequences). Shared utterances are even more problematic, in particular for any grammar formalism whose output is the set of well-formed strings. What has to be modelled is how an initial hearer must parse an input which may not be a standard constituent, assigning it a (partial) interpretation, and then, in switching to become a speaker, complete that representation and generate an output from it, taking into account previous words and their selected form without re-producing them. The initial speaker (switching to hearer) must also be able to integrate these two fragments but in reverse direction. For all generation and parsing systems, this means defining the appropriate shift mechanism from one system to the other, in either direction.

This talk will introduce a psycholinguistically plausible model of incremental context-dependent tactical generation within the Dynamic Syntax framework (Kempson et al 2001), which directly reflects these phenomena. In Dynamic Syntax, in which the concept of tree growth is central, parsing is defined in terms of actions on semantic tree structures, in which structures are incrementally and monotonically built, as dictated by the serial order of words in a string. As shown by Otsuka and Purver 2003, generation reflecting word-by-word incrementality can be defined in this framework in terms of the very same actions as manipulated by the parser. In this talk we propose a departure from this model. Parsing and generation are defined in context-dependent terms, defining both utterance processing and dialogue context in terms of (partial) trees, the context comprising (partial) trees that have been constructed prior to the current utterance, together with associated tree-update actions used in creating such trees. We show, on the basis of this model, how alignment can be seen to result directly from the use of such context as a generation strategy for minimizing general-lexicon search, and how switch of speaker-hearer roles in shared utterances can be straightforwardly modelled in virtue of sharing of context and (partial) data structures by both participants.


Maria Lapata
Assessing Cohesion via Pattern Mining: Representation and Models

A variety of NLP applications such as summarization, machine translation, and notably concept-to-text generation are expected to produce natural language texts that are not a random collection of sentences but are organised into a self-contained document. A mechanism for automatically judging whether machine generated text is cohesive (and therefore readable) will benefit these applications: it could be employed to guide the process of text revision or to select the most cohesive output among possible candidates.

In this talk we introduce a novel representation of cohesion that is based on the notion of an entity matrix and combines distributional and syntactic information about text entities. We then present generative as well as discriminative models that uncover text connectivity patterns from entity matrices. Finally, we apply these models to measure the cohesiveness of multidocument summaries. Our experiments show that the models based on the entity matrix representation outperform previously proposed models.


Paula Buttery
Reliable Language Acquisition from Real Data

A child acquires the language of her environment from exposure to example utterances; she has no formal language teaching. Lacking any discerning information, a child is likely to assume that every utterance she hears is grammatical (i.e. a valid example of her target language). This presents the child with two problems:

  1. Spoken language contains ungrammatical utterances, perhaps in the form of interruptions, lapses of concentration or slips-of-the-tongue. When a child mis-classifies these utterances as grammatical, errors are introduced into the acquisition process.
  2. Some utterances provide ambiguous grammatical evidence (the problem of ambiguous triggers). For instance, a sentence of English (subject-verb-object ordering) may be interpreted as subject-object-verb ordering with verb movement (V2) as in German. If a child chooses the wrong grammatical interpretation of an utterance, a source of error is again introduced.

A child cannot know when she has encountered an error. Any simulation or explanation of language acquisition should therefore attempt to learn from every utterance it encounters. In this presentation I will describe a statistical learning system (which implements the Bayesian Incremental Parameter Setting (BIPS) algorithm) that is robust to errors and discuss experiments which demonstrate the ability of such a system to learn from real child-directed speech.


Dunstan Brown and Carole Tiberius
Syncretism and textual frequency in Russian

We examine the relationship between syncretism - a kind of grammatical ambiguity - in Russian nouns and their associated textual frequency distribution. Russian is an ideal language for this purpose, as it has a reasonable number of grammatical distinctions, with syncretism occurring in different morphological classes. A comparison is made between the position of a morphological class in a default inheritance hierarchy, constructed for other purposes, and the frequency of the grammatical functions involved in syncretism. Our cross-validated results show that there is an intricate relationship between textual frequency and inflectional syncretism.


Lynne Cahill
Phonemes, metaphonemes and features: multilingual lexical representation of phonology

The representation of phonology in the lexicon is a problem that has rarely attracted a coherent research effort. Approaches to the issues tend to be task-driven and directed towards speech applications. They have, thus, tended to ignore or at least fail to take full advantage of advances in lexical representation techniques used in other areas of NLP. They also rarely take advantage of possible cross-lingual sharing of information in the way that approaches to the representation of semantics and syntax frequently do.

Much research in this area also falls foul of the perennial problem in phonological representation: segments or features? While many lexical models allow for the efficient representation of phonological information at the segmental level, the transition from this level to a featural level tends to be overly simplistic and reliant on the (totally incorrect) assumption that features are organised into segment-sized chunks.

In a variety of small projects over the past decade (recently together with Carole Tiberius) I have developed some suggestions for bringing together ideas in the field of lexical representation using inheritance networks with ideas in the field of computational phonology and phonetics. In this talk I will present a summary of those ideas, beginning with a proposal to interface an inheritance lexicon to the YorkTalk speech synthesis system, through some initial trial projects using language features and FSAs to represent lexical phonology at a featural level, to the METAPHON project and the notion of metaphonemes.

I will go on to assess the possible ways forward, considering recent ideas from the speech community about lexical representation and access.


Gerardo Sierra Martinez
Terminological Information Retrieval


David Garcia Martul
An Approximation to a System of Knowledge Organisation through Computational Linguistics: Topic Maps

Topic Maps are an ISO standard which provides a standardised notation for representing information about the structure of information resources used to define topics, and the relationships between topics. A set of one or more interrelated documents that employs the notation and grammar defined by the ISO/IEC 13250 International Standard is called a "topic map". In general, the structural information conveyed by topic maps includes: groupings of addressable information objects around topics (occurrences); relationships between topics (associations).

The possibility of using associations like meronym, antonym, hyperonym and hyponym recalls the use of semantic networks like WordNet; in fact, both WordNet and Topic Maps have the same aim: to show the meaning of different concepts through the contexts in which they can be found. The relationships between different concepts identify their meaning, but Topic Maps go further: WordNet is a semantic dictionary built as a semantic network, whereas a Topic Map can be used to describe any domain, with anything described as topics and the associations between them.


Dominique Estival
Conversations with Virtual Advisers

In this talk, I will present the spoken dialogue system designed and implemented for Virtual Advisers in the FOCAL (Future Operations Centre Analysis Laboratory) environment.

The architecture of the system is based on Dialogue Agents using propositional attitudes. The Natural Language Understanding component, which uses a typed unification grammar (Regulus), is linked to a commercial speaker-independent speech recognition system (Nuance). The current application aims to facilitate the multi-media presentation of military planning information in a semi-immersive environment.

I will discuss some of the technical aspects of FOCAL which are relevant to the dialogue application and show how a fragment of the scenario has been implemented. I will also describe the improvements and additions which have been made to the system in the past year and sketch our next research directions.



Maintained by the seminars organiser (seminars@itri.brighton.ac.uk).
Last modified: Tue Jul 13 17:50:52 BST 2004

©Information Technology Research Institute