ITRI Seminars - Autumn 2004

ITRI Seminars usually take place at 12 noon on Thursdays in room W107 on the first floor of the Watts Building, University of Brighton (Moulsecoomb site).

Information on how to find W107 is available on our contact page.



23 September 2004
Pablo Duboue
Columbia University
Indirect Supervised Learning Of Content Selection Rules
Abstract

30 September 2004
Chris Fox
Essex University
Natural Language Semantics in a Flexibly Typed Intensional Logic
Abstract

7 October 2004
No Seminar

14 October 2004
No Seminar

21 October 2004
Friederike Moltmann
Department of Philosophy, Stirling University
Presuppositions: A New Perspective
Abstract

28 October 2004
Prof Marilyn Walker
Department of Computer Science, University of Sheffield
Can we talk? prospects for automatically training dialogue systems
Abstract

4 November 2004
Raquel Fernandez Rovira
Department of Computer Science, King's College London
A Machine Learning Approach to Bare Sluice Disambiguation in Dialogue
Abstract

11 November 2004
No Seminar

18 November 2004
Udo Kruschwitz
Essex University
Towards Dialogue-Driven Document Retrieval
Abstract

25 November 2004
Owen Rambow
Computer Science, Columbia University
Summarization of Email Threads
Abstract

2 December 2004
Jon Oberlander
Edinburgh University
Personality effects in language production: projection and perception in computer-mediated communication
Abstract

9 December 2004
Roger Evans
ITRI, Brighton University
RAGS and beyond - architectures for intelligent, knowledge-based systems
Abstract



Links:
Last term's ITRI seminars
NLP seminars at Informatics, University of Sussex



Abstracts


Pablo Duboue
Indirect Supervised Learning Of Content Selection Rules

As online data becomes more and more abundant, there is a growing need to filter information for users and thereby reduce information overload. This situation is akin to the Content Selection (CS) problem faced by a Natural Language Generation (NLG) system when starting to build a new text. In a generation system, the CS module decides which pieces of information to include in the final generated text. It has been argued that CS is central to user acceptance of a generation system (as users may tolerate other types of errors as long as the information is readily available in the output). Moreover, the CS problem is quite domain dependent; major changes in CS knowledge are needed when moving a system to a different domain.

In this talk, I will present my work on the automatic acquisition of CS rules, as a way to provide a domain-independent solution to the CS problem. As training material, I employ an aligned Text-Data corpus, a type of resource that is increasingly popular for learning for NLG (as such corpora are readily available and do not require expensive hand labelling). However, aligned Text-Data corpora provide only indirect information about whether or not a piece of information has been selected by the human writer for inclusion in the text. Indirect Supervised Learning is my proposed solution to this problem. It has two steps: in the first step, the Text-Data corpus is transformed into a dataset with classification labels; in the second step, supervised learning machinery acquires the CS rules from this dataset. I evaluate the approach by comparing the output of my system with the information selected by human authors in unseen texts, obtaining an F* of 0.67 with high recall.


BIOGRAPHY: Pablo Duboue is a PhD candidate in the Computer Science Department at Columbia University, and expects to graduate in October 2004. He works in the Natural Language Processing Group with Prof Kathleen R McKeown, and his main research interests include natural language generation and machine learning.


Chris Fox
Natural Language Semantics in a Flexibly Typed Intensional Logic

In this talk I shall present Property Theory with Curry Typing (PTCT), an intensional first-order theory for natural language semantics developed by myself and Shalom Lappin. PTCT permits fine-grained specifications of meaning. It also supports polymorphic types and separation types. We have developed an intensional number theory within PTCT in order to represent proportional generalized quantifiers like "most". We use the type system and our treatment of generalized quantifiers in natural language to construct a type-theoretic approach to pronominal anaphora and ellipsis. We have also developed a theory of underspecification that is expressed within the term language of the theory.

The talk will focus on the basics of PTCT itself, and outline the treatment of anaphora and ellipsis. If there is time, a sketch of our treatment of underspecification may also be given.


Udo Kruschwitz
Towards Dialogue-Driven Document Retrieval

The Web provides a massive knowledge source. The same is true for intranets and other electronic document collections. However, much of that knowledge is encoded implicitly and cannot be applied directly without processing it into some more appropriate structures. Searching, browsing and question answering, for example, could all benefit from domain-specific knowledge contained in the documents. In applications such as simple search we do not actually need very "deep" knowledge structures such as ontologies; we can get a long way with a model of the domain that consists of term hierarchies. We combine domain knowledge automatically acquired by exploiting the documents' markup structure with knowledge extracted on the fly to assist a user with ad hoc search requests. Such a search system can suggest query modification options derived from the actual data and therefore guide a user through the space of documents.


Friederike Moltmann
Presuppositions: A New Perspective

I will argue for a new account of presuppositions which is based on double indexing as well as minimal representational contexts providing antecedent material for anaphoric presuppositions, rather than notions of context defined in terms of the interlocutors' pragmatic presuppositions or the information accumulated from the preceding discourse. The account differs from the Satisfaction Theory and the Binding Theory of presuppositions in that it can be viewed as a conservative extension of traditional static semantics and in that it does not involve the notion of pragmatic presupposition.


Marilyn Walker
Can we talk? prospects for automatically training dialogue systems



Raquel Fernandez Rovira
A Machine Learning Approach to Bare Sluice Disambiguation in Dialogue

Dialogue is full of fragmentary utterances that exhibit a sentential meaning, perhaps most prototypically the "short answers" used to respond to queries. As is well known, processing such non-sentential utterances (NSUs) is a difficult problem on both theoretical and computational grounds.

In this talk I will focus on a particular type of NSU, namely bare wh-phrases (so-called "sluices") like "who?", "what?" or "why?". I will use data from the dialogue component of the British National Corpus to motivate a typology of bare sluice readings, and present the results of some experiments showing that machine learning techniques can be an efficient tool for disambiguating between sluice interpretations.


Owen Rambow
Summarization of Email Threads

Email is an interesting genre of inter-human communication as it has aspects of both spoken and written language. On the one hand, it has an interactive structure that resembles dialog, and email is not edited and often uses informal language. On the other hand, email exchanges happen over time, so that discourse participants have developed special strategies to remind each other of the context in which they are communicating. These strategies include various ways of citing previous parts of the conversation.

In this talk, I will review some work done in the Columbia Natural Language Processing (NLP) Group on email. I will describe a small corpus collection effort we have undertaken, and then concentrate on summarization by sentence extraction. While summarization by extraction works well for certain genres such as newswire, the approach needs to be modified for email. I will show how using email-specific features improves the choice of relevant sentences for the extraction.

I will also present some related work we have been doing. One line of research aims at classifying email, both at the thread level and at the email message level, into different categories. In a second line of research, we identify questions that are asked and attempt to identify corresponding answers. We have investigated various ways of integrating the simple sentence-extraction approach with question/answer information. I will also present a summarization client we have designed and implemented which can be used in conjunction with Microsoft Outlook. In conclusion I will outline some plans for extending our work to the immense Enron email corpus and some initial investigations.


Jon Oberlander
Personality effects in language production: projection and perception in computer-mediated communication

Our goal is to generate text in a way which helps convey the writer's character. Rather than considering a writer's style in terms of genre or idiolect, we draw upon concepts adopted from personality psychology. In this talk, I describe our data and corpus comparison techniques, and present some of the characteristic patterns of personality language. The systematic differences for Extraversion and Neuroticism have led us to consider the relationship between language production and personality. We sketch the possible links between personality parameters and production processes, and indicate how personality appears to influence low-level aspects of dialogue behaviour.

[Joint work with Alastair Gill and Scott Nowson]


Roger Evans
RAGS and beyond - architectures for intelligent, knowledge-based systems

The RAGS project ('Reference Architecture for Generation Systems'; Brighton/Edinburgh, EPSRC) aimed to build a concrete infrastructure for collaborative Natural Language Generation (NLG) research, founded on an apparent emerging architectural consensus among NLG system builders. However, a detailed survey of these existing systems revealed that the 'consensus' was much less secure than it appeared at first sight. In order to achieve the goals of the project, we started to develop a much more sophisticated view of system architectures, flexible enough to accommodate existing research, yet precise enough to make a useful contribution as a collaborative 'plug-and-play' framework for NLG. The resulting approach asks interesting and challenging questions about the nature of data manipulation and functional 'modulehood' in large, complex, computational systems.

In this talk, I will describe the progressive development of these ideas, from the starting point of the problem revealed by the RAGS survey, through the RAGS two-level data model and functional architecture for NLG systems, and its implementation in the OASYS system, to subsequent work with Chris Mellish on functional vs implementation architectures, and my current ideas for developing a more generic architectural substrate.



Maintained by the seminars organiser (seminars@itri.brighton.ac.uk).
Last modified: Mon Dec 6 15:36:12 GMT 2004

©Information Technology Research Institute