Workshop aims
There are many branches of NLP research which involve the
generation of language (summarisation, MT, human-computer dialogue,
application front-ends, data-to-text generation, document authoring, etc.).
However, it is not always easy to identify common ground among the generation
components of these application areas, which has sometimes made it difficult
for generic research in 'Natural Language Generation' (NLG) to engage with
them effectively. Recent advances in corpus-based approaches (both manual and
automatic) across many of these areas, and in particular in NLG itself, offer
a new perspective on this problem and the opportunity to explore synergies and
differences from the secure common grounding of corpus data.
This workshop is the third in an occasional series seeking to exploit this
opportunity by providing a forum for discussing NLG and its links with these
closely related fields from a corpus-oriented perspective. These workshops
have the general aims:
- to provide a forum for reporting and discussing corpus-oriented methods for generating language;
- to foster cross-fertilisation between NLG and related fields by looking for common ground through corpus-oriented approaches;
- to promote the sharing of data and methods in all language generation research.
Each of these workshops has a special theme. At the first workshop (at Corpus
Linguistics in 2005) it was the use of corpora in NLG; at the second (at MT Summit
in 2007) it was Language Generation and Machine Translation. The
special theme of the 2009 workshop is Language Generation and
Summarisation.
There are two basic approaches to text summarisation: abstractive,
where texts are analysed, the internal representations are pruned, and a more
condensed version regenerated, and extractive, where key passages of
the input texts themselves are identified and then 'glued together' to form a
shorter text. Extractive summarisation is less dependent on fragile 'deep'
analysis and regeneration techniques, but tends to produce summaries that are
not very coherent and whose referring expressions are not very clear (so for
example, they often score low on DUC human assessment criteria such as
'discourse coherence' and 'referential clarity').
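The extractive approach described above can be illustrated with a minimal sketch. It assumes a naive punctuation-based sentence splitter and a simple word-frequency score; the function names `split_sentences` and `extractive_summary` are hypothetical and are not taken from any system mentioned in this call.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]


def extractive_summary(text, max_sentences=2):
    sentences = split_sentences(text)
    # Word frequencies over the whole input serve as a crude importance signal.
    freq = Counter(re.findall(r'[a-z]+', text.lower()))

    def score(sentence):
        # Average frequency of the sentence's words; empty sentences score 0.
        words = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Identify the key passages: rank sentence indices by score, highest first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    # Restore source order, then 'glue together' the selected sentences.
    chosen = sorted(ranked[:max_sentences])
    return ' '.join(sentences[i] for i in chosen)
```

Because the selected sentences are copied verbatim, a pronoun or definite description in an extracted sentence may lose its antecedent, which is one way the coherence and referential clarity problems noted above can arise.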
The relevance of NLG techniques to abstractive summarisation is clear, but
recently there has also been increasing interest in regeneration as a
post-process for extractive summaries. Work by Otterbacher et al., Steinberger
et al. and Nenkova et al., for example, shows how regeneration of (parts of)
extractive summaries can increase their coherence, referential clarity or
fluency. At the same time NLG researchers are investigating techniques that
could be used to improve extractive summaries by regenerating them (in
particular in the subfield of referring expression generation, see for example
the GREC Task papers at INLG 2008).
The core aim of this workshop is to provide a forum for NLG and
summarisation researchers to examine the similarities and differences between
their current approaches to generating language, and to explore the potential
for cross-fertilisation. To this end, the workshop will include an invited talk
and a panel discussion (see 'Key workshop facts').
This will be supported by a programme of technical papers,
on all aspects of using corpora in the generation of language, with a
particular interest in relevance to text summarisation. Specific topics
include, but are not limited to:
- generation techniques in abstractive summarisation
- regeneration/rewriting/post-processing techniques for extractive summarisation
- generation of references to named entities in discourse context
- annotating corpora for language generation and summarisation
- uses of corpora in the evaluation of language generation and summarisation systems
- reuse of corpus resources developed for NLU (e.g. treebanks) in language generation and summarisation
- domain-specific vs. general-purpose corpora for language generation and summarisation
- statistical approaches to language generation
- machine learning methods for language generation
Key workshop facts
Invited speaker:
Kathy McKeown, Columbia University, USA
Panelists:
Regina Barzilay, MIT
Ed Hovy, ISI
Kathy McKeown, Columbia
Donia Scott, Open University
Organisers:
Anja Belz, University of Brighton, UK
Sebastian Varges, University of Trento, Italy
Roger Evans, University of Brighton, UK
Programme committee:
Enrique Alfonseca, Google Zurich, Switzerland
Srinivas Bangalore, AT&T, USA
Robert Dale, Macquarie University, Australia
Daniel Marcu, ISI, University of Southern California, USA
Chris Mellish, University of Aberdeen, UK
Ani Nenkova, University of Pennsylvania, USA
Amanda Stent, SUNY, USA
Michael Strube, EML Research, Germany
Stephen Wan, Macquarie University, Australia
Mike White, Ohio State University, USA
Jianguo Xiao, Peking University, China
Important dates
| 1 May 2009: | Deadline for paper submissions |
| 3 Jun 2009: | Notification of acceptance |
| 10 Jun 2009: | Camera-ready copies due |
| 30 Jun 2009: | Early registration deadline |
| 6 Aug 2009: | UCNLG+SUM workshop in Singapore |
Accepted papers
Keynote paper:
Kathy McKeown
Query-focused Summarization Using Text-to-Text Generation:
When Information Comes from Multilingual Sources
Abstract
The past five years have seen the emergence of robust, scalable natural
language processing systems that can summarize and answer questions about
online material. One key to the success of such systems is that they re-use
text that appeared in the documents rather than generating new sentences from
scratch. Re-using text is absolutely essential for the development of robust
systems; full semantic interpretation of unrestricted text is beyond the state
of the art. Better summaries and answers can be produced, however, if systems
can generate new sentences from the input text, fusing relevant phrases and
discarding irrelevant ones. When the underlying sources for question answering
come from multiple languages, the need for text-to-text generation is even
more pronounced.
This talk will present research on query-focused summarization over a
variety of sources, including news, broadcast news, talk shows and blogs. Our
research combines approaches from summarization and information extraction to
answer open-ended questions. Because our sources include informal genres as
well as formal genres and draw from English, Arabic and Chinese, text-to-text
generation is critical for improving the intelligibility of responses. In this
talk I will describe how we exploit information available at question
answering time to edit sentences, removing redundant and irrelevant
information and correcting errors in translated sentences.
Long papers:
Karolina Owczarzak and Hoa Trang Dang
Evaluation of automatic summaries: Metrics under varying data
conditions
Horacio Saggion
A Classification Algorithm for Predicting the Structure of Summaries
Jackie Chi Kit Cheung, Giuseppe Carenini
and Raymond Ng
Optimization-based Content Selection for Opinion Summarization
Wei Xu and Ralph Grishman
A Parse-and-Trim Approach with Information Significance for Chinese
Sentence Compression
Hideki Tanaka, Akinori Kinoshita, Takeshi
Kobayakawa, Tadashi Kumano and Naoto Katoh
Syntax-Driven Sentence Revision for Broadcast News Summarization
João Cordeiro, Gaël Dias and Pavel Brazdil
Unsupervised Induction of Sentence Compression Rules
Short papers:
Stephanie Schuldes, Michael Roth, Anette
Frank and Michael Strube
Creating an Annotated Corpus for Generating Walking Directions
Iris Hendrickx, Walter Daelemans, Erwin
Marsi and Emiel Krahmer
Reducing Redundancy in Multi-document Summarization Using Lexical Semantic
Similarity
Maria Fernanda Caropreso, Diana Inkpen,
Shahzad Khan and Fazel Keshtkar
Visual Development Process for Automatic Generation of Digital Games
Narrative Content
Mohit Kumar, Dipanjan Das, Sachin Agarwal
and Alexander Rudnicky
Non-textual Event Summarization by Applying Machine Learning to
Template-based Language Generation