RESEARCH PROJECTS
As principal investigator or manager
|
DATR 1987-present |
DATR: a language for lexical representation.
Since 1987, I have been involved with research in collaboration with Gerald Gazdar on the problems of knowledge representation in a natural language lexicon. This research has resulted in the language, DATR, which is a formal non-monotonic logic for the concise specification of lexical information. DATR has attracted considerable attention in the Computational Linguistics community and has been adopted by several major groups across Europe. My own early research focused on the formal foundations of the language and explorations of its scope for linguistically interesting descriptions. More recently I have been using it as a vehicle for the study of large scale lexical phenomena, such as lexical rules and multilingual representation, investigating ways to embed DATR inheritance semantics into XML documents, and trying to embed probabilistic-style information into the DATR language. This work is not funded directly by external grants,
but DATR has been used, and to some extent developed, in the following
research projects:
GREG,
Text Mining Demonstrator,
CLIME,
MUC5,
POETIC,
SPL.
| |||||||||||||||||||
|
OpenPoplog 2003-present |
Development of the OpenPoplog system.
In addition to my main research interest in Computational Linguistics, I
have a secondary interest in software tools and programming environments.
Throughout my time at Sussex, I have had a keen interest in the
Poplog programming system, an advanced multi-language environment
developed within Cognitive and Computing Sciences at Sussex. Since 2003, I
have been collaborating with other Poplog users, on an
unfunded basis, towards the development of OpenPoplog, a freeware
release of the Poplog system to support on-going community-based use and
development.
| |||||||||||||||||||
|
COGENT 2003-2006 |
Controlled Generation of Text
Natural Language Generation (NLG) technology has reached a level of
maturity where applied systems exist in a range of specialised
real-world domains (such as weather bulletins, software documentation,
health and legal advice and stock market movements). However,
developing such systems currently involves hand-crafting and
special-purpose tuning by NLG experts which is non-portable,
non-scaleable, time-consuming and expensive. Wider deployment of
language generation requires more generally applicable and reusable
NLG components based on wide-coverage grammars, but at present,
effective techniques for such wide-coverage generation are not well
understood. This three-year project is investigating systematically
the characteristics of wide-coverage generation and developing
reflective techniques for controlling it effectively. As well as
furthering our understanding of wide-coverage generation, the project
will deliver a substantial and novel resource to support future
research in this area, and practical implementations of wide-coverage
controllable generators.
| |||||||||||||||||||
|
EUROMAP 2001-2003 |
EUROMAP/HOPE 2001
EUROMAP Language Technologies was a European Commission supported initiative dedicated to promoting greater awareness and faster take-up of Human Language Technologies (HLT) within Europe. HOPE 2001 was the second phase of EU funding to support the addition of more countries into the EUROMAP consortium. ITRI was the UK partner funded by this second phase. Our role was to establish a national presence (primarily via a website) for EUROMAP, a central reference point for HLT providers, users and funders to keep up to date and exchange news. My role in the project was overall management, strategic planning and final approval of published information etc. At the end of the EUROMAP project, the UK-specific resources were used as
the basis of the CLUK Industrial
liaison website for the UK special-interest group for Computational
Linguistics (CLUK).
| |||||||||||||||||||
|
MATS 2000-2002 |
Manual Tagging for SENSEVAL
This project provided lexicography support for the second SENSEVAL
workshop - an international comparison and evaluation of word sense
disambiguation systems which took place in Summer 2001. SENSEVAL-2 extends the
first SENSEVAL to a wider range of languages, and more
rigorous task and evaluation data definition. My role in the project was
largely management support: Adam Kilgarriff was the primary author of the
proposal and de facto project manager.
| |||||||||||||||||||
|
WASPS 1999-2002 |
A semi-automatic lexicographer's workbench for
writing word sense profiles.
This project brought together recent developments in data-driven algorithms for word sense disambiguation with corpus-based tools developed to support lexicography to develop the WASPBENCH, an integrated environment for lexicography and word sense definition. The primary outputs of the WASPBENCH are both human-readable characterisations of the word senses and the data required for accurate word sense disambiguation. The ideas were tested through the production of lexical entries for a substantial sample of the English lexicon, and also the development of multilingual resources for use in machine translation. This project was managed jointly by myself and Adam Kilgarriff - as well
as overall management and supervision of the project, I had specific
responsibility for the evaluation workpackage of the project.
| |||||||||||||||||||
|
GREG 1999-2001 |
A Georgian, English, Russian and German multilingual valency
lexicon for natural language processing.
This project developed a multilingual lexicon, suitable for use with Language Engineering applications. It contains syntactic and semantic valency specifications for 1000 Georgian verbs and their Russian, English and German counterparts, with semantic valency described with reference to thematic roles as introduced by Fillmore and Halliday, and syntactic valency in terms of subcategorisation frames. Brighton's role in the project was primarily to provide advice and technology transfer in lexical representation. As part of this, I developed a substantial new multilingual lexicon representation framework for use in the project, written entirely in DATR, and designed to give a high degree of flexibility in representation, but also to be useable by relatively novice lexicon developers. This project was a collaboration with the University of Stuttgart
(coordinator), Tbilisi State University and the Georgian Academy of
Sciences.
| |||||||||||||||||||
|
CLIME 1998-2001 |
Computerised legal information management and explanation.
This project developed software to support access to legal and regulatory information, specifically in the area of maritime law. The principal deliverable of the project was a multilingual web-based application that advised on the applicability of maritime regulations to a particular ship scenario (for example, whether it is allowable to run the only fire pump from the ship's main engine, or whether a ship can carry oil in its ballast tanks). ITRI's main role in the project was the development of the natural language interfaces: the query input interface using WYSIWYM technology, and the generation of natural language answers from the system's internal answer format (in English and French). My own role was to provide day-to-day management and research leadership of the local team, liaison and strategic development within the whole consortium, architectural design and implementation and application delivery. At the technical level, one particular achievement was the development of a tightly coupled prolog/java interface by linking the java virtual machine into the (poplog) prolog system. The consortium was led by British Maritime Technology Ltd., and
involved the University of Brighton, the University of Amsterdam, Bureau
Veritas (France) and TXT Ingegneria Informatica (Italy).
| |||||||||||||||||||
|
CONCEDE 1998-2000 |
Consortium for Central European Dictionary Encoding
This project developed medium-sized (1000-5000 word) electronic dictionaries for six Central European languages (Bulgarian, Czech, Estonian, Hungarian, Romanian and Slovene), drawing on recently developed standards for dictionary encoding and a sister project which is currently developing parallel aligned corpora for the six languages. ITRI coordinated the project and provided expertise on lexicography and dictionary encoding. My own role was overall project management, and participation in the development of an XML-based encoding scheme for the dictionaries. The other project partners were Bulgarian Academy of
Sciences, Sofia, Bulgaria; Charles University, Prague, Czech Republic;
University of Tartu, Tartu, Estonia; Hungarian Academy of Science,
Budapest, Hungary; Research Institute for Informatics, Bucharest,
Romania; Josef Stefan Institute, Ljubljana, Slovenia.
| |||||||||||||||||||
|
Text Mining Demonstrator 1998-1999 |
Development of a text mining demonstrator.
This project was carried out by the ITRI (myself and Lynne Cahill) and Integral Solutions Ltd on behalf of BG Technology. The aim was to produce a text mining demonstrator, by taking two pieces of existing software (the POETIC/MUC information extraction system and ISL's Clementine data mining system), and bolting them together, making it possible to mine information in textual as well as symbolic/numeric databases. The initial application domain was newswire stories about power generation: the information extraction system returned key information about participants, location, type, size and cost of new power stations, allowing the data mining system to discover long term trends in the market directly from a newswire feed. In a second phase in 1999, we successfully adapted the system to operate
in a new related application domain, liquefied natural gas.
| |||||||||||||||||||
|
SENSEVAL 1998-1999 |
A Manually Sense-tagged Gold Standard Corpus.
This project provided lexicography support for the SENSEVAL word sense disambiguation system evaluation. In SENSEVAL, participants from all over the world tested their word sense disambiguation systems against manually sense-tagged data in a MUC-style evaluation. In September 1998 the results were compared and discussed at the SENSEVAL workshop. The funding for this project paid for the preparation of training and test data for the task. My role in the project was largely management support: Adam
Kilgarriff was the primary author of the proposal and was the de facto
project manager.
| |||||||||||||||||||
|
SEAL 1995-1998 |
Structural enhancement of automatically-acquired lexicons
This project addressed the problem of lexical tuning: how to develop a customised lexicon for a given application domain by using existing resources. The project adopted a corpus-based approach, drawing significantly on the British National Corpus for its input data. The main achievements of the project were:
| |||||||||||||||||||
|
Advanced Fellowship 1988-1994 |
SERC Advanced Fellowship.
| |||||||||||||||||||
|
SPL 1990-1992 |
An Integrated Approach to Structure and Processing in Natural
Languages.
In October 1988, I was awarded an SERC Advanced Fellowship to study the relationship between language structure and processing, and its relevance to the design of linguistic formalisms. In 1990 I also obtained a personal SERC research grant to further support this work. This research was pursued on two fronts. On the one hand, I explored the theoretical and representational issues through the language DATR, whilst on the other, I looked at the application of these ideas to processing through the work on message understanding (POETIC, MUC5) and to a limited extent natural language generation (DRAFTER, GIST). The results of this activity can be found in the development of the
DATR compiler, and a wide range of DATR fragments exploring different
representational ideas; in the grammar and lexicon of the POETIC system,
and its adaptation in the MUC5 system; and in subsequent work in SEAL,
on lexical access and multilingual representation.
| |||||||||||||||||||
|
MUC5 1993 |
Sussex University's participation in MUC5.
Robert Gaizauskas and I negotiated funding from US and UK sources for participation in the 5th Message Understanding Conference (MUC5), as the first European team to take part in the MUC series. This grant funded ten person-months of effort to attempt to port the POETIC natural language understanding component to a completely different topic domain (commercial joint ventures) and evaluate the resulting system against other state-of-the-art message understanding systems. Based on this evaluation our system was placed in the third statistically-significant rank - only three systems (out of an international field of thirteen) were ranked higher. For further details see "MUCing about in Pop: Message Understanding in Five Languages" (Evans, Gaizauskas and Cahill 1993) or "Sussex University: Description of the Sussex System used for MUC5" (Gaizauskas, Cahill and Evans 1994). My roles in this project were liaison with the funding consortium,
development of the phrasal lexicon analysis module and general technical
support for the other team members (R. Gaizauskas and L.J. Cahill).
| |||||||||||||||||||
|
POETIC 1990-1993 |
Portable Extendable Traffic Information Collator.
| |||||||||||||||||||
|
TIC 1989-1990 |
The Traffic Information Collator.
The overall aim of the "Traffic Information Collator" (TIC) project was to develop a system which analysed natural language text as found in police command and control logging systems, picked out messages about traffic incidents and automatically produced appropriately targeted traffic bulletins for other motorists. From April 1989, I worked on this project for a year, following the departure from Sussex of the principal investigator and senior research fellow on the project. I took over the management responsibility and supervision of the second research fellow (A.F. Hartley). POETIC was a follow-on project, which aimed to take the basic prototype developed under TIC and generalise it to be portable to a wide range of police sublanguages, traffic management policies and geographical areas. The POETIC consortium was led by RACAL Research Ltd, and included the University of Sussex, the Automobile Association and National Transcommunications Ltd. As I was not myself funded by the POETIC project, I took the role of
project manager, supervising the two research staff on the project (L.J.
Cahill and R. Gaizauskas) and liaising with the industrial partners. As
well as providing general guidance for the research as a whole, I made
specific contributions in the redevelopment of the lexicon, the overall
architecture of the system, and the user interface. See "POETIC: A
System for Gathering and Disseminating Traffic Information" (Evans et
al, 1996) for further details.
| |||||||||||||||||||
As participant or collaborator
|
WYSIWYM 1997-present |
WYSIWYM: knowledge editing with natural language feedback.
WYSIWYM ('What you see is what you meant') is a technique for using natural language generation technology to support complex data-entry tasks, such as the development of a knowledge base or a complex formal query (such as SQL, or the legal query representations used in the CLIME system, described above). The core idea of WYSIWYM was introduced by Richard Power, and subsequently developed by Power, Donia Scott and myself in the context of various projects and other initiatives. The primary applications to date have been authoring of multilingual 'Patient Information Leaflets', explorations into stylistic variation, the CLIME legal enquiry system, and management of medical records. As well as contributing to the general development of this work, I have
specific involvement in the development of a WYSIWYM library in Java, suitable
for deployment in other applications, and management and negotiation of
licensing of the software.
| |||||||||||||||||||
|
HALO/DarkMatter 2004-2005 |
Project HALO - DarkMatter consortium
Project Halo is an effort by Vulcan Inc. towards the development of a Digital Aristotle: a staged, long-term research and development initiative that aims to develop an application capable of answering novel questions and solving advanced problems in a broad range of scientific disciplines. The Digital Aristotle is being developed with a focus on two primary functions: as a tutor capable of instructing and assessing students in the sciences, and as a research assistant with broad, interdisciplinary skills to help scientists in their work. DarkMatter is one of two competing Halo subprojects, led by Ontoprise GmbH,
aiming to deliver phase II of the Halo project, developing technology that
will allow domain experts to formulate knowledge with decreasing dependence on
knowledge engineers, and for untrained users to pose questions and problems to
the knowledge systems. Our role in the project was to investigate the
deployment of WYSIWYM technology for knowledge creation and user query. My own
specific role was to manage the development and application of the
Java WYSIWYM library for use in DarkMatter, and to negotiate licensing
arrangements between Brighton and the other partners.
| |||||||||||||||||||
|
Semantic Mining 2004-2005 |
Semantic interoperability and data mining in Biomedicine
Semantic Interoperability and Data Mining in Biomedicine is a Network of
Excellence (NoE) funded by the European Commission under Framework 6.
The general objective of the network is to bridge gaps in European research
infrastructure and to facilitate cross-fertilisation between scientific
disciplines such as computer science, system engineering and medical/clinical
research. The long-term goal of the network is the development of generic
methods and tools supporting critical tasks in medical and biomedical
informatics, such as data mining, knowledge discovery, knowledge
representation, abstraction and indexing of information, semantic-based
information retrieval in a complex and high-dimensional information space, and
knowledge based adaptive systems for provision of decision support for
dissemination of evidence based medicine.
| |||||||||||||||||||
|
M3 2000-2003 |
Methods and Models of Morphology Seminar.
| |||||||||||||||||||
|
CID 1998-2000 |
Challenges for Inflectional Description Seminar.
| |||||||||||||||||||
|
FRiM 1996-1998 |
Frontiers of Research in Morphology Seminar.
These three ESRC grants funded successive two-year series of quarterly one-day
seminars on aspects of morphological description. The grants were jointly held
by Universities of Surrey, Sussex, Brighton, Essex, Cambridge and SOAS, and
funded travel and subsistence for participating groups and invited
international speakers to discuss morphological theory in general, and its
application to particular languages, notably endangered languages, such as
indigenous languages of Australia, Polynesia, North America and Africa.
| |||||||||||||||||||
|
PILLS 2001 |
Pharmaceutical Instructions Language Localisation System.
This project developed a prototype demonstrator of a multilingual
authoring tool for pharmaceutical information in a range of forms, suitable
for patients, nurses and doctors. I played a primarily technical role in the
project, adapting the web-delivered version of WYSIWYM that was developed in
the CLIME project for use as the demonstrator interface.
| |||||||||||||||||||
|
RAGS 1998-2001 |
RAGS: Reference Architecture for Generation Systems.
The aim of this project was to develop a standard 'reference architecture' for natural language generation systems. The lack of a standard view of the generation process as a whole is a significant barrier to wider exploitation of the technology. This project aimed to develop such a view, building on the emerging consensus evident in practical generation systems. As well as defining the reference architecture, the project aimed to deliver resources (sample knowledge sources and test data) to support the development of applications based on the architecture. I was a co-author of the proposal (but at the time ineligible to
be an official investigator), and was fully involved in the
management and execution of this project. Particular areas I have been
involved with include the formalisation of RAGS datatypes, the development
of the RAGS data model, and the development of the OASYS library, which
provides support for modular cooperatively-multitasking event-driven NLG
applications in prolog.
| |||||||||||||||||||
|
DRAFTER 1993-1997 |
DRAFTER: Drafting Assistant for Technical Writers.
| |||||||||||||||||||
|
GIST 1993-1996 |
Generating instructional texts.
These two research projects were both concerned with the multilingual generation of instructional texts in different application areas. DRAFTER was concerned with the generation of software manuals in English and French, while in GIST we were looking at the production of administrative forms (such as pension application forms) in English, German and Italian. For both projects, the basic approach was to replace conventional authoring followed by translation (manual or automatic), by a process of symbolic authoring - representing the content of the document symbolically, in a form from which texts in several languages can be generated automatically in parallel. My roles in these projects were in project management (including
representing the GIST consortium at EC project meetings in Luxembourg), as
well as technical involvement in the overall architecture design, and
associated work on multilingual lexical representation.
| |||||||||||||||||||
|
Poplog 1987-1988 |
Poplog development work.
Throughout my time at Sussex, I maintained a
keen interest in the Poplog programming system, an advanced
multi-language environment developed within Cognitive and Computing
Sciences at Sussex. From October 1987 to September 1988, I was employed
as a full-time research fellow as part of the Poplog development team.
My most significant contributions were the design and implementation of
the interface to X Windows which is now a central component of the
system, the addition of 'destroy actions' to the garbage collector and the
development of an interface to UNIX signal handling.
| |||||||||||||||||||
|
NLGP 1984-1987 |
Natural Language Generation from Plans.
In this project we developed a generation system which took a plan
representation for the performance of some task and used the plan
structure to guide the generation of a multi-paragraph description of how
the task can be achieved. My primary area of responsibility in the
project was the transformation of the plan structure into discourse
structure using 'algebraic' techniques. See "Natural Language Generation
from Plans" (Mellish and Evans 1989) for further details.
| |||||||||||||||||||
|
ProGram 1982-1984 |
Computer Realisation of a Grammar.
The aim of this project was to develop some kind of computer representation of a moderately large grammar represented in the (then) quite novel "Generalised Phrase Structure Grammar" formalism. As a part time programmer on this project while studying for my D.Phil, I developed a very detailed understanding of GPSG at that time, which subsequently persuaded me to change my thesis topic (see "Education"). The project resulted in the ProGram system, an early
instance of a "Grammar Development System" and probably the first
full, faithful implementation of the GPSG formalism. I was responsible
for much of the specification and design, and all the implementation of
the system (in Prolog). For further details see "ProGram - a
development tool for GPSG grammars" (Evans 1985).
| |||||||||||||||||||
16 January 2006