SEAL: Structural Enhancement of Automatically-acquired Lexicons
| Overview |
This project is concerned with the development of practical, large-scale
lexicons for use in computer applications that use natural language.
While a number of large-scale lexical fragments already exist
(hand-crafted or derived from dictionaries or corpora), few of them are
practically useful. Reasons for this include insufficient content
density (e.g. syntax but no semantics), inadequate internal structuring,
or inappropriate level of detail for the application in hand (too much
irrelevant detail can greatly increase the processing load).
Rather than attempting to construct better lexicons from scratch, this project is developing tools which use these existing lexicons as base data, and support the development of new lexicons with greater content density, enhanced structure and application-specific level of detail. This is achieved by merging lexicons together, and by inducing additional structure, guided by insights from lexical representation theory. The work is being evaluated through two pilot applications in areas related to other research within the Institute, namely text generation and information extraction.
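As a rough illustration of the merging idea described above, the following Python sketch combines two toy lexicons by headword and then trims the result to the features one application needs. It is not the project's tooling: the dictionary representation, the feature names (`cat`, `subcat`, `sem_class`) and the functions `merge_lexicons` and `project_features` are all assumptions made for this example.

```python
from typing import Dict, Iterable

# Hypothetical representation: headword -> {feature name: value}.
Lexicon = Dict[str, Dict[str, str]]


def merge_lexicons(*lexicons: Lexicon) -> Lexicon:
    """Union entries by headword; on a feature clash the earlier lexicon wins."""
    merged: Lexicon = {}
    for lexicon in lexicons:
        for word, features in lexicon.items():
            entry = merged.setdefault(word, {})
            for name, value in features.items():
                # Keep the value already present, if any.
                entry.setdefault(name, value)
    return merged


def project_features(lexicon: Lexicon, wanted: Iterable[str]) -> Lexicon:
    """Keep only the features an application needs (application-specific detail)."""
    keep = set(wanted)
    return {
        word: {name: value for name, value in features.items() if name in keep}
        for word, features in lexicon.items()
    }


# Toy data: one lexicon with syntax only, one with semantics only.
syntactic = {"give": {"cat": "verb", "subcat": "np_np"}, "dog": {"cat": "noun"}}
semantic = {"give": {"sem_class": "transfer"}, "cat": {"sem_class": "animal"}}

merged = merge_lexicons(syntactic, semantic)
slim = project_features(merged, ["cat", "sem_class"])
```

Here `merged["give"]` carries both syntactic and semantic features (greater content density), while `slim` drops the subcategorisation detail that a given application might not need. Inducing additional structure, such as inheritance between entries, is beyond this sketch.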
| |
| Background |
The recent commercially-motivated growth in interest in applied
computational linguistics (or Language Engineering) has
highlighted the need for realistic large-scale linguistic resources,
notably grammars and lexicons. Significant research activity has thus
been directed towards developing such resources, and a number of
large-scale lexical fragments are beginning to emerge (such as ACQUILEX,
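CELEX, WordNet, XTAG). However, these lexicons remain of only limited
utility in practical applications. Reasons for this include insufficient
content density, inadequate internal structuring, and a level of detail
that does not suit the application in hand.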
| |
| The project |
The present project is concerned with taking existing lexicons as base
data, and producing new lexicons with greater content density, enhanced
structure and optimal feature detail (relative to a given domain or
task), by merging lexicons and by inducing additional structure. Our
approach is motivated by the following assumptions:
| |
| Staff |
The principal investigator is Roger Evans. Adam Kilgarriff
is also working on the project.
| |
| Financial Support |
The project is supported by the EPSRC under grant GR/K/18931.
| |
| Publications |
R. Evans and A. Kilgarriff, "MRDs, Dictionaries, and How To Do Lexical Engineering", in Proceedings of the 2nd Language Engineering Convention, pp. 125-132, London, UK, 1995.

A. Kilgarriff, "Which words are particularly characteristic of a text? A survey of statistical approaches", in Proceedings, AISB Workshop on Language Engineering for Document Analysis and Recognition, Brighton, UK, 1996.

A. Kilgarriff and R. Salkie, "Corpus similarity and homogeneity via word frequency", in Proceedings of Euralex '96, Gothenburg, Sweden, 1996.

A. Kilgarriff, "Putting frequencies in the dictionary", International Journal of Lexicography, forthcoming.

Maintained by Roger Evans (Roger.Evans@itri.brighton.ac.uk). Last updated 12 January 1997.
© Information Technology Research Institute