University of Brighton



Information Extraction

Information extraction, also known as 'message understanding', is a sub-area of language engineering that aims to develop practical systems which take short natural language texts and extract a limited range of key pieces of information from them. For example, the texts might be news articles about terrorist attacks, and the key information might be the perpetrators, their affiliation, the location, the victims, etc. Or the texts might be pharmaceutical research abstracts, and the key information might be new products, their manufacturers, patent information, etc. In a sense, the field attempts to apply language engineering technology to problems that the information retrieval community addresses from a more purely computer science perspective. In recent years this work has been focused through the US government-sponsored MUC (Message Understanding Conference) initiative.

In one specific research project, we are investigating how to generate compound nominals from text abstracts, in order to make it easier to access texts on specific topics in large databases of abstracts. Compound nominals, such as 'electronic games industry growth', express complex concepts in a highly compact way, and are useful for producing elaborate 'key terms' which succinctly describe the main content of a piece of text. This work focuses on generating such 'aboutness' terms from abstracts, and offers the user a mechanism for making highly specific queries to the database. The work departs from the traditional approach to compound nominals in that it does not require overt specification of the relationships that hold between the individual words making up the complex expression. Instead, it exploits the semantic information contained in an electronic dictionary to link the constituents of the compound terms.

For further information, please contact Roger Evans (+44 1273 642902); see our contact page for full contact details.


Maintained by Roger Evans (Roger.Evans@itri.brighton.ac.uk).
Last updated 15 October 1996

©Information Technology Research Institute
