Information Retrieval (IR) encompasses a range of techniques aimed at finding document(s) which are relevant to some topic, and is traditionally used for library search activities. By contrast, Information Extraction (IE) focusses on finding relationships within document(s) that are relevant to some query, and is traditionally used for information filtering by completing a template.
These techniques have been applied to the problem of the day: searching the Internet to answer questions. However, the scale and complexity of retrieving useful and reliable information from the Internet are such that existing techniques, as implemented in search engines and retrieval systems, are insufficient. The volume of material to search, its variability over time, and questions of authenticity, trustworthiness and granularity indicate that other methods are required. The SRI Highlight Information Extraction system is described as an exemplar of this approach.
One of the problems in matching queries to language fragments (sentences, paragraphs, documents, etc.) is the imprecision of the association between lexical items and their underlying concepts. Some approaches from Knowledge Representation (KR) have sought to handle this by the use of synonym lists and ontologies, although these require significant, language-dependent pre-analysis. A new method for matching words to concepts, using techniques derived from Latent Semantic Analysis, is described. This method is found to have unexpected benefits in other application areas, such as dictionary concordancing and exam marking.