Robert Gaizauskas
Dept of Computer Science, University of Sheffield
Two Applications of Information Extraction to Biological
Science Journal Articles: Enzyme Interactions and Protein Structures
Information extraction (IE) technology, as defined
and developed through the U.S. DARPA Message Understanding
Conferences (MUCs), has proved successful at extracting information,
mainly from newswire texts and in domains concerned
with human activity. In this talk I discuss the application of
this technology to the extraction of information from scientific
journal papers in the area of molecular biology. In particular, I
describe how an information extraction system designed to
participate in the MUC exercises has been modified for two
bioinformatics projects: EMPathIE, concerned with enzyme and
metabolic pathways; and PASTA, concerned with protein structure.
Progress to date provides convincing grounds for believing that IE
techniques will deliver novel and effective ways for scientists to
make use of the core literature which defines their disciplines.