The Berkeley FrameNet Project
Charles Fillmore
FrameNet is a research project dedicated to building a particular kind of
lexical resource. Funded by the National Science Foundation, and
administered at the International Computer Science Institute in Berkeley,
it is a three-year project, now past the mid-point of its third year,
that keeps between six and ten half-time people busy. It uses the
British National Corpus (BNC) as its evidential basis and frame semantics
as its descriptive framework. The database, much of which will be made
public within a few months, is to contain information about the semantic
and syntactic combinatorial properties - briefly, the valence - of about
two thousand lexical units covering a dozen semantic domains. The
procedure combines labor-intensive manual annotation of example sentences
(using the Alembic Workbench from MITRE Corp.) with automatic means of
gathering the annotation results into lexical entries. For each sense,
an entry will identify the valence patterns of the word and the ways in
which the semantic roles underlying it are syntactically realized in
phrases headed by it. Each lexical unit (a word in a given sense) is linked
to one or more semantic frames, and the combinatorial statements are
expressed in terms of the components (semantic roles, "frame elements")
of such frames.
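
The gathering step described above can be pictured with a minimal sketch.
It is only an illustration under stated assumptions, not the project's
actual database format or software: the class names, the syntactic labels
such as "NP.subject", the frame and frame element names, and the example
sentences (which are invented, not drawn from the BNC) are all
hypothetical. It shows how annotated sentences might be folded into one
entry per lexical unit, recording the linked frames, the realizations of
each frame element, and the valence patterns.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Realization:
    frame_element: str  # semantic role, e.g. "Speaker" (illustrative name)
    syntax: str         # surface form, e.g. "NP.subject" (hypothetical label)


@dataclass(frozen=True)
class AnnotatedSentence:
    lexical_unit: str     # a word in a given sense, e.g. "tell.v"
    frame: str            # the frame this sense is linked to
    text: str             # the annotated example sentence
    realizations: tuple   # the Realization objects marked by the annotator


@dataclass
class LexicalEntry:
    lexical_unit: str
    frames: set = field(default_factory=set)
    # frame element -> Counter of the syntactic forms it was seen in
    realizations: dict = field(default_factory=lambda: defaultdict(Counter))
    # valence pattern (a sorted tuple of role/form pairs) -> sentence count
    valence_patterns: Counter = field(default_factory=Counter)


def build_entries(sentences):
    """Gather manual annotations into one entry per lexical unit."""
    entries = {}
    for s in sentences:
        entry = entries.setdefault(s.lexical_unit, LexicalEntry(s.lexical_unit))
        entry.frames.add(s.frame)
        for r in s.realizations:
            entry.realizations[r.frame_element][r.syntax] += 1
        pattern = tuple(sorted((r.frame_element, r.syntax) for r in s.realizations))
        entry.valence_patterns[pattern] += 1
    return entries


# Invented example annotations (assumptions for illustration only).
ditransitive = (
    Realization("Speaker", "NP.subject"),
    Realization("Addressee", "NP.object"),
    Realization("Message", "NP.object2"),
)
with_topic = (
    Realization("Speaker", "NP.subject"),
    Realization("Addressee", "NP.object"),
    Realization("Topic", "PP.about"),
)
sentences = [
    AnnotatedSentence("tell.v", "Telling", "She told me the news.", ditransitive),
    AnnotatedSentence("tell.v", "Telling", "He told us about the plan.", with_topic),
    AnnotatedSentence("tell.v", "Telling", "They told her the truth.", ditransitive),
]
entries = build_entries(sentences)
```

Keeping the counts per frame element separate from the counts per whole
pattern mirrors the two kinds of information the entries are meant to
carry: how each role can be realized, and which combinations of
realizations occur together.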
In this talk I want to explain how we originally defined the project, and
how we quickly discovered the need to include certain unanticipated kinds
of information.