The Berkeley FrameNet Project

Charles Fillmore

FrameNet is a research project dedicated to building a particular kind of lexical resource. Funded by the National Science Foundation and administered at the International Computer Science Institute in Berkeley, it is a three-year project, now past the midpoint of its third year, that keeps between six and ten half-time people busy. It uses the British National Corpus (BNC) as its evidential basis and frame semantics as its descriptive framework. The database, much of which will be made public within a few months, is to contain information about the semantic and syntactic combinatorial properties (briefly, the valence) of about two thousand lexical units covering a dozen semantic domains. The procedure combines labor-intensive manual annotation of example sentences, carried out with MITRE Corp.'s Alembic Workbench, with automatic means of gathering the results of that annotation into lexical entries. For each sense of a word, these entries identify the ways in which the semantic roles underlying the word are syntactically realized in phrases headed by the word, as well as the valence patterns in which those realizations combine. Each lexical unit (a word in a given sense) is linked to one or more semantic frames, and the combinatorial statements are expressed in terms of the components of such frames: the semantic roles, or "frame elements".
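The step from manual annotation to automatically gathered lexical entries can be pictured with a small sketch. This is not the project's actual software or data format: the class names, the `build_entries` function, and the phrase-type and grammatical-function tags (including the made-up `Obj2`) are hypothetical, and the `give.v` / Giving example with Donor, Theme, and Recipient is purely illustrative. The idea shown is simply that each annotated sentence labels constituents with a frame element, a phrase type, and a grammatical function, and aggregation then yields, per lexical unit, the realizations of each frame element and the combinations (valence patterns) in which they occur.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Realization:
    """One annotated constituent: the frame element it expresses and its syntax."""
    frame_element: str   # semantic role, e.g. "Donor" (illustrative)
    phrase_type: str     # e.g. "NP", "PP[to]" (hypothetical tags)
    grammatical_fn: str  # e.g. "Ext" (subject), "Obj", "Dep" (hypothetical tags)


@dataclass(frozen=True)
class AnnotatedSentence:
    """A hand-annotated example sentence for one target lexical unit."""
    lexical_unit: str                      # a word in a given sense, e.g. "give.v"
    frame: str                             # frame the lexical unit is linked to
    realizations: tuple[Realization, ...]  # the labeled constituents


@dataclass
class LexicalEntry:
    """Summary gathered automatically from all annotations of one lexical unit."""
    lexical_unit: str
    frames: set[str] = field(default_factory=set)
    # frame element -> counts of (phrase type, grammatical function) pairs
    fe_realizations: defaultdict = field(default_factory=lambda: defaultdict(Counter))
    # whole combination of realizations in one sentence -> number of sentences
    valence_patterns: Counter = field(default_factory=Counter)


def build_entries(sentences):
    """Hypothetical aggregation: one LexicalEntry per lexical unit."""
    entries = {}
    for s in sentences:
        entry = entries.setdefault(s.lexical_unit, LexicalEntry(s.lexical_unit))
        entry.frames.add(s.frame)
        for r in s.realizations:
            entry.fe_realizations[r.frame_element][(r.phrase_type, r.grammatical_fn)] += 1
        # Sort by frame element so surface word order does not split one pattern.
        pattern = tuple(sorted(s.realizations, key=lambda r: r.frame_element))
        entry.valence_patterns[pattern] += 1
    return entries


# Invented sample annotations, for illustration only.
R = Realization
sample = [
    # "Kim gave the book to Pat."
    AnnotatedSentence("give.v", "Giving", (
        R("Donor", "NP", "Ext"), R("Theme", "NP", "Obj"), R("Recipient", "PP[to]", "Dep"))),
    # "Kim gave Pat the book."
    AnnotatedSentence("give.v", "Giving", (
        R("Donor", "NP", "Ext"), R("Recipient", "NP", "Obj"), R("Theme", "NP", "Obj2"))),
    # "The book was given to Pat." (Donor not expressed)
    AnnotatedSentence("give.v", "Giving", (
        R("Theme", "NP", "Ext"), R("Recipient", "PP[to]", "Dep"))),
]

entries = build_entries(sample)
give = entries["give.v"]
print(dict(give.fe_realizations["Theme"]))
print(len(give.valence_patterns), "distinct valence patterns")
```

Keeping per-frame-element realizations separate from whole-sentence patterns mirrors the two kinds of information the entries are meant to record: how each semantic role can be expressed, and which combinations of expressed roles actually co-occur.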

In this talk I want to explain how we originally defined the project, and how we quickly discovered the need to include certain unanticipated kinds of information.