A Manually Sense-tagged Gold Standard Corpus for
Evaluating Word Sense Disambiguation Programs
EPSRC grant M03481: May 1998 to January 1999
PI: Roger Evans; Research Fellow: Adam Kilgarriff
This project took place in the context of the SENSEVAL exercise for
evaluating Word Sense Disambiguation (WSD) programs.*
The role of the EPSRC project was to develop the manually tagged
corpus of correct answers, or 'Gold Standard', against which the WSD
systems could be evaluated. Related and supplementary goals were:
- to achieve a high level of agreement between taggers, so that we
  could be confident that, in at least 90% of cases, the "gold
  standard" tag was unequivocally the correct tag;
- to explore, quantitatively and qualitatively, the cases which cause
  difficulties for human taggers.
The manually tagged corpus played a central role in the highly
successful SENSEVAL exercise. The whole exercise will be described in a
Special Issue of Computers and the Humanities, co-edited by Kilgarriff
and Palmer. SENSEVAL has been a focal event in the field of word sense
disambiguation and is expected to serve as a benchmark for future work.
Using the methods adopted in the project, the manual tagging of the
corpus was shown to be replicable in 95% of cases. This is
considerably higher than several authors had thought achievable, and
provides a firm basis for future WSD research and evaluation. We have
also explored in detail the kinds of cases in which lexicographers did
not arrive at the same tag, and used this analysis to re-evaluate
Generative Lexicon theory.
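To make the replicability figure concrete, the minimal sketch below computes simple percentage agreement between two independent taggings of the same corpus instances. It is an illustration only. It assumes that replicability is read as the proportion of instances on which two taggings assign an identical sense tag, and it is not necessarily the exact procedure used in the project. The function name `percent_agreement` and the sense labels `bank_1` and `bank_2` are hypothetical.

```python
from typing import Sequence


def percent_agreement(tags_a: Sequence[str], tags_b: Sequence[str]) -> float:
    """Return the share of instances on which two taggings assign the same sense tag.

    Hypothetical helper for illustration; assumes one tag per instance,
    with both taggings listed in the same instance order.
    """
    if len(tags_a) != len(tags_b):
        raise ValueError("both taggings must cover the same instances")
    if not tags_a:
        raise ValueError("no instances to compare")
    # Count instances where the two taggings chose an identical tag.
    matches = sum(1 for a, b in zip(tags_a, tags_b) if a == b)
    return matches / len(tags_a)


# Hypothetical sense tags for 20 corpus instances; the runs differ only on the last one.
first_run = ["bank_1"] * 12 + ["bank_2"] * 8
second_run = ["bank_1"] * 12 + ["bank_2"] * 7 + ["bank_1"]
print(f"{percent_agreement(first_run, second_run):.0%}")  # prints 95%
```

Raw percentage agreement does not correct for agreement expected by chance, which is one reason chance-corrected measures such as Cohen's kappa are sometimes reported alongside it.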
- Kilgarriff and Palmer, eds. SENSEVAL: Evaluating word sense
  disambiguation. Special Issue of Computers and the Humanities,
  forthcoming.
- Kilgarriff. 95% replicability for manual word sense tagging. To
appear in Proc. European ACL, Bergen, June 1999.
- Kilgarriff. Generative lexicon meets corpus data: the case of
non-standard word uses. In Bouillon and Busa, eds, Word meaning
and creativity. Cambridge University Press, forthcoming.
*This took place
under the auspices of the Association for Computational Linguistics
(Special Interest Group on the Lexicon, SIGLEX), the European Association
for Lexicography, and the European Union projects ECRAN, ELSNET and SPARKLE.