There are now many automatic Word Sense Disambiguation (WSD) programs but it is currently very hard to determine which are better, which worse, and where the strengths and weaknesses of each lie. There is widespread agreement that the field urgently needs an evaluation framework. Under the auspices of ACL SIGLEX and EURALEX a pilot will take place in the course of 1998. As in ARPA evaluation exercises, the framework comprises:
We shall be undertaking evaluation for at least English, French, Italian and Spanish. For information on the French and Italian exercises (ROMANSEVAL), click here. For Spanish, mail Evelyne Viegas.
The workshop will be held at Herstmonceux Castle, Sussex, UK, 2-4 September 1998.
If you have a working WSD program (or will have one by Summer 1998), and would like to subject it to objective, quantitative evaluation, or if you have skills or resources that you would like to contribute to the exercise, first look here and then mail your expression of interest to the co-ordinator.
We intend (funding permitting) to run three distinct exercises: one for those who need sense-tagged training data, and two variants for those who do not. In the first variant of the no-training-data task, all the content words in a set of sentences are tagged (the "all-types" task, using WordNet senses, like SEMCOR). In the second variant, only a few selected words are tagged (the "lexical-sample" task). In all, there are three tasks: all-types, lexical-sample and with-training.
Systems that can perform all-types can perform lexical-sample, and ones that can perform lexical-sample can perform with-training (assuming appropriate lexicons are available). Inevitably, some algorithms do not fit the categories neatly: some, for example, require human input for lexicon development, possibly corpus-aided, while others require only minimal quantities of training data. All I can say about this is (1) it's as close as we can get to a level playing field, and (2) any comparison of scores must bear it in mind!
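To illustrate why an all-types system can also take part in the lexical-sample task, here is a minimal sketch in Python. It assumes a hypothetical output format in which each content word is paired with a sense label; the sense labels, the example sentence and the function name are invented for illustration and are not part of the exercise specification.

```python
from typing import List, Set, Tuple

# Hypothetical representation: one (word, sense_label) pair per tagged content word.
TaggedWord = Tuple[str, str]


def restrict_to_lexical_sample(
    tagged: List[TaggedWord], sample_words: Set[str]
) -> List[TaggedWord]:
    """Keep only the sense tags for words that belong to the lexical sample."""
    return [(word, sense) for word, sense in tagged if word.lower() in sample_words]


# Invented all-types output for one sentence (placeholder sense labels).
all_types_output = [
    ("bank", "bank%1"),
    ("raised", "raise%2"),
    ("interest", "interest%4"),
    ("rates", "rate%1"),
]

# Invented lexical sample: only these words are to be evaluated.
sample = {"bank", "interest"}

lexical_sample_output = restrict_to_lexical_sample(all_types_output, sample)
print(lexical_sample_output)
```

The point is only that lexical-sample answers are a subset of all-types answers, so no extra machinery is needed on the system side.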
There will be no distribution of untagged corpus material of the same genre as that to be used for evaluation, but the evaluation material will be taken from a similar spread of genres to the BNC. Limited downloads of the BNC can be made without a BNC licence here. The BNC is a general-purpose, mixed-genre corpus, so various other corpus resources (preferably for British English) would be suitable.
Detailed timetable for each task to follow.