eval

The main task, comprising all test items.

number of items in task: 8448

                    all senses          main senses only
average polysemy:   10.372              7.207

                    fine-grained        coarse-grained
average entropy:    1.916               1.512

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.965 (0.963)       0.968 (0.967)       0.970 (0.968)
best system         0.781 (0.781)       0.804 (0.804)       0.818 (0.818)
average of systems  0.639 (0.518)       0.696 (0.555)       0.717 (0.571)
worst system        0.418 (0.127)       0.511 (0.511)       0.538 (0.538)
best baseline       0.691 (0.689)       0.720 (0.719)       0.741 (0.739)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.616 (0.605)       0.660 (0.648)       0.683 (0.671)
average of systems  0.499 (0.422)       0.590 (0.480)       0.621 (0.502)
worst system        0.418 (0.127)       0.511 (0.511)       0.538 (0.538)
best baseline       0.550 (0.548)       0.584 (0.582)       0.600 (0.597)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.781 (0.781)       0.804 (0.804)       0.818 (0.818)
average of systems  0.721 (0.604)       0.757 (0.629)       0.773 (0.641)
worst system        0.653 (0.209)       0.733 (0.234)       0.751 (0.657)
best baseline       0.691 (0.689)       0.720 (0.719)       0.741 (0.739)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.711 (0.394)       0.746 (0.413)       0.767 (0.424)
average of systems  0.711 (0.394)       0.746 (0.413)       0.767 (0.424)
worst system        0.711 (0.394)       0.746 (0.413)       0.767 (0.424)


Detailed results
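
The tables report precision with recall in parentheses. The scoring rules are not spelled out here, so the following is only a minimal sketch. It assumes precision is the share of correct answers among the items a system attempted, and recall is the share of correct answers among all items in the task. The function names and the toy data are hypothetical.

```python
def precision_recall(answers, key):
    """Score guesses against the answer key.

    answers: item id -> guessed sense tag, or None if the system abstained.
    key:     item id -> correct sense tag.
    """
    attempted = [i for i in key if answers.get(i) is not None]
    correct = sum(1 for i in attempted if answers[i] == key[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(key) if key else 0.0
    return precision, recall


def coverage(precision, recall):
    """Share of items attempted; equals recall / precision under the assumption above."""
    return recall / precision


# Hypothetical toy data: four items, one abstention, one wrong guess.
key = {1: "a", 2: "b", 3: "a", 4: "c"}
answers = {1: "a", 2: "b", 3: None, 4: "a"}
p, r = precision_recall(answers, key)  # p = 2/3, r = 1/2

# Applied to the fine-grained O systems row of the main task, 0.711 (0.394):
attempted_share = coverage(0.711, 0.394)  # roughly 0.55
```

Under this reading, a row whose recall is far below its precision would indicate a system that answered only part of the task, while near-equal figures would indicate a system that attempted almost every item.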



trainable

The subset of test items in files of words for which corpus training data was supplied.

number of items in task: 7446

                    all senses          main senses only
average polysemy:   10.788              7.432

                    fine-grained        coarse-grained
average entropy:    1.962               1.551

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.963 (0.962)       0.967 (0.966)       0.969 (0.967)
best system         0.771 (0.771)       0.797 (0.796)       0.812 (0.812)
average of systems  0.644 (0.546)       0.694 (0.581)       0.719 (0.599)
worst system        0.411 (0.113)       0.502 (0.502)       0.533 (0.533)
best baseline       0.709 (0.708)       0.735 (0.734)       0.759 (0.758)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.631 (0.621)       0.665 (0.654)       0.691 (0.680)
average of systems  0.497 (0.418)       0.581 (0.471)       0.617 (0.496)
worst system        0.411 (0.113)       0.502 (0.502)       0.533 (0.533)
best baseline       0.549 (0.547)       0.582 (0.579)       0.599 (0.596)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.771 (0.771)       0.797 (0.796)       0.812 (0.812)
average of systems  0.731 (0.648)       0.761 (0.673)       0.778 (0.687)
worst system        0.701 (0.701)       0.734 (0.729)       0.751 (0.745)
best baseline       0.709 (0.708)       0.735 (0.734)       0.759 (0.758)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.711 (0.447)       0.746 (0.469)       0.767 (0.482)
average of systems  0.711 (0.447)       0.746 (0.469)       0.767 (0.482)
worst system        0.711 (0.447)       0.746 (0.469)       0.767 (0.482)


Detailed results
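
Each task summary lists an average polysemy and an average entropy, but the page does not define them. The sketch below shows one plausible per-word calculation. It assumes polysemy means the number of distinct senses available for a word, and entropy means the Shannon entropy, in bits, of the sense-tag distribution in the key. Both definitions are assumptions, and the function names and sample tags are hypothetical.

```python
import math
from collections import Counter


def polysemy(tags):
    """Number of distinct sense tags observed for one word (assumed definition)."""
    return len(set(tags))


def sense_entropy(tags):
    """Shannon entropy in bits of the sense-tag distribution (assumed definition)."""
    counts = Counter(tags)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical key tags for one word: one dominant sense and two rarer ones.
tags = ["s1", "s1", "s2", "s3"]
n_senses = polysemy(tags)   # 3
h = sense_entropy(tags)     # 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits
```

On this reading, a word whose occurrences are spread evenly over many senses has high entropy, while a word dominated by a single sense has entropy near zero even if its polysemy is high.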



untrainable

The complement of trainable.

number of items in task: 1002

                    all senses          main senses only
average polysemy:   7.276               5.533

                    fine-grained        coarse-grained
average entropy:    1.568               1.220

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.973 (0.973)       0.974 (0.974)       0.975 (0.975)
best system         0.853 (0.853)       0.860 (0.860)       0.861 (0.861)
average of systems  0.555 (0.491)       0.674 (0.573)       0.675 (0.574)
worst system        0.440 (0.440)       0.574 (0.574)       0.576 (0.576)
best baseline       0.626 (0.626)       0.709 (0.709)       0.711 (0.711)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.636 (0.636)       0.700 (0.358)       0.700 (0.358)
average of systems  0.505 (0.447)       0.641 (0.550)       0.643 (0.551)
worst system        0.440 (0.440)       0.574 (0.574)       0.576 (0.576)
best baseline       0.626 (0.626)       0.709 (0.709)       0.711 (0.711)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.853 (0.853)       0.860 (0.860)       0.861 (0.861)
average of systems  0.622 (0.550)       0.718 (0.604)       0.719 (0.605)
worst system        0.444 (0.228)       0.594 (0.594)       0.596 (0.596)
best baseline       0.556 (0.553)       0.604 (0.602)       0.606 (0.603)


Detailed results



multi-word

The subset of test items tagged with sense tags for word forms that are not derivable from the root form of a test word by a regular morphological process.

number of items in task: 800

                    all senses          main senses only
average polysemy:   16.862              12.594

                    fine-grained        coarse-grained
average entropy:    2.499               2.013

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.984 (0.982)       0.986 (0.984)       0.986 (0.984)
best system         0.907 (0.906)       0.916 (0.914)       0.927 (0.926)
average of systems  0.677 (0.573)       0.710 (0.599)       0.744 (0.629)
worst system        0.378 (0.378)       0.420 (0.420)       0.505 (0.235)
best baseline       0.815 (0.658)       0.832 (0.672)       0.861 (0.696)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.703 (0.702)       0.757 (0.756)       0.798 (0.798)
average of systems  0.545 (0.481)       0.589 (0.519)       0.652 (0.578)
worst system        0.378 (0.378)       0.420 (0.420)       0.505 (0.235)
best baseline       0.801 (0.800)       0.823 (0.822)       0.843 (0.842)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.907 (0.906)       0.916 (0.914)       0.927 (0.926)
average of systems  0.747 (0.665)       0.772 (0.684)       0.788 (0.698)
worst system        0.485 (0.228)       0.527 (0.247)       0.541 (0.254)
best baseline       0.815 (0.658)       0.832 (0.672)       0.861 (0.696)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.778 (0.385)       0.826 (0.409)       0.846 (0.419)
average of systems  0.778 (0.385)       0.826 (0.409)       0.846 (0.419)
worst system        0.778 (0.385)       0.826 (0.409)       0.846 (0.419)


Detailed results



simple-word

The complement of multi-word.

number of items in task: 7648

                    all senses          main senses only
average polysemy:   9.693               6.644

                    fine-grained        coarse-grained
average entropy:    1.855               1.460

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.963 (0.961)       0.966 (0.965)       0.968 (0.966)
best system         0.768 (0.768)       0.792 (0.792)       0.806 (0.806)
average of systems  0.636 (0.513)       0.696 (0.551)       0.717 (0.565)
worst system        0.414 (0.119)       0.520 (0.520)       0.540 (0.540)
best baseline       0.678 (0.677)       0.709 (0.708)       0.730 (0.729)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.611 (0.601)       0.656 (0.644)       0.679 (0.195)
average of systems  0.493 (0.415)       0.592 (0.476)       0.620 (0.495)
worst system        0.414 (0.119)       0.520 (0.520)       0.540 (0.540)
best baseline       0.524 (0.522)       0.564 (0.564)       0.605 (0.605)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.768 (0.768)       0.792 (0.792)       0.806 (0.806)
average of systems  0.720 (0.597)       0.758 (0.623)       0.774 (0.635)
worst system        0.680 (0.207)       0.727 (0.639)       0.739 (0.650)
best baseline       0.678 (0.677)       0.709 (0.708)       0.730 (0.729)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.705 (0.395)       0.739 (0.413)       0.759 (0.425)
average of systems  0.705 (0.395)       0.739 (0.413)       0.759 (0.425)
worst system        0.705 (0.395)       0.739 (0.413)       0.759 (0.425)


Detailed results
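
Since untrainable is defined as the complement of trainable, and simple-word as the complement of multi-word, each pair should add up to the item count of the main task. A small sanity check using the counts reported above (the helper name is hypothetical):

```python
# Item counts as reported for each task on this page.
counts = {
    "eval": 8448,
    "trainable": 7446,
    "untrainable": 1002,
    "multi-word": 800,
    "simple-word": 7648,
}


def is_partition(whole, *parts):
    """True when the item counts of the parts sum to the count of the whole."""
    return counts[whole] == sum(counts[p] for p in parts)


trainable_split_ok = is_partition("eval", "trainable", "untrainable")
word_split_ok = is_partition("eval", "multi-word", "simple-word")
```

Both checks hold for the figures given: 7446 + 1002 and 800 + 7648 each equal 8448.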



unassignable

The subset of test items tagged with an UNASSIGNABLE sense tag in the key.

number of items in task: 35

                    all senses          main senses only
average polysemy:   11.771              8.086

                    fine-grained        coarse-grained
average entropy:    2.415               1.946

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.938 (0.857)       0.938 (0.857)       0.938 (0.857)
best system         0.500 (0.086)       0.667 (0.114)       0.667 (0.114)
average of systems  0.335 (0.239)       0.398 (0.275)       0.407 (0.284)
worst system        0.229 (0.229)       0.243 (0.243)       0.257 (0.257)
best baseline       0.343 (0.343)       0.371 (0.371)       0.371 (0.371)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.333 (0.057)       0.500 (0.086)       0.500 (0.086)
average of systems  0.280 (0.208)       0.365 (0.257)       0.373 (0.265)
worst system        0.229 (0.229)       0.243 (0.243)       0.257 (0.257)
best baseline       0.343 (0.343)       0.371 (0.371)       0.371 (0.371)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.500 (0.086)       0.667 (0.114)       0.667 (0.114)
average of systems  0.374 (0.272)       0.434 (0.305)       0.444 (0.314)
worst system        0.286 (0.229)       0.329 (0.329)       0.343 (0.343)
best baseline       0.200 (0.200)       0.250 (0.186)       0.269 (0.200)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.316 (0.171)       0.316 (0.171)       0.316 (0.171)
average of systems  0.316 (0.171)       0.316 (0.171)       0.316 (0.171)
worst system        0.316 (0.171)       0.316 (0.171)       0.316 (0.171)


Detailed results



proper

The subset of test items tagged with a PROPER NOUN sense tag in the key.

number of items in task: 286

                    all senses          main senses only
average polysemy:   12.014              8.899

                    fine-grained        coarse-grained
average entropy:    2.035               1.671

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.991 (0.981)       0.993 (0.983)       0.993 (0.983)
best system         0.937 (0.937)       0.937 (0.937)       0.937 (0.937)
average of systems  0.591 (0.351)       0.660 (0.395)       0.665 (0.397)
worst system        0.223 (0.223)       0.282 (0.282)       0.286 (0.286)
best baseline       0.756 (0.325)       0.756 (0.325)       0.756 (0.325)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.394 (0.113)       0.691 (0.664)       0.695 (0.668)
average of systems  0.347 (0.273)       0.488 (0.378)       0.494 (0.383)
worst system        0.223 (0.223)       0.282 (0.282)       0.286 (0.286)
best baseline       0.556 (0.556)       0.633 (0.633)       0.643 (0.643)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.937 (0.937)       0.937 (0.937)       0.937 (0.937)
average of systems  0.751 (0.447)       0.785 (0.458)       0.785 (0.458)
worst system        0.552 (0.552)       0.552 (0.552)       0.552 (0.552)
best baseline       0.756 (0.325)       0.756 (0.325)       0.756 (0.325)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.605 (0.091)       0.605 (0.091)       0.628 (0.094)
average of systems  0.605 (0.091)       0.605 (0.091)       0.628 (0.094)
worst system        0.605 (0.091)       0.605 (0.091)       0.628 (0.094)


Detailed results



nouns

All test items in files with -n suffix.

number of items in task: 2756

                    all senses          main senses only
average polysemy:   9.167               5.381

                    fine-grained        coarse-grained
average entropy:    1.740               1.167

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.973 (0.973)       0.975 (0.975)       0.977 (0.977)
best system         0.865 (0.865)       0.891 (0.891)       0.919 (0.919)
average of systems  0.699 (0.635)       0.766 (0.697)       0.801 (0.730)
worst system        0.418 (0.388)       0.562 (0.562)       0.629 (0.629)
best baseline       0.738 (0.569)       0.815 (0.629)       0.879 (0.679)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.711 (0.696)       0.753 (0.753)       0.803 (0.803)
average of systems  0.562 (0.550)       0.668 (0.653)       0.718 (0.702)
worst system        0.418 (0.388)       0.562 (0.562)       0.629 (0.629)
best baseline       0.628 (0.625)       0.675 (0.672)       0.731 (0.731)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.865 (0.865)       0.891 (0.891)       0.919 (0.919)
average of systems  0.777 (0.694)       0.822 (0.735)       0.848 (0.758)
worst system        0.653 (0.639)       0.733 (0.718)       0.755 (0.739)
best baseline       0.738 (0.569)       0.815 (0.629)       0.879 (0.679)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.779 (0.617)       0.820 (0.649)       0.850 (0.673)
average of systems  0.779 (0.617)       0.820 (0.649)       0.850 (0.673)
worst system        0.779 (0.617)       0.820 (0.649)       0.850 (0.673)


Detailed results



all-nouns

All test items in nouns, plus test items in files with -p suffix that were tagged with noun sense tags in the key.

number of items in task: 3792

                    all senses          main senses only
average polysemy:   10.949              7.445

                    fine-grained        coarse-grained
average entropy:    1.832               1.363

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.976 (0.976)       0.978 (0.978)       0.980 (0.980)
best system         0.845 (0.845)       0.865 (0.865)       0.885 (0.885)
average of systems  0.686 (0.575)       0.746 (0.624)       0.774 (0.647)
worst system        0.418 (0.282)       0.518 (0.518)       0.567 (0.567)
best baseline       0.746 (0.558)       0.804 (0.602)       0.852 (0.638)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.678 (0.666)       0.732 (0.718)       0.756 (0.742)
average of systems  0.544 (0.507)       0.642 (0.589)       0.682 (0.625)
worst system        0.418 (0.282)       0.518 (0.518)       0.567 (0.567)
best baseline       0.564 (0.561)       0.604 (0.601)       0.642 (0.642)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.845 (0.845)       0.865 (0.865)       0.885 (0.885)
average of systems  0.765 (0.642)       0.803 (0.672)       0.824 (0.689)
worst system        0.653 (0.465)       0.733 (0.522)       0.755 (0.537)
best baseline       0.746 (0.558)       0.804 (0.602)       0.852 (0.638)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.779 (0.448)       0.820 (0.472)       0.850 (0.489)
average of systems  0.779 (0.448)       0.820 (0.472)       0.850 (0.489)
worst system        0.779 (0.448)       0.820 (0.472)       0.850 (0.489)


Detailed results



verbs

All test items in files with -v suffix.

number of items in task: 2501

                    all senses          main senses only
average polysemy:   7.791               4.994

                    fine-grained        coarse-grained
average entropy:    1.859               1.496

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.950 (0.947)       0.955 (0.952)       0.957 (0.954)
best system         0.709 (0.709)       0.742 (0.741)       0.755 (0.755)
average of systems  0.611 (0.610)       0.653 (0.652)       0.668 (0.666)
worst system        0.422 (0.421)       0.474 (0.473)       0.485 (0.485)
best baseline       0.701 (0.700)       0.727 (0.725)       0.746 (0.744)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.535 (0.527)       0.578 (0.569)       0.596 (0.587)
average of systems  0.471 (0.468)       0.535 (0.532)       0.547 (0.544)
worst system        0.422 (0.421)       0.474 (0.473)       0.485 (0.485)
best baseline       0.547 (0.545)       0.582 (0.579)       0.592 (0.589)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.709 (0.709)       0.742 (0.741)       0.755 (0.755)
average of systems  0.687 (0.686)       0.719 (0.718)       0.735 (0.734)
worst system        0.642 (0.642)       0.683 (0.682)       0.695 (0.695)
best baseline       0.701 (0.700)       0.727 (0.725)       0.746 (0.744)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.651 (0.650)       0.681 (0.680)       0.694 (0.692)
average of systems  0.651 (0.650)       0.681 (0.680)       0.694 (0.692)
worst system        0.651 (0.650)       0.681 (0.680)       0.694 (0.692)


Detailed results



all-verbs

All test items in verbs, plus test items in files with -p suffix that were tagged with verb sense tags in the key.

number of items in task: 2907

                    all senses          main senses only
average polysemy:   10.821              7.733

                    fine-grained        coarse-grained
average entropy:    2.056               1.723

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.954 (0.951)       0.958 (0.956)       0.960 (0.957)
best system         0.720 (0.720)       0.748 (0.748)       0.761 (0.761)
average of systems  0.617 (0.605)       0.656 (0.644)       0.670 (0.657)
worst system        0.428 (0.428)       0.475 (0.474)       0.486 (0.485)
best baseline       0.676 (0.675)       0.699 (0.697)       0.717 (0.715)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.555 (0.546)       0.596 (0.585)       0.613 (0.602)
average of systems  0.479 (0.476)       0.539 (0.535)       0.550 (0.546)
worst system        0.428 (0.428)       0.475 (0.474)       0.486 (0.485)
best baseline       0.541 (0.539)       0.574 (0.572)       0.583 (0.581)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.720 (0.720)       0.748 (0.748)       0.761 (0.761)
average of systems  0.693 (0.692)       0.722 (0.721)       0.737 (0.736)
worst system        0.646 (0.645)       0.682 (0.681)       0.692 (0.692)
best baseline       0.676 (0.675)       0.699 (0.697)       0.717 (0.715)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.651 (0.559)       0.681 (0.585)       0.694 (0.595)
average of systems  0.651 (0.559)       0.681 (0.585)       0.694 (0.595)
worst system        0.651 (0.559)       0.681 (0.585)       0.694 (0.595)


Detailed results



adjectives

All test items in files with -a suffix.

number of items in task: 1406

                    all senses          main senses only
average polysemy:   6.760               4.576

                    fine-grained        coarse-grained
average entropy:    1.658               1.236

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.966 (0.965)       0.972 (0.972)       0.973 (0.973)
best system         0.777 (0.777)       0.793 (0.793)       0.795 (0.795)
average of systems  0.644 (0.615)       0.682 (0.651)       0.694 (0.663)
worst system        0.377 (0.377)       0.476 (0.476)       0.498 (0.498)
best baseline       0.718 (0.717)       0.737 (0.735)       0.740 (0.738)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.617 (0.605)       0.652 (0.640)       0.691 (0.679)
average of systems  0.504 (0.489)       0.560 (0.544)       0.586 (0.569)
worst system        0.377 (0.377)       0.476 (0.476)       0.498 (0.498)
best baseline       0.681 (0.681)       0.694 (0.694)       0.709 (0.709)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.777 (0.777)       0.793 (0.793)       0.795 (0.795)
average of systems  0.728 (0.691)       0.755 (0.716)       0.759 (0.720)
worst system        0.674 (0.616)       0.722 (0.658)       0.725 (0.661)
best baseline       0.718 (0.717)       0.737 (0.735)       0.740 (0.738)


Detailed results



all-adjectives

All test items in adjectives, plus test items in files with -p suffix that were tagged with adjective sense tags in the key.

number of items in task: 1750

                    all senses          main senses only
average polysemy:   8.430               5.868

                    fine-grained        coarse-grained
average entropy:    1.867               1.490

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.957 (0.956)       0.962 (0.962)       0.963 (0.963)
best system         0.751 (0.751)       0.764 (0.764)       0.766 (0.766)
average of systems  0.619 (0.596)       0.650 (0.626)       0.660 (0.636)
worst system        0.354 (0.354)       0.436 (0.436)       0.454 (0.454)
best baseline       0.688 (0.686)       0.703 (0.701)       0.705 (0.704)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.583 (0.571)       0.611 (0.598)       0.643 (0.630)
average of systems  0.470 (0.457)       0.515 (0.502)       0.537 (0.523)
worst system        0.354 (0.354)       0.436 (0.436)       0.454 (0.454)
best baseline       0.615 (0.615)       0.627 (0.627)       0.629 (0.629)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.751 (0.751)       0.764 (0.764)       0.766 (0.766)
average of systems  0.709 (0.680)       0.731 (0.700)       0.734 (0.703)
worst system        0.669 (0.621)       0.686 (0.637)       0.688 (0.639)
best baseline       0.688 (0.686)       0.703 (0.701)       0.705 (0.704)


Detailed results



indeterminates

All test items in files with -p suffix; the part of speech of the word to be disambiguated has not been predetermined.

number of items in task: 1785

                    all senses          main senses only
average polysemy:   18.692              15.199

                    fine-grained        coarse-grained
average entropy:    2.468               2.284

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.970 (0.969)       0.972 (0.970)       0.973 (0.971)
best system         0.770 (0.630)       0.779 (0.638)       0.782 (0.640)
average of systems  0.630 (0.578)       0.644 (0.592)       0.646 (0.594)
worst system        0.382 (0.382)       0.397 (0.397)       0.399 (0.399)
best baseline       0.656 (0.531)       0.658 (0.533)       0.661 (0.535)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.584 (0.573)       0.640 (0.628)       0.642 (0.630)
average of systems  0.489 (0.486)       0.517 (0.513)       0.520 (0.516)
worst system        0.382 (0.382)       0.397 (0.397)       0.399 (0.399)
best baseline       0.425 (0.424)       0.444 (0.443)       0.445 (0.444)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.770 (0.630)       0.779 (0.638)       0.782 (0.640)
average of systems  0.715 (0.634)       0.720 (0.639)       0.722 (0.641)
worst system        0.646 (0.646)       0.648 (0.648)       0.650 (0.650)
best baseline       0.656 (0.531)       0.658 (0.533)       0.661 (0.535)


Detailed results



determinates

All test items for ambiguous words with a predetermined part of speech; the complement of indeterminates.

number of items in task: 6663

                    all senses          main senses only
average polysemy:   8.143               5.066

                    fine-grained        coarse-grained
average entropy:    1.768               1.305

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.963 (0.962)       0.967 (0.966)       0.969 (0.967)
best system         0.787 (0.787)       0.814 (0.814)       0.831 (0.831)
average of systems  0.645 (0.544)       0.706 (0.589)       0.730 (0.608)
worst system        0.418 (0.161)       0.541 (0.541)       0.575 (0.575)
best baseline       0.720 (0.719)       0.753 (0.752)       0.779 (0.778)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.625 (0.613)       0.665 (0.653)       0.694 (0.682)
average of systems  0.506 (0.437)       0.604 (0.506)       0.640 (0.533)
worst system        0.418 (0.161)       0.541 (0.541)       0.575 (0.575)
best baseline       0.584 (0.581)       0.622 (0.619)       0.641 (0.638)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.787 (0.787)       0.814 (0.814)       0.831 (0.831)
average of systems  0.726 (0.623)       0.767 (0.655)       0.785 (0.670)
worst system        0.653 (0.264)       0.733 (0.297)       0.755 (0.306)
best baseline       0.720 (0.719)       0.753 (0.752)       0.779 (0.778)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.711 (0.499)       0.746 (0.524)       0.767 (0.538)
average of systems  0.711 (0.499)       0.746 (0.524)       0.767 (0.538)
worst system        0.711 (0.499)       0.746 (0.524)       0.767 (0.538)


Detailed results
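
The part-of-speech tasks are defined by file suffix (-n, -v, -a, -p), so every test file should fall into exactly one of nouns, verbs, adjectives or indeterminates. The sketch below maps a file name to its task and checks that the reported counts are consistent with that reading. The helper name and the example file name "shake-p" are hypothetical.

```python
# Suffix-to-task mapping taken from the task descriptions on this page.
SUFFIX_TO_TASK = {
    "-n": "nouns",
    "-v": "verbs",
    "-a": "adjectives",
    "-p": "indeterminates",
}


def task_for_file(filename):
    """Return the part-of-speech task a test file belongs to, based on its suffix."""
    for suffix, task in SUFFIX_TO_TASK.items():
        if filename.endswith(suffix):
            return task
    raise ValueError(f"unrecognised suffix in {filename!r}")


# Item counts as reported for each task.
reported = {
    "nouns": 2756,
    "verbs": 2501,
    "adjectives": 1406,
    "indeterminates": 1785,
    "determinates": 6663,
}

determinate_total = reported["nouns"] + reported["verbs"] + reported["adjectives"]
grand_total = determinate_total + reported["indeterminates"]
example_task = task_for_file("shake-p")  # hypothetical file name
```

With the reported figures, the three predetermined parts of speech sum to 6663, the size of determinates, and adding indeterminates gives 8448, the size of the main task.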



trainable-nouns

The intersection of nouns and trainable.

number of items in task: 2199

                    all senses          main senses only
average polysemy:   10.067              5.675

                    fine-grained        coarse-grained
average entropy:    1.892               1.271

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.976 (0.975)       0.978 (0.978)       0.980 (0.980)
best system         0.849 (0.849)       0.879 (0.879)       0.914 (0.914)
average of systems  0.696 (0.690)       0.757 (0.750)       0.798 (0.791)
worst system        0.392 (0.392)       0.499 (0.499)       0.582 (0.582)
best baseline       0.751 (0.751)       0.815 (0.788)       0.879 (0.850)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.711 (0.698)       0.757 (0.744)       0.798 (0.784)
average of systems  0.543 (0.533)       0.642 (0.629)       0.704 (0.690)
worst system        0.392 (0.392)       0.499 (0.499)       0.582 (0.582)
best baseline       0.615 (0.612)       0.659 (0.656)       0.706 (0.706)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.849 (0.849)       0.879 (0.879)       0.914 (0.914)
average of systems  0.784 (0.782)       0.822 (0.820)       0.851 (0.849)
worst system        0.702 (0.698)       0.740 (0.736)       0.767 (0.763)
best baseline       0.751 (0.751)       0.815 (0.788)       0.879 (0.850)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.779 (0.773)       0.820 (0.813)       0.850 (0.843)
average of systems  0.779 (0.773)       0.820 (0.813)       0.850 (0.843)
worst system        0.779 (0.773)       0.820 (0.813)       0.850 (0.843)


Detailed results



all-trainable-nouns

The intersection of all-nouns and trainable.

number of items in task: 2914

                    all senses          main senses only
average polysemy:   11.963              8.000

                    fine-grained        coarse-grained
average entropy:    1.898               1.406

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.978 (0.977)       0.980 (0.980)       0.982 (0.981)
best system         0.847 (0.847)       0.870 (0.870)       0.896 (0.896)
average of systems  0.703 (0.656)       0.755 (0.702)       0.789 (0.732)
worst system        0.411 (0.289)       0.498 (0.498)       0.560 (0.560)
best baseline       0.754 (0.754)       0.804 (0.783)       0.852 (0.830)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.736 (0.725)       0.772 (0.759)       0.803 (0.790)
average of systems  0.561 (0.527)       0.647 (0.600)       0.697 (0.646)
worst system        0.411 (0.289)       0.498 (0.498)       0.560 (0.560)
best baseline       0.571 (0.568)       0.607 (0.604)       0.639 (0.636)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.847 (0.847)       0.870 (0.870)       0.896 (0.896)
average of systems  0.784 (0.753)       0.816 (0.783)       0.839 (0.805)
worst system        0.702 (0.527)       0.740 (0.556)       0.767 (0.576)
best baseline       0.754 (0.754)       0.804 (0.783)       0.852 (0.830)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.779 (0.584)       0.820 (0.614)       0.850 (0.636)
average of systems  0.779 (0.584)       0.820 (0.614)       0.850 (0.636)
worst system        0.779 (0.584)       0.820 (0.614)       0.850 (0.636)


Detailed results



untrainable-nouns

The intersection of nouns and untrainable.

number of items in task: 557

                    all senses          main senses only
average polysemy:   5.616               4.219

                    fine-grained        coarse-grained
average entropy:    1.140               0.755

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.964 (0.964)       0.965 (0.965)       0.966 (0.966)
best system         0.932 (0.932)       0.938 (0.938)       0.941 (0.941)
average of systems  0.669 (0.656)       0.791 (0.772)       0.794 (0.774)
worst system        0.444 (0.409)       0.700 (0.645)       0.700 (0.645)
best baseline       0.756 (0.756)       0.828 (0.828)       0.831 (0.831)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.795 (0.795)       0.838 (0.838)       0.842 (0.842)
average of systems  0.635 (0.619)       0.771 (0.750)       0.773 (0.752)
worst system        0.444 (0.409)       0.700 (0.645)       0.700 (0.645)
best baseline       0.756 (0.756)       0.828 (0.828)       0.831 (0.831)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.932 (0.932)       0.938 (0.938)       0.941 (0.941)
average of systems  0.716 (0.704)       0.819 (0.801)       0.821 (0.803)
worst system        0.444 (0.409)       0.700 (0.645)       0.700 (0.645)
best baseline       0.681 (0.680)       0.743 (0.741)       0.746 (0.744)


Detailed results



all-untrainable-nouns

The intersection of all-nouns and untrainable.

number of items in task: 878

                    all senses          main senses only
average polysemy:   7.584               5.601

                    fine-grained        coarse-grained
average entropy:    1.614               1.217

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.972 (0.972)       0.973 (0.973)       0.973 (0.973)
best system         0.842 (0.842)       0.848 (0.848)       0.850 (0.850)
average of systems  0.540 (0.486)       0.660 (0.574)       0.662 (0.576)
worst system        0.443 (0.443)       0.558 (0.558)       0.560 (0.560)
best baseline       0.603 (0.603)       0.697 (0.697)       0.699 (0.699)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.597 (0.597)       0.700 (0.409)       0.700 (0.409)
average of systems  0.492 (0.442)       0.629 (0.551)       0.630 (0.553)
worst system        0.443 (0.443)       0.585 (0.585)       0.587 (0.587)
best baseline       0.603 (0.603)       0.697 (0.697)       0.699 (0.699)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.842 (0.842)       0.848 (0.848)       0.850 (0.850)
average of systems  0.605 (0.543)       0.702 (0.605)       0.703 (0.606)
worst system        0.444 (0.260)       0.558 (0.558)       0.560 (0.560)
best baseline       0.535 (0.534)       0.591 (0.589)       0.592 (0.591)


Detailed results
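
The noun subtasks are built by intersecting a part-of-speech task with trainable or untrainable. As a rough illustration of that set algebra, the sketch below uses Python sets over a handful of hypothetical item ids, then checks the reported counts: the trainable and untrainable parts of a task should add up to the task itself.

```python
# Hypothetical item ids, for illustration only.
eval_items = {1, 2, 3, 4, 5, 6, 7, 8}
nouns = {1, 2, 3, 4, 5}
trainable = {1, 2, 3, 6, 7}

# untrainable is the complement of trainable within the main task.
untrainable = eval_items - trainable

trainable_nouns = nouns & trainable      # {1, 2, 3}
untrainable_nouns = nouns & untrainable  # {4, 5}

# The two intersections are disjoint and together rebuild the nouns set.
rebuilt = trainable_nouns | untrainable_nouns

# The same identity applied to the item counts reported on this page.
nouns_total = 2199 + 557       # trainable-nouns + untrainable-nouns
all_nouns_total = 2914 + 878   # all-trainable-nouns + all-untrainable-nouns
```

The reported counts agree with this reading: 2199 + 557 = 2756 (nouns) and 2914 + 878 = 3792 (all-nouns).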



trainable-adjectives

The intersection of adjectives and trainable.

number of items in task: 1284

                    all senses          main senses only
average polysemy:   6.927               4.536

                    fine-grained        coarse-grained
average entropy:    1.700               1.237

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.964 (0.963)       0.971 (0.970)       0.972 (0.972)
best system         0.761 (0.761)       0.779 (0.779)       0.783 (0.782)
average of systems  0.635 (0.629)       0.672 (0.666)       0.685 (0.679)
worst system        0.373 (0.373)       0.474 (0.474)       0.499 (0.499)
best baseline       0.720 (0.719)       0.740 (0.739)       0.743 (0.743)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.614 (0.607)       0.636 (0.630)       0.680 (0.672)
average of systems  0.488 (0.474)       0.541 (0.527)       0.570 (0.554)
worst system        0.373 (0.373)       0.474 (0.474)       0.499 (0.499)
best baseline       0.660 (0.660)       0.677 (0.677)       0.700 (0.700)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.761 (0.761)       0.779 (0.779)       0.783 (0.782)
average of systems  0.723 (0.722)       0.750 (0.750)       0.754 (0.754)
worst system        0.674 (0.674)       0.722 (0.720)       0.725 (0.724)
best baseline       0.720 (0.719)       0.740 (0.739)       0.743 (0.743)


Detailed results



all-trainable-adjectives

The intersection of all-adjectives and trainable.

number of items in task: 1628

                    all senses          main senses only
average polysemy:   8.687               5.933

                    fine-grained        coarse-grained
average entropy:    1.915               1.510

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.955 (0.954)       0.961 (0.960)       0.962 (0.961)
best system         0.736 (0.736)       0.751 (0.751)       0.753 (0.752)
average of systems  0.610 (0.606)       0.640 (0.636)       0.651 (0.646)
worst system        0.349 (0.349)       0.431 (0.431)       0.451 (0.451)
best baseline       0.686 (0.685)       0.703 (0.702)       0.706 (0.705)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.578 (0.570)       0.596 (0.587)       0.630 (0.621)
average of systems  0.454 (0.443)       0.497 (0.485)       0.520 (0.508)
worst system        0.349 (0.349)       0.431 (0.431)       0.451 (0.451)
best baseline       0.594 (0.594)       0.606 (0.606)       0.609 (0.609)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.736 (0.736)       0.751 (0.751)       0.753 (0.752)
average of systems  0.704 (0.704)       0.726 (0.726)       0.729 (0.729)
worst system        0.669 (0.668)       0.686 (0.684)       0.688 (0.687)
best baseline       0.686 (0.685)       0.703 (0.702)       0.706 (0.705)


Detailed results



all-untrainable-adjectives

The intersection of all-adjectives and untrainable.

number of items in task: 122

                    all senses          main senses only
average polysemy:   5.000               5.000

                    fine-grained        coarse-grained
average entropy:    1.224               1.224

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.985 (0.985)       0.985 (0.985)       0.985 (0.985)
best system         0.943 (0.943)       0.943 (0.943)       0.943 (0.943)
average of systems  0.759 (0.747)       0.811 (0.794)       0.811 (0.794)
worst system        0.422 (0.422)       0.496 (0.496)       0.496 (0.496)
best baseline       0.902 (0.902)       0.902 (0.902)       0.902 (0.902)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.926 (0.926)       0.926 (0.926)       0.926 (0.926)
average of systems  0.664 (0.643)       0.750 (0.723)       0.750 (0.723)
worst system        0.422 (0.422)       0.496 (0.496)       0.496 (0.496)
best baseline       0.902 (0.902)       0.902 (0.902)       0.902 (0.902)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.943 (0.943)       0.943 (0.943)       0.943 (0.943)
average of systems  0.902 (0.902)       0.902 (0.902)       0.902 (0.902)
worst system        0.861 (0.861)       0.861 (0.861)       0.861 (0.861)
best baseline       0.704 (0.693)       0.704 (0.693)       0.704 (0.693)


Detailed results



trainable-indeterminates

The intersection of indeterminates and trainable.

number of items in task: 1462

                    all senses          main senses only
average polysemy:   20.392              16.789

                    fine-grained        coarse-grained
average entropy:    2.475               2.343

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.967 (0.965)       0.969 (0.967)       0.969 (0.967)
best system         0.776 (0.776)       0.782 (0.782)       0.784 (0.784)
average of systems  0.673 (0.670)       0.681 (0.678)       0.683 (0.680)
worst system        0.424 (0.424)       0.442 (0.442)       0.445 (0.445)
best baseline       0.656 (0.649)       0.658 (0.651)       0.661 (0.654)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.692 (0.678)       0.699 (0.685)       0.701 (0.687)
average of systems  0.558 (0.553)       0.570 (0.565)       0.573 (0.568)
worst system        0.424 (0.424)       0.442 (0.442)       0.445 (0.445)
best baseline       0.452 (0.451)       0.466 (0.465)       0.467 (0.466)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.776 (0.776)       0.782 (0.782)       0.784 (0.784)
average of systems  0.742 (0.740)       0.748 (0.746)       0.750 (0.747)
worst system        0.677 (0.666)       0.681 (0.671)       0.684 (0.673)
best baseline       0.656 (0.649)       0.658 (0.651)       0.661 (0.654)


Detailed results



trainable-determinates

The intersection of determinates and trainable.

number of items in task: 5984


                    all senses          main senses only
average polysemy:   8.442               5.146


                    fine-grained        coarse-grained
average entropy:    1.837               1.358


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.962 (0.961)       0.967 (0.965)       0.969 (0.967)
best system         0.770 (0.770)       0.800 (0.800)       0.819 (0.819)
average of systems  0.642 (0.560)       0.697 (0.602)       0.724 (0.624)
worst system        0.411 (0.141)       0.517 (0.517)       0.554 (0.554)
best baseline       0.724 (0.723)       0.755 (0.754)       0.784 (0.783)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.617 (0.607)       0.656 (0.646)       0.688 (0.678)
average of systems  0.491 (0.419)       0.583 (0.483)       0.624 (0.513)
worst system        0.411 (0.141)       0.517 (0.517)       0.554 (0.554)
best baseline       0.573 (0.570)       0.610 (0.607)       0.631 (0.628)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.770 (0.770)       0.800 (0.800)       0.819 (0.819)
average of systems  0.730 (0.655)       0.764 (0.685)       0.785 (0.703)
worst system        0.696 (0.696)       0.740 (0.271)       0.758 (0.758)
best baseline       0.724 (0.723)       0.755 (0.754)       0.784 (0.783)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.711 (0.556)       0.746 (0.583)       0.767 (0.599)
average of systems  0.711 (0.556)       0.746 (0.583)       0.767 (0.599)
worst system        0.711 (0.556)       0.746 (0.583)       0.767 (0.599)


Detailed results



low-polysemy

Items involving words whose polysemy is less than the median polysemy of 8.
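
The low/high split can be illustrated with a short sketch. This is only an illustration, not the evaluation's own tooling: the function name `split_by_polysemy` is hypothetical, the word names and sense counts in the usage are illustrative, and using the "all senses" count as the polysemy figure is an assumption. Only the threshold of 8 and the strict "less than" rule come from the description above.

```python
MEDIAN_POLYSEMY = 8  # threshold quoted in the subtask description


def split_by_polysemy(polysemy, median=MEDIAN_POLYSEMY):
    """Partition words into (low, high) lists around the median polysemy.

    `polysemy` maps a word identifier to its number of senses.
    Words strictly below the median go to `low`; the rest go to `high`.
    """
    low = sorted(w for w, n in polysemy.items() if n < median)
    high = sorted(w for w, n in polysemy.items() if n >= median)
    return low, high


# Illustrative input only (hypothetical counts for four words).
example = {"behaviour-n": 3, "onion-n": 4, "accident-n": 8, "knee-n": 22}
low, high = split_by_polysemy(example)
print("low-polysemy:", low)
print("high-polysemy:", high)
```

A word whose polysemy equals the median falls in the high-polysemy group, which matches the "equal to or greater than" wording used for that subtask.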

number of items in task: 3771


                    all senses          main senses only
average polysemy:   5.069               3.682


                    fine-grained        coarse-grained
average entropy:    1.281               1.011


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.971 (0.970)       0.974 (0.973)       0.975 (0.973)
best system         0.837 (0.836)       0.855 (0.855)       0.863 (0.863)
average of systems  0.704 (0.562)       0.749 (0.593)       0.761 (0.602)
worst system        0.486 (0.116)       0.624 (0.150)       0.642 (0.642)
best baseline       0.780 (0.678)       0.798 (0.694)       0.810 (0.704)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.677 (0.667)       0.710 (0.700)       0.735 (0.724)
average of systems  0.573 (0.478)       0.651 (0.530)       0.669 (0.544)
worst system        0.486 (0.116)       0.624 (0.150)       0.642 (0.642)
best baseline       0.675 (0.675)       0.706 (0.706)       0.731 (0.730)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.837 (0.836)       0.855 (0.855)       0.863 (0.863)
average of systems  0.782 (0.641)       0.808 (0.660)       0.816 (0.667)
worst system        0.737 (0.177)       0.771 (0.185)       0.775 (0.186)
best baseline       0.780 (0.678)       0.798 (0.694)       0.810 (0.704)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.762 (0.422)       0.790 (0.437)       0.803 (0.444)
average of systems  0.762 (0.422)       0.790 (0.437)       0.803 (0.444)
worst system        0.762 (0.422)       0.790 (0.437)       0.803 (0.444)


Detailed results



high-polysemy

Items involving words whose polysemy is equal to or greater than the median polysemy of 8.

number of items in task: 4677


                    all senses          main senses only
average polysemy:   14.648              10.050


                    fine-grained        coarse-grained
average entropy:    2.428               1.916


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.959 (0.957)       0.963 (0.962)       0.965 (0.964)
best system         0.736 (0.736)       0.763 (0.763)       0.782 (0.782)
average of systems  0.591 (0.483)       0.654 (0.525)       0.682 (0.546)
worst system        0.328 (0.328)       0.416 (0.416)       0.454 (0.454)
best baseline       0.624 (0.623)       0.658 (0.657)       0.689 (0.688)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.567 (0.554)       0.619 (0.605)       0.659 (0.234)
average of systems  0.443 (0.376)       0.542 (0.440)       0.582 (0.470)
worst system        0.328 (0.328)       0.416 (0.416)       0.454 (0.454)
best baseline       0.465 (0.462)       0.504 (0.501)       0.527 (0.524)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.736 (0.736)       0.763 (0.763)       0.782 (0.782)
average of systems  0.676 (0.574)       0.719 (0.604)       0.740 (0.621)
worst system        0.610 (0.234)       0.675 (0.592)       0.697 (0.612)
best baseline       0.624 (0.623)       0.658 (0.657)       0.689 (0.688)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.670 (0.371)       0.710 (0.393)       0.737 (0.408)
average of systems  0.670 (0.371)       0.710 (0.393)       0.737 (0.408)
worst system        0.670 (0.371)       0.710 (0.393)       0.737 (0.408)


Detailed results



low-entropy

Items involving words whose entropy is less than the median entropy of 1.85.
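
The page does not state how entropy is computed. A minimal sketch, assuming it is the Shannon entropy (in bits) of the distribution of sense tags over a word's test items, is given below. The function names `sense_entropy` and `entropy_band` are hypothetical; only the median of 1.85 and the "less than" rule come from the description above. Base 2 is an assumption, though it is consistent with the reported figures: a word with a single sense scores 0.000, and some reported values exceed what natural logarithms could yield for the stated number of senses.

```python
import math
from collections import Counter

MEDIAN_ENTROPY = 1.85  # threshold quoted in the subtask description


def sense_entropy(tags):
    """Shannon entropy in bits of a list of sense tags (assumed definition)."""
    counts = Counter(tags)
    total = sum(counts.values())
    # Sum p * log2(p) over the observed senses, then negate.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_band(tags, median=MEDIAN_ENTROPY):
    """Assign a word to the low- or high-entropy subtask."""
    return "low-entropy" if sense_entropy(tags) < median else "high-entropy"


# Hypothetical tag lists: two evenly used senses vs. four evenly used senses.
two_senses = ["s1", "s1", "s2", "s2"]
four_senses = ["s1", "s2", "s3", "s4"]
print(sense_entropy(two_senses), entropy_band(two_senses))
print(sense_entropy(four_senses), entropy_band(four_senses))
```

Under this definition a word whose items all carry the same sense has entropy zero, and entropy grows both with the number of senses and with how evenly they are used.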

number of items in task: 3872


                    all senses          main senses only
average polysemy:   7.309               5.472


                    fine-grained        coarse-grained
average entropy:    1.116               0.846


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.979 (0.978)       0.981 (0.980)       0.981 (0.981)
best system         0.921 (0.921)       0.932 (0.932)       0.936 (0.936)
average of systems  0.772 (0.625)       0.829 (0.661)       0.841 (0.671)
worst system        0.452 (0.180)       0.671 (0.267)       0.694 (0.276)
best baseline       0.877 (0.713)       0.892 (0.725)       0.907 (0.738)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.792 (0.777)       0.818 (0.802)       0.840 (0.824)
average of systems  0.633 (0.562)       0.733 (0.628)       0.752 (0.643)
worst system        0.452 (0.180)       0.671 (0.267)       0.694 (0.276)
best baseline       0.696 (0.696)       0.728 (0.728)       0.749 (0.749)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.921 (0.921)       0.932 (0.932)       0.936 (0.936)
average of systems  0.844 (0.698)       0.877 (0.719)       0.885 (0.725)
worst system        0.686 (0.273)       0.787 (0.313)       0.804 (0.320)
best baseline       0.877 (0.713)       0.892 (0.725)       0.907 (0.738)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.895 (0.438)       0.920 (0.450)       0.933 (0.456)
average of systems  0.895 (0.438)       0.920 (0.450)       0.933 (0.456)
worst system        0.895 (0.438)       0.920 (0.450)       0.933 (0.456)


Detailed results



high-entropy

Items involving words whose entropy is equal to or greater than the median entropy of 1.85.

number of items in task: 4576


                    all senses          main senses only
average polysemy:   12.963              8.675


                    fine-grained        coarse-grained
average entropy:    2.593               2.075


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.953 (0.950)       0.957 (0.955)       0.960 (0.957)
best system         0.662 (0.662)       0.696 (0.696)       0.718 (0.718)
average of systems  0.530 (0.428)       0.583 (0.466)       0.614 (0.487)
worst system        0.288 (0.288)       0.351 (0.351)       0.390 (0.390)
best baseline       0.579 (0.578)       0.615 (0.614)       0.643 (0.642)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.467 (0.459)       0.526 (0.517)       0.595 (0.133)
average of systems  0.378 (0.303)       0.458 (0.356)       0.504 (0.384)
worst system        0.288 (0.288)       0.351 (0.351)       0.390 (0.390)
best baseline       0.454 (0.452)       0.495 (0.492)       0.519 (0.516)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.662 (0.662)       0.696 (0.696)       0.718 (0.718)
average of systems  0.621 (0.523)       0.659 (0.553)       0.681 (0.571)
worst system        0.595 (0.550)       0.624 (0.577)       0.648 (0.599)
best baseline       0.579 (0.578)       0.615 (0.614)       0.643 (0.642)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.586 (0.356)       0.628 (0.382)       0.653 (0.397)
average of systems  0.586 (0.356)       0.628 (0.382)       0.653 (0.397)
worst system        0.586 (0.356)       0.628 (0.382)       0.653 (0.397)


Detailed results



accident-n

number of items in task: 267


                    all senses          main senses only
polysemy:           8                   2


                    fine-grained        coarse-grained
entropy:            1.430               0.571


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.987 (0.987)       0.988 (0.988)       0.991 (0.991)
best system         0.933 (0.933)       0.954 (0.954)       0.981 (0.981)
average of systems  0.767 (0.766)       0.831 (0.829)       0.888 (0.886)
worst system        0.273 (0.272)       0.328 (0.328)       0.375 (0.375)
best baseline       0.789 (0.783)       0.828 (0.828)       0.963 (0.963)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.855 (0.839)       0.883 (0.867)       0.950 (0.933)
average of systems  0.548 (0.544)       0.673 (0.669)       0.753 (0.748)
worst system        0.273 (0.272)       0.328 (0.328)       0.375 (0.375)
best baseline       0.753 (0.753)       0.789 (0.789)       0.933 (0.933)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.933 (0.933)       0.954 (0.954)       0.981 (0.981)
average of systems  0.892 (0.892)       0.923 (0.923)       0.966 (0.966)
worst system        0.843 (0.843)       0.900 (0.900)       0.948 (0.948)
best baseline       0.789 (0.783)       0.828 (0.828)       0.963 (0.963)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.895 (0.891)       0.908 (0.905)       0.962 (0.959)
average of systems  0.895 (0.891)       0.908 (0.905)       0.962 (0.959)
worst system        0.895 (0.891)       0.908 (0.905)       0.962 (0.959)


Detailed results



behaviour-n

number of items in task: 279


                    all senses          main senses only
polysemy:           3                   2


                    fine-grained        coarse-grained
entropy:            0.390               0.295


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.973 (0.973)       0.973 (0.973)       0.973 (0.973)
best system         0.964 (0.964)       0.964 (0.964)       0.964 (0.964)
average of systems  0.860 (0.857)       0.899 (0.896)       0.899 (0.896)
worst system        0.380 (0.380)       0.530 (0.530)       0.530 (0.530)
best baseline       0.946 (0.946)       0.961 (0.961)       0.961 (0.961)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.961 (0.961)       0.961 (0.961)       0.961 (0.961)
average of systems  0.701 (0.699)       0.792 (0.789)       0.792 (0.789)
worst system        0.380 (0.380)       0.530 (0.530)       0.530 (0.530)
best baseline       0.946 (0.946)       0.961 (0.961)       0.961 (0.961)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.964 (0.964)       0.964 (0.964)       0.964 (0.964)
average of systems  0.952 (0.948)       0.959 (0.956)       0.959 (0.956)
worst system        0.927 (0.927)       0.953 (0.953)       0.953 (0.953)
best baseline       0.946 (0.946)       0.961 (0.961)       0.961 (0.961)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.946 (0.946)       0.961 (0.961)       0.961 (0.961)
average of systems  0.946 (0.946)       0.961 (0.961)       0.961 (0.961)
worst system        0.946 (0.946)       0.961 (0.961)       0.961 (0.961)


Detailed results



bet-n

number of items in task: 274


                    all senses          main senses only
polysemy:           15                  9


                    fine-grained        coarse-grained
entropy:            3.200               2.563


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.970 (0.970)       0.980 (0.980)       0.980 (0.980)
best system         0.712 (0.712)       0.783 (0.783)       0.861 (0.861)
average of systems  0.552 (0.549)       0.620 (0.615)       0.662 (0.657)
worst system        0.338 (0.338)       0.416 (0.416)       0.443 (0.412)
best baseline       0.547 (0.547)       0.652 (0.621)       0.782 (0.745)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.631 (0.631)       0.703 (0.703)       0.818 (0.818)
average of systems  0.451 (0.443)       0.531 (0.521)       0.587 (0.577)
worst system        0.338 (0.338)       0.416 (0.416)       0.443 (0.412)
best baseline       0.406 (0.400)       0.426 (0.404)       0.452 (0.429)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.712 (0.712)       0.783 (0.783)       0.861 (0.861)
average of systems  0.629 (0.629)       0.692 (0.691)       0.723 (0.723)
worst system        0.498 (0.496)       0.544 (0.542)       0.557 (0.555)
best baseline       0.547 (0.547)       0.652 (0.621)       0.782 (0.745)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.494 (0.489)       0.542 (0.536)       0.594 (0.588)
average of systems  0.494 (0.489)       0.542 (0.536)       0.594 (0.588)
worst system        0.494 (0.489)       0.542 (0.536)       0.594 (0.588)


Detailed results



disability-n

number of items in task: 160


                    all senses          main senses only
polysemy:           3                   2


                    fine-grained        coarse-grained
entropy:            1.053               0.457


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.959 (0.959)       0.959 (0.959)       0.959 (0.959)
best system         0.900 (0.900)       0.938 (0.938)       0.938 (0.938)
average of systems  0.828 (0.827)       0.930 (0.930)       0.930 (0.930)
worst system        0.800 (0.800)       0.900 (0.900)       0.900 (0.900)
best baseline       0.800 (0.800)       0.938 (0.938)       0.938 (0.938)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.874 (0.869)       0.938 (0.938)       0.938 (0.938)
average of systems  0.823 (0.822)       0.934 (0.933)       0.934 (0.933)
worst system        0.800 (0.800)       0.931 (0.931)       0.931 (0.931)
best baseline       0.800 (0.800)       0.938 (0.938)       0.938 (0.938)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.900 (0.900)       0.938 (0.938)       0.938 (0.938)
average of systems  0.833 (0.833)       0.925 (0.925)       0.925 (0.925)
worst system        0.800 (0.800)       0.900 (0.900)       0.900 (0.900)
best baseline       0.755 (0.750)       0.906 (0.900)       0.906 (0.900)


Detailed results



excess-n

number of items in task: 186


                    all senses          main senses only
polysemy:           8                   3


                    fine-grained        coarse-grained
entropy:            2.387               1.199


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.973 (0.968)       0.973 (0.968)       0.973 (0.968)
best system         0.871 (0.871)       0.875 (0.875)       0.907 (0.892)
average of systems  0.704 (0.700)       0.749 (0.745)       0.787 (0.783)
worst system        0.413 (0.413)       0.511 (0.511)       0.613 (0.610)
best baseline       0.800 (0.796)       0.820 (0.816)       0.881 (0.876)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.847 (0.833)       0.857 (0.843)       0.869 (0.855)
average of systems  0.554 (0.550)       0.632 (0.627)       0.709 (0.705)
worst system        0.413 (0.413)       0.511 (0.511)       0.613 (0.610)
best baseline       0.661 (0.661)       0.747 (0.747)       0.747 (0.747)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.871 (0.871)       0.875 (0.875)       0.907 (0.892)
average of systems  0.799 (0.797)       0.817 (0.815)       0.834 (0.831)
worst system        0.703 (0.703)       0.736 (0.736)       0.737 (0.737)
best baseline       0.800 (0.796)       0.820 (0.816)       0.881 (0.876)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.727 (0.715)       0.814 (0.801)       0.814 (0.801)
average of systems  0.727 (0.715)       0.814 (0.801)       0.814 (0.801)
worst system        0.727 (0.715)       0.814 (0.801)       0.814 (0.801)


Detailed results



float-n

number of items in task: 75


                    all senses          main senses only
polysemy:           12                  8


                    fine-grained        coarse-grained
entropy:            2.340               2.042


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.980 (0.980)       0.980 (0.980)       0.993 (0.993)
best system         0.813 (0.813)       0.813 (0.813)       0.840 (0.840)
average of systems  0.475 (0.475)       0.487 (0.487)       0.549 (0.549)
worst system        0.053 (0.053)       0.107 (0.107)       0.120 (0.120)
best baseline       0.720 (0.720)       0.720 (0.720)       0.733 (0.733)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.347 (0.347)       0.347 (0.347)       0.440 (0.440)
average of systems  0.157 (0.157)       0.187 (0.187)       0.303 (0.303)
worst system        0.053 (0.053)       0.107 (0.107)       0.120 (0.120)
best baseline       0.267 (0.260)       0.281 (0.273)       0.520 (0.520)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.813 (0.813)       0.813 (0.813)       0.840 (0.840)
average of systems  0.671 (0.671)       0.673 (0.673)       0.705 (0.705)
worst system        0.573 (0.573)       0.573 (0.573)       0.613 (0.613)
best baseline       0.720 (0.720)       0.720 (0.720)       0.733 (0.733)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.573 (0.573)       0.573 (0.573)       0.600 (0.600)
average of systems  0.573 (0.573)       0.573 (0.573)       0.600 (0.600)
worst system        0.573 (0.573)       0.573 (0.573)       0.600 (0.600)


Detailed results



giant-n

number of items in task: 118


                    all senses          main senses only
polysemy:           7                   3


                    fine-grained        coarse-grained
entropy:            2.054               1.239


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.975 (0.975)       0.975 (0.975)       0.992 (0.992)
best system         0.822 (0.822)       0.856 (0.856)       0.975 (0.975)
average of systems  0.581 (0.573)       0.643 (0.634)       0.752 (0.741)
worst system        0.163 (0.163)       0.246 (0.246)       0.246 (0.246)
best baseline       0.763 (0.763)       0.839 (0.839)       0.983 (0.983)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.492 (0.492)       0.568 (0.568)       0.720 (0.720)
average of systems  0.289 (0.289)       0.370 (0.370)       0.501 (0.501)
worst system        0.163 (0.163)       0.246 (0.246)       0.246 (0.246)
best baseline       0.534 (0.534)       0.571 (0.571)       0.720 (0.720)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.822 (0.822)       0.856 (0.856)       0.975 (0.975)
average of systems  0.759 (0.746)       0.808 (0.793)       0.900 (0.883)
worst system        0.669 (0.669)       0.678 (0.678)       0.712 (0.712)
best baseline       0.763 (0.763)       0.839 (0.839)       0.983 (0.983)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.681 (0.669)       0.750 (0.737)       0.862 (0.847)
average of systems  0.681 (0.669)       0.750 (0.737)       0.862 (0.847)
worst system        0.681 (0.669)       0.750 (0.737)       0.862 (0.847)


Detailed results



knee-n

number of items in task: 251


                    all senses          main senses only
polysemy:           22                  12


                    fine-grained        coarse-grained
entropy:            2.484               1.463


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.988 (0.988)       0.988 (0.988)       0.988 (0.988)
best system         0.832 (0.829)       0.848 (0.848)       0.869 (0.869)
average of systems  0.640 (0.636)       0.708 (0.704)       0.763 (0.759)
worst system        0.209 (0.209)       0.249 (0.249)       0.306 (0.306)
best baseline       0.665 (0.665)       0.819 (0.803)       0.861 (0.861)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.639 (0.614)       0.671 (0.644)       0.793 (0.793)
average of systems  0.427 (0.420)       0.531 (0.523)       0.649 (0.640)
worst system        0.209 (0.209)       0.249 (0.249)       0.306 (0.306)
best baseline       0.578 (0.578)       0.656 (0.656)       0.833 (0.833)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.821 (0.821)       0.848 (0.848)       0.869 (0.869)
average of systems  0.750 (0.748)       0.803 (0.801)       0.822 (0.820)
worst system        0.651 (0.645)       0.740 (0.734)       0.759 (0.753)
best baseline       0.665 (0.665)       0.819 (0.803)       0.861 (0.861)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.832 (0.829)       0.847 (0.844)       0.868 (0.865)
average of systems  0.832 (0.829)       0.847 (0.844)       0.868 (0.865)
worst system        0.832 (0.829)       0.847 (0.844)       0.868 (0.865)


Detailed results



onion-n

number of items in task: 214


                    all senses          main senses only
polysemy:           4                   4


                    fine-grained        coarse-grained
entropy:            0.862               0.862


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.951 (0.951)       0.951 (0.951)       0.951 (0.951)
best system         0.921 (0.921)       0.921 (0.921)       0.921 (0.921)
average of systems  0.856 (0.854)       0.856 (0.854)       0.856 (0.854)
worst system        0.752 (0.752)       0.752 (0.752)       0.752 (0.752)
best baseline       0.911 (0.911)       0.911 (0.911)       0.911 (0.911)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.916 (0.916)       0.916 (0.916)       0.916 (0.916)
average of systems  0.841 (0.839)       0.841 (0.839)       0.841 (0.839)
worst system        0.752 (0.752)       0.752 (0.752)       0.752 (0.752)
best baseline       0.911 (0.911)       0.911 (0.911)       0.911 (0.911)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.921 (0.921)       0.921 (0.921)       0.921 (0.921)
average of systems  0.870 (0.870)       0.870 (0.870)       0.870 (0.870)
worst system        0.808 (0.808)       0.808 (0.808)       0.808 (0.808)
best baseline       0.909 (0.841)       0.909 (0.841)       0.909 (0.841)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.836 (0.820)       0.836 (0.820)       0.836 (0.820)
average of systems  0.836 (0.820)       0.836 (0.820)       0.836 (0.820)
worst system        0.836 (0.820)       0.836 (0.820)       0.836 (0.820)


Detailed results



promise-n

number of items in task: 113


                    all senses          main senses only
polysemy:           8                   4


                    fine-grained        coarse-grained
entropy:            1.851               0.963


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.965 (0.965)       0.965 (0.965)       0.965 (0.965)
best system         0.867 (0.867)       0.885 (0.885)       0.929 (0.929)
average of systems  0.641 (0.641)       0.754 (0.754)       0.820 (0.820)
worst system        0.195 (0.195)       0.628 (0.628)       0.690 (0.690)
best baseline       0.717 (0.717)       0.746 (0.746)       0.832 (0.832)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.619 (0.619)       0.710 (0.710)       0.770 (0.770)
average of systems  0.418 (0.418)       0.656 (0.656)       0.740 (0.740)
worst system        0.195 (0.195)       0.628 (0.628)       0.708 (0.708)
best baseline       0.708 (0.708)       0.737 (0.737)       0.823 (0.823)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.867 (0.867)       0.878 (0.878)       0.929 (0.929)
average of systems  0.754 (0.754)       0.797 (0.797)       0.857 (0.857)
worst system        0.628 (0.628)       0.644 (0.644)       0.690 (0.690)
best baseline       0.717 (0.717)       0.746 (0.746)       0.832 (0.832)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.858 (0.858)       0.885 (0.885)       0.920 (0.920)
average of systems  0.858 (0.858)       0.885 (0.885)       0.920 (0.920)
worst system        0.858 (0.858)       0.885 (0.885)       0.920 (0.920)


Detailed results



rabbit-n

number of items in task: 221


                    all senses          main senses only
polysemy:           8                   6


                    fine-grained        coarse-grained
entropy:            0.748               0.522


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.948 (0.948)       0.950 (0.950)       0.952 (0.952)
best system         0.946 (0.946)       0.955 (0.955)       0.964 (0.964)
average of systems  0.723 (0.712)       0.933 (0.921)       0.938 (0.926)
worst system        0.432 (0.430)       0.912 (0.912)       0.919 (0.919)
best baseline       0.919 (0.919)       0.928 (0.928)       0.937 (0.937)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.946 (0.946)       0.955 (0.955)       0.964 (0.964)
average of systems  0.692 (0.673)       0.931 (0.911)       0.937 (0.916)
worst system        0.432 (0.430)       0.912 (0.912)       0.919 (0.919)
best baseline       0.919 (0.919)       0.928 (0.928)       0.937 (0.937)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.941 (0.941)       0.948 (0.948)       0.955 (0.955)
average of systems  0.765 (0.765)       0.936 (0.934)       0.941 (0.940)
worst system        0.432 (0.430)       0.927 (0.923)       0.927 (0.923)
best baseline       0.600 (0.600)       0.624 (0.624)       0.631 (0.631)


Detailed results



sack-n

number of items in task: 82


                    all senses          main senses only
polysemy:           11                  9


                    fine-grained        coarse-grained
entropy:            1.772               1.667


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               1.000 (1.000)       1.000 (1.000)       1.000 (1.000)
best system         0.902 (0.902)       0.915 (0.915)       0.915 (0.915)
average of systems  0.730 (0.728)       0.758 (0.756)       0.758 (0.756)
worst system        0.537 (0.537)       0.537 (0.537)       0.537 (0.537)
best baseline       0.889 (0.878)       0.889 (0.878)       0.889 (0.878)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.732 (0.732)       0.817 (0.817)       0.817 (0.817)
average of systems  0.626 (0.622)       0.687 (0.683)       0.687 (0.683)
worst system        0.537 (0.537)       0.537 (0.537)       0.537 (0.537)
best baseline       0.524 (0.524)       0.524 (0.524)       0.524 (0.524)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.878 (0.878)       0.878 (0.878)       0.878 (0.878)
average of systems  0.770 (0.770)       0.779 (0.779)       0.779 (0.779)
worst system        0.659 (0.659)       0.659 (0.659)       0.659 (0.659)
best baseline       0.889 (0.878)       0.889 (0.878)       0.889 (0.878)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.902 (0.902)       0.915 (0.915)       0.915 (0.915)
average of systems  0.902 (0.902)       0.915 (0.915)       0.915 (0.915)
worst system        0.902 (0.902)       0.915 (0.915)       0.915 (0.915)


Detailed results



scrap-n

number of items in task: 156


                    all senses          main senses only
polysemy:           14                  8


                    fine-grained        coarse-grained
entropy:            2.839               1.999


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.965 (0.965)       0.974 (0.974)       0.978 (0.978)
best system         0.718 (0.718)       0.966 (0.179)       0.966 (0.179)
average of systems  0.558 (0.510)       0.728 (0.650)       0.781 (0.704)
worst system        0.329 (0.329)       0.529 (0.529)       0.564 (0.564)
best baseline       0.622 (0.622)       0.760 (0.760)       0.795 (0.795)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.586 (0.109)       0.966 (0.179)       0.966 (0.179)
average of systems  0.489 (0.368)       0.742 (0.542)       0.799 (0.599)
worst system        0.329 (0.329)       0.604 (0.604)       0.690 (0.690)
best baseline       0.622 (0.622)       0.760 (0.760)       0.795 (0.795)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.718 (0.718)       0.853 (0.853)       0.891 (0.891)
average of systems  0.595 (0.590)       0.713 (0.707)       0.768 (0.761)
worst system        0.494 (0.494)       0.529 (0.529)       0.564 (0.564)
best baseline       0.583 (0.583)       0.708 (0.708)       0.782 (0.782)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.608 (0.596)       0.761 (0.747)       0.791 (0.776)
average of systems  0.608 (0.596)       0.761 (0.747)       0.791 (0.776)
worst system        0.608 (0.596)       0.761 (0.747)       0.791 (0.776)


Detailed results



shirt-n

number of items in task: 184


                    all senses          main senses only
polysemy:           8                   6


                    fine-grained        coarse-grained
entropy:            1.778               1.235


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.992 (0.992)       0.995 (0.995)       0.997 (0.997)
best system         0.882 (0.853)       0.951 (0.951)       0.995 (0.995)
average of systems  0.745 (0.741)       0.824 (0.820)       0.874 (0.870)
worst system        0.457 (0.457)       0.516 (0.516)       0.576 (0.576)
best baseline       0.858 (0.821)       0.920 (0.880)       0.983 (0.940)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.882 (0.853)       0.941 (0.910)       0.994 (0.962)
average of systems  0.667 (0.660)       0.774 (0.766)       0.836 (0.827)
worst system        0.457 (0.457)       0.516 (0.516)       0.576 (0.576)
best baseline       0.821 (0.821)       0.899 (0.899)       0.940 (0.940)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.880 (0.880)       0.951 (0.951)       0.984 (0.984)
average of systems  0.774 (0.773)       0.836 (0.835)       0.879 (0.878)
worst system        0.462 (0.462)       0.519 (0.519)       0.576 (0.576)
best baseline       0.858 (0.821)       0.920 (0.880)       0.983 (0.940)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.880 (0.880)       0.948 (0.948)       0.995 (0.995)
average of systems  0.880 (0.880)       0.948 (0.948)       0.995 (0.995)
worst system        0.880 (0.880)       0.948 (0.948)       0.995 (0.995)


Detailed results



steering-n

number of items in task: 176


                    all senses          main senses only
polysemy:           5                   4


                    fine-grained        coarse-grained
entropy:            1.712               1.319


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.989 (0.989)       0.989 (0.989)       0.989 (0.989)
best system         0.949 (0.949)       0.960 (0.960)       0.960 (0.960)
average of systems  0.433 (0.429)       0.444 (0.441)       0.444 (0.441)
worst system        0.038 (0.028)       0.038 (0.028)       0.038 (0.028)
best baseline       0.716 (0.716)       0.775 (0.775)       0.775 (0.775)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.585 (0.585)       0.608 (0.608)       0.608 (0.608)
average of systems  0.371 (0.367)       0.385 (0.382)       0.385 (0.382)
worst system        0.038 (0.028)       0.038 (0.028)       0.038 (0.028)
best baseline       0.716 (0.716)       0.775 (0.775)       0.775 (0.775)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.949 (0.949)       0.960 (0.960)       0.960 (0.960)
average of systems  0.515 (0.511)       0.522 (0.519)       0.522 (0.519)
worst system        0.038 (0.028)       0.038 (0.028)       0.038 (0.028)
best baseline       0.716 (0.716)       0.744 (0.744)       0.744 (0.744)


Detailed results



amaze-v

number of items in task: 70


                    all senses          main senses only
polysemy:           1                   1


                    fine-grained        coarse-grained
entropy:            0.000               0.000


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               1.000 (1.000)       1.000 (1.000)       1.000 (1.000)
best system         1.000 (1.000)       1.000 (1.000)       1.000 (1.000)
average of systems  0.975 (0.968)       0.975 (0.968)       0.975 (0.968)
worst system        0.843 (0.843)       0.843 (0.843)       0.843 (0.843)
best baseline       1.000 (1.000)       1.000 (1.000)       1.000 (1.000)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         1.000 (1.000)       1.000 (1.000)       1.000 (1.000)
average of systems  0.986 (0.971)       0.986 (0.971)       0.986 (0.971)
worst system        0.957 (0.957)       0.957 (0.957)       0.957 (0.957)
best baseline       1.000 (1.000)       1.000 (1.000)       1.000 (1.000)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         1.000 (1.000)       1.000 (1.000)       1.000 (1.000)
average of systems  0.963 (0.960)       0.963 (0.960)       0.963 (0.960)
worst system        0.843 (0.843)       0.843 (0.843)       0.843 (0.843)
best baseline       1.000 (1.000)       1.000 (1.000)       1.000 (1.000)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         1.000 (1.000)       1.000 (1.000)       1.000 (1.000)
average of systems  1.000 (1.000)       1.000 (1.000)       1.000 (1.000)
worst system        1.000 (1.000)       1.000 (1.000)       1.000 (1.000)


Detailed results



bet-v

number of items in task: 117


                    all senses          main senses only
polysemy:           9                   4


                    fine-grained        coarse-grained
entropy:            2.349               1.581


All systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.924 (0.916)       0.932 (0.925)       0.932 (0.925)
best system         0.778 (0.778)       0.786 (0.786)       0.838 (0.838)
average of systems  0.527 (0.527)       0.550 (0.550)       0.590 (0.590)
worst system        0.026 (0.026)       0.034 (0.034)       0.034 (0.034)
best baseline       0.714 (0.714)       0.726 (0.726)       0.803 (0.803)


A systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.513 (0.513)       0.521 (0.521)       0.547 (0.547)
average of systems  0.231 (0.231)       0.269 (0.269)       0.288 (0.288)
worst system        0.026 (0.026)       0.034 (0.034)       0.034 (0.034)
best baseline       0.714 (0.714)       0.726 (0.726)       0.803 (0.803)


S systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.778 (0.778)       0.786 (0.786)       0.838 (0.838)
average of systems  0.663 (0.663)       0.680 (0.680)       0.735 (0.735)
worst system        0.547 (0.547)       0.551 (0.551)       0.624 (0.624)
best baseline       0.692 (0.692)       0.701 (0.701)       0.795 (0.795)


O systems
                    fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.735 (0.735)       0.744 (0.744)       0.769 (0.769)
average of systems  0.735 (0.735)       0.744 (0.744)       0.769 (0.769)
worst system        0.735 (0.735)       0.744 (0.744)       0.769 (0.769)


Detailed results



bother-v

number of items in task: 209

                    all senses      main senses only
polysemy:           8               6

                    fine-grained    coarse-grained
entropy:            2.168           1.837

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.976 (0.976)       0.976 (0.976)       0.976 (0.976)
best system         0.866 (0.866)       0.880 (0.880)       0.880 (0.880)
average of systems  0.680 (0.679)       0.707 (0.706)       0.707 (0.706)
worst system        0.443 (0.443)       0.455 (0.455)       0.455 (0.455)
best baseline       0.632 (0.617)       0.637 (0.622)       0.637 (0.622)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.575 (0.569)       0.633 (0.627)       0.633 (0.627)
average of systems  0.510 (0.508)       0.543 (0.541)       0.543 (0.541)
worst system        0.443 (0.443)       0.455 (0.455)       0.455 (0.455)
best baseline       0.415 (0.405)       0.467 (0.467)       0.467 (0.467)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.866 (0.866)       0.880 (0.880)       0.880 (0.880)
average of systems  0.770 (0.770)       0.786 (0.786)       0.786 (0.786)
worst system        0.622 (0.622)       0.627 (0.627)       0.627 (0.627)
best baseline       0.632 (0.617)       0.637 (0.622)       0.637 (0.622)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.737 (0.737)       0.804 (0.804)       0.804 (0.804)
average of systems  0.737 (0.737)       0.804 (0.804)       0.804 (0.804)
worst system        0.737 (0.737)       0.804 (0.804)       0.804 (0.804)


Detailed results



bury-v

number of items in task: 201

                    all senses      main senses only
polysemy:           14              6

                    fine-grained    coarse-grained
entropy:            2.759           2.401


All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.928 (0.923)       0.930 (0.925)       0.933 (0.928)
best system         0.572 (0.572)       0.578 (0.578)       0.592 (0.592)
average of systems  0.421 (0.420)       0.443 (0.442)       0.454 (0.452)
worst system        0.212 (0.212)       0.216 (0.216)       0.223 (0.223)
best baseline       0.552 (0.552)       0.557 (0.557)       0.567 (0.567)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.437 (0.433)       0.457 (0.453)       0.467 (0.463)
average of systems  0.322 (0.319)       0.342 (0.339)       0.351 (0.348)
worst system        0.212 (0.212)       0.216 (0.216)       0.223 (0.223)
best baseline       0.365 (0.365)       0.383 (0.383)       0.384 (0.384)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.572 (0.572)       0.578 (0.578)       0.592 (0.592)
average of systems  0.479 (0.479)       0.503 (0.503)       0.516 (0.516)
worst system        0.413 (0.413)       0.439 (0.439)       0.458 (0.458)
best baseline       0.552 (0.552)       0.557 (0.557)       0.567 (0.567)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.432 (0.428)       0.450 (0.445)       0.452 (0.448)
average of systems  0.432 (0.428)       0.450 (0.445)       0.452 (0.448)
worst system        0.432 (0.428)       0.450 (0.445)       0.452 (0.448)


Detailed results



calculate-v

number of items in task: 218

                    all senses      main senses only
polysemy:           5               3

                    fine-grained    coarse-grained
entropy:            0.982           0.864

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.954 (0.950)       0.959 (0.954)       0.959 (0.954)
best system         0.922 (0.922)       0.922 (0.922)       0.922 (0.922)
average of systems  0.788 (0.787)       0.793 (0.792)       0.793 (0.792)
worst system        0.271 (0.271)       0.271 (0.271)       0.271 (0.271)
best baseline       0.904 (0.904)       0.904 (0.904)       0.904 (0.904)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.858 (0.858)       0.862 (0.862)       0.862 (0.862)
average of systems  0.625 (0.623)       0.626 (0.624)       0.626 (0.624)
worst system        0.271 (0.271)       0.271 (0.271)       0.271 (0.271)
best baseline       0.493 (0.493)       0.493 (0.493)       0.493 (0.493)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.922 (0.922)       0.922 (0.922)       0.922 (0.922)
average of systems  0.869 (0.869)       0.875 (0.875)       0.875 (0.875)
worst system        0.775 (0.775)       0.789 (0.789)       0.789 (0.789)
best baseline       0.904 (0.904)       0.904 (0.904)       0.904 (0.904)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.871 (0.867)       0.880 (0.876)       0.880 (0.876)
average of systems  0.871 (0.867)       0.880 (0.876)       0.880 (0.876)
worst system        0.871 (0.867)       0.880 (0.876)       0.880 (0.876)


Detailed results



consume-v

number of items in task: 186

                    all senses      main senses only
polysemy:           6               4

                    fine-grained    coarse-grained
entropy:            2.218           1.677

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.944 (0.939)       0.955 (0.950)       0.958 (0.953)
best system         0.503 (0.500)       0.586 (0.583)       0.616 (0.613)
average of systems  0.418 (0.416)       0.515 (0.511)       0.551 (0.548)
worst system        0.189 (0.188)       0.427 (0.425)       0.478 (0.468)
best baseline       0.546 (0.543)       0.608 (0.605)       0.654 (0.651)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.500 (0.500)       0.586 (0.583)       0.616 (0.613)
average of systems  0.354 (0.351)       0.505 (0.501)       0.531 (0.527)
worst system        0.189 (0.188)       0.429 (0.419)       0.478 (0.468)
best baseline       0.416 (0.414)       0.500 (0.497)       0.535 (0.532)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.503 (0.500)       0.570 (0.567)       0.616 (0.613)
average of systems  0.451 (0.448)       0.522 (0.519)       0.565 (0.562)
worst system        0.362 (0.360)       0.427 (0.425)       0.486 (0.484)
best baseline       0.546 (0.543)       0.608 (0.605)       0.654 (0.651)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.449 (0.446)       0.505 (0.503)       0.541 (0.538)
average of systems  0.449 (0.446)       0.505 (0.503)       0.541 (0.538)
worst system        0.449 (0.446)       0.505 (0.503)       0.541 (0.538)


Detailed results



derive-v

number of items in task: 217

                    all senses      main senses only
polysemy:           6               4

                    fine-grained    coarse-grained
entropy:            1.955           1.731

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.965 (0.961)       0.965 (0.961)       0.965 (0.961)
best system         0.664 (0.664)       0.677 (0.677)       0.687 (0.687)
average of systems  0.561 (0.560)       0.571 (0.569)       0.575 (0.573)
worst system        0.459 (0.459)       0.478 (0.478)       0.481 (0.481)
best baseline       0.588 (0.585)       0.588 (0.585)       0.588 (0.585)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.535 (0.535)       0.535 (0.535)       0.535 (0.535)
average of systems  0.508 (0.502)       0.514 (0.508)       0.515 (0.509)
worst system        0.459 (0.459)       0.478 (0.478)       0.481 (0.481)
best baseline       0.530 (0.530)       0.530 (0.530)       0.530 (0.530)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.664 (0.664)       0.677 (0.677)       0.687 (0.687)
average of systems  0.600 (0.600)       0.614 (0.614)       0.620 (0.620)
worst system        0.502 (0.502)       0.532 (0.532)       0.539 (0.539)
best baseline       0.588 (0.585)       0.588 (0.585)       0.588 (0.585)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.530 (0.530)       0.530 (0.530)       0.530 (0.530)
average of systems  0.530 (0.530)       0.530 (0.530)       0.530 (0.530)
worst system        0.530 (0.530)       0.530 (0.530)       0.530 (0.530)


Detailed results



float-v

number of items in task: 229

                    all senses      main senses only
polysemy:           16              11

                    fine-grained    coarse-grained
entropy:            3.333           2.632


All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.927 (0.923)       0.938 (0.934)       0.943 (0.939)
best system         0.555 (0.555)       0.614 (0.614)       0.629 (0.629)
average of systems  0.383 (0.382)       0.437 (0.436)       0.455 (0.454)
worst system        0.200 (0.200)       0.266 (0.266)       0.288 (0.288)
best baseline       0.524 (0.524)       0.579 (0.579)       0.616 (0.616)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.302 (0.293)       0.338 (0.328)       0.374 (0.362)
average of systems  0.244 (0.241)       0.301 (0.298)       0.323 (0.319)
worst system        0.200 (0.200)       0.266 (0.266)       0.288 (0.288)
best baseline       0.403 (0.400)       0.467 (0.463)       0.502 (0.498)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.555 (0.555)       0.614 (0.614)       0.629 (0.629)
average of systems  0.475 (0.475)       0.529 (0.529)       0.546 (0.546)
worst system        0.406 (0.406)       0.463 (0.463)       0.507 (0.507)
best baseline       0.524 (0.524)       0.579 (0.579)       0.616 (0.616)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.341 (0.341)       0.380 (0.380)       0.397 (0.397)
average of systems  0.341 (0.341)       0.380 (0.380)       0.397 (0.397)
worst system        0.341 (0.341)       0.380 (0.380)       0.397 (0.397)


Detailed results



invade-v

number of items in task: 207

                    all senses      main senses only
polysemy:           6               3

                    fine-grained    coarse-grained
entropy:            2.195           1.518

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.921 (0.912)       0.922 (0.913)       0.924 (0.915)
best system         0.556 (0.556)       0.623 (0.623)       0.662 (0.662)
average of systems  0.449 (0.448)       0.538 (0.538)       0.571 (0.570)
worst system        0.239 (0.239)       0.401 (0.401)       0.415 (0.415)
best baseline       0.570 (0.570)       0.643 (0.643)       0.686 (0.686)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.498 (0.498)       0.570 (0.570)       0.618 (0.618)
average of systems  0.357 (0.355)       0.478 (0.476)       0.516 (0.513)
worst system        0.239 (0.239)       0.401 (0.401)       0.415 (0.415)
best baseline       0.420 (0.418)       0.495 (0.493)       0.517 (0.514)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.556 (0.556)       0.623 (0.623)       0.662 (0.662)
average of systems  0.505 (0.505)       0.583 (0.583)       0.615 (0.615)
worst system        0.464 (0.464)       0.527 (0.527)       0.546 (0.546)
best baseline       0.570 (0.570)       0.643 (0.643)       0.686 (0.686)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.440 (0.440)       0.495 (0.495)       0.517 (0.517)
average of systems  0.440 (0.440)       0.495 (0.495)       0.517 (0.517)
worst system        0.440 (0.440)       0.495 (0.495)       0.517 (0.517)


Detailed results



promise-v

number of items in task: 224

                    all senses      main senses only
polysemy:           6               3

                    fine-grained    coarse-grained
entropy:            0.982           0.812

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.953 (0.953)       0.962 (0.962)       0.962 (0.962)
best system         0.906 (0.906)       0.911 (0.911)       0.911 (0.911)
average of systems  0.765 (0.763)       0.834 (0.833)       0.842 (0.841)
worst system        0.431 (0.431)       0.678 (0.672)       0.689 (0.683)
best baseline       0.862 (0.862)       0.873 (0.873)       0.884 (0.884)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.710 (0.710)       0.834 (0.834)       0.839 (0.839)
average of systems  0.571 (0.569)       0.746 (0.744)       0.755 (0.753)
worst system        0.431 (0.431)       0.678 (0.672)       0.689 (0.683)
best baseline       0.862 (0.862)       0.873 (0.873)       0.884 (0.884)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.906 (0.906)       0.911 (0.911)       0.911 (0.911)
average of systems  0.864 (0.863)       0.883 (0.882)       0.889 (0.888)
worst system        0.826 (0.826)       0.871 (0.871)       0.875 (0.875)
best baseline       0.857 (0.857)       0.868 (0.868)       0.879 (0.879)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.848 (0.844)       0.859 (0.855)       0.870 (0.866)
average of systems  0.848 (0.844)       0.859 (0.855)       0.870 (0.866)
worst system        0.848 (0.844)       0.859 (0.855)       0.870 (0.866)


Detailed results



sack-v

number of items in task: 178

                    all senses      main senses only
polysemy:           4               4

                    fine-grained    coarse-grained
entropy:            0.132           0.132

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.994 (0.994)       0.994 (0.994)       0.994 (0.994)
best system         0.978 (0.978)       0.978 (0.978)       0.978 (0.978)
average of systems  0.864 (0.864)       0.865 (0.865)       0.865 (0.865)
worst system        0.034 (0.034)       0.039 (0.039)       0.039 (0.039)
best baseline       0.980 (0.980)       0.980 (0.980)       0.980 (0.980)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.966 (0.966)       0.966 (0.966)       0.966 (0.966)
average of systems  0.637 (0.637)       0.638 (0.638)       0.638 (0.638)
worst system        0.034 (0.034)       0.039 (0.039)       0.039 (0.039)
best baseline       0.978 (0.978)       0.978 (0.978)       0.978 (0.978)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.978 (0.978)       0.978 (0.978)       0.978 (0.978)
average of systems  0.978 (0.978)       0.978 (0.978)       0.978 (0.978)
worst system        0.978 (0.978)       0.978 (0.978)       0.978 (0.978)
best baseline       0.980 (0.980)       0.980 (0.980)       0.980 (0.980)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.978 (0.978)       0.978 (0.978)       0.978 (0.978)
average of systems  0.978 (0.978)       0.978 (0.978)       0.978 (0.978)
worst system        0.978 (0.978)       0.978 (0.978)       0.978 (0.978)


Detailed results



scrap-v

number of items in task: 186

                    all senses      main senses only
polysemy:           3               2

                    fine-grained    coarse-grained
entropy:            0.694           0.133

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.981 (0.981)       0.995 (0.995)       0.995 (0.995)
best system         0.898 (0.898)       0.978 (0.978)       0.978 (0.978)
average of systems  0.823 (0.820)       0.959 (0.957)       0.959 (0.957)
worst system        0.454 (0.454)       0.863 (0.863)       0.863 (0.863)
best baseline       0.887 (0.887)       0.978 (0.978)       0.978 (0.978)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.871 (0.871)       0.968 (0.968)       0.968 (0.968)
average of systems  0.722 (0.718)       0.922 (0.917)       0.922 (0.917)
worst system        0.454 (0.454)       0.863 (0.863)       0.863 (0.863)
best baseline       0.876 (0.876)       0.978 (0.978)       0.978 (0.978)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.898 (0.898)       0.978 (0.978)       0.978 (0.978)
average of systems  0.874 (0.874)       0.978 (0.978)       0.978 (0.978)
worst system        0.828 (0.828)       0.978 (0.978)       0.978 (0.978)
best baseline       0.887 (0.887)       0.978 (0.978)       0.978 (0.978)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.865 (0.860)       0.978 (0.973)       0.978 (0.973)
average of systems  0.865 (0.860)       0.978 (0.973)       0.978 (0.973)
worst system        0.865 (0.860)       0.978 (0.973)       0.978 (0.973)


Detailed results



seize-v

number of items in task: 259

                    all senses      main senses only
polysemy:           11              9

                    fine-grained    coarse-grained
entropy:            2.806           2.576


All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.921 (0.921)       0.921 (0.921)       0.929 (0.929)
best system         0.714 (0.714)       0.714 (0.714)       0.753 (0.753)
average of systems  0.545 (0.544)       0.545 (0.544)       0.579 (0.578)
worst system        0.357 (0.351)       0.357 (0.351)       0.375 (0.375)
best baseline       0.680 (0.680)       0.680 (0.680)       0.710 (0.710)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.367 (0.367)       0.367 (0.367)       0.396 (0.390)
average of systems  0.361 (0.359)       0.361 (0.359)       0.382 (0.380)
worst system        0.357 (0.351)       0.357 (0.351)       0.375 (0.375)
best baseline       0.498 (0.498)       0.498 (0.498)       0.498 (0.498)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.714 (0.714)       0.714 (0.714)       0.753 (0.753)
average of systems  0.649 (0.649)       0.649 (0.649)       0.689 (0.689)
worst system        0.587 (0.587)       0.587 (0.587)       0.637 (0.637)
best baseline       0.680 (0.680)       0.680 (0.680)       0.710 (0.710)

O systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.579 (0.579)       0.579 (0.579)       0.618 (0.618)
average of systems  0.579 (0.579)       0.579 (0.579)       0.618 (0.618)
worst system        0.579 (0.579)       0.579 (0.579)       0.618 (0.618)


Detailed results



brilliant-a

number of items in task: 229

                    all senses      main senses only
polysemy:           10              8

                    fine-grained    coarse-grained
entropy:            2.382           1.965


All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.930 (0.926)       0.954 (0.950)       0.954 (0.950)
best system         0.563 (0.563)       0.624 (0.624)       0.624 (0.624)
average of systems  0.457 (0.434)       0.542 (0.513)       0.542 (0.513)
worst system        0.228 (0.228)       0.317 (0.317)       0.317 (0.317)
best baseline       0.512 (0.510)       0.610 (0.608)       0.610 (0.608)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.444 (0.262)       0.563 (0.332)       0.563 (0.332)
average of systems  0.361 (0.299)       0.465 (0.387)       0.465 (0.387)
worst system        0.228 (0.228)       0.317 (0.317)       0.317 (0.317)
best baseline       0.476 (0.476)       0.585 (0.585)       0.585 (0.585)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.563 (0.563)       0.624 (0.624)       0.624 (0.624)
average of systems  0.515 (0.515)       0.588 (0.588)       0.588 (0.588)
worst system        0.454 (0.454)       0.524 (0.524)       0.524 (0.524)
best baseline       0.512 (0.510)       0.610 (0.608)       0.610 (0.608)


Detailed results



deaf-a

number of items in task: 122

                    all senses      main senses only
polysemy:           5               5

                    fine-grained    coarse-grained
entropy:            1.224           1.224

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.985 (0.985)       0.985 (0.985)       0.985 (0.985)
best system         0.943 (0.943)       0.943 (0.943)       0.943 (0.943)
average of systems  0.759 (0.747)       0.811 (0.794)       0.811 (0.794)
worst system        0.422 (0.422)       0.496 (0.496)       0.496 (0.496)
best baseline       0.902 (0.902)       0.902 (0.902)       0.902 (0.902)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.926 (0.926)       0.926 (0.926)       0.926 (0.926)
average of systems  0.664 (0.643)       0.750 (0.723)       0.750 (0.723)
worst system        0.422 (0.422)       0.496 (0.496)       0.496 (0.496)
best baseline       0.902 (0.902)       0.902 (0.902)       0.902 (0.902)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.943 (0.943)       0.943 (0.943)       0.943 (0.943)
average of systems  0.902 (0.902)       0.902 (0.902)       0.902 (0.902)
worst system        0.861 (0.861)       0.861 (0.861)       0.861 (0.861)
best baseline       0.704 (0.693)       0.704 (0.693)       0.704 (0.693)


Detailed results



floating-a

number of items in task: 47

                    all senses      main senses only
polysemy:           5               4

                    fine-grained    coarse-grained
entropy:            1.748           1.539

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.979 (0.979)       0.979 (0.979)       0.979 (0.979)
best system         0.702 (0.702)       0.702 (0.702)       0.702 (0.702)
average of systems  0.401 (0.398)       0.427 (0.424)       0.432 (0.429)
worst system        0.000 (0.000)       0.021 (0.021)       0.043 (0.043)
best baseline       0.660 (0.660)       0.681 (0.681)       0.681 (0.681)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.578 (0.553)       0.578 (0.553)       0.578 (0.553)
average of systems  0.196 (0.188)       0.213 (0.205)       0.223 (0.214)
worst system        0.000 (0.000)       0.021 (0.021)       0.043 (0.043)
best baseline       0.596 (0.596)       0.596 (0.596)       0.596 (0.596)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.702 (0.702)       0.702 (0.702)       0.702 (0.702)
average of systems  0.523 (0.523)       0.555 (0.555)       0.557 (0.557)
worst system        0.298 (0.298)       0.330 (0.330)       0.340 (0.340)
best baseline       0.660 (0.660)       0.681 (0.681)       0.681 (0.681)


Detailed results



generous-a

number of items in task: 227

                    all senses      main senses only
polysemy:           6               6

                    fine-grained    coarse-grained
entropy:            2.303           2.303

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.969 (0.969)       0.969 (0.969)       0.969 (0.969)
best system         0.612 (0.612)       0.612 (0.612)       0.612 (0.612)
average of systems  0.471 (0.470)       0.471 (0.470)       0.471 (0.470)
worst system        0.225 (0.225)       0.225 (0.225)       0.225 (0.225)
best baseline       0.488 (0.488)       0.488 (0.488)       0.488 (0.488)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.383 (0.383)       0.383 (0.383)       0.383 (0.383)
average of systems  0.329 (0.329)       0.329 (0.329)       0.329 (0.329)
worst system        0.225 (0.225)       0.225 (0.225)       0.225 (0.225)
best baseline       0.407 (0.405)       0.407 (0.405)       0.407 (0.405)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.612 (0.612)       0.612 (0.612)       0.612 (0.612)
average of systems  0.556 (0.555)       0.556 (0.555)       0.556 (0.555)
worst system        0.478 (0.471)       0.478 (0.471)       0.478 (0.471)
best baseline       0.488 (0.488)       0.488 (0.488)       0.488 (0.488)


Detailed results



giant-a

number of items in task: 97

                    all senses      main senses only
polysemy:           5               2

                    fine-grained    coarse-grained
entropy:            0.617           0.214

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               1.000 (1.000)       1.000 (1.000)       1.000 (1.000)
best system         0.990 (0.990)       0.995 (0.995)       1.000 (1.000)
average of systems  0.822 (0.820)       0.839 (0.837)       0.843 (0.840)
worst system        0.155 (0.155)       0.155 (0.155)       0.155 (0.155)
best baseline       0.990 (0.990)       0.995 (0.995)       1.000 (1.000)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.884 (0.866)       0.895 (0.876)       0.895 (0.876)
average of systems  0.580 (0.574)       0.585 (0.579)       0.587 (0.581)
worst system        0.155 (0.155)       0.155 (0.155)       0.155 (0.155)
best baseline       0.985 (0.985)       0.995 (0.995)       1.000 (1.000)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.990 (0.990)       0.995 (0.995)       1.000 (1.000)
average of systems  0.967 (0.967)       0.991 (0.991)       0.996 (0.996)
worst system        0.918 (0.918)       0.985 (0.985)       0.990 (0.990)
best baseline       0.990 (0.990)       0.995 (0.995)       1.000 (1.000)


Detailed results



modest-a

number of items in task: 270

                    all senses      main senses only
polysemy:           9               3

                    fine-grained    coarse-grained
entropy:            2.298           1.323

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.920 (0.920)       0.935 (0.935)       0.941 (0.941)
best system         0.704 (0.704)       0.725 (0.725)       0.737 (0.737)
average of systems  0.566 (0.565)       0.588 (0.586)       0.600 (0.598)
worst system        0.219 (0.219)       0.219 (0.219)       0.219 (0.219)
best baseline       0.648 (0.648)       0.656 (0.656)       0.670 (0.670)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.663 (0.648)       0.668 (0.653)       0.678 (0.663)
average of systems  0.394 (0.389)       0.405 (0.400)       0.414 (0.409)
worst system        0.219 (0.219)       0.219 (0.219)       0.219 (0.219)
best baseline       0.637 (0.637)       0.643 (0.643)       0.656 (0.656)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.704 (0.704)       0.725 (0.725)       0.737 (0.737)
average of systems  0.670 (0.670)       0.698 (0.698)       0.711 (0.711)
worst system        0.604 (0.604)       0.654 (0.654)       0.667 (0.667)
best baseline       0.648 (0.648)       0.656 (0.656)       0.670 (0.670)


Detailed results



slight-a

number of items in task: 218

                    all senses      main senses only
polysemy:           6               3

                    fine-grained    coarse-grained
entropy:            1.285           0.432

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.995 (0.995)       0.995 (0.995)       0.995 (0.995)
best system         0.963 (0.963)       0.963 (0.963)       0.963 (0.963)
average of systems  0.763 (0.763)       0.857 (0.856)       0.915 (0.915)
worst system        0.304 (0.304)       0.593 (0.587)       0.833 (0.826)
best baseline       0.954 (0.954)       0.954 (0.954)       0.954 (0.954)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.711 (0.711)       0.830 (0.830)       0.922 (0.922)
average of systems  0.533 (0.531)       0.729 (0.727)       0.879 (0.876)
worst system        0.304 (0.304)       0.593 (0.587)       0.833 (0.826)
best baseline       0.954 (0.954)       0.954 (0.954)       0.954 (0.954)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.963 (0.963)       0.963 (0.963)       0.963 (0.963)
average of systems  0.902 (0.902)       0.933 (0.933)       0.938 (0.938)
worst system        0.817 (0.817)       0.908 (0.908)       0.908 (0.908)
best baseline       0.954 (0.954)       0.954 (0.954)       0.954 (0.954)


Detailed results



wooden-a

number of items in task: 196

                    all senses      main senses only
polysemy:           4               4

                    fine-grained    coarse-grained
entropy:            0.365           0.365

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               1.000 (1.000)       1.000 (1.000)       1.000 (1.000)
best system         0.980 (0.980)       0.980 (0.980)       0.980 (0.980)
average of systems  0.946 (0.945)       0.946 (0.945)       0.946 (0.945)
worst system        0.816 (0.816)       0.816 (0.816)       0.816 (0.816)
best baseline       0.964 (0.964)       0.964 (0.964)       0.964 (0.964)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.980 (0.980)       0.980 (0.980)       0.980 (0.980)
average of systems  0.922 (0.922)       0.922 (0.922)       0.922 (0.922)
worst system        0.816 (0.816)       0.816 (0.816)       0.816 (0.816)
best baseline       0.949 (0.949)       0.949 (0.949)       0.949 (0.949)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.974 (0.974)       0.974 (0.974)       0.974 (0.974)
average of systems  0.960 (0.959)       0.960 (0.959)       0.960 (0.959)
worst system        0.939 (0.939)       0.939 (0.939)       0.939 (0.939)
best baseline       0.964 (0.964)       0.964 (0.964)       0.964 (0.964)


Detailed results



band-p

number of items in task: 302

                    all senses      main senses only
polysemy:           29              25

                    fine-grained    coarse-grained
entropy:            1.749           1.669


All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.990 (0.990)       0.990 (0.990)       0.990 (0.990)
best system         0.904 (0.904)       0.907 (0.907)       0.907 (0.907)
average of systems  0.849 (0.840)       0.850 (0.841)       0.850 (0.841)
worst system        0.689 (0.689)       0.689 (0.689)       0.689 (0.689)
best baseline       0.852 (0.843)       0.852 (0.843)       0.852 (0.843)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.889 (0.874)       0.889 (0.874)       0.889 (0.874)
average of systems  0.803 (0.798)       0.803 (0.798)       0.803 (0.798)
worst system        0.689 (0.689)       0.689 (0.689)       0.689 (0.689)
best baseline       0.250 (0.248)       0.257 (0.255)       0.257 (0.255)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.904 (0.904)       0.907 (0.907)       0.907 (0.907)
average of systems  0.877 (0.866)       0.878 (0.867)       0.878 (0.867)
worst system        0.819 (0.762)       0.819 (0.762)       0.819 (0.762)
best baseline       0.852 (0.843)       0.852 (0.843)       0.852 (0.843)


Detailed results



bitter-p

number of items in task: 373

                    all senses      main senses only
polysemy:           14              10

                    fine-grained    coarse-grained
entropy:            2.666           2.472


All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.924 (0.917)       0.927 (0.920)       0.927 (0.920)
best system         0.668 (0.668)       0.680 (0.680)       0.681 (0.681)
average of systems  0.521 (0.519)       0.526 (0.525)       0.528 (0.526)
worst system        0.223 (0.223)       0.229 (0.229)       0.233 (0.233)
best baseline       0.551 (0.550)       0.556 (0.555)       0.556 (0.555)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.486 (0.475)       0.489 (0.477)       0.489 (0.477)
average of systems  0.336 (0.333)       0.343 (0.339)       0.346 (0.342)
worst system        0.223 (0.223)       0.229 (0.229)       0.233 (0.233)
best baseline       0.403 (0.402)       0.414 (0.412)       0.414 (0.412)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.668 (0.668)       0.680 (0.680)       0.681 (0.681)
average of systems  0.631 (0.631)       0.636 (0.636)       0.637 (0.636)
worst system        0.523 (0.523)       0.523 (0.523)       0.523 (0.523)
best baseline       0.551 (0.550)       0.556 (0.555)       0.556 (0.555)


Detailed results



hurdle-p

number of items in task: 323

                    all senses      main senses only
polysemy:           11              8

                    fine-grained    coarse-grained
entropy:            2.437           2.019


All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.985 (0.985)       0.987 (0.987)       0.987 (0.987)
best system         0.684 (0.684)       0.693 (0.693)       0.693 (0.693)
average of systems  0.266 (0.266)       0.327 (0.326)       0.327 (0.326)
worst system        0.097 (0.096)       0.105 (0.105)       0.105 (0.105)
best baseline       0.334 (0.334)       0.467 (0.467)       0.467 (0.467)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.251 (0.251)       0.374 (0.368)       0.374 (0.368)
average of systems  0.180 (0.180)       0.279 (0.277)       0.279 (0.277)
worst system        0.097 (0.096)       0.193 (0.193)       0.193 (0.193)
best baseline       0.334 (0.334)       0.467 (0.467)       0.467 (0.467)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.684 (0.684)       0.693 (0.693)       0.693 (0.693)
average of systems  0.395 (0.395)       0.399 (0.399)       0.399 (0.399)
worst system        0.105 (0.105)       0.105 (0.105)       0.105 (0.105)
best baseline       0.284 (0.283)       0.328 (0.327)       0.328 (0.327)


Detailed results



sanction-p

number of items in task: 431

                    all senses      main senses only
polysemy:           7               6

                    fine-grained    coarse-grained
entropy:            1.810           1.722

All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.981 (0.981)       0.984 (0.984)       0.984 (0.984)
best system         0.865 (0.865)       0.865 (0.865)       0.865 (0.865)
average of systems  0.718 (0.718)       0.724 (0.723)       0.724 (0.723)
worst system        0.450 (0.450)       0.450 (0.450)       0.450 (0.450)
best baseline       0.781 (0.780)       0.781 (0.780)       0.781 (0.780)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.755 (0.749)       0.755 (0.749)       0.755 (0.749)
average of systems  0.599 (0.597)       0.607 (0.605)       0.607 (0.605)
worst system        0.450 (0.450)       0.450 (0.450)       0.450 (0.450)
best baseline       0.585 (0.585)       0.601 (0.601)       0.601 (0.601)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.865 (0.865)       0.865 (0.865)       0.865 (0.865)
average of systems  0.790 (0.790)       0.794 (0.794)       0.794 (0.794)
worst system        0.677 (0.677)       0.687 (0.687)       0.687 (0.687)
best baseline       0.781 (0.780)       0.781 (0.780)       0.781 (0.780)


Detailed results



shake-p

number of items in task: 356

                    all senses      main senses only
polysemy:           36              30

                    fine-grained    coarse-grained
entropy:            3.696           3.531


All systems         fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
human               0.974 (0.974)       0.977 (0.977)       0.978 (0.978)
best system         0.753 (0.753)       0.767 (0.767)       0.775 (0.775)
average of systems  0.628 (0.625)       0.648 (0.645)       0.657 (0.653)
worst system        0.299 (0.299)       0.357 (0.357)       0.368 (0.368)
best baseline       0.632 (0.632)       0.640 (0.640)       0.649 (0.649)

A systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.660 (0.638)       0.689 (0.666)       0.698 (0.674)
average of systems  0.531 (0.524)       0.564 (0.556)       0.574 (0.566)
worst system        0.299 (0.299)       0.357 (0.357)       0.368 (0.368)
best baseline       0.583 (0.581)       0.609 (0.607)       0.614 (0.612)

S systems           fine-grained        mixed-grained       coarse-grained
                    precision (recall)  precision (recall)  precision (recall)
best system         0.753 (0.753)       0.767 (0.767)       0.775 (0.775)
average of systems  0.686 (0.686)       0.699 (0.698)       0.706 (0.706)
worst system        0.613 (0.610)       0.631 (0.627)       0.641 (0.638)
best baseline       0.632 (0.632)       0.640 (0.640)       0.649 (0.649)


Detailed results