| number of items in task: | 8448 |
| | all senses | main senses only |
| average polysemy: | 10.372 | 7.207 |
| | fine-grained | coarse-grained |
| average entropy: | 1.916 | 1.512 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.965 (0.963) | 0.968 (0.967) | 0.970 (0.968) |
| best system | 0.781 (0.781) | 0.804 (0.804) | 0.818 (0.818) |
| average of systems | 0.639 (0.518) | 0.696 (0.555) | 0.717 (0.571) |
| worst system | 0.418 (0.127) | 0.511 (0.511) | 0.538 (0.538) |
| best baseline | 0.691 (0.689) | 0.720 (0.719) | 0.741 (0.739) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.616 (0.605) | 0.660 (0.648) | 0.683 (0.671) |
| average of systems | 0.499 (0.422) | 0.590 (0.480) | 0.621 (0.502) |
| worst system | 0.418 (0.127) | 0.511 (0.511) | 0.538 (0.538) |
| best baseline | 0.550 (0.548) | 0.584 (0.582) | 0.600 (0.597) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.781 (0.781) | 0.804 (0.804) | 0.818 (0.818) |
| average of systems | 0.721 (0.604) | 0.757 (0.629) | 0.773 (0.641) |
| worst system | 0.653 (0.209) | 0.733 (0.234) | 0.751 (0.657) |
| best baseline | 0.691 (0.689) | 0.720 (0.719) | 0.741 (0.739) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.711 (0.394) | 0.746 (0.413) | 0.767 (0.424) |
| average of systems | 0.711 (0.394) | 0.746 (0.413) | 0.767 (0.424) |
| worst system | 0.711 (0.394) | 0.746 (0.413) | 0.767 (0.424) |
| number of items in task: | 7446 |
| | all senses | main senses only |
| average polysemy: | 10.788 | 7.432 |
| | fine-grained | coarse-grained |
| average entropy: | 1.962 | 1.551 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.963 (0.962) | 0.967 (0.966) | 0.969 (0.967) |
| best system | 0.771 (0.771) | 0.797 (0.796) | 0.812 (0.812) |
| average of systems | 0.644 (0.546) | 0.694 (0.581) | 0.719 (0.599) |
| worst system | 0.411 (0.113) | 0.502 (0.502) | 0.533 (0.533) |
| best baseline | 0.709 (0.708) | 0.735 (0.734) | 0.759 (0.758) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.631 (0.621) | 0.665 (0.654) | 0.691 (0.680) |
| average of systems | 0.497 (0.418) | 0.581 (0.471) | 0.617 (0.496) |
| worst system | 0.411 (0.113) | 0.502 (0.502) | 0.533 (0.533) |
| best baseline | 0.549 (0.547) | 0.582 (0.579) | 0.599 (0.596) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.771 (0.771) | 0.797 (0.796) | 0.812 (0.812) |
| average of systems | 0.731 (0.648) | 0.761 (0.673) | 0.778 (0.687) |
| worst system | 0.701 (0.701) | 0.734 (0.729) | 0.751 (0.745) |
| best baseline | 0.709 (0.708) | 0.735 (0.734) | 0.759 (0.758) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.711 (0.447) | 0.746 (0.469) | 0.767 (0.482) |
| average of systems | 0.711 (0.447) | 0.746 (0.469) | 0.767 (0.482) |
| worst system | 0.711 (0.447) | 0.746 (0.469) | 0.767 (0.482) |
| number of items in task: | 1002 |
| | all senses | main senses only |
| average polysemy: | 7.276 | 5.533 |
| | fine-grained | coarse-grained |
| average entropy: | 1.568 | 1.220 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.973 (0.973) | 0.974 (0.974) | 0.975 (0.975) |
| best system | 0.853 (0.853) | 0.860 (0.860) | 0.861 (0.861) |
| average of systems | 0.555 (0.491) | 0.674 (0.573) | 0.675 (0.574) |
| worst system | 0.440 (0.440) | 0.574 (0.574) | 0.576 (0.576) |
| best baseline | 0.626 (0.626) | 0.709 (0.709) | 0.711 (0.711) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.636 (0.636) | 0.700 (0.358) | 0.700 (0.358) |
| average of systems | 0.505 (0.447) | 0.641 (0.550) | 0.643 (0.551) |
| worst system | 0.440 (0.440) | 0.574 (0.574) | 0.576 (0.576) |
| best baseline | 0.626 (0.626) | 0.709 (0.709) | 0.711 (0.711) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.853 (0.853) | 0.860 (0.860) | 0.861 (0.861) |
| average of systems | 0.622 (0.550) | 0.718 (0.604) | 0.719 (0.605) |
| worst system | 0.444 (0.228) | 0.594 (0.594) | 0.596 (0.596) |
| best baseline | 0.556 (0.553) | 0.604 (0.602) | 0.606 (0.603) |
| number of items in task: | 800 |
| | all senses | main senses only |
| average polysemy: | 16.862 | 12.594 |
| | fine-grained | coarse-grained |
| average entropy: | 2.499 | 2.013 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.984 (0.982) | 0.986 (0.984) | 0.986 (0.984) |
| best system | 0.907 (0.906) | 0.916 (0.914) | 0.927 (0.926) |
| average of systems | 0.677 (0.573) | 0.710 (0.599) | 0.744 (0.629) |
| worst system | 0.378 (0.378) | 0.420 (0.420) | 0.505 (0.235) |
| best baseline | 0.815 (0.658) | 0.832 (0.672) | 0.861 (0.696) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.703 (0.702) | 0.757 (0.756) | 0.798 (0.798) |
| average of systems | 0.545 (0.481) | 0.589 (0.519) | 0.652 (0.578) |
| worst system | 0.378 (0.378) | 0.420 (0.420) | 0.505 (0.235) |
| best baseline | 0.801 (0.800) | 0.823 (0.822) | 0.843 (0.842) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.907 (0.906) | 0.916 (0.914) | 0.927 (0.926) |
| average of systems | 0.747 (0.665) | 0.772 (0.684) | 0.788 (0.698) |
| worst system | 0.485 (0.228) | 0.527 (0.247) | 0.541 (0.254) |
| best baseline | 0.815 (0.658) | 0.832 (0.672) | 0.861 (0.696) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.778 (0.385) | 0.826 (0.409) | 0.846 (0.419) |
| average of systems | 0.778 (0.385) | 0.826 (0.409) | 0.846 (0.419) |
| worst system | 0.778 (0.385) | 0.826 (0.409) | 0.846 (0.419) |
| number of items in task: | 7648 |
| | all senses | main senses only |
| average polysemy: | 9.693 | 6.644 |
| | fine-grained | coarse-grained |
| average entropy: | 1.855 | 1.460 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.963 (0.961) | 0.966 (0.965) | 0.968 (0.966) |
| best system | 0.768 (0.768) | 0.792 (0.792) | 0.806 (0.806) |
| average of systems | 0.636 (0.513) | 0.696 (0.551) | 0.717 (0.565) |
| worst system | 0.414 (0.119) | 0.520 (0.520) | 0.540 (0.540) |
| best baseline | 0.678 (0.677) | 0.709 (0.708) | 0.730 (0.729) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.611 (0.601) | 0.656 (0.644) | 0.679 (0.195) |
| average of systems | 0.493 (0.415) | 0.592 (0.476) | 0.620 (0.495) |
| worst system | 0.414 (0.119) | 0.520 (0.520) | 0.540 (0.540) |
| best baseline | 0.524 (0.522) | 0.564 (0.564) | 0.605 (0.605) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.768 (0.768) | 0.792 (0.792) | 0.806 (0.806) |
| average of systems | 0.720 (0.597) | 0.758 (0.623) | 0.774 (0.635) |
| worst system | 0.680 (0.207) | 0.727 (0.639) | 0.739 (0.650) |
| best baseline | 0.678 (0.677) | 0.709 (0.708) | 0.730 (0.729) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.705 (0.395) | 0.739 (0.413) | 0.759 (0.425) |
| average of systems | 0.705 (0.395) | 0.739 (0.413) | 0.759 (0.425) |
| worst system | 0.705 (0.395) | 0.739 (0.413) | 0.759 (0.425) |
| number of items in task: | 35 |
| | all senses | main senses only |
| average polysemy: | 11.771 | 8.086 |
| | fine-grained | coarse-grained |
| average entropy: | 2.415 | 1.946 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.938 (0.857) | 0.938 (0.857) | 0.938 (0.857) |
| best system | 0.500 (0.086) | 0.667 (0.114) | 0.667 (0.114) |
| average of systems | 0.335 (0.239) | 0.398 (0.275) | 0.407 (0.284) |
| worst system | 0.229 (0.229) | 0.243 (0.243) | 0.257 (0.257) |
| best baseline | 0.343 (0.343) | 0.371 (0.371) | 0.371 (0.371) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.333 (0.057) | 0.500 (0.086) | 0.500 (0.086) |
| average of systems | 0.280 (0.208) | 0.365 (0.257) | 0.373 (0.265) |
| worst system | 0.229 (0.229) | 0.243 (0.243) | 0.257 (0.257) |
| best baseline | 0.343 (0.343) | 0.371 (0.371) | 0.371 (0.371) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.500 (0.086) | 0.667 (0.114) | 0.667 (0.114) |
| average of systems | 0.374 (0.272) | 0.434 (0.305) | 0.444 (0.314) |
| worst system | 0.286 (0.229) | 0.329 (0.329) | 0.343 (0.343) |
| best baseline | 0.200 (0.200) | 0.250 (0.186) | 0.269 (0.200) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.316 (0.171) | 0.316 (0.171) | 0.316 (0.171) |
| average of systems | 0.316 (0.171) | 0.316 (0.171) | 0.316 (0.171) |
| worst system | 0.316 (0.171) | 0.316 (0.171) | 0.316 (0.171) |
| number of items in task: | 286 |
| | all senses | main senses only |
| average polysemy: | 12.014 | 8.899 |
| | fine-grained | coarse-grained |
| average entropy: | 2.035 | 1.671 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.991 (0.981) | 0.993 (0.983) | 0.993 (0.983) |
| best system | 0.937 (0.937) | 0.937 (0.937) | 0.937 (0.937) |
| average of systems | 0.591 (0.351) | 0.660 (0.395) | 0.665 (0.397) |
| worst system | 0.223 (0.223) | 0.282 (0.282) | 0.286 (0.286) |
| best baseline | 0.756 (0.325) | 0.756 (0.325) | 0.756 (0.325) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.394 (0.113) | 0.691 (0.664) | 0.695 (0.668) |
| average of systems | 0.347 (0.273) | 0.488 (0.378) | 0.494 (0.383) |
| worst system | 0.223 (0.223) | 0.282 (0.282) | 0.286 (0.286) |
| best baseline | 0.556 (0.556) | 0.633 (0.633) | 0.643 (0.643) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.937 (0.937) | 0.937 (0.937) | 0.937 (0.937) |
| average of systems | 0.751 (0.447) | 0.785 (0.458) | 0.785 (0.458) |
| worst system | 0.552 (0.552) | 0.552 (0.552) | 0.552 (0.552) |
| best baseline | 0.756 (0.325) | 0.756 (0.325) | 0.756 (0.325) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.605 (0.091) | 0.605 (0.091) | 0.628 (0.094) |
| average of systems | 0.605 (0.091) | 0.605 (0.091) | 0.628 (0.094) |
| worst system | 0.605 (0.091) | 0.605 (0.091) | 0.628 (0.094) |
| number of items in task: | 2756 |
| | all senses | main senses only |
| average polysemy: | 9.167 | 5.381 |
| | fine-grained | coarse-grained |
| average entropy: | 1.740 | 1.167 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.973 (0.973) | 0.975 (0.975) | 0.977 (0.977) |
| best system | 0.865 (0.865) | 0.891 (0.891) | 0.919 (0.919) |
| average of systems | 0.699 (0.635) | 0.766 (0.697) | 0.801 (0.730) |
| worst system | 0.418 (0.388) | 0.562 (0.562) | 0.629 (0.629) |
| best baseline | 0.738 (0.569) | 0.815 (0.629) | 0.879 (0.679) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.711 (0.696) | 0.753 (0.753) | 0.803 (0.803) |
| average of systems | 0.562 (0.550) | 0.668 (0.653) | 0.718 (0.702) |
| worst system | 0.418 (0.388) | 0.562 (0.562) | 0.629 (0.629) |
| best baseline | 0.628 (0.625) | 0.675 (0.672) | 0.731 (0.731) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.865 (0.865) | 0.891 (0.891) | 0.919 (0.919) |
| average of systems | 0.777 (0.694) | 0.822 (0.735) | 0.848 (0.758) |
| worst system | 0.653 (0.639) | 0.733 (0.718) | 0.755 (0.739) |
| best baseline | 0.738 (0.569) | 0.815 (0.629) | 0.879 (0.679) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.779 (0.617) | 0.820 (0.649) | 0.850 (0.673) |
| average of systems | 0.779 (0.617) | 0.820 (0.649) | 0.850 (0.673) |
| worst system | 0.779 (0.617) | 0.820 (0.649) | 0.850 (0.673) |
| number of items in task: | 3792 |
| | all senses | main senses only |
| average polysemy: | 10.949 | 7.445 |
| | fine-grained | coarse-grained |
| average entropy: | 1.832 | 1.363 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.976 (0.976) | 0.978 (0.978) | 0.980 (0.980) |
| best system | 0.845 (0.845) | 0.865 (0.865) | 0.885 (0.885) |
| average of systems | 0.686 (0.575) | 0.746 (0.624) | 0.774 (0.647) |
| worst system | 0.418 (0.282) | 0.518 (0.518) | 0.567 (0.567) |
| best baseline | 0.746 (0.558) | 0.804 (0.602) | 0.852 (0.638) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.678 (0.666) | 0.732 (0.718) | 0.756 (0.742) |
| average of systems | 0.544 (0.507) | 0.642 (0.589) | 0.682 (0.625) |
| worst system | 0.418 (0.282) | 0.518 (0.518) | 0.567 (0.567) |
| best baseline | 0.564 (0.561) | 0.604 (0.601) | 0.642 (0.642) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.845 (0.845) | 0.865 (0.865) | 0.885 (0.885) |
| average of systems | 0.765 (0.642) | 0.803 (0.672) | 0.824 (0.689) |
| worst system | 0.653 (0.465) | 0.733 (0.522) | 0.755 (0.537) |
| best baseline | 0.746 (0.558) | 0.804 (0.602) | 0.852 (0.638) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.779 (0.448) | 0.820 (0.472) | 0.850 (0.489) |
| average of systems | 0.779 (0.448) | 0.820 (0.472) | 0.850 (0.489) |
| worst system | 0.779 (0.448) | 0.820 (0.472) | 0.850 (0.489) |
| number of items in task: | 2501 |
| | all senses | main senses only |
| average polysemy: | 7.791 | 4.994 |
| | fine-grained | coarse-grained |
| average entropy: | 1.859 | 1.496 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.950 (0.947) | 0.955 (0.952) | 0.957 (0.954) |
| best system | 0.709 (0.709) | 0.742 (0.741) | 0.755 (0.755) |
| average of systems | 0.611 (0.610) | 0.653 (0.652) | 0.668 (0.666) |
| worst system | 0.422 (0.421) | 0.474 (0.473) | 0.485 (0.485) |
| best baseline | 0.701 (0.700) | 0.727 (0.725) | 0.746 (0.744) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.535 (0.527) | 0.578 (0.569) | 0.596 (0.587) |
| average of systems | 0.471 (0.468) | 0.535 (0.532) | 0.547 (0.544) |
| worst system | 0.422 (0.421) | 0.474 (0.473) | 0.485 (0.485) |
| best baseline | 0.547 (0.545) | 0.582 (0.579) | 0.592 (0.589) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.709 (0.709) | 0.742 (0.741) | 0.755 (0.755) |
| average of systems | 0.687 (0.686) | 0.719 (0.718) | 0.735 (0.734) |
| worst system | 0.642 (0.642) | 0.683 (0.682) | 0.695 (0.695) |
| best baseline | 0.701 (0.700) | 0.727 (0.725) | 0.746 (0.744) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.651 (0.650) | 0.681 (0.680) | 0.694 (0.692) |
| average of systems | 0.651 (0.650) | 0.681 (0.680) | 0.694 (0.692) |
| worst system | 0.651 (0.650) | 0.681 (0.680) | 0.694 (0.692) |
| number of items in task: | 2907 |
| | all senses | main senses only |
| average polysemy: | 10.821 | 7.733 |
| | fine-grained | coarse-grained |
| average entropy: | 2.056 | 1.723 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.954 (0.951) | 0.958 (0.956) | 0.960 (0.957) |
| best system | 0.720 (0.720) | 0.748 (0.748) | 0.761 (0.761) |
| average of systems | 0.617 (0.605) | 0.656 (0.644) | 0.670 (0.657) |
| worst system | 0.428 (0.428) | 0.475 (0.474) | 0.486 (0.485) |
| best baseline | 0.676 (0.675) | 0.699 (0.697) | 0.717 (0.715) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.555 (0.546) | 0.596 (0.585) | 0.613 (0.602) |
| average of systems | 0.479 (0.476) | 0.539 (0.535) | 0.550 (0.546) |
| worst system | 0.428 (0.428) | 0.475 (0.474) | 0.486 (0.485) |
| best baseline | 0.541 (0.539) | 0.574 (0.572) | 0.583 (0.581) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.720 (0.720) | 0.748 (0.748) | 0.761 (0.761) |
| average of systems | 0.693 (0.692) | 0.722 (0.721) | 0.737 (0.736) |
| worst system | 0.646 (0.645) | 0.682 (0.681) | 0.692 (0.692) |
| best baseline | 0.676 (0.675) | 0.699 (0.697) | 0.717 (0.715) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.651 (0.559) | 0.681 (0.585) | 0.694 (0.595) |
| average of systems | 0.651 (0.559) | 0.681 (0.585) | 0.694 (0.595) |
| worst system | 0.651 (0.559) | 0.681 (0.585) | 0.694 (0.595) |
| number of items in task: | 1406 |
| | all senses | main senses only |
| average polysemy: | 6.760 | 4.576 |
| | fine-grained | coarse-grained |
| average entropy: | 1.658 | 1.236 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.966 (0.965) | 0.972 (0.972) | 0.973 (0.973) |
| best system | 0.777 (0.777) | 0.793 (0.793) | 0.795 (0.795) |
| average of systems | 0.644 (0.615) | 0.682 (0.651) | 0.694 (0.663) |
| worst system | 0.377 (0.377) | 0.476 (0.476) | 0.498 (0.498) |
| best baseline | 0.718 (0.717) | 0.737 (0.735) | 0.740 (0.738) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.617 (0.605) | 0.652 (0.640) | 0.691 (0.679) |
| average of systems | 0.504 (0.489) | 0.560 (0.544) | 0.586 (0.569) |
| worst system | 0.377 (0.377) | 0.476 (0.476) | 0.498 (0.498) |
| best baseline | 0.681 (0.681) | 0.694 (0.694) | 0.709 (0.709) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.777 (0.777) | 0.793 (0.793) | 0.795 (0.795) |
| average of systems | 0.728 (0.691) | 0.755 (0.716) | 0.759 (0.720) |
| worst system | 0.674 (0.616) | 0.722 (0.658) | 0.725 (0.661) |
| best baseline | 0.718 (0.717) | 0.737 (0.735) | 0.740 (0.738) |
| number of items in task: | 1750 |
| | all senses | main senses only |
| average polysemy: | 8.430 | 5.868 |
| | fine-grained | coarse-grained |
| average entropy: | 1.867 | 1.490 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.957 (0.956) | 0.962 (0.962) | 0.963 (0.963) |
| best system | 0.751 (0.751) | 0.764 (0.764) | 0.766 (0.766) |
| average of systems | 0.619 (0.596) | 0.650 (0.626) | 0.660 (0.636) |
| worst system | 0.354 (0.354) | 0.436 (0.436) | 0.454 (0.454) |
| best baseline | 0.688 (0.686) | 0.703 (0.701) | 0.705 (0.704) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.583 (0.571) | 0.611 (0.598) | 0.643 (0.630) |
| average of systems | 0.470 (0.457) | 0.515 (0.502) | 0.537 (0.523) |
| worst system | 0.354 (0.354) | 0.436 (0.436) | 0.454 (0.454) |
| best baseline | 0.615 (0.615) | 0.627 (0.627) | 0.629 (0.629) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.751 (0.751) | 0.764 (0.764) | 0.766 (0.766) |
| average of systems | 0.709 (0.680) | 0.731 (0.700) | 0.734 (0.703) |
| worst system | 0.669 (0.621) | 0.686 (0.637) | 0.688 (0.639) |
| best baseline | 0.688 (0.686) | 0.703 (0.701) | 0.705 (0.704) |
| number of items in task: | 1785 |
| | all senses | main senses only |
| average polysemy: | 18.692 | 15.199 |
| | fine-grained | coarse-grained |
| average entropy: | 2.468 | 2.284 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.970 (0.969) | 0.972 (0.970) | 0.973 (0.971) |
| best system | 0.770 (0.630) | 0.779 (0.638) | 0.782 (0.640) |
| average of systems | 0.630 (0.578) | 0.644 (0.592) | 0.646 (0.594) |
| worst system | 0.382 (0.382) | 0.397 (0.397) | 0.399 (0.399) |
| best baseline | 0.656 (0.531) | 0.658 (0.533) | 0.661 (0.535) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.584 (0.573) | 0.640 (0.628) | 0.642 (0.630) |
| average of systems | 0.489 (0.486) | 0.517 (0.513) | 0.520 (0.516) |
| worst system | 0.382 (0.382) | 0.397 (0.397) | 0.399 (0.399) |
| best baseline | 0.425 (0.424) | 0.444 (0.443) | 0.445 (0.444) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.770 (0.630) | 0.779 (0.638) | 0.782 (0.640) |
| average of systems | 0.715 (0.634) | 0.720 (0.639) | 0.722 (0.641) |
| worst system | 0.646 (0.646) | 0.648 (0.648) | 0.650 (0.650) |
| best baseline | 0.656 (0.531) | 0.658 (0.533) | 0.661 (0.535) |
| number of items in task: | 6663 |
| | all senses | main senses only |
| average polysemy: | 8.143 | 5.066 |
| | fine-grained | coarse-grained |
| average entropy: | 1.768 | 1.305 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.963 (0.962) | 0.967 (0.966) | 0.969 (0.967) |
| best system | 0.787 (0.787) | 0.814 (0.814) | 0.831 (0.831) |
| average of systems | 0.645 (0.544) | 0.706 (0.589) | 0.730 (0.608) |
| worst system | 0.418 (0.161) | 0.541 (0.541) | 0.575 (0.575) |
| best baseline | 0.720 (0.719) | 0.753 (0.752) | 0.779 (0.778) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.625 (0.613) | 0.665 (0.653) | 0.694 (0.682) |
| average of systems | 0.506 (0.437) | 0.604 (0.506) | 0.640 (0.533) |
| worst system | 0.418 (0.161) | 0.541 (0.541) | 0.575 (0.575) |
| best baseline | 0.584 (0.581) | 0.622 (0.619) | 0.641 (0.638) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.787 (0.787) | 0.814 (0.814) | 0.831 (0.831) |
| average of systems | 0.726 (0.623) | 0.767 (0.655) | 0.785 (0.670) |
| worst system | 0.653 (0.264) | 0.733 (0.297) | 0.755 (0.306) |
| best baseline | 0.720 (0.719) | 0.753 (0.752) | 0.779 (0.778) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.711 (0.499) | 0.746 (0.524) | 0.767 (0.538) |
| average of systems | 0.711 (0.499) | 0.746 (0.524) | 0.767 (0.538) |
| worst system | 0.711 (0.499) | 0.746 (0.524) | 0.767 (0.538) |
| number of items in task: | 2199 |
| | all senses | main senses only |
| average polysemy: | 10.067 | 5.675 |
| | fine-grained | coarse-grained |
| average entropy: | 1.892 | 1.271 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.976 (0.975) | 0.978 (0.978) | 0.980 (0.980) |
| best system | 0.849 (0.849) | 0.879 (0.879) | 0.914 (0.914) |
| average of systems | 0.696 (0.690) | 0.757 (0.750) | 0.798 (0.791) |
| worst system | 0.392 (0.392) | 0.499 (0.499) | 0.582 (0.582) |
| best baseline | 0.751 (0.751) | 0.815 (0.788) | 0.879 (0.850) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.711 (0.698) | 0.757 (0.744) | 0.798 (0.784) |
| average of systems | 0.543 (0.533) | 0.642 (0.629) | 0.704 (0.690) |
| worst system | 0.392 (0.392) | 0.499 (0.499) | 0.582 (0.582) |
| best baseline | 0.615 (0.612) | 0.659 (0.656) | 0.706 (0.706) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.849 (0.849) | 0.879 (0.879) | 0.914 (0.914) |
| average of systems | 0.784 (0.782) | 0.822 (0.820) | 0.851 (0.849) |
| worst system | 0.702 (0.698) | 0.740 (0.736) | 0.767 (0.763) |
| best baseline | 0.751 (0.751) | 0.815 (0.788) | 0.879 (0.850) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.779 (0.773) | 0.820 (0.813) | 0.850 (0.843) |
| average of systems | 0.779 (0.773) | 0.820 (0.813) | 0.850 (0.843) |
| worst system | 0.779 (0.773) | 0.820 (0.813) | 0.850 (0.843) |
| number of items in task: | 2914 |
| | all senses | main senses only |
| average polysemy: | 11.963 | 8.000 |
| | fine-grained | coarse-grained |
| average entropy: | 1.898 | 1.406 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.978 (0.977) | 0.980 (0.980) | 0.982 (0.981) |
| best system | 0.847 (0.847) | 0.870 (0.870) | 0.896 (0.896) |
| average of systems | 0.703 (0.656) | 0.755 (0.702) | 0.789 (0.732) |
| worst system | 0.411 (0.289) | 0.498 (0.498) | 0.560 (0.560) |
| best baseline | 0.754 (0.754) | 0.804 (0.783) | 0.852 (0.830) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.736 (0.725) | 0.772 (0.759) | 0.803 (0.790) |
| average of systems | 0.561 (0.527) | 0.647 (0.600) | 0.697 (0.646) |
| worst system | 0.411 (0.289) | 0.498 (0.498) | 0.560 (0.560) |
| best baseline | 0.571 (0.568) | 0.607 (0.604) | 0.639 (0.636) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.847 (0.847) | 0.870 (0.870) | 0.896 (0.896) |
| average of systems | 0.784 (0.753) | 0.816 (0.783) | 0.839 (0.805) |
| worst system | 0.702 (0.527) | 0.740 (0.556) | 0.767 (0.576) |
| best baseline | 0.754 (0.754) | 0.804 (0.783) | 0.852 (0.830) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.779 (0.584) | 0.820 (0.614) | 0.850 (0.636) |
| average of systems | 0.779 (0.584) | 0.820 (0.614) | 0.850 (0.636) |
| worst system | 0.779 (0.584) | 0.820 (0.614) | 0.850 (0.636) |
| number of items in task: | 557 |
| | all senses | main senses only |
| average polysemy: | 5.616 | 4.219 |
| | fine-grained | coarse-grained |
| average entropy: | 1.140 | 0.755 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.964 (0.964) | 0.965 (0.965) | 0.966 (0.966) |
| best system | 0.932 (0.932) | 0.938 (0.938) | 0.941 (0.941) |
| average of systems | 0.669 (0.656) | 0.791 (0.772) | 0.794 (0.774) |
| worst system | 0.444 (0.409) | 0.700 (0.645) | 0.700 (0.645) |
| best baseline | 0.756 (0.756) | 0.828 (0.828) | 0.831 (0.831) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.795 (0.795) | 0.838 (0.838) | 0.842 (0.842) |
| average of systems | 0.635 (0.619) | 0.771 (0.750) | 0.773 (0.752) |
| worst system | 0.444 (0.409) | 0.700 (0.645) | 0.700 (0.645) |
| best baseline | 0.756 (0.756) | 0.828 (0.828) | 0.831 (0.831) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.932 (0.932) | 0.938 (0.938) | 0.941 (0.941) |
| average of systems | 0.716 (0.704) | 0.819 (0.801) | 0.821 (0.803) |
| worst system | 0.444 (0.409) | 0.700 (0.645) | 0.700 (0.645) |
| best baseline | 0.681 (0.680) | 0.743 (0.741) | 0.746 (0.744) |
| number of items in task: | 878 |
| | all senses | main senses only |
| average polysemy: | 7.584 | 5.601 |
| | fine-grained | coarse-grained |
| average entropy: | 1.614 | 1.217 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.972 (0.972) | 0.973 (0.973) | 0.973 (0.973) |
| best system | 0.842 (0.842) | 0.848 (0.848) | 0.850 (0.850) |
| average of systems | 0.540 (0.486) | 0.660 (0.574) | 0.662 (0.576) |
| worst system | 0.443 (0.443) | 0.558 (0.558) | 0.560 (0.560) |
| best baseline | 0.603 (0.603) | 0.697 (0.697) | 0.699 (0.699) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.597 (0.597) | 0.700 (0.409) | 0.700 (0.409) |
| average of systems | 0.492 (0.442) | 0.629 (0.551) | 0.630 (0.553) |
| worst system | 0.443 (0.443) | 0.585 (0.585) | 0.587 (0.587) |
| best baseline | 0.603 (0.603) | 0.697 (0.697) | 0.699 (0.699) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.842 (0.842) | 0.848 (0.848) | 0.850 (0.850) |
| average of systems | 0.605 (0.543) | 0.702 (0.605) | 0.703 (0.606) |
| worst system | 0.444 (0.260) | 0.558 (0.558) | 0.560 (0.560) |
| best baseline | 0.535 (0.534) | 0.591 (0.589) | 0.592 (0.591) |
| number of items in task: | 1284 |
| | all senses | main senses only |
| average polysemy: | 6.927 | 4.536 |
| | fine-grained | coarse-grained |
| average entropy: | 1.700 | 1.237 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.964 (0.963) | 0.971 (0.970) | 0.972 (0.972) |
| best system | 0.761 (0.761) | 0.779 (0.779) | 0.783 (0.782) |
| average of systems | 0.635 (0.629) | 0.672 (0.666) | 0.685 (0.679) |
| worst system | 0.373 (0.373) | 0.474 (0.474) | 0.499 (0.499) |
| best baseline | 0.720 (0.719) | 0.740 (0.739) | 0.743 (0.743) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.614 (0.607) | 0.636 (0.630) | 0.680 (0.672) |
| average of systems | 0.488 (0.474) | 0.541 (0.527) | 0.570 (0.554) |
| worst system | 0.373 (0.373) | 0.474 (0.474) | 0.499 (0.499) |
| best baseline | 0.660 (0.660) | 0.677 (0.677) | 0.700 (0.700) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.761 (0.761) | 0.779 (0.779) | 0.783 (0.782) |
| average of systems | 0.723 (0.722) | 0.750 (0.750) | 0.754 (0.754) |
| worst system | 0.674 (0.674) | 0.722 (0.720) | 0.725 (0.724) |
| best baseline | 0.720 (0.719) | 0.740 (0.739) | 0.743 (0.743) |
| number of items in task: | 1628 |
| | all senses | main senses only |
| average polysemy: | 8.687 | 5.933 |
| | fine-grained | coarse-grained |
| average entropy: | 1.915 | 1.510 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.955 (0.954) | 0.961 (0.960) | 0.962 (0.961) |
| best system | 0.736 (0.736) | 0.751 (0.751) | 0.753 (0.752) |
| average of systems | 0.610 (0.606) | 0.640 (0.636) | 0.651 (0.646) |
| worst system | 0.349 (0.349) | 0.431 (0.431) | 0.451 (0.451) |
| best baseline | 0.686 (0.685) | 0.703 (0.702) | 0.706 (0.705) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.578 (0.570) | 0.596 (0.587) | 0.630 (0.621) |
| average of systems | 0.454 (0.443) | 0.497 (0.485) | 0.520 (0.508) |
| worst system | 0.349 (0.349) | 0.431 (0.431) | 0.451 (0.451) |
| best baseline | 0.594 (0.594) | 0.606 (0.606) | 0.609 (0.609) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.736 (0.736) | 0.751 (0.751) | 0.753 (0.752) |
| average of systems | 0.704 (0.704) | 0.726 (0.726) | 0.729 (0.729) |
| worst system | 0.669 (0.668) | 0.686 (0.684) | 0.688 (0.687) |
| best baseline | 0.686 (0.685) | 0.703 (0.702) | 0.706 (0.705) |
| number of items in task: | 122 |
| all senses | main senses only | |
| average polysemy: | 5.000 | 5.000 |
| fine-grained | coarse-grained | |
| average entropy: | 1.224 | 1.224 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.985 (0.985) | 0.985 (0.985) | 0.985 (0.985) |
| best system | 0.943 (0.943) | 0.943 (0.943) | 0.943 (0.943) |
| average of systems | 0.759 (0.747) | 0.811 (0.794) | 0.811 (0.794) |
| worst system | 0.422 (0.422) | 0.496 (0.496) | 0.496 (0.496) |
| best baseline | 0.902 (0.902) | 0.902 (0.902) | 0.902 (0.902) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.926 (0.926) | 0.926 (0.926) | 0.926 (0.926) |
| average of systems | 0.664 (0.643) | 0.750 (0.723) | 0.750 (0.723) |
| worst system | 0.422 (0.422) | 0.496 (0.496) | 0.496 (0.496) |
| best baseline | 0.902 (0.902) | 0.902 (0.902) | 0.902 (0.902) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.943 (0.943) | 0.943 (0.943) | 0.943 (0.943) |
| average of systems | 0.902 (0.902) | 0.902 (0.902) | 0.902 (0.902) |
| worst system | 0.861 (0.861) | 0.861 (0.861) | 0.861 (0.861) |
| best baseline | 0.704 (0.693) | 0.704 (0.693) | 0.704 (0.693) |
| number of items in task: | 1462 |
| all senses | main senses only | |
| average polysemy: | 20.392 | 16.789 |
| fine-grained | coarse-grained | |
| average entropy: | 2.475 | 2.343 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.967 (0.965) | 0.969 (0.967) | 0.969 (0.967) |
| best system | 0.776 (0.776) | 0.782 (0.782) | 0.784 (0.784) |
| average of systems | 0.673 (0.670) | 0.681 (0.678) | 0.683 (0.680) |
| worst system | 0.424 (0.424) | 0.442 (0.442) | 0.445 (0.445) |
| best baseline | 0.656 (0.649) | 0.658 (0.651) | 0.661 (0.654) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.692 (0.678) | 0.699 (0.685) | 0.701 (0.687) |
| average of systems | 0.558 (0.553) | 0.570 (0.565) | 0.573 (0.568) |
| worst system | 0.424 (0.424) | 0.442 (0.442) | 0.445 (0.445) |
| best baseline | 0.452 (0.451) | 0.466 (0.465) | 0.467 (0.466) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.776 (0.776) | 0.782 (0.782) | 0.784 (0.784) |
| average of systems | 0.742 (0.740) | 0.748 (0.746) | 0.750 (0.747) |
| worst system | 0.677 (0.666) | 0.681 (0.671) | 0.684 (0.673) |
| best baseline | 0.656 (0.649) | 0.658 (0.651) | 0.661 (0.654) |
| number of items in task: | 5984 |
| all senses | main senses only | |
| average polysemy: | 8.442 | 5.146 |
| fine-grained | coarse-grained | |
| average entropy: | 1.837 | 1.358 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.962 (0.961) | 0.967 (0.965) | 0.969 (0.967) |
| best system | 0.770 (0.770) | 0.800 (0.800) | 0.819 (0.819) |
| average of systems | 0.642 (0.560) | 0.697 (0.602) | 0.724 (0.624) |
| worst system | 0.411 (0.141) | 0.517 (0.517) | 0.554 (0.554) |
| best baseline | 0.724 (0.723) | 0.755 (0.754) | 0.784 (0.783) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.617 (0.607) | 0.656 (0.646) | 0.688 (0.678) |
| average of systems | 0.491 (0.419) | 0.583 (0.483) | 0.624 (0.513) |
| worst system | 0.411 (0.141) | 0.517 (0.517) | 0.554 (0.554) |
| best baseline | 0.573 (0.570) | 0.610 (0.607) | 0.631 (0.628) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.770 (0.770) | 0.800 (0.800) | 0.819 (0.819) |
| average of systems | 0.730 (0.655) | 0.764 (0.685) | 0.785 (0.703) |
| worst system | 0.696 (0.696) | 0.740 (0.271) | 0.758 (0.758) |
| best baseline | 0.724 (0.723) | 0.755 (0.754) | 0.784 (0.783) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.711 (0.556) | 0.746 (0.583) | 0.767 (0.599) |
| average of systems | 0.711 (0.556) | 0.746 (0.583) | 0.767 (0.599) |
| worst system | 0.711 (0.556) | 0.746 (0.583) | 0.767 (0.599) |
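
Each score cell in these tables pairs a precision with a recall in parentheses. The sketch below parses one row and derives two quantities from each pair. It assumes the usual scoring convention, which the tables themselves do not state: precision is the number of correct answers divided by the items attempted, and recall is the number correct divided by all items, so recall / precision gives the fraction of items a system attempted. The helper names `parse_row`, `coverage`, and `f1` are hypothetical and are not part of any evaluation software.

```python
import re

# Matches a cell such as "0.411 (0.141)": precision, then recall in parentheses.
CELL = re.compile(r"^\s*([01]\.\d+)\s*\(([01]\.\d+)\)\s*$")


def parse_row(line):
    """Split a pipe-delimited row into its label and (precision, recall) pairs."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    scores = []
    for cell in cells[1:]:
        match = CELL.match(cell)
        if match:  # skip empty or non-score cells
            scores.append((float(match.group(1)), float(match.group(2))))
    return cells[0], scores


def coverage(precision, recall):
    """Fraction of items attempted, assuming P = correct/attempted and R = correct/total."""
    return recall / precision if precision else 0.0


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# A row copied from the table above (worst system, all systems).
row = "| worst system | 0.411 (0.141) | 0.517 (0.517) | 0.554 (0.554) |"
label, scores = parse_row(row)
for grain, (p, r) in zip(("fine", "mixed", "coarse"), scores):
    print(f"{label} [{grain}]: coverage={coverage(p, r):.3f} F1={f1(p, r):.3f}")
```

Under that assumption, a cell whose precision and recall are equal belongs to a system that attempted every item, while a large gap between the two (as in the fine-grained cell of this row) points to a system that answered only part of the task.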
| number of items in task: | 3771 |
| all senses | main senses only | |
| average polysemy: | 5.069 | 3.682 |
| fine-grained | coarse-grained | |
| average entropy: | 1.281 | 1.011 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.971 (0.970) | 0.974 (0.973) | 0.975 (0.973) |
| best system | 0.837 (0.836) | 0.855 (0.855) | 0.863 (0.863) |
| average of systems | 0.704 (0.562) | 0.749 (0.593) | 0.761 (0.602) |
| worst system | 0.486 (0.116) | 0.624 (0.150) | 0.642 (0.642) |
| best baseline | 0.780 (0.678) | 0.798 (0.694) | 0.810 (0.704) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.677 (0.667) | 0.710 (0.700) | 0.735 (0.724) |
| average of systems | 0.573 (0.478) | 0.651 (0.530) | 0.669 (0.544) |
| worst system | 0.486 (0.116) | 0.624 (0.150) | 0.642 (0.642) |
| best baseline | 0.675 (0.675) | 0.706 (0.706) | 0.731 (0.730) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.837 (0.836) | 0.855 (0.855) | 0.863 (0.863) |
| average of systems | 0.782 (0.641) | 0.808 (0.660) | 0.816 (0.667) |
| worst system | 0.737 (0.177) | 0.771 (0.185) | 0.775 (0.186) |
| best baseline | 0.780 (0.678) | 0.798 (0.694) | 0.810 (0.704) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.762 (0.422) | 0.790 (0.437) | 0.803 (0.444) |
| average of systems | 0.762 (0.422) | 0.790 (0.437) | 0.803 (0.444) |
| worst system | 0.762 (0.422) | 0.790 (0.437) | 0.803 (0.444) |
| number of items in task: | 4677 |
| all senses | main senses only | |
| average polysemy: | 14.648 | 10.050 |
| fine-grained | coarse-grained | |
| average entropy: | 2.428 | 1.916 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.959 (0.957) | 0.963 (0.962) | 0.965 (0.964) |
| best system | 0.736 (0.736) | 0.763 (0.763) | 0.782 (0.782) |
| average of systems | 0.591 (0.483) | 0.654 (0.525) | 0.682 (0.546) |
| worst system | 0.328 (0.328) | 0.416 (0.416) | 0.454 (0.454) |
| best baseline | 0.624 (0.623) | 0.658 (0.657) | 0.689 (0.688) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.567 (0.554) | 0.619 (0.605) | 0.659 (0.234) |
| average of systems | 0.443 (0.376) | 0.542 (0.440) | 0.582 (0.470) |
| worst system | 0.328 (0.328) | 0.416 (0.416) | 0.454 (0.454) |
| best baseline | 0.465 (0.462) | 0.504 (0.501) | 0.527 (0.524) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.736 (0.736) | 0.763 (0.763) | 0.782 (0.782) |
| average of systems | 0.676 (0.574) | 0.719 (0.604) | 0.740 (0.621) |
| worst system | 0.610 (0.234) | 0.675 (0.592) | 0.697 (0.612) |
| best baseline | 0.624 (0.623) | 0.658 (0.657) | 0.689 (0.688) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.670 (0.371) | 0.710 (0.393) | 0.737 (0.408) |
| average of systems | 0.670 (0.371) | 0.710 (0.393) | 0.737 (0.408) |
| worst system | 0.670 (0.371) | 0.710 (0.393) | 0.737 (0.408) |
| number of items in task: | 3872 |
| all senses | main senses only | |
| average polysemy: | 7.309 | 5.472 |
| fine-grained | coarse-grained | |
| average entropy: | 1.116 | 0.846 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.979 (0.978) | 0.981 (0.980) | 0.981 (0.981) |
| best system | 0.921 (0.921) | 0.932 (0.932) | 0.936 (0.936) |
| average of systems | 0.772 (0.625) | 0.829 (0.661) | 0.841 (0.671) |
| worst system | 0.452 (0.180) | 0.671 (0.267) | 0.694 (0.276) |
| best baseline | 0.877 (0.713) | 0.892 (0.725) | 0.907 (0.738) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.792 (0.777) | 0.818 (0.802) | 0.840 (0.824) |
| average of systems | 0.633 (0.562) | 0.733 (0.628) | 0.752 (0.643) |
| worst system | 0.452 (0.180) | 0.671 (0.267) | 0.694 (0.276) |
| best baseline | 0.696 (0.696) | 0.728 (0.728) | 0.749 (0.749) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.921 (0.921) | 0.932 (0.932) | 0.936 (0.936) |
| average of systems | 0.844 (0.698) | 0.877 (0.719) | 0.885 (0.725) |
| worst system | 0.686 (0.273) | 0.787 (0.313) | 0.804 (0.320) |
| best baseline | 0.877 (0.713) | 0.892 (0.725) | 0.907 (0.738) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.895 (0.438) | 0.920 (0.450) | 0.933 (0.456) |
| average of systems | 0.895 (0.438) | 0.920 (0.450) | 0.933 (0.456) |
| worst system | 0.895 (0.438) | 0.920 (0.450) | 0.933 (0.456) |
| number of items in task: | 4576 |
| all senses | main senses only | |
| average polysemy: | 12.963 | 8.675 |
| fine-grained | coarse-grained | |
| average entropy: | 2.593 | 2.075 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.953 (0.950) | 0.957 (0.955) | 0.960 (0.957) |
| best system | 0.662 (0.662) | 0.696 (0.696) | 0.718 (0.718) |
| average of systems | 0.530 (0.428) | 0.583 (0.466) | 0.614 (0.487) |
| worst system | 0.288 (0.288) | 0.351 (0.351) | 0.390 (0.390) |
| best baseline | 0.579 (0.578) | 0.615 (0.614) | 0.643 (0.642) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.467 (0.459) | 0.526 (0.517) | 0.595 (0.133) |
| average of systems | 0.378 (0.303) | 0.458 (0.356) | 0.504 (0.384) |
| worst system | 0.288 (0.288) | 0.351 (0.351) | 0.390 (0.390) |
| best baseline | 0.454 (0.452) | 0.495 (0.492) | 0.519 (0.516) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.662 (0.662) | 0.696 (0.696) | 0.718 (0.718) |
| average of systems | 0.621 (0.523) | 0.659 (0.553) | 0.681 (0.571) |
| worst system | 0.595 (0.550) | 0.624 (0.577) | 0.648 (0.599) |
| best baseline | 0.579 (0.578) | 0.615 (0.614) | 0.643 (0.642) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.586 (0.356) | 0.628 (0.382) | 0.653 (0.397) |
| average of systems | 0.586 (0.356) | 0.628 (0.382) | 0.653 (0.397) |
| worst system | 0.586 (0.356) | 0.628 (0.382) | 0.653 (0.397) |
| number of items in task: | 267 |
| all senses | main senses only | |
| polysemy: | 8 | 2 |
| fine-grained | coarse-grained | |
| entropy: | 1.430 | 0.571 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.987 (0.987) | 0.988 (0.988) | 0.991 (0.991) |
| best system | 0.933 (0.933) | 0.954 (0.954) | 0.981 (0.981) |
| average of systems | 0.767 (0.766) | 0.831 (0.829) | 0.888 (0.886) |
| worst system | 0.273 (0.272) | 0.328 (0.328) | 0.375 (0.375) |
| best baseline | 0.789 (0.783) | 0.828 (0.828) | 0.963 (0.963) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.855 (0.839) | 0.883 (0.867) | 0.950 (0.933) |
| average of systems | 0.548 (0.544) | 0.673 (0.669) | 0.753 (0.748) |
| worst system | 0.273 (0.272) | 0.328 (0.328) | 0.375 (0.375) |
| best baseline | 0.753 (0.753) | 0.789 (0.789) | 0.933 (0.933) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.933 (0.933) | 0.954 (0.954) | 0.981 (0.981) |
| average of systems | 0.892 (0.892) | 0.923 (0.923) | 0.966 (0.966) |
| worst system | 0.843 (0.843) | 0.900 (0.900) | 0.948 (0.948) |
| best baseline | 0.789 (0.783) | 0.828 (0.828) | 0.963 (0.963) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.895 (0.891) | 0.908 (0.905) | 0.962 (0.959) |
| average of systems | 0.895 (0.891) | 0.908 (0.905) | 0.962 (0.959) |
| worst system | 0.895 (0.891) | 0.908 (0.905) | 0.962 (0.959) |
| number of items in task: | 279 |
| all senses | main senses only | |
| polysemy: | 3 | 2 |
| fine-grained | coarse-grained | |
| entropy: | 0.390 | 0.295 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.973 (0.973) | 0.973 (0.973) | 0.973 (0.973) |
| best system | 0.964 (0.964) | 0.964 (0.964) | 0.964 (0.964) |
| average of systems | 0.860 (0.857) | 0.899 (0.896) | 0.899 (0.896) |
| worst system | 0.380 (0.380) | 0.530 (0.530) | 0.530 (0.530) |
| best baseline | 0.946 (0.946) | 0.961 (0.961) | 0.961 (0.961) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.961 (0.961) | 0.961 (0.961) | 0.961 (0.961) |
| average of systems | 0.701 (0.699) | 0.792 (0.789) | 0.792 (0.789) |
| worst system | 0.380 (0.380) | 0.530 (0.530) | 0.530 (0.530) |
| best baseline | 0.946 (0.946) | 0.961 (0.961) | 0.961 (0.961) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.964 (0.964) | 0.964 (0.964) | 0.964 (0.964) |
| average of systems | 0.952 (0.948) | 0.959 (0.956) | 0.959 (0.956) |
| worst system | 0.927 (0.927) | 0.953 (0.953) | 0.953 (0.953) |
| best baseline | 0.946 (0.946) | 0.961 (0.961) | 0.961 (0.961) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.946 (0.946) | 0.961 (0.961) | 0.961 (0.961) |
| average of systems | 0.946 (0.946) | 0.961 (0.961) | 0.961 (0.961) |
| worst system | 0.946 (0.946) | 0.961 (0.961) | 0.961 (0.961) |
| number of items in task: | 274 |
| all senses | main senses only | |
| polysemy: | 15 | 9 |
| fine-grained | coarse-grained | |
| entropy: | 3.200 | 2.563 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.970 (0.970) | 0.980 (0.980) | 0.980 (0.980) |
| best system | 0.712 (0.712) | 0.783 (0.783) | 0.861 (0.861) |
| average of systems | 0.552 (0.549) | 0.620 (0.615) | 0.662 (0.657) |
| worst system | 0.338 (0.338) | 0.416 (0.416) | 0.443 (0.412) |
| best baseline | 0.547 (0.547) | 0.652 (0.621) | 0.782 (0.745) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.631 (0.631) | 0.703 (0.703) | 0.818 (0.818) |
| average of systems | 0.451 (0.443) | 0.531 (0.521) | 0.587 (0.577) |
| worst system | 0.338 (0.338) | 0.416 (0.416) | 0.443 (0.412) |
| best baseline | 0.406 (0.400) | 0.426 (0.404) | 0.452 (0.429) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.712 (0.712) | 0.783 (0.783) | 0.861 (0.861) |
| average of systems | 0.629 (0.629) | 0.692 (0.691) | 0.723 (0.723) |
| worst system | 0.498 (0.496) | 0.544 (0.542) | 0.557 (0.555) |
| best baseline | 0.547 (0.547) | 0.652 (0.621) | 0.782 (0.745) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.494 (0.489) | 0.542 (0.536) | 0.594 (0.588) |
| average of systems | 0.494 (0.489) | 0.542 (0.536) | 0.594 (0.588) |
| worst system | 0.494 (0.489) | 0.542 (0.536) | 0.594 (0.588) |
| number of items in task: | 160 |
| all senses | main senses only | |
| polysemy: | 3 | 2 |
| fine-grained | coarse-grained | |
| entropy: | 1.053 | 0.457 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.959 (0.959) | 0.959 (0.959) | 0.959 (0.959) |
| best system | 0.900 (0.900) | 0.938 (0.938) | 0.938 (0.938) |
| average of systems | 0.828 (0.827) | 0.930 (0.930) | 0.930 (0.930) |
| worst system | 0.800 (0.800) | 0.900 (0.900) | 0.900 (0.900) |
| best baseline | 0.800 (0.800) | 0.938 (0.938) | 0.938 (0.938) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.874 (0.869) | 0.938 (0.938) | 0.938 (0.938) |
| average of systems | 0.823 (0.822) | 0.934 (0.933) | 0.934 (0.933) |
| worst system | 0.800 (0.800) | 0.931 (0.931) | 0.931 (0.931) |
| best baseline | 0.800 (0.800) | 0.938 (0.938) | 0.938 (0.938) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.900 (0.900) | 0.938 (0.938) | 0.938 (0.938) |
| average of systems | 0.833 (0.833) | 0.925 (0.925) | 0.925 (0.925) |
| worst system | 0.800 (0.800) | 0.900 (0.900) | 0.900 (0.900) |
| best baseline | 0.755 (0.750) | 0.906 (0.900) | 0.906 (0.900) |
| number of items in task: | 186 |
| all senses | main senses only | |
| polysemy: | 8 | 3 |
| fine-grained | coarse-grained | |
| entropy: | 2.387 | 1.199 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.973 (0.968) | 0.973 (0.968) | 0.973 (0.968) |
| best system | 0.871 (0.871) | 0.875 (0.875) | 0.907 (0.892) |
| average of systems | 0.704 (0.700) | 0.749 (0.745) | 0.787 (0.783) |
| worst system | 0.413 (0.413) | 0.511 (0.511) | 0.613 (0.610) |
| best baseline | 0.800 (0.796) | 0.820 (0.816) | 0.881 (0.876) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.847 (0.833) | 0.857 (0.843) | 0.869 (0.855) |
| average of systems | 0.554 (0.550) | 0.632 (0.627) | 0.709 (0.705) |
| worst system | 0.413 (0.413) | 0.511 (0.511) | 0.613 (0.610) |
| best baseline | 0.661 (0.661) | 0.747 (0.747) | 0.747 (0.747) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.871 (0.871) | 0.875 (0.875) | 0.907 (0.892) |
| average of systems | 0.799 (0.797) | 0.817 (0.815) | 0.834 (0.831) |
| worst system | 0.703 (0.703) | 0.736 (0.736) | 0.737 (0.737) |
| best baseline | 0.800 (0.796) | 0.820 (0.816) | 0.881 (0.876) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.727 (0.715) | 0.814 (0.801) | 0.814 (0.801) |
| average of systems | 0.727 (0.715) | 0.814 (0.801) | 0.814 (0.801) |
| worst system | 0.727 (0.715) | 0.814 (0.801) | 0.814 (0.801) |
| number of items in task: | 75 |
| all senses | main senses only | |
| polysemy: | 12 | 8 |
| fine-grained | coarse-grained | |
| entropy: | 2.340 | 2.042 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.980 (0.980) | 0.980 (0.980) | 0.993 (0.993) |
| best system | 0.813 (0.813) | 0.813 (0.813) | 0.840 (0.840) |
| average of systems | 0.475 (0.475) | 0.487 (0.487) | 0.549 (0.549) |
| worst system | 0.053 (0.053) | 0.107 (0.107) | 0.120 (0.120) |
| best baseline | 0.720 (0.720) | 0.720 (0.720) | 0.733 (0.733) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.347 (0.347) | 0.347 (0.347) | 0.440 (0.440) |
| average of systems | 0.157 (0.157) | 0.187 (0.187) | 0.303 (0.303) |
| worst system | 0.053 (0.053) | 0.107 (0.107) | 0.120 (0.120) |
| best baseline | 0.267 (0.260) | 0.281 (0.273) | 0.520 (0.520) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.813 (0.813) | 0.813 (0.813) | 0.840 (0.840) |
| average of systems | 0.671 (0.671) | 0.673 (0.673) | 0.705 (0.705) |
| worst system | 0.573 (0.573) | 0.573 (0.573) | 0.613 (0.613) |
| best baseline | 0.720 (0.720) | 0.720 (0.720) | 0.733 (0.733) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.573 (0.573) | 0.573 (0.573) | 0.600 (0.600) |
| average of systems | 0.573 (0.573) | 0.573 (0.573) | 0.600 (0.600) |
| worst system | 0.573 (0.573) | 0.573 (0.573) | 0.600 (0.600) |
| number of items in task: | 118 |
| all senses | main senses only | |
| polysemy: | 7 | 3 |
| fine-grained | coarse-grained | |
| entropy: | 2.054 | 1.239 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.975 (0.975) | 0.975 (0.975) | 0.992 (0.992) |
| best system | 0.822 (0.822) | 0.856 (0.856) | 0.975 (0.975) |
| average of systems | 0.581 (0.573) | 0.643 (0.634) | 0.752 (0.741) |
| worst system | 0.163 (0.163) | 0.246 (0.246) | 0.246 (0.246) |
| best baseline | 0.763 (0.763) | 0.839 (0.839) | 0.983 (0.983) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.492 (0.492) | 0.568 (0.568) | 0.720 (0.720) |
| average of systems | 0.289 (0.289) | 0.370 (0.370) | 0.501 (0.501) |
| worst system | 0.163 (0.163) | 0.246 (0.246) | 0.246 (0.246) |
| best baseline | 0.534 (0.534) | 0.571 (0.571) | 0.720 (0.720) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.822 (0.822) | 0.856 (0.856) | 0.975 (0.975) |
| average of systems | 0.759 (0.746) | 0.808 (0.793) | 0.900 (0.883) |
| worst system | 0.669 (0.669) | 0.678 (0.678) | 0.712 (0.712) |
| best baseline | 0.763 (0.763) | 0.839 (0.839) | 0.983 (0.983) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.681 (0.669) | 0.750 (0.737) | 0.862 (0.847) |
| average of systems | 0.681 (0.669) | 0.750 (0.737) | 0.862 (0.847) |
| worst system | 0.681 (0.669) | 0.750 (0.737) | 0.862 (0.847) |
| number of items in task: | 251 |
| all senses | main senses only | |
| polysemy: | 22 | 12 |
| fine-grained | coarse-grained | |
| entropy: | 2.484 | 1.463 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.988 (0.988) | 0.988 (0.988) | 0.988 (0.988) |
| best system | 0.832 (0.829) | 0.848 (0.848) | 0.869 (0.869) |
| average of systems | 0.640 (0.636) | 0.708 (0.704) | 0.763 (0.759) |
| worst system | 0.209 (0.209) | 0.249 (0.249) | 0.306 (0.306) |
| best baseline | 0.665 (0.665) | 0.819 (0.803) | 0.861 (0.861) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.639 (0.614) | 0.671 (0.644) | 0.793 (0.793) |
| average of systems | 0.427 (0.420) | 0.531 (0.523) | 0.649 (0.640) |
| worst system | 0.209 (0.209) | 0.249 (0.249) | 0.306 (0.306) |
| best baseline | 0.578 (0.578) | 0.656 (0.656) | 0.833 (0.833) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.821 (0.821) | 0.848 (0.848) | 0.869 (0.869) |
| average of systems | 0.750 (0.748) | 0.803 (0.801) | 0.822 (0.820) |
| worst system | 0.651 (0.645) | 0.740 (0.734) | 0.759 (0.753) |
| best baseline | 0.665 (0.665) | 0.819 (0.803) | 0.861 (0.861) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.832 (0.829) | 0.847 (0.844) | 0.868 (0.865) |
| average of systems | 0.832 (0.829) | 0.847 (0.844) | 0.868 (0.865) |
| worst system | 0.832 (0.829) | 0.847 (0.844) | 0.868 (0.865) |
| number of items in task: | 214 |
| all senses | main senses only | |
| polysemy: | 4 | 4 |
| fine-grained | coarse-grained | |
| entropy: | 0.862 | 0.862 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.951 (0.951) | 0.951 (0.951) | 0.951 (0.951) |
| best system | 0.921 (0.921) | 0.921 (0.921) | 0.921 (0.921) |
| average of systems | 0.856 (0.854) | 0.856 (0.854) | 0.856 (0.854) |
| worst system | 0.752 (0.752) | 0.752 (0.752) | 0.752 (0.752) |
| best baseline | 0.911 (0.911) | 0.911 (0.911) | 0.911 (0.911) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.916 (0.916) | 0.916 (0.916) | 0.916 (0.916) |
| average of systems | 0.841 (0.839) | 0.841 (0.839) | 0.841 (0.839) |
| worst system | 0.752 (0.752) | 0.752 (0.752) | 0.752 (0.752) |
| best baseline | 0.911 (0.911) | 0.911 (0.911) | 0.911 (0.911) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.921 (0.921) | 0.921 (0.921) | 0.921 (0.921) |
| average of systems | 0.870 (0.870) | 0.870 (0.870) | 0.870 (0.870) |
| worst system | 0.808 (0.808) | 0.808 (0.808) | 0.808 (0.808) |
| best baseline | 0.909 (0.841) | 0.909 (0.841) | 0.909 (0.841) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.836 (0.820) | 0.836 (0.820) | 0.836 (0.820) |
| average of systems | 0.836 (0.820) | 0.836 (0.820) | 0.836 (0.820) |
| worst system | 0.836 (0.820) | 0.836 (0.820) | 0.836 (0.820) |
| number of items in task: | 113 |
| all senses | main senses only | |
| polysemy: | 8 | 4 |
| fine-grained | coarse-grained | |
| entropy: | 1.851 | 0.963 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.965 (0.965) | 0.965 (0.965) | 0.965 (0.965) |
| best system | 0.867 (0.867) | 0.885 (0.885) | 0.929 (0.929) |
| average of systems | 0.641 (0.641) | 0.754 (0.754) | 0.820 (0.820) |
| worst system | 0.195 (0.195) | 0.628 (0.628) | 0.690 (0.690) |
| best baseline | 0.717 (0.717) | 0.746 (0.746) | 0.832 (0.832) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.619 (0.619) | 0.710 (0.710) | 0.770 (0.770) |
| average of systems | 0.418 (0.418) | 0.656 (0.656) | 0.740 (0.740) |
| worst system | 0.195 (0.195) | 0.628 (0.628) | 0.708 (0.708) |
| best baseline | 0.708 (0.708) | 0.737 (0.737) | 0.823 (0.823) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.867 (0.867) | 0.878 (0.878) | 0.929 (0.929) |
| average of systems | 0.754 (0.754) | 0.797 (0.797) | 0.857 (0.857) |
| worst system | 0.628 (0.628) | 0.644 (0.644) | 0.690 (0.690) |
| best baseline | 0.717 (0.717) | 0.746 (0.746) | 0.832 (0.832) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.858 (0.858) | 0.885 (0.885) | 0.920 (0.920) |
| average of systems | 0.858 (0.858) | 0.885 (0.885) | 0.920 (0.920) |
| worst system | 0.858 (0.858) | 0.885 (0.885) | 0.920 (0.920) |
| number of items in task: | 221 |
| all senses | main senses only | |
| polysemy: | 8 | 6 |
| fine-grained | coarse-grained | |
| entropy: | 0.748 | 0.522 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.948 (0.948) | 0.950 (0.950) | 0.952 (0.952) |
| best system | 0.946 (0.946) | 0.955 (0.955) | 0.964 (0.964) |
| average of systems | 0.723 (0.712) | 0.933 (0.921) | 0.938 (0.926) |
| worst system | 0.432 (0.430) | 0.912 (0.912) | 0.919 (0.919) |
| best baseline | 0.919 (0.919) | 0.928 (0.928) | 0.937 (0.937) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.946 (0.946) | 0.955 (0.955) | 0.964 (0.964) |
| average of systems | 0.692 (0.673) | 0.931 (0.911) | 0.937 (0.916) |
| worst system | 0.432 (0.430) | 0.912 (0.912) | 0.919 (0.919) |
| best baseline | 0.919 (0.919) | 0.928 (0.928) | 0.937 (0.937) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.941 (0.941) | 0.948 (0.948) | 0.955 (0.955) |
| average of systems | 0.765 (0.765) | 0.936 (0.934) | 0.941 (0.940) |
| worst system | 0.432 (0.430) | 0.927 (0.923) | 0.927 (0.923) |
| best baseline | 0.600 (0.600) | 0.624 (0.624) | 0.631 (0.631) |
| number of items in task: | 82 |
| all senses | main senses only | |
| polysemy: | 11 | 9 |
| fine-grained | coarse-grained | |
| entropy: | 1.772 | 1.667 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| best system | 0.902 (0.902) | 0.915 (0.915) | 0.915 (0.915) |
| average of systems | 0.730 (0.728) | 0.758 (0.756) | 0.758 (0.756) |
| worst system | 0.537 (0.537) | 0.537 (0.537) | 0.537 (0.537) |
| best baseline | 0.889 (0.878) | 0.889 (0.878) | 0.889 (0.878) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.732 (0.732) | 0.817 (0.817) | 0.817 (0.817) |
| average of systems | 0.626 (0.622) | 0.687 (0.683) | 0.687 (0.683) |
| worst system | 0.537 (0.537) | 0.537 (0.537) | 0.537 (0.537) |
| best baseline | 0.524 (0.524) | 0.524 (0.524) | 0.524 (0.524) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.878 (0.878) | 0.878 (0.878) | 0.878 (0.878) |
| average of systems | 0.770 (0.770) | 0.779 (0.779) | 0.779 (0.779) |
| worst system | 0.659 (0.659) | 0.659 (0.659) | 0.659 (0.659) |
| best baseline | 0.889 (0.878) | 0.889 (0.878) | 0.889 (0.878) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.902 (0.902) | 0.915 (0.915) | 0.915 (0.915) |
| average of systems | 0.902 (0.902) | 0.915 (0.915) | 0.915 (0.915) |
| worst system | 0.902 (0.902) | 0.915 (0.915) | 0.915 (0.915) |
| number of items in task: | 156 |
| all senses | main senses only | |
| polysemy: | 14 | 8 |
| fine-grained | coarse-grained | |
| entropy: | 2.839 | 1.999 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.965 (0.965) | 0.974 (0.974) | 0.978 (0.978) |
| best system | 0.718 (0.718) | 0.966 (0.179) | 0.966 (0.179) |
| average of systems | 0.558 (0.510) | 0.728 (0.650) | 0.781 (0.704) |
| worst system | 0.329 (0.329) | 0.529 (0.529) | 0.564 (0.564) |
| best baseline | 0.622 (0.622) | 0.760 (0.760) | 0.795 (0.795) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.586 (0.109) | 0.966 (0.179) | 0.966 (0.179) |
| average of systems | 0.489 (0.368) | 0.742 (0.542) | 0.799 (0.599) |
| worst system | 0.329 (0.329) | 0.604 (0.604) | 0.690 (0.690) |
| best baseline | 0.622 (0.622) | 0.760 (0.760) | 0.795 (0.795) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.718 (0.718) | 0.853 (0.853) | 0.891 (0.891) |
| average of systems | 0.595 (0.590) | 0.713 (0.707) | 0.768 (0.761) |
| worst system | 0.494 (0.494) | 0.529 (0.529) | 0.564 (0.564) |
| best baseline | 0.583 (0.583) | 0.708 (0.708) | 0.782 (0.782) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.608 (0.596) | 0.761 (0.747) | 0.791 (0.776) |
| average of systems | 0.608 (0.596) | 0.761 (0.747) | 0.791 (0.776) |
| worst system | 0.608 (0.596) | 0.761 (0.747) | 0.791 (0.776) |
| number of items in task: | 184 |
| all senses | main senses only | |
| polysemy: | 8 | 6 |
| fine-grained | coarse-grained | |
| entropy: | 1.778 | 1.235 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.992 (0.992) | 0.995 (0.995) | 0.997 (0.997) |
| best system | 0.882 (0.853) | 0.951 (0.951) | 0.995 (0.995) |
| average of systems | 0.745 (0.741) | 0.824 (0.820) | 0.874 (0.870) |
| worst system | 0.457 (0.457) | 0.516 (0.516) | 0.576 (0.576) |
| best baseline | 0.858 (0.821) | 0.920 (0.880) | 0.983 (0.940) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.882 (0.853) | 0.941 (0.910) | 0.994 (0.962) |
| average of systems | 0.667 (0.660) | 0.774 (0.766) | 0.836 (0.827) |
| worst system | 0.457 (0.457) | 0.516 (0.516) | 0.576 (0.576) |
| best baseline | 0.821 (0.821) | 0.899 (0.899) | 0.940 (0.940) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.880 (0.880) | 0.951 (0.951) | 0.984 (0.984) |
| average of systems | 0.774 (0.773) | 0.836 (0.835) | 0.879 (0.878) |
| worst system | 0.462 (0.462) | 0.519 (0.519) | 0.576 (0.576) |
| best baseline | 0.858 (0.821) | 0.920 (0.880) | 0.983 (0.940) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.880 (0.880) | 0.948 (0.948) | 0.995 (0.995) |
| average of systems | 0.880 (0.880) | 0.948 (0.948) | 0.995 (0.995) |
| worst system | 0.880 (0.880) | 0.948 (0.948) | 0.995 (0.995) |
| number of items in task: | 176 |
| all senses | main senses only | |
| polysemy: | 5 | 4 |
| fine-grained | coarse-grained | |
| entropy: | 1.712 | 1.319 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.989 (0.989) | 0.989 (0.989) | 0.989 (0.989) |
| best system | 0.949 (0.949) | 0.960 (0.960) | 0.960 (0.960) |
| average of systems | 0.433 (0.429) | 0.444 (0.441) | 0.444 (0.441) |
| worst system | 0.038 (0.028) | 0.038 (0.028) | 0.038 (0.028) |
| best baseline | 0.716 (0.716) | 0.775 (0.775) | 0.775 (0.775) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.585 (0.585) | 0.608 (0.608) | 0.608 (0.608) |
| average of systems | 0.371 (0.367) | 0.385 (0.382) | 0.385 (0.382) |
| worst system | 0.038 (0.028) | 0.038 (0.028) | 0.038 (0.028) |
| best baseline | 0.716 (0.716) | 0.775 (0.775) | 0.775 (0.775) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.949 (0.949) | 0.960 (0.960) | 0.960 (0.960) |
| average of systems | 0.515 (0.511) | 0.522 (0.519) | 0.522 (0.519) |
| worst system | 0.038 (0.028) | 0.038 (0.028) | 0.038 (0.028) |
| best baseline | 0.716 (0.716) | 0.744 (0.744) | 0.744 (0.744) |
| number of items in task: | 70 |
| all senses | main senses only | |
| polysemy: | 1 | 1 |
| fine-grained | coarse-grained | |
| entropy: | 0.000 | 0.000 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| best system | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| average of systems | 0.975 (0.968) | 0.975 (0.968) | 0.975 (0.968) |
| worst system | 0.843 (0.843) | 0.843 (0.843) | 0.843 (0.843) |
| best baseline | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| average of systems | 0.986 (0.971) | 0.986 (0.971) | 0.986 (0.971) |
| worst system | 0.957 (0.957) | 0.957 (0.957) | 0.957 (0.957) |
| best baseline | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| average of systems | 0.963 (0.960) | 0.963 (0.960) | 0.963 (0.960) |
| worst system | 0.843 (0.843) | 0.843 (0.843) | 0.843 (0.843) |
| best baseline | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| average of systems | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| worst system | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| number of items in task: | 117 |
| all senses | main senses only | |
| polysemy: | 9 | 4 |
| fine-grained | coarse-grained | |
| entropy: | 2.349 | 1.581 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.924 (0.916) | 0.932 (0.925) | 0.932 (0.925) |
| best system | 0.778 (0.778) | 0.786 (0.786) | 0.838 (0.838) |
| average of systems | 0.527 (0.527) | 0.550 (0.550) | 0.590 (0.590) |
| worst system | 0.026 (0.026) | 0.034 (0.034) | 0.034 (0.034) |
| best baseline | 0.714 (0.714) | 0.726 (0.726) | 0.803 (0.803) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.513 (0.513) | 0.521 (0.521) | 0.547 (0.547) |
| average of systems | 0.231 (0.231) | 0.269 (0.269) | 0.288 (0.288) |
| worst system | 0.026 (0.026) | 0.034 (0.034) | 0.034 (0.034) |
| best baseline | 0.714 (0.714) | 0.726 (0.726) | 0.803 (0.803) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.778 (0.778) | 0.786 (0.786) | 0.838 (0.838) |
| average of systems | 0.663 (0.663) | 0.680 (0.680) | 0.735 (0.735) |
| worst system | 0.547 (0.547) | 0.551 (0.551) | 0.624 (0.624) |
| best baseline | 0.692 (0.692) | 0.701 (0.701) | 0.795 (0.795) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.735 (0.735) | 0.744 (0.744) | 0.769 (0.769) |
| average of systems | 0.735 (0.735) | 0.744 (0.744) | 0.769 (0.769) |
| worst system | 0.735 (0.735) | 0.744 (0.744) | 0.769 (0.769) |
| number of items in task: | 209 |
| all senses | main senses only | |
| polysemy: | 8 | 6 |
| fine-grained | coarse-grained | |
| entropy: | 2.168 | 1.837 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.976 (0.976) | 0.976 (0.976) | 0.976 (0.976) |
| best system | 0.866 (0.866) | 0.880 (0.880) | 0.880 (0.880) |
| average of systems | 0.680 (0.679) | 0.707 (0.706) | 0.707 (0.706) |
| worst system | 0.443 (0.443) | 0.455 (0.455) | 0.455 (0.455) |
| best baseline | 0.632 (0.617) | 0.637 (0.622) | 0.637 (0.622) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.575 (0.569) | 0.633 (0.627) | 0.633 (0.627) |
| average of systems | 0.510 (0.508) | 0.543 (0.541) | 0.543 (0.541) |
| worst system | 0.443 (0.443) | 0.455 (0.455) | 0.455 (0.455) |
| best baseline | 0.415 (0.405) | 0.467 (0.467) | 0.467 (0.467) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.866 (0.866) | 0.880 (0.880) | 0.880 (0.880) |
| average of systems | 0.770 (0.770) | 0.786 (0.786) | 0.786 (0.786) |
| worst system | 0.622 (0.622) | 0.627 (0.627) | 0.627 (0.627) |
| best baseline | 0.632 (0.617) | 0.637 (0.622) | 0.637 (0.622) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.737 (0.737) | 0.804 (0.804) | 0.804 (0.804) |
| average of systems | 0.737 (0.737) | 0.804 (0.804) | 0.804 (0.804) |
| worst system | 0.737 (0.737) | 0.804 (0.804) | 0.804 (0.804) |
| number of items in task: | 201 |
| all senses | main senses only | |
| polysemy: | 14 | 6 |
| fine-grained | coarse-grained | |
| entropy: | 2.759 | 2.401 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.928 (0.923) | 0.930 (0.925) | 0.933 (0.928) |
| best system | 0.572 (0.572) | 0.578 (0.578) | 0.592 (0.592) |
| average of systems | 0.421 (0.420) | 0.443 (0.442) | 0.454 (0.452) |
| worst system | 0.212 (0.212) | 0.216 (0.216) | 0.223 (0.223) |
| best baseline | 0.552 (0.552) | 0.557 (0.557) | 0.567 (0.567) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.437 (0.433) | 0.457 (0.453) | 0.467 (0.463) |
| average of systems | 0.322 (0.319) | 0.342 (0.339) | 0.351 (0.348) |
| worst system | 0.212 (0.212) | 0.216 (0.216) | 0.223 (0.223) |
| best baseline | 0.365 (0.365) | 0.383 (0.383) | 0.384 (0.384) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.572 (0.572) | 0.578 (0.578) | 0.592 (0.592) |
| average of systems | 0.479 (0.479) | 0.503 (0.503) | 0.516 (0.516) |
| worst system | 0.413 (0.413) | 0.439 (0.439) | 0.458 (0.458) |
| best baseline | 0.552 (0.552) | 0.557 (0.557) | 0.567 (0.567) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.432 (0.428) | 0.450 (0.445) | 0.452 (0.448) |
| average of systems | 0.432 (0.428) | 0.450 (0.445) | 0.452 (0.448) |
| worst system | 0.432 (0.428) | 0.450 (0.445) | 0.452 (0.448) |
| number of items in task: | 218 |
| all senses | main senses only | |
| polysemy: | 5 | 3 |
| fine-grained | coarse-grained | |
| entropy: | 0.982 | 0.864 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.954 (0.950) | 0.959 (0.954) | 0.959 (0.954) |
| best system | 0.922 (0.922) | 0.922 (0.922) | 0.922 (0.922) |
| average of systems | 0.788 (0.787) | 0.793 (0.792) | 0.793 (0.792) |
| worst system | 0.271 (0.271) | 0.271 (0.271) | 0.271 (0.271) |
| best baseline | 0.904 (0.904) | 0.904 (0.904) | 0.904 (0.904) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.858 (0.858) | 0.862 (0.862) | 0.862 (0.862) |
| average of systems | 0.625 (0.623) | 0.626 (0.624) | 0.626 (0.624) |
| worst system | 0.271 (0.271) | 0.271 (0.271) | 0.271 (0.271) |
| best baseline | 0.493 (0.493) | 0.493 (0.493) | 0.493 (0.493) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.922 (0.922) | 0.922 (0.922) | 0.922 (0.922) |
| average of systems | 0.869 (0.869) | 0.875 (0.875) | 0.875 (0.875) |
| worst system | 0.775 (0.775) | 0.789 (0.789) | 0.789 (0.789) |
| best baseline | 0.904 (0.904) | 0.904 (0.904) | 0.904 (0.904) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.871 (0.867) | 0.880 (0.876) | 0.880 (0.876) |
| average of systems | 0.871 (0.867) | 0.880 (0.876) | 0.880 (0.876) |
| worst system | 0.871 (0.867) | 0.880 (0.876) | 0.880 (0.876) |
| number of items in task: | 186 |
| all senses | main senses only | |
| polysemy: | 6 | 4 |
| fine-grained | coarse-grained | |
| entropy: | 2.218 | 1.677 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.944 (0.939) | 0.955 (0.950) | 0.958 (0.953) |
| best system | 0.503 (0.500) | 0.586 (0.583) | 0.616 (0.613) |
| average of systems | 0.418 (0.416) | 0.515 (0.511) | 0.551 (0.548) |
| worst system | 0.189 (0.188) | 0.427 (0.425) | 0.478 (0.468) |
| best baseline | 0.546 (0.543) | 0.608 (0.605) | 0.654 (0.651) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.500 (0.500) | 0.586 (0.583) | 0.616 (0.613) |
| average of systems | 0.354 (0.351) | 0.505 (0.501) | 0.531 (0.527) |
| worst system | 0.189 (0.188) | 0.429 (0.419) | 0.478 (0.468) |
| best baseline | 0.416 (0.414) | 0.500 (0.497) | 0.535 (0.532) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.503 (0.500) | 0.570 (0.567) | 0.616 (0.613) |
| average of systems | 0.451 (0.448) | 0.522 (0.519) | 0.565 (0.562) |
| worst system | 0.362 (0.360) | 0.427 (0.425) | 0.486 (0.484) |
| best baseline | 0.546 (0.543) | 0.608 (0.605) | 0.654 (0.651) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.449 (0.446) | 0.505 (0.503) | 0.541 (0.538) |
| average of systems | 0.449 (0.446) | 0.505 (0.503) | 0.541 (0.538) |
| worst system | 0.449 (0.446) | 0.505 (0.503) | 0.541 (0.538) |
| number of items in task: | 217 |
| all senses | main senses only | |
| polysemy: | 6 | 4 |
| fine-grained | coarse-grained | |
| entropy: | 1.955 | 1.731 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.965 (0.961) | 0.965 (0.961) | 0.965 (0.961) |
| best system | 0.664 (0.664) | 0.677 (0.677) | 0.687 (0.687) |
| average of systems | 0.561 (0.560) | 0.571 (0.569) | 0.575 (0.573) |
| worst system | 0.459 (0.459) | 0.478 (0.478) | 0.481 (0.481) |
| best baseline | 0.588 (0.585) | 0.588 (0.585) | 0.588 (0.585) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.535 (0.535) | 0.535 (0.535) | 0.535 (0.535) |
| average of systems | 0.508 (0.502) | 0.514 (0.508) | 0.515 (0.509) |
| worst system | 0.459 (0.459) | 0.478 (0.478) | 0.481 (0.481) |
| best baseline | 0.530 (0.530) | 0.530 (0.530) | 0.530 (0.530) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.664 (0.664) | 0.677 (0.677) | 0.687 (0.687) |
| average of systems | 0.600 (0.600) | 0.614 (0.614) | 0.620 (0.620) |
| worst system | 0.502 (0.502) | 0.532 (0.532) | 0.539 (0.539) |
| best baseline | 0.588 (0.585) | 0.588 (0.585) | 0.588 (0.585) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.530 (0.530) | 0.530 (0.530) | 0.530 (0.530) |
| average of systems | 0.530 (0.530) | 0.530 (0.530) | 0.530 (0.530) |
| worst system | 0.530 (0.530) | 0.530 (0.530) | 0.530 (0.530) |
| number of items in task: | 229 |
| all senses | main senses only | |
| polysemy: | 16 | 11 |
| fine-grained | coarse-grained | |
| entropy: | 3.333 | 2.632 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.927 (0.923) | 0.938 (0.934) | 0.943 (0.939) |
| best system | 0.555 (0.555) | 0.614 (0.614) | 0.629 (0.629) |
| average of systems | 0.383 (0.382) | 0.437 (0.436) | 0.455 (0.454) |
| worst system | 0.200 (0.200) | 0.266 (0.266) | 0.288 (0.288) |
| best baseline | 0.524 (0.524) | 0.579 (0.579) | 0.616 (0.616) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.302 (0.293) | 0.338 (0.328) | 0.374 (0.362) |
| average of systems | 0.244 (0.241) | 0.301 (0.298) | 0.323 (0.319) |
| worst system | 0.200 (0.200) | 0.266 (0.266) | 0.288 (0.288) |
| best baseline | 0.403 (0.400) | 0.467 (0.463) | 0.502 (0.498) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.555 (0.555) | 0.614 (0.614) | 0.629 (0.629) |
| average of systems | 0.475 (0.475) | 0.529 (0.529) | 0.546 (0.546) |
| worst system | 0.406 (0.406) | 0.463 (0.463) | 0.507 (0.507) |
| best baseline | 0.524 (0.524) | 0.579 (0.579) | 0.616 (0.616) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.341 (0.341) | 0.380 (0.380) | 0.397 (0.397) |
| average of systems | 0.341 (0.341) | 0.380 (0.380) | 0.397 (0.397) |
| worst system | 0.341 (0.341) | 0.380 (0.380) | 0.397 (0.397) |
| number of items in task: | 207 |
| all senses | main senses only | |
| polysemy: | 6 | 3 |
| fine-grained | coarse-grained | |
| entropy: | 2.195 | 1.518 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.921 (0.912) | 0.922 (0.913) | 0.924 (0.915) |
| best system | 0.556 (0.556) | 0.623 (0.623) | 0.662 (0.662) |
| average of systems | 0.449 (0.448) | 0.538 (0.538) | 0.571 (0.570) |
| worst system | 0.239 (0.239) | 0.401 (0.401) | 0.415 (0.415) |
| best baseline | 0.570 (0.570) | 0.643 (0.643) | 0.686 (0.686) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.498 (0.498) | 0.570 (0.570) | 0.618 (0.618) |
| average of systems | 0.357 (0.355) | 0.478 (0.476) | 0.516 (0.513) |
| worst system | 0.239 (0.239) | 0.401 (0.401) | 0.415 (0.415) |
| best baseline | 0.420 (0.418) | 0.495 (0.493) | 0.517 (0.514) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.556 (0.556) | 0.623 (0.623) | 0.662 (0.662) |
| average of systems | 0.505 (0.505) | 0.583 (0.583) | 0.615 (0.615) |
| worst system | 0.464 (0.464) | 0.527 (0.527) | 0.546 (0.546) |
| best baseline | 0.570 (0.570) | 0.643 (0.643) | 0.686 (0.686) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.440 (0.440) | 0.495 (0.495) | 0.517 (0.517) |
| average of systems | 0.440 (0.440) | 0.495 (0.495) | 0.517 (0.517) |
| worst system | 0.440 (0.440) | 0.495 (0.495) | 0.517 (0.517) |
| number of items in task: | 224 |
| all senses | main senses only | |
| polysemy: | 6 | 3 |
| fine-grained | coarse-grained | |
| entropy: | 0.982 | 0.812 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.953 (0.953) | 0.962 (0.962) | 0.962 (0.962) |
| best system | 0.906 (0.906) | 0.911 (0.911) | 0.911 (0.911) |
| average of systems | 0.765 (0.763) | 0.834 (0.833) | 0.842 (0.841) |
| worst system | 0.431 (0.431) | 0.678 (0.672) | 0.689 (0.683) |
| best baseline | 0.862 (0.862) | 0.873 (0.873) | 0.884 (0.884) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.710 (0.710) | 0.834 (0.834) | 0.839 (0.839) |
| average of systems | 0.571 (0.569) | 0.746 (0.744) | 0.755 (0.753) |
| worst system | 0.431 (0.431) | 0.678 (0.672) | 0.689 (0.683) |
| best baseline | 0.862 (0.862) | 0.873 (0.873) | 0.884 (0.884) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.906 (0.906) | 0.911 (0.911) | 0.911 (0.911) |
| average of systems | 0.864 (0.863) | 0.883 (0.882) | 0.889 (0.888) |
| worst system | 0.826 (0.826) | 0.871 (0.871) | 0.875 (0.875) |
| best baseline | 0.857 (0.857) | 0.868 (0.868) | 0.879 (0.879) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.848 (0.844) | 0.859 (0.855) | 0.870 (0.866) |
| average of systems | 0.848 (0.844) | 0.859 (0.855) | 0.870 (0.866) |
| worst system | 0.848 (0.844) | 0.859 (0.855) | 0.870 (0.866) |
| number of items in task: | 178 |
| all senses | main senses only | |
| polysemy: | 4 | 4 |
| fine-grained | coarse-grained | |
| entropy: | 0.132 | 0.132 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.994 (0.994) | 0.994 (0.994) | 0.994 (0.994) |
| best system | 0.978 (0.978) | 0.978 (0.978) | 0.978 (0.978) |
| average of systems | 0.864 (0.864) | 0.865 (0.865) | 0.865 (0.865) |
| worst system | 0.034 (0.034) | 0.039 (0.039) | 0.039 (0.039) |
| best baseline | 0.980 (0.980) | 0.980 (0.980) | 0.980 (0.980) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.966 (0.966) | 0.966 (0.966) | 0.966 (0.966) |
| average of systems | 0.637 (0.637) | 0.638 (0.638) | 0.638 (0.638) |
| worst system | 0.034 (0.034) | 0.039 (0.039) | 0.039 (0.039) |
| best baseline | 0.978 (0.978) | 0.978 (0.978) | 0.978 (0.978) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.978 (0.978) | 0.978 (0.978) | 0.978 (0.978) |
| average of systems | 0.978 (0.978) | 0.978 (0.978) | 0.978 (0.978) |
| worst system | 0.978 (0.978) | 0.978 (0.978) | 0.978 (0.978) |
| best baseline | 0.980 (0.980) | 0.980 (0.980) | 0.980 (0.980) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.978 (0.978) | 0.978 (0.978) | 0.978 (0.978) |
| average of systems | 0.978 (0.978) | 0.978 (0.978) | 0.978 (0.978) |
| worst system | 0.978 (0.978) | 0.978 (0.978) | 0.978 (0.978) |
| number of items in task: | 186 |
| all senses | main senses only | |
| polysemy: | 3 | 2 |
| fine-grained | coarse-grained | |
| entropy: | 0.694 | 0.133 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.981 (0.981) | 0.995 (0.995) | 0.995 (0.995) |
| best system | 0.898 (0.898) | 0.978 (0.978) | 0.978 (0.978) |
| average of systems | 0.823 (0.820) | 0.959 (0.957) | 0.959 (0.957) |
| worst system | 0.454 (0.454) | 0.863 (0.863) | 0.863 (0.863) |
| best baseline | 0.887 (0.887) | 0.978 (0.978) | 0.978 (0.978) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.871 (0.871) | 0.968 (0.968) | 0.968 (0.968) |
| average of systems | 0.722 (0.718) | 0.922 (0.917) | 0.922 (0.917) |
| worst system | 0.454 (0.454) | 0.863 (0.863) | 0.863 (0.863) |
| best baseline | 0.876 (0.876) | 0.978 (0.978) | 0.978 (0.978) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.898 (0.898) | 0.978 (0.978) | 0.978 (0.978) |
| average of systems | 0.874 (0.874) | 0.978 (0.978) | 0.978 (0.978) |
| worst system | 0.828 (0.828) | 0.978 (0.978) | 0.978 (0.978) |
| best baseline | 0.887 (0.887) | 0.978 (0.978) | 0.978 (0.978) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.865 (0.860) | 0.978 (0.973) | 0.978 (0.973) |
| average of systems | 0.865 (0.860) | 0.978 (0.973) | 0.978 (0.973) |
| worst system | 0.865 (0.860) | 0.978 (0.973) | 0.978 (0.973) |
| number of items in task: | 259 |
| all senses | main senses only | |
| polysemy: | 11 | 9 |
| fine-grained | coarse-grained | |
| entropy: | 2.806 | 2.576 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.921 (0.921) | 0.921 (0.921) | 0.929 (0.929) |
| best system | 0.714 (0.714) | 0.714 (0.714) | 0.753 (0.753) |
| average of systems | 0.545 (0.544) | 0.545 (0.544) | 0.579 (0.578) |
| worst system | 0.357 (0.351) | 0.357 (0.351) | 0.375 (0.375) |
| best baseline | 0.680 (0.680) | 0.680 (0.680) | 0.710 (0.710) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.367 (0.367) | 0.367 (0.367) | 0.396 (0.390) |
| average of systems | 0.361 (0.359) | 0.361 (0.359) | 0.382 (0.380) |
| worst system | 0.357 (0.351) | 0.357 (0.351) | 0.375 (0.375) |
| best baseline | 0.498 (0.498) | 0.498 (0.498) | 0.498 (0.498) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.714 (0.714) | 0.714 (0.714) | 0.753 (0.753) |
| average of systems | 0.649 (0.649) | 0.649 (0.649) | 0.689 (0.689) |
| worst system | 0.587 (0.587) | 0.587 (0.587) | 0.637 (0.637) |
| best baseline | 0.680 (0.680) | 0.680 (0.680) | 0.710 (0.710) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.579 (0.579) | 0.579 (0.579) | 0.618 (0.618) |
| average of systems | 0.579 (0.579) | 0.579 (0.579) | 0.618 (0.618) |
| worst system | 0.579 (0.579) | 0.579 (0.579) | 0.618 (0.618) |
| number of items in task: | 229 |
| all senses | main senses only | |
| polysemy: | 10 | 8 |
| fine-grained | coarse-grained | |
| entropy: | 2.382 | 1.965 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.930 (0.926) | 0.954 (0.950) | 0.954 (0.950) |
| best system | 0.563 (0.563) | 0.624 (0.624) | 0.624 (0.624) |
| average of systems | 0.457 (0.434) | 0.542 (0.513) | 0.542 (0.513) |
| worst system | 0.228 (0.228) | 0.317 (0.317) | 0.317 (0.317) |
| best baseline | 0.512 (0.510) | 0.610 (0.608) | 0.610 (0.608) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.444 (0.262) | 0.563 (0.332) | 0.563 (0.332) |
| average of systems | 0.361 (0.299) | 0.465 (0.387) | 0.465 (0.387) |
| worst system | 0.228 (0.228) | 0.317 (0.317) | 0.317 (0.317) |
| best baseline | 0.476 (0.476) | 0.585 (0.585) | 0.585 (0.585) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.563 (0.563) | 0.624 (0.624) | 0.624 (0.624) |
| average of systems | 0.515 (0.515) | 0.588 (0.588) | 0.588 (0.588) |
| worst system | 0.454 (0.454) | 0.524 (0.524) | 0.524 (0.524) |
| best baseline | 0.512 (0.510) | 0.610 (0.608) | 0.610 (0.608) |
| number of items in task: | 122 |
| all senses | main senses only | |
| polysemy: | 5 | 5 |
| fine-grained | coarse-grained | |
| entropy: | 1.224 | 1.224 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.985 (0.985) | 0.985 (0.985) | 0.985 (0.985) |
| best system | 0.943 (0.943) | 0.943 (0.943) | 0.943 (0.943) |
| average of systems | 0.759 (0.747) | 0.811 (0.794) | 0.811 (0.794) |
| worst system | 0.422 (0.422) | 0.496 (0.496) | 0.496 (0.496) |
| best baseline | 0.902 (0.902) | 0.902 (0.902) | 0.902 (0.902) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.926 (0.926) | 0.926 (0.926) | 0.926 (0.926) |
| average of systems | 0.664 (0.643) | 0.750 (0.723) | 0.750 (0.723) |
| worst system | 0.422 (0.422) | 0.496 (0.496) | 0.496 (0.496) |
| best baseline | 0.902 (0.902) | 0.902 (0.902) | 0.902 (0.902) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.943 (0.943) | 0.943 (0.943) | 0.943 (0.943) |
| average of systems | 0.902 (0.902) | 0.902 (0.902) | 0.902 (0.902) |
| worst system | 0.861 (0.861) | 0.861 (0.861) | 0.861 (0.861) |
| best baseline | 0.704 (0.693) | 0.704 (0.693) | 0.704 (0.693) |
| number of items in task: | 47 |
| all senses | main senses only | |
| polysemy: | 5 | 4 |
| fine-grained | coarse-grained | |
| entropy: | 1.748 | 1.539 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.979 (0.979) | 0.979 (0.979) | 0.979 (0.979) |
| best system | 0.702 (0.702) | 0.702 (0.702) | 0.702 (0.702) |
| average of systems | 0.401 (0.398) | 0.427 (0.424) | 0.432 (0.429) |
| worst system | 0.000 (0.000) | 0.021 (0.021) | 0.043 (0.043) |
| best baseline | 0.660 (0.660) | 0.681 (0.681) | 0.681 (0.681) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.578 (0.553) | 0.578 (0.553) | 0.578 (0.553) |
| average of systems | 0.196 (0.188) | 0.213 (0.205) | 0.223 (0.214) |
| worst system | 0.000 (0.000) | 0.021 (0.021) | 0.043 (0.043) |
| best baseline | 0.596 (0.596) | 0.596 (0.596) | 0.596 (0.596) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.702 (0.702) | 0.702 (0.702) | 0.702 (0.702) |
| average of systems | 0.523 (0.523) | 0.555 (0.555) | 0.557 (0.557) |
| worst system | 0.298 (0.298) | 0.330 (0.330) | 0.340 (0.340) |
| best baseline | 0.660 (0.660) | 0.681 (0.681) | 0.681 (0.681) |
| number of items in task: | 227 |
| all senses | main senses only | |
| polysemy: | 6 | 6 |
| fine-grained | coarse-grained | |
| entropy: | 2.303 | 2.303 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.969 (0.969) | 0.969 (0.969) | 0.969 (0.969) |
| best system | 0.612 (0.612) | 0.612 (0.612) | 0.612 (0.612) |
| average of systems | 0.471 (0.470) | 0.471 (0.470) | 0.471 (0.470) |
| worst system | 0.225 (0.225) | 0.225 (0.225) | 0.225 (0.225) |
| best baseline | 0.488 (0.488) | 0.488 (0.488) | 0.488 (0.488) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.383 (0.383) | 0.383 (0.383) | 0.383 (0.383) |
| average of systems | 0.329 (0.329) | 0.329 (0.329) | 0.329 (0.329) |
| worst system | 0.225 (0.225) | 0.225 (0.225) | 0.225 (0.225) |
| best baseline | 0.407 (0.405) | 0.407 (0.405) | 0.407 (0.405) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.612 (0.612) | 0.612 (0.612) | 0.612 (0.612) |
| average of systems | 0.556 (0.555) | 0.556 (0.555) | 0.556 (0.555) |
| worst system | 0.478 (0.471) | 0.478 (0.471) | 0.478 (0.471) |
| best baseline | 0.488 (0.488) | 0.488 (0.488) | 0.488 (0.488) |
| number of items in task: | 97 |
| all senses | main senses only | |
| polysemy: | 5 | 2 |
| fine-grained | coarse-grained | |
| entropy: | 0.617 | 0.214 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| best system | 0.990 (0.990) | 0.995 (0.995) | 1.000 (1.000) |
| average of systems | 0.822 (0.820) | 0.839 (0.837) | 0.843 (0.840) |
| worst system | 0.155 (0.155) | 0.155 (0.155) | 0.155 (0.155) |
| best baseline | 0.990 (0.990) | 0.995 (0.995) | 1.000 (1.000) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.884 (0.866) | 0.895 (0.876) | 0.895 (0.876) |
| average of systems | 0.580 (0.574) | 0.585 (0.579) | 0.587 (0.581) |
| worst system | 0.155 (0.155) | 0.155 (0.155) | 0.155 (0.155) |
| best baseline | 0.985 (0.985) | 0.995 (0.995) | 1.000 (1.000) |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| best system | 0.990 (0.990) | 0.995 (0.995) | 1.000 (1.000) |
| average of systems | 0.967 (0.967) | 0.991 (0.991) | 0.996 (0.996) |
| worst system | 0.918 (0.918) | 0.985 (0.985) | 0.990 (0.990) |
| best baseline | 0.990 (0.990) | 0.995 (0.995) | 1.000 (1.000) |
| number of items in task: | 270 |
| all senses | main senses only | |
| polysemy: | 9 | 3 |
| fine-grained | coarse-grained | |
| entropy: | 2.298 | 1.323 |
| fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) | |
| human | 0.920 (0.920) | 0.935 (0.935) | 0.941 (0.941) |
| best system | 0.704 (0.704) | 0.725 (0.725) | 0.737 (0.737) |
| average of systems | 0.566 (0.565) | 0.588 (0.586) | 0.600 (0.598) |
| worst system | 0.219 (0.219) | 0.219 (0.219) | 0.219 (0.219) |
| best baseline | 0.648 (0.648) | 0.656 (0.656) | 0.670 (0.670) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.663 (0.648) | 0.668 (0.653) | 0.678 (0.663) |
| average of systems | 0.394 (0.389) | 0.405 (0.400) | 0.414 (0.409) |
| worst system | 0.219 (0.219) | 0.219 (0.219) | 0.219 (0.219) |
| best baseline | 0.637 (0.637) | 0.643 (0.643) | 0.656 (0.656) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.704 (0.704) | 0.725 (0.725) | 0.737 (0.737) |
| average of systems | 0.670 (0.670) | 0.698 (0.698) | 0.711 (0.711) |
| worst system | 0.604 (0.604) | 0.654 (0.654) | 0.667 (0.667) |
| best baseline | 0.648 (0.648) | 0.656 (0.656) | 0.670 (0.670) |
| number of items in task: | 218 |
| | all senses | main senses only |
| polysemy: | 6 | 3 |
| | fine-grained | coarse-grained |
| entropy: | 1.285 | 0.432 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.995 (0.995) | 0.995 (0.995) | 0.995 (0.995) |
| best system | 0.963 (0.963) | 0.963 (0.963) | 0.963 (0.963) |
| average of systems | 0.763 (0.763) | 0.857 (0.856) | 0.915 (0.915) |
| worst system | 0.304 (0.304) | 0.593 (0.587) | 0.833 (0.826) |
| best baseline | 0.954 (0.954) | 0.954 (0.954) | 0.954 (0.954) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.711 (0.711) | 0.830 (0.830) | 0.922 (0.922) |
| average of systems | 0.533 (0.531) | 0.729 (0.727) | 0.879 (0.876) |
| worst system | 0.304 (0.304) | 0.593 (0.587) | 0.833 (0.826) |
| best baseline | 0.954 (0.954) | 0.954 (0.954) | 0.954 (0.954) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.963 (0.963) | 0.963 (0.963) | 0.963 (0.963) |
| average of systems | 0.902 (0.902) | 0.933 (0.933) | 0.938 (0.938) |
| worst system | 0.817 (0.817) | 0.908 (0.908) | 0.908 (0.908) |
| best baseline | 0.954 (0.954) | 0.954 (0.954) | 0.954 (0.954) |
| number of items in task: | 196 |
| | all senses | main senses only |
| polysemy: | 4 | 4 |
| | fine-grained | coarse-grained |
| entropy: | 0.365 | 0.365 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 1.000 (1.000) | 1.000 (1.000) | 1.000 (1.000) |
| best system | 0.980 (0.980) | 0.980 (0.980) | 0.980 (0.980) |
| average of systems | 0.946 (0.945) | 0.946 (0.945) | 0.946 (0.945) |
| worst system | 0.816 (0.816) | 0.816 (0.816) | 0.816 (0.816) |
| best baseline | 0.964 (0.964) | 0.964 (0.964) | 0.964 (0.964) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.980 (0.980) | 0.980 (0.980) | 0.980 (0.980) |
| average of systems | 0.922 (0.922) | 0.922 (0.922) | 0.922 (0.922) |
| worst system | 0.816 (0.816) | 0.816 (0.816) | 0.816 (0.816) |
| best baseline | 0.949 (0.949) | 0.949 (0.949) | 0.949 (0.949) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.974 (0.974) | 0.974 (0.974) | 0.974 (0.974) |
| average of systems | 0.960 (0.959) | 0.960 (0.959) | 0.960 (0.959) |
| worst system | 0.939 (0.939) | 0.939 (0.939) | 0.939 (0.939) |
| best baseline | 0.964 (0.964) | 0.964 (0.964) | 0.964 (0.964) |
| number of items in task: | 302 |
| | all senses | main senses only |
| polysemy: | 29 | 25 |
| | fine-grained | coarse-grained |
| entropy: | 1.749 | 1.669 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.990 (0.990) | 0.990 (0.990) | 0.990 (0.990) |
| best system | 0.904 (0.904) | 0.907 (0.907) | 0.907 (0.907) |
| average of systems | 0.849 (0.840) | 0.850 (0.841) | 0.850 (0.841) |
| worst system | 0.689 (0.689) | 0.689 (0.689) | 0.689 (0.689) |
| best baseline | 0.852 (0.843) | 0.852 (0.843) | 0.852 (0.843) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.889 (0.874) | 0.889 (0.874) | 0.889 (0.874) |
| average of systems | 0.803 (0.798) | 0.803 (0.798) | 0.803 (0.798) |
| worst system | 0.689 (0.689) | 0.689 (0.689) | 0.689 (0.689) |
| best baseline | 0.250 (0.248) | 0.257 (0.255) | 0.257 (0.255) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.904 (0.904) | 0.907 (0.907) | 0.907 (0.907) |
| average of systems | 0.877 (0.866) | 0.878 (0.867) | 0.878 (0.867) |
| worst system | 0.819 (0.762) | 0.819 (0.762) | 0.819 (0.762) |
| best baseline | 0.852 (0.843) | 0.852 (0.843) | 0.852 (0.843) |
| number of items in task: | 373 |
| | all senses | main senses only |
| polysemy: | 14 | 10 |
| | fine-grained | coarse-grained |
| entropy: | 2.666 | 2.472 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.924 (0.917) | 0.927 (0.920) | 0.927 (0.920) |
| best system | 0.668 (0.668) | 0.680 (0.680) | 0.681 (0.681) |
| average of systems | 0.521 (0.519) | 0.526 (0.525) | 0.528 (0.526) |
| worst system | 0.223 (0.223) | 0.229 (0.229) | 0.233 (0.233) |
| best baseline | 0.551 (0.550) | 0.556 (0.555) | 0.556 (0.555) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.486 (0.475) | 0.489 (0.477) | 0.489 (0.477) |
| average of systems | 0.336 (0.333) | 0.343 (0.339) | 0.346 (0.342) |
| worst system | 0.223 (0.223) | 0.229 (0.229) | 0.233 (0.233) |
| best baseline | 0.403 (0.402) | 0.414 (0.412) | 0.414 (0.412) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.668 (0.668) | 0.680 (0.680) | 0.681 (0.681) |
| average of systems | 0.631 (0.631) | 0.636 (0.636) | 0.637 (0.636) |
| worst system | 0.523 (0.523) | 0.523 (0.523) | 0.523 (0.523) |
| best baseline | 0.551 (0.550) | 0.556 (0.555) | 0.556 (0.555) |
| number of items in task: | 323 |
| | all senses | main senses only |
| polysemy: | 11 | 8 |
| | fine-grained | coarse-grained |
| entropy: | 2.437 | 2.019 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.985 (0.985) | 0.987 (0.987) | 0.987 (0.987) |
| best system | 0.684 (0.684) | 0.693 (0.693) | 0.693 (0.693) |
| average of systems | 0.266 (0.266) | 0.327 (0.326) | 0.327 (0.326) |
| worst system | 0.097 (0.096) | 0.105 (0.105) | 0.105 (0.105) |
| best baseline | 0.334 (0.334) | 0.467 (0.467) | 0.467 (0.467) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.251 (0.251) | 0.374 (0.368) | 0.374 (0.368) |
| average of systems | 0.180 (0.180) | 0.279 (0.277) | 0.279 (0.277) |
| worst system | 0.097 (0.096) | 0.193 (0.193) | 0.193 (0.193) |
| best baseline | 0.334 (0.334) | 0.467 (0.467) | 0.467 (0.467) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.684 (0.684) | 0.693 (0.693) | 0.693 (0.693) |
| average of systems | 0.395 (0.395) | 0.399 (0.399) | 0.399 (0.399) |
| worst system | 0.105 (0.105) | 0.105 (0.105) | 0.105 (0.105) |
| best baseline | 0.284 (0.283) | 0.328 (0.327) | 0.328 (0.327) |
| number of items in task: | 431 |
| | all senses | main senses only |
| polysemy: | 7 | 6 |
| | fine-grained | coarse-grained |
| entropy: | 1.810 | 1.722 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.981 (0.981) | 0.984 (0.984) | 0.984 (0.984) |
| best system | 0.865 (0.865) | 0.865 (0.865) | 0.865 (0.865) |
| average of systems | 0.718 (0.718) | 0.724 (0.723) | 0.724 (0.723) |
| worst system | 0.450 (0.450) | 0.450 (0.450) | 0.450 (0.450) |
| best baseline | 0.781 (0.780) | 0.781 (0.780) | 0.781 (0.780) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.755 (0.749) | 0.755 (0.749) | 0.755 (0.749) |
| average of systems | 0.599 (0.597) | 0.607 (0.605) | 0.607 (0.605) |
| worst system | 0.450 (0.450) | 0.450 (0.450) | 0.450 (0.450) |
| best baseline | 0.585 (0.585) | 0.601 (0.601) | 0.601 (0.601) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.865 (0.865) | 0.865 (0.865) | 0.865 (0.865) |
| average of systems | 0.790 (0.790) | 0.794 (0.794) | 0.794 (0.794) |
| worst system | 0.677 (0.677) | 0.687 (0.687) | 0.687 (0.687) |
| best baseline | 0.781 (0.780) | 0.781 (0.780) | 0.781 (0.780) |
| number of items in task: | 356 |
| | all senses | main senses only |
| polysemy: | 36 | 30 |
| | fine-grained | coarse-grained |
| entropy: | 3.696 | 3.531 |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| human | 0.974 (0.974) | 0.977 (0.977) | 0.978 (0.978) |
| best system | 0.753 (0.753) | 0.767 (0.767) | 0.775 (0.775) |
| average of systems | 0.628 (0.625) | 0.648 (0.645) | 0.657 (0.653) |
| worst system | 0.299 (0.299) | 0.357 (0.357) | 0.368 (0.368) |
| best baseline | 0.632 (0.632) | 0.640 (0.640) | 0.649 (0.649) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.660 (0.638) | 0.689 (0.666) | 0.698 (0.674) |
| average of systems | 0.531 (0.524) | 0.564 (0.556) | 0.574 (0.566) |
| worst system | 0.299 (0.299) | 0.357 (0.357) | 0.368 (0.368) |
| best baseline | 0.583 (0.581) | 0.609 (0.607) | 0.614 (0.612) |
| | fine-grained precision (recall) | mixed-grained precision (recall) | coarse-grained precision (recall) |
| best system | 0.753 (0.753) | 0.767 (0.767) | 0.775 (0.775) |
| average of systems | 0.686 (0.686) | 0.699 (0.698) | 0.706 (0.706) |
| worst system | 0.613 (0.610) | 0.631 (0.627) | 0.641 (0.638) |
| best baseline | 0.632 (0.632) | 0.640 (0.640) | 0.649 (0.649) |