Semantic Representations of Near-Synonyms for Automatic Lexical Choice

Philip Edmonds, Sharp Laboratories of Europe Ltd

ABSTRACT

If we are ever to have faithful machine translation systems and articulate natural language generation systems, we will need a sophisticated and principled model of lexical choice. One problem is that a language will often provide many near-synonyms for a word in another language, and these differ only in fine-grained nuances of denotation, style, expressed attitude, and usage. In this talk I will discuss a clustered model of lexical knowledge that explicitly represents the differences between near-synonyms. The near-synonyms in a cluster share the same core denotation, represented by means of an ontology of language-neutral concepts, and are differentiated subconceptually in terms of peripheral concepts (also defined in the ontology), stylistic features, and expressed attitude. I will also discuss a lexical choice process that uses the model. During lexical choice, the core denotation is used as a necessary applicability condition, activating all of the near-synonyms in the cluster. The most appropriate near-synonym, however, is the one whose peripheral concepts and stylistic features most closely match a set of preferences for expressing certain concepts in certain manners or for using certain styles, and that obeys constraints imposed by other aspects of sentence planning.
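
The two-stage choice process described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the class and function names (NearSynonym, Cluster, Preferences, score, choose), the additive scoring scheme, the concept name "Generic-Error", and the numeric formality values are all hypothetical. The sketch only shows the general shape of the idea: the core denotation activates a whole cluster, and the preferences then rank its members, subject to an external constraint.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class NearSynonym:
    word: str
    peripheral: set = field(default_factory=set)  # peripheral concepts it can express
    style: dict = field(default_factory=dict)     # stylistic features, values in [0, 1]
    attitude: str = "neutral"                     # expressed attitude


@dataclass
class Cluster:
    core: str       # core denotation: name of a language-neutral ontology concept
    members: list   # the near-synonyms sharing that core denotation


@dataclass
class Preferences:
    concepts: set = field(default_factory=set)    # peripheral concepts to express
    style: dict = field(default_factory=dict)     # desired stylistic feature values
    attitude: Optional[str] = None                # desired attitude, if any


def score(syn: NearSynonym, prefs: Preferences) -> float:
    """Hypothetical additive match between a near-synonym and the preferences."""
    total = float(len(syn.peripheral & prefs.concepts))
    for feature, target in prefs.style.items():
        # Unknown features default to a mid value of 0.5 (an assumption).
        total += 1.0 - abs(syn.style.get(feature, 0.5) - target)
    if prefs.attitude is not None and syn.attitude == prefs.attitude:
        total += 1.0
    return total


def choose(concept: str, clusters: list, prefs: Preferences,
           allowed: Callable[[NearSynonym], bool] = lambda syn: True) -> Optional[str]:
    # Stage 1: the core denotation is a necessary applicability condition;
    # it activates every near-synonym in the matching cluster.
    candidates = [syn for c in clusters if c.core == concept
                  for syn in c.members if allowed(syn)]
    if not candidates:
        return None
    # Stage 2: pick the candidate that best matches the preferences.
    return max(candidates, key=lambda syn: score(syn, prefs)).word


# Illustrative data only; the feature values are invented for this sketch.
error_cluster = Cluster(core="Generic-Error", members=[
    NearSynonym("error", style={"formality": 0.7}),
    NearSynonym("mistake", peripheral={"misconception"}, style={"formality": 0.5}),
    NearSynonym("blunder", peripheral={"stupidity"}, style={"formality": 0.3},
                attitude="pejorative"),
])
clusters = [error_cluster]

pejorative = Preferences(concepts={"stupidity"}, attitude="pejorative")
formal = Preferences(style={"formality": 0.9})

print(choose("Generic-Error", clusters, pejorative))
print(choose("Generic-Error", clusters, formal))
```

The `allowed` predicate stands in for constraints imposed by other aspects of sentence planning, for example a collocational restriction that rules out one candidate. A real system would need a far richer representation of peripheral concepts than flat sets of names.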