In this talk we will consider some of the basic principles underlying a speaker's choice of a particular type of referential expression to refer to an object in a shared multimodal domain of conversation. We will do so in line with Clark and Wilkes-Gibbs' principle of minimal cooperative effort and pay special attention to the amalgam of the processes of object reference and object identification. From the principle of minimal cooperative effort, we have formulated two hypotheses: first, speakers limit the number of potential target objects by making use of an assumed common focus of attention, and second, speakers try to include as few objects as possible as relata in the referential expression itself, either implicitly or explicitly. These two principles help, on the one hand, to limit the number of objects that have to be considered in order to find the target object (i.e., the object the speaker intends to refer to) and, on the other hand, to keep the referential expression as short as possible. Consequently, it takes less effort for the speaker to utter the expression and for the hearer to identify the target object. To test the two hypotheses, we conducted an empirical study in which pairs of subjects cooperatively carried out a simple block-building task. In the analysis, we concentrated on the part of the referential act that actually provides content information about the object to be identified, i.e., the entire referential act except the determiner and the gestures. Moreover, the analysis was restricted to references to target objects that appeared for the first time in the discourse. By means of the empirical study, we were able to show that a common focus of attention is not only a discourse-related phenomenon, but also a domain-related one.
We also found that if a target object was present in the current focus, the information used to refer to this object was indeed reduced: in more than half of the cases, the expression was ambiguous with respect to the domain of conversation, yet this caused no problems for identification by the hearer. However, another important finding from our study was that references to objects outside the focus area were significantly more redundant than references to objects inside the focus area. In other words, speakers included more information in references to objects out of focus than in references to objects in focus. With respect to the type of descriptive features, we did not find significant differences between in-focus and out-of-focus references. Limitations of the present study are mainly due to the type of referential acts that were studied and to the choice of the domain and the task that was carried out. Future research should be broadened to include non-initial referential acts, other tasks and domains, and other modalities of communication. Since the concepts introduced in this talk are basic properties of almost every human communication situation, we nevertheless expect the principles to be relevant for a broad field of applications in natural language generation.