ITRI SEMINAR
Thursday 13th November at 12.00

A corpus-based study of anaphora in dialogues in English and Portuguese

Marco Rochas
University of Sussex

The talk describes a study of anaphora in English and Portuguese face-to-face conversations. The data came from two sources: the London-Lund Corpus for the English dialogues, and the Rio de Janeiro Corpus of Clinical Dialogues for the Portuguese dialogues, which was collected as part of this research.

The approach relies on the manual annotation of a substantial number of anaphora cases (around three thousand for each language) for four properties, which will be described in the talk. Once the required number of cases had been analysed, a probabilistic model was built by linking the categories of the four properties into a probability tree based on their aggregate co-occurrence. The results were summarised in an antecedent-likelihood theory. This theory draws on these probabilities and on regularities observed in the immediate context to describe anaphoric phenomena through patterns for recognition and resolution. The theory was tested manually on a previously annotated dialogue. The statistical analysis and its results will be discussed in the talk.

The same process was then repeated for the Portuguese data. It generated a probabilistic model and an antecedent-likelihood theory with distinctive features, which will be analysed contrastively. The talk concludes with a discussion of possible developments for natural language processing.