First, I will present a learning-theoretic account of the major statistical approaches to learning in natural language and explain why they work even though the assumptions they rest on typically do not hold in the data. Then I will present our own SNoW learning architecture and discuss how it supports large-scale inference problems. The emphasis is on a learning architecture and algorithms that tolerate high-dimensional data, support relational knowledge representations, and allow additional knowledge to be incorporated into the process. The approach will be illustrated with experimental evidence from a diverse collection of language-understanding tasks.