Distributional semantics in linguistic and cognitive research

Lenci, Alessandro

The hypothesis that word co-occurrence statistics extracted from text corpora can provide a basis for semantic representations has been attracting growing attention in both computational linguistics and cognitive science. The terms distributional, context-theoretic, corpus-based, or statistical can all be used (almost interchangeably) to qualify a rich family of approaches to semantics that share a “usage-based” perspective on meaning and assume that the statistical distribution of words in context plays a key role in characterizing their semantic behavior. Beyond this common core, many differences exist, depending on the specific mathematical and computational techniques, the type of semantic properties associated with text distributions, the definition of the linguistic context used to determine the combinatorial spaces of lexical items, etc. Yet, on closer inspection, the commonalities turn out to be greater than one might expect prima facie, and a general model of meaning can indeed be discerned behind the differences: a model that formulates specific hypotheses on the format of semantic representations and on the way they are built and processed by the human mind.