Considerations emerging from a frequency study of multiword units in a corpus of contemporary written Italian

Coffey, Stephen James
2003-01-01

Abstract

In this article we discuss the notion of corpus frequency as applied to multiword units. Corpus frequency data most commonly concern single word forms, mainly because such data are very easy to obtain. It would be much more useful, however, to have access to information on the relative frequency of all units of meaning or function in a given language. That is, frequency data should be available for both single-word and multiword units, and should be sense-differentiated where homonymy exists. Such information would contribute to the overall description of a language, support cross-corpus and cross-language comparisons, and provide a basis for other computational tasks. We discuss some of the major problems involved in drawing up frequency figures for multiword units, and then present a case study of how partial frequency data were obtained for a corpus of written Italian. We also make cross-corpus comparisons, notably with a typologically similar corpus of English.
2003
Cignoni, L.; Coffey, Stephen James
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/76664