The use of big data and survey data as covariates in area-level small area models
Marchetti, Stefano; Giusti, Caterina; Pratesi, Monica
2015-01-01
Abstract
The huge amounts of digital information about human activities produced nowadays by a wide range of high-throughput tools and technologies offer an objective description of human behaviour. These rich, large-scale datasets, often referred to as 'big data', generally cover a considerable portion of the population within a territory, often reaching nationwide or even worldwide coverage. Big data frequently provide information at a detailed geographical and temporal level (GPS data, mobile phone calls, Internet searches and networking), sometimes in real time (Giannotti et al. 2012). However, using big data to make inferences is challenging. Statistical agencies, on the other hand, can produce statistically sound estimates, but they cannot provide timely and accurate estimates at the local level. The idea introduced by Marchetti et al. (2015) is to combine small area estimation methods with big data to improve our ability to measure, monitor and predict social performance, well-being, deprivation, poverty, exclusion and inequality on a fine-grained spatial scale. The authors identify three possible approaches to the use of big data in the small area estimation framework:

1. Use big data sources to create local indicators and compare them with those obtained by small area estimation methods.
2. Use big data sources to generate new covariates for small area models.
3. Use survey data to check and remove the self-selection bias in the values of the indicators obtained from big data.

Here, we extend the second approach to allow for dependence between the auxiliary variables and the target variable in the measurement error area-level model used by Marchetti et al. (2015). This can be of interest when the auxiliary variables come from the same source as the target variable.
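The abstract does not write out the measurement error area-level model. The sketch below assumes it is a Fay-Herriot model with a covariate measured with error, in the style of Ybarra and Lohr (2008). It shows only that baseline. The dependence between auxiliary and target variables proposed here is not implemented. The following are all illustrative assumptions: one big-data covariate plus an intercept, known sampling variances `psi`, known covariate error variances `c`, unweighted moment estimators, and the function names `fit_me_fay_herriot` and `predict`.

```python
from typing import List, Tuple


def fit_me_fay_herriot(
    y: List[float], x_hat: List[float], psi: List[float], c: List[float]
) -> Tuple[float, float, float]:
    """Hypothetical helper: estimate (beta0, beta1, sigma2_v).

    Assumed model: y_i = beta0 + beta1 * x_i + v_i + e_i, where x_i is only
    observed as x_hat_i with error variance c_i, Var(e_i) = psi_i (known),
    and Var(v_i) = sigma2_v.
    """
    m = len(y)
    # Error-corrected moment matrix: sum_i (x_i x_i' - C_i), C_i = diag(0, c_i)
    s00 = float(m)
    s01 = sum(x_hat)
    s11 = sum(xh * xh - ci for xh, ci in zip(x_hat, c))
    # Right-hand side: sum_i x_i y_i
    r0 = sum(y)
    r1 = sum(xh * yi for xh, yi in zip(x_hat, y))
    det = s00 * s11 - s01 * s01
    if det <= 0:
        raise ValueError("corrected moment matrix is not positive definite")
    beta0 = (s11 * r0 - s01 * r1) / det
    beta1 = (s00 * r1 - s01 * r0) / det
    # Moment estimator of sigma2_v: remove sampling and covariate error, truncate at 0
    total = sum(
        (yi - beta0 - beta1 * xh) ** 2 - pi - beta1 ** 2 * ci
        for yi, xh, pi, ci in zip(y, x_hat, psi, c)
    )
    sigma2_v = max(total / (m - 2), 0.0)
    return beta0, beta1, sigma2_v


def predict(
    y: List[float],
    x_hat: List[float],
    psi: List[float],
    c: List[float],
    beta0: float,
    beta1: float,
    sigma2_v: float,
) -> List[float]:
    """Shrink each direct estimate towards its synthetic (regression) estimate."""
    preds = []
    for yi, xh, pi, ci in zip(y, x_hat, psi, c):
        # Covariate error adds beta1^2 * c_i to the model variance,
        # so the direct estimate gets more weight when c_i is large
        signal = sigma2_v + beta1 ** 2 * ci
        gamma = signal / (signal + pi)
        preds.append(gamma * yi + (1.0 - gamma) * (beta0 + beta1 * xh))
    return preds
```

As a usage example, take a single area with direct estimate 10, covariate 1, `psi = 1`, `beta0 = 0`, `beta1 = 2` and `sigma2_v = 1`. With an error-free covariate (`c = 0`), the weight on the direct estimate is 0.5. With `c = 1` it rises to 5/6, so the prediction moves towards the direct estimate. Intuitively, a noisy big-data covariate makes the synthetic part less reliable.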


