Air pollution and hospital admissions: Looking for spatial-time association by technologies of knowledge discovery and delivery (KDD) in data base.
VIGOTTI, MARIA ANGELA;
2006-01-01
Abstract
Introduction: Time-series and case-crossover analyses have been used in recent years to establish the association between PM10 and hospital admissions (H.A.). We tested new analytical techniques by pooling data sets on air pollution and health events that were available from previous epidemiological studies. We applied the process of Knowledge Discovery and Delivery (KDD), which tries to obtain “useful” knowledge from “raw” data. Analyses are driven either by multidimensional techniques (OLAP tools) or by Data Mining algorithms, using specific software to obtain models that transform information into knowledge.
Methods: A Data Warehouse, organized by H.A., was built with SQL Server from heterogeneous data sources, such as the daily number of H.A., daily levels of air pollutants, anonymous individual data from the General Registry Office, and results of the 1991 census by census unit. For the OLAP analyses, ad hoc indexes were defined to highlight the results of the aggregate functions of the multidimensional analyses. A comparison of different Data Mining tools (Weka, Clementine 6.5, and KDDML) identified Clementine 6.5 as the most suitable for mining the available data.
Results: OLAP analyses outlined possible associations between H.A. for digestive diseases and housing quality, and between H.A. for respiratory diseases and PM10 levels and meteorological variables on the same and previous days, as shown in the table. Data Mining produced a cluster showing that on January/February days characterized by cool/very cool temperature, high barometric pressure, high values of SO2, NO2, and CO, high/very high PM10 levels, and low ozone, the percentage of respiratory H.A. is higher (7.6%) than in the average distribution (5.7%). Another cluster shows a higher percentage of H.A. for digestive diseases in census units with a higher number of houses without drinking water.
Discussion and Conclusions: These analyses confirm results obtained by epidemiological analyses. Given the increasing number of computerized data sets from routinely collected data, these methods could be a useful tool in environmental descriptive epidemiology for identifying new hypotheses to investigate.