In recent years, artificial intelligence in geosciences is spreading more and more, thanks to the availability of a large amount of data. In particular, the development of automatic raingauges networks allows to get rainfall data and makes these techniques effective, even if the performance of artificial intelligence models is a consequence of the coherency and quality of the input data. In this work, we intended to provide machine learning models capable of predicting rainfall data starting from the values of the nearest raingauges at one historic time point. Moreover, we investigated the influence of the anomalous input data on the prediction of rainfall data. We pursued these goals by applying machine learning models based on Linear Regression, LSTM and CNN architectures to several raingauges in Tuscany (central Italy). More than 75% of the cases show an R2 higher than 0.65 and a MAE lower than 4 mm. As expected, we emphasized a strong influence of the input data on the prediction capacity of the models. We quantified the model inaccuracy using the Pearson's correlation. Measurement anomalies in time series cause major errors in deep learning models. These anomalous data may be due to several factors such as temporary malfunctions of raingauges or weather conditions. We showed that, in both cases, the data-driven model features could highlight these situations, allowing a better management of the raingauges network and rainfall databases.

Machine learning models to complete rainfall time series databases affected by missing or anomalous data

Luppichini M.;Barsanti M.;Bini M.;Giannecchini R.
2023-01-01

Abstract

In recent years, artificial intelligence in geosciences is spreading more and more, thanks to the availability of a large amount of data. In particular, the development of automatic raingauges networks allows to get rainfall data and makes these techniques effective, even if the performance of artificial intelligence models is a consequence of the coherency and quality of the input data. In this work, we intended to provide machine learning models capable of predicting rainfall data starting from the values of the nearest raingauges at one historic time point. Moreover, we investigated the influence of the anomalous input data on the prediction of rainfall data. We pursued these goals by applying machine learning models based on Linear Regression, LSTM and CNN architectures to several raingauges in Tuscany (central Italy). More than 75% of the cases show an R2 higher than 0.65 and a MAE lower than 4 mm. As expected, we emphasized a strong influence of the input data on the prediction capacity of the models. We quantified the model inaccuracy using the Pearson's correlation. Measurement anomalies in time series cause major errors in deep learning models. These anomalous data may be due to several factors such as temporary malfunctions of raingauges or weather conditions. We showed that, in both cases, the data-driven model features could highlight these situations, allowing a better management of the raingauges network and rainfall databases.
2023
Lupi, A.; Luppichini, M.; Barsanti, M.; Bini, M.; Giannecchini, R.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/1214891
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 1
social impact