Machine Learning Models for Regional Photovoltaic Power Generation Forecasting with Limited Plant-Specific Data
Tucci M.; Thomopulos D.
2024-01-01
Abstract
Predicting electricity production from renewable energy sources, such as solar photovoltaic installations, is crucial for effective grid management and energy planning in the transition towards a sustainable future. This study proposes machine learning approaches for predicting electricity production from solar photovoltaic installations at a regional level in Italy, without using data on individual installations. To address the mismatch between point-wise meteorological inputs and power data aggregated over entire regions, we propose using meteorological data from the centroid of each Italian province within each region. Particular attention is given to the selection of the best input features, which leads to augmenting the input with 1-hour-lagged meteorological data and previous-hour power data. Several ML approaches were compared, with hyperparameters optimized through five-fold cross-validation. The hourly predictions cover a time horizon ranging from 1 to 24 h. Among the tested methods, Kernel Ridge Regression and Random Forest Regression emerge as the most effective models for our specific application. We also performed experiments to assess how frequently the models should be retrained and how frequently the hyperparameters should be optimized in order to compromise between accuracy and computational costs. Our results indicate that, once trained, the model can provide accurate predictions for extended periods without frequent retraining, highlighting its long-term reliability.
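The feature augmentation and hyperparameter search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The helper names `build_lagged_features` and `fit_kernel_ridge`, the RBF kernel, the grid values, the input scaling, the plain unshuffled K-fold split, and the synthetic data are all assumptions made for illustration. The sketch covers a single 1-hour step only, not the full 1 to 24 h horizon, and it assumes scikit-learn is available.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_lagged_features(weather, power, lag=1):
    """Stack weather at hour t, weather at hour t - lag, and power at hour t - lag.

    weather: array of shape (n_hours, n_weather_features)
    power:   array of shape (n_hours,)
    Returns X of shape (n_hours - lag, 2 * n_weather_features + 1)
    and the aligned target y of shape (n_hours - lag,).
    """
    weather = np.asarray(weather, dtype=float)
    power = np.asarray(power, dtype=float)
    X = np.hstack([
        weather[lag:],        # meteorological inputs at hour t
        weather[:-lag],       # lagged meteorological inputs
        power[:-lag, None],   # previous-hour regional power
    ])
    y = power[lag:]           # regional power at hour t
    return X, y


def fit_kernel_ridge(X, y):
    """Tune Kernel Ridge Regression with five-fold cross-validation.

    The grid below is hypothetical; the source does not list its search space.
    """
    model = make_pipeline(StandardScaler(), KernelRidge(kernel="rbf"))
    grid = {
        "kernelridge__alpha": [0.01, 0.1, 1.0],
        "kernelridge__gamma": [0.01, 0.1],
    }
    search = GridSearchCV(model, grid, cv=5, scoring="neg_mean_squared_error")
    search.fit(X, y)
    return search


# Synthetic stand-in for one region: ten days of hourly data (not real measurements).
rng = np.random.default_rng(0)
n_hours = 240
hours = np.arange(n_hours)
irradiance = np.clip(np.sin(2 * np.pi * (hours % 24 - 6) / 24), 0, None)
temperature = 15 + 5 * irradiance + rng.normal(0, 0.5, n_hours)
weather = np.column_stack([irradiance, temperature])
power = 100 * irradiance + rng.normal(0, 2, n_hours)

X, y = build_lagged_features(weather, power, lag=1)
search = fit_kernel_ridge(X, y)
print(search.best_params_)
```

Swapping `KernelRidge` for `sklearn.ensemble.RandomForestRegressor`, with a matching parameter grid, would give the second model family named in the abstract. In a regional setting, `weather` would presumably hold one group of columns per province centroid.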