Improving hydrothermal carbonization prediction by machine learning: towards a more accurate and less expensive process modelling through data augmentation
Bartolomeo Cosenza
2025-01-01
Abstract
Modelling hydrothermal carbonization (HTC) with machine learning requires large, high-quality datasets to train the predictive algorithms adequately. For energy-intensive processes such as HTC, data collection can be very expensive and time-consuming, so predictive models are usually trained on small datasets collected from the literature. To overcome this limitation, we introduced controlled Gaussian noise into experimentally obtained training datasets to expand their size significantly without distorting their fundamental properties. Unlike the current HTC modelling approach and other data augmentation techniques, this novel method yields large datasets that are homogeneous in terms of consistency and reliability, leading to more robust predictions while maintaining a realistic representation of the process. After data augmentation, the developed models, based on an artificial neural network (ANN) and a support vector machine (SVM), performed under more realistic conditions and benefited greatly from training on the enlarged datasets. The ANN exhibited superior predictive capabilities compared to the SVM, with a reduction in Mean Square Error greater than 90 %. Mean Absolute Percentage Errors were below 5 % for the ANN and in the range 6–15 % for the SVM. The proposed approach will help to enhance the algorithms' predictive power while requiring only limited experimental data, thereby reducing research time and costs.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
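
As an illustration of the noise-based augmentation described in the abstract, the following Python sketch adds zero-mean Gaussian noise to a small table of experimental runs and defines the two reported error metrics. It is not the author's implementation: the function names, the `copies` and `noise_fraction` parameters, the choice to scale the noise by each column's standard deviation, the decision to perturb the target column as well as the inputs, and the sample values are all assumptions made for this example.

```python
import random
import statistics


def augment_with_gaussian_noise(rows, copies=10, noise_fraction=0.02, seed=0):
    """Return the original rows plus `copies` noisy replicas of each row.

    Assumption: each column receives zero-mean Gaussian noise whose standard
    deviation is `noise_fraction` times that column's sample standard
    deviation, so perturbations stay small relative to the data spread.
    """
    rng = random.Random(seed)
    columns = list(zip(*rows))
    sigmas = [noise_fraction * statistics.stdev(col) for col in columns]
    augmented = [list(row) for row in rows]  # keep the measured rows unchanged
    for row in rows:
        for _ in range(copies):
            augmented.append(
                [value + rng.gauss(0.0, sigma) for value, sigma in zip(row, sigmas)]
            )
    return augmented


def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Mean absolute relative error, in percent (targets must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical HTC runs: temperature (°C), residence time (h), hydrochar yield (%).
# These numbers are placeholders, not data from the paper.
experimental_runs = [
    [180.0, 1.0, 65.0],
    [200.0, 2.0, 58.0],
    [220.0, 4.0, 52.0],
    [250.0, 6.0, 45.0],
]

augmented_runs = augment_with_gaussian_noise(experimental_runs)
```

With these settings, 4 measured rows become 44 training rows. In practice the noise level would need to be tuned so that the augmented data keep the statistical properties of the measurements, and the models should be evaluated on unperturbed experimental data.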


