GENEOnet: a breakthrough in protein binding pocket detection using group equivariant non-expansive operators
Frosini, Patrizio; Biswas, Akash Deep
2025-01-01
Abstract
Structure-based virtual screening approaches such as molecular docking rely on the accurate identification and precise calculation of binding pockets in order to search efficiently for potential ligands. In this paper, we introduce GENEOnet, a machine learning model for volumetric protein pocket detection that employs Group Equivariant Non-Expansive Operators (GENEOs). These operators reduce model complexity and allow domain knowledge to be built in: for each operator, one selects the physical and chemical properties it focuses on and how it should respond to them. Compared with other methods in this field, GENEOnet has fewer model parameters, which lowers training costs, and offers greater explainability, since its parameters can be easily interpreted. GENEOnet processes the empty space within a protein by converting it into a 3D grid of uniform blocks, known as 'voxels'. It then identifies regions of the grid whose output value is above a threshold, producing a list of predicted pockets ranked by the model's average output value. Our experimental results show that GENEOnet performs robustly even with a small training dataset of 200 proteins and surpasses other established state-of-the-art methods on various metrics. Specifically, on our PDBbind test set, the score measuring the probability that the top-ranked pocket is the correct one is 0.764 for GENEOnet, compared to 0.702 for P2Rank, the next best performing algorithm. Moreover, a case study considering various ABL1 kinase conformations demonstrates excellent agreement between GENEOnet's predictions and the experimental sites. GENEOnet is available as a web service at https://geneonet.exscalate.eu, where users can access the pre-trained model for detecting and ranking protein cavities.
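The thresholding and ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `find_pockets`, the dictionary-based grid, the 6-neighbour connectivity rule, and the threshold value of 0.5 are all assumptions made only for illustration. The sketch keeps voxels whose output value is above the threshold, groups adjacent voxels into candidate pockets, and ranks the pockets by their average output value.

```python
from collections import deque

# Hypothetical helper, not part of GENEOnet: groups above-threshold voxels
# into connected regions and ranks them by mean output value.
def find_pockets(grid, threshold):
    """grid maps (i, j, k) voxel indices to a model output value."""
    # Keep only voxels whose output is above the threshold.
    active = {voxel for voxel, value in grid.items() if value > threshold}
    # Assumption: two voxels belong to the same pocket if they share a face.
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    pockets = []
    for start in sorted(active):
        if start in seen:
            continue
        # Breadth-first search collects one connected region.
        seen.add(start)
        queue = deque([start])
        region = []
        while queue:
            i, j, k = queue.popleft()
            region.append((i, j, k))
            for di, dj, dk in steps:
                nxt = (i + di, j + dj, k + dk)
                if nxt in active and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        mean_output = sum(grid[v] for v in region) / len(region)
        pockets.append((mean_output, sorted(region)))
    # Rank pockets from highest to lowest average output value.
    pockets.sort(key=lambda pocket: pocket[0], reverse=True)
    return pockets

# Toy grid with made-up output values (not real model output).
grid = {
    (0, 0, 0): 0.9, (0, 0, 1): 0.8, (0, 0, 2): 0.1,
    (3, 3, 3): 0.95, (3, 3, 4): 0.2,
}
pockets = find_pockets(grid, threshold=0.5)
for rank, (score, voxels) in enumerate(pockets, start=1):
    print(rank, round(score, 3), voxels)
```

In this toy grid the single voxel at (3, 3, 3) is ranked first because its average output is higher than that of the two-voxel region, which shows that the ranking depends on the mean value rather than on pocket size.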


