Pre-trained Models Based on Primary Sequence to Classify Antibody Binding to Protein and Non-protein Targets with 80% Accuracy
Joubbi, Sara; Micheli, Alessio; Milazzo, Paolo
2025-01-01
Abstract
Monoclonal antibodies provide targeted treatment options for various diseases. In infectious diseases, techniques such as reverse vaccinology 2.0 can isolate potent monoclonal antibodies from human donors. However, these techniques give no control over the specific target to which an antibody binds. It is therefore crucial to understand the characteristics of the antibody sequence well enough to extract target information and prioritize candidates for further in vitro analysis. Addressing this challenge requires integrating in vitro and in silico approaches. This study explores machine learning algorithms and several augmentation techniques to determine, from sequence alone, whether a monoclonal antibody binds to a protein or a non-protein target. We used oversampling, SMOTE, ESM-1b, and hallucination techniques to enhance the sequence information. To the best of our knowledge, this is the first attempt to apply machine learning to this specific classification task; it achieves an accuracy of 80% and a Matthews correlation coefficient (MCC) of 0.61 (reported as 61%). These findings pave the way for further exploration and introduce a new approach to streamlining antibody development.
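The abstract reports MCC alongside accuracy, and the use of oversampling and SMOTE suggests the two classes are imbalanced. The sketch below is illustrative only and is not the authors' code: it computes both metrics from binary labels and shows, on made-up toy labels (1 = protein target, 0 = non-protein target, both hypothetical), how a classifier that always predicts the majority class can reach 80% accuracy while its MCC is 0. MCC ranges from -1 to 1, so a value written as 61% corresponds to 0.61.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of predictions that match the true label."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy data: 8 protein binders (1) and 2 non-protein binders (0).
y_true = [1] * 8 + [0] * 2

# A trivial classifier that always predicts the majority class.
majority_pred = [1] * 10
majority_counts = confusion_counts(y_true, majority_pred)

# A perfect classifier, for comparison.
perfect_counts = confusion_counts(y_true, y_true)

print("majority: acc=%.2f mcc=%.2f" % (accuracy(*majority_counts), mcc(*majority_counts)))
print("perfect:  acc=%.2f mcc=%.2f" % (accuracy(*perfect_counts), mcc(*perfect_counts)))
```

On the toy data, accuracy alone does not distinguish a useful classifier from a trivial one. MCC does, because it uses all four cells of the confusion matrix.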


