PaTre: A METHOD FOR PARALOGY TREE CONSTRUCTION

MARANGONI, ROBERTO; FERRAGINA, PAOLO; FRANGIONI, ANTONIO; PISANTI, NADIA
2001-01-01

Abstract

Genes belonging to the same organism are called paralogs when their sequences show significant similarity, even if they have different biological functions. An emerging biological paradigm holds that the families of paralogs in a genome derive from a mechanism of gene duplication-with-modification, repeated many times in the history of the organism. This paradigm could underlie the increase in organismal complexity observed during evolution. To understand how such a process could have taken place, it is necessary to place the paralogs belonging to the same family in a tree that describes the history of their appearance in the genome: a paralogy tree. Here we present a method, called PaTre, which generates paralogy trees from a family of genes given as input. The reliability of the inferential process has been tested by means of a simulator that implements different hypotheses on the duplication-with-modification paradigm. The simulator receives a sequence as input and generates copies of it, which are modified according to probability distributions derived from statistical genomics. These sequences are then used to test the robustness of PaTre. The experimental results show that PaTre constructs a set of paralogy trees which always contains the correct one. The size of this set is related to the completeness of the input set of sequences; in particular, when the input set is complete, PaTre constructs very few paralogy trees. A user could exploit this property to measure the incompleteness of an input set of sequences. The robustness and biological applications of PaTre will be discussed.
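
The duplication-with-modification simulator described in the abstract can be illustrated with a minimal sketch. This is not the authors' simulator: the function names (`mutate`, `simulate_family`), the DNA alphabet, and the uniform per-site substitution rate are all assumptions made for illustration, whereas the abstract states that the real modifications follow probability distributions derived from statistical genomics. The sketch also assumes that only the new copy diverges at each duplication. It records which gene each copy was duplicated from, which is the "correct" paralogy tree that an inference method would be tested against.

```python
import random

ALPHABET = "ACGT"  # assumption: nucleotide sequences


def mutate(seq, rate, rng):
    """Return a copy of seq in which each site is substituted with probability `rate`.

    Assumption: uniform substitutions only; the original simulator uses
    distributions derived from statistical genomics instead.
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            # Replace with a different symbol chosen uniformly at random.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_family(ancestor, n_duplications, rate=0.05, seed=0):
    """Grow a gene family by repeated duplication-with-modification.

    Returns (genes, parent): genes[i] is the i-th sequence, and parent[i]
    is the index of the gene it was duplicated from (None for the ancestor).
    The parent map encodes the true paralogy tree of the simulated family.
    """
    rng = random.Random(seed)
    genes = [ancestor]
    parent = {0: None}
    for _ in range(n_duplications):
        src = rng.randrange(len(genes))       # pick an existing gene to duplicate
        genes.append(mutate(genes[src], rate, rng))  # the new copy diverges
        parent[len(genes) - 1] = src
    return genes, parent


if __name__ == "__main__":
    family, tree = simulate_family("ACGT" * 10, n_duplications=5, seed=1)
    for i, g in enumerate(family):
        print(i, tree[i], g)
```

Removing some sequences from `genes` before passing them to an inference method would mimic an incomplete input set, the situation in which the abstract reports that the set of candidate trees grows.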

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/191451