We studied the frequency distribution of 10-bp oligonucleotides (decamers) in a 620-kb sample of viral genomes, comprising 102 sequences from GenBank, with the aim of detecting transcription control signals. Two thousand three hundred decamers had a frequency 10 times higher than the mean and were subjected to further statistical analysis. For each of these 2300 decamers (parents), we counted the individual frequencies of the 30 decamers that differ from the parent by a single base substitution (progeny; 10 positions × 3 alternative bases). We then calculated two variance/mean chi squares for the progeny, one with and one without the parent, and studied the distribution of the ratio between the two. Of the 2300 overrepresented decamers, 479 had a chi-square ratio of 1.9 or larger. In this final set, which corresponds to less than 0.05% of all possible decamers, 58 decamers were found to contain viral and eukaryotic transcription control elements, such as NF-kB, Sp1 and others. Furthermore, compared with 150 random sets bootstrapped from the same viral genomes, this set contains an excess of signals 5, 6, 7, 8, 9 and 10 bp long.
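The parent/progeny screen described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that k-mers are counted on the given strand only, that the "mean" frequency is the total number of decamer windows divided by all 4^10 possible decamers, that the variance/mean chi square is the index-of-dispersion statistic Σ(x − x̄)²/x̄, and that the ratio is taken as chi square *with* the parent over chi square *without* it. The function names (`count_kmers`, `one_mutants`, `dispersion_chi2`, `chi2_ratio`, `select_parents`) are hypothetical, and the bootstrap comparison against 150 random sets is not reproduced.

```python
from collections import Counter

BASES = "ACGT"
K = 10  # decamers


def count_kmers(seqs, k=K):
    """Count overlapping k-mers on the given strand (assumption), skipping non-ACGT windows."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if set(w) <= set(BASES):
                counts[w] += 1
    return counts


def one_mutants(kmer):
    """Return the 3*k 'progeny': sequences differing from kmer at exactly one position."""
    out = []
    for i, b in enumerate(kmer):
        for alt in BASES:
            if alt != b:
                out.append(kmer[:i] + alt + kmer[i + 1:])
    return out


def dispersion_chi2(xs):
    """Variance/mean (index of dispersion) chi square: sum((x - mean)^2) / mean."""
    m = sum(xs) / len(xs)
    if m == 0:
        return 0.0
    return sum((x - m) ** 2 for x in xs) / m


def chi2_ratio(parent, counts):
    """Assumed ratio: chi square of progeny + parent divided by chi square of progeny alone."""
    progeny = [counts[p] for p in one_mutants(parent)]  # Counter yields 0 for unseen k-mers
    without = dispersion_chi2(progeny)
    with_parent = dispersion_chi2(progeny + [counts[parent]])
    if without == 0:
        # Progeny perfectly uniform: any deviation of the parent makes the ratio unbounded.
        return float("inf") if with_parent > 0 else 1.0
    return with_parent / without


def select_parents(seqs, fold=10.0, min_ratio=1.9, k=K):
    """Keep k-mers at least `fold` times the mean frequency whose chi-square ratio >= min_ratio."""
    counts = count_kmers(seqs, k)
    mean = sum(counts.values()) / 4 ** k  # assumption: mean over all possible k-mers
    parents = [w for w, c in counts.items() if c >= fold * mean]
    ratios = {w: chi2_ratio(w, counts) for w in parents}
    return {w: r for w, r in ratios.items() if r >= min_ratio}
```

A decamer that is common while its one-base neighbours are rare inflates the chi square only when the parent is included, so a large ratio flags a sharply defined, overrepresented word rather than a broadly enriched sequence family. For scale, 4^10 = 1,048,576, so 479 decamers are about 0.046% of all possible decamers, consistent with the "less than 0.05%" figure.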