Background: The haplotypes of the X chromosome are accessible to direct count in males, whereas the diplotypes of the females may be inferred knowing the haplotype of their sons or fathers. Here, we investigated: 1) the possible large-scale haplotypic structure of the X chromosome in a Caucasian population sample, given the single-nucleotide polymorphism ( SNP) maps and genotypes provided by Illumina and Affimetrix for Genetic Analysis Workshop 14, and, 2) the performances of widely used programs in reconstructing haplotypes from population genotypic data, given their known distribution in a sample of unrelated individuals. Results: All possible unrelated mother-son pairs of Caucasian ancestry ( N = 104) were selected from the 143 families of the Collaborative Study on the Genetics of Alcoholism pedigree files, and the diplotypes of the mothers were inferred from the X chromosomes of their sons. The marker set included 313 SNPs at an average density of 0.47 Mb. Linkage disequilibrium between pairs of markers was computed by the parameter D', whereas for measuring multilocus disequilibrium, we developed here an index called D*, and applied it to all possible sliding windows of 5 markers each. Results showed a complex pattern of haplotypic structure, with regions of low linkage disequilibrium separated by regions of high values of D*. The following programs were evaluated for their accuracy in inferring population haplotype frequencies: 1) ARLEQUIN 2.001; 2) PHASE 2.1.1; 3) SNPHAP 1.1; 4) HAPLOBLOCK 1.2; 5) HAPLOTYPER 1.0. Performances were evaluated by Pearson correlation ( r) coefficient between the true and the inferred distribution of haplotype frequencies. Conclusion: The SNP haplotypic structure of the X chromosome is complex, with regions of high haplotype conservation interspersed among regions of higher haplotype diversity. All the tested programs were accurate ( r = 1) in reconstructing the distribution of haplotype frequencies in case of high D* values. However, only the program PHASE realized a high correlation coefficient ( r > 0.7) in conditions of low linkage disequilibrium.
Haplotypic structure of the X chromosome in the COGA population sample and the quality of its reconstruction by extant software packages
PRESCIUTTINI, SILVANO
2005-01-01
Abstract
Background: The haplotypes of the X chromosome are accessible to direct count in males, whereas the diplotypes of the females may be inferred knowing the haplotype of their sons or fathers. Here, we investigated: 1) the possible large-scale haplotypic structure of the X chromosome in a Caucasian population sample, given the single-nucleotide polymorphism ( SNP) maps and genotypes provided by Illumina and Affimetrix for Genetic Analysis Workshop 14, and, 2) the performances of widely used programs in reconstructing haplotypes from population genotypic data, given their known distribution in a sample of unrelated individuals. Results: All possible unrelated mother-son pairs of Caucasian ancestry ( N = 104) were selected from the 143 families of the Collaborative Study on the Genetics of Alcoholism pedigree files, and the diplotypes of the mothers were inferred from the X chromosomes of their sons. The marker set included 313 SNPs at an average density of 0.47 Mb. Linkage disequilibrium between pairs of markers was computed by the parameter D', whereas for measuring multilocus disequilibrium, we developed here an index called D*, and applied it to all possible sliding windows of 5 markers each. Results showed a complex pattern of haplotypic structure, with regions of low linkage disequilibrium separated by regions of high values of D*. The following programs were evaluated for their accuracy in inferring population haplotype frequencies: 1) ARLEQUIN 2.001; 2) PHASE 2.1.1; 3) SNPHAP 1.1; 4) HAPLOBLOCK 1.2; 5) HAPLOTYPER 1.0. Performances were evaluated by Pearson correlation ( r) coefficient between the true and the inferred distribution of haplotype frequencies. Conclusion: The SNP haplotypic structure of the X chromosome is complex, with regions of high haplotype conservation interspersed among regions of higher haplotype diversity. All the tested programs were accurate ( r = 1) in reconstructing the distribution of haplotype frequencies in case of high D* values. However, only the program PHASE realized a high correlation coefficient ( r > 0.7) in conditions of low linkage disequilibrium.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.