High-quality, haplotype-phased de novo assembly of the highly heterozygous fig genome, a major genetic resource for fig breeding.

G. Usai;F. Mascagni;T. Giordani;A. Vangelisti;E. Bosi;A. Zuccolo;A. Cavallini;L. Natali
2019-01-01

Abstract

The genome assembly of allogamous perennial species can be very challenging because of their high heterozygosity and repeat content. In fruit trees, many important phenotypic traits of a specific genotype depend on its heterozygosity, which is maintained by widespread clonal propagation. The fig tree (Ficus carica L.) has great potential for expansion thanks to its valuable nutritional and nutraceutical characteristics, combined with its ability to adapt well to marginal soils and difficult environmental conditions. However, the fig is still poorly characterized at the genomic level, and only a preliminary genome sequence (of the Japanese cv. Horaishi) has been released. Here we report a high-quality de novo assembly of the typical Italian fig cultivar Dottato, obtained by single-molecule, real-time (SMRT) sequencing. PacBio reads (with an average length of 12,364 nt, corresponding to about 74 genome equivalents) allowed us to achieve sequence contiguity and resolve the repetitive component. The assembly, of approximately 333 Mb with an N50 of 823 kb, was haplotype-phased using FALCON-Unzip and comprises 905 sequences, 407 of which were arranged in 13 chromosome-related pseudomolecules. This new reference genome improves on the N50 of the previous short-read-based fig assembly by about 5-fold. A curated genome annotation identified 37,840 protein-coding genes and 1,685 non-coding genes. Furthermore, we found that repetitive sequences account for 37.39% of the assembly. Finally, genome-wide analysis of N6-methyladenine and N4-methylcytosine DNA modifications, through SMRT sequencing, provided insight into the epigenetic profiles at the gene and repeat levels.
The production of a high-quality, haplotype-phased reference genome sequence of fig offers interesting insights into the genomic structure of this species. It opens great opportunities for speeding up the development of new cultivars and for applying genome editing to this species, a new technology that seems especially suitable for changing the specific traits currently limiting the success of this ancient species.
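
For readers unfamiliar with the N50 statistic quoted in the abstract, the following is a minimal sketch of how it is commonly computed: the sequences are sorted from longest to shortest, and N50 is the length of the sequence at which the running total first reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the paper and do not reproduce the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Standard definition (not the authors' code): the length L such that
    sequences of length >= L together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Toy contig lengths, purely illustrative (not the fig assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

A higher N50 at a similar total size means the assembly is split into fewer, longer pieces, which is the sense in which the abstract compares the new assembly with the earlier short-read one.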

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1028827