Long-read assembly of the Chinese rhesus macaque genome and identification of ape-specific structural variants

2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Yaoxi He ◽  
Xin Luo ◽  
Bin Zhou ◽  
Ting Hu ◽  
Xiaoyu Meng ◽  
...  

Abstract We present a high-quality de novo genome assembly (rheMacS) of the Chinese rhesus macaque (Macaca mulatta) using long-read sequencing and multiplatform scaffolding approaches. Compared to the current Indian rhesus macaque reference genome (rheMac8), rheMacS increases sequence contiguity 75-fold, closing 21,940 of the remaining assembly gaps (60.8 Mbp). We improve gene annotation by generating more than two million full-length transcripts from ten different tissues by long-read RNA sequencing. We sequence resolve 53,916 structural variants (96% novel) and identify 17,000 ape-specific structural variants (ASSVs) based on comparison to ape genomes. Many ASSVs map within ChIP-seq predicted enhancer regions where apes and macaque show diverged enhancer activity and gene expression. We further characterize a subset that may contribute to ape- or great-ape-specific phenotypic traits, including taillessness, brain volume expansion, improved manual dexterity, and large body size. The rheMacS genome assembly serves as an ideal reference for future biomedical and evolutionary studies.

2019 ◽  
Author(s):  
Yaoxi He ◽  
Xin Luo ◽  
Bin Zhou ◽  
Ting Hu ◽  
Xiaoyu Meng ◽  
...  

Abstract Rhesus macaque (Macaca mulatta) is a widely studied nonhuman primate. Here we present a high-quality de novo genome assembly of the Chinese rhesus macaque (rheMacS) using long-read sequencing and multiplatform scaffolding approaches. Compared to the current Indian rhesus macaque reference genome (rheMac8), the rheMacS genome assembly improves sequence contiguity by 75-fold, closing 21,940 of the remaining assembly gaps (60.8 Mbp). To improve gene annotation, we generated more than two million full-length transcripts from ten different tissues by long-read RNA sequencing. We sequence resolve 53,916 structural variants (96% novel) and identify 17,000 ape-specific structural variants (ASSVs) based on comparison to the long-read assembly of ape genomes. We show that many ASSVs map within ChIP-seq predicted enhancer regions where apes and macaque show diverged enhancer activity and gene expression. We further characterize a set of candidate ASSVs that may contribute to ape- or great-ape-specific phenotypic traits, including taillessness, brain volume expansion, improved manual dexterity, and large body size. This improved rheMacS genome assembly serves as an ideal reference for future biomedical and evolutionary studies.


Author(s):  
Hailin Liu ◽  
Shigang Wu ◽  
Alun Li ◽  
Jue Ruan

Long-read single-molecule sequencing (SMS) has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. It has also been widely used to study structural variants, phase haplotypes and more. Here, we introduce SMARTdenovo, an SMS assembler that follows the overlap-layout-consensus (OLC) paradigm. SMARTdenovo (RRID: SCR_017622) was designed to be a fast assembler that, unlike other contemporaneous SMS assemblers, did not require highly accurate raw reads for error correction. It has performed well for evaluating congeneric assemblers and has been successful for a variety of assembly projects. It is compatible with Canu for assembling high-quality genomes, and several of the assembly strategies in this program have been incorporated into subsequent popular assemblers. The assembler has been in use since 2015, and here we provide information on the development of SMARTdenovo and how to implement its algorithms in current projects.
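The overlap-layout-consensus idea named in this abstract can be illustrated with a toy sketch. The code below is not SMARTdenovo's algorithm: it is a simple greedy merge on error-free reads, with hypothetical function names (`overlap_len`, `greedy_assemble`) and invented reads. It finds exact suffix-prefix overlaps, then repeatedly joins the best-overlapping pair (a crude layout step). Because the reads are exact, the consensus step is trivial here; real assemblers must handle sequencing errors, contained reads, and repeats.

```python
from itertools import permutations


def overlap_len(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_a, best_b = 0, None, None
        # Overlap step: score every ordered pair of current sequences.
        for a, b in permutations(contigs, 2):
            k = overlap_len(a, b, min_len)
            if k > best_k:
                best_k, best_a, best_b = k, a, b
        if best_k == 0:
            break  # no remaining overlaps; leave the contigs separate
        # Layout step: join the best pair, keeping the shared bases once.
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_k:])
    return contigs


# Invented, error-free toy reads that tile one short sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

With these three reads, each adjacent pair shares a 4-base overlap, so the sketch collapses them into a single contig.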


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yu Chen ◽  
Yixin Zhang ◽  
Amy Y. Wang ◽  
Min Gao ◽  
Zechen Chong

Abstract Long-read de novo genome assembly continues to advance rapidly. However, there is a lack of effective tools to accurately evaluate the assembly results, especially for structural errors. We present Inspector, a reference-free long-read de novo assembly evaluator which faithfully reports types of errors and their precise locations. Notably, Inspector can correct the assembly errors based on consensus sequences derived from raw reads covering erroneous regions. Based on in silico and long-read assembly results from multiple long-read data and assemblers, we demonstrate that in addition to providing generic metrics, Inspector can accurately identify both large-scale and small-scale assembly errors.


Genes ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 1359
Author(s):  
Esther Camacho ◽  
Sandra González-de la Fuente ◽  
Jose C. Solana ◽  
Alberto Rastrojo ◽  
Fernando Carrasco-Ramiro ◽  
...  

Leishmania major is the main causative agent of cutaneous leishmaniasis in humans. The Friedlin strain of this species (LmjF) was chosen when a multi-laboratory consortium undertook the objective of deciphering the first genome sequence for a parasite of the genus Leishmania. The objective was successfully attained in 2005, and this represented a milestone for Leishmania molecular biology studies around the world. Although the LmjF genome sequence was obtained following a shotgun strategy and using classical Sanger sequencing, the results were excellent, and this genome assembly served as the reference for subsequent genome assemblies in other Leishmania species. Here, we present a new assembly for the genome of this strain (named LMJFC for clarity), generated by the combination of two high-throughput sequencing platforms: Illumina short-read sequencing and PacBio Single Molecule, Real-Time (SMRT) sequencing, which provides long-read sequences. Apart from resolving uncertain nucleotide positions, several genomic regions were reorganized and a more precise composition of tandemly repeated gene loci was attained. Additionally, the genome annotation was improved by adding 542 genes and defining more accurate coding sequences for around two hundred genes, based on the transcriptome delimitation also carried out in this work. As a result, we are providing gene models (including untranslated regions and introns) for 11,238 genes. Genomic information ultimately determines the biology of every organism; therefore, our understanding of molecular mechanisms will depend on the availability of precise genome sequences and accurate gene annotations. In this regard, this work provides an improved genome sequence and updated transcriptome annotations for the reference L. major Friedlin strain.


Author(s):  
David Porubsky ◽  
Peter Ebert ◽  
Peter A. Audano ◽  
Mitchell R. Vollger ◽  
...  

Abstract Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing1,2 with continuous long-read or high-fidelity3 sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
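The two headline metrics in this abstract, quality value and contig N50, have standard definitions that a short sketch can make concrete. The code below assumes the usual Phred-scaled meaning of quality value (QV = -10 log10 of the per-base error probability) and the usual N50 definition (the length of the shortest contig in the smallest set of longest contigs covering at least half of the total assembly). The function names and the contig lengths are hypothetical and are not taken from the paper.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def n50(lengths: list[int]) -> int:
    """Contig length at which the running total (longest first) reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 is undefined for an empty list of lengths")


# QV 40 corresponds to roughly one error per 10,000 bases.
error_rate = qv_to_error_rate(40)

# Invented contig lengths (in Mbp), for illustration only.
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

Under these definitions, "quality value > 40" means fewer than about one error per 10,000 bases, and "contig N50 > 23 Mbp" means that at least half of the assembled sequence lies in contigs of 23 Mbp or longer.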


GigaScience ◽  
2019 ◽  
Vol 8 (10) ◽  
Author(s):  
Sarah B Kingan ◽  
Julie Urban ◽  
Christine C Lambert ◽  
Primo Baybayan ◽  
Anna K Childers ◽  
...  

ABSTRACT Background: A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results: The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions: We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
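The coverage figure in this abstract can be checked with simple arithmetic. The sketch below assumes that the reported ∼36× was obtained by dividing the 82 Gb of unique-molecule sequence by the 2.3 Gb assembly size; the abstract does not state the calculation, so this is an assumption, and the function name is hypothetical.

```python
def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return sequenced_bases / genome_size


total_gb = 132.0   # all long-read sequence from the SMRT Cell (from the abstract)
unique_gb = 82.0   # sequence from unique library molecules (from the abstract)
genome_gb = 2.3    # assembly size (from the abstract)

coverage = fold_coverage(unique_gb, genome_gb)   # about 35.7, i.e. roughly 36x
unique_fraction = unique_gb / total_gb           # about 0.62 of the raw yield

print(f"coverage ~ {coverage:.1f}x, unique fraction ~ {unique_fraction:.0%}")
```

The result, about 35.7×, agrees with the reported ∼36×, and it shows that roughly 62% of the raw yield came from unique library molecules.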


2020 ◽  
Author(s):  
José M. Ranz ◽  
Pablo M. González ◽  
Bryan D. Clifton ◽  
Nestor O. Nazario ◽  
Pablo L. Hernández-Cervantes ◽  
...  

ABSTRACT The monarch butterfly epitomizes insect biodiversity decline. Understanding the genetic basis of the adaptation of the monarch to a changing environment requires genomic and transcriptomic resources that better reflect its genetic diversity while being informative about gene functionality during the life cycle. We report a reference-quality genome assembly from an individual resident at a nonmigratory colony in Mexico, and a new gene annotation and expression atlas for 14,865 genes, including 492 unreported long noncoding RNA (lncRNA) genes, based on RNA-seq data from 14 larval and pupal stages, plus adult morphological sections. Two thirds of the genes show significant expression changes associated with a life stage or section, with lncRNAs being more finely regulated during adulthood than protein-coding genes, and male-biased expression being four times more common than female-biased. The two portions of the heterochromosome Z display distinct patterns of differential expression between the sexes, reflecting that dosage compensation is either absent or incomplete, depending on the sample, in the ancestral but not in the novel portion of the Z. This study represents a major advance in the genomic and transcriptome resources available for D. plexippus while providing the first systematic analysis of its transcriptional program across most of its life cycle.


2018 ◽  
Author(s):  
Jolene T. Sutton ◽  
Martin Helmkampf ◽  
Cynthia C. Steiner ◽  
M. Renee Bellinger ◽  
Jonas Korlach ◽  
...  

Abstract Genome-level data can provide researchers with unprecedented precision to examine the causes and genetic consequences of population declines, and to apply these results to conservation management. Here we present a high-quality, long-read, de novo genome assembly for one of the world’s most endangered bird species, the Alala. As the only remaining native crow species in Hawaii, the Alala survived solely in a captive breeding program from 2002 until 2016, at which point a long-term reintroduction program was initiated. The high-quality genome assembly was generated to lay the foundation for both comparative genomics studies, and the development of population-level genomic tools that will aid conservation and recovery efforts. We illustrate how the quality of this assembly places it amongst the very best avian genomes assembled to date, comparable to intensively studied model systems. We describe the genome architecture in terms of repetitive elements and runs of homozygosity, and we show that compared with more outbred species, the Alala genome is substantially more homozygous. We also provide annotations for a subset of immunity genes that are likely to be important for conservation applications, and we discuss how this genome is currently being used as a roadmap for downstream conservation applications.


2021 ◽  
Vol 53 (1) ◽  
Author(s):  
Ran Li ◽  
Peng Yang ◽  
Xuelei Dai ◽  
Hojjat Asadollahpour Nanaei ◽  
Wenwen Fang ◽  
...  

Abstract Background: The goat, one of the first domesticated livestock, is a culturally and economically important species worldwide. The current goat reference genome, known as ARS1, is reported as the first nonhuman genome assembly using 69× PacBio sequencing. However, ARS1 suffers from incomplete X chromosome and highly fragmented Y chromosome scaffolds. Results: Here, we present a very high-quality de novo genome assembly, Saanen_v1, from a male Saanen dairy goat, with the first goat Y chromosome scaffold based on 117× PacBio long-read sequencing and 118× Hi-C data. Saanen_v1 displays a high level of completeness thanks to the presence of centromeric and telomeric repeats at the proximal and distal ends of two-thirds of the autosomes, and a much reduced number of gaps (169 vs. 773). The completeness and accuracy of the Saanen_v1 genome assembly are also evidenced by more assembled sequences on the chromosomes (2.63 Gb for Saanen_v1 vs. 2.58 Gb for ARS1), a slightly increased mapping ratio for transcriptomic data, and more genes anchored to chromosomes. The eight putative large assembly errors (1 to ~7 Mb each) found in ARS1 were amended, and for the first time, the substitution rate of this ruminant Y chromosome was estimated. Furthermore, sequence improvement in Saanen_v1, compared with ARS1, enables us to assign the likely correct positions for 4.4% of the single nucleotide polymorphism (SNP) probes in the widely used GoatSNP50 chip. Conclusions: The updated goat genome assembly including both sex chromosomes (X and Y) and the autosomes with high-resolution quality will serve as a valuable resource for goat genetic research and applications.

