De Novo Transcriptome Assembly, Functional Annotation and SSR Marker Discovery of Qinling Takin (Budorcas taxicolor bedfordi)

Animals ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 2366
Author(s):  
Ju Qiu ◽  
Rui Guo ◽  
Yidan Li ◽  
Yuyao Zhang ◽  
Kangsheng Jia ◽  
...  

The takin (Budorcas taxicolor) is an endemic ruminant species belonging to the family Bovidae. The International Union for Conservation of Nature (IUCN) has listed it as an endangered and vulnerable species. However, little is known about its molecular characterization since it lacks a reference genome. This study used RNA sequencing followed by de novo assembly, annotation, and simple sequence repeat (SSR) prediction to assess the transcriptome of Qinling takin (Budorcas taxicolor bedfordi) muscles. In total, 21,648 unigenes with an N50 and mean length of 1388 bp and 817 bp, respectively, were successfully detected and annotated against the public databases (NR, GO, KEGG, and EggNOG). Furthermore, 6222 SSRs were identified using the MIcroSAtellite (MISA) identification tool. Taken together, these findings will provide valuable information for genetic, genomic, and evolutionary studies on takin.
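The abstract above summarizes the assembly with an N50 and a mean unigene length. As a minimal sketch of how these two statistics are usually computed from a list of sequence lengths (the function names and the toy lengths are illustrative assumptions, not taken from the study):

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together contain at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    return sum(lengths) / len(lengths)


# Toy unigene lengths in bp (hypothetical, for illustration only).
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
print(n50(toy_lengths), round(mean_length(toy_lengths), 1))
```

Because long sequences contribute more bases, the N50 is typically larger than the mean length, which is consistent with the 1388 bp versus 817 bp reported above.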

PLoS ONE ◽  
2016 ◽  
Vol 11 (1) ◽  
pp. e0147132 ◽  
Author(s):  
Tingxian Deng ◽  
Chunying Pang ◽  
Xingrong Lu ◽  
Peng Zhu ◽  
Anqin Duan ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Wenhao Shao ◽  
Shiqing Huang ◽  
Yongzhi Zhang ◽  
Jingmin Jiang ◽  
Hui Li

Abstract: The genus Chaenomeles has long been considered an important ornamental, herbal and cash crop and is widely cultivated in East Asia. Traditional studies of Chaenomeles mainly focus on evolutionary relationships at the phenotypic level. In this study, we conducted RNA-seq on 10 Chaenomeles germplasms supplemented with one outgroup species, Docynia delavayi (D. delavayi), on the Illumina HiSeq2500 platform. After de novo assemblies, we generated from 40,084 to 49,571 unigenes for each germplasm. After pairwise comparison of the orthologous sequences, 9,659 orthologues within the 11 germplasms were obtained, with 6,154 orthologous genes identified as single-copy genes. The phylogenetic tree was visualized to reveal evolutionary relationships for these 11 germplasms. GO and KEGG analyses were performed for these common single-copy genes to compare their functional similarities and differences. Selective pressure analysis based on 6,154 common single-copy genes revealed that 45 genes were under positive selection. Most of these genes are involved in building the plant disease defence system. A total of 292 genes containing simple sequence repeats (SSRs) were used to develop SSR markers and compare their functions in secondary metabolism pathways. Finally, 10 primers were chosen as SSR marker candidates for Chaenomeles germplasms by comprehensive standards. Our research provides a new methodology and reference for future related research in Chaenomeles and is also useful for improvement, breeding and selection projects in other related species.


Author(s):  
Adam Voshall ◽  
Sairam Behera ◽  
Xiangjun Li ◽  
Xiao-Hong Yu ◽  
Kushagra Kapil ◽  
...  

Abstract: Systems-level analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction, depend on the accuracy of the transcriptome. Multiple tools exist to perform transcriptome assembly from RNAseq data. However, assembling high quality transcriptomes is still not a trivial problem. This is especially the case for non-model organisms where adequate reference genomes are often not available. Different methods produce different transcriptome models and there is no easy way to determine which are more accurate. Furthermore, having alternative splicing events could exacerbate such difficult assembly problems. While benchmarking transcriptome assemblies is critical, this is also not trivial due to the general lack of true reference transcriptomes. In this study, we provide a pipeline to generate a set of the benchmark transcriptome and corresponding RNAseq data. Using the simulated benchmarking datasets, we compared the performance of various transcriptome assembly approaches including genome-guided, de novo, and ensemble methods. The results showed that the assembly performance deteriorates significantly when the reference is not available from the same genome (for genome-guided methods) or when alternative transcripts (isoforms) exist. We demonstrated the value of consensus between de novo assemblers in transcriptome assembly. Leveraging the overlapping predictions between the four de novo assemblers, we further present ConSemble, a consensus-based de novo ensemble transcriptome assembly pipeline. Without using a reference genome, ConSemble achieved an accuracy up to twice as high as any de novo assemblers we compared. It matched or exceeded the best performing genome-guided assemblers even when the transcriptomes included isoforms.
The RNAseq simulation pipeline, the benchmark transcriptome datasets, and the ConSemble pipeline are all freely available from: http://bioinfolab.unl.edu/emlab/consemble/.

Author summary: Obtaining the accurate representation of the gene expression is critical in many analyses, such as differential gene expression analysis, co-expression analysis, and metabolic pathway reconstruction. The state of the art high-throughput RNA-sequencing (RNAseq) technologies can be used to sequence the set of all transcripts in a cell, the transcriptome. Although many computational tools are available for transcriptome assembly from RNAseq data, assembling high-quality transcriptomes is difficult especially for non-model organisms. Different methods often produce different transcriptome models and there is no easy way to determine which are more accurate. In this study, we present an approach to evaluate transcriptome assembly performance using simulated benchmarking read sets. The results showed that the assembly performance of genome-guided assembly methods deteriorates significantly when the adequate reference genome is not available. The assembly performance of all methods is affected when alternative transcripts (isoforms) exist. We further demonstrated the value of consensus among assemblers in improving transcriptome assembly. Leveraging the overlapping predictions between the four de novo assemblers, we present ConSemble. Without using a reference genome, ConSemble achieved a much higher accuracy than any de novo assemblers we compared. It matched or exceeded the best performing genome-guided assemblers even when the transcriptomes included isoforms.
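The idea of keeping only predictions on which several assemblers agree can be illustrated with a small sketch. This is not the ConSemble implementation: the exact-match comparison of sequences, the `min_support` threshold of three, and the assembler labels below are all assumptions made for illustration.

```python
from collections import Counter


def consensus_contigs(assemblies, min_support=3):
    """Return the set of sequences reported by at least `min_support`
    assemblers. `assemblies` maps an assembler label to its sequences.
    Exact string matching is a simplifying assumption."""
    support = Counter()
    for contigs in assemblies.values():
        # Use a set so each assembler votes at most once per sequence.
        support.update(set(contigs))
    return {seq for seq, votes in support.items() if votes >= min_support}


# Hypothetical output of four de novo assemblers (toy sequences).
toy_assemblies = {
    "assembler_a": ["ATG", "CCC"],
    "assembler_b": ["ATG", "GGG"],
    "assembler_c": ["ATG", "CCC", "CCC"],
    "assembler_d": ["TTT", "CCC"],
}
print(sorted(consensus_contigs(toy_assemblies)))
```

Raising `min_support` trades recall for precision: requiring all four assemblers to agree would discard every sequence in this toy example.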


2021 ◽  
Vol 8 ◽  
Author(s):  
Yunbang Zhang ◽  
Jian Gao ◽  
Yunhai Zhang ◽  
Yuanchao Zou ◽  
Xiaojuan Cao

Elongate loach (Leptobotia elongata) is endemic to the middle and upper reaches of the Yangtze River in China. Due to overfishing and habitat destruction, this loach has become an endangered species. So far, the lack of reliable genetic information and molecular markers has hindered the conservation and utilization of elongate loach resources. Therefore, we performed Illumina sequencing and de novo transcriptome assembly in elongate loach, and then developed polymorphic simple sequence repeat (SSR) markers. After assembly, 51,185 unigenes were obtained, with an average length of 1,496 bp. A total of 23,901 expressed sequence tag-simple sequence repeats (EST-SSRs) were identified, distributed across 14,422 unigenes, with a distribution frequency of 28.18%. Out of 16,885 designed EST-SSR primers, 150 primers (dominated by 3- or 4-base repeats) were synthesized for polymorphic EST-SSR development. Then, 52 polymorphic EST-SSRs were identified, with polymorphism information contents (PIC) ranging from 0.03 to 0.88 (average 0.54). In conclusion, this was the first report of transcriptome sequencing of elongate loach. We also developed a set of polymorphic EST-SSRs for the loach. This study will provide an important basis, namely genetic information and polymorphic SSRs, for further population genetics and breeding studies of this endangered and economically important loach in China.
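SSR identification of the kind described above amounts to scanning each unigene for short motifs repeated in tandem. The following is a minimal regular-expression sketch of that idea; it is not the tool used in the study, and the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions rather than the study's settings.

```python
import re

# Assumed thresholds: motif length (bp) -> minimum number of tandem repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect
    tandem repeats in `seq` (0-based start positions)."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by >= (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence with one dinucleotide and one trinucleotide repeat.
toy_seq = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "T"
print(find_ssrs(toy_seq))
```

The reported distribution frequency appears to be the share of unigenes containing at least one SSR: 14,422 / 51,185 is approximately 28.18%.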


2021 ◽  
Vol 5 ◽  
Author(s):  
Xin Xie ◽  
Junmei Jiang ◽  
Meiqing Chen ◽  
Maoxi Huang ◽  
Linhong Jin ◽  
...  

Myllocerinus aurolineatus Voss is an insect species (class Insecta, phylum Arthropoda). In this study, we first observed and identified M. aurolineatus Voss on tea plants in Guizhou, China, where it caused severe losses in tea yield and quality. Knowledge of the M. aurolineatus Voss genome is inadequate, especially for biological or functional research. We performed the first transcriptome sequencing of M. aurolineatus Voss using the Illumina HiSeq™ platform. Over 55.9 million high-quality paired-end reads were generated and assembled into 69,439 unigenes using the Trinity short-read assembly software, with an N50 length of 1,207 bp. A total of 69,439 unigenes were compared by BLAST against known proteins in the NCBI database and were assigned to Gene Ontology (20,190), eukaryotic complete genomes (12,488), and the Kyoto Encyclopedia of Genes and Genomes (3,170) categories. We also identified 96,790 single-nucleotide polymorphisms and 13,121 simple sequence repeats in these unigenes. Our transcriptome data provide a useful resource for future functional studies of M. aurolineatus Voss and for controlling its spread in tea plants.


2022 ◽  
Author(s):  
Karl Johan Westrin ◽  
Warren W Kretzschmar ◽  
Olof Emanuelsson

Motivation: Transcriptome assembly from RNA sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate reconstruction ability of transcript isoforms. This impedes the study of alternative splicing, in particular for lowly expressed isoforms. Result: We present the de novo transcript isoform assembler ClusTrast, which clusters a set of guiding contigs by similarity, aligns short reads to the guiding contigs, and assembles each clustered set of short reads individually. We tested ClusTrast on datasets from six eukaryotic species, and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. An appreciable fraction were reconstructed to at least 95% of their length. We suggest that ClusTrast will be useful for studying alternative splicing in the absence of a reference genome. Availability and implementation: The code and usage instructions are available at https://github.com/karljohanw/clustrast.

