Novel Library Method for High-Throughput Full-Length Sequencing of the Immune Repertoire from Unseparated B-Cells with Single-Cell Resolution

Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 3699-3699
Author(s):  
Stefano Vergani ◽  
Ilya Korsunsky ◽  
Nicholas Chiorazzi ◽  
Davide Bagnara

Abstract High-throughput DNA sequencing of the adaptive immune receptor repertoire is a relatively new and fast-growing technology used to study the immune response in health and disease. In B and T cell lymphoproliferative disorders, antigen receptor sequencing can be used to study clonal diversity and the evolution of the disease, both in the absence of treatment and in response to treatment. Furthermore, it can be used for the detection of minimal residual disease (MRD), providing information on how the presence and number of pre-treatment clone(s) relate to, and may be responsible for, a subsequent relapse. The characteristics and quality of the data generated by high-throughput DNA sequencing of immune receptor signatures depend on three major components: library preparation, sequencing platform, and software tools. For both the library and the software, there are no standard protocols or tools. Indeed, new approaches are continually being developed to accommodate new sequencing platform features and shortcomings, such as errors and read length restrictions. Two major technical challenges are: procuring an unbiased repertoire library that, for B lymphocytes, obtains and retains the full-length IGHV-D-J along with (sub)isotype information; and resolving data to the single-cell level, which is crucial for the detection of MRD and of rare clonal variants that exist in the early phase of the disease and might emerge and be involved in future relapse or progression. We describe here a library preparation method for use with the Illumina MiSeq platform that results in an exhaustive full-length repertoire where virtually every B cell is sequenced, thereby maximizing the likelihood of identifying and quantifying the “real” IGHV-D-J repertoire of the sample analyzed. The method also allows the detection of very infrequent rearrangements and maintains IG sub-isotype information without compromising data quality.
From 0.5 to 1 million human B cells can be sequenced in a single MiSeq 2x300 run with this approach. Key aspects of the technique are: 1) starting from a well-defined number of B lymphocytes; 2) avoiding V-gene-specific PCR amplification and dilution of genetic material in the pre-amplification phases; and 3) setting the sequencing depth according to the starting B (or T) cell subset (i.e., naïve, memory or plasma cell) and in proportion to the number of starting cells. High-quality sub-isotype information can be obtained with a second round of sequencing at shorter read length, e.g., with the Illumina 2x150 platform. We mixed 58 different CLL clones with known IGH sequences together with polyclonal B cells from donor PBMC (Figure 1) and used the mixed lysate to test the ability to detect the different clones. We then describe how the absence of genetic material dilution in the pre-amplification phases affects the ability to obtain a comprehensive repertoire. This is crucial in MRD detection, since diluting the genetic material (RNA and/or cDNA) prior to PCR amplification compromises the ability to accurately and consistently detect clonal variants, reducing the de facto sensitivity and reproducibility of the analysis. As a final example of the method's utility, we also demonstrate that different chronic lymphocytic leukemia clones show considerable variability in IG mRNA expression level, which correlates with the number of unique mRNA molecules sequenced (Figure 3) and which, with a method of sub-optimal efficiency, could reduce the clone-specific ability of detection by PCR-based techniques.
Figure 1.
Figure 2. Each dilution is performed in replicates. The cDNA is obtained from all the RNA extracted from the starting cells. Each slice represents a different CLL, and the size of each slice is the frequency at which it is detected. Comprehensive detection of each CLL depends on the absence of genetic material dilution.
Figure 3. qPCR IgH expression correlates with the number of unique mRNA molecules sequenced.
Disclosures No relevant conflicts of interest to declare.

2025 ◽  
Vol 77 (11) ◽  
pp. 6589-2025
Author(s):  
ALEKSANDRA GIZA ◽  
EWELINA IWAN ◽  
ARKADIUSZ BOMBA ◽  
DARIUSZ WASYL

Sequencing can provide genomic characterisation of a specific organism, as well as of a whole environmental or clinical sample. High Throughput Sequencing (HTS) makes it possible to generate an enormous amount of genomic data at gradually decreasing costs and almost in real time. HTS is used in, among other fields, medicine, veterinary medicine, microbiology, virology and epidemiology. The paper presents practical aspects of HTS technology. It describes the generations of sequencing, which vary in throughput, read length, accuracy and cost, and are thus used for different applications. The stages of HTS, as well as their purposes and pitfalls, are presented: extraction of the genetic material, library preparation, sequencing and data processing. For the whole process to succeed, all stages need to follow strict quality control measures. Choosing the right sequencing platform, proper sample and library preparation procedures, and adequate bioinformatic tools is crucial for high-quality results.


2012 ◽  
Vol 78 (8) ◽  
pp. 2677-2688 ◽  
Author(s):  
Noha Youssef ◽  
Brandi L. Steidley ◽  
Mostafa S. Elshahed

ABSTRACT The utilization of high-throughput sequencing technologies in 16S rRNA gene-based diversity surveys has indicated that within most ecosystems, a significant fraction of the community could not be assigned to known microbial phyla. Accurate determination of the phylogenetic affiliation of such sequences is difficult due to the short-read-length output of currently available high-throughput technologies. This fraction could harbor multiple novel phylogenetic lineages that have so far escaped detection. Here we describe our efforts in accurate assessment of the novelty and phylogenetic affiliation of selected unclassified lineages within a pyrosequencing data set generated from source sediments of Zodletone Spring, a sulfide- and sulfur-rich spring in southwestern Oklahoma. Lineage-specific forward primers were designed for 78 putatively novel lineages identified within the pyrosequencing data set, and representative nearly full-length small-subunit (SSU) rRNA gene sequences were obtained by pairing those primers with reverse universal bacterial primers. Of the 78 lineages tested, amplifiable products were obtained for 52, 32 of which had at least one nearly full-length sequence that was representative of the lineage targeted. Analysis of phylogenetic affiliation of the obtained Sanger sequences identified 5 novel candidate phyla and 10 novel candidate classes (within Fibrobacteres, Planctomycetes, and candidate phyla BRC1, GN12, TM6, TM7, LD1, WS2, and GN06) in the data set, in addition to multiple novel orders and families. The discovery of multiple novel phyla within a pilot study of a single ecosystem clearly shows the potential of the approach in identifying novel diversities within the rare biosphere.


2021 ◽  
Vol 4 ◽  
Author(s):  
Kristine Bohmann ◽  
Christian Carøe

Labelling strategies in metabarcoding studies & how to ensure that nucleotide tags stay in place
Metabarcoding of environmental DNA (eDNA) and DNA extracted from bulk specimen samples is a powerful tool in studies of ecological interactions, diet and biodiversity, as its labelling of amplicons allows high-throughput sequencing of taxonomically informative DNA sequences from many samples in parallel. The backbone of metabarcoding is the addition of sample-specific nucleotide identifiers to amplicons and then, after sequencing, the use of these identifiers to assign metabarcoding sequences to the samples they originated from. This allows the pooling of hundreds to thousands of samples before sequencing and thereby full utilisation of the capacity of high-throughput sequencing platforms. The nucleotide identifiers can be added both during the metabarcoding PCR and during library preparation, i.e. when amplicons are prepared for sequencing. There are three main strategies with which to achieve nucleotide labelling in metabarcoding studies. One commonly used strategy is the so-called tagged PCR approach, in which DNA extracts are individually amplified with metabarcoding primers that carry sample-specific nucleotide tags at the 5′ end. The uniquely tagged products are then pooled and a library is prepared on the pool of amplicons. However, tag-jumps have been documented in this commonly used metabarcoding approach (Schnell et al. 2015). Tag-jumps cause nucleotide tags to switch between amplicons, resulting in amplicons that carry different tags than those originally applied. Sequences in the sequencing output that carry tag combinations not used in the study design are easily identified and excluded. However, sequences carrying incorrect, but already used, tag combinations will cause incorrect assignments of sequences to samples. This can, much to the detriment of metabarcoding studies, lead to false positives and artificial inflation of diversity in the samples (Schnell et al. 2015). The occurrence of tag-jumps has led to recommendations to only carry out metabarcoding PCR amplifications with primers carrying twin-tags, to ensure that tag-jumps cannot result in false assignments of sequences to samples (Schnell et al. 2015). However, this increases both the cost and the workload of metabarcoding studies. In a recently published article, we demonstrate a tag-jump free single-tube library preparation protocol for Illumina sequencing specifically designed for 5′ nucleotide-tagged amplicons, the Tagsteady protocol (Carøe & Bohmann 2020). We designed the Tagsteady protocol to circumvent the two steps during library preparation of pools of 5′ nucleotide-tagged amplicons that had previously been suggested to cause tag-jumps: i) T4 DNA polymerase blunt-ending in the end-repair step, and ii) post-ligation PCR amplification of amplicon libraries. We used pools of twin-tagged amplicons to investigate the effect of these two steps on the occurrence of tag-jumps. Doing this, we demonstrated that blunt-ending and post-ligation PCR, alone or together, can result in high proportions of tag-jumps, in our study up to ca. 49% of total sequences. The Tagsteady protocol, in which both these steps were left out, resulted in tag-jump levels comparable to background contamination (Carøe & Bohmann 2020). In our study, we encourage practitioners to avoid using T4 DNA polymerase blunt-ending and post-ligation PCR in library preparation of 5′ nucleotide-tagged amplicon pools, for example by using the Tagsteady protocol (Carøe & Bohmann 2020). This will enable efficient and cost-effective generation of metabarcoding data with correct assignment of sequences to samples.
References
Carøe C, Bohmann K (2020) Tagsteady: A metabarcoding library preparation protocol to avoid false assignment of sequences to samples. Molecular Ecology Resources, 20, 1620–1631.
Schnell IB, Bohmann K, Gilbert MTP (2015) Tag jumps illuminated - reducing sequence-to-sample misidentifications in metabarcoding studies. Molecular Ecology Resources, 15, 1289–1303.
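
The assignment logic described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the tag sequences, sample names, the `tag_len` parameter and the `assign_reads` helper are all hypothetical, and both tags are assumed to be readable in the same orientation at the two ends of each read. With a twin-tag design, a tag-jump produces a mismatched tag pair that is absent from the design, so the read is excluded and not assigned to the wrong sample.

```python
from collections import Counter

# Hypothetical twin-tag study design: (5' tag, 3' tag) -> sample name.
TAG_DESIGN = {
    ("ACGT", "ACGT"): "sample_A",
    ("TGCA", "TGCA"): "sample_B",
}


def assign_reads(reads, design, tag_len=4):
    """Assign reads to samples by their tag pair; count rejected reads by reason."""
    used_tags = {tag for pair in design for tag in pair}
    assigned = Counter()
    rejected = Counter()
    for read in reads:
        fwd, rev = read[:tag_len], read[-tag_len:]
        sample = design.get((fwd, rev))
        if sample is not None:
            assigned[sample] += 1
        elif fwd in used_tags and rev in used_tags:
            # Both tags exist in the study, but not as this pair:
            # the signature expected from a tag-jump.
            rejected["unused_combination"] += 1
        else:
            rejected["unknown_tag"] += 1
    return assigned, rejected


reads = [
    "ACGT" + "GGGCCC" + "ACGT",  # matches sample_A
    "TGCA" + "GGGCCC" + "TGCA",  # matches sample_B
    "ACGT" + "GGGCCC" + "TGCA",  # known tags, unused combination
    "AAAA" + "GGGCCC" + "ACGT",  # 5' tag not in the design
]
assigned, rejected = assign_reads(reads, TAG_DESIGN)
print(dict(assigned), dict(rejected))
```

If the design instead used every combination of the two tags, the third read would match a valid sample and be misassigned, which is the false-positive scenario the abstract warns about.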


2015 ◽  
Author(s):  
Ilan Shomorony ◽  
Thomas Courtade ◽  
David Tse

Abstract While most current high-throughput DNA sequencing technologies generate short reads with low error rates, emerging sequencing technologies generate long reads with high error rates. A basic question of interest is the tradeoff between read length and error rate in terms of the information needed for the perfect assembly of the genome. Using an adversarial erasure error model, we make progress on this problem by establishing a critical read length, as a function of the genome and the error rate, above which perfect assembly is guaranteed. For several real genomes, including those from the GAGE dataset, we verify that this critical read length is not significantly greater than the read length required for perfect assembly from reads without errors.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Chentao Yang ◽  
Yuxuan Zheng ◽  
Shangjin Tan ◽  
Guanliang Meng ◽  
Wei Rao ◽  
...  

Abstract Background Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, current high-throughput DNA barcoding methods cannot obtain full-length barcode sequences due to read length limitations (e.g. a maximum read length of 300 bp for Illumina’s MiSeq system), or are hindered by a relatively high cost or low sequencing output (e.g. a maximum of eight million reads per cell for PacBio’s SEQUEL II system). Results Pooled cytochrome c oxidase subunit I (COI) barcodes from individual specimens were sequenced on the MGISEQ-2000 platform using the single-end 400 bp (SE400) module. We present a bioinformatic pipeline, HIFI-SE, that takes reads generated from the 5′ and 3′ ends of the COI barcode region and assembles them into full-length barcodes. HIFI-SE is written in Python and includes four function modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a set of 845 samples (30 marine invertebrates, 815 insects) and delivered a total of 747 fully assembled COI barcodes as well as 70 Wolbachia and fungi symbionts. Compared to their corresponding Sanger sequences (72 sequences available), nearly all samples (71/72) were correctly and accurately assembled, including 46 samples that had a similarity score of 100% and 25 of ca. 99%. Conclusions The HIFI-SE pipeline represents an efficient way to produce standard full-length barcodes, while the reasonable cost and high sensitivity of our method can contribute considerably more DNA barcodes under the same budget. Our method thereby advances DNA-based species identification from diverse ecosystems and increases the number of relevant applications.
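
The idea of joining a 5′-end read and a 3′-end read into one full-length barcode can be sketched as a simple overlap merge. This is an illustration only and does not reproduce HIFI-SE's assembly module: the function names, the toy sequence and the `min_overlap` parameter are hypothetical, the 3′-end read is assumed to come from the opposite strand, and matching is exact with no tolerance for sequencing errors.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_by_overlap(five_prime, three_prime, min_overlap=5):
    """Join two same-strand reads at their longest exact suffix/prefix overlap.

    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases is found.
    """
    max_len = min(len(five_prime), len(three_prime))
    for k in range(max_len, min_overlap - 1, -1):
        if five_prime[-k:] == three_prime[:k]:
            return five_prime + three_prime[k:]
    return None


# Toy 30-base "barcode" (hypothetical sequence, far shorter than a real COI barcode).
barcode = "ATGGCACTATTATACCTGATTTTTGGAGCT"
read_5p = barcode[:20]                      # read from the 5' end
read_3p = reverse_complement(barcode[12:])  # read from the 3' end, opposite strand

# Bring the 3' read onto the same strand before merging.
merged = merge_by_overlap(read_5p, reverse_complement(read_3p))
print(merged == barcode)
```

A production assembler would need to handle mismatches within the overlap and choose among competing overlaps, which this sketch deliberately leaves out.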


Author(s):  
E.V. Korneenko ◽  
А.E. Samoilov ◽  
I.V. Artyushin ◽  
M.V. Safonova ◽  
...  

In our study we analyzed viral RNA in bat fecal samples collected in 2015 in the Moscow region (Zvenigorod district). To detect various virus families and genera in the samples, we used PCR amplification of viral genome fragments followed by high-throughput sequencing. A blastn search of unassembled reads revealed the presence of viruses from the families Astroviridae, Coronaviridae and Herpesviridae. Assembly using SPAdes 3.14 yielded contigs of 460–530 bp that correspond to genome fragments of Coronaviridae and Astroviridae. The taxonomy of the coronaviruses was determined to the genus level. We also showed that one bat can be a reservoir of several virus genera. Thus, the bats in the Moscow region were confirmed as reservoir hosts for potentially zoonotic viruses.


2021 ◽  
Author(s):  
Noemi M. Fernandes ◽  
Pedro H. Campello-Nunes ◽  
Thiago S. Paiva ◽  
Carlos A. G. Soares ◽  
Inácio D. Silva-Neto

Plants ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 466
Author(s):  
Marie-Christine Carpentier ◽  
Cécile Bousquet-Antonelli ◽  
Rémy Merret

The recent development of high-throughput technologies based on RNA sequencing has allowed a better description of the role of post-transcriptional regulation in gene expression. In particular, the development of degradome approaches based on the capture of 5′ monophosphate (5′P) decay intermediates has allowed the discovery of a new decay pathway called co-translational mRNA decay. Thanks to these approaches, ribosome dynamics can now be revealed by analysis of 5′P read accumulation. However, library preparation can be difficult to set up for non-specialists. Here, we present a fast and efficient 5′P degradome library preparation for Arabidopsis samples. Our protocol was designed without commercial kits or gel purification and can easily be completed in one working day. We demonstrate the robustness and reproducibility of our protocol. Finally, we present the bioinformatic read-outs necessary to assess library quality.


Genes ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 7
Author(s):  
Jinghao Chen ◽  
Chao Xing ◽  
Xin Zheng ◽  
Xiaofang Li

Functional (meta)genomics allows the high-throughput identification of functional genes in a premise-free way. However, it is still difficult to perform Sanger sequencing for high GC DNA templates, which hinders the functional genomic exploration of a high GC genomic library. Here, we developed a procedure to resolve this problem by coupling the Sanger and PacBio sequencing strategies. Identification of cadmium (Cd) resistance genes from a small-insert high GC genomic library was performed to test the procedure. The library was generated from a high GC (75.35%) bacterial genome. Nineteen clones that conferred Cd resistance to Escherichia coli were subjected to Sanger sequencing directly. In parallel, the positive clones were subjected to in vivo amplification in host cells, from which recombinant plasmids were extracted and linearized by selected restriction endonucleases. PacBio sequencing was performed to obtain the full-length sequences. To establish their identities, partial sequences from Sanger sequencing were aligned to the full-length sequences from PacBio sequencing, which led to the identification of seven unique full-length sequences. The unique sequences were further aligned to the full genome sequence of the source strain. Functional screening showed that the identified positive clones were all able to improve the Cd resistance of the host cells. The functional genomic procedure developed here couples the Sanger and PacBio sequencing methods and overcomes the difficulties in PCR approaches for high GC DNA. The procedure can be a promising option for the high-throughput sequencing of functional genomic libraries, and can achieve cost-effective and time-efficient identification of positive clones, particularly for high GC genetic materials.
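
The step of matching partial Sanger reads to full-length PacBio sequences can be illustrated with a small Python sketch. It is a simplification, not the authors' procedure: exact substring containment (on either strand) stands in for sequence alignment, real Sanger reads would contain errors that exact matching cannot tolerate, and the clone names, insert names, toy sequences and the `group_clones` helper are all hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def group_clones(partial_reads, full_length):
    """Map each full-length sequence name to the clones whose partial read it contains.

    A clone is attached to the first full-length sequence that contains its
    partial read on either strand; clones with no match are omitted.
    """
    groups = {}
    for clone, partial in partial_reads.items():
        for name, full in full_length.items():
            if partial in full or reverse_complement(partial) in full:
                groups.setdefault(name, []).append(clone)
                break
    return groups


# Toy full-length inserts (hypothetical, GC-rich to echo the setting).
full_length = {
    "insert_1": "GGCCGCGATCGGCCGGCACG",
    "insert_2": "CCGGTTGCGCCAGGCGTCCG",
}
# Toy partial reads, one per positive clone.
partial_reads = {
    "clone_01": "GCGATCGG",
    "clone_02": reverse_complement("CCGGCACG"),  # read from the opposite strand
    "clone_03": "TTGCGCCA",
}

groups = group_clones(partial_reads, full_length)
print(groups)
print(len(groups), "unique full-length sequences")
```

Counting the keys of `groups` mirrors how several positive clones can collapse onto a smaller set of unique full-length sequences.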

