assembly quality
Recently Published Documents


TOTAL DOCUMENTS: 182 (FIVE YEARS: 69)

H-INDEX: 12 (FIVE YEARS: 4)

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Atul Sharma ◽  
Pranjal Jain ◽  
Ashraf Mahgoub ◽  
Zihan Zhou ◽  
Kanak Mahadik ◽  
...  

Abstract. Background: Sequencing technologies are prone to errors, making error correction (EC) necessary for downstream applications. EC tools need to be manually configured for optimal performance. We find that the optimal parameters (e.g., k-mer size) are both tool- and dataset-dependent. Moreover, evaluating the performance (i.e., alignment rate or gain) of a given tool usually relies on a reference genome, but quality reference genomes are not always available. We introduce Lerna for the automated configuration of k-mer-based EC tools. Lerna first creates a language model (LM) of the uncorrected genomic reads, and then, based on this LM, calculates a metric called the perplexity metric to evaluate the corrected reads for different parameter choices. Next, it finds the one that produces the highest alignment rate without using a reference genome. The fundamental intuition of our approach is that the perplexity metric is inversely correlated with the quality of the assembly after error correction. Therefore, Lerna leverages the perplexity metric for automated tuning of k-mer sizes without needing a reference genome. Results: First, we show that the best k-mer value can vary for different datasets, even for the same EC tool. This motivates our design that automates k-mer size selection without using a reference genome. Second, we show the gains of our LM using its component attention-based transformers. We show the model’s estimation of the perplexity metric before and after error correction. The lower the perplexity after correction, the better the k-mer size. We also show that the alignment rate and assembly quality computed for the corrected reads are strongly negatively correlated with the perplexity, enabling the automated selection of k-mer values for better error correction, and hence, improved assembly quality. We validate our approach on both short and long reads. Additionally, we show that our attention-based models have significant runtime improvement for the entire pipeline: 18× faster than previous works, due to parallelizing the attention mechanism and the use of JIT compilation for GPU inferencing. Conclusion: Lerna improves de novo genome assembly by optimizing EC tools. Our code is made available in a public repository at: https://github.com/icanforce/lerna-genomics.
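The selection idea in this abstract (train a language model on the uncorrected reads, score each corrected read set by perplexity, keep the k-mer size with the lowest score) can be illustrated with a minimal sketch. This is not Lerna's implementation: a fixed-order character model with add-one smoothing stands in for the attention-based transformer, and the names `train_char_lm`, `perplexity`, and `pick_k` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: reads contain only these four bases


def train_char_lm(reads, order=3):
    """Count (context, next base) pairs; a toy stand-in for a transformer LM."""
    context_counts = Counter()
    pair_counts = Counter()
    for read in reads:
        for i in range(order, len(read)):
            ctx = read[i - order:i]
            context_counts[ctx] += 1
            pair_counts[(ctx, read[i])] += 1
    return context_counts, pair_counts, order


def perplexity(model, reads):
    """exp of the mean negative log-probability per predicted base."""
    context_counts, pair_counts, order = model
    log_prob_sum = 0.0
    n = 0
    for read in reads:
        for i in range(order, len(read)):
            ctx = read[i - order:i]
            # Add-one smoothing keeps unseen pairs at a non-zero probability.
            p = (pair_counts[(ctx, read[i])] + 1) / (
                context_counts[ctx] + len(ALPHABET)
            )
            log_prob_sum += math.log(p)
            n += 1
    if n == 0:
        raise ValueError("reads are too short for the chosen model order")
    return math.exp(-log_prob_sum / n)


def pick_k(model, corrected_by_k):
    """Return the k whose corrected reads have the lowest perplexity."""
    return min(corrected_by_k, key=lambda k: perplexity(model, corrected_by_k[k]))
```

Here `corrected_by_k` is assumed to be a dictionary that maps each candidate k-mer size to the reads produced by running the EC tool with that setting. No reference genome enters the calculation, which is the property the abstract emphasizes.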


2022 ◽  
Vol 2022 ◽  
pp. 1-12
Author(s):  
Sixia Zhao ◽  
Yizhen Ma ◽  
Mengnan Liu ◽  
Xiaoliang Chen ◽  
Liyou Xu

To inspect the assembly quality of combine harvesters accurately and effectively, this paper proposes an inspection method in which an improved whale optimization algorithm (IWOA) optimizes a least squares support vector machine. Because the standard whale optimization algorithm (WOA) has weak search ability and tends to converge prematurely, the paper introduces a cosine control factor and a sine time-varying adaptive weight to improve it, and uses benchmark functions to verify the general adaptability of the algorithm. Combined with local mean decomposition (LMD), an assembly quality inspection model for the combine harvester was established and applied to the Dongfanghong 4LZ-9A2 combine harvester for experimental verification. The experimental results show that the proposed IWOA has better optimization ability and adaptability. The average accuracy of the proposed IWOA model reaches 90.5%, which is 4% higher than that of the WOA model, and the standard deviation of the average accuracy is reduced by 0.15%, indicating that the IWOA model is more stable.


Processes ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 34
Author(s):  
Rongshun Pan ◽  
Jiahao Yu ◽  
Yongman Zhao

In Industry 4.0, data are sensed and merged to drive intelligent systems. This research focuses on optimizing the selective assembly of complex mechanical products (CMPs) in an intelligent system environment. For the batch assembly of CMPs, it is difficult to select, from all candidate combinations, the component combinations that simultaneously optimize the assembly success rate and multiple assembly quality measures. Hence, the Taguchi quality loss function was used to evaluate each assembly quality measure quantitatively, and it was combined with the assembly success rate to establish a many-objective optimization model. The crossover and mutation operators were improved to enhance the ability of NSGA-III to obtain a high-quality solution set and escape local optima, and the Pareto optimal solution set was obtained accordingly. Finally, considering the production mode of Human–Machine Intelligent System interaction, the optimal compromise solution is obtained by using fuzzy theory, entropy theory and the VIKOR method. The results show that this work has clear advantages in improving both the quality and the success rate of batch selective assembly of CMPs, and that it gives a ranking and selection strategy for non-dominated selective assembly schemes while taking into account group benefit and individual regret.
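As a rough illustration of two building blocks named in this abstract, the sketch below shows a Taguchi quality loss and the Pareto dominance test that underlies non-dominated sorting. It assumes the common nominal-the-best form of the loss, L(y) = k(y − m)² with k = A/Δ², where A is the loss incurred at the tolerance limit Δ; the abstract does not say which form the authors used. The function names are hypothetical.

```python
def taguchi_loss(y, target, tolerance, cost_at_limit=1.0):
    """Nominal-the-best Taguchi loss: k * (y - target)^2, with k = A / tolerance^2."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    k = cost_at_limit / tolerance ** 2
    return k * (y - target) ** 2


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    if len(a) != len(b):
        raise ValueError("objective vectors must have the same length")
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better
```

In a many-objective model of this kind, each candidate assembly scheme could be described by a vector of per-characteristic losses plus a failure rate (one minus the success rate), so that every objective is minimised and `dominates` can compare schemes directly.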


2021 ◽  
Vol 118 (52) ◽  
pp. e2109019118
Author(s):  
Scott Hotaling ◽  
Joanna L. Kelley ◽  
Paul B. Frandsen

In less than 25 y, the field of animal genome science has transformed from a discipline seeking its first glimpses into genome sequences across the Tree of Life to a global enterprise with ambitions to sequence genomes for all of Earth’s eukaryotic diversity [H. A. Lewin et al., Proc. Natl. Acad. Sci. U.S.A. 115, 4325–4333 (2018)]. As the field rapidly moves forward, it is important to take stock of the progress that has been made to best inform the discipline’s future. In this Perspective, we provide a contemporary, quantitative overview of animal genome sequencing. We identified the best available genome assemblies in GenBank, the world’s most extensive genetic database, for 3,278 unique animal species across 24 phyla. We assessed taxonomic representation, assembly quality, and annotation status for major clades. We show that while tremendous taxonomic progress has occurred, stark disparities in genomic representation exist, highlighted by a systemic overrepresentation of vertebrates and underrepresentation of arthropods. In terms of assembly quality, long-read sequencing has dramatically improved contiguity, whereas gene annotations are available for just 34.3% of taxa. Furthermore, we show that animal genome science has diversified in recent years with an ever-expanding pool of researchers participating. However, the field still appears to be dominated by institutions in the Global North, which have been listed as the submitting institution for 77% of all assemblies. We conclude by offering recommendations for improving genomic resource availability and research value while also broadening global representation.


Nature Plants ◽  
2021 ◽  
Author(s):  
Rose A. Marks ◽  
Scott Hotaling ◽  
Paul B. Frandsen ◽  
Robert VanBuren

Abstract The field of plant genome sequencing has grown rapidly in the past 20 years, leading to increases in the quantity and quality of publicly available genomic resources. The growing wealth of genomic data from an increasingly diverse set of taxa provides unprecedented potential to better understand the genome biology and evolution of land plants. Here we provide a contemporary view of land plant genomics, including analyses on assembly quality, taxonomic distribution of sequenced species and national participation. We show that assembly quality has increased dramatically in recent years, that substantial taxonomic gaps exist and that the field has been dominated by affluent nations in the Global North and China, despite a wide geographic distribution of study species. We identify numerous disconnects between the native range of focal species and the national affiliation of the researchers studying them, which we argue are rooted in colonialism, both past and present. Luckily, falling sequencing costs, widening availability of analytical tools and an increasingly connected scientific community provide key opportunities to improve existing assemblies, fill sampling gaps and empower a more global plant genomics community.


Author(s):  
Artem Eliseev ◽  
Sergey Lupuleac ◽  
Boris Grigor'ev ◽  
Julia Shinder

Abstract The article discusses the process of aeronautical structure assembly in the presence of a sealant between the parts to be joined. It presents an attempt to estimate the influence of the sealant on assembly quality in terms of variation analysis. The sealant is considered as a highly viscous liquid that is applied to the surfaces of the assembled parts before the start of final assembly. The modeling approach is based on simulation of two-way coupled fluid-structure interaction between the fluid sealant and compliant structural parts. The Reynolds lubrication approximation is used in the fluid dynamics problem, and a variational formulation of the contact problem combined with static condensation is used in the structural one. The joining of two aircraft panels is used as a numerical test to demonstrate the developed approach. Various phenomena connected with the presence of sealant are demonstrated. In particular, the difference in fastener loosening due to sealant flow between different types of fasteners is investigated. Results of the variation simulation show that the presence of sealant should be considered among the determining factors in the analysis of assembly quality.


2021 ◽  
Vol 2125 (1) ◽  
pp. 012021
Author(s):  
Menghui Xuan ◽  
Sixia Zhao ◽  
Mengnan Liu ◽  
Liyou Xu ◽  
Xiaoliang Chen

Abstract To address the low assembly accuracy of combine harvesters and the difficulty of detecting their assembly quality, a detection method is proposed that combines variational mode decomposition (VMD) optimized by the sparrow search algorithm (SSA) with a least squares support vector machine (LSSVM) optimized by particle swarm optimization (PSO). First, the sparrow search algorithm is used to obtain the optimal VMD decomposition mode parameter K and penalty factor α. Then the combined vibration signal of the combine harvester is decomposed into intrinsic mode components with different center frequencies using the best parameter combination [K, α]. Finally, the feature vector is used as the input of the LSSVM classifier to classify the different fault features. The analysis results show that the classification accuracy of the SSA-VMD joint feature extraction method is 99.5%, which is 17.5% and 9.5% higher than that of ensemble empirical mode decomposition (EEMD) and fixed-parameter VMD, respectively, verifying the superiority of this method for detecting combine assembly quality.


2021 ◽  
Author(s):  
Romain Feron ◽  
Robert Michael Waterhouse

Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that the accumulation of genomic data is growing rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. In order to guide forthcoming genome generation efforts and promote efficient prioritisation of resources, it is thus essential to define and monitor taxonomic coverage and quality of the data. Here we present an automated analysis workflow that surveys genome assemblies from the United States National Center for Biotechnology Information (NCBI), assesses their completeness using the relevant Benchmarking Universal Single-Copy Orthologue (BUSCO) datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, we examine how key assembly metrics relate to gene content completeness, and we compare results from using different BUSCO lineage datasets. These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments to survey species coverage and data quality of available genome assemblies, and to guide prioritisations for ongoing and future sampling, sequencing, and genome generation initiatives.


2021 ◽  
pp. 1-15
Author(s):  
Wentao Luo ◽  
Pingfa Feng ◽  
Jianfu Zhang ◽  
Dingwen Yu ◽  
Zhijun Wu

Because the service life of assembly equipment is short, the tightening data it produces are very limited. Data-driven assembly quality diagnosis therefore remains a challenging task in industry. Transfer learning can address small-data problems, but it places strict requirements on the training dataset that are hard to satisfy. To solve this problem, an Improved Deep Convolution Generative Adversarial Transfer Learning Model (IDCGAN-TM) is proposed, which integrates three modules. The generative learning module automatically produces source datasets from small target datasets using improved generative-adversarial theory. The feature learning module improves feature extraction ability by building a lightweight deep learning (DL) model. The transfer learning module consists of a pre-trained DL model and one fully connected layer, to better perform intelligent quality diagnosis when trained on small sample data. A parallel computing method is adopted to produce the source data efficiently. Real assembly quality diagnosis cases are designed and discussed to validate the advantages of the proposed model. In addition, comparison experiments show that the proposed approach achieves better transfer diagnosis performance than three existing state-of-the-art approaches.

