Codon usage correlations and crystal basis model of the genetic code

2001 ◽  
Vol 55 (2) ◽  
pp. 287-293 ◽  
Author(s):  
M. L Chiusano ◽  
L Frappat ◽  
P Sorba ◽  
A Sciarrino


2016 ◽  
Author(s):  
Bohdan B. Khomtchouk ◽  
Claes Wahlestedt ◽  
Wolfgang Nonner

Codon usage in 2730 genomes is analyzed for evolutionary patterns in the usage of synonymous codons and amino acids across prokaryotic and eukaryotic taxa. We group together genomes that have similar amounts of intra-genomic bias in their codon usage, and then compare how usage of particular codons is diversified across each genome group, and how that usage varies from group to group. Inter-genomic diversity of codon usage increases with intra-genomic usage bias, following a universal pattern. The frequencies of the different codons vary in robust mutual correlation, and the implied synonymous codon and amino acid usages drift together. This kind of correlation indicates that the variation of codon usage across organisms is chiefly a consequence of lateral DNA transfer among diverse organisms. The group of genomes with the greatest intra-genomic bias comprises two distinct subgroups, with each one restricting its codon usage to essentially one unique half of the genetic code table. These organisms include eubacteria and archaea thought to be closest to the hypothesized last universal common ancestor (LUCA). Their codon usages imply genetic diversity near the hypothesized base of the tree of life. There is a continuous evolutionary progression across taxa from the two extremely diversified usages toward balanced usage of different codons (as approached, e.g., in mammals). In that progression, codon frequency variations are correlated as expected from a blending of the two extreme codon usages seen in prokaryotes.
Author Summary: The redundancy intrinsic to the genetic code allows different amino acids to be encoded by up to six synonymous codons. Genomes of different organisms prefer different synonymous codons, a phenomenon known as ‘codon usage bias.’ The phenomenon of codon usage bias is of fundamental interest for evolutionary biology, and is important in a variety of applied settings (e.g., transgene expression).
The spectrum of codon usage biases seen in current organisms is commonly thought to have arisen by the combined actions of mutations and selective pressures. This view focuses on codon usage in specific genomes and the consequences of that usage for protein expression.
Here we investigate an unresolved question of molecular genetics: are there global rules governing the usage of synonymous codons made by genomic DNA across organisms? To answer this question, we employed a data-driven approach to surveying 2730 species from all kingdoms of the ‘tree of life’ in order to classify their codon usage. A first major result was that the large majority of these organisms use codons rather uniformly on the genome-wide scale, without giving preference to particular codons among possible synonymous alternatives. A second major result was that two compartments of codon usage seem to co-exist and to be expressed in different proportions by different organisms. As such, we investigate how individual different codons are used in different organisms from all taxa. Whereas codon usage is generally believed to be the evolutionary result of both mutations and natural selection, our results suggest a different perspective: the usage of different codons (and amino acids) by different organisms follows a superposition of two distinct patterns of usage. One distinction locates to the third base pair of all different codons, which in one pattern is U or A, and in the other pattern is G or C.
This result has two major implications: (1) the variation of codon usage as seen across different organisms is best accounted for by lateral gene transfer among diverse organisms; (2) the organisms that are by protein homology grouped near the base of the ‘tree of life’ comprise two genetically distinct lineages.
We find that, over evolutionary time, codon usages have converged from two distinct, non-overlapping usages (e.g., as evident in bacteria and archaea) to a near-uniform, balanced usage of synonymous codons (e.g., in mammals). This shows that the variations of codon (and amino acid) biases reveal a distinct evolutionary progression. We also find that codon usage in bacteria and archaea is most diverse between organisms thought to be closest to the hypothesized last universal common ancestor (LUCA). The dichotomy in codon (and amino acid) usages present near the origin of the current ‘tree of life’ might provide information about the evolutionary development of the genetic code.
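The quantities this summary relies on, per-codon usage frequencies and the share of codons whose third base is G or C, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the toy coding sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons of a coding sequence (DNA or RNA letters)."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(counts):
    """Convert codon counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons in sequence")
    return {codon: n / total for codon, n in counts.items()}


def gc3_fraction(counts):
    """Fraction of codons whose third base is G or C."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons in sequence")
    return sum(n for codon, n in counts.items() if codon[2] in "GC") / total


# Hypothetical toy sequence: ATG GCC GCG CTG AAA GAT TAA
toy_cds = "ATGGCCGCGCTGAAAGATTAA"
counts = codon_counts(toy_cds)
freqs = codon_frequencies(counts)
gc3 = gc3_fraction(counts)
```

A genome dominated by one of the two patterns described above would show a third-position G/C fraction close to 0 or close to 1, while balanced usage would sit near the middle.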


1987 ◽  
Vol 244 (2) ◽  
pp. 331-335 ◽  
Author(s):  
P H Andreasen ◽  
H Dreisig ◽  
K Kristiansen

The codon usage of Tetrahymena thermophila and other ciliates deviates from the ‘universal genetic code’ in that UAA and probably UAG are not translational termination signals but code for glutamine. Therefore, translation in vitro of mRNA from Tetrahymena in a reticulocyte lysate is prematurely terminated if a UAA or UAG triplet is present in the reading frame of the mRNA. We show that the addition of a subcellular fraction from Tetrahymena thermophila enables a rabbit reticulocyte lysate to translate Tetrahymena mRNAs into full-sized proteins. The activity of the subcellular fraction is shown to depend on the combined function of a protein component(s) and a tRNA(s). The subcellular fraction is easily prepared and its usefulness for the identification of isolated mRNAs from Tetrahymena by their translation products in vitro is demonstrated.
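The premature termination described in this abstract can be illustrated with a small Python sketch that translates the same mRNA under the standard code and under a ciliate-style code. The sketch assumes that both UAA and UAG are read as glutamine (the abstract states UAA and "probably" UAG); the toy mRNA and the function name `translate` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}
# Assumption: in the ciliate-style code both UAA and UAG encode glutamine (Q).
CILIATE_CODE = dict(STANDARD_CODE, UAA="Q", UAG="Q")


def translate(mrna, code):
    """Translate in frame from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        aa = code[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy mRNA: AUG AAA UAA GGC UAG UUU UGA
toy_mrna = "AUGAAAUAAGGCUAGUUUUGA"
standard_product = translate(toy_mrna, STANDARD_CODE)
ciliate_product = translate(toy_mrna, CILIATE_CODE)
```

Under the standard table the toy message stops at the first UAA, whereas the ciliate-style table reads through to the UGA stop, which mirrors the truncated versus full-sized products discussed above.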


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Yi Liu

Abstract: The genetic code is degenerate, and most amino acids are encoded by two to six synonymous codons. Codon usage bias, the preference for certain synonymous codons, is a universal feature of all genomes examined. Synonymous codon mutations were previously thought to be silent; however, a growing body of evidence now shows that codon usage regulates protein structure and gene expression through effects on co-translational protein folding, translation efficiency and accuracy, mRNA stability, and transcription. Codon usage regulates the speed of translation elongation, resulting in non-uniform ribosome decoding rates on mRNAs during translation that are adapted to the co-translational protein folding process. Biochemical and genetic evidence demonstrates that codon usage plays an important role in regulating protein folding and function in both prokaryotic and eukaryotic organisms. Certain protein structural types are more sensitive than others to the effects of codon usage on protein folding, and predicted intrinsically disordered domains are more prone to misfolding caused by codon usage changes than other domain types. Bioinformatic analyses revealed that gene codon usage correlates with different protein structures in diverse organisms, indicating the existence of a codon usage code for co-translational protein folding. This review focuses on recent literature on the role and mechanism of codon usage in regulating translation kinetics and co-translational protein folding.


1995 ◽  
Vol 27 (3) ◽  
pp. 249-256 ◽  
Author(s):  
Kiyohiko Angata ◽  
Kenji Kuroe ◽  
Kaichirou Yanagisawa ◽  
Yoshimasa Tanaka

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ádám Radványi ◽  
Ádám Kun

Abstract: The mutational robustness of the genetic code is rarely discussed in the context of biological diversity, such as codon usage and related factors, and is often considered independent of the actual organism’s proteome. Here we put living beings back into the picture and use distortion as a metric of mutational robustness. Distortion estimates the expected severity of non-synonymous mutations, measured by amino acid physicochemical properties and weighted by codon usage. Using the biological variance of codon frequencies, we interpret the mutational robustness of the standard genetic code with regard to the corresponding environments and genomic compositions (GC-content). Employing phylogenetic analyses, we show that coding fidelity in physicochemical properties can deteriorate with codon usages adapted to extreme environments and that these putative effects are not artefacts of phylogenetic bias. High temperature environments select for codon usages with decreased mutational robustness of hydrophobic, volumetric, and isoelectric properties. Selection at high saline concentrations also leads to reduced fidelity in polar and isoelectric patterns. These results show that the genetic code performs best with mesophilic codon usages, strengthening the view that LUCA or its ancestors preferred lower temperature environments. Taxonomic implications, such as rooting the tree of life, are also discussed.
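The idea of a codon-usage-weighted mutation severity can be sketched in Python as follows. This is a simplified stand-in, not the paper's exact definition of distortion: it assumes all single-nucleotide substitutions are equally likely, ignores mutations to or from stop codons, and uses the Kyte-Doolittle hydropathy scale as the only physicochemical property. The function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

# Kyte-Doolittle hydropathy values, used here as one example property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def neighbors(codon):
    """Yield the nine codons reachable by a single-base substitution."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def mean_missense_shift(usage):
    """Usage-weighted mean |property change| over missense substitutions."""
    weighted_sum = 0.0
    weight = 0.0
    for codon, freq in usage.items():
        aa = CODE[codon]
        if aa == "*":
            continue  # assumption: stop codons are skipped
        for other in neighbors(codon):
            new_aa = CODE[other]
            if new_aa == "*" or new_aa == aa:
                continue  # skip nonsense and synonymous changes
            weighted_sum += freq * abs(HYDROPATHY[aa] - HYDROPATHY[new_aa])
            weight += freq
    if weight == 0:
        raise ValueError("usage contains no codons with missense neighbors")
    return weighted_sum / weight


# Hypothetical usage consisting of a single codon (UGG, tryptophan).
trp_only = mean_missense_shift({"UGG": 1.0})
```

Comparing this value across usage tables from different organisms would, under the stated simplifications, give a rough sense of how codon usage shifts the expected hydropathy change per missense mutation.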


2020 ◽  
Author(s):  
Bohdan B. Khomtchouk

Abstract: In this study, we investigate how an organism’s codon usage bias levels can serve as a predictor and classifier of various genomic and evolutionary features across the three kingdoms of life (archaea, bacteria, eukarya). We perform secondary analysis of existing genetic datasets to build several artificial intelligence (AI) and machine learning models, trained on over 13,000 organisms, which show that it is possible to accurately predict an organism’s DNA type (nuclear, mitochondrial, chloroplast) and taxonomic identity using only its genetic code (64 codon usage frequencies). By leveraging advanced AI and machine learning methods to accurately identify evolutionary origins and genetic composition from codon usage patterns, our study suggests that the genetic code can be utilized to train accurate machine learning classifiers of taxonomic and phylogenetic features. Our dataset and analyses are made publicly available on GitHub and the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Codon+usage) to facilitate open-source reproducibility and community engagement.
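Classifying organisms from codon frequency vectors can be illustrated with a deliberately simple nearest-centroid classifier in Python. This is a swapped-in technique for illustration, not one of the models from the study; the labels and the four-component vectors are made-up toy data (real inputs would have 64 components), and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of equally long frequency vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_centroids(samples):
    """Compute one centroid per label from (label, vector) pairs."""
    by_label = {}
    for label, vector in samples:
        by_label.setdefault(label, []).append(vector)
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(centroids, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Hypothetical toy training data, not taken from the published dataset.
training = [
    ("group_a", [0.40, 0.30, 0.20, 0.10]),
    ("group_a", [0.38, 0.32, 0.18, 0.12]),
    ("group_b", [0.25, 0.25, 0.25, 0.25]),
    ("group_b", [0.27, 0.23, 0.26, 0.24]),
]
model = fit_centroids(training)
label_a = predict(model, [0.39, 0.31, 0.19, 0.11])
label_b = predict(model, [0.26, 0.24, 0.25, 0.25])
```

A nearest-centroid rule is used here only because it fits in a few lines; any accuracy figures would depend on the real data and on the stronger models the abstract refers to.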


2018 ◽  
Author(s):  
Bea Yu ◽  
Matthew Murphy ◽  
Peter A. Carr

Abstract: Extreme engineering of an organism’s genetic code could impart true genetic incompatibility, even blocking effects of horizontal gene transfer and viral infection. Recent experiments exploring this possibility demonstrate that such radical genome engineering achievements are plausible. However, it is unclear when the modifications will compromise the fitness of an organism. Efforts to reformat an entire genome are difficult and expensive; computational methods predicting fruitful experimental trajectories could play a pivotal role in advancing such efforts. We present a framework for building in silico models to assist genome-scale engineering. Genetic code engineering requires choosing from many possible codon-usage schemes, to find a design that is viable and effective. We use machine learning to identify which alternative codon-usage schemes are likely to result in no observed viable cells. Our data-driven approach employs observations of how modifying codon usage in individual genes impacted observed viability in E. coli, revealing salient features for early identification of problematic genetic code designs. We achieved an average area under the receiver operating characteristic curve of 0.72 on out-of-sample data.
Author Summary: As machine learning and artificial intelligence play an increasingly central role in science and engineering, it will be important to establish standardized techniques that facilitate the dialogue between experimentation and modeling. Biological experimental techniques are concurrently evolving at a rapid pace, providing unique opportunities to collect high-quality, novel information that was previously unobtainable. This work navigates the landscape of this vast, new territory, identifies interesting landmarks for exploration and posits new approaches towards advancing our research efforts in these areas.
In this work, we show that, using a small dataset of 47 observations and rigorous nested cross validation techniques, we can build a model that makes better-than-random predictions of how codon usage changes in essential genes influence viability in E. coli. These predictions can be used to inform experimental trajectories in both genetic code and codon optimization experiments. We discuss ways to improve this model, iteratively, by performing high value experiments that decrease uncertainty in predictions and extrapolation error. Finally, we present novel visualization methods to aid in developing intuitions for how re-coding impacts groups of genes. These methods are also useful tools in building important insights into how well machine learning algorithms can generalize to new data.
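The reported metric, area under the receiver operating characteristic curve, can be computed from predicted scores with the pairwise (Mann-Whitney) formulation sketched below in Python. The labels and scores are hypothetical toy values, not the study's 47 observations, and the nested cross-validation used by the authors is not reproduced here.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = non-viable design, 0 = viable design.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.8, 0.3, 0.4]
toy_auc = roc_auc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the sense in which an average of 0.72 is described as better than random.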


1999 ◽  
Vol 259 (5) ◽  
pp. 339-348 ◽  
Author(s):  
L. Frappat ◽  
P. Sorba ◽  
A. Sciarrino