A Novel Method to Predict Highly Expressed Genes Based on Radius Clustering and Relative Synonymous Codon Usage

Synonymous codon usage of protein-coding genes of thirty-two completely sequenced mycobacteriophage genomes was studied using multivariate statistical analysis. One of the major factors influencing codon usage is identified to be compositional bias. Codons ending with either C or G are preferred in highly expressed genes, among which C-ending codons are strongly preferred over G-ending codons. A strong negative correlation between the effective number of codons (Nc) and GC3s content was also observed, showing that codon usage was affected by gene nucleotide composition. Translational selection, operating at the level of translational accuracy, is also identified as playing a role in shaping codon usage. A high level of heterogeneity is seen among and between the genomes. Gene length is also identified to influence codon usage in 11 of the 32 phage genomes. Mycobacteriophage Cooper is identified as the most highly biased genome, with better translation efficiency that compares well with the host-specific tRNA genes.
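The GC3s statistic mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and the usual convention that GC3s is the fraction of G or C at the third position of codons that have synonymous alternatives (so Met, Trp and stop codons are excluded). The function names `gc3s` and `pearson` are hypothetical helpers for illustration, not the authors' software, and the Nc values to correlate against would have to come from a separate calculation that is not shown here.

```python
from math import sqrt

# Codons with no synonymous alternative (Met, Trp) and the stop codons are
# left out of GC3s under the standard genetic code (an assumption here).
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds):
    """Fraction of G/C at the third position of synonymous codons."""
    cds = cds.upper().replace("U", "T")
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable_length, 3)]
    # Keep only unambiguous codons that belong to a synonymous family.
    used = [c for c in codons if c not in EXCLUDED and set(c) <= set("ACGT")]
    if not used:
        raise ValueError("no synonymous codons found")
    return sum(c[2] in "GC" for c in used) / len(used)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

With per-gene GC3s values from `gc3s` and per-gene Nc values from another tool, `pearson(nc_values, gc3s_values)` would give the kind of correlation the abstract reports; a strongly negative value is what a composition-driven bias would be expected to produce.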

The selection-mutation-drift theory of synonymous codon usage.

Genetics ◽

10.1093/genetics/129.3.897 ◽

1991 ◽

Vol 129 (3) ◽

pp. 897-907 ◽

Cited By ~ 51

Author(s):

M Bulmer

Keyword(s):

Codon Usage ◽

Genetic Model ◽

Synonymous Codon ◽

Growth Yield ◽

Synonymous Codon Usage ◽

Synonymous Codons ◽

Drift Theory ◽

Highly Expressed Genes ◽

Unicellular Organisms ◽

Selection For

It is argued that the bias in synonymous codon usage observed in unicellular organisms is due to a balance between the forces of selection and mutation in a finite population, with greater bias in highly expressed genes reflecting stronger selection for efficiency of translation. A population genetic model is developed taking into account population size and selective differences between synonymous codons. A biochemical model is then developed to predict the magnitude of selective differences between synonymous codons in unicellular organisms in which growth rate (or possibly growth yield) can be equated with fitness. Selection can arise from differences in either the speed or the accuracy of translation. A model for the effect of speed of translation on fitness is considered in detail, and a similar model for accuracy more briefly. The model is successful in predicting a difference in the degree of bias between the beginning and the rest of the gene under some circumstances, as observed in Escherichia coli, but grossly overestimates the amount of bias expected. Possible reasons for this discrepancy are discussed.
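The balance between selection, mutation and drift described above can be made concrete with a small sketch. It assumes the textbook two-codon, weak-mutation setting: mutation rate `u` from the preferred to the non-preferred codon, rate `v` in the reverse direction, and a scaled selection coefficient `S` (the product of effective population size and selective advantage, with a ploidy-dependent constant). Under those assumptions the expected equilibrium frequency of the preferred codon is P = e^S / (e^S + u/v). The parameter names and the exact scaling of `S` are assumptions made here and may not match the notation of the paper.

```python
from math import exp


def preferred_codon_frequency(S, u, v):
    """Expected equilibrium frequency of the preferred codon.

    S: scaled selection coefficient favouring the preferred codon (assumed scaling).
    u: mutation rate from preferred to non-preferred codon.
    v: mutation rate from non-preferred to preferred codon.
    """
    if u <= 0 or v <= 0:
        raise ValueError("mutation rates must be positive")
    # Algebraically equal to exp(S) / (exp(S) + u / v); this form avoids
    # overflow when S is large and positive.
    return 1.0 / (1.0 + (u / v) * exp(-S))
```

With `S = 0` the frequency reduces to the pure mutational equilibrium `v / (u + v)`, and it rises toward 1 as `S` grows, which is the qualitative pattern the abstract attributes to highly expressed genes.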

EVOLUTION OF RELATIVE SYNONYMOUS CODON USAGE IN HUMAN IMMUNODEFICIENCY VIRUS TYPE-1

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720005000953 ◽

2005 ◽

Vol 03 (01) ◽

pp. 157-168 ◽

Cited By ~ 26

Author(s):

PETER L. MEINTJES ◽

ALLEN G. RODRIGO

Keyword(s):

Human Immunodeficiency Virus ◽

Codon Usage ◽

Human Immunodeficiency Virus Type ◽

Synonymous Codon ◽

Host Adaptation ◽

Synonymous Codon Usage ◽

Relative Synonymous Codon Usage ◽

Immunodeficiency Virus ◽

Hiv 1

Mutation in Human Immunodeficiency Virus type-1 (HIV-1) is extremely rapid, a consequence of a low-fidelity viral reverse transcription process. The envelope gene has been shown to accumulate substitutions at a rate of approximately 1% per year and can frequently spend a long time in the host (approximately 10 years). The relative synonymous codon usage (RSCU) in HIV-1 is known to be different from that of the human host. However, by reengineering the protein coding sequences of HIV-1 to reflect the RSCU patterns observed in humans, a large increase in protein expression is observed. It is reasonable to suggest that within a host there may be a selective drive for change in the RSCU of HIV-1 towards human RSCU. To test this hypothesis we analyzed HIV-1 partial envelope sequences from eight patients sampled serially in time. For each sequence, an RSCU table was constructed. Sequences were labelled as "early" or "late" depending on whether they were sampled before or after the mid-point of the study. Using the RSCU values as descriptor variables, a Principal Components Analysis (PCA) was performed. The first three components clearly discriminated between early and late sequences. We also constructed pooled groupwise RSCU tables for early and late sequences. The viral RSCU values of each of the groups were correlated with human RSCU. If there is selection for host-adaptation in RSCU, we expect that "late" viral RSCUs would tend to be more highly correlated with human RSCU than "early" viral RSCUs. In fact, tests of significance suggest that this is the case. However, closer examination of the data revealed that the apparent trend towards human RSCU can be attributed to the homogenization of the codon usage by mutation pressure rather than host adaptation.
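The per-sequence and pooled RSCU tables described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the standard genetic code, defines RSCU in the usual way (observed codon count divided by the count expected if all synonymous codons for that amino acid were used equally), and simply omits amino acids that do not occur in the input. The helper names `codon_counts` and `rscu_from_counts` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG ordering (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons into synonymous families keyed by amino acid.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    FAMILIES[aa].append(codon)


def codon_counts(cds):
    """Count in-frame codons of a coding sequence (DNA or RNA alphabet)."""
    cds = cds.upper().replace("U", "T")
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable_length, 3))


def rscu_from_counts(counts):
    """RSCU for every codon of each multi-codon amino acid that is observed."""
    table = {}
    for aa, codons in FAMILIES.items():
        if aa == "*" or len(codons) < 2:
            continue  # skip stop codons and single-codon amino acids (Met, Trp)
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined, so leave it out
        for c in codons:
            table[c] = counts[c] * len(codons) / total
    return table
```

A pooled groupwise table, in the sense used in the abstract, can be obtained by summing the per-sequence counters before normalising, for example `rscu_from_counts(sum((codon_counts(s) for s in sequences), Counter()))`. The resulting per-codon values could then serve as descriptor variables for a PCA or be correlated with a host reference table.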

Muscular Dystrophy Disease Classification Using Relative Synonymous Codon Usage

International Journal of Machine Learning and Computing ◽

10.18178/ijmlc.2016.6.2.588 ◽

2016 ◽

Vol 6 (2) ◽

pp. 139-144 ◽

Cited By ~ 3

Author(s):

K. Sathyavikasini ◽

M. S. Vijaya

Keyword(s):

Muscular Dystrophy ◽

Codon Usage ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Relative Synonymous Codon Usage ◽

Disease Classification

Conserved codon adaptation in highly expressed genes is associated with higher regularity in mRNA secondary structures

10.1101/2020.11.23.393322 ◽

2020 ◽

Author(s):

Mark G. Sterken ◽

Ruud H.P. Wilbers ◽

Pjotr Prins ◽

Basten L. Snoek ◽

George M. Giambasu ◽

...

Keyword(s):

Codon Usage ◽

Developmental Stages ◽

Synonymous Codon ◽

Expression Profiles ◽

Secondary Structures ◽

Relative Increase ◽

Synonymous Codon Usage ◽

Translation Efficiency ◽

Highly Expressed Genes ◽

Higher Regularity

The redundancy of the genetic code allows for a regulatory layer to optimize protein synthesis by modulating translation and degradation of mRNAs. Patterns in synonymous codon usage in highly expressed genes have been studied in many species, but scarcely in conjunction with mRNA secondary structure. Here, we analyzed over 2,000 expression profiles covering a range of strains, treatments, and developmental stages of five model species (Escherichia coli, Arabidopsis thaliana, Saccharomyces cerevisiae, Caenorhabditis elegans, and Mus musculus). By comparative analyses of genes constitutively expressed at high and low levels, we revealed a conserved shift in codon usage and predicted mRNA secondary structures. Highly abundant transcripts and proteins, as well as high protein per transcript ratios, were consistently associated with less variable and shorter stretches of weak mRNA secondary structures (loops). Genome-wide recoding showed that codons with the highest relative increase in highly expressed genes, often C-ending and not necessarily the most frequent, enhanced formation of uniform loop sizes. Our results point to a general selective force contributing to the optimal expression of abundant proteins, as less variable secondary structures promote regular ribosome trafficking with fewer detrimental collisions, thereby leading to an increase in mRNA stability and a higher translation efficiency.

BioKIT: a versatile toolkit for processing and analyzing diverse types of sequence data

10.1101/2021.10.02.462868 ◽

2021 ◽

Author(s):

Jacob L. Steenwyk ◽

Thomas J. Buida ◽

Carla Goncalves ◽

Dayna C. Goltz ◽

Grace H Morales ◽

...

Keyword(s):

Codon Usage ◽

Sequence Data ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Relative Synonymous Codon Usage ◽

Summary Statistics ◽

Sequencing Data ◽

Sequence Alignments ◽

Multiple Sequence ◽

Genome Assemblies

Bioinformatic analysis - such as genome assembly quality assessment, alignment summary statistics, relative synonymous codon usage, paired-end aware quality trimming and filtering of sequencing reads, file format conversion, and processing and analysis - is integrated into diverse disciplines in the biological sciences. Several command-line pieces of software have been developed to conduct some of these individual analyses; however, the lack of a unified toolkit that conducts all these analyses can be a barrier in workflows. To address this obstacle, we introduce BioKIT, a versatile toolkit for the UNIX shell environment with 40 functions, several of which were community-sourced, that conduct routine and novel processing and analysis of genome assemblies, multiple sequence alignments, coding sequences, sequencing data, and more. To demonstrate the utility of BioKIT, we assessed the quality and characteristics of 901 eukaryotic genome assemblies, calculated alignment summary statistics for 10 phylogenomic data matrices, determined relative synonymous codon usage across 171 fungal genomes including those that use alternative genetic codes, and demonstrated that a novel metric, gene-wise relative synonymous codon usage, can accurately estimate gene-wise codon optimization. BioKIT will be helpful in facilitating and streamlining sequence analysis workflows. BioKIT is freely available under the MIT license from GitHub (https://github.com/JLSteenwyk/BioKIT), PyPI (https://pypi.org/project/biokit), and the Anaconda Cloud (https://anaconda.org/JLSteenwyk/biokit). Documentation, user tutorials, and instructions for requesting new features are available online (https://jlsteenwyk.com/BioKIT).

Comparative analysis of codon usage patterns of FUT2 from different species

Kuwait Journal of Science ◽

10.48129/kjs.11289 ◽

2021 ◽

Author(s):

Chao Xu ◽

Wen B. Bao ◽

Sheng L. Wu ◽

Zheng C. Wu ◽

...

Keyword(s):

Codon Usage ◽

Codon Usage Bias ◽

Molecular Mechanisms ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Relative Synonymous Codon Usage ◽

Effective Number ◽

E Coli ◽

Effective Number Of Codons ◽

Fut2 Gene

Enterotoxigenic E. coli is an important zoonotic pathogen causing diarrhea in humans and newborn animals. α-(1,2) fucosyltransferase 2 (FUT2) is closely associated with the formation of pathogenic receptors of Enterotoxigenic E. coli. Codon usage bias analysis can help to better understand the molecular mechanisms and evolutionary relationships of a particular gene. To understand the codon usage pattern of the FUT2 gene, FUT2 coding sequences of nine species were selected from the GenBank database, and the nucleotide composition (GC content) and genetic indices, including the effective number of codons, relative synonymous codon usage and relative codon usage bias, were calculated using R software in order to analyze codon usage bias and base composition in the FUT2 gene from different species. The results showed that the codon usage of the FUT2 gene in different species was affected by GC bias, especially GC frequency at the third codon position (GC3). Most of the optimal codons were biased towards the G/C-ending types. GCC, CUG, UCC, GUG and AUC showed the highest relative synonymous codon usage values among different species and were the most dominant codons. The codon usage characteristics of the FUT2 gene in Sus scrofa were similar to those of Bos taurus; those of Homo sapiens were similar to those of Pan troglodytes. The effective number of codons was significantly negatively correlated with GC3, and the relatively higher frequency of optimal codons implied that FUT2 genes from different species had a strong bias in codon usage.

Comparative Analysis of Codon Usage and tRNA in Mitochondrial Genomes of Gallus Gallus

Avian Biology Research ◽

10.3184/175815509x12473915395956 ◽

2009 ◽

Vol 2 (3) ◽

pp. 133-141

Author(s):

Tangjie Zhang ◽

Hong Chang ◽

Yuzhi Liu ◽

Huifang Li ◽

Kuanwei Chen

Keyword(s):

Codon Usage ◽

Synonymous Codon ◽

Gallus Gallus ◽

Mitochondrial Genes ◽

Synonymous Codon Usage ◽

Relative Synonymous Codon Usage ◽

Mutational Bias ◽

Mitochondrial Genomes ◽

The Third ◽

Codon Positions

Codon usage in mitochondrial genes of 11 Gallus gallus and two Anatidae species was analysed to determine the general patterns in codon choice of Gallus gallus species. C3 contents were higher in Gallus gallus than in mammalian mitochondrial genomes that encode protein codon positions. The high C3 contents of Gallus gallus might be the result of relatively strong mutational bias that occurred in the lineage of the Gallus gallus species. A- and C-ending codons were detected as the “preferred” codons in Gallus gallus and Anatidae. The NNR codon families are dominated by the A-ending codons, the NNY codon families are dominated by the C-ending codons and the NNN codon families are dominated by the A-ending or the C-ending codons. A comparison of the relative synonymous codon usage (RSCU) and synonymous codon families (SCF) of tRNA and proteins was made, and two groups can be classified by SCF. The codon usage in Gallus gallus species indicates that codons containing A or C at the third position are used preferentially, regardless of whether corresponding tRNAs are encoded in the mtDNA. In both Gallus gallus and Anatidae species mtDNA, codon usage biases are highly related to CC-ending binucleotide codons.
