In a recently posted study, researchers examined whether the mutational process of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has itself changed over the course of the virus's evolution.
Studies have reported a lower relative number of guanine (G)→thymine (T) substitutions at all sites in clades of the SARS-CoV-2 Omicron variant of concern (VOC). However, the degree to which such phylogenetic signals reflect selection on proteins rather than changes in the underlying mutation rates is unclear. Further research is required to determine whether the difference in the proportion of G→T substitutions between Omicron and non-Omicron VOCs is the predominant feature of variation in the SARS-CoV-2 mutational spectrum or only one component of a continuously varying spectrum, as has been observed in various cellular organisms.
In the present study, researchers examined the relative, but not the absolute, rates of different types of nucleotide mutation across SARS-CoV-2 clades.
To determine the SARS-CoV-2 mutation spectrum, the team counted the occurrences of each nucleotide mutation on the branches of a global phylogenetic tree comprising approximately six million SARS-CoV-2 sequences. A pre-built mutation tree with clade annotations was used for the counts, and subsets of the tree were analyzed.
The mutations on all branches of each clade were tallied, excluding any branch that carried more than four mutations or more than one mutation that was a reversion to the Wuhan-Hu-1 genome or to the founder of that clade. The four-fold degenerate sites in every clade were then identified, and sites with fixed amino acid substitutions relative to the Wuhan-Hu-1 strain were excluded. Only clades with ≥5000 mutations at the included sites were analyzed.
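The branch filter described above can be sketched roughly as follows. This is an illustration only, not the study's pipeline: the `Branch` record, its field names, and the `tally_clade_mutations` helper are hypothetical, and the sketch assumes a branch is dropped when it carries more than four mutations or more than one reversion.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Branch:
    """One tree branch (hypothetical record, for illustration only)."""
    clade: str
    mutations: list  # e.g. ["G123T", "C456T"]: parent base, site, new base
    reversions: int  # mutations reverting to the reference or the clade founder


def tally_clade_mutations(branches, max_mutations=4, max_reversions=1):
    """Count mutation types per clade, skipping branches that fail the filter."""
    counts = {}
    for branch in branches:
        # Assumed filter: too many mutations on a single branch.
        if len(branch.mutations) > max_mutations:
            continue
        # Assumed filter: more than one reversion on a single branch.
        if branch.reversions > max_reversions:
            continue
        clade_counts = counts.setdefault(branch.clade, Counter())
        for mut in branch.mutations:
            # "G123T" becomes the mutation type "G>T".
            clade_counts[f"{mut[0]}>{mut[-1]}"] += 1
    return counts
```

In a real analysis the branch records would come from the annotated tree; here they would have to be constructed by hand.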
Further, the counts of every mutation type at the included four-fold degenerate sites were tabulated for each clade, and the proportion of mutations of each type was determined. Principal component analysis (PCA) was then performed on the per-clade proportions of each mutation type at these sites.
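As a rough sketch of the tabulation step, the counts for one clade can be turned into a 12-element spectrum of proportions. The function name `mutation_spectrum` and the "X>Y" labels are assumptions made for this example; a PCA would then be run on the resulting per-clade rows with a numerical library, which is not shown here.

```python
# The 12 possible single-nucleotide mutation types, labeled "A>C", "A>G", ...
MUTATION_TYPES = [f"{a}>{b}" for a in "ACGT" for b in "ACGT" if a != b]


def mutation_spectrum(type_counts):
    """Convert raw counts per mutation type into proportions that sum to 1."""
    total = sum(type_counts.get(t, 0) for t in MUTATION_TYPES)
    if total == 0:
        raise ValueError("no mutations counted for this clade")
    # Types that were never observed get a proportion of 0.
    return {t: type_counts.get(t, 0) / total for t in MUTATION_TYPES}
```

For instance, `mutation_spectrum({"G>T": 15, "C>T": 45, "A>G": 40})` (made-up counts) assigns G→T a proportion of 0.15.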
Furthermore, to assess the consistency of the findings, the analysis was repeated on subsets of the data: sequences restricted to England or to the United States of America (USA), the data after excluding the five most mutated sites for any clade, and the SARS-CoV-2 genome partitioned into two halves. Relative mutation rates were calculated by normalizing the proportion of mutations of a given type at the analyzed degenerate sites by the proportion of those sites that carried the parental nucleotide in the clade founder.
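The normalization can be illustrated with a short sketch. It assumes the spectrum is a mapping from "X>Y" labels to proportions and that the founder's base composition at the analyzed sites is given as counts; both input formats and the name `relative_rates` are hypothetical.

```python
def relative_rates(spectrum, founder_base_counts):
    """Divide each mutation type's proportion by the parental base's share.

    spectrum: e.g. {"G>T": 0.1, "C>T": 0.4} (proportions of all mutations)
    founder_base_counts: e.g. {"A": 30, "C": 20, "G": 10, "T": 40}, the
        number of analyzed sites carrying each base in the clade founder
    """
    total_sites = sum(founder_base_counts.values())
    rates = {}
    for mut_type, fraction in spectrum.items():
        parent = mut_type[0]  # the base being mutated away from
        parent_fraction = founder_base_counts[parent] / total_sites
        rates[mut_type] = fraction / parent_fraction
    return rates
```

With the made-up inputs from the docstring, G→T mutations make up 10% of mutations and G occupies 10% of sites, so the relative G→T rate is 1.0, whereas C→T (40% of mutations from 20% of sites) has a relative rate of 2.0.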
The Mantel test was used to estimate the significance of the correlation between the Euclidean distance between clades’ mutation spectra and the square root of the phylogenetic distance between clade founder sequences (the phylogenetic signal). The equilibrium nucleotide frequencies predicted from the SARS-CoV-2 clades’ mutation spectra were compared with the frequencies actually observed at the degenerate sites in several sarbecoviruses. In addition, the predicted and observed nucleotide frequencies were calculated for the human influenza virus.
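For readers unfamiliar with the Mantel test, a minimal one-sided permutation version is sketched below. It is a generic textbook form rather than the study's code: the function names, the 999 permutations, and the fixed seed are all choices made for this example. In the setting described above, `dist_a` would hold Euclidean distances between spectra and `dist_b` the square-rooted phylogenetic distances.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle after relabeling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    identity = list(range(len(dist_a)))
    b_flat = upper_triangle(dist_b, identity)
    observed = pearson(upper_triangle(dist_a, identity), b_flat)
    rng = random.Random(seed)
    at_least = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute the labels of one matrix only
        if pearson(upper_triangle(dist_a, order), b_flat) >= observed:
            at_least += 1
    # Add 1 to both counts so the observed arrangement is included.
    return observed, (at_least + 1) / (permutations + 1)
```

Permuting row and column labels together, rather than shuffling individual entries, is what keeps the non-independence of pairwise distances intact under the null.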
Findings and Conclusion
Distinct shifts were observed in the relative rates of several mutation types during SARS-CoV-2 evolution. Omicron showed approximately two-fold lower relative G→T substitution rates than early SARS-CoV-2 clades. Other shifts in the mutation spectrum were also observed, including fewer cytosine (C)→T substitutions in the Delta VOC, along with a broader association between mutation-spectrum divergence and genomic divergence across the SARS-CoV-2 phylogeny. Because the shifts were pervasive and phylogenetically correlated, they suggested that the underlying cause may be SARS-CoV-2 mutations affecting genome replication, packaging, and antagonism of innate immunity.
The Omicron mutation spectrum was more similar than the spectra of earlier SARS-CoV-2 clades to the spectrum that has shaped long-term sarbecovirus evolution. Inter-clade differences were robust to excluding the most heavily mutated sites and to analyzing the two halves of the SARS-CoV-2 genome separately. Differences in relative mutation rates correlated significantly with inter-clade phylogenetic distances, even after excluding G→T substitutions or when assessing only Omicron clades or only clades of other VOCs. The findings indicated that the evolution of the SARS-CoV-2 mutational spectrum extends beyond the change in relative G→T substitution rates in the Omicron VOC.
The fraction of G→T substitutions in Omicron fell from 15.0% to 8.0%, and the overall mutational rate in Omicron was 7.0% lower than that of other SARS-CoV-2 VOCs. Phylogenetic signals in the mutation spectra that involved mutation types other than G→T indicated that relative mutation rates changed over time, without necessarily changing the overall mutation rate of SARS-CoV-2. The estimated equilibrium nucleotide frequencies differed among clades; for example, the equilibrium predicted from the Omicron mutational spectrum contained less T than that predicted from earlier clades.
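One way to see how a mutation spectrum implies equilibrium nucleotide frequencies is the sketch below. It assumes a simple model in which sites evolve neutrally and each base mutates to every other base at the given relative rate, and it iterates the resulting flow between bases until the frequencies stop changing. The function name, the input format, and the iteration scheme are assumptions of this example, not details taken from the study.

```python
BASES = "ACGT"


def equilibrium_frequencies(rates, iterations=10000):
    """Iterate base frequencies to the stationary point of the rate model.

    rates: mapping such as {"A>C": 1.0, ..., "T>G": 1.0} holding a positive
        relative rate for each of the 12 mutation types.
    """
    # Step size small enough that no base loses more than half its
    # frequency in a single update.
    max_exit = max(
        sum(rates[f"{x}>{y}"] for y in BASES if y != x) for x in BASES
    )
    step = 0.5 / max_exit
    freqs = {b: 0.25 for b in BASES}
    for _ in range(iterations):
        new = {}
        for y in BASES:
            # Flow into y from every other base, minus flow out of y.
            gain = sum(freqs[x] * rates[f"{x}>{y}"] for x in BASES if x != y)
            loss = freqs[y] * sum(rates[f"{y}>{z}"] for z in BASES if z != y)
            new[y] = freqs[y] + step * (gain - loss)
        freqs = new
    return freqs
```

Under this model, raising the rates of mutations that create T (for example G→T or C→T) raises the equilibrium frequency of T, and lowering them does the opposite.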
Nucleotide frequencies at the included sites were comparable for SARS-CoV-2, its close relatives (bat CoV RaTG13 and Lao bat virus BANAL-52), and the more divergent sarbecoviruses SARS-CoV and the Kenyan bat CoV BtKY72. However, the predicted nucleotide frequencies differed from the observed frequencies for the sarbecoviruses, including SARS-CoV-2, whereas for the influenza virus the frequencies predicted from the mutation spectrum correlated well with those observed at the degenerate sites. Putative associations between protein-coding mutations and changes in the mutation spectrum were also observed.
Overall, the study findings showed that the mutation process varies dynamically during SARS-CoV-2 evolution, and indicated that the SARS-CoV-2 mutation spectrum might be trending towards a spectrum comparable to the mutation spectrum of other sarbecoviruses.