Corona Update 3 February 2021
There seems to be a competition between countries to report new SARS-CoV-2
variants. The media try to make sense of it and to answer questions about how
dangerous these new variants are. Examples are
Scientific American: The Most Worrying Mutations in Five Emerging Coronavirus Variants [1] and The Scientist [5].
The Scientific American piece is a very useful article. I will return to it. But there are more variants and many more mutations. What is the total number of different mutations that have been found worldwide up to
now? Answer: the NCBI virus database [2]. The NCBI has started an overview of all mutations in SARS-CoV-2. The information is free, no account is required, and the website is user-friendly.
[Image: View Mutations in SARS-CoV-2 SRA Data]
Click on the link View Mutations:
[Image: Table with all mutations of SARS-CoV-2]
After a few seconds a table with all mutations appears. See the Appendix for the columns in the table.
Explanation
An example of a non-synonymous substitution is D614G: amino acid D is replaced by G at position 614 of the Spike (surface glycoprotein). Position 614 is counted from the first amino acid (AA) of the protein. For the Spike protein the position lies between 1 and 1273, the length of the protein. The Spike is a relatively small protein.
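The notation can be taken apart mechanically. Here is a minimal Python sketch, assuming a label always has the form letter, number, letter (with or without spaces); the helper name `parse_substitution` is my own invention and not part of the NCBI website:

```python
import re
from typing import NamedTuple

SPIKE_LENGTH = 1273  # number of amino acids in the Spike protein (from the text)


class Substitution(NamedTuple):
    ref: str       # original amino acid, 1-letter code
    position: int  # 1-based position within the protein
    alt: str       # replacement amino acid, 1-letter code


def parse_substitution(label: str, protein_length: int = SPIKE_LENGTH) -> Substitution:
    """Split a label such as 'D614G' or 'D 614 G' into its three parts."""
    match = re.fullmatch(r"\s*([A-Z])\s*(\d+)\s*([A-Z])\s*", label.upper())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    position = int(match.group(2))
    if not 1 <= position <= protein_length:
        raise ValueError(f"position {position} is outside 1..{protein_length}")
    return Substitution(match.group(1), position, match.group(3))


if __name__ == "__main__":
    print(parse_substitution("D 614 G"))
    print(parse_substitution("N501Y"))
```

The range check uses the Spike length by default; for another protein its own length would have to be passed in.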
The genomic position is a number between 1 and 29,903. That is the length of the standard reference SARS-CoV-2 genome.
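The two numbering schemes are connected by simple arithmetic: every amino acid corresponds to three nucleotides. A small Python sketch, under one assumption that is not stated in this post: that the Spike gene starts at genomic position 21,563, which is the value I know from the annotation of the reference genome NC_045512.2 and which should be checked against that record. The function name is hypothetical.

```python
GENOME_LENGTH = 29_903  # length of the reference genome (from the text)
SPIKE_LENGTH = 1273     # amino acids in the Spike protein (from the text)
SPIKE_START = 21_563    # ASSUMPTION: first nucleotide of the Spike gene in NC_045512.2


def codon_coordinates(aa_position: int, gene_start: int = SPIKE_START) -> tuple[int, int]:
    """Return the 1-based genomic positions of the first and last
    nucleotide of the codon that encodes the given amino acid."""
    if not 1 <= aa_position <= SPIKE_LENGTH:
        raise ValueError(f"position {aa_position} is outside 1..{SPIKE_LENGTH}")
    first = gene_start + 3 * (aa_position - 1)
    last = first + 2
    if last > GENOME_LENGTH:
        raise ValueError("codon would lie beyond the end of the genome")
    return first, last


if __name__ == "__main__":
    for position in (501, 613, 614):
        print(position, codon_coordinates(position))
```

With that assumed start, position 614 maps to nucleotides 23,402 to 23,404 and position 501 to 23,063 to 23,065.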
An example of a synonymous substitution is Q613Q: Q is 'replaced' by Q. This still counts as a substitution
because the change is at the nucleotide level: CAA > CAG. The nucleotide change is also listed in the table.
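Whether a codon change is synonymous can be checked with the standard genetic code. The sketch below builds the codon table in Python and classifies a change; the function name `classify` is my own, and AAT > TAT is only an illustration of a codon change that turns N into Y.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(ref_codon: str, alt_codon: str) -> tuple[str, str, str]:
    """Translate both codons and report whether the amino acid changes."""
    # The table uses T instead of U, so RNA-style input is converted first.
    ref_aa = CODON_TABLE[ref_codon.upper().replace("U", "T")]
    alt_aa = CODON_TABLE[alt_codon.upper().replace("U", "T")]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    print(classify("CAA", "CAG"))  # the Q613Q example from the text
    print(classify("AAT", "TAT"))  # illustration: N replaced by Y
```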
The Count indicates whether a mutation is rare or common. The Collected location column specifies the countries of origin of the virus samples.
Furthermore, a handy feature is that each column can be sorted (up/down) by clicking on its header. Try it!
The NCBI website does not yet provide statistics. On 30 January I counted the
number of mutations in the Spike protein (surface glycoprotein):
- 264 non-synonymous mutations
- 345 synonymous mutations
- 609 mutations total
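To put these counts in proportion to the size of the protein, here is a small Python calculation. It is a rough sketch that assumes every mutation sits at a different position, which I did not verify. Note that the choice of denominator matters: the Spike has 1273 codons but 3 × 1273 = 3819 coding nucleotides.

```python
SPIKE_CODONS = 1273                  # amino acid positions in the Spike (from the text)
SPIKE_NUCLEOTIDES = 3 * SPIKE_CODONS  # three nucleotides per codon

NON_SYNONYMOUS = 264  # counted on 30 January
SYNONYMOUS = 345      # counted on 30 January
TOTAL = NON_SYNONYMOUS + SYNONYMOUS


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


if __name__ == "__main__":
    print(f"total mutations: {TOTAL}")
    print(f"non-synonymous per codon: {percentage(NON_SYNONYMOUS, SPIKE_CODONS):.1f}%")
    print(f"all mutations per codon: {percentage(TOTAL, SPIKE_CODONS):.1f}%")
    print(f"all mutations per nucleotide: {percentage(TOTAL, SPIKE_NUCLEOTIDES):.1f}%")
```

Relative to the codons the total is close to 48%; relative to the individual nucleotides it is close to 16%.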
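That there are more synonymous than non-synonymous mutations is expected. This is quite a lot for a protein of 1273 amino acids: the non-synonymous mutations correspond to about 21% of the amino acid positions, and all 609 mutations together to about 48% of the 1273 codons (about 16% of the 3819 coding nucleotides), assuming each mutation sits at a different position. The million-dollar question is what the effect is on the behaviour of the protein and the properties of the virus. A first step is: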
From one-dimensional RNA to three-dimensional proteins
A spectacular and sophisticated feature is the interactive 3D display of the protein, which is shown when you click on the Protein Change link. Try it!
[Image: Click on the link N501Y]
Loading data ... please wait ... (ignore error message):
[Image: Interactive 3D model of the Spike protein, mouse pointer at N501]
When you move the mouse pointer over the protein, the names of the individual amino acids and their positions are displayed. The software keeps track of all 1273 amino acids in this very complicated 3D structure! Really great software! After a lot of trial and error I found ASN501.
ASN = asparagine; 1-letter code: N.
Tip: for the table of code names for amino acids see this page.
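The translation between the 3-letter names shown in the 3D viewer and the 1-letter codes used in the mutation table is a simple lookup. A minimal Python sketch; the function name `residue_label` is hypothetical, and it assumes labels of the form three letters followed by a number, as in ASN501:

```python
import re

# The 20 standard amino acids: 3-letter code -> 1-letter code.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residue_label(label: str) -> str:
    """Convert a viewer-style label such as 'ASN501' to the short form 'N501'."""
    match = re.fullmatch(r"([A-Za-z]{3})(\d+)", label.strip())
    if match is None:
        raise ValueError(f"not a residue label: {label!r}")
    name, position = match.group(1).upper(), match.group(2)
    return THREE_TO_ONE[name] + position


if __name__ == "__main__":
    print(residue_label("ASN501"))
```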
Asparagine at position 501 (N501) is the location of the famous mutation N501Y: N is replaced by Y (tyrosine). The amino acid is marked in yellow:
[Image: Zoomed in. The yellow structure is asparagine at position 501]
Not surprisingly, the yellow position 501 is located on the outside of the molecule. It must attach to the human ACE2 receptor, and that could not work if it were located on the inside of the molecule.
Try it. Play with it. Move the cursor over the structure. Change the point of view by holding the mouse button down and moving the mouse. Look at it from different angles. Try other mutations (click on other mutations in the main table). Zoom in. Mind you: this is the molecule that caused a pandemic!
Remember: the three-dimensional structure of a protein is the first step in discovering the effect of a mutation.
Problems: not all links to 3D proteins seem correct. H1000Q, for example, leads to THR257. Are the links made manually?
Later I discovered that one can select certain locations in the one-dimensional RNA (in the right panel of the page), after which the selected amino acid is highlighted in yellow in the 3D model. I still have to explore that.
The famous N501Y mutation is found in the variants from the UK, South Africa and Brazil. Here is the list from the Scientific American article [1]:
- Spain: A222V (Spike)
- UK: N501Y (Spike)
- South Africa: E484K, K417N, N501Y [virus escape mutant]
- Brazil: E484K, K417N/T, N501Y
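The overlap between the variants in this list can be made explicit with sets. A small Python sketch that uses only the mutations listed above; the label K417N/T is kept exactly as written, so it is treated as different from K417N:

```python
# Mutations per variant, copied from the list above.
VARIANTS = {
    "Spain": {"A222V"},
    "UK": {"N501Y"},
    "South Africa": {"E484K", "K417N", "N501Y"},
    "Brazil": {"E484K", "K417N/T", "N501Y"},
}


def shared_mutations(*countries: str) -> set[str]:
    """Return the mutations that occur in every one of the named variants."""
    return set.intersection(*(VARIANTS[country] for country in countries))


def variants_with(mutation: str) -> list[str]:
    """Return the variants (in list order) that carry the given mutation."""
    return [country for country, muts in VARIANTS.items() if mutation in muts]


if __name__ == "__main__":
    print(shared_mutations("UK", "South Africa", "Brazil"))
    print(variants_with("E484K"))
```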
Universe too small! Too short-lived!
The number of possible proteins of length 1273 is staggering. Do the calculation: for every position there are 20 possibilities because there are 20 amino acids. "So there are 20×20 = 400 distinct proteins of 2 Amino Acids, 20×20×20 = 8000 proteins of length 3 AA, 160,000 proteins of length 4 AA, 3,200,000 with just 5 AA." [4] And so on. Total: 20^1273 amino acid sequences for the Spike alone. And that is only one protein! Obviously, evolution could not have tried out all those possibilities. The age of the universe is too short to try them all out! So we can expect an endless stream of new virus variants as long as we don't interfere with the pandemic and the virus is allowed to run its natural course.
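Python handles integers of any size, so the calculation can be done exactly. In the sketch below the age of the universe (about 13.8 billion years) is the usual estimate, and the rate of 10^30 sequences tried per second is a purely hypothetical and deliberately generous number, chosen only to show how little difference it makes.

```python
import math

AMINO_ACIDS = 20
SPIKE_LENGTH = 1273  # amino acids in the Spike protein (from the text)

# About 13.8 billion years, expressed in seconds.
AGE_OF_UNIVERSE_SECONDS = 13.8e9 * 365.25 * 24 * 3600
# HYPOTHETICAL, deliberately generous rate of sequences tried per second.
TRIALS_PER_SECOND = 1e30


def sequence_count(length: int) -> int:
    """Number of distinct amino acid sequences of the given length."""
    return AMINO_ACIDS ** length


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"length {n}: {sequence_count(n):,}")

    digits = len(str(sequence_count(SPIKE_LENGTH)))
    print(f"20^{SPIKE_LENGTH} is a number with {digits} digits")

    # Compare on a log10 scale; the numbers are too large for floats.
    log_total = SPIKE_LENGTH * math.log10(AMINO_ACIDS)
    log_tried = math.log10(AGE_OF_UNIVERSE_SECONDS * TRIALS_PER_SECOND)
    print(f"possible sequences: about 10^{log_total:.0f}")
    print(f"tried since the Big Bang at that rate: about 10^{log_tried:.0f}")
```

Even at that imaginary rate the number of sequences tried is smaller than the number of possible sequences by more than 1600 orders of magnitude.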
Notes
- The Most Worrying Mutations in Five Emerging Coronavirus Variants, Scientific American,
Appendix
Information in the NCBI table with all mutations:
- Protein: all proteins encoded by SARS-CoV-2
- Amino Acid substitution (as far as I can see: no insertions/deletions...)
- Count: total number of cases in the database of the specific mutation
- Genomic location: the position in bases or: nt
- Codon change. For example: GCT > GCC (T is used instead of U !)
- Non-synonymous (does change AA) or synonymous (does not change AA), AA = amino acid.
- Collection location: country of origin of the sample
Sources
- This page has a table with the abbreviations of the amino acids.
- The video was created with SimpleScreenRecorder for Linux by Maarten Baert.










