03 February 2021

New feature in NCBI virus database: View Mutations in SARS-CoV-2

 

 

 Corona Update 3 February 2021




 

There seems to be a competition between countries to report new SARS-CoV-2 variants. The media try to make sense of it and to answer questions about how dangerous these new variants are. See, for example, Scientific American: The Most Worrying Mutations in Five Emerging Coronavirus Variants [1] and The Scientist [5].

This is a very useful article; I will return to it. But there are more variants and many more mutations. What is the total number of different mutations found worldwide so far? The answer can be found in the NCBI Virus database [2]. NCBI has started an overview of all mutations in SARS-CoV-2. The information is free, no account is required, and the website is user-friendly.

View Mutations in SARS-CoV-2 SRA Data

Click on the link View Mutations:

Table with all mutations of SARS-CoV-2

After a few seconds, a table listing all mutations appears. See the Appendix for a description of its columns.

Explanation

An example of a non-synonymous substitution is D614G: amino acid D (aspartic acid) is replaced by G (glycine) at position 614 of the Spike (surface glycoprotein). Position 614 is counted from the first amino acid (AA) of the protein. For the Spike protein, positions run from 1 to 1273, the length of the protein. The Spike is a relatively small protein.
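The notation can be taken apart mechanically. The Python sketch below is only an illustration, assuming the simple letter-position-letter form used in the table; the name `parse_substitution` is hypothetical (not part of any NCBI tool), and insertions or deletions are not handled.

```python
import re

# A single substitution such as "D614G": reference residue (one-letter code),
# 1-based position in the protein, new residue. "*" would denote a stop codon.
SUBSTITUTION = re.compile(r"([A-Z*])(\d+)([A-Z*])")
SPIKE_LENGTH = 1273  # number of amino acids in the Spike protein


def parse_substitution(change, protein_length=SPIKE_LENGTH):
    """Split a protein change like 'D614G' into (ref, position, alt)."""
    match = SUBSTITUTION.fullmatch(change.strip())
    if match is None:
        raise ValueError(f"not a single substitution: {change!r}")
    ref, position, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= protein_length:
        raise ValueError(f"position {position} is outside 1..{protein_length}")
    return ref, position, alt


for change in ["D614G", "N501Y", "Q613Q"]:
    ref, position, alt = parse_substitution(change)
    effect = "amino acid unchanged" if ref == alt else "amino acid changed"
    print(f"{change}: {ref} -> {alt} at position {position} ({effect})")
```

The same notation also covers synonymous changes such as Q613Q, where the reference and new residue are identical.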

The genomic position is a number between 1 and 29,903. That is the length of the standard reference SARS-CoV-2 genome.
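To connect the two coordinate systems, a Spike residue number can be converted to genomic positions. This sketch assumes the GenBank annotation of the reference genome NC_045512.2, in which the Spike (S) coding sequence runs from nucleotide 21563 to 25384; the helper names are hypothetical. With these numbers, codon 614 covers nucleotides 23402 to 23404, and the well-known A23403G change hits its middle base.

```python
# Assumption: GenBank annotation of the reference genome NC_045512.2
# (29,903 nt), in which the Spike (S) coding sequence starts at nucleotide
# 21563 and ends at 25384, including the stop codon. Positions are 1-based.
S_CDS_START = 21563
SPIKE_LENGTH = 1273  # amino acids; 1273 * 3 + 3 (stop) = 3822 nt of CDS


def spike_codon_span(aa_position):
    """Return (first, last) genomic position of the codon for a Spike residue."""
    if not 1 <= aa_position <= SPIKE_LENGTH:
        raise ValueError(f"residue {aa_position} is outside 1..{SPIKE_LENGTH}")
    first = S_CDS_START + 3 * (aa_position - 1)
    return first, first + 2


def spike_residue_at(genomic_position):
    """Return (residue number, base 1-3 within the codon) for a genomic position."""
    offset = genomic_position - S_CDS_START
    if offset < 0 or offset // 3 + 1 > SPIKE_LENGTH:
        raise ValueError(f"{genomic_position} is not in a Spike sense codon")
    return offset // 3 + 1, offset % 3 + 1


print(spike_codon_span(614))    # (23402, 23404)
print(spike_residue_at(23403))  # (614, 2): the middle base of codon 614
```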

An example of a synonymous substitution is Q613Q: Q is 'replaced' by Q. This still counts as a substitution because the change happens at the nucleotide level: CAA > CAG. The nucleotide change is also listed in the table.
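Whether a codon change is synonymous can be decided by translating both codons with the standard genetic code. A minimal sketch follows: the codon table is built from the usual compact 64-letter encoding, U is accepted and treated as T (the NCBI table writes T), and `classify_codon_change` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order for
# the first, second and third position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(change):
    """Classify a change written like 'CAA > CAG' as synonymous or not."""
    # Accept RNA letters too: U is converted to T before the lookup.
    ref, alt = (c.strip().upper().replace("U", "T") for c in change.split(">"))
    if CODON_TABLE[ref] == CODON_TABLE[alt]:
        return "synonymous"
    return "non-synonymous"


for change in ["CAA > CAG", "GCT > GCC", "GAT > GGT", "AAU > UAU"]:
    ref, alt = (c.strip().upper().replace("U", "T") for c in change.split(">"))
    print(change, CODON_TABLE[ref], "->", CODON_TABLE[alt],
          classify_codon_change(change))
```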

The Count column indicates how often a mutation has been found, and thus whether it is rare. The Collection location column lists the countries of origin of the virus samples.

Another handy feature is that each column can be sorted (ascending or descending) by clicking on its header. Try it!

The NCBI website does not yet provide statistics. On 30 January I counted the number of mutations in the Spike protein (surface glycoprotein):

  • 264 non-synonymous mutations
  • 345 synonymous mutations 
  • 609 mutations total

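This is expected: there are more synonymous than non-synonymous mutations, because changes in an amino acid are more often harmful to the virus and are then weeded out by natural selection. Still, this is quite a lot for a protein of 1273 amino acids: the 264 amino acid changes correspond to about 21% of the 1273 positions, and the 609 mutations to about 16% of the 3,819 nucleotides that code for the Spike (1273 × 3), or roughly 48 mutations per 100 codons. These percentages overestimate the fraction of positions affected, because a single position can carry more than one mutation. The million dollar question is what effect these mutations have on the behaviour of the protein and the properties of the virus. A first step is: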
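The percentages can be checked directly from the counts. This is plain arithmetic on the numbers counted on 30 January; it assumes 1273 × 3 = 3,819 coding nucleotides (stop codon not counted) and treats every listed mutation as a separate event.

```python
SPIKE_AA = 1273              # amino acids in the Spike protein
SPIKE_NT = SPIKE_AA * 3      # coding nucleotides, stop codon not counted

# Counts from the NCBI table, 30 January (see the list above).
non_synonymous = 264
synonymous = 345
total = non_synonymous + synonymous

aa_share = 100 * non_synonymous / SPIKE_AA   # amino acid changes per residue
nt_share = 100 * total / SPIKE_NT            # mutations per coding nucleotide
per_100_codons = 100 * total / SPIKE_AA      # mutations per 100 codons

print(total)                      # 609
print(f"{aa_share:.1f}%")         # 20.7%
print(f"{nt_share:.1f}%")         # 15.9%
print(f"{per_100_codons:.1f}")    # 47.8
```

Dividing the 609 mutations by the number of codons rather than by the number of nucleotides gives a figure three times larger, so it matters which denominator is meant.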


From one-dimensional RNA to three-dimensional proteins

A spectacular and sophisticated feature is the interactive 3D display of the protein, which appears when you click on a link in the Protein Change column. Try it!

Click on the link N501Y
 

Loading data... please wait... (ignore the error message):

Interactive 3D model of the Spike protein, with the mouse pointer at N501.

 


Try the video in full-screen mode! (16 sec)

When you move the mouse pointer over the protein, the names and positions of individual amino acids are displayed. The software keeps track of all 1273 amino acids in this very complicated 3D structure! Really great software! After a lot of trial and error I found ASN501.

ASN = asparagine; one-letter code: N.
Tip: for a table of the code names of amino acids, see this page.
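The 3D viewer labels residues with three-letter codes plus a number (ASN501), while the mutation table uses one-letter codes (N501Y). A small conversion sketch, assuming the viewer's labels always consist of three letters followed by digits; `to_one_letter` is a hypothetical name.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(label):
    """Turn a residue label such as 'ASN501' into 'N501'."""
    match = re.fullmatch(r"([A-Za-z]{3})(\d+)", label.strip())
    if match is None:
        raise ValueError(f"unexpected residue label: {label!r}")
    code, number = match.groups()
    return THREE_TO_ONE[code.upper()] + number


print(to_one_letter("ASN501"))  # N501
```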

Asparagine at position 501 (N501) is the site of the famous N501Y mutation, in which N (asparagine) is replaced by Y (tyrosine). The amino acid is marked in yellow:

Zoomed in: the yellow structure is asparagine at position 501.

Not surprisingly, position 501 (in yellow) lies on the outside of the molecule. It is part of the region that attaches to the human ACE2 receptor, which would be impossible if it were buried inside the molecule.

Try it. Play with it. Move the cursor over the structure. Change the point of view by holding down the mouse button and moving the mouse. Look at the structure from different angles. Try other mutations (click on other mutations in the main table). Zoom in. Mind you: this is the molecule that caused a pandemic!

Remember: the three-dimensional structure of a protein is the first step in discovering the effect of a mutation. 

Problems: not all links to the 3D structure seem to be correct. For example, H1000Q leads to residue THR257. Are the links made manually?

Later I discovered that one can select locations in the one-dimensional RNA (in the right panel of the page), and the selected amino acid is then highlighted in yellow in the 3D model. I still have to explore that.

 

The famous N501Y mutation is found in the variants from the UK, South Africa and Brazil. Here is the list from the Scientific American article [1]:


  • Spain:        A222V (Spike)
  • UK:           N501Y (Spike)
  • South Africa: E484K, K417N, N501Y [virus escape mutant]
  • Brazil:       E484K, K417N/, N501Y
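With more variants, the overlaps become harder to see by eye, and set operations make them explicit. A sketch using the Scientific American list; the dictionary layout and the helper names `variants_with` and `shared_by` are illustrative, and the unclear "K417N/" entry for Brazil is left out.

```python
# Spike mutations per variant, taken from the list above
# (Brazil's ambiguous "K417N/" entry is omitted in this sketch).
VARIANTS = {
    "Spain": {"A222V"},
    "UK": {"N501Y"},
    "South Africa": {"E484K", "K417N", "N501Y"},
    "Brazil": {"E484K", "N501Y"},
}


def variants_with(mutation):
    """Names of the variants that carry a given mutation, in list order."""
    return [name for name, mutations in VARIANTS.items() if mutation in mutations]


def shared_by(*names):
    """Mutations common to all the named variants."""
    return set.intersection(*(VARIANTS[name] for name in names))


print(variants_with("N501Y"))                      # ['UK', 'South Africa', 'Brazil']
print(sorted(shared_by("South Africa", "Brazil"))) # ['E484K', 'N501Y']
```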

 

Universe too small! Too short-lived!

The number of possible proteins of length 1273 is staggering. Do the calculation: for every position there are 20 possibilities, because there are 20 amino acids. "So there are 20×20 = 400 distinct proteins of 2 Amino Acids, 20x20x20 = 8000 proteins of length 3 AA, 160,000 proteins of length 4 AA, 3,200,000 with just 5 AA." [4] And so on: in total there are 20^1273 (about 1.6 × 10^1656) possible sequences of 1273 amino acids for the Spike alone. And that is only one protein! Obviously, evolution cannot have tried out all those possibilities: the age of the universe is far too short. So we can expect an endless stream of new virus variants as long as we do not intervene in the pandemic and the virus is allowed to run its natural course.
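Python's arbitrary-precision integers make it easy to do this calculation exactly. The sketch reproduces the numbers in the quote, measures the size of 20^1273 by counting its decimal digits, and, for contrast, counts the single amino acid substitutions possible in the Spike (each of the 1273 positions can change to one of 19 other amino acids). It is pure combinatorics and says nothing about which sequences would fold or function.

```python
AMINO_ACIDS = 20
SPIKE_LENGTH = 1273

# Distinct sequences of length n: 20 choices at each of n positions.
for n in range(2, 6):
    print(n, f"{AMINO_ACIDS ** n:,}")   # 400, 8,000, 160,000, 3,200,000

# Python integers have arbitrary precision, so 20**1273 is computed exactly.
spike_space = AMINO_ACIDS ** SPIKE_LENGTH
print(len(str(spike_space)), "digits")  # 1657 digits, i.e. about 1.6 x 10^1656

# For contrast: single amino acid substitutions (19 alternatives per position).
single_substitutions = SPIKE_LENGTH * (AMINO_ACIDS - 1)
print(f"{single_substitutions:,}")      # 24,187
```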


 Notes

  1. The Most Worrying Mutations in Five Emerging Coronavirus Variants, Scientific American.

 

Appendix

Information in the NCBI table with all mutations:

  • Protein: all proteins encoded by SARS-CoV-2
  • Amino acid substitution (as far as I can see, no insertions or deletions are listed)
  • Count: total number of occurrences of the specific mutation in the database
  • Genomic location: the position in nucleotides (nt)
  • Codon change: for example, GCT > GCC (T is used instead of U!)
  • Non-synonymous (changes the amino acid) or synonymous (does not change the amino acid); AA = amino acid.
  • Collection location: country of origin of the sample 

 

Sources

10 comments:

  1. Gert,

    Very nice and beautiful video of the Spike protein.

    The last paragraph is particularly interesting to me because it reminds me of the book by Andreas Wagner, who explains how a protein chain of 100 amino acids can give rise to endless possibilities. Sure, from a mathematical point of view this is all correct. But we do know that there is natural selection and that not all possibilities are viable. In fact, almost all of them are not viable; only a few are viable at a given moment and in a given environment. So I do not expect endless variants. A lot maybe, but not endless. If the original virus accumulates lots of mutations, it will either become non-viable or turn into a completely different virus.

  2. Thanks, Gert, for the update.

    This doesn't sound good...

    Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape
    https://science.sciencemag.org/content/early/2021/02/02/science.abf6950

  3. Hi Marleen, Harry,
    Did my video also play well in full-screen mode? On what system did each of you run it?
    Marleen, about the combinatorial explosion:
    Do you know of any reason why there could not be endless synonymous mutations in SARS-CoV-2?
    Non-synonymous mutations: how many AA are crucial and irreplaceable? I think only a few of the 1273...
    The ultimate question is: how many proteins with 1273 AA behave like Spike protein?
    And: how many AA could be deleted without destroying the activity?

    I think Harry's tip, "Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape", is very interesting in that respect!

    Thanks both of you!

  4. Hi Bert, the nice thing about SimpleScreenRecorder is that it was programmed by a volunteer and made available for free as open source. I always find that appealing. He does not accept donations!

    Virologists differ in their views on the pros and cons: if the second vaccination is postponed, immunity may slowly decline and thereby create a suitable environment for the emergence of all kinds of virus mutants, including immune escape mutants. These can then multiply in the body and spread from person to person. They might then infect vaccinated people. People vaccinated once might still produce vaccine-resistant viruses. But the probability of this scenario is hard to estimate, and there are no empirical data.
    It is comparable to a half-finished course of antibiotics.

    The other side is: give as many people as possible a first vaccination as quickly as possible in order to achieve a herd immunity effect.
    Because there are no hard data, it is difficult to weigh the pros and cons against each other.
    The British variant possibly/probably arose in an immunocompromised patient (weakened immune system), and that is a comparable situation (see previous blogs).

  5. Gert,

    There may well be endless synonymous mutations in SARS-CoV-2, but if they are synonymous they don’t have any effect on the phenotype (the activity or properties of the S protein). So it seems to me that synonymous mutations don’t count at all.

    The only thing that we should be aware of is that synonymous mutations may accumulate. I don’t know however what that may mean for the Spike protein.

    I think your question

    The ultimate question is: how many proteins with 1273 AA behave like Spike protein?
    And: how many AA could be deleted without destroying the activity?


    might be formulated in a slightly different way:

    How many mutations can the Spike protein support and still have the same activity/properties (it anchors to the ACE2 receptor and is recognized by certain antibodies)? But I must recognise that it is almost the same question.
    Anyway, I agree.


    I watched the video in the Google Chrome browser on Windows 10.

    I, too, wonder how things stand with the second dose. They really should have thought about that beforehand. Everyone who gets a shot should be given the second dose to take home and store. Unfortunately, the vaccine has to be kept super cold.



  6. Marleen, thanks for the comment. Yes, synonymous mutations don't make a difference to the protein. As for the non-synonymous mutations: if the majority are deleterious and only a few are beneficial, then that raises the question: how did the Spike protein originate? Usually, every protein must evolve by trial and error in a step-by-step way. So how did it evolve? Did it show up fully formed and pre-adapted to the human ACE2? If it descended from SARS-CoV-1, then by what evolutionary path? There must have been some pretty significant improvements since SARS-1, because SARS-1 did not cause a pandemic. So there are some important and intriguing questions to be solved...

    Thanks for testing the video; it was new to me, an experiment.

    Yes, the second dose: it is a devilish dilemma to choose whether or not to postpone the second dose in favour of a large group of people. Scientific journals such as Science and Nature, and here in the Netherlands the KNAW, should immediately organize a forum where proponents and opponents debate with each other at a scientific level and reach a consensus.

  7. Gert
    another relevant study:

    Kemp, SA et al. SARS-CoV-2 evolution during treatment of chronic infection. Nature; 5 Feb; DOI: 10.1038/s41586-021-03291-y

  8. Gert,

    some stuff for your next blog!

    ...the DeepVacPred computational framework directly predicts 26 potential vaccine subunits from the available SARS-CoV-2 spike protein sequence.

    .....Moreover, we trace the RNA mutations of the SARS-CoV-2 and ensure that the designed vaccine can tackle the recent RNA mutations of the virus.

    Yang, Z., Bogdan, P. & Nazarian, S. An in silico deep learning approach to multi-epitope vaccine design: a SARS-CoV-2 case study. Sci Rep 11, 3238 (2021). https://doi.org/10.1038/s41598-021-81749-9

  9. Harry, thanks, including for the previous one: spot on! I no longer have to scour the journals; these days everything gets delivered to me!

