Can the new British B.1.1.7 variant be found in the NCBI database?
Corona Update 7 Jan 2021
There is much talk these days about the new British SARS-CoV-2 variant B.1.1.7. Even top scientific journals like Science raise the alarm: "Viral mutations may cause another ‘very, very bad’ COVID-19 wave, scientists warn".
In the previous blogpost I explored the NCBI database. Can I find this
new variant in the NCBI database? How do I find it in a database of
nearly 32,000 SARS-CoV-2 nucleotide sequences? I
first tried the sequences from the UK, of course. But, amazingly, they
uploaded only 5 complete genomes and 60 proteins. None of them were
useful. How do I recognize the variant anyway? The new variant is
characterised by the unique combination of 17 mutations (Fig 1).
Fig. 1. All 17 mutations of the British variant B.1.1.7 (source). Note: all substitutions are non-synonymous!
The Spike protein enables the virus to enter human cells, which makes it a very important protein. The standard length of the Spike protein is 1273 amino acids (AA). Since the Spike protein of the new variant has two deletions (together 3 amino acids or 9 bases), it has been shortened to 1270 amino acids. So, any Spike protein with 1273 AA can be eliminated from the search. For a start, I selected all SARS-CoV-2 Spike protein sequences with a length of 1270 AA. They did exist. I then checked whether all 8 Spike mutations were present. Fortunately, they were present in five sequences (Fig 2):
Fig 2. B.1.1.7 variants - Spike protein (composite image). Rows 2 to 6 are B.1.1.7 variants; the others are controls. The numbers above the columns are sequence positions.
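The length filter described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the NCBI interface itself: the record IDs and the placeholder sequences are made up, and `candidate_spikes` is a hypothetical helper name.

```python
# Minimal sketch of the length filter, using made-up records.
REFERENCE_SPIKE_LENGTH = 1273  # amino acids, as stated in the text
DELETED_RESIDUES = 3           # two deletions, three amino acids (9 bases) in total

VARIANT_SPIKE_LENGTH = REFERENCE_SPIKE_LENGTH - DELETED_RESIDUES  # 1270


def candidate_spikes(records, expected_length=VARIANT_SPIKE_LENGTH):
    """Return the IDs of Spike records whose length matches the variant."""
    return [seq_id for seq_id, seq in records.items()
            if len(seq) == expected_length]


# Toy data: hypothetical IDs and placeholder sequences, not real accessions.
records = {
    "toy_full_length": "M" * 1273,
    "toy_candidate": "M" * 1270,
    "toy_truncated": "M" * 1200,
}

print(candidate_spikes(records))  # ['toy_candidate']
```

A matching length is only a first filter: a sequence of 1270 AA can have a different set of deletions, which is why each of the 8 mutations still has to be checked afterwards.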
That's a promising start. But there are nine mutations in other genes. I entered the five Sequence IDs in the 'Accession' filter (with no
further filters). That results in whole genomes. I hit the button 'Align'.
I checked the presence of the remaining nine mutations one by
one in the rest of the virus. And, lo and behold, they all were found at
the exact locations predicted in the table (Fig.3). That means this
really is the new British B.1.1.7 variant. Big surprise: they were
captured not in the UK, but in the USA (CA, NY, FL). I guess it is
extremely unlikely that they arose independently of the British variant.
So, they must have been transported by air travel from the UK to the USA.
Collection dates of the US samples are: 19, 20, 24 and 29 December 2020.
The new variant was reported on 8 December in the UK (source). So, it spread to the US within a few weeks. It may have arrived even earlier, and very
likely there are far more than these five in the US.
The B.1.1.7 variant doesn't stop mutating. I already found additional
mutations in the US variant, for example: del A28271 shared by all 5, but not found in the reference virus Refseq NC_045512. I am busy checking
more.
Quite a lot of people in the Netherlands doubt whether the PCR test
detects SARS-CoV-2 at all and conclude there is no SARS-CoV-2 pandemic and
lockdown should be stopped immediately. Well, at this very moment the
NCBI database
contains 47,714 SARS-CoV-2 genome sequences. If this is no proof of the
existence of a SARS-CoV-2 pandemic, no evidence will be enough for those
people.
I will expand this blog when new information becomes available.
Latest news:
The new variant is estimated to be roughly 50% more transmissible than other variants; another estimate puts it at 56% (Nature).
Update 10 Jan 2021
According to this publication there is a D614G mutation in the Spike protein of the B.1.1.7 variant, but this one is not listed in the table in Fig. 1 above. I checked my five variants: indeed D614G is present! This is an interesting mutation that I will blog about later.
Note: all of the substitutions of the new variant listed in Fig. 1 are non-synonymous: they substitute one amino acid (AA) for another. This will change protein properties! I overlooked this important fact. Of course there are also synonymous mutations in the RNA; they do not change the amino acid.
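The difference between synonymous and non-synonymous substitutions can be illustrated with the standard genetic code. The sketch below is my own illustration: the function name `classify_substitution` is hypothetical, and the example codons are chosen only to show the principle, not taken from the table in Fig. 1.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(codon):
    """Translate one codon; RNA input (U) is accepted as well as DNA (T)."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_substitution(ref_codon, alt_codon):
    """Return 'synonymous' if both codons encode the same amino acid."""
    if translate(ref_codon) == translate(alt_codon):
        return "synonymous"
    return "non-synonymous"


# Illustrative codons only (assumed examples, not read from Fig. 1).
print(translate("AAT"), translate("TAT"), classify_substitution("AAT", "TAT"))
print(translate("GAT"), translate("GAC"), classify_substitution("GAT", "GAC"))
```

The first pair changes asparagine (N) into tyrosine (Y) and is therefore non-synonymous; the second pair encodes aspartic acid (D) both times and is synonymous.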
Appendix: technical notes.
Fig. 3. Composite of nine mutations of B.1.1.7 (outside Spike). Second row: Refseq NC_045512.2
This completes the 17 mutations of the B.1.1.7 variant. The second row in Fig. 3 is the Refseq NC_045512.2 from Wuhan, Dec 2019. Remarkable: there is a big deletion in ORF1ab. Rows 3 and 5 have undefined bases at positions 28280-28283 (letter N).
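Checking a mutation "by eye" in an alignment comes down to comparing one row with the reference row at a given reference position. The sketch below shows that comparison in Python, assuming two rows from the same alignment with '-' for gaps; the function name `call_position` and the toy sequences are hypothetical and far shorter than a real genome.

```python
def call_position(ref_row, sample_row, ref_pos):
    """Classify the sample base at a 1-based reference position.

    Both rows must come from the same alignment (equal length, '-' = gap).
    """
    if len(ref_row) != len(sample_row):
        raise ValueError("alignment rows must have equal length")
    seen = 0  # number of reference bases passed so far
    for ref_base, sample_base in zip(ref_row.upper(), sample_row.upper()):
        if ref_base == "-":
            continue  # insertion in the sample: no reference position used
        seen += 1
        if seen == ref_pos:
            if sample_base == "-":
                return "deletion"
            if sample_base == "N":
                return "undetermined"  # undefined base, cannot be called
            if sample_base == ref_base:
                return "same as reference"
            return f"{ref_base}>{sample_base}"
    raise IndexError("reference position lies outside the alignment")


# Toy alignment rows (made up for illustration).
reference = "ACGTACGTAC"
sample = "ACGTTCG-NC"

for position in (1, 5, 8, 9):
    print(position, call_position(reference, sample, position))
```

Treating N as "undetermined" rather than as a mutation matters for rows such as 3 and 5 in Fig. 3, where some positions simply could not be read.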
NC_045512 Reference genome SARS-CoV-2 Wuhan, China. 29903 bp. Dec 2019
Dec 2020. Gives table with all mutations of the B.1.1.7 variant used in Figure 1.
Science 5 Jan 2021
Keep up the excellent work, Gert!
Harry, THANKS VERY MUCH!
I found the NCBI database a pleasure to work with. So, I continue to work with it :-)
Gert,
Your posts invite me to look more closely into the matter, so I looked up whether the presence of the variant can be demonstrated by the PCR test.
Here are two links that may clarify how the (British) variant has spread. The deletion should become visible in PCR and is also referred to as "drop-out" of the S gene.
https://www.ecdc.europa.eu/sites/default/files/documents/SARS-CoV-2-variant-multiple-spike-protein-mutations-United-Kingdom.pdf
The authors of the following article also find a distribution of the variants in the USA as you mention in your post, but in two other states.
https://www.medrxiv.org/content/10.1101/2020.12.24.20248814v1.full.pdf
I would like to understand the phenomenon of drop-out better. It seems that the PCR test can distinguish between the older variant and the new variant. It seems worthwhile to re-examine all the PCR tests that have been done in the past, but is that feasible?
These people, in particular one guy, are driving hardworking virologists and medical doctors crazy. He drives me mad. They try to deny the reality of the pandemic and haven’t paid attention to the start of the pandemic in Bergamo, northern Italy, where the military transported the dead to crematoria elsewhere because Bergamo had no capacity left to cremate its dead. There were long lines of military trucks in the streets of Bergamo. The whole country mourned. 200 Italian medical doctors died from the SARS-CoV-2 virus in that period.
In the latest wave about 100 doctors have died in Italy up till now.
Thank you for the interesting post, also Harry's link is extremely interesting. But you remember what Ron Fouchier said yesterday? The 2020 variant probably is the 'hottest', which should give hope for a less 'hot' variant.
Hi Marleen,
I am not sure about the drop-out. It seems to be a problem for PCR tests, but I have not studied it yet. Google shows a lot of hits ... too many for me now...
Thanks for the valuable links, I am interested in the genetic, genomic and evolutionary aspects and I see interesting and helpful information. And I like to do hands-on work and discover things about the virus.
The humanitarian aspect: yes, I understand that you get mad at COVID-19 and SARS-CoV-2 deniers, especially if you have seen with your own eyes the human suffering that this virus brings. What you describe is no different from war scenes, and nobody of our age has lived through a war or a real pandemic. The hope here is completely based on the vaccination program. But be careful: lockdown remains the rule as long as a significant proportion of the population has not been vaccinated.