Can the new British B.1.1.7 variant be found in the NCBI database?
Corona Update 7 Jan 2021
There is much talk these days about the new British SARS-CoV-2 variant B.1.1.7. Even top scientific journals like Science are raising the alarm: "Viral mutations may cause another ‘very, very bad’ COVID-19 wave, scientists warn".
In the previous blogpost I explored the NCBI database. Can I find this
new variant there? How do I find it in a database of
nearly 32,000 SARS-CoV-2 nucleotide sequences? I
first tried the sequences from the UK, of course. But, amazingly, only
5 complete genomes and 60 proteins had been uploaded from the UK, and none of them were
useful. How do I recognize the variant anyway? The new variant is
characterised by a unique combination of 17 mutations (Fig. 1).
Fig. 1. All 17 mutations of the British variant B.1.1.7 (source). Note: all substitutions are non-synonymous!
The Spike protein enables the virus to enter human cells, which makes it a very important protein. The standard length of the Spike protein is 1273 amino acids (AA). Since the Spike protein of the new variant has two deletions (together 3 amino acids, or 9 bases), it is shortened to 1270 amino acids. So any Spike protein with 1273 AA can be eliminated from the search. For a start, I selected all SARS-CoV-2 Spike protein sequences with a length of 1270 AA. They did exist. I then checked whether all 8 Spike mutations were present. Fortunately, they were present in five sequences (Fig. 2):
Fig. 2. B.1.1.7 variants, Spike protein (composite image). Rows 2 to 6 are B.1.1.7 variants; the others are controls. The numbers above the columns are sequence positions.
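The length filter and the position check can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the NCBI interface: the function names are hypothetical, the mutation list is the commonly reported Spike signature of B.1.1.7 (assumed to match the Spike rows of Fig. 1), and the toy sequence is a placeholder rather than a real Spike protein. The main point is the arithmetic: after the three deleted residues, a reference position such as 501 sits three places earlier in the 1270-AA sequence.

```python
# Spike changes commonly listed for B.1.1.7, in reference (1273-AA) numbering.
# Assumption: this list stands in for the Spike rows of the table in Fig. 1.
DELETED = (69, 70, 144)  # two deletions, three residues in total
SUBSTITUTIONS = {501: "Y", 570: "D", 681: "H", 716: "I", 982: "A", 1118: "H"}
REF_LENGTH = 1273


def shifted_position(ref_pos, deleted=DELETED):
    """Map a 1-based reference position to its position in the shortened protein."""
    if ref_pos in deleted:
        raise ValueError(f"position {ref_pos} is deleted in the variant")
    return ref_pos - sum(1 for d in deleted if d < ref_pos)


def looks_like_b117_spike(seq):
    """Apply the length filter first, then check each substituted residue."""
    if len(seq) != REF_LENGTH - len(DELETED):
        return False
    return all(seq[shifted_position(pos) - 1] == aa
               for pos, aa in SUBSTITUTIONS.items())


def make_toy_variant(ref_seq):
    """Build a variant from a reference-length sequence (demonstration only)."""
    residues = list(ref_seq)
    for pos, aa in SUBSTITUTIONS.items():
        residues[pos - 1] = aa
    return "".join(r for i, r in enumerate(residues, start=1) if i not in DELETED)


toy_reference = "G" * REF_LENGTH  # placeholder, not a real Spike sequence
toy_variant = make_toy_variant(toy_reference)
print(len(toy_variant), shifted_position(501), looks_like_b117_spike(toy_variant))
# prints: 1270 498 True
```

A check like this only looks at fixed positions, so it would miss a variant with an extra insertion or deletion elsewhere in Spike. Comparing against an alignment, as in Fig. 2, does not have that weakness.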
That's a promising start. But there are nine more mutations in other genes. I entered the five sequence IDs in the 'Accession' filter (with no
further filters), which returns the whole genomes, and hit the 'Align' button.
I checked the presence of the remaining nine mutations one by
one in the rest of the genome. And, lo and behold, they were all found at
the exact locations predicted in the table (Fig. 3). That means this
really is the new British B.1.1.7 variant. Big surprise: the samples were
collected not in the UK, but in the USA (CA, NY, FL). I think it is
extremely unlikely that they arose independently of the British variant.
So they must have been carried by air travel from the UK to the USA.
Collection dates of the US samples are 19, 20, 24 and 29 December 2020.
The new variant was reported on 8 December in the UK (source). So it spread to the US within a few weeks of that report, maybe earlier, and very
likely there are far more than those five in the US.
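The one-by-one check against the reference genome can also be written down as code. The sketch below is a minimal illustration under stated assumptions: it takes two rows from an existing alignment as plain strings (with '-' for gaps), the function name `differences` is hypothetical, and the toy sequences are made up. Undefined bases (letter N) are skipped, because they say nothing about whether a mutation is present.

```python
def differences(ref_aligned, sample_aligned, ignore="N"):
    """List (reference position, reference base, sample base) for each mismatch.

    Both strings must be rows of the same alignment, with '-' marking gaps.
    Positions are 1-based reference coordinates. An insertion relative to the
    reference is reported at the position of the preceding reference base.
    """
    if len(ref_aligned) != len(sample_aligned):
        raise ValueError("aligned sequences must have equal length")
    found = []
    ref_pos = 0
    for ref_base, sample_base in zip(ref_aligned.upper(), sample_aligned.upper()):
        if ref_base != "-":
            ref_pos += 1  # only real reference bases advance the coordinate
        if sample_base in ignore or ref_base == sample_base:
            continue
        found.append((ref_pos, ref_base, sample_base))
    return found


# Made-up toy rows: one substitution, one deleted base, one undefined base.
toy_ref = "ATGCCATTGA"
toy_sample = "ATGTCA-NGA"
print(differences(toy_ref, toy_sample))
# prints: [(4, 'C', 'T'), (7, 'T', '-')]
```

With real data, the two strings would be the RefSeq row and one sample row exported from the alignment, and the resulting list could be compared with the positions in the mutation table.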
The B.1.1.7 variant doesn't stop mutating. I already found additional
mutations in the US samples, for example del A28271, which is shared by all five but not found in the reference genome RefSeq NC_045512. I am busy checking
more.
Quite a lot of people in the Netherlands doubt whether the PCR test
detects SARS-CoV-2 at all, and conclude that there is no SARS-CoV-2 pandemic and
that the lockdown should be stopped immediately. Well, at this very moment the
NCBI database contains 47,714 SARS-CoV-2 genome sequences. If this is no proof of the
existence of a SARS-CoV-2 pandemic, no evidence will be enough for those
people.
I will expand this blog when new information becomes available.
Latest news:
The new variant is estimated to be roughly 50% more transmissible than other variants; another estimate puts it at 56% (Nature).
Update 10 Jan 2021
According to this publication there is a D614G mutation in the Spike protein of the B.1.1.7 variant, but this one is not listed in the table in Fig. 1 above. I checked my five variants: indeed, D614G is present! This is an interesting mutation, which I will blog about later.
Note: all of the substitutions of the new variant listed in Fig. 1 are non-synonymous: they substitute one amino acid (AA) for another. This will change protein properties! I had overlooked this important fact. Of course there are also synonymous mutations in the RNA; they do not change the amino acid.
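The difference between synonymous and non-synonymous substitutions can be illustrated with the standard genetic code. The sketch below is a minimal example, not part of the analysis above: the function name `classify` is hypothetical, and the codons are chosen for illustration only, they were not read from the five genomes.

```python
from itertools import product

# Standard genetic code, written in the usual compact form (base order TCAG).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify(ref_codon, new_codon):
    """Return (reference AA, new AA, label) for a codon change.

    Codons may be given as RNA (with U) or DNA (with T).
    """
    ref_aa = CODON_TABLE[ref_codon.upper().replace("U", "T")]
    new_aa = CODON_TABLE[new_codon.upper().replace("U", "T")]
    label = "synonymous" if ref_aa == new_aa else "non-synonymous"
    return ref_aa, new_aa, label


# Illustrative codons (an assumption, not taken from the sequences above):
print(classify("AAT", "TAT"))  # an N to Y change, like N501Y
print(classify("GAT", "GGT"))  # a D to G change, like D614G
print(classify("GAT", "GAC"))  # both codons encode D
```

The first two changes replace one amino acid with another, the third leaves the protein unchanged even though the RNA differs.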
Appendix: technical notes.
Fig. 3. Composite of nine mutations of B.1.1.7 (outside Spike). Second row: RefSeq NC_045512.2.
This completes the 17 mutations of the B.1.1.7 variant. The second row in Fig. 3 is the RefSeq NC_045512.2 from Wuhan, Dec 2019. Remarkable: there is a big deletion in ORF1ab. Rows 3 and 5 have undefined bases at positions 28280-28283 (letter N).
NC_045512 Reference genome SARS-CoV-2 Wuhan, China. 29903 bp. Dec 2019
Dec 2020. Gives table with all mutations of the B.1.1.7 variant used in Figure 1.
Science 5 Jan 2021