07 January 2021

Finding the highly transmissible British SARS-CoV-2 B.1.1.7 variant in the USA

Can the new British B.1.1.7 variant be found in the NCBI database?


Corona Update 7 Jan 2021



There is much talk these days about the new British SARS-CoV-2 B.1.1.7 variant. Even top scientific journals like Science are raising the alarm: "Viral mutations may cause another ‘very, very bad’ COVID-19 wave, scientists warn".

In the previous blogpost I explored the NCBI database. Can I find this new variant in the NCBI database? How do I find it in a database of nearly 32,000 SARS-CoV-2 nucleotide sequences? I first tried the sequences from the UK, of course. But, amazingly, only 5 complete genomes and 60 proteins from the UK had been uploaded. None of them were useful. How do I recognize the variant anyway? The new variant is characterised by a unique combination of 17 mutations (Fig 1).

Fig. 1. All 17 mutations of the British variant B.1.1.7 (source)
Note: all substitutions are non-synonymous!

The Spike protein enables the virus to enter human cells, which makes it a very important protein. The standard length of the Spike protein is 1273 amino acids (AA). Since the Spike protein of the new variant has two deletions (together 3 amino acids or 9 bases), it is shortened to 1270 amino acids. So any Spike protein with 1273 AA can be eliminated from the search, and for a start I selected all SARS-CoV-2 Spike protein sequences with a length of 1270 AA. They did exist. I then checked whether all 8 Spike mutations were present. Fortunately, they were present in five sequences (Fig 2):

Fig 2. B.1.1.7 variants - Spike protein (composite image)
Rows 2, 3, 4, 5 and 6 are B.1.1.7 variants; the others are controls.
The numbers above the columns are sequence positions.
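The length filter and the mutation check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the eight Spike changes are taken from the table in the report used for Fig. 1 (they are not spelled out in this post), and the function names are made up for this example. Because the two deletions remove three residues, a substitution at reference position 501 sits three places earlier in a 1270 AA sequence.

```python
# Spike changes of B.1.1.7 as listed in the report behind Fig. 1 (assumption:
# two deletions plus six substitutions, eight changes in total).
# Positions are 1-based and refer to the 1273 AA reference Spike protein.
DELETIONS = [(69, 70), (144, 144)]   # inclusive ranges: 2 + 1 = 3 residues
SUBSTITUTIONS = {                    # position: (reference AA, variant AA)
    501: ("N", "Y"),
    570: ("A", "D"),
    681: ("P", "H"),
    716: ("T", "I"),
    982: ("S", "A"),
    1118: ("D", "H"),
}
REFERENCE_LENGTH = 1273
VARIANT_LENGTH = REFERENCE_LENGTH - sum(end - start + 1 for start, end in DELETIONS)


def variant_index(ref_pos):
    """0-based index in the shortened protein for a 1-based reference position."""
    removed_before = sum(end - start + 1 for start, end in DELETIONS if end < ref_pos)
    return ref_pos - 1 - removed_before


def looks_like_b117_spike(protein):
    """True if the length is 1270 AA and all six substitutions are present.

    The deletions themselves are only inferred from the length here;
    a real check would align the protein against the reference.
    """
    if len(protein) != VARIANT_LENGTH:
        return False
    return all(
        protein[variant_index(pos)] == new_aa
        for pos, (_ref_aa, new_aa) in SUBSTITUTIONS.items()
    )
```

A 1270 AA sequence that passes this check is only a candidate; looking at the aligned positions, as in Fig 2, is still needed to confirm that the deletions are where they should be.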

That's a promising start. But there are nine mutations in other genes. I entered the five Sequence IDs in the 'Accession' filter (with no further filters). That returns the whole genomes. I hit the button 'Align' and checked the presence of the remaining nine mutations one by one in the rest of the virus. And, lo and behold, they were all found at the exact locations predicted in the table (Fig. 3). That means this really is the new British B.1.1.7 variant.

Big surprise: they were captured not in the UK, but in the USA (CA, NY, FL). I guess it is extremely unlikely that they arose independently of the British variant. So, they must have been transported by air travel from the UK to the USA. Collection dates of the US samples are: 19, 20, 24 and 29 December 2020. The new variant was reported on 8 December in the UK (source). So, it spread within a few weeks to the US. It may have arrived even earlier, and very likely there are far more than those five in the US.
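Checking an aligned genome against the reference position by position can also be done in code instead of by eye. The sketch below is not what the NCBI 'Align' button does internally; it assumes you have exported two rows of the same alignment as equal-length strings with '-' for gaps, and the function name is hypothetical. Undefined bases (letter N) are skipped because no call can be made there.

```python
def differences_from_reference(ref_aligned, sample_aligned, ignore="N"):
    """List (reference position, reference base, sample base) for each mismatch.

    Both strings must come from the same alignment (equal length, '-' for gaps).
    Positions are 1-based reference coordinates; an insertion relative to the
    reference is reported at the position of the preceding reference base.
    """
    if len(ref_aligned) != len(sample_aligned):
        raise ValueError("aligned sequences must have the same length")
    differences = []
    ref_pos = 0
    for ref_base, sample_base in zip(ref_aligned.upper(), sample_aligned.upper()):
        if ref_base != "-":
            ref_pos += 1                 # advance only on real reference bases
        if sample_base in ignore:        # undefined base: skip this column
            continue
        if ref_base != sample_base:
            differences.append((ref_pos, ref_base, sample_base))
    return differences


# Toy example, not real viral sequence:
print(differences_from_reference("ACGTACGT", "ACTTA-NT"))
```

Note that the uploaded genomes are somewhat shorter than the 29903 bp reference, so missing stretches at the start and end of a sample will show up as runs of '-' and should not be read as true deletions.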

The B.1.1.7 variant doesn't stop mutating. I already found additional mutations in the US variant, for example: del A28271 shared by all 5, but not found in the reference virus Refseq NC_045512. I am busy checking more.

Quite a lot of people in the Netherlands doubt whether the PCR test detects SARS-CoV-2 at all, and conclude that there is no SARS-CoV-2 pandemic and that the lockdown should be stopped immediately. Well, at this very moment the NCBI database contains 47,714 SARS-CoV-2 genome sequences. If this is no proof of the existence of a SARS-CoV-2 pandemic, no evidence will be enough for those people.


I will expand this blog when new information becomes available.

Latest news:

The new variant is estimated to be roughly 50% more transmissible than other variants; others put the figure at 56% (Nature).

Update 10 Jan 2021

According to this publication there is a D614G mutation in the Spike protein of the B.1.1.7 variant, but this one is not listed in the table in Fig. 1 above. I checked my five variants: indeed, D614G is present! This is an interesting mutation that I will blog about later.

Note: all of the substitutions of the new variant listed in Fig. 1 are non-synonymous: they substitute one amino acid (AA) for another. This will change protein properties! I overlooked this important fact. Of course there are also synonymous mutations in the RNA; they do not change an amino acid.
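The difference between synonymous and non-synonymous can be made concrete with the standard genetic code. The sketch below builds the standard codon table and classifies a single codon change; the function name is made up for this example, and codons are written with T instead of U, as in the NCBI nucleotide records. For instance, AAT to TAT changes N into Y (the amino-acid change seen in N501Y), while GAT to GAC leaves D unchanged.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, new_codon):
    """Return (reference AA, new AA, kind) for a change from one codon to another.

    '*' stands for a stop codon; a change to or from a stop codon is
    counted as non-synonymous here.
    """
    ref_aa = CODON_TABLE[ref_codon.upper().replace("U", "T")]
    new_aa = CODON_TABLE[new_codon.upper().replace("U", "T")]
    kind = "synonymous" if ref_aa == new_aa else "non-synonymous"
    return ref_aa, new_aa, kind


print(classify_substitution("AAT", "TAT"))  # N -> Y
print(classify_substitution("GAT", "GAC"))  # D -> D
```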

Appendix: technical notes.

Figure 2 consists of 8 different screenshots of 8 different positions along the sequence of the Spike protein. The positions are too far apart to capture them in one screenshot. 

Below is the screenshot of the 9 mutations in the rest of the B.1.1.7 sequence:

Fig. 3. Composite of nine mutations of B.1.1.7 (outside Spike)
Second row: Refseq NC_045512.2

This completes the 17 mutations of the B.1.1.7 variant. The second row in Fig. 3 is the Refseq NC_045512.2 from Wuhan, Dec 2019. Remarkable: the big deletion in ORF1ab. Rows 3 and 5 have undefined bases at positions 28280-28283 (letter N).

The five B.1.1.7 sequences are loaded by this link in the NCBI database.  

MW422256 29817 bp

MW422255 29763 bp

MW430974 29861 bp

MW430966 29835 bp

MW440433 29792 bp

Reference Surface glycoprotein: YP_009724390 1273 aa (listed in Fig 2) 

NC_045512 Reference genome SARS-CoV-2 Wuhan, China. 29903 bp. Dec 2019
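For readers who prefer a script to the web interface, the same records can in principle be downloaded through the NCBI E-utilities `efetch` service. This is an alternative route, not the one used for this post, and a minimal sketch only: the helper names are made up, and the download function needs an internet connection.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# The five B.1.1.7 genomes listed above, plus the reference genome.
ACCESSIONS = ["MW422256", "MW422255", "MW430974", "MW430966", "MW440433"]
REFERENCE = "NC_045512"


def efetch_url(accessions, db="nuccore"):
    """Build an efetch URL that returns the records as plain-text FASTA."""
    query = urlencode({
        "db": db,
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def fetch_fasta(accessions):
    """Download the FASTA text for the given accessions (network access needed)."""
    with urlopen(efetch_url(accessions)) as response:
        return response.read().decode("utf-8")


print(efetch_url(ACCESSIONS + [REFERENCE]))
```

The downloaded FASTA text can then be fed into any alignment program to repeat the comparison shown in Fig. 3.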



Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations Dec 2020. Gives table with all mutations of the B.1.1.7 variant used in Figure 1.

Viral mutations may cause another ‘very, very bad’ COVID-19 wave, scientists warn  Science 5 Jan 2021

Report 42 - Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: insights from linking epidemiological and genetic data, posted 4 Jan 2021 on medrxiv.org


  1. I will expand this blog when new information becomes available.

    keep up the excellent work, Gert!

    I found the NCBI database a pleasure to work with. So, I continue to work with it :-)

  3. Gert,

    Your posts invite me to look more closely into the matter, so I looked up whether the presence of the variant can be demonstrated by the PCR test.

    Here are two links that may clarify how the (British) variant has spread. The deletion should become visible in PCR and is also referred to as "drop-out" of the S gene.


    The authors of the following article also find a distribution of the variants in the USA as you mention in your post, but in two other states.


    I am interested in understanding the phenomenon of drop-out better. It seems that the PCR test can tell the difference between the older variant and the new variant. It seems worthwhile to study again all the PCR tests that have been done in the past, but is that feasible?

    These people, in particular one guy, are driving hardworking virologists and medical doctors crazy. He drives me mad. They try to deny the reality of the pandemic and haven’t paid attention to the start of the pandemic in North Italy, Bergamo, where the military transported the dead to crematory ovens because Bergamo had no place to burn their dead anymore. There were long lines of military trucks in the streets of Bergamo. The whole country mourned. 200 Italian medical doctors died from the virus SARS-CoV-2 in that period.
    In the latest wave there have been about 100 dead doctors in Italy up till now.

    Thank you for the interesting post, also Harry's link is extremely interesting. But you remember what Ron Fouchier said yesterday? The 2020 variant probably is the 'hottest', which should give hope for a less 'hot' variant.

  4. Hi Marleen,
    I am not sure about the drop-out. It seems to be a problem for PCR tests? But I am not sure and have not studied it yet. Google shows a lot of hits ... too many for me now...
    Thanks for the valuable links, I am interested in the genetic, genomic and evolutionary aspects and I see interesting and helpful information. And I like to do hands-on work and discover things about the virus.

    The humanitarian aspect: yes, I understand that you get mad at COVID-19 and SARS-CoV-2 deniers, especially if you have seen with your own eyes the human suffering that this virus brings. What you describe is no different from war scenes, and nobody of our age has lived through a war or a real pandemic. The hope here is completely based on the vaccination program. But be careful: lockdown is the rule as long as a significant proportion of the population has not been vaccinated.

