13 September 2023

The true history of junk DNA (2)

Francis Crick
What Mad Pursuit.
paperback 1988

"The originator of the central dogma, Francis Crick, was well aware of genes that didn't encode protein. They don't figure into the central dogma." (Laurence Moran (2023) What's in your genome, chapter 8 paragraph 'Revising the central Dogma?') (my bold)

Exactly: They don't figure in the Central Dogma! That is precisely the problem! Crick omitted noncoding DNA from the Central Dogma. Had he included it in his scheme, a lot of confusion could have been prevented.

Central Dogma, from: Francis Crick, What Mad Pursuit, page 168.

Crick could have added an arrow from RNA to, for example, 'RNA genes'. He did not.

RNA genes added to Central Dogma (©GK)

In this blog I want to explore possible reasons for this omission. They have to do with the scientific context at the time Crick proposed his Central Dogma. I hope this will show that scientists who misinterpret the Central Dogma are not fools, and that Crick himself overlooked non-coding DNA when drawing his Central Dogma diagram. But first a second quote from Laurence Moran:

"Many scientists have a very different view of the central dogma. They were taught, incorrectly, that the real meaning of the central dogma is that DNA makes RNA makes protein and the only function of DNA is to encode protein ... They were somehow led to believe that there was only one kind of gene, namely, protein-coding genes." (Moran, 2023, What's in your genome, Chapter 8) (my bold)

Well, it is certainly not a mystery why many scientists were led to believe that there was only one kind of gene: there is only one kind of gene in Crick's illustration of the Central Dogma.

Why did Crick not add RNA genes to his diagram? It is important to try to understand the historical context at the time Crick proposed his Central Dogma. Traveling back in time is not easy, so I use Crick's own account in What Mad Pursuit.

The central problem of biology at the time was: How could genes possibly construct all the elaborate and beautifully controlled parts of living things? It was known that each chemical reaction in the cell was catalyzed by enzymes. This is a defining property of life on earth. Furthermore, it was known before 1953 that enzymes are proteins. Crick realized that the key problem in biology was to explain how proteins were synthesized. In the 1940s a very influential hypothesis was proposed, the 'One gene - one enzyme' hypothesis. The next question was: How do genes control the synthesis of proteins? (Chapter 3 The Baffling Problem, page 33). Obvious today, but at the time it was a problem at the frontiers of science. Further, it was also known at the time that proteins were made of about 20 different amino acids. 

After the discovery that DNA consisted of a sequence of bases, the next question emerged: what is the precise relation between genes and proteins? Crick proposed the Sequence hypothesis: the sequence of bases in DNA is a necessary and sufficient condition for the sequence of amino acids in proteins. Crick:

"Rereading it, I see that I did not express myself very precisely, since I said '...it assumes that the specificity of a piece of nucleic acid is expressed solely by the sequence of its bases, and that this sequence is a (simple) code for the amino acid sequence of a particular protein.' This rather implies that all nucleic acid sequences must code for protein which is certainly not what I meant." (Francis Crick, What Mad Pursuit, Chapter 10, page 108).

Then Crick explains that other parts of the DNA sequence could be used for control mechanisms (today: gene regulation) and he even mentions producing RNA for purposes other than coding (today: RNA genes). Crick concluded: "I don't believe anyone noticed my slip, so little harm was done." (page 109). Unfortunately, Crick underestimated the long-lasting influence of the famous Central Dogma diagram.

According to Moran the meaning of the Central Dogma diagram is that the information in proteins cannot get out again. That, indeed, is what Crick himself says (page 109). Unfortunately, the Central Dogma diagram is a weird way to illustrate the non-existence of a specific type of information flow. It is as if one wants to illustrate the absence of something with the absence of something in an illustration. It isn't manifest. It seems rather impossible to me to do that [2].

In my view the point of the Central Dogma was to illustrate (albeit in a partial way) the solution of the central question of the time and indeed of all times: how can genes specify proteins? Crick himself expressed this clearly:

"I shall… argue that the main function of the genetic material is to control (not necessarily directly) the synthesis of proteins." [1] (my bold)

The Sequence hypothesis isn't a hypothesis anymore, and it isn't at the frontiers of science anymore, but 'The Sequence' is still and will always be one of the defining characteristics of life on earth [3]. This is certainly not an outdated idea from the past. Life as we know it is impossible without enzymes (=sequences) and without genes (=sequences) coding for them.

The 'protein universe' is still very much at the frontiers of science: new protein structures are being discovered today [4].



Appendix (1)

All the following concepts are about protein synthesis:  
  1. Mendelian genes specify discrete phenotypic characters (with the benefit of hindsight).
  2. The Sequence Hypothesis states that the sequences of DNA bases specify the sequence of amino acids in proteins. (Chapter 10 Theory in Molecular Biology)
  3. The Central Dogma states the direction of flow of information from DNA to RNA to protein and not back from proteins (Chapter 10).
  4. The Genetic Code Table specifies which of the 61 base triplets code for which amino acid ('sense' codons) and which 3 base triplets specify the end of the chain (STOP, 'nonsense' codons) (Chapter 8 and Appendix B).
  5. The Adaptor Hypothesis  (Crick, Chapter 8 page 95) (the implementation of the Genetic Code in specific molecules: tRNA) doesn't make sense without protein synthesis.
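The counts in item 4 can be checked with a small sketch. It assumes the standard genetic code written in the common one-letter layout (bases ordered T, C, A, G; '*' for STOP); the variable names are illustrative only and are not taken from Crick's book.

```python
from itertools import product

# Assumption: standard genetic code, one letter per amino acid, '*' = STOP.
# Codons are enumerated in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
code_table = dict(zip(codons, AMINO))

# 'Sense' codons specify an amino acid; 'nonsense' codons end the chain.
sense_codons = [c for c, aa in code_table.items() if aa != "*"]
stop_codons = sorted(c for c, aa in code_table.items() if aa == "*")
amino_acids = set(AMINO) - {"*"}

print(len(codons), "triplets in total")
print(len(sense_codons), "sense codons for", len(amino_acids), "amino acids")
print("STOP codons:", stop_codons)
```

Every entry in this table only has meaning in the context of translation, which is the point of the list above.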


Appendix (2)

The terminology used to describe genes only makes sense (!) in relation to protein synthesis:

  • sense, nonsense, missense
  • sense, anti-sense strand
  • positive-sense, negative-sense 
  • coding strand, template strand
  • coding, noncoding
  • translation
  • STOP/START codons
  • the Genetic Code Table
  • triplets
  • in-frame/out of frame
  • ORF: Open Reading Frame
  • mRNA: messenger RNA
  • tRNA: transfer RNA
  • rRNA: ribosomal RNA

Also, the concepts promoter (DNA sequence to which proteins bind) and enhancer (DNA sequence to which specific proteins bind) only make sense (!) in the context of protein synthesis, directly or indirectly, because they promote or enhance the expression of (mainly) protein-coding genes. Using these concepts implies protein synthesis on the basis of DNA sequences.

However, there are concepts not (directly) related to protein synthesis: base pairing, double helix, transcription, directionality, replication.

Appendix (3)

I wonder whether there is a total absence of any coding signature in non-coding RNA genes. I found it difficult to find clear information about it. For example: do START and STOP codons occur in RNA genes? If so, do they have any effect? Do non-coding RNA genes have a triplet structure? Do single-base deletions or insertions have effects on RNA genes similar to those on protein-coding genes? (In RNA genes there is no reading frame for them to disturb.) Are there functional RNAs that are completely independent of and unrelated to protein (synthesis)? How did RNA genes originate? Did they originate from coding sequences or from random sequences?
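One of these questions, whether START and STOP triplets occur in a given sequence, can at least be checked mechanically. The following is a minimal sketch, assuming the standard START (ATG) and STOP (TAA, TAG, TGA) triplets; the function name codon_signals and the toy sequence are hypothetical and not taken from any real RNA gene. Finding such triplets says nothing about whether they have any effect, since they also turn up by chance in any sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"

def codon_signals(seq):
    """For each reading frame (0, 1, 2), list the positions of START and STOP triplets."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA notation
    result = {}
    for frame in range(3):
        starts, stops = [], []
        # Step through the sequence three bases at a time from this frame's offset.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START_CODON:
                starts.append(i)
            elif codon in STOP_CODONS:
                stops.append(i)
        result[frame] = {"start": starts, "stop": stops}
    return result

# Hypothetical toy sequence, made up for illustration only.
toy = "GGATGCCTAAGTGACATGA"
for frame, hits in codon_signals(toy).items():
    print(frame, hits)
```

Applied to a real non-coding RNA gene, the same scan would show which of the three frames contain START or STOP triplets, which is a first step toward asking whether they are more or less frequent than expected by chance.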


  1. Matthew Cobb (2017) 60 years ago, Francis Crick changed the logic of biology, PLOS Biology. Please note "(not necessarily directly)": this is a very ingenious way of including the indirect control of protein synthesis via enhancers and promoters. (added 22 Sep 2023).
  2. Elsewhere Crick designed another diagram which prevents the dilemma of illustrating the absence of something, see: Larry Moran (2007) Basic Concepts: The Central Dogma of Molecular Biology blog.
  3. See for example my review of Tibor Gánti (2003) 'The Principles of Life'. 
  4. ‘A Pandora’s box’: map of protein-structure families delights scientists,  Nature 13 Dec 2023.

