04 May 2026

Physicist Charles S. Cockell: "DNA and its entourage". The Genetic Code is non-random.

https://wasdarwinwrong.com/blog/The-Equations-of-Life.png

Evolution is the transformation of species into slightly modified species. The origin of species is solved in principle. However, the origin of life is a fundamentally different problem. Darwin avoided discussing the origin of life. It is the hardest problem of biology, and it has still not been solved. Whereas the details of evolution can be described as a puzzle, the origin of life must be characterized as an intractable problem.

A crucial step (although not the first step) in the origin of life was the invention of DNA and the Genetic Code. DNA has become a defining property of life. But how DNA acquired its meaning is still a mystery. How could a relatively chemically inert, non-enzymatic molecule (DNA) become useful, even indispensable, for life? DNA itself could not have been involved in the origin of life (think about this...). There must have been something before DNA.

In the literature the Genetic Code is usually presented as a table in which all 64 triplets (codons) built from the 4 bases A, T, C, G are 'associated' with 20 Amino Acids; 61 codons specify an Amino Acid and the remaining 3 are stop signals. These associations could have been made in many different ways, but nature got stuck with one system of associations, called the Genetic Code table. In the book The Equations of Life: How Physics Shapes Evolution, Chapter 7 'The Code of Life', physicist Charles S. Cockell notes that the Genetic Code table is non-random. This is an important observation: the assignments of base triplets to Amino Acids (AAs) are non-random. When looking at the table, it is immediately clear that there is a pattern. Secondly, there is redundancy: many Amino Acids are coded by more than one base triplet, and this redundancy also follows a pattern. This is all very well known.
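The structure of the table described above (64 triplets, 61 coding for Amino Acids and 3 stop signals, redundancy with a visible pattern) can be checked in a few lines. This is a minimal sketch, not taken from Cockell's book: the standard code is written with DNA letters in the conventional T, C, A, G ordering, '*' marks the stop codons, and the function names are hypothetical.

```python
from collections import Counter

# Standard genetic code with DNA letters (T instead of U). Codons are
# enumerated in T, C, A, G order at each of the three positions;
# '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def codons_per_amino_acid(code):
    """Count how many codons encode each amino acid (stop codons excluded)."""
    return Counter(aa for aa in code.values() if aa != "*")


def degeneracy_classes(code):
    """Group amino acids by the number of codons that encode them."""
    classes = {}
    for aa, n in codons_per_amino_acid(code).items():
        classes.setdefault(n, []).append(aa)
    return {n: sorted(aas) for n, aas in sorted(classes.items())}


def share_first_two_bases(code):
    """Amino acids whose codons all share the first two bases,
    i.e. synonymous codons differ only at the third position."""
    prefixes = {}
    for codon, aa in code.items():
        if aa != "*":
            prefixes.setdefault(aa, set()).add(codon[:2])
    return sorted(aa for aa, p in prefixes.items() if len(p) == 1)


if __name__ == "__main__":
    counts = codons_per_amino_acid(CODE)
    n_sense = sum(counts.values())
    print(f"{len(CODE)} codons: {n_sense} sense, {len(CODE) - n_sense} stop")
    for n, aas in degeneracy_classes(CODE).items():
        print(f"{n} codon(s): {' '.join(aas)}")
    print("same first two bases:", "".join(share_first_two_bases(CODE)))
```

Run as a script, it reports 61 sense and 3 stop codons. Leu, Ser and Arg (L, S, R) each have six codons, Met and Trp only one, and for every Amino Acid except L, S and R all synonymous codons share their first two bases and differ only at the third position: one concrete form of the pattern.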

However, Cockell also notes that there is something special about nature's choice of the twenty Amino Acids. There are many more natural Amino Acids available than those twenty. So, why those twenty? Random? Accident? Or are they the most suitable for their task? He refers to a publication [1] that argues for a non-random choice. The authors reasoned that there are 3 properties of Amino Acids that are important for constructing a protein: (1) the size of an Amino Acid, (2) the charge, (3) hydrophobicity (repelling water) [3]. Together those properties determine how the protein behaves and what it can do. In principle proteins could be constructed from one or a few Amino Acids. But most useful proteins consist of a diverse mixture of Amino Acids: a protein is defined by a unique sequence of AAs, which folds into a complex 3D shape. Also, it is not useful to have many AAs with the same hydrophobicity, the same size or the same charge. The best toolkit for life would have an even distribution of AA properties that does not overlap too much. To test for optimality, the authors compared the natural set against a set of fifty AAs found in the Murchison meteorite. The reason? They assumed that the AAs found in the meteorite would represent the set of AAs present on the early Earth. What they found was astonishing, writes Cockell:

"When they compared the twenty amino acids used by life with a million alternative bundles of amino acids randomly chosen from the fifty in the meteorite, the twenty used by life had better coverage and combinations of all three of the key factors than did any other set. ... they seemed to be selected by evolution to give a wide range and even distribution of properties that might be useful in proteins." ... Of a much expanded set of seventy-six AA, not a single group out of a million possible alternatives outperformed the natural set.". (chapter 7). 

Cockell concludes that the twenty AAs used by life are not random. That was new to me. But there is one important aspect Cockell doesn't mention: the AAs must also be suitable to be attached to a transfer RNA (tRNA) and to be processed by the ribosome. This is a crucial property: it is the biochemical implementation of the Genetic Code table. AAs could differ in this suitability, and that must be investigated. Furthermore, every AA is associated with one or more triplet codons (redundancy). The question is: how is the association between a base triplet codon and its AA made? And how did it originate in the first place? The structure of a tRNA does not show a direct chemical bond between codon and AA: the anticodon, which pairs with the codon, and the attached AA sit at opposite ends of the molecule. Is the assignment a random choice? That could certainly be the case, because AAs and triplet codons (bases A, T, C, G) are different chemical compounds, yet they are somehow connected. Or is there a logic in the associations? Is there a pattern? Much research has been done to answer this question, but there is no definitive answer yet. Cockell does mention this. But at least he pointed out a new aspect of the origin of life and the Genetic Code table to me.
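One common way to ask whether the codon assignments follow a logic is to compare the standard code with randomly shuffled alternatives. The sketch below is a simplified version of that idea, not Cockell's analysis. It keeps the codon blocks and the three stop codons fixed, randomly reassigns which Amino Acid each block encodes, and scores every code by how much a single-base mutation changes hydrophobicity. The Kyte-Doolittle hydropathy scale is a swapped-in choice (classic studies used other scales, such as polar requirement), and the function names are hypothetical.

```python
import random

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
STANDARD = dict(zip(CODONS, AMINO_ACIDS))  # '*' = stop codon

# Kyte-Doolittle hydropathy per amino acid (one-letter codes):
# an assumed, swapped-in property scale for this illustration.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def point_mutation_cost(code):
    """Mean squared hydropathy change over all single-base substitutions
    between sense codons (changes to or from stop codons are skipped)."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other == "*":
                    continue
                total += (HYDROPATHY[aa] - HYDROPATHY[other]) ** 2
                n += 1
    return total / n


def shuffled_code(code, rng):
    """Hypothetical alternative code: same synonymous blocks and stop codons,
    but a random permutation of which amino acid each block encodes."""
    amino_acids = sorted(set(code.values()) - {"*"})
    relabel = dict(zip(amino_acids, rng.sample(amino_acids, len(amino_acids))))
    return {c: (aa if aa == "*" else relabel[aa]) for c, aa in code.items()}


def fraction_better(code, n_samples=5_000, seed=0):
    """Share of shuffled codes with a strictly lower mutation cost."""
    rng = random.Random(seed)
    target = point_mutation_cost(code)
    better = sum(point_mutation_cost(shuffled_code(code, rng)) < target
                 for _ in range(n_samples))
    return better / n_samples


if __name__ == "__main__":
    print(f"standard code cost: {point_mutation_cost(STANDARD):.2f}")
    print(f"shuffled codes that do better: {fraction_better(STANDARD):.4f}")
```

A small fraction would mean that few random codes buffer mutations better than the standard one, the kind of error minimization Koonin and Novozhilov refer to in the comments below. The number depends on the property scale, the cost function and the sample size, so this is an illustration of the method rather than a test of any published claim.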

 ______

In this blog post I focused on the origin of the Genetic Code. The origin of the Genetic Code is in fact the origin of DNA-based life: bacteria, animals and plants. It is also the origin of protein- and enzyme-based life. The origins of DNA and proteins are strongly intertwined. DNA on its own has no use, and proteins cannot exist without DNA. DNA cannot self-reproduce; it needs enzymes. But proteins cannot self-reproduce either. A specific protein consists of a unique sequence of Amino Acids. Unique proteins do not self-assemble spontaneously. The only way to reproduce such a unique sequence is on the basis of another unique sequence: the unique base sequence in DNA. In other words: DNA and proteins depend on each other. This is not a promising situation for starting life. Hence, the RNA world hypothesis was developed (which is not without its own problems!). Keep in mind: the origin of the Genetic Code is not the same as the origin of eukaryotes. Bacteria are also DNA- and protein-based life forms. All life on Earth uses the same Genetic Code, and so do viruses (apart from minor variants, for example in mitochondria).

Physicist Charles Cockell used the expression "DNA and its entourage" [2]. That is a misleading description. I hope readers recognize this as DNA-centric thinking. The cell and the cellular machinery are not an "entourage"! They are an equally important part of the cell! DNA is not the master of the cell! The cell is not the servant of DNA!


Notes

  1. Philip, G. K., & Freeland, S. J. (2011). Did evolution select a nonrandom "alphabet" of amino acids? Astrobiology.
  2. "An entourage is a group of attendants, assistants, or close associates who accompany and work for an important or famous person". 
  3. There is another list of properties of AAs: polar versus non-polar, and acidic versus basic. [5 May 2026]

 

Further Reading

  • Can AI simplify the alphabet of life? Generative AI design yields functional proteins with only 19 amino acids. Science, 30 April 2026. They designed functional bacterial proteins without the amino acid isoleucine.
  • Toward life with a 19–amino acid alphabet through generative artificial intelligence design.  Science, 30 April 2026. "no known free-living organism uses an alphabet of fewer than 20 amino acids. This raises a fundamental question: Can a viable cell be constructed with a reduced amino acid alphabet?" A statement against Cockell: "Computational protein modeling also indicates that as few as 9 to 12 amino acids could, in principle, encode all protein folds."
 

14 comments:

  1. Have you seen a paper about the biological code:

    shCherbak, V. I., & Makukov, M. A. (2013). The “Wow! signal” of the terrestrial genetic code. Icarus, 224(1), 228-242.

  2. Dear Dr Evgenii Rudnyi, thanks for this publication. I don't remember having seen it before. It is a very unusual publication. Do you think that our genetic code on Earth was manipulated by aliens? I can't read the complete publication; do you know if they reveal what the cryptic message is? I really want to know what the message is.
    - if our Genetic Code is truly universal (valid for the entire cosmos) because of universal natural laws, then it seems there is no room for manipulation (?).
    - if our Genetic Code was manipulated by an intelligent alien civilization, was their own Genetic Code also manipulated by yet another intelligent civilization? And so on: an infinite regress.

    1. There are some more papers from Shcherbak on this issue. What they have found is that there are many interesting things in the coding table that defines the connection between DNA and proteins. For example, one can find there a Pythagorean triangle: 3^2 + 4^2 = 5^2.

      Shcherbak was an atheist, but I have read this on a blog of a Russian Christian who believes that one can indeed find something more important than just a Pythagorean triangle. Yet, for me a Pythagorean triangle in the coding table is already a nice thing.

      I did not know how to upload a picture in the reply. So please find a picture this way:

      https://drive.google.com/file/d/1yZHCjhSjt4lXYDh2cmFwKkJVNB4nf0ko/view?usp=sharing

    2. I have found that he has made a translation into English. See the page about a Pythagorean triangle:

      https://gospelinthecode.livejournal.com/6536.html

  3. Gert, how interesting to read your blog about physics of life.
    I read some research by Charles Cockell; he is an astrobiologist and physicist. From ancient geological data about the oxygenation of the atmosphere he calculated the intensity of UV-B and UV-C radiation. Around 1,500 million years ago a thick ozone layer formed in a relatively short time. This resulted in a strong decrease of the harmful UV-B and UV-C radiation reaching the Earth's surface: the radiation level dropped to one thousandth of what it had been in the period before, since the formation of the Earth (Cockell, 2000).
    I think this strong decrease opened the door to multicellular life on land (see my book, pp. 315-316).

    And now to my surprise you published this blog.
    I am wondering: does all this (especially your criticism of the gene-centered view) change your ideas about neo-Darwinism as an explanation for the evolution of life?

    Ref. Cockell, Charles S., ‘The ultraviolet history of the terrestrial planets – implications for biological evolution’, Planetary and Space Science 48, 2000, pp. 203-214.

    Dear Dr Korthof

    This might interest you, and it partly anticipates R Barth's question:


    The structure of the SGC is nonrandom and ensures high robustness of the code to mutational and translational errors. However, this error minimization is most likely a by-product of the *primordial code expansion driven* by the *diversification of the repertoire of protein amino acids*, rather than a direct result of selection.

    Koonin, E. V., & Novozhilov, A. S. (2017). Origin and Evolution of the Universal Genetic Code. Annual Review of Genetics, 51, 45-62. https://doi.org/10.1146/annurev-genet-120116-024713

  5. Hi Rolie, thanks for your comment. I checked the pages in your book. I especially liked note 13, page 316, about UV protection by mycosporines and scytonemin in Cyanobacteria. Cyanobacteria are photosynthesizers, so they need to be close to the water surface to get enough sunlight. The associated disadvantage is UV damage to cell components. This explains the presence of mycosporines and scytonemin! A nice example of adaptation by natural selection! I wonder: if the first animals and plants had also contained these UV protectants, could they have withstood UV radiation and conquered the land long before a protective ozone layer was established...
    Rolie: "to my surprise you published this blog": please note that for a very long time I have had a section 'Engineering, physics and evolution'
    https://wasdarwinwrong.com/korthof.htm#engineering
    in which I collect evolution books by physicists and engineers!

    Rolie: "...especially your criticism of the gene-centered view..."
    My primary source on DNA-centrism and its failures was my years-long engagement with Senapathy's book. Furthermore, Nick Lane's OXYGEN has had a tremendous influence on my thinking: there is more to the evolution of life on Earth than genes!

    Cockell: So far I have only read chapter 7, 'The Code of Life'. What he does in that chapter has little to do with physics and far more to do with inorganic and organic chemistry. He seems to overlook the role of natural selection and to explain the nonrandomness of the Genetic Code table mainly by physical forces, which I disagree with.

  6. Dear Dr Anonymous, thanks for the Koonin 2017 publication.
    Please note that the above publication, The “Wow! signal” of the terrestrial genetic code, also cites a very similar Koonin publication:
    'Origin and evolution of the genetic code: the universal enigma', 2009.
    I have also blogged about the Koonin threshold for the origin of life:
    Hoe Koonin het ontstaan van het leven verklaart: moedige poging of wanhoopsdaad? (How Koonin explains the origin of life: a brave attempt or an act of desperation?) 2012.
    Please note the lively discussion in the comments on that post (among others, a contributor you may know: harry pinxteren).
    An updated version is on my website:
    The Koonin threshold for the Origin of Life on Earth, 2013.

  7. Dear Dr Korthof

    I skipped the 2009 version (original 2008: https://doi.org/10.48550/arXiv.0807.4749) because Koonin and Novozhilov dropped the *universal enigma* in their 2017 version, anticipating dr Barth’s question with the conclusion I quoted above.

    I’ve only a vague idea what they could mean by *diversification of the repertoire* , but I think I have an interesting lead that contributes substantially to your new paradigm:

    By identifying 14 distinct structural states of nucleosomes, “DNA spools”, Yang et al revealed a sophisticated “organizational code” that allows cells to fine-tune gene activity with incredible precision. Nature
    DOI:10.1038/s41586-026-10418-6

    The research counts as an AI Breakthrough: IDLI (Iteratively Defined Lengths of Inaccessibility) uses two-dimensional scanning, analyzing both the length of the DNA fiber and the internal structure of individual nucleosomes, to detect subtle distortions that previous technologies missed. Using a technology called SAMOSA to map DNA molecules, the AI was then trained to recognize patterns in the “accessibility” data. If a nucleosome was missing a building block or was loosely bound, the AI detected a specific “signature” of exposed DNA that shouldn’t be there in a “perfect” spool.

    In short: AI Uncovers a Hidden “Grammar” in DNA Packaging that cells use to flexibly regulate gene expression

    viz. IDLI showed some "interesting *diversification of the repertoire* of proteins, *rather than a direct result of selection*" , and thus substantiated the claims you made in the preceding blogs.

    So, I'd say we can undeniably detect progress, I think, at least since we discussed Koonin years ago.

    As an instructive aside, please note:

    IDLI uses the very same techniques that only a few days ago gave R Dawkins his own personal “Claude delusion”, when he had a couple of talks with his personalized chatbot ("Claudia"): “These intelligent beings are at least as competent as any evolved organism,” (sic) https://www.theguardian.com/technology/2026/may/05/richard-dawkins-ai-consciousness-anthropic-claude-openai-chatgpt
    Note that AI bots are particularly notorious for their sycophancy.

  8. Dear dr Anonymous,
    thanks for your reply. I focus on:
    "this error minimization is most likely a by-product "
    Is it really only a 'byproduct' when it has beneficial effects?
    Organisms with translation error minimization would have an advantage over those with less error minimization, wouldn't they? Then 'error minimization' is subject to natural selection, isn't it? It is quite possible, even reasonable, that more than one feature is subject to natural selection.

    I need more time to digest your further remarks.

    I appreciate your pointing out publications and data that support the anti-DNA-centric view. For example, I read in Nature: "DNA sequences that control gene expression":
    this is an ambiguous expression: do those sequences really 'control' gene expression, or do they, for example, have predictable effects on gene expression? Are they just one factor in a complex system? Proteins are involved; proteins do the work. If X controls Y, it suggests that X is the only cause of Y and that there is a direct connection between X and Y.
    So, if you or any reader encounters these expressions, I would be interested.

  9. Evgenii Rudnyi, thanks for your comments. I continue here at the bottom of the comments so that readers can easily see that these are the latest comments.
    The Genetic Code table is represented in letters for the 4 bases and the 20 AAs. How do you get from letters to numbers? How do you assign numbers to letters? If there are many ways to assign numbers to letters, how do you choose the 'correct' one?

    1. Numbers are molecular masses of amino acid residues.

    2. Evgenii Rudnyi, thank you for your answer. You say "amino acid residues" meaning "the part of an amino acid that remains after it has been incorporated into a polypeptide chain (protein), following the elimination of a water molecule during peptide bond formation." and not the amino acid in solution? Why this choice?
      Secondly, I am not a chemist, but I found on Wikipedia:
      Molar mass Cysteine 121.15 g·mol−1
      Molar mass water 18.01528(33) g/mol
      These are decimal numbers, with digits after the decimal point (I guess there are many more decimals). What does the author of the paper do with them?
      Thirdly, does he use the molar masses of all the 20 AAs?

  10. Fine tuning of the cosmological and physical constants: interesting news for readers interested in fine tuning:

    Scientists make stunning discovery that could change our understanding of the Universe, Sciencedaily.com 8 April.

    Constraints on fundamental physical constants from bio-friendly viscosity and diffusion, Science 23 Aug 2023.

