31 July 2023

The true history of junk DNA

"By the late 1960s, knowledgeable scientists were used to the idea that genes occupied only a small part of the genome, and in 1974 the editor of the journal Cell, Benjamin Lewin, was expressing the consensus view of the experts when he wrote that the C-value Paradox could be resolved by assuming that much of the genome is composed of nonfunctional repetitive DNA (junk DNA)." (Chapter 2 of Laurence Moran (2023) What's in your genome.)

It may be that 'knowledgeable scientists' in the late 1960s knew that much of the genome is composed of junk DNA, but this 'consensus view' was not widely known in all sub-disciplines of biological research. Maybe the journal Cell was not read by evolutionary biologists. Probably, those experts worked in a different field with its own journals and conferences.

Eli C. Minkoff (1984) Evolutionary Biology.

I checked the oldest textbook I have, Minkoff (1984) Evolutionary Biology. There is no 'junk DNA' and no 'non-coding DNA' in the index, despite the 'consensus view'. Yes, there are tRNA and rRNA (p.16), but these RNAs are not labeled as 'non-coding RNA' or 'non-coding DNA'. They are in the business of producing proteins. They are the very embodiment of the genetic code. Therefore, it would be somewhat counterintuitive to call them 'non-coding'. Yes, there are 'genetic drift', neutral mutations, 'neutralism versus selectionism', 'genetic load', and 'mutational load' in his book, but Minkoff did not connect these concepts with 'non-coding DNA'. The concept is absent anyway. 'Centromere' is mentioned once in passing (p.19); I could not find 'telomere'. In any case, 'centromere' is not labeled as 'non-coding DNA'. Why is non-coding DNA absent from the book?

I think I found part of the answer in the following passage:

"One of the fundamental tenets of modern synthetic theory of evolution is that natural selection operates on the phenotype rather than the genotype. No genetic change can be influenced by natural selection unless it first produces some phenotypic change. It is largely for this reason that modern evolutionary biologists must be aware of the manner in which phenotypes are controlled." (p.114)
This was an eye-opener for me. The phenotype is the most important; the genotype is important only insofar as it has an effect on the phenotype. Who cares about DNA that does nothing? Evolutionary biology has the task of explaining the organism.

A second foundational paradigm I found here:

"Proteins are among the most important of all biological molecules. (...) The great intricacies of living systems are all the result of enzyme-controlled activities (...) enzymes are therefore the chemical basis of life". (p.17).

Taken together these two principles explain the mindset of evolutionary biologists in those days. If they did know about non-coding DNA, it simply had no relevance to the goals of their daily research. On page 37 there is a table labeled as 'The Genetic Code for Translation of mRNA Codons into Amino Acid sequences'. The famous table. It makes sense in this context, because the Genetic Code is the link between DNA and proteins. The reason for the existence of the Genetic Code is to produce proteins. The Watson-Crick structure of DNA plus the chemical structure of the four bases is present in the book. Minkoff knows the necessary biochemistry. Unfortunate exception: introns and splicing are absent! Introns were discovered in 1977.

There is one isolated and thus mysterious remark which vaguely suggests something like 'non-coding DNA':

"Not all of the genotype is transcribed and translated into a portion of the epigenotype, nor are all the transcribable genes ever transcribed at the same time." (p.114) ['epigenotype' = "the polypeptides that result from the immediate transcription and translation of the genotype"]

That's all. Probably, Minkoff was vaguely aware of non-coding DNA. But why include it in his textbook? He did not elaborate the concept because in his opinion it was simply not relevant or nothing was known about it. DNA which is not transcribed and translated has nothing to contribute to the phenotype of the organism, consequently nothing to biology and evolution. It doesn't fit in the evolutionary biology paradigm of that time [1].

So, that is the 'true history' of non-coding DNA based upon Minkoff (1984), as it was taught to biology students at that time. He did not say that non-coding equals junk, but by omitting non-coding DNA, he implied that non-coding DNA is unimportant. If one makes statements about the history of junk DNA, one has to investigate the evolutionary biology textbooks, especially older ones. Minkoff was an eye-opener for me. I checked more evolutionary biology textbooks: 8 out of 17 do not have 'non-coding DNA' in the index.

"As Sandwalk readers know, there was never a time when knowledgeable scientists said that all non-coding DNA was junk. They always knew that there was functional DNA outside of coding regions." (Sandwalk)

I think one has to take into account that there are different scientific disciplines with their own paradigms, leaders, journals, conferences, and networks. 

Thanks for reading. Have a nice day!



  1. On page 18 he writes: "there are other sequences in each DNA molecule that do not appear to determine the amino acid sequence of any polypeptide. Some of these may function as "spacers", and others are believed to function as regulatory genes, which control the transcription of other genes." (page 18, chapter 2: Basic Principles of Genetics). Here he describes non-coding regulatory genes! He doesn't realize that these non-protein-coding DNA sequences must have indirect effects on the phenotype, and consequently are important for evolutionary biology! In the subsequent development of evolutionary biology, the evolutionary importance of regulatory genes became evident.  Added: 21 Aug 2023


Previous posts

  1. Junk DNA in the Evolution textbooks (2) from 1996 to 2023 26 Jul 23
  2. Junk DNA in the evolution textbooks. Bergstrom and Dugatkin 2023 12 Jul 23
  3. Periannan Senapathy (1994) claimed that the human genome consists of more than 90% junk DNA. 4 Jul 2023
  4. Scientists say: 90% of your genome is junk. Have a nice day! Biochemist Laurence Moran defends junk DNA theory 26 Jun 23


  1. Gert, thanks for your research on the history of junk DNA.
    After reading the four blogs about this subject I still have some questions:
    1. Is it scientifically correct to label some piece of nature as junk or rubbish ('rommel' in Dutch)?
    2. What I understand of these DNA parts is that we don't know their function. Not knowing function is not the same as not existing function.
    3. beside encoding biological information (like an amino acid chain) there could be some physical functions like stability of the superfolded DNA-string. Marleen referred to that in one of her posts.
    4. and as said by others: in the past the large non-coding DNA might have been a pool for making new genes.

    So, one of my suggestions would be to search for physical functions of DNA-strings. It would not surprise me that someone has investigated the stability and readability of DNA organised in nucleosomes, chromatin threads and chromosomes.

  2. Hi Rolie!
    1) 'Junk' is not a scientific concept, but it is widely used: nine evolution textbooks and many other evolution books use the term, so you had better know it. A general consideration: why would an onion require nearly 5 times as much DNA as a human? Is all of the onion DNA really functional?
    The onion has a haploid genome size of 15.9 Gb, which is about 5 times the 3.2 Gb of the human genome.
    Does an onion need 5 times as many genes as humans?

    There are other theoretical reasons why not every piece of DNA in a genome of 3.2 Gb can be functional: such a huge amount of information cannot be maintained by natural selection; it will be slowly destroyed by mutation, unless the mutation frequency is extremely low or DNA repair is extremely accurate.

    2) "Not knowing function is not the same as not existing function" is correct. However, take the Alu repeat: there are 1 million Alu copies in the human genome. Is it reasonable to ascribe a function to them? Furthermore, we know the origin of most such sequences: a copy-and-paste mechanism, just like viruses. Or: selfish DNA, which makes copies of itself just because it can.

    to be continued.

  3. (continued)
    "3. beside encoding biological information (like an amino acid chain) there could be some physical functions like stability of the superfolded DNA-string. Marleen referred to that in one of her posts."

    For example, there is something like 'spacer DNA': a sequence whose only important property is its length, not a specific sequence of bases.
    However, 90% of 3.2 Gb is 2.88 Gb of non-coding DNA! That amount of DNA cannot be explained by sequences that have physical properties...!?

    "4. and as said by others: in the past the large non-coding DNA might have been a pool for making new genes."
    That does indeed happen in evolution. Sometimes. But 2.88 Gb would be a huge burden with uncertain and relatively small benefits in the future of the species. The problem: selection acts on the genomes of today, not on those of the future.

    "It would not surprise me that someone has investigated the stability and readability of DNA organised in nucleosomes, chromatin threads and chromosomes."
    Yes, for example: https://en.wikipedia.org/wiki/Telomere
    These structures at each end of a chromosome protect the ends from degradation. They consist of repetitive nucleotide sequences. But even when added to the rest of the functional non-coding DNA, they do not come close to 2.88 Gb ...
    I hope this info helps?
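
The figures quoted in the replies above can be checked with a short calculation. This is only a sketch: the genome sizes (15.9 Gb and 3.2 Gb) and the 90% fraction are the values cited in this thread, and the function and variable names are made up for illustration.

```python
# Genome sizes in gigabases (Gb), as quoted in the comments above.
HUMAN_GENOME_GB = 3.2
ONION_GENOME_GB = 15.9
# The 90% figure is the estimate attributed to Moran in this thread.
CITED_FRACTION = 0.90


def size_ratio(larger_gb: float, smaller_gb: float) -> float:
    """Return how many times larger one genome is than another."""
    return larger_gb / smaller_gb


def fraction_of_genome(genome_gb: float, fraction: float) -> float:
    """Return the amount of DNA (in Gb) covered by a given fraction."""
    return genome_gb * fraction


onion_vs_human = size_ratio(ONION_GENOME_GB, HUMAN_GENOME_GB)
human_junk_gb = fraction_of_genome(HUMAN_GENOME_GB, CITED_FRACTION)

print(f"Onion / human genome size: {onion_vs_human:.2f}x")  # 4.97x
print(f"90% of the human genome:   {human_junk_gb:.2f} Gb")  # 2.88 Gb
```

So the onion genome is just under 5 times the size of the human genome, and 90% of 3.2 Gb is indeed 2.88 Gb.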

  4. Thanks Gert, this makes things a bit more clear.
    Yet, with my question (3) I meant something else.
    I found a paper about this:
    2018 - Exploring the relationship between intron retention and chromatin accessibility in plants.
    What if a relatively long intron is a prerequisite for the smooth reading of the whole gene, and thus for getting the information from the exons?
    And doesn't intron retention itself point to biological functions of the long introns?
    As a physicist I am not familiar with these things, so maybe you can make it clearer to me?

  5. Rolie, you ask really good questions! I had not encountered 'intron retention' before, and it is not in Laurence Moran's What's in your genome.
    According to the Wikipedia article on alternative splicing, it is a (rare) form of alternative splicing, and I found a good illustration of the process of intron retention
    in the article:
    The changing paradigm of intron retention: regulation, ramifications and recipes.
    It remains to be seen whether there are beneficial cases of intron retention (IR) in animals, whether IR is constitutive or facultative, and whether IR is conserved in evolution (so identical in humans and chimps) or in most cases causes disease.
    A requirement for a beneficial intron retention event is that the intron must have in-frame codons perfectly aligning with the triplet codons of the exons.
    I wonder what kind of protein would be produced by IR and how such introns could be useful parts of a protein. It seems like inserting a random piece of protein into an existing protein... Very interesting stuff!
    I suspect Moran would claim (if he knew about IR) that it is simply noise :-)
    Thanks for your question!

  6. Intron Retention is also known as: Exonization, the creation of a new exon from an intron as a result of mutations in introns.

  7. Thanks, Gert, for looking this up.
    So exonization means that introns can be a source of new exons, and thus of genes or parts of genes.
    I now have on my reading list:
    2000 - The Correlation Between Intron Length and Recombination in Drosophila
    2015 - Introns - The Functional Benefits of Introns in Genomes

    I have not yet found anything about the possible impact of introns on the physical possibilities or constraints in the formation of structures such as chromatid threads, chromosomes, etc.

  8. Hi Rolie,
    "Introns: The Functional Benefits of Introns in Genomes" is very interesting. Larry Moran disagrees with this sort of thinking. He has several arguments why introns are mostly junk. One of them is that experimentally removing (some) introns does no harm to the organism. On the whole, Moran is not interested in examples of the benefits of introns, because in the Big Picture these beneficial functions of junk account for only a tiny percentage of the whole genome. So, they do not change the Big Picture. However, for evolutionary biologists the examples of treasures in junk are important and are simply the core business of evolutionary theory: explaining how evolution and DNA produce humans, chimps, etc.

    By the way: today I found a lovely passage in his book: "This evidence suggests that there has been selection for removing excess junk DNA from these introns in order to speed up gene expression." (chapter 6).
    I added it as note 12 to my original blog about the book. Here he admits there are metabolic costs to junk DNA!

    Some years ago I wrote about the creationist view of introns:
    Wat doen Junker & Scherer met introns? (Dutch: "What do Junker & Scherer do with introns?")

    Have a nice day!

  9. Hi Gert, it took me some time to read some papers about the function of introns. Below is a summary:
    1. There is a correlation between recombination rate (bp/generation) and the length of introns: longer introns are associated with a lower rate.
    2000 - The Correlation Between Intron Length and Recombination in Drosophila.
    Also see: 2001 - Why do genes have introns - Recombination might add a new piece to the puzzle.
    2. Introns influence the rate of transcription. Genes with fewer and/or shorter introns are read faster, which is quite logical, I would say.
    2008 - Rapidly regulated genes are intron poor
    3. Patterns of intron architecture, 2009 - Patterns of exon-intron architecture variation of genes in eukaryotic genomes. This paper reports that for many species (including humans) the intron length is at a maximum for a GC-content of 40%; when the GC-content is larger or smaller, introns are shorter.
    4. Another pattern described in the same paper: in many genes the first intron is the longest. Downstream, intron length decreases rapidly with ordinal number to less than one third of the first intron length.
    5. I assume such patterns don’t arise randomly and therefore they point to some mechanism. The most obvious one is natural selection. But isn’t NS related to biological function?
    6. A difficult but intriguing paper is: 2022 - Gene architecture directs splicing outcome in separate nuclear spatial regions. This research shows that intron length varies with distance to the center of the nucleus. They measured GC-content and intron length in five concentric zones around the center of the nucleus. In the central zone the number of TADs (topologically associating domains) with 57% GC-content is three times larger than in the peripheral zone. A smaller but similar decrease of intron length was found. All of this seems to be related to two different kinds of splicing processes. To me, it is too complicated to understand.
    One of their conclusions is:
    Altogether, our results suggest that the chromatin is organized in the nuclear space in a way that creates different functional zones with respect to the splicing mechanism of exon and intron definition, regulation of Alternative Splicing, and binding of Splicing Factors (proteins regulating splicing processes).

    These observations make me doubt the statement of biochemist Laurence Moran that 90% of our genome is junk, in his book “What's in Your Genome? 90% of Your Genome Is Junk”.
    Gert, what do you think?


  10. Rolie, you describe non-random statistical patterns of intron length and ask: "But isn’t NS related to biological function?"

    I would conclude: intron LENGTH ultimately has indirect EFFECTS on the phenotype.
    That is NOT the same as: the precise intron SEQUENCES have phenotypic effects.
    Intron sequence could largely evolve like neutral sequences.

    1) It is important to keep in mind that about 24% of our genome consists of introns, whereas the total of exons is not more than 2%.
    Is that amount of intron sequence necessary to regulate the protein coding genes? Not likely.

    2) one has to keep in mind how introns originated: they are thought to have invaded protein coding genes of our early eukaryotic ancestors.
    Some introns could have secondarily gained a function, that is a phenotypic effect, long after their introduction.

    3) one has to keep in mind that in mammals 6% of the genes are intronless. They do fine without introns.

    4) introns are analogous to viruses. See the book 'Viruses' by Marilyn J. Roossinck (2023):
    "as virus ecologist Marilyn Roossinck stresses, not all are agents of disease: some benefit their hosts by helping to protect them from other microorganisms, or helping them to perform new functions."
    Again: initially harmful or neutral and secondarily could have (beneficial) phenotypic effects.

    Thanks for commenting!

  11. What insights does your blog post offer regarding the actual significance of 'junk DNA' in genetics? Tel U

  12. Rolie,
    there are also positive effects of introns, discussed for example in the evolution textbook by Bergstrom & Dugatkin (2023), page 392, in the context of the Exon Theory of Genes. This theory says that many current genes arose by rearrangement of exons. Unfortunately, there is no separate Wikipedia page on the theory, but one will find some info on the Wikipedia page for the split gene theory. Warning: this page has been edited by Periannan Senapathy or supporters. Google will help further.
    See also my blog.
    Quote from Bergstrom and Dugatkin (2012): "introns may impose substantial fitness costs ..."

  13. Gert,
    When, as I showed in my previous post, introns influence biological processes (like: recombination, gene regulation, evolutionary processes) then I would not call this an indirect effect. Would you call gene regulation with transcription factors indirect?
    Anyway, more important is that introns play a significant role.
    Now I am reading a paper about exon shuffling. According to that theory, introns are essential as separators between exons, copies of which can be made during recombination and which can be reused to make other proteins.
    The idea of exons as basic building blocks of proteins seems to fit with what I read (and wrote in my book) about protein folds and the role of physics, namely the importance of modularity.

    Now I am reading this paper: 2005 - Exon–domain correlation and its corollaries.
    To be continued.

  14. Rolie wrote "Would you call gene regulation with transcription factors indirect?"
    No, but now you are talking about gene regulation in general?
    If you read papers about exon shuffling, alternative splicing, etc., please pay attention to what is theory, what has been demonstrated with evidence, and what percentage of the total number of introns/genes in our genome the published cases in the article represent. Only in this way do we get an idea of how many of the exons/introns/genes are involved.

  15. Thank you, Gert.
    And for clarity: I am not trying to "prove" that there are no random DNA parts without function. It is the high percentage of 90% mentioned by Laurence Moran.
    From a physics perspective, randomness is an essential ingredient of the cosmos, but always in relation to lawful operations. Therefore I think that so-called junk DNA will be different from carrying a bag of garbage (Dutch: vuilnis) on your back.
    To be continued.

  16. Rolie wrote "It is the high percentage of 90% mentioned by Laurence Moran. "...
    if you are going to argue against that, you need to know exactly what he wrote, don't you?
    Don't cherry-pick (= selecting only examples of introns with useful effects).

  17. Is reading Moran's book required? My comments are mainly related to your blogs about the book.
    And yes, I searched for papers about possible intron functions; is that a kind of cherry picking?
    Searching Google Scholar with 'introns without functions' gives you all kinds of papers about functions of introns.
    Using 'junk introns' as the search phrase: same story.

    Maybe you can give some references from Moran's book about research confirming the junk intron hypothesis?

  18. In addition to my previous comment: yes, I can find papers about junk DNA, for example Ford Doolittle, 2014 - Is junk DNA bunk - A critique of ENCODE. It contains something about introns, but there is only one paper about introns in the reference list: Doolittle WF (1987) What introns have to tell us: Hierarchy in genome evolution.
    Nothing about the papers I referred to in the previous comments.
    But of course his first goal with the paper is to discuss ENCODE.
    It seems my searching for papers about intron function is more fruitful than for papers showing that introns are junk.

  19. Hi Rolie,
    you wrote "is that a kind of cherry picking?" You can evaluate that yourself.
    You wrote: "Maybe you can give some references from Moran's book about research confirming the junk intron hypothesis?"
    I blogged about the book; the book itself is the reference, especially chapter 5, The Big Picture. I have the KOBO ebook (€ 28,99), not the paperback, so I can't lend it to you. Do you have an ebook reader? Furthermore, Moran himself blogged extensively about the matter.
    Have you read my comment of
    Monday, August 21, 2023 at 10:12:00 AM GMT+2
    where I gave 4 arguments/facts? I can add:
    5) The gene with the most introns is TTN, with 362 introns, which is also the gene with the longest transcript length.
    Do you think that all those 362 introns have a function?

    you wrote:
    "It seems my searching for papers about intron function is more fruitful than for papers showing that introns are junk."
    No surprise, because a paper such as "We have investigated the possible functions of 10,000 introns in the human genome: we have found none"
    (that is called a negative result) will probably not be published. Papers that find possible functions of introns will be of biological and medical interest and have a higher probability of being published.

    Again: in the papers you find about functions of introns, do the authors make an estimate of the % of introns that have a function?
    What is your own estimate of the % of introns that have a function?

  20. Rolie,
    especially for you and Dutch visitors:
    Waarom hebben we zoveel parasitair DNA? (Dutch: "Why do we have so much parasitic DNA?")
    10 February 2016
    (this morning I discovered this blog and it is quite useful to reread)

  21. Gert, it took about two weeks to reply; I was too busy.
    The whole discussion about intron function was very interesting. Now I want to stop it. You have shown that for many introns no function is known. I read papers that show functions of some introns. Demonstrating that is a hard research job and takes time. So, we may expect that more is in the pipeline.

    My conclusion is: yes, you are right in saying that many introns have no function (I add: as far as we know). On the other hand, published research shows all kinds of functions for some introns. Wait and see.

  22. Repetitive DNA regulates gene expression:
    Short tandem repeats affect gene expression by binding regulatory proteins

  23. Yes, that is interesting. Unfortunately, I do not have access to the full article.

