Do ENCODE Skeptics Protest Too Much? Part 3 (of 3)
In light of the belief that the human genome is largely a genetic junkyard, the ENCODE Project’s phase two report that 80 percent of the human genome consists of functional DNA elements is an astounding announcement—and it is not without detractors. In part 1 of this series, I collated the criticisms published by ENCODE “skeptics.” In part 2, I responded to two of the most significant challenges. In this final installment, I respond to four other objections.
I protest. I take these wise men,
that crow so at these set kind of fools,
no better than the fools’ zanies.
Twelfth Night, Act I, scene v
When the ENCODE Project announced in September 2012 that 80 percent of the human genome, at minimum, consists of functional elements,1 creation proponents pointed out that this provocative result undermines one of the best arguments for an evolutionary origin of humanity (namely, that a Creator would not design life with a surplus of useless DNA).
A number of scientists, however, have protested, claiming that the ENCODE Project made several errors in assigning function to DNA sequences in the human genome.2 People committed to the evolutionary paradigm will undoubtedly cite these published criticisms as a way to dismiss creation advocates as “fools’ zanies” for attempting to enlist the ENCODE Project as evidence for design.
Yet a careful examination of the critiques suggests that these complaints lack merit. Here, in the last installment of a three-part response to the ENCODE skeptics (see parts 1 and 2), I discuss four significant objections to the ENCODE Project.
- The ENCODE Project made a logical error when assigning function to the different elements of the genome.
- The ENCODE Project assigned function to entire classes of sequence elements based on identified function for a few members of that class.
- The ENCODE Project conflated biochemical activity with function.
- The ENCODE Project’s results don’t square with the C-value paradox.
Logical Errors in Assigning Function?
One group of ENCODE skeptics, led by University of Houston biology professor Dan Graur, not only takes issue with the way the ENCODE Project defined function but also asserts that the ENCODE team made a logical error when assigning function to DNA sequences. Specifically, Graur’s group argues that the ENCODE researchers are guilty of affirming the consequent (also known as the converse error).
Assuming the statement “if P, then Q” is true, it is a formal error in logic to reason as follows:
- If P, then Q
- Q is observed
- Therefore, P
This reasoning is flawed because there may be other conditions that could lead to Q, whether P is true or not. That is, it may also be that if R, then Q, or if S, then Q, regardless of P.
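The invalidity of this argument form can be checked mechanically. The short Python sketch below is an illustration added for clarity, not part of Graur's critique or the ENCODE analysis. It enumerates every truth assignment for P and Q and reports the cases in which both premises hold yet the conclusion fails. The helper names `implies` and `counterexamples` are chosen for this example only.

```python
from itertools import product


def implies(p, q):
    """Material conditional: 'if p, then q' is false only when p is true and q is false."""
    return (not p) or q


def counterexamples():
    """Return the (P, Q) assignments where 'if P then Q' and Q both hold but P is false."""
    return [
        (p, q)
        for p, q in product([True, False], repeat=2)
        if implies(p, q) and q and not p
    ]


# A single counterexample is enough to show the argument form is deductively invalid.
print(counterexamples())  # [(False, True)]
```

The one case found, P false and Q true, corresponds to the situation described above: Q arises from some other condition (R or S) even though P does not hold.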
To illustrate how Graur and his team believe the ENCODE Project committed this error, let’s look at transcription factor binding to DNA. The ENCODE researchers identified all the sequences in the human genome that bind transcription factors (proteins that bind to specific DNA sequences). This binding influences gene expression.
If a DNA sequence regulates gene expression by serving as a promoter or enhancer, then it will bind a specific set of transcription factors. (If P, then Q.) Let’s say that a researcher observes a transcription factor binding to DNA. He or she then concludes that the sequence to which the protein binds is critical for regulating gene expression. (Q is observed, therefore, P.)
Graur’s team argues this conclusion is invalid. For example, it is possible that the transcription factor could bind randomly to DNA sequences that do not serve as promoters or enhancers, or in any other functional role. In other words, to conclude that DNA sequences are functional if they bind transcription factors is to affirm the consequent. Graur’s group believes the ENCODE Project committed this logical error for all the assays they performed when assigning function to DNA sequences.
In my opinion this concern is not nearly as problematic as Graur’s team makes it out to be. They conflate deductive reasoning with inductive; yet scientific investigations rely on induction, not deduction. While Graur’s group rightly points out that affirming the consequent is a logical fallacy when engaged in deductive reasoning, this error doesn’t apply to inductive reasoning. Induction produces conclusions that are probabilistic though not certain.
Let’s return to the example of transcription factor binding to DNA. As already noted, if a DNA sequence serves as an enhancer or promoter, it will bind a specific set of transcription factors. If a scientist then observes transcription factors binding to DNA, it is reasonable to conclude that these binding sites play a role in regulating gene expression. This conclusion is probable rather than certain. Despite the uncertainty associated with it, the conclusion is still reasonable because a vast body of data demonstrates that transcription factors bind to specific DNA sequences that regulate gene expression. Yes, another explanation for why these transcription factors bind to DNA may exist, but confirmatory experiments can reduce this uncertainty.
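To make the probabilistic character of this inference concrete, it can be written in terms of Bayes' rule. The sketch below is only an illustration: the function name `posterior` and every number in it are hypothetical, not ENCODE measurements. It shows how the support that an observed binding event (Q) lends to a regulatory role (P) depends on how often binding occurs when a sequence is not functional.

```python
def posterior(prior, p_bind_if_functional, p_bind_if_not):
    """P(functional | binding observed), computed with Bayes' rule."""
    evidence = prior * p_bind_if_functional + (1 - prior) * p_bind_if_not
    if evidence == 0:
        raise ValueError("binding has zero probability under these inputs")
    return prior * p_bind_if_functional / evidence


# Hypothetical inputs, chosen only to illustrate the shape of the inference.
rare_nonfunctional_binding = posterior(0.5, 0.9, 0.1)
common_nonfunctional_binding = posterior(0.5, 0.9, 0.9)
print(rare_nonfunctional_binding, common_nonfunctional_binding)
```

With these made-up inputs, the first case yields a posterior near 0.9, while the second yields 0.5, the same as the prior. The observation is informative only to the extent that nonfunctional binding is uncommon, which is why confirmatory experiments matter.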
The key point is this: there is nothing wrong with concluding, when using inductive reasoning, that sequences that bind transcription factors are functional. By extension, there is nothing wrong with the reasoning the ENCODE Project employed to assign function to sequences in the human genome. The ENCODE Project did not affirm the consequent because they were making use of induction (as do all scientists), not deduction.
Assigning Function to an Entire Class of Sequence Elements Based on a Few Members?
In his critique of ENCODE, biochemist W. Ford Doolittle rather forcefully warns,
There are three other “natural temptations” that I would caution consumers of the ENCODE Project product to avoid. The first temptation is the assumption that because some members of a class of elements have acquired SE [selected effect] function, all or most must have functions or (more broadly) that the class of elements as a whole can thus be declared functional.3
In other words, just because researchers identify function for, say, duplicated pseudogenes doesn’t mean all pseudogenes possess function. However, it would be natural in other scientific investigations to assume that if a particular property has been identified for a representative sample, then the entire system possesses that property. Yet when reviewing the ENCODE Project, some evolutionary biologists are eschewing this common practice. Many of the human DNA sequences to which ENCODE assigned function reside within regions of the genome long thought to be junk DNA. It seems Doolittle and others are reluctant to conclude that a part represents the whole due to a pre-commitment to the belief that junk DNA arises via evolutionary processes.
According to the evolutionary paradigm, junk DNA sequences originate when random physical, chemical, and biochemical processes convert sequences from functional to nonfunctional. But this is not the end of the story. Many biologists believe evolutionary processes will make use of junk DNA sequences occasionally, altering them into something useful via a process called neofunctionalization. As Doolittle puts it, “It is surely inevitable that evolution, that inveterate tinkerer, will have sometimes co-opted some TEs [transposable elements, a class of junk DNA] for such purposes.”4 It seems many evolutionary biologists cannot accept the ENCODE data due to their belief that evolutionary processes would not convert all members of a class of junk DNA into functional elements.
The results of the ENCODE Project challenge this evolutionary perspective. The ENCODE team performed a large number of assays, systematically surveying the human genome and cataloging the functional sequences. They didn’t simply identify a few examples of function in particular members of a junk DNA category and then conclude the whole class must be functional. Instead, they identified, one by one, the members of each class of sequence elements that displayed function.
So, in effect, Doolittle’s complaint holds no weight. It also fails to take into account other work, such as research involving pseudogenes, that not only identifies function for individual members of this junk DNA class but also presents an elegant framework to explain the function of all members of the category. (The competitive endogenous RNA hypothesis offers one such comprehensive model for pseudogene function.) This type of advance coheres nicely with the catalogue of functional elements ENCODE identified.
Conflating Biochemical Activity with Function?
Graur’s group, Doolittle, and researchers Deng-Ke Niu and Li Jiang all complain that the ENCODE Project confused biochemical activity with function. This was, in fact, one of the first critiques leveled against the ENCODE Project. As I explained in a previous article, the ENCODE Project measured biochemical activity known to play a role in gene regulation. However, the ubiquity of this objection makes it worth a more comprehensive response.
ENCODE detractors might concede that the biochemical assays ENCODE researchers carried out did indeed measure activity related to function—but these skeptics would still maintain that not all the activity measured is actually functional. For example, the ENCODE Project determined that about 60 percent of the human genome is transcribed to produce RNA molecules. ENCODE skeptics would argue that not all of these transcripts possess function. In fact, they might say that most are nonfunctional. Graur’s group asserts that “some studies even indicate that 90% of transcripts generated by RNA polymerase II may represent transcriptional noise.”5
These criticisms ignore two important points: (1) biochemical noise costs energy; and (2) random interactions among genome components would be highly deleterious to the organism.
Let me illustrate the first point by focusing on transcription. From my vantage point, it is reasonable to conclude that the transcripts produced from the human genome are, by and large, functional. First, researchers know the identity of these transcripts. The various RNA molecules transcribed from the human genome all play a role, either direct or indirect, in protein synthesis or in gene expression regulation. So, on this basis alone, it is reasonable to suspect that most of the transcripts possess these functions.
But more importantly, transcription is an energy- and resource-intensive process. Therefore, it would be untenable to believe that most transcripts are mere biochemical noise. Such a view ignores cellular energetics. Transcribing 60 percent of the genome when most of the transcripts serve no useful function would routinely waste a significant amount of the organism’s energy and material stores. If such an inefficient practice existed, surely natural selection would eliminate it and streamline transcription to produce transcripts that contribute to the organism’s fitness.
Graur and his colleagues, along with Niu and Jiang, would argue that most instances of transcription factor binding to DNA are also nothing more than biochemical noise. Graur’s group claims that most transcription factor binding occurs by chance. Each transcription factor latches on to a particular motif (pattern) within a relatively short DNA sequence. These motifs occasionally appear in a random sequence. Graur and his colleagues argue that nonfunctional transcription factor binding sites are dispersed randomly throughout the human genome and account for most of the binding interactions.
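The claim that short motifs turn up by chance can be put in rough numerical terms. The following sketch is an illustration under simplifying assumptions that are not taken from the ENCODE papers or from Graur's critique: all four bases are equally likely and independent at every position, only one strand is counted, and the motif lengths and the sequence length are hypothetical inputs. Under those assumptions, an exact motif of length k is expected about (N - k + 1) / 4^k times in a sequence of length N.

```python
def expected_chance_matches(seq_length, motif_length):
    """Expected exact matches of one fixed motif in a uniformly random DNA sequence.

    Assumes equal, independent base frequencies and counts a single strand.
    """
    positions = max(seq_length - motif_length + 1, 0)  # possible start sites
    return positions / 4 ** motif_length


# Hypothetical example: an approximate human-genome-scale length and
# a few illustrative motif lengths (not values from any cited study).
approx_genome_length = 3_100_000_000
for k in (6, 8, 10, 12):
    print(k, expected_chance_matches(approx_genome_length, k))
```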
Apart from energetics considerations, this argument ignores the fact that random binding would make a dire mess of genome operations. In fact, other studies indicate that protein surfaces are designed to minimize so-called promiscuous (random) interactions. (I have written elsewhere on the extent to which biochemical systems go to avoid unwanted protein-protein interactions in the cell’s interior.) Without minimizing these disruptive interactions, biochemical processes in the cell would grind to a halt. It is reasonable to think that the same considerations would apply to transcription factor binding with DNA.
While it’s true that biochemical activity doesn’t necessarily equate to function, the ENCODE researchers appear to have gone to great efforts to ensure that they measured activity with biological meaning. The idea that activities associated with the genome—such as the transcription of the genome, methylation of DNA, modification of histones, binding of transcription factors, and others—are mostly noise borders on the ridiculous because it ignores well-established principles of biochemical operations.
Squaring with the C-Value Paradox?
Both Doolittle and researcher Sean R. Eddy protest that ENCODE’s results don’t make sense in light of the C-value paradox. This conundrum traces back to the early days of molecular biology. Scientists observed that the nucleus of each cell type within a particular organism contained a constant amount of DNA. Therefore, biochemists refer to this amount of DNA as the C value (C for constant).
Initially, researchers expected the amount of DNA to correlate with an organism’s biological complexity. Yet studies revealed that no such relationship existed. Some relatively simple organisms possess a larger C value than do more complex organisms. To resolve this paradox, molecular biologists proposed that the majority of an organism’s genome consists of DNA that doesn’t code for proteins or regulate gene expression. Researchers concluded that the non-coding DNA served no real purpose. They viewed it as vestiges of evolutionary processes, or junk.
However, if the ENCODE Project’s conclusion that most, if not all, of the human genome contains functional DNA is valid, then the genome contains very little junk DNA. According to Doolittle, “If the human genome is junk-free, then it must be very luckily poised at some sort of minimal size for organisms of human complexity.”6 From an evolutionary perspective, all the different classes of junk DNA would have to evolve new function to make the ENCODE Project’s conclusion possible. Doolittle states that we would be the “first among many in having made such full and efficient use of all of its millions of SINES and LINES (retrotransposable elements) and introns to encode the multitudes of lncRNAs and house the millions of enhancers necessary to make us the uniquely complex creatures that we believe ourselves to be.”7 For Doolittle, the absurdity of this prospect means the ENCODE Project’s results cannot be correct.
In light of the C-value paradox, the ENCODE results would mean that less sophisticated organisms with larger genomes (compared to humans) must also possess more functional elements. But such a scenario makes no sense—at least from an evolutionary perspective. Yet it is possible to account for the larger genomes in organisms less complex than humans. It may be that the excess DNA plays a role other than coding for proteins and regulating gene expression. A number of studies, for example, indicate that DNA dictates the size of the cell nucleus.
To me, this criticism of ENCODE seems motivated by a strong commitment to the evolutionary paradigm. In other words, the experimentally generated ENCODE results don’t square with the expectations of the theory of biological evolution; therefore, the ENCODE results must be wrong. This is an example of theory-dependent reasoning, in which the theoretical framework holds more sway than the actual experimental and observational results. ENCODE skeptics’ commitment to the evolutionary paradigm is so strong it appears that they unwittingly abandoned one of science’s central practices: experimental results dictate a theory’s validity, not the other way around.
Despite these latest criticisms, I see no real scientific reason to dismiss the ENCODE Project’s results. Careful consideration reveals that the objections have more to do with philosophy than science. The ENCODE skeptics seem to feel that the ENCODE results must be wrong because they don’t line up with key concepts of the evolutionary paradigm. The ENCODE skeptics even depart from standard scientific practices to maintain their commitment to evolution in the face of the ENCODE discoveries.
The ENCODE Project’s conclusions, namely that at least 80 percent of the human genome is composed of functional DNA sequences, remain valid evidence for elegant design, befitting the work of a Creator, in the human genome and, by extension, the genomes of other organisms.
1. The ENCODE Project Consortium, “An Integrated Encyclopedia of DNA Elements in the Human Genome,” Nature 489 (September 6, 2012): 57–74.
2. Sean R. Eddy, “The C-Value Paradox, Junk DNA, and ENCODE,” Current Biology 22 (November 6, 2012): R898–R899; Dan Graur et al., “On the Immortality of Television Sets: ‘Function’ in the Human Genome According to the Evolution-Free Gospel of ENCODE,” Genome Biology and Evolution 5, no. 3 (2013): 578–90; Deng-Ke Niu and Li Jiang, “Can ENCODE Tell Us How Much Junk DNA We Carry in Our Genome?” Biochemical and Biophysical Research Communications 430 (January 25, 2013): 1340–43; W. Ford Doolittle, “Is Junk DNA Bunk? A Critique of ENCODE,” Proceedings of the National Academy of Sciences, USA 110 (April 2, 2013): 5294–300.
3. Doolittle, “Is Junk DNA Bunk?”: 5297.
5. Graur et al., “On the Immortality of Television Sets”: 578–90.
6. Doolittle, “Is Junk DNA Bunk?”: 5297.