The discovery of the structure of DNA led to the idea that genomes are merely a series of DNA sequences, or genes, that code for proteins. Yet a paradox soon emerged: some relatively simple creatures turned out to have much larger genomes than more complex ones. Why would they need more genes?
So do simple creatures really need more genes? They don't. It rapidly became clear that in animals and plants, most DNA does not code for proteins. Studies of our own genome show that roughly 98 per cent of our DNA is of the non-coding variety. Yet even back in the 1970s it was obvious that not all non-coding DNA is junk. Some of it is regulatory: certain sequences to which particular proteins bind can boost or block the expression of nearby genes. Such DNA is clearly important.
Over the years, many tiny bits of non-coding DNA have turned out to have a regulatory role or some other function. Until recently, however, such sequences were believed to make up only a small part of non-coding DNA. Only in the past decade, as the genomes of more and more species have been sequenced and compared, has the bigger picture begun to emerge.
Conservation of Genes
Even though it is 450 million years since the ancestors of pufferfish and humans parted ways, everyone expected that we would still share many of the same genes, and so it proved. Most of the protein-coding DNA in different vertebrates is very similar, or "conserved". The surprise was that even more non-coding DNA is conserved. Why should that be?
DNA is constantly mutating due to copying mistakes and damage from chemicals and radiation. Specific sequences will be conserved only if natural selection weeds out any offspring with changes in these sequences. This will happen only if the changes are harmful, so researchers are convinced that all the conserved non-coding DNA must do something important. Why else would evolution hang on to it?
These regions pose a real challenge to our understanding of biology. To find out what conserved non-coding DNA does, researchers recently added extra copies of some of these sequences to mice. It's like taking a few extra pages and stapling them into a book.
The extra pages were copies of "ultra-conserved" sequences, which are almost exactly the same, base for base, in the mouse, rat and human. Nearly half of the sequences tested boosted gene expression in specific tissues, especially in genes involved in nervous system development, the team reported last year.
This suggests that much of the conserved non-coding DNA is needed to make, say, a brain cell different from a skin cell. However, conserved DNA still accounts for only a small proportion of the genome. Even counting the 1.2 per cent that is coding DNA, the human sequences also found in other mammals add up to just 5 per cent. What's the other 95 per cent for?
One possibility is that some of the DNA whose sequence is not conserved might be conserved in a different sense. Regulatory sequences are essentially binding sites for proteins, so what matters is their three-dimensional structure. And while the conventional view is that the 3D structure of DNA is closely related to its sequence, scientists have found evidence that some regulatory regions share similar structures even though their sequences are different. Looked at this way, the total amount of conserved DNA could be much higher.
Transcription of Non-coding DNA
Another line of evidence suggesting that some non-conserved DNA has a function comes from looking at which DNA sequences get transcribed into RNA. It used to be thought that, with a few exceptions, most RNAs were produced as the first step in making proteins.
Protein-coding genes contain long stretches of non-coding DNA called introns, which make up about a quarter of our genome. These introns are transcribed into RNA but are then spliced out of the "raw" RNA. The resulting "processed" RNAs represent just 2 per cent of the genome.
A few years ago, however, scientists showed that far more than 2 per cent of the genome gets transcribed into RNA. The latest estimates are that 85 to 97 per cent of the entire genome is transcribed into raw RNA, resulting in processed RNAs representing 18 per cent of the genome.
Clearly, most of this RNA is non-coding, or ncRNA. So what is it for? While some very small ncRNAs play a big role in controlling gene expression, the function of most ncRNA remains mysterious.