Non-coding doesn’t ‘translate’ to non-functional when it comes to evolution
What is it that has allowed more “complex” species to evolve from simpler species? The popular notion that the addition of new protein-coding genes is responsible for this increased complexity has been challenged over the last decade. It is becoming apparent that, although new proteins certainly are important, the amount of conserved sequence is far greater than can be accounted for by protein-coding genes alone. A number of papers have addressed how much sequence is conserved and whether these conserved sequences are actually functional [1, 2]. A recent paper by Meader, Ponting and Lunter, entitled “Massive turnover of functional sequence in human and other mammalian genomes” [3], addresses this issue and comes to some fascinating conclusions.
Many studies have examined point mutation rates and, in particular, what fraction of point mutations is neutral [1]. Chiaromonte et al. [1] point out that about 5% of the human genome has undergone fewer point mutations than would be expected if those mutations were neutral. This suggests that purifying selection is removing deleterious mutations from this DNA. Since only about 1.06% of the human genome codes for protein sequences [4], the remainder of the 5% must reside in non-coding sequences.
However, deciding which single nucleotide changes in non-coding sequence are consequential is a very difficult problem, and it leads to large variation in estimates of how much non-coding sequence is actually functional and therefore conserved. One approach is to identify single nucleotide changes in sequences from closely related species [5], but estimates of the neutral mutation rate from two species are often imprecise for the same reason: consequential and non-consequential changes are hard to tell apart.
Lunter et al. [6] developed a method based on genomic sequence comparisons that looks for insertions or deletions (indels), which are much easier to detect in alignments than point mutations. An indel appears as a gap in one of the aligned sequences. It is possible to model mathematically the expected distance between neutral indels. A larger than expected inter-gap distance therefore indicates that indels within that sequence are deleterious and have been removed by purifying selection. The basis of their “neutral indel model” is to look for regions of the genome that do not have indels at the rate expected from neutral changes. This approach does not depend on single nucleotide changes and is not subject to the inaccuracies inherent in that approach. Rather, the method looks for gaps in pairwise alignments of genomic DNA, which are relatively easy to identify. It is an elegant and powerful solution.
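The idea of comparing observed inter-gap distances with a neutral expectation can be sketched in a few lines of Python. This is only a toy illustration, not the published method: it assumes that every aligned base carries an indel independently with the same probability p, so that inter-gap distances follow a simple geometric distribution. The function names, the example alignment and the value of p are all made up for illustration.

```python
def gap_free_lengths(aligned_a, aligned_b):
    """Lengths of gap-free runs in a pairwise alignment ('-' marks a gap)."""
    runs, current = [], 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            # A gap in either sequence ends the current gap-free run.
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


def geometric_tail(p, length):
    """P(inter-gap distance >= length) under the toy geometric model."""
    return (1.0 - p) ** (length - 1)


def excess_long_segments(lengths, p, threshold):
    """Observed minus expected number of runs at least `threshold` long."""
    observed = sum(1 for n in lengths if n >= threshold)
    expected = len(lengths) * geometric_tail(p, threshold)
    return observed - expected


# Hypothetical toy alignment and indel probability (assumptions, not data).
runs = gap_free_lengths("ACGT-ACG", "AC-TTACG")
excess = excess_long_segments([1, 2, 3, 4], p=0.5, threshold=3)
```

A positive excess means there are more long gap-free runs than the neutral expectation predicts, which is the signal the method interprets as indels having been purged.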
To quantify how much DNA is constrained, Lunter et al. [6] calibrate their model using sequences that are known on biological grounds to evolve neutrally, which they call ancestral repeats (ARs). These sequences are the remains of transposable elements that inserted into the genome of a common ancestor of the two species being compared. Presumably there is no pressure to maintain these ARs, so they are free to evolve. The indel pattern in ARs fits the neutral indel model closely, which supports the assumption that ARs are evolving neutrally. Sequences that accumulate indels more slowly are under selective pressure and are therefore likely to be functional DNA. Using this approach, the authors evaluate a number of genomes for constrained sequence through pairwise genomic alignments. The pairwise comparisons also allow the amount of lineage-specific sequence to be estimated.
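The calibration step can be sketched in the same spirit. The snippet below is a simplified illustration under the same toy geometric assumption, not the authors' procedure: it estimates a neutral indel probability from gap-free run lengths in ancestral repeats (the maximum-likelihood estimate for a geometric distribution is one over the mean run length) and then asks how many long runs would be expected in another set of runs. All numbers and names are hypothetical.

```python
def estimate_indel_rate(ar_lengths):
    """Maximum-likelihood p for a geometric model: 1 / mean run length."""
    return len(ar_lengths) / sum(ar_lengths)


def expected_long_runs(n_runs, p, threshold):
    """Expected number of runs at least `threshold` long under neutrality."""
    return n_runs * (1.0 - p) ** (threshold - 1)


# Hypothetical gap-free run lengths inside ancestral repeats (neutral set).
ar_lengths = [2, 4, 6]
p_neutral = estimate_indel_rate(ar_lengths)

# Hypothetical gap-free run lengths from the rest of the alignment.
genome_lengths = [1, 3, 5, 12, 20]
threshold = 5
observed = sum(1 for n in genome_lengths if n >= threshold)
expected = expected_long_runs(len(genome_lengths), p_neutral, threshold)

print(f"neutral indel probability per base: {p_neutral:.3f}")
print(f"runs >= {threshold} bases: observed {observed}, expected {expected:.2f}")
```

Runs observed in excess of the neutral expectation are the candidates for constrained sequence. The real analysis is considerably more careful than this sketch.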
So, what did Meader et al. [3] observe? In general, they found that as the divergence between mammalian species increased, the amount of pairwise shared functional sequence decreased dramatically. That more distantly related mammals share so little functional sequence suggests that these sequences are turning over rapidly, resulting in considerable divergence among mammalian species. In humans, 6.5-10% of the genome is functionally constrained, yet only 1.06% of the genome codes for proteins. The vast majority of the constrained sequence, therefore, is likely to be regulatory in nature.
Humans have an estimated 200-300 Mb of functional DNA, which includes about 30 Mb of coding sequence. Drosophila, on the other hand, has about 56-66 Mb of functional DNA, including 21.8 Mb of coding sequence. The difference is striking. While humans have gained less than 50% in protein-coding DNA since our divergence from flies (30 Mb vs 21.8 Mb), our non-coding functional DNA content is roughly 5.5-fold larger (170-270 Mb vs 35-45 Mb). The ratio of non-coding to coding functional DNA is 1.5-2.0x in Drosophila, but is 5-8x in humans. Clearly, the increased complexity that we observe in humans is due more to changes in functional non-coding DNA than to protein-coding DNA.
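The arithmetic behind this comparison can be checked with a short Python sketch. It uses only the figures quoted in the paragraph above; the variable names are illustrative, and because the calculation works from the quoted endpoints, the results may differ slightly from the rounded values in the text.

```python
# Figures quoted in the text, in megabases (Mb).
human_functional = (200.0, 300.0)
human_coding = 30.0
fly_functional = (56.0, 66.0)
fly_coding = 21.8

# Non-coding functional DNA = total functional DNA minus coding DNA.
human_noncoding = tuple(f - human_coding for f in human_functional)
fly_noncoding = tuple(f - fly_coding for f in fly_functional)

# Relative gain in coding sequence, human vs fly.
coding_gain = human_coding / fly_coding - 1.0

# Fold difference in non-coding functional DNA, using range midpoints.
human_mid = sum(human_noncoding) / 2.0
fly_mid = sum(fly_noncoding) / 2.0
noncoding_fold = human_mid / fly_mid

# Ratio of non-coding to coding functional DNA within each species.
human_ratio = tuple(n / human_coding for n in human_noncoding)
fly_ratio = tuple(n / fly_coding for n in fly_noncoding)

print(f"coding gain: {coding_gain:.0%}")
print(f"non-coding fold difference (midpoints): {noncoding_fold:.1f}x")
print(f"human non-coding/coding: {human_ratio[0]:.1f}-{human_ratio[1]:.1f}x")
print(f"fly non-coding/coding: {fly_ratio[0]:.1f}-{fly_ratio[1]:.1f}x")
```

Whatever the exact rounding, the contrast is the same: the coding content changes by well under a factor of two, while the non-coding functional content changes several-fold.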
This elegant and innovative approach to understanding genome evolution provides a new perspective on the evolution of species. Newer species seem to result not so much from the new gene products that can be made, but from the ability to regulate and integrate these gene products in new combinations and under different circumstances. This is not really much of a surprise at this point (after all, most of the genes found in yeast and worms also have counterparts in humans), but the amount of non-coding functional DNA and the rate at which it changes are considerably larger than previously thought. It would be interesting to see the results of this approach applied to a larger number of primates.
References
1. Chiaromonte, F., R.J. Weber, K.M. Roskin, M. Diekhans, W.J. Kent, and D. Haussler, The share of human genomic DNA under selection estimated from human-mouse genomic alignments. Cold Spring Harb Symp Quant Biol, 2003. 68:245-254.
2. Pheasant, M. and J.S. Mattick, Raising the estimate of functional human sequences. Genome Res, 2007. 17(9):1245-1253.
3. Meader, S., C.P. Ponting, and G. Lunter, Massive turnover of functional sequence in human and other mammalian genomes. Genome Res, 2010. 20(10):1335-1343.
4. Church, D.M., L. Goodstadt, L.W. Hillier, M.C. Zody, S. Goldstein, X. She, C.J. Bult, R. Agarwala, J.L. Cherry, M. DiCuccio, W. Hlavina, Y. Kapustin, P. Meric, D. Maglott, Z. Birtle, A.C. Marques, T. Graves, S. Zhou, B. Teague, K. Potamousis, C. Churas, M. Place, J. Herschleb, R. Runnheim, D. Forrest, J. Amos-Landgraf, D.C. Schwartz, Z. Cheng, K. Lindblad-Toh, E.E. Eichler, and C.P. Ponting, Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol, 2009. 7(5):e1000112.
5. Siepel, A., G. Bejerano, J.S. Pedersen, A.S. Hinrichs, M. Hou, K. Rosenbloom, H. Clawson, J. Spieth, L.W. Hillier, S. Richards, G.M. Weinstock, R.K. Wilson, R.A. Gibbs, W.J. Kent, W. Miller, and D. Haussler, Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res, 2005. 15(8):1034-1050.
6. Lunter, G., C.P. Ponting, and J. Hein, Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput Biol, 2006. 2(1):e5.