Want an extra GCK license for FREE?

Our satisfied customers are our best ambassadors – and we’d like to THANK YOU for helping us spread the word about Gene Construction Kit® (GCK)!
Refer a colleague to us and when they place an order, we’ll give you an annual GCK license – on the platform of your choice – absolutely FREE!

Refer a colleague to us and they will receive a $50 discount* …

… and when they place an order, we’ll give YOU one annual GCK license absolutely FREE!

**NOTE:  You can even ‘Refer’ yourself!
… Receive the $50 discount – AND – the FREE 12-month GCK license with your order.

Please visit our “how to buy” page for ordering information, and mention code “Feb-2012” when contacting us.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To introduce your colleagues to the benefits of GCK, simply send them the link to “Download a Demo”.

After installing the demo version, your colleague can request an activation code by email, which will convert the demo into a fully functional, time-limited “Trial Version.”

Textco BioSoftware offers academic, government, and commercial discount pricing, and for installation on 10 or more computers, ‘Network’ type licensing is available.

For more detailed information about working with our software, please review the additional tutorial information at our website, and when contacting us mention code “Feb-2012.”

We look forward to working with you and your colleagues!

Contact for Sales / Marketing:

Brant Hackett; 480-241-9121

Roberta Brucks Gross; 603-643-1471

About Textco BioSoftware

Textco BioSoftware (formerly Textco, Inc.) has been developing high-quality productivity tools for molecular biologists for over 25 years. Our unwavering commitment to customer service and our focus on quality have generated a loyal customer following. Since 1984, we have provided solutions to scientists who are breaking new ground in genetic engineering, basic biology research, drug development, and biotechnology – at academic, government, and corporate institutions in more than 50 countries worldwide.

* Excludes prior purchases, annual licenses, and network renewals. Cannot be combined with other discount offers.

Posted in current promotions

Welcome Bio-IT World Readers …

Thank you for your interest in the award-winning Gene Construction Kit®. Lauded by leading biologists as ‘Easy-to-Learn AND Easy-to-Use,’ GCK is relied on by researchers around the world to increase their productivity, and it pays for itself in savings of time and money.

To introduce Bio-IT World readers to GCK, we are offering a limited-time discount …

Just mention code “Bio-IT” to receive a $101 discount toward a GCK ‘non-expiring’ single-computer license purchased by March 15, 2012. Additional discounts are also available when you purchase licenses for multiple computers.

As an added bonus – with your order for GCK, you will receive a $300 BioBucks™ Certificate good towards an additional software license (even a second license purchased the same day).

‘BioBucks’ are valid through May 31, 2012, and are transferable to a colleague or collaborator should you choose.

FREE demo versions of Gene Construction Kit® 3.5 (and also Gene Inspector® – our DNA / protein analysis & e-notebook software) are available for download for both Mac and Windows.

After installing the demo version, please email us to request an activation code, which will convert the demo into a fully functional, time-limited “Trial Version”.

Textco BioSoftware offers academic, government, and commercial pricing, and for installation on 10 or more computers, ‘Network’ type licensing is available.

For more detailed information about working with our software, please review the additional tutorial information found within our website, and contact Textco BioSoftware for your discount pricing. We look forward to working with you!

Contact for Sales / Marketing:

Brant Hackett; 480-241-9121

Roberta Brucks Gross; 603-643-1471

Posted in Uncategorized

GCK 3.5.2 Update Now Available

West Lebanon, NH  09|28|2011

Textco BioSoftware today released GCK version 3.5.2, available to current customers as a FREE download from the updates section of the website.

Modifications and fixes included in this release can be reviewed by visiting the ‘Revision History’ page.

All current GCK 3.0 and 3.5 license holders are encouraged to update to this latest release for improved performance.

GCK 3.5 is compatible with Mac OS 10.4 and higher (including OS 10.7, “Lion”) and with Windows 7, Windows Vista, and Windows XP.

New customers can evaluate Gene Construction Kit 3.5 by downloading the free demo version.

Contact Textco BioSoftware for more information.

Contact for Sales/Marketing:

Brant Hackett; 480-241-2191

Roberta Brucks Gross; 603-643-1471

Posted in news

Welcome …

Thank you for your interest in learning more about Gene Construction Kit®. Since its initial introduction more than 20 years ago, “GCK” has been the plasmid mapping software of choice for tens of thousands of researchers worldwide. GCK has stood the test of time, and its unique graphical approach to molecular cloning has not been matched. Lauded by leading researchers as ‘Easy-to-learn AND Easy-to-Use,’ GCK helps researchers increase their productivity, and it pays for itself by saving time and money.

GCK 3.5 is compatible with Mac OS 10.4 and higher (including Snow Leopard and Lion) and with Windows 7, Windows Vista, and Windows XP.

Textco BioSoftware always offers special academic and government pricing, and for installation on 10 or more computers ‘Network’ type licensing is available.

To introduce online readers to GCK, we are offering you a special discount …

Just mention your code, “NOVA”, to receive ~ $150 off ~ the price of one GCK 3.5 ‘non-expiring’ license. Further discounts are available when you purchase licenses for multiple computers.

FREE demo versions of Gene Construction Kit® (and Gene Inspector® – our DNA/protein analysis application) are available for download for both Mac and Windows. We also offer full-functioning, time-limited “Trial Versions” upon request.

For more detailed information about working with our software, please review the additional tutorial information found within our website, and contact Textco BioSoftware for your discount pricing. We look forward to working with you.

Contact for Sales/Marketing:

Brant Hackett; 480-241-2191

Roberta Brucks Gross; 603-643-1471

Posted in Uncategorized

Gene Inspector 2.0 Beta Announced …

West Lebanon, NH 08|24|2011

Textco BioSoftware today announced the public beta testing for Gene Inspector version 2.0 for Mac. The Windows beta release will be available soon. GI 2.0 is compatible with Lion (Mac OS 10.7), as well as previous versions of Mac OS (10.4 or higher).

Users will notice faster analysis processing and smoother font and analysis display via anti-aliasing. GI 2.0 is also a Universal Binary release, taking advantage of the Intel processors found inside the latest Macs.

We are beginning an open beta-test for GI 2.0 and encourage you to participate by signing up here.

Contact for Sales/Marketing:

Brant Hackett; 480-241-2191

Roberta Brucks Gross; 603-643-1471

Posted in news

How living systems have helped in the evolution of computational problem solving

Today, we are being deluged with an enormous amount of biological data. Web sites abound that have genomes, gene expression data, transcription factor data, phylogenetic data, and the list goes on. The Genomes OnLine Database currently lists 6,423 genomes. They also list 249 metagenomes, which are “genomes” from whole microbial communities [1]. NCBI has a Gene Expression Omnibus (GEO) that contains gene expression data and currently lists 9,053 sequencing platforms, 594,152 samples, 23,949 series and 2,720 datasets [2, 3]. Transcription factor databases tend to be species specific and contain information about the proteins and their recognition sites on DNA. There are hundreds of transcription factors for each of the thousands of sequenced genomes as listed in databases such as TRANSFAC [4]. PhylomeDB contains gene phylogenies. It currently has 17 phylomes, 416,093 trees, 165,850 alignments, 5,262,859 proteins, 717 species, and 1,053 genomes [5]. The Tree Of Life web site contains more than 10,000 pages of information about biodiversity and evolutionary history [6].

This is a great deal of data! How can we make sense of it all? Clearly, these data hold valuable information and insights into the biology, if only we could tease them out of the deluge. Ideally, we would turn the raw data into knowledge and understanding. For humans, the challenge is too great without the use of computers, and the real difficulty lies in designing computer algorithms to analyze these data sets. New algorithms must continue to be developed to explore the data in a meaningful way.

One approach that is particularly intriguing for computational biology is to model computer algorithms on biological systems. There is a satisfying symmetry to this approach for the biologist. This blog entry will talk about neural networks, but future entries will address evolutionary computation and ant colony optimization – other computational approaches based on biological models. These three approaches will give you a good feel for how the biology can help shape the computation.

Biological neural networks, as they exist in your brain, consist of interconnected neurons that can accomplish a task – such as recognizing a tree, your spouse, a problem or a solution. Multiple input neurons can connect to a common target, each influencing the target neuron with a different “strength”. Similarly, multiple target neurons might in turn connect to a common target neuron of their own. These neuronal circuits help to generate the output, for example: “Oh, that’s a tree!” In reality, the connections are more complex, but this is the essence – neurons connecting to other neurons to form a network.

Artificial neural networks work on the same principle [7]. For example, let’s say we want to predict protein secondary structure from the primary amino acid sequence. Methods exist that do a decent job of this using more classic approaches [8, 9]. We might choose to have 9 input neurons (sensors, perceptrons) that recognize 9 adjacent amino acids in the sequence. These might feed into three target neurons (via 9 x 3 = 27 connections). The three target neurons might, in turn, feed into a single output neuron (three more connections) that can then predict the likely structure for this stretch of 9 amino acids (or the middle 3 or 5 amino acids). The output value of the output neuron can specify that the structure is alpha helix, beta sheet, or random coil (or some other structure). The 30 different connection weights are what actually determine the output and the analysis of the input data. How are those weights determined?
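To make the 9-3-1 layout concrete, here is a minimal sketch in Python. It is only an illustration: the numeric encoding of each amino acid, the sigmoid activation, and the cut-offs that turn the output value into a structure label are all assumptions of this sketch, not details taken from any published predictor.

```python
import math
import random

N_INPUT, N_HIDDEN = 9, 3  # 9 input neurons feeding 3 target neurons

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def make_weights(rng):
    """Return 27 input-to-target weights and 3 target-to-output weights."""
    w_hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUT)]
                for _ in range(N_HIDDEN)]
    w_out = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
    return w_hidden, w_out

def predict(window, w_hidden, w_out):
    """Feed a 9-value window forward; return the output neuron's value."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, window)))
              for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

def label(output):
    """Map the output value to a structure class (arbitrary cut-offs)."""
    if output < 1 / 3:
        return "alpha helix"
    if output < 2 / 3:
        return "beta sheet"
    return "random coil"

rng = random.Random(0)
w_hidden, w_out = make_weights(rng)
n_weights = sum(len(row) for row in w_hidden) + len(w_out)

# Hypothetical encoding: one made-up score per amino acid in the window.
window = [0.2, -0.5, 0.9, 0.1, -0.3, 0.7, 0.0, -0.8, 0.4]
output = predict(window, w_hidden, w_out)
print(n_weights, "weights; prediction:", label(output))
```

With random weights the prediction is meaningless, of course. The point is only that the whole "program" is contained in those 30 numbers.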

The weights are determined by training the neural network program on a set of proteins whose structures are known. At first, the weights are randomly assigned and the network is evaluated by how well it predicts the structures of these proteins. Next, the weights are adjusted and the network is run again. If the accuracy improves, the next round of adjustments takes place in the same “direction” as the previous round. If the new network is worse than the previous one, the weights are reset and the process is rerun. Many details are involved in these steps. Decisions have to be made about how to measure the accuracy of the network, how to adjust the weights, whether connections can have zero weights, how many layers the network has, how many “neurons” there are in each layer, whether signals can go backwards so that a layer can alter the value of a previous layer, and so on. The performance of the program can be influenced dramatically by these choices. In any case, the connection weights of a network evolve (nice word!) towards a more accurate predictor. Over successive iterations, the network gets better and better until an optimum solution is reached.
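The adjust, evaluate, keep-or-reset loop described above can be sketched as a simple random-direction hill climb. This is a toy under stated assumptions: the training data are made up, the error measure (mean squared error) and step size are arbitrary choices, and practical neural network software usually adjusts weights with gradient-based methods such as backpropagation rather than random steps.

```python
import math
import random

def predict(weights, window):
    """Tiny 9-3-1 network; `weights` is a flat list of 30 numbers."""
    hidden = []
    for h in range(3):
        total = sum(weights[h * 9 + i] * window[i] for i in range(9))
        hidden.append(math.tanh(total))
    return math.tanh(sum(weights[27 + h] * hidden[h] for h in range(3)))

def error(weights, examples):
    """Mean squared error over (window, target) pairs; lower is better."""
    return sum((predict(weights, w) - t) ** 2
               for w, t in examples) / len(examples)

def train(examples, rounds=500, step_size=0.1, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(30)]  # random start
    best = error(weights, examples)
    history = [best]
    step = [rng.gauss(0, step_size) for _ in range(30)]
    for _ in range(rounds):
        trial = [w + s for w, s in zip(weights, step)]
        trial_error = error(trial, examples)
        if trial_error < best:
            # Better: keep the new weights and reuse the same direction.
            weights, best = trial, trial_error
        else:
            # Worse: discard the trial and pick a new random direction.
            step = [rng.gauss(0, step_size) for _ in range(30)]
        history.append(best)
    return weights, history

# Made-up training set: the target is +1 when the window sums to a
# positive number and -1 otherwise.
data_rng = random.Random(1)
examples = []
for _ in range(40):
    window = [data_rng.uniform(-1, 1) for _ in range(9)]
    examples.append((window, 1.0 if sum(window) > 0 else -1.0))

weights, history = train(examples)
print(f"error: {history[0]:.3f} -> {history[-1]:.3f}")
```

Changing `seed` gives a different starting point and usually a different final set of weights, which is one reason to train more than once.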

Once the network is trained (the connection weights do not change significantly from one iteration to the next), it can be used to predict the structure of unknown proteins, and those predictions can then be tested in the lab. Note that since the values are randomly chosen to begin with, each time a neural network is trained on a set of data, it could produce a different result. Thus, it is usually advisable to train the network multiple times to ensure getting a well-trained network.

It is possible to generate a number of different trained neural networks that perform equally well despite having dramatically different weights for the connections. Therein lies a weakness of the approach. Even if a neural network is perfect (100% accurate), it might not be possible to learn what is actually being modeled by just looking at the weights and connections. Basically, the neural network is a black box that takes some input and generates an output. Many feel that a black box that works is better than no method at all. If the neural network is successful, it says that the input data are sufficient to predict the output (even if it is not understood how the network does so). With that understanding, it should be possible to design a new algorithm with a precisely defined model that can do what the neural network was able to do. This would be classified as progress.

Neural nets have been developed to address a number of different biological systems. Actual protein secondary structure prediction (as in our simple example) has been approached by neural networks [10, 11] and the results are an improvement over more traditional methods [8, 9]. GRAIL is a neural network that has been developed to predict genes from genomic DNA sequences [12]. Other biological uses for artificial neural networks include modeling D1 and D2 dopamine receptors [13], designing bioactive proteins [14], predicting antigenic activity for hepatitis C protein NS3 [15], and inferring the rules of E. coli translational efficiency [16].

What artificial neural network solutions have in common is that they mimic a biological system to guide the computing. Principles that work in biological systems have been applied to computing systems with quite a bit of success. Survival of the fittest seems to work for computing as well as for biology. It is easy to imagine a positive feedback loop where using biological computation will help us understand the actual biology. This in turn will lead to more detailed and sophisticated biological models that can then be applied to redesigned and improved algorithms… It will be fun to see how this all evolves ;-) .

References

1. Markowitz, V.M., N.N. Ivanova, E. Szeto, K. Palaniappan, K. Chu, D. Dalevi, I.M. Chen, Y. Grechkin, I. Dubchak, I. Anderson, A. Lykidis, K. Mavromatis, P. Hugenholtz, and N.C. Kyrpides, IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Research, 2008. 36(Database issue):D534-538 http://www.ncbi.nlm.nih.gov/pubmed/17932063.
2. Barrett, T., D.B. Troup, S.E. Wilhite, P. Ledoux, C. Evangelista, I.F. Kim, M. Tomashevsky, K.A. Marshall, K.H. Phillippy, P.M. Sherman, R.N. Muertter, M. Holko, O. Ayanbule, A. Yefanov, and A. Soboleva, NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic Acids Research, 2011. 39(Database issue):D1005-1010 http://www.ncbi.nlm.nih.gov/pubmed/21097893.
3. Sayers, E.W., T. Barrett, D.A. Benson, E. Bolton, S.H. Bryant, K. Canese, V. Chetvernin, D.M. Church, M. DiCuccio, S. Federhen, M. Feolo, I.M. Fingerman, L.Y. Geer, W. Helmberg, Y. Kapustin, D. Landsman, D.J. Lipman, Z. Lu, T.L. Madden, T. Madej, D.R. Maglott, A. Marchler-Bauer, V. Miller, I. Mizrachi, J. Ostell, A. Panchenko, L. Phan, K.D. Pruitt, G.D. Schuler, E. Sequeira, S.T. Sherry, M. Shumway, K. Sirotkin, D. Slotta, A. Souvorov, G. Starchenko, T.A. Tatusova, L. Wagner, Y. Wang, W.J. Wilbur, E. Yaschenko, and J. Ye, Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 2011. 39(Database issue):D38-51 http://www.ncbi.nlm.nih.gov/pubmed/21097890.
4. Matys, V., E. Fricke, R. Geffers, E. Gossling, M. Haubrock, R. Hehl, K. Hornischer, D. Karas, A.E. Kel, O.V. Kel-Margoulis, D.U. Kloos, S. Land, B. Lewicki-Potapov, H. Michael, R. Munch, I. Reuter, S. Rotert, H. Saxel, M. Scheer, S. Thiele, and E. Wingender, TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Research, 2003. 31(1):374-378 http://www.ncbi.nlm.nih.gov/pubmed/12520026.
5. Huerta-Cepas, J., S. Capella-Gutierrez, L.P. Pryszcz, I. Denisov, D. Kormes, M. Marcet-Houben, and T. Gabaldon, PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions. Nucleic Acids Research, 2011. 39(Database issue):D556-560 http://www.ncbi.nlm.nih.gov/pubmed/21075798.
6. Maddison, D.R., K.-S. Schulz, and W.P. Maddison, The Tree of Life Web Project. Zootaxa, 2007. 1668:19-40 http://www.mapress.com/zootaxa/2007f/zt01668p040.pdf.
7. Hopfield, J.J., Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the United States of America, 1982. 79(8):2554-2558 http://www.ncbi.nlm.nih.gov/pubmed/6953413.
8. Chou, P.Y. and G.D. Fasman, Prediction of the secondary structure of proteins from their amino acid sequence. Advances in enzymology and related areas of molecular biology, 1978. 47:45-148 http://www.ncbi.nlm.nih.gov/pubmed/364941.
9. Garnier, J., D.J. Osguthorpe, and B. Robson, Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. Journal of molecular biology, 1978. 120(1):97-120 http://www.ncbi.nlm.nih.gov/pubmed/642007.
10. Guermeur, Y., C. Geourjon, P. Gallinari, and G. Deleage, Improved performance in protein secondary structure prediction by inhomogeneous score combination. Bioinformatics, 1999. 15(5):413-421 http://www.ncbi.nlm.nih.gov/pubmed/10366661.
11. Cai, Y.D., X.J. Liu, and K.C. Chou, Prediction of protein secondary structure content by artificial neural network. Journal of computational chemistry, 2003. 24(6):727-731 http://www.ncbi.nlm.nih.gov/pubmed/12666164.
12. Uberbacher, E.C., D. Hyatt, and M. Shah, GrailEXP and Genome Analysis Pipeline for genome annotation. Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis … [et al.], 2004. Chapter 4:Unit 4.9 http://www.ncbi.nlm.nih.gov/pubmed/18428726.
13. Karolidis, D.A., S. Agatonovic-Kustrin, and D.W. Morton, Artificial neural network (ANN) based modelling for D1 like and D2 like dopamine receptor affinity and selectivity. Medicinal chemistry, 2010. 6(5):259-270 http://www.ncbi.nlm.nih.gov/pubmed/20977414.
14. Huang, R.B., Q.S. Du, Y.T. Wei, Z.W. Pang, H. Wei, and K.C. Chou, Physics and chemistry-driven artificial neural network for predicting bioactivity of peptides and proteins and their design. Journal of theoretical biology, 2009. 256(3):428-435 http://www.ncbi.nlm.nih.gov/pubmed/18835398.
15. Lara, J., R.M. Wohlhueter, Z. Dimitrova, and Y.E. Khudyakov, Artificial neural network for prediction of antigenic activity for a major conformational epitope in the hepatitis C virus NS3 protein. Bioinformatics, 2008. 24(17):1858-1864 http://www.ncbi.nlm.nih.gov/pubmed/18628290.
16. Mori, K., R. Saito, S. Kikuchi, and M. Tomita, Inferring rules of Escherichia coli translational efficiency using an artificial neural network. Bio Systems, 2007. 90(2):414-420 http://www.ncbi.nlm.nih.gov/pubmed/17150301.

Posted in computational biology reflections blog

Gene Construction Kit 3.5.1 Update Now Available

West Lebanon, NH  07|13|2011

Textco BioSoftware today released GCK version 3.5.1, available to current customers as a FREE download from the updates section of the website.

Modifications and fixes included in this release can be reviewed by visiting the ‘Revision History’ page.

All current GCK 3.0 and 3.5 license holders are encouraged to update to this latest release for improved performance.

GCK 3.5 is compatible with Mac OS 10.4 and higher (including OS 10.7, “Lion”) and with Windows 7, Windows Vista, and Windows XP.

New customers can evaluate Gene Construction Kit 3.5 by downloading the free demo version.

Contact Textco BioSoftware for more information.

Contact for Sales/Marketing:

Brant Hackett; 480-241-2191

Roberta Brucks Gross; 603-643-1471

Posted in news

Gene Construction Kit® Upgraded to v3.5

West Lebanon, NH 05|24|2011


Textco BioSoftware today announced the release of an upgraded version of its award-winning DNA plasmid mapping software, Gene Construction Kit® (GCK 3.5).

GCK version 3.5 includes enhancements for both Windows and Mac. Users will notice increased speed when working with files accessed over the network, and GCK 3.5 offers a number of other improvements designed to increase efficiency.

In anticipation of the pending Apple operating system upgrade, this GCK release is compatible with Mac OS 10.7 “Lion”. The Mac version of GCK 3.5 takes full advantage of the processors found inside the Intel Macs and is completely updated to Universal Binary.

A FREE Upgrade is being offered to current GCK 3.0 license holders. Those customers can simply visit our support page to follow the complimentary update instructions.

To mark the release of GCK 3.5, Textco BioSoftware is offering special ‘Introductory Discount’ pricing to both new customers and labs looking to expand the use of GCK to additional users.

Since its introduction more than 20 years ago, Gene Construction Kit® has been the molecular cloning software of choice for tens of thousands of researchers worldwide. GCK has stood the test of time, and its unique, graphical approach to cloning has not been matched. Lauded by leading researchers as the ‘Easy-to-learn AND Easy-to-Use’ cloning package, GCK simplifies the management of researchers’ cloning projects. This upgraded version of GCK includes enhancements that will save researchers both time and money. A complete list of the updates can be viewed through Gene Construction Kit’s version history.

GCK 3.5 is compatible with Mac OS 10.4 and higher (including Snow Leopard and the soon-to-be-released Lion OS) and with Windows 7, Windows Vista, and Windows XP.

A free demo version of Gene Construction Kit 3.5 is available for download.

Contact Textco BioSoftware for more information.

Contact for Sales/Marketing:

Brant Hackett; 480-241-2191

Roberta Brucks Gross; 603-643-1471

Posted in news

February Newsletter – GCK 3.5 Beta Announced …

West Lebanon, NH 02|08|2011

Textco BioSoftware today announced the public beta testing for GCK version 3.5 for both Mac and Windows. The Mac version has been completely updated to Universal Binary to take full advantage of the processors inside the new Intel Macs. Both the Windows and Mac versions of GCK 3.5 will have new font panel menus, increased speed for working with networked files, right-mouse button click support for edit operations, updated GenBank importer options and other enhancements to increase your efficiency.

We are beginning an open beta-test for GCK 3.5 and encourage you to participate by signing up here.

Also in our February 2011 Newsletter, we highlight the ‘Gel Window’ of GCK and how it can help confirm results and save you time and money.

To review this latest Newsletter in your browser, please click here.

If you have any suggestions for topics you would like to see covered, please email us with your thoughts!

Contact for Sales/Marketing:

Brant Hackett; 480-241-2191

Roberta Brucks Gross; 603-643-1471

Posted in news

Non-coding doesn’t ‘translate’ to non-functional when it comes to evolution

What is it that has allowed more “complex” species to evolve from simpler species? The popular notion that the addition of new protein coding genes is responsible for this increased complexity has been challenged over the last decade. It is becoming apparent that, although new proteins certainly are important, the amount of conserved sequence is far greater than can be accounted for by protein coding genes alone. A number of papers have addressed the issue of how much sequence is conserved and whether these conserved sequences are actually functional [1, 2]. A recent paper by Meader, Ponting and Lunter, entitled “Massive Turnover of functional sequence in human and other mammalian genomes” [3] addresses this issue and comes to some fascinating conclusions.

Many studies have involved the examination of point mutation rates and, in particular, what fraction of those point mutations is neutral [1]. Chiaromonte et al. [1] point out that about 5% of the human genome has undergone fewer point mutations than would have been expected were those mutations neutral. This suggests that there is some pressure to remove the deleterious mutations (purifying selection) from this DNA. Since only about 1.06% of the human genome codes for protein sequences [4], the remainder of the 5% must reside in non-coding sequences.

However, knowing which single nucleotide changes in the non-coding sequence are consequential and which are not is a very difficult problem that leads to large variations in estimating how much non-coding sequence is actually functional and therefore conserved. One approach to this question is to identify single nucleotide changes in sequences from closely related species [5], but estimating neutral mutation rates for these sequences in two species is often imprecise because distinguishing consequential from non-consequential changes is a true challenge.

Lunter et al. [6] developed a method based on genomic sequence comparisons that looks for insertions or deletions (indels), which are much easier to detect in alignments than are point mutations. An indel will be seen as a gap in one of the sequences. It is possible to mathematically model the expected distance between indels that are neutral. A larger than expected inter-gap distance, therefore, would indicate that indels within that sequence are deleterious and have been removed by purifying selection. The basis of their “neutral indel model” is to look for regions in the genomes that do not have indels at the rate expected from neutral changes. This approach does not depend on single nucleotide changes and is not subject to the inaccuracies inherent in that approach. Rather, the method looks for relatively easy to identify gaps in pairwise sequence alignments between genomic DNAs. This elegant solution is quite powerful.
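The idea of flagging unexpectedly long gap-free stretches can be sketched in a few lines of Python. This is a simplified illustration, not the published method: it assumes every alignment column independently carries a neutral indel with a fixed probability, so that gap-free run lengths follow a geometric distribution. The indel rate, the significance cut-off and the toy alignment are all made up.

```python
def gap_free_segments(aligned_a, aligned_b):
    """Lengths of the ungapped runs between gaps in a pairwise alignment."""
    lengths, run = [], 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:
        lengths.append(run)
    return lengths

def tail_probability(length, indel_rate):
    """Chance that a neutral run stays gap-free for at least `length` columns."""
    return (1.0 - indel_rate) ** length

def constrained_candidates(lengths, indel_rate, alpha=0.01):
    """Runs that are too long to be plausible under the neutral assumption."""
    return [n for n in lengths if tail_probability(n, indel_rate) < alpha]

# Toy alignment with gap-free runs of 5, 50 and 8 columns. Runs touching the
# ends of the alignment are cut short, which this sketch ignores.
seq_a = "ACGTA" + "-" + "C" * 50 + "GT" + "ACGTACGT"
seq_b = "ACGTA" + "T" + "C" * 50 + "--" + "ACGTACGT"

lengths = gap_free_segments(seq_a, seq_b)
print(lengths)
print(constrained_candidates(lengths, indel_rate=0.1))
```

With a hypothetical rate of one indel per ten columns, only the 50-column run is long enough to be flagged as a candidate for constrained sequence.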

In order to quantify functional DNA as being constrained, Lunter et al. [6] calibrate their system using biologically identified neutrally evolved sequences which they call ancestral repeats (ARs). These sequences are the result of transposable elements inserted into the genome of a common ancestor of the two species being compared. Presumably there is no pressure to maintain these ARs, so they are free to evolve. Their neutral indel model precisely demonstrates that, in fact, the ARs are neutrally evolving. Sequences that change more slowly are under selective pressure, and are therefore likely to be functional DNA. Using this approach, they evaluate a number of genomes for constrained sequences through pairwise genomic alignments. The pairwise comparisons also allow the estimation of lineage-specific sequences.

So, what did Meader et al. [3] observe? In general, they found that as the divergence between mammalian species increased, the amount of pairwise shared functional sequence decreased dramatically. The fact that mammalian species do not tend to share functional sequences suggests that these sequences are turning over rapidly, resulting in considerable divergence among mammalian species. In humans, 6.5-10% of the genome is functionally constrained, yet only 1.06% of the genome codes for proteins. The vast majority of the conserved sequence, therefore, is likely to be regulatory in nature.

Humans have an estimated 200-300 Mb of functional DNA, which includes about 30 Mb of coding sequence. Drosophila, on the other hand, has about 56-66 Mb of functional DNA, including 21.8 Mb of coding sequence. The difference here is striking. While humans have garnered an increase of less than 50% in protein coding DNA since our divergence from flies (30 Mb vs 21.8 Mb), we have increased our non-coding functional DNA content by 550% (170-270 Mb vs 35-45 Mb). The ratio of non-coding to coding functional DNA is 1.5-2.0x in Drosophila, but is 5-8x in humans. Clearly, the increased complexity that we observe in humans is due more to changes in functional non-coding DNA than to protein coding DNA.
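For readers who want to retrace the arithmetic, the following Python sketch re-derives the non-coding amounts and ratios from the totals quoted above. It uses only those quoted totals; the variable names are ours, and because the figures in the text are rounded ranges, small differences from them are to be expected.

```python
# Totals quoted in the text, in megabases (Mb).
human_functional = (200.0, 300.0)
human_coding = 30.0
fly_functional = (56.0, 66.0)
fly_coding = 21.8

def noncoding(functional, coding):
    """Functional DNA that does not code for protein (low, high)."""
    return tuple(total - coding for total in functional)

human_noncoding = noncoding(human_functional, human_coding)
fly_noncoding = noncoding(fly_functional, fly_coding)

# Percentage increase in coding DNA, human vs Drosophila.
coding_increase = (human_coding / fly_coding - 1) * 100

# Non-coding to coding ratio within each species (low, high).
human_ratio = tuple(n / human_coding for n in human_noncoding)
fly_ratio = tuple(n / fly_coding for n in fly_noncoding)

# Human non-coding relative to Drosophila non-coding (low/low, high/high).
fold_noncoding = tuple(h / f for h, f in zip(human_noncoding, fly_noncoding))

print(f"human non-coding: {human_noncoding[0]:.0f}-{human_noncoding[1]:.0f} Mb")
print(f"fly non-coding:   {fly_noncoding[0]:.1f}-{fly_noncoding[1]:.1f} Mb")
print(f"coding increase:  {coding_increase:.0f}%")
print(f"fly ratio:        {fly_ratio[0]:.1f}-{fly_ratio[1]:.1f}x")
print(f"human ratio:      {human_ratio[0]:.1f}-{human_ratio[1]:.1f}x")
print(f"non-coding fold:  {fold_noncoding[0]:.1f}-{fold_noncoding[1]:.1f}x")
```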

This elegant and innovative approach to understanding genome evolution provides a new perspective on the evolution of species. Newer species seem to result not so much from the new gene products that can be made, but from the ability to regulate and integrate these gene products in new combinations and under different circumstances. This is not really that much of a surprise at this point in time (after all, most of the genes found in yeast and worms also have their counterparts in humans), but the order of magnitude and rate of change of non-coding functional DNA is significantly larger than previously thought. It would be interesting to see the results of this approach applied to a larger number of primates.

References

1. Chiaromonte, F., R.J. Weber, K.M. Roskin, M. Diekhans, W.J. Kent, and D. Haussler, The share of human genomic DNA under selection estimated from human-mouse genomic alignments. Cold Spring Harb Symp Quant Biol, 2003. 68:245-254.
2. Pheasant, M. and J.S. Mattick, Raising the estimate of functional human sequences. Genome Res, 2007. 17(9):1245-1253.
3. Meader, S., C.P. Ponting, and G. Lunter, Massive turnover of functional sequence in human and other mammalian genomes. Genome Res, 2010. 20(10):1335-1343.
4. Church, D.M., L. Goodstadt, L.W. Hillier, M.C. Zody, S. Goldstein, X. She, C.J. Bult, R. Agarwala, J.L. Cherry, M. DiCuccio, W. Hlavina, Y. Kapustin, P. Meric, D. Maglott, Z. Birtle, A.C. Marques, T. Graves, S. Zhou, B. Teague, K. Potamousis, C. Churas, M. Place, J. Herschleb, R. Runnheim, D. Forrest, J. Amos-Landgraf, D.C. Schwartz, Z. Cheng, K. Lindblad-Toh, E.E. Eichler, and C.P. Ponting, Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol, 2009. 7(5):e1000112.
5. Siepel, A., G. Bejerano, J.S. Pedersen, A.S. Hinrichs, M. Hou, K. Rosenbloom, H. Clawson, J. Spieth, L.W. Hillier, S. Richards, G.M. Weinstock, R.K. Wilson, R.A. Gibbs, W.J. Kent, W. Miller, and D. Haussler, Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res, 2005. 15(8):1034-1050.
6. Lunter, G., C.P. Ponting, and J. Hein, Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput Biol, 2006. 2(1):e5.

Posted in computational biology reflections blog