How Genetic Algorithms Have Helped in Computational Problem Solving

[Note: you can download a Genetic Algorithm Demo program to go with this article (Mac, Win, Linux). This is explained later on.]

In the last installment of this blog, I talked about how an improved understanding of the way biological systems handle complex decision making can be used to design computational approaches to complex problems. In particular, I discussed how the vast amounts of biological data we have accumulated are being studied through the use of neural networks. These networks are computer models in which relationships between different pieces of information are determined by trial and error through the program itself – always moving toward a better solution.

In this blog entry I will talk about genetic algorithms and how this computational approach solves problems by using the lessons we learn from evolution (survival of the fittest). It is another example of how biological principles are being borrowed by computer scientists to solve biological (and many other) problems.

Genetic algorithms are useful for problems in which you know the desired result and you know how to guess at the initial state (even if it is random), but don't know how to get from the initial to the final state. These problems often have many different parameters that can vary simultaneously, which makes working through every combination of all the parameters computationally very slow and not feasible (would you want to wait months to get an answer?). Genetic algorithms can evolve toward a solution much as living organisms evolve to become better fit for their environment. The basic idea is to start with a population of individuals (in the computer) and to allow that population to evolve to a more fit state.

Let’s look at the biological premise. We are working with a population of individuals, each of which has a “fitness” to exist in its current environment. This fitness is determined by the set of genes present on a single chromosome. The individuals in the population can mate and can share their genetic material through crossing-over. Mutations can also occur at random, changing any gene in any generation. When the environment changes, some individuals will be better fit to survive in the new environment. Those better-fit individuals will be more likely to mate than individuals that are poorly fit to survive in the new environment. Thus, the genes of the better-fit individuals are more likely to be passed on to offspring (does this sound familiar?). With each subsequent generation, the population as a whole will evolve to be more fit in the new environment because it will have a better combination of genes. There is a very nice discussion of this process, with more details, using a hypothetical organism called hooters.

Computationally, the process is very similar to the biological one. There are two critical steps that must be taken before a genetic algorithm can be run. First, one needs to define a series of parameters that might represent a solution to the problem. Second, one needs to define a fitness function that can quantify how close an individual is to being a perfect answer (perfect fitness). Once a population of individuals is generated, they are all scored for fitness. The most-fit individuals will mate more often, and some fraction of the least-fit individuals will be discarded (they die). A mating between two individuals consists of randomly choosing a crossover point for the two chromosomes and also performing one or more mutations (determined by a predetermined mutation frequency). The matings are used to generate new individuals who replace those who died, keeping the population at the same size in each generation. Once the population is regenerated, all of the individuals are scored again, and the process is repeated. Thus, with each generation, the population becomes more fit, until an ideal solution is eventually obtained.

Let’s look at a couple of examples of how genetic algorithms have been used to address biological problems.

In a paper by Gultyaev, van Batenburg and Pleij [1], a genetic algorithm was used to fold RNAs. Their parameters consisted of, for each position in the RNA, a value indicating which of the possible stem structures containing that nucleotide is present. For example, a nucleotide that could be part of six different stems, or of no stem at all, would have a parameter with seven possible values. The next nucleotide would have its own set of possible structures. An individual in this genetic algorithm system consists of a string of values, one for each nucleotide in the RNA. We can think of this as the chromosome for that individual. To generate the initial population, one could generate many different (random) combinations of structures – thus creating a population of chromosomes representing different folded structures. In practice, these chromosomes only contained non-conflicting stem-loop structures. Fitness was scored by the energy of the folded structure, with lower-energy (more stable) structures being more fit.

Kato and Tsunoda [2] used a genetic algorithm to examine different combinations of regulatory motifs and transcription factor binding sites (TFBS), with the goal of defining a set of sites that could explain the regulation of a gene set based on gene expression data. They developed a program called MotifCombinator to examine combinations of discovered motifs and known TFBS, seeking the combination of motifs that best correlates with the expression patterns of gene sets. Their chromosomes consist of a string of specific motifs or motif pairs arranged in a specific order. The fitness reflects how closely the motif combination (chromosome) matches the gene expression patterns.

You can see an interesting example of how a genetic algorithm works at this page. This demo allows you to play with parameters to select individuals that are best at finding food in a given area.

I have also produced a simple Genetic Algorithm Demo (Mac, Win, Linux). The program allows you to type in some text and then run the algorithm, using whatever parameter values you want to try, to identify the string you typed in. Each individual in this program consists of a string of characters the same length as the query string you typed. The fitness is simply the fraction of characters that match between the query string and the individual being evaluated. You can define the mutation rate, how many individuals survive each generation, how many individuals get deleted each generation, and a few other parameters. Try playing with the parameters to get a feel for how they influence the way the genetic algorithm runs. Does a larger population speed up finding a solution? Does a longer starting string take longer to solve? Does changing the mutation rate help or hinder a solution? What does keeping more (or discarding more) of the population do to finding a solution? Progress is displayed in a number of ways, and all results are displayed in a table that can be exported.
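The fitness function in the demo is simple enough to write out. The sketch below is my own restatement of the scoring rule described above (fraction of matching characters), not the demo program's actual source code:

```python
def string_fitness(candidate: str, target: str) -> float:
    """Fraction of positions at which the candidate matches the target.

    A score of 1.0 means a perfect match; this is why the demo always
    converges to the typed string: the fitness landscape has a single,
    unambiguous peak.
    """
    matches = sum(c == t for c, t in zip(candidate, target))
    return matches / len(target)
```

For example, `string_fitness("hxllo", "hello")` scores 0.8, since four of the five characters agree.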

Genetic algorithms, therefore, can be used to attack problems that are not practically solvable by exhaustive search. They are a kind of hill climbing algorithm* that tries to get closer to a solution with each generation. However, because they involve random choices in setting up the initial population and in making each new generation, they may never reach the ideal solution. Also, that same randomness might lead to different solutions on each run – for example, different RNA structures or motif combinations. [Note that my simple demo program will always get to the right solution because the scoring system is perfect – it is just generating a matching string.]

Hopefully, in the future, when you read papers that discuss genetic algorithms, you will be able to understand (and evaluate) the approach and interpret the results appropriately. You may even think of applying the approach to solve some of your own research problems.

Like the neural nets I discussed in the last blog, genetic algorithms take a known biological paradigm and introduce the concept of evolving a computer solution by having individuals in a population become more fit. The idea of allowing a solution to evolve by applying selective pressure is something that biologists feel comfortable with.

In the next blog, I will discuss another system called “ant colony optimization,” which uses ant foraging behavior as a model for constructing a solution.

* Hill-climbing algorithms work by always stepping to a higher-ranking solution. Imagine climbing a hill. If you point yourself in a random direction and take a step to a higher elevation, you are getting closer to the peak. If you take a step in any direction that does not climb higher, you retreat to your previous position and try again. By this approach you are always guaranteed to reach a local peak – though not necessarily the highest peak in the landscape.

References

1. Gultyaev, A.P., F.H. van Batenburg, and C.W. Pleij, The computer simulation of RNA folding pathways using a genetic algorithm. J Mol Biol, 1995. 250(1):37-51.
2. Kato, M. and T. Tsunoda, MotifCombinator: a web-based tool to search for combinations of cis-regulatory motifs. BMC Bioinformatics, 2007. 8:100.

This entry was posted in computational biology reflections blog.