Buy the Book

  GEP Biblio

  Visit Gepsoft


© C. FERREIRA, 2002 (Terms of Use) (Terms of Use) ISBN: 9729589054

Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence


In this chapter:

The aim of this chapter is to bring into focus the basic differences between gene expression programming (GEP) and its predecessors, genetic algorithms (GAs) and genetic programming (GP). All three algorithms belong to the wider class of Genetic Algorithms (the use of capitals here is meant to distinguish this wider class from the classic GA) as all of them use populations of individuals, select the individuals according to fitness, and introduce genetic variation using one or more genetic operators. The fundamental difference between the three algorithms resides in the nature of the individuals: in GAs the individuals are symbolic strings of fixed length (chromosomes); in GP the individuals are non-linear entities of different sizes and shapes (parse trees); and in GEP the individuals are also non-linear entities of different sizes and shapes (expression trees), but these complex entities are encoded as simple strings of fixed length (chromosomes).

If we have in mind the history of life on Earth (see, for instance, Dawkins 1995 or Maynard Smith and Szathmáry 1995), we can see that the difference between GAs and GP is only superficial: both systems use only one kind of entity which functions both as genotype and body (phenotype). These kinds of systems are condemned to have one of two limitations: if they are easy to manipulate genetically, they lose in functional complexity (the case of GAs); if they exhibit a certain amount of functional complexity, they are extremely difficult to reproduce with modification (the case of GP).

In his book, River Out of Eden, R. Dawkins (1995) gives a list of thresholds of any life explosion. The first is the “replicator threshold” which consists of a self-copying system in which there is hereditary variation. Also important is that replicators survive by virtue of their own properties. The second threshold is the “phenotype threshold” in which replicators survive by virtue of causal effects on something else. This “something else” is what is called the phenotype or body, that is, the entity that faces the environment and does all the work. A simple example of a replicator/phenotype system is the DNA/protein system of life on Earth. It is believed that for life to move beyond a very rudimentary stage, the phenotype threshold must be crossed (e.g., Dawkins 1995; Maynard Smith and Szathmáry 1995).

Similarly, the entities of both GAs and GP (simple replicators) survive by virtue of their own properties. Understandingly, there has been an effort in the last years in the scientific community to cross the phenotype threshold in evolutionary computation. The most outstanding effort is developmental genetic programming or DGP (Banzhaf 1994) where binary strings are used to encode mathematical expressions. The expressions are decoded using a five-bit binary code, called genetic code. Contrary to its analogous natural genetic code, this “genetic code”, when applied to binary strings, frequently produces invalid expressions (in nature there is no such thing as a structurally incorrect protein). Therefore a huge amount of computational resources goes into editing these illegal structures, which limits this system considerably. Not surprisingly, the gain in performance of DGP over GP is minimal (Banzhaf 1994; Keller and Banzhaf 1996).

Gene expression programming is an example of a full-fledged replicator/phenotype system where the chromosomes/expression trees form a truly functional, indivisible whole (Ferreira 2001). Indeed, in GEP there is no such thing as an invalid expression tree or program. Obviously, the interplay of GEP chromosomes and expression trees requires an unambiguous translation system to transfer the language of chromosomes into the language of expression trees. Furthermore, we will see that the structural organization of GEP chromosomes allows the unconstrained modification of the genome, creating the perfect conditions for evolution to occur. Indeed, the varied set of genetic operators developed to introduce genetic modification in GEP populations always produce valid expression trees, making GEP a simple artificial life system, well established beyond the replicator threshold.

To help the non-biologist understand the fundamental difference between GEP and the other genetic algorithms and why GEP is such a leap forward in evolutionary computation, it is useful to know a little more about the structure and function of the main players of biological gene expression and how they work together. What follows is a very brief introduction to the structure and function of the main molecules of information metabolism and how mutation in proteins relates to evolution. If you wish to pursue these questions further, any textbook on biochemistry will do, although I recommend the recently published Biochemistry by Mathews et al. (2000) for its clarity and elegant presentation.

Home | Contents | Previous | Next