GEP Book

  Home
  News
  Author
  Q&A
  Tutorials
  Downloads
  GEP Biblio
  Contacts

  Visit Gepsoft

 

C. FERREIRA In Leandro N. de Castro and Fernando J. Von Zuben, eds., Recent Developments in Biologically Inspired Computing, pages 82-103, Idea Group Publishing, 2004.

Gene Expression Programming and the Evolution of Computer Programs

Genetic Programming
 
Genetic programming, invented by Cramer in 1985 (Cramer 1985) and further developed by Koza (1992), solves the problem of fixed length solutions through the use of nonlinear structures (parse trees) with different sizes and shapes. The alphabet used to create these structures is also more varied, creating a richer, more versatile system of representation. Notwithstanding, the created individuals also lack a simple, autonomous genome. Like the linear chromosomes of genetic algorithms, the nonlinear structures of GP are also cursed with the dual role of genotype/phenotype.

The parse trees of genetic programming resemble protein molecules in their use of a richer alphabet and in their complex and unique hierarchical representation. Indeed, parse trees are capable of exhibiting a great variety of functionalities. The problem with these complex replicators is that their reproduction with modification is highly constrained in evolutionary terms because the modifications must take place on the parse tree itself and, consequently, only a limited range of modification is possible. Indeed, special kinds of genetic operators were developed that operate at the tree level, modifying or exchanging particular branches between trees.

Although at first sight this might appear advantageous, it greatly limits this technique (we all know the limits of grafting and pruning in nature). Consider for instance crossover, the most used and often the only search operator used in genetic programming. In this case, selected branches are exchanged between two parent trees to create offspring (Figure 1). The idea behind its implementation was to exchange smaller, mathematically concise blocks in order to evolve more complex, hierarchical solutions composed of smaller building blocks.

Figure 1. Tree crossover in genetic programming. The arrows indicate the crossover points.

The mutation operator in GP is also very different from natural point mutation. This operator selects a node in the parse tree and replaces the branch underneath by a new randomly generated branch (Figure 2). Notice that the overall shape of the tree is not greatly changed by this kind of mutation, especially if lower nodes are preferentially chosen as mutation targets.

Figure 2. Tree mutation in genetic programming. The arrow indicates the mutation point. The new branch randomly generated by the mutation operator in the daughter tree is shown in gray.

Permutation is the third operator used in genetic programming and the most conservative of the three. During permutation, the arguments of a randomly chosen function are randomly permuted (Figure 3). In this case the overall shape of the tree remains unchanged.

Figure 3. Permutation in genetic programming. The arrow indicates the permutation point. Note that the arguments of the permuted function traded places in the daughter tree.

In summary, in genetic programming the operators resemble more of a conscious mathematician than the blind way of nature. But in adaptive systems the blind way of nature is much more efficient and systems such as GP are highly constrained. For instance, the implementation of other operators in genetic programming such as the simple yet high-performing point mutation (Ferreira 2002c) is unproductive as most mutations result in syntactically incorrect structures (Figure 4). Obviously, the implementation of other operators such as transposition or inversion raises similar difficulties and the search space in GP remains vastly unexplored.

Figure 4. Illustration of a hypothetical event of point mutation in genetic programming. The arrow indicates the mutation point. Note that the daughter tree is an invalid structure.

Although Koza described these three operators as the basic GP operators, crossover is practically the only genetic operator used in most GP applications (Koza 1992). Consequently, no new material is introduced in the genetic pool of GP populations. Not surprisingly, huge populations of parse trees must be used with the aim of creating all the necessary building blocks with the inception of the initial population in order to guarantee the discovery of a good solution only by moving the initial building blocks around.

Finally, due to the dual function of the parse trees (genotype and phenotype), genetic programming is incapable of a simple, rudimentary expression: in all cases, the entire parse tree is the solution.

Home | Contents | Previous | Next