The objective of this problem is the discovery of a symbolic expression that satisfies a set of fitness cases. Consider we are given a sampling of the numerical values from the function:
y = a^{4} +
a^{3} + a^{2} + a 
(6.1) 
over 10 chosen points and we want to find a function fitting those values within 0.01 of the correct value.
First, the set of functions F and the set of terminals T must be chosen. In this case
F = {+, , *, /} and T = {a}. Then the structural organization of chromosomes, namely the length of the head and the number of genes, is chosen. It is advisable to start with short, singlegene chromosomes and then gradually increase
h. Figure 6 shows such an analysis for this problem. A population size
P of 30 individuals and an evolutionary time G of 50 generations were used. A
p_{m} equivalent to two onepoint mutations per chromosome and a
p_{1r} = 0.7 were used in all the experiments in order to simplify the analysis. The set of fitness cases is shown in
Table 1 and the fitness was evaluated by equation
(4.1a), being M = 100. If C_{(i,j)}T_{j} is equal to or less than 0.01 (the precision), then C_{(i,j)}T_{j}
= 0 and f_{(i,j) }= 100; thus for C_{t} = 10,
f_{max} = 1000.
Figure 6. Variation of success rate (P_{s}) with chromosome length. For this analysis
G = 50, P = 30, and P_{s} was evaluated over 100 identical runs.
Table 1
Set of fitness cases for the symbolic regression problem.
a 
f(a) 
2.81 
95.2425 
6 
1554 
7.043 
2866.5486 
8 
4680 
10 
11110 
11.38 
18386.0341 
12 
22620 
14 
41370 
15 
54240 
20 
168420 
Note that GEP can be useful in searching the most parsimonious solution to a problem. For instance, the chromosome:
0123456789012
*++/**aaaaaaa
with h = 6 codes for the ET:
which is equivalent to the target function (6.1). Note also that GEP can efficiently evolve solutions using large values of
h, that is, it is capable of evolving large and complex subETs. It is worth noting that the most compact genomes are not the most efficient. Therefore a certain redundancy is fundamental to efficiently evolve good programs.
In another analysis, the relationship between success rate and population size
P, using an h = 24 was studied (Figure
7). These results show the supremacy of a genotype/phenotype representation, as this singlegene system, which is equivalent to GP, greatly surpasses that technique
[9]. However, GEP is much more complex than a singlegene system because GEP chromosomes can encode more than one gene (see
Figure 8).
Figure 7. Variation of success rate (P_{s}) with population size. For this analysis
G = 50, and a medium value of 49 for chromosome length (h = 24) was used.
P_{s} was evaluated over 100 identical runs.
Suppose we could not find a solution after the analysis shown in Figure
6. Then we could increase the number of genes, and choose a function to link them. For instance, we could choose an
h = 6 and then increase the number of genes gradually. Figure 8 shows how the success rate for this problem depends on the number of genes. In this analysis, the
p_{m} was equivalent to two onepoint mutations per chromosome,
p_{1r} = 0.2, p_{2r} = 0.5, p_{gr} = 0.1,
p_{is} = 0.1, p_{ris} = 0.1, p_{gt} = 0.1, and three transposons (both IS and RIS elements) of lengths 1, 2, and 3 were used. Note that GEP can cope very well with an excess of genes: the success rate for the 10genic system is still very high (47%).
Figure 8. Variation of success rate (P_{s}) with the number of genes. For this analysis
G = 50, P = 30, and h = 6 (a gene length of 13).
P_{s} was evaluated over 100 identical runs.
In Figure 9 another important relationship is shown: how the success rate depends on evolutionary time. In contrast to GP where 51 generations are the norm, for after that nothing much can possibly be discovered
[7], in GEP, populations can adapt and evolve indefinitely because new material is constantly being introduced into the genetic pool.
Figure 9. Variation of success rate (P_{s}) with the number of
generations. For this analysis P = 30, p_{m} = 0.051,
p_{1r} = 0.7 and a chromosome length of 79 (a singlegene chromosome with
h = 39) was used. P_{s} was evaluated over 100 identical runs.
Finally, suppose that the multigenic system with subETs linked by addition could not evolve a satisfactory solution. Then we could choose another linking function, for instance, multiplication. This process is repeated until a good solution has been found.
As stated previously, GEP chromosomes can be easily modified in order to encode the linking function as well. In this case, for each problem the ideal linking function would be found in the process of adaptation.
Consider, for instance, a multigenic system composed of 3 genes linked by addition. As shown in
Figure 8, the success rate has in this case the maximum value of 100%.
Figure 10 shows the progression of average fitness of the population and the fitness of the best individual for run 0 of the experiment summarized in
Table 2, column 1.
Figure 10. Progression of average fitness of the population and the fitness of the best individual for run 0 of the experiment summarized in
Table 2, column 1 (symbolic regression).
In this run, a correct solution was found in generation 11 (the subETs are linked by
addition):
012345678901201234567890120123456789012
***a+aaaaaaa++**a*aaaaaaa*+a/aaaaaaaa
and mathematically corresponds to the target function (the contribution of each subET is indicated in brackets):
y = (a^{4}) + (a^{3} + a^{2} +
a) + (0) = a^{4} + a^{3} + a^{2} +
a.
The detailed analysis of this program shows that some of the actions are redundant for the problem at hand, like the addition of 0 or multiplication by 1. However, the existence of these unnecessary clusters, or even pseudogenes like gene 3, is important to the evolution of more fit individuals (compare, in Figures
6 and 8, the success rate of a compact, singlegene system with
h = 6 with other less compact systems both with more genes and
h greater than 6).
Table 2
Parameters for the symbolic regression (SR), sequence induction (SI), sequence induction using ephemeral random constants (SI*), block stacking (BS), and 11multiplexer (11M) problems.
The plot for average fitness in Figure 10 (and also Figures
12, 13 and 17 below) suggests different evolutionary dynamics for GEP populations. The oscillations on average fitness, even after the discovery of a perfect solution, are unique to GEP. A certain degree of oscillation is due to the small population sizes used to solve the problems presented in this work. However, an identical pattern is obtained using larger population sizes.
Figure 11 compares six evolutionary dynamics in populations of 500 individuals for 500 generations. Plot 1 (all operators active) shows the progression of average fitness of an experiment identical to the one summarized in
Table 2, column 1, that is, with all the genetic operators switched on. The remaining dynamics were obtained for mutation alone (Plot 2), for gene recombination combined with gene transposition (Plot 3), for onepoint recombination (Plot 4), twopoint recombination (Plot 5), and gene recombination (Plot 6).
Figure 11. Possible evolutionary dynamics for GEP populations. For this analysis
P = 500. The plots show the progression of average fitness of the population.
Plot 1: All operators switched on with rates as shown in Table
2, column 1; in this case a perfect solution was found in generation 1.
Plot 2: Only mutation at p_{m} = 0.051; in this case a perfect solution was found in generation 3.
Plot 3: Only gene recombination at p_{gr} = 0.7 plus gene transposition at
p_{gt} = 0.2 were switched on; in this case a perfect solution was found in generation 2.
Plot 4: Only onepoint recombination at p_{1r} = 0.7; in this case a perfect solution was found in generation 3.
Plot 5: Only twopoint recombination at p_{2r} = 0.7; in this case a perfect solution was found in generation 1.
Plot 6: Only gene recombination at p_{gr} = 0.7; in this case a perfect solution was not found: the best of run has fitness 980 and was found in generation 2.
It is worth noticing the homogenizing effect of all kinds of recombination. Interestingly, this kind of pattern is similar to the evolutionary dynamics of GAs and GP populations
[9, 10]. Also worth noticing is the plot for gene recombination alone
(Figure 11, Plot 6): in this case a perfect solution was not found. This shows that sometimes it is impossible to find a perfect solution only by shuffling existing building blocks, as is done in all GP implementations without mutation. Indeed, GEP gene recombination is similar in effect to GP recombination, for it permits exclusively the recombination of mathematically concise blocks. Note that even a more generalized shuffling of building blocks (using gene recombination combined with gene transposition) results in oscillatory dynamics
(Figure 11, Plot 3).
