Buy the Book

  GEP Biblio

  Visit Gepsoft


C. FERREIRA, 2002 (Terms of Use) ISBN: 9729589054

Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence

Creation of the initial population
The chromosomes of the individuals of the initial population are randomly generated using the symbols representing the functions and terminals chosen to solve the particular problem. These initial individuals are the first set of candidate solutions to the problem at hand. Because they are totally random, these founder individuals are most probably not very good solutions. But they are, nonetheless, everything that is necessary because evolution takes care of the rest and, soon enough, very good solutions will start to appear. Let us illustrate this with a concrete example.

Suppose, for instance, that we wanted to know how to express the Majority (a, b, c) function in terms of AND, OR, and NOT. In this case the choice of the function set is not complicated and consists of F = {A, O, N}, representing, respectively, the Boolean functions AND, OR, and NOT. The choice of the terminal set is also simple and consists of T = {a, b, c}, representing the three arguments to the majority function. Therefore, for this problem, the heads of the genes will be randomly generated using six different symbols (A, O, N, a, b, c), whereas the tails will be randomly generated using a smaller alphabet of only three symbols (a, b, c).

The truth table for the majority function is shown in Table 3.1. For this problem, the complete set of transition states is used as the selection environment against which the fitness of each program is evaluated. More formally, the selection environment is called the set of fitness cases. The fitness function for this simple problem is also not very difficult to guess and corresponds to the number of fitness cases correctly evaluated by a particular individual.

Table 3.1
Majority function.

Thus, with the symbols from F and T, the chromosomes of the initial population are randomly generated. Obviously, the heads of the genes are created using the elements from both F and T, whereas the tails of the genes are created using only terminals. Figure 3.2 shows a small initial population (also called generation 0) of 10 individuals randomly created to solve the majority problem. The chromosomes are composed of two genes, each with a length of 7 (h = 3 and t = 4). The chromosomes encode sub-ETs which are posttranslationally linked by the Boolean function OR.

Generation N: 0

Figure 3.2. The chromosomes of a small initial population created to solve the Majority (a, b, c) function problem. The randomly generated chromosomes are composed of two genes, encoding sub-ETs linked by OR.

It is worth noticing how easily the initial populations of gene expression programming are created. The simple structures of the chromosomes are randomly created using the symbols of the function and terminal sets and nothing whatsoever needs to be done to monitor their structural soundness: we know for sure that, without exception, all of them encode valid programs. As a comparison, the random generation of trees in GP must be done with extreme care in order to guarantee the formation of syntactically correct programs. Not surprisingly, the generation of initial populations in GP is a complicated, time consuming process.

After the random generation of the chromosomes of the individuals of the initial population, the chromosomes are expressed, and the fitness of each individual is evaluated. Figure 3.3 shows the fitness of each individual at the majority task (for simplicity, only the chromosomes of the individuals are shown).

Generation N: 0
NacababAANaccb-[0] = 2
AOAaccaNcOabcc-[1] = 4
ObNaabaAONaacb-[2] = 3
OObacbcANNcccc-[3] = 4
NNNabcbOAaacbc-[4] = 4
NbbbcccAaOabbc-[5] = 4
AbNbbaaAaAbcac-[6] = 5
OcOccaaOOAaabb-[7] = 5
OOcacabNNNbbcb-[8] = 5
ObbbabbNbAccbc-[9] = 4

Figure 3.3. The chromosomes of the individuals of the initial population and their respective fitness (the value after the equal sign). The fitness corresponds to the number of fitness cases correctly solved by the program encoded in each chromosome.

Then, according to fitness, the individuals are selected to reproduce with modification. For instance, all the 10 individuals of the initial population shown in Figure 3.2 are viable and are therefore eligible to be selected to reproduce. These viable individuals, among themselves, create as many new individuals as there are individuals in the population. As such, throughout a run, the population size is kept unchanged. The descendants of the selected individuals consist of the new members of the next generation. The expression of one of the best of generation 0 (chromosome 8) is shown in Figure 3.4. This individual was able to solve correctly five fitness cases, and thus its fitness is equal to 5.

Figure 3.4. Expression of the best individual of generation 0. a) The chromosome composed of two genes. b) The program encoded in the chromosome. The linking function is shown in gray. This Boolean function solves correctly five out of eight fitness cases.

You have probably noticed that all the genes created randomly in the initial population have a function at position 0. Although I have chosen to have all the genes in the initial population starting with a function, this is not very important in multigenic individuals. This feature of initial populations is a remnant of the evolutionary process of GEP itself, for the first GEP implementation used chromosomes composed of only one gene. In those systems, genes coding for one-element ETs were of little use and were excluded from the populations. When multigenic chromosomes were developed, I did not change this feature of initial populations because terminals can be inserted at the start site of a gene in future generations. And if this happens to be advantageous for a certain task, this kind of gene can certainly evolve. We will see later on in this chapter that point mutation allows the evolution of genes encoding one-element sub-ETs.

The problem with the random generation of initial populations is that, sometimes, especially if the number of individuals and the number of fitness cases are small, none of the chromosomes encodes viable individuals and the run is aborted (this is highly improbable for Boolean problems, though). A simple form of circumventing this problem, consists of introducing a start up control. This control is by default set to one viable chromosome, and was used in all the problems presented in this book. We will see that GEP populations can efficiently evolve even when all the individuals are descendants of a sole viable founder (see chapter 7 for a discussion). This control mechanism, together with the cloning of the best individual (see section 3.1.2), prevents the existence of failed runs in gene expression programming. So, if all the individuals of the initial population had fitness 0, another initial population would be randomly generated. This process is repeated until at least one viable individual is created.

Home | Contents | Previous | Next