
Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence

Two approaches to the problem of constant creation

In this section we are going to analyze two different approaches to the problem of constant creation in symbolic regression by comparing the performance of two algorithms: the first has a facility for manipulating random constants directly, whereas the second lacks this facility. The comparison between the two approaches will be made on three different problems: the first is an artificial sequence induction problem requiring integer constants; the second is a function finding problem requiring floating-point constants; and the third is a real-world time series prediction problem also requiring floating-point constants.
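The first algorithm handles constants through an extra gene domain (Dc) and an array of random constants attached to each gene. As an illustration only, the following Python sketch assumes the usual GEP-RNC expression rule. The open reading frame of the K-expression is read level by level. The `?` terminals of the resulting tree are then bound, top to bottom and left to right, to the constants indexed by successive Dc digits. The helpers `orf_length` and `express_gene`, the example gene, and the constant values are all hypothetical. Only the four arithmetic functions and the terminals `a` and `?` are supported, and division is left unprotected.

```python
import operator

# Arity and semantics of the four arithmetic functions (assumption: no protected division).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def orf_length(kexpr):
    """Length of the open reading frame: the prefix of the K-expression actually expressed."""
    pending, i = 1, 0  # one node (the root) still waiting to be read
    while pending > 0:
        pending += ARITY.get(kexpr[i], 0) - 1
        i += 1
    return i


def express_gene(kexpr, dc, constants, a):
    """Evaluate one hypothetical GEP-RNC gene for input a.

    kexpr     -- head + tail symbols, e.g. "+*?a?a" + "aa?a?a?"
    dc        -- Dc domain (kept separate here): digits indexing into `constants`
    constants -- this gene's array of random constants
    """
    orf = kexpr[:orf_length(kexpr)]
    # Level-order layout: each node's children follow the children already assigned.
    children, nxt = [], 1
    for sym in orf:
        n = ARITY.get(sym, 0)
        children.append(range(nxt, nxt + n))
        nxt += n
    values = [0.0] * len(orf)
    dc_pos = 0
    for k, sym in enumerate(orf):
        if sym == "?":
            # The i-th '?' (top to bottom, left to right) takes constants[Dc[i]].
            values[k] = constants[int(dc[dc_pos])]
            dc_pos += 1
        elif sym == "a":
            values[k] = a
    # Children always come after their parent, so a reverse sweep evaluates bottom-up.
    for k in reversed(range(len(orf))):
        if orf[k] in OPS:
            left, right = children[k]
            values[k] = OPS[orf[k]](values[left], values[right])
    return values[0]


# Hypothetical gene (h = 6, t = 7, Dc of length 7) and a hypothetical array of 10 constants.
constants = [0.5, -0.25, 1.0, 3.0, -1.0, 0.75, 2.0, -0.5, 0.1, 1.5]
gene = "+*?a?a" + "aa?a?a?"
dc = "5300000"
print(express_gene(gene, dc, constants, a=2.0))  # (a * constants[3]) + constants[5] = 6.75
```

The constants live in their own array, so they can in principle be changed or reassigned to different `?` positions without altering the tree's structure. Without `?`, the second algorithm can only obtain a constant by building it from the terminal `a` and the available functions.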

For the sequence induction problem, the following test sequence was chosen:

$$a_n = 4n^4 + 3n^3 + 2n^2 + n \qquad (4.9)$$

where n ranges over the nonnegative integers. This sequence was chosen because it can be solved exactly by both algorithms and can therefore provide an accurate measure of their performance in terms of success rate.
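A quick way to see what the fitness cases of Table 4.6 contain is to evaluate equation (4.9) directly. The Python sketch below reproduces the ten terms; the names `a_n` and `fitness_cases` are illustrative. Because the coefficients are integers, integer arithmetic gives exact values.

```python
def a_n(n):
    """Term n of the test sequence of equation (4.9)."""
    return 4 * n**4 + 3 * n**3 + 2 * n**2 + n


# The first 10 positive integers and their terms (the fitness cases of Table 4.6).
fitness_cases = [(n, a_n(n)) for n in range(1, 11)]
for n, term in fitness_cases:
    print(n, term)
```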

For the function finding problem, the following “V” shaped function was chosen:

$$y = 4.251a^2 + \ln(a^2) + 7.243e^a \qquad (4.10)$$

where a is the independent variable and e ≈ 2.71828183 is the base of the natural logarithm. Problems of this kind, with arbitrary floating-point constants, cannot be solved exactly by evolutionary algorithms. The performance of both approaches will therefore be compared in terms of average best-of-run fitness and average best-of-run R-square.
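Equation (4.10) can be evaluated directly to check the fitness cases of Table 4.8. The sketch below uses Python's `math.log` and `math.exp`; the name `v_function` is illustrative. The ln(a²) term makes the function undefined at a = 0 and drives it downward near the origin. This is why the inputs closest to zero in the table have negative outputs.

```python
import math


def v_function(a):
    """Target function of equation (4.10); undefined at a = 0 because of ln(a^2)."""
    return 4.251 * a**2 + math.log(a**2) + 7.243 * math.exp(a)


# Two inputs taken from Table 4.8.
for a in (0.334025290109634, -0.855744382566804):
    print(f"{a:.6f} -> {v_function(a):.6f}")
```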

For the time series prediction task, 100 observations of the Wolfer sunspots series were used (Table 4.5) with an embedding dimension of 10 and a delay time of one (see section 4.4 for more details). Once again, the performance of both approaches will be compared in terms of average best-of-run fitness and R-square.
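With an embedding dimension of 10 and a delay time of one, each fitness case uses ten consecutive observations as inputs and the next observation as the target. The 100 observations therefore yield 100 − 10 = 90 cases. The following is a minimal Python sketch using a hypothetical helper, `embed`. How the inputs are spaced when the delay exceeds one is an assumption of this sketch. The order in which the ten inputs map to the terminals a–j is not specified here.

```python
# Table 4.5: Wolfer sunspots series, 100 observations read by rows.
SUNSPOTS = [
    101, 82, 66, 35, 31, 7, 20, 92, 154, 125,
    85, 68, 38, 23, 10, 24, 83, 132, 131, 118,
    90, 67, 60, 47, 41, 21, 16, 6, 4, 7,
    14, 34, 45, 43, 48, 42, 28, 10, 8, 2,
    0, 1, 5, 12, 14, 35, 46, 41, 30, 24,
    16, 7, 4, 2, 8, 17, 36, 50, 62, 67,
    71, 48, 28, 8, 13, 57, 122, 138, 103, 86,
    63, 37, 24, 11, 15, 40, 62, 98, 124, 96,
    66, 64, 54, 39, 21, 7, 4, 23, 55, 94,
    96, 77, 59, 44, 47, 30, 16, 7, 37, 74,
]


def embed(series, dimension=10, delay=1):
    """Build (inputs, target) fitness cases by delay-coordinate embedding.

    Inputs are series[t - dimension*delay], ..., series[t - delay]; the target is series[t].
    """
    cases = []
    for t in range(dimension * delay, len(series)):
        inputs = [series[t - (dimension - k) * delay] for k in range(dimension)]
        cases.append((inputs, series[t]))
    return cases


cases = embed(SUNSPOTS, dimension=10, delay=1)
print(len(cases))  # 90 fitness cases
```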

Table 4.5
Wolfer sunspots series (read by rows).

  101   82   66   35   31    7   20   92  154  125
   85   68   38   23   10   24   83  132  131  118
   90   67   60   47   41   21   16    6    4    7
   14   34   45   43   48   42   28   10    8    2
    0    1    5   12   14   35   46   41   30   24
   16    7    4    2    8   17   36   50   62   67
   71   48   28    8   13   57  122  138  103   86
   63   37   24   11   15   40   62   98  124   96
   66   64   54   39   21    7    4   23   55   94
   96   77   59   44   47   30   16    7   37   74

For the sequence induction problem, the first 10 positive integers n and their corresponding terms were used as fitness cases (Table 4.6). The fitness function was based on the relative error and was evaluated by equation (3.1b). A selection range of 25% and maximum precision (0% error) were chosen, giving $f_{\max} = 10 \times 25 = 250$. This experiment, with its two different approaches, is summarized in Table 4.7.
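Equation (3.1b) is not reproduced in this section. The Python sketch below assumes it takes the usual GEP relative-error form. In that form, each fitness case contributes the selection range M minus its absolute percentage error, so a perfect model scores n × M, the f_max quoted in the text. Two details are assumptions of this sketch: negative contributions are clipped at zero, and errors within the precision count as zero. The name `relative_error_fitness` is hypothetical.

```python
def relative_error_fitness(predictions, targets, selection_range, precision=0.0):
    """Sketch of a relative-error fitness in the spirit of equation (3.1b).

    selection_range -- M, in percent (25 for the sequence induction task)
    precision       -- percentage error treated as exact (assumption)
    Targets must be nonzero.
    """
    total = 0.0
    for p, t in zip(predictions, targets):
        err = abs((p - t) / t) * 100.0  # absolute percentage error
        if err <= precision:
            err = 0.0
        total += max(selection_range - err, 0.0)  # clipping at 0 is an assumption
    return total


targets = [10, 98, 426, 1252, 2930, 5910, 10738, 18056, 28602, 43210]  # Table 4.6
print(relative_error_fitness(targets, targets, selection_range=25))  # 250.0, the maximum
# A model that is 1% off on every case loses about one point per case:
print(relative_error_fitness([1.01 * t for t in targets], targets, selection_range=25))  # close to 240
```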

Table 4.6
Set of fitness cases for the sequence induction task.

   n        a_n
   1         10
   2         98
   3        426
   4       1252
   5       2930
   6       5910
   7      10738
   8      18056
   9      28602
  10      43210

Table 4.7
General settings used in the sequence induction problem with (SI*) and without (SI) random constants.

                                  SI*                   SI
 Number of runs                   100                   100
 Number of generations            100                   100
 Population size                  100                   100
 Number of fitness cases          10 (Table 4.6)        10 (Table 4.6)
 Function set                     + - * /               + - * /
 Terminal set                     a, ?                  a
 Random constants array length    10                    --
 Random constants range           {0, 1, 2, 3}          --
 Head length                      6                     6
 Number of genes                  5                     5
 Linking function                 +                     +
 Chromosome length                100                   65
 Mutation rate                    0.044                 0.044
 One-point recombination rate     0.3                   0.3
 Two-point recombination rate     0.3                   0.3
 Gene recombination rate          0.1                   0.1
 IS transposition rate            0.1                   0.1
 IS elements length               1,2,3                 1,2,3
 RIS transposition rate           0.1                   0.1
 RIS elements length              1,2,3                 1,2,3
 Gene transposition rate          0.1                   0.1
 Random constants mutation rate   0.01                  --
 Dc specific transposition rate   0.1                   --
 Dc specific IS elements length   1,2,3                 --
 Selection range                  25%                   25%
 Precision                        0%                    0%
 Average best-of-run fitness      195.308               249.982
 Average best-of-run R-square     0.798698299           0.9999999996
 Success rate                     24%                   98%
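The chromosome lengths in Tables 4.7, 4.9 and 4.10 follow from the head length h, the number of genes, and the maximum arity of the function set (2 in all three experiments). The Python sketch below assumes the standard GEP tail rule t = h(n_max − 1) + 1 and a Dc domain of the same length as the tail, and checks the arithmetic. The helper names are illustrative.

```python
def gene_length(head, max_arity, with_dc):
    """Head + tail (+ Dc) length of one gene."""
    tail = head * (max_arity - 1) + 1  # enough terminals to complete any head
    dc = tail if with_dc else 0        # Dc domain assumed to be as long as the tail
    return head + tail + dc


def chromosome_length(head, genes, max_arity=2, with_dc=False):
    return genes * gene_length(head, max_arity, with_dc)


# Tables 4.7 and 4.9 use h = 6 and 5 genes; Table 4.10 uses h = 7 and 3 genes.
for label, head, genes in [("SI / V", 6, 5), ("SS", 7, 3)]:
    print(label,
          chromosome_length(head, genes, with_dc=True),
          chromosome_length(head, genes, with_dc=False))
# prints "SI / V 100 65", then "SS 69 45"
```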

For the “V” shaped function problem, a set of 20 random fitness cases chosen from the interval [-1, 1] was used (Table 4.8). The fitness function was again evaluated by equation (3.1b), but in this case a selection range of 100% was used, giving $f_{\max} = 20 \times 100 = 2000$. This experiment, with its two different approaches, is summarized in Table 4.9.

Table 4.8
Set of fitness cases used in the “V” function problem.

 a                      f(a)
 -0.2639725157548       3.19498066265276
 0.0578905532656938     1.99052001725998
 0.334025290109634      8.39663703997286
 -0.236334577564462     3.07088976972825
 -0.855744382566804     5.87946763695703
 -0.0194437136332785    -0.775326322328458
 -0.192134388183304     2.83470225774408
 0.529307910124627      12.2154726642137
 -0.00788974118728459   -2.49803983418635
 0.438969804950631      10.4071734858808
 -0.107559292698039     2.09413635645908
 -0.274556994377163     3.23927278010839
 -0.0595333219604528    1.19701284767347
 0.384492993958352      9.35580769189855
 -0.874923020736333     6.00642453001302
 -0.236546636250546     3.07189729043837
 -0.167875941704557     2.67440053130986
 0.950682181822091      22.4819639844149
 0.946979159577362      22.3750161187355
 0.639339910059591      14.5701285332337

Table 4.9
General settings used in the “V” function problem with (V*) and without (V) random constants.

                                  V*                    V
 Number of runs                   100                   100
 Number of generations            5000                  5000
 Population size                  100                   100
 Number of fitness cases          20 (Table 4.8)        20 (Table 4.8)
 Function set                     + - * / L E K ~ S C   + - * / L E K ~ S C
 Terminal set                     a, ?                  a
 Random constants array length    10                    --
 Random constants range           [-1, 1]               --
 Head length                      6                     6
 Number of genes                  5                     5
 Linking function                 +                     +
 Chromosome length                100                   65
 Mutation rate                    0.044                 0.044
 One-point recombination rate     0.3                   0.3
 Two-point recombination rate     0.3                   0.3
 Gene recombination rate          0.1                   0.1
 IS transposition rate            0.1                   0.1
 IS elements length               1,2,3                 1,2,3
 RIS transposition rate           0.1                   0.1
 RIS elements length              1,2,3                 1,2,3
 Gene transposition rate          0.1                   0.1
 Random constants mutation rate   0.01                  --
 Dc specific transposition rate   0.1                   --
 Dc specific IS elements length   1,2,3                 --
 Selection range                  100%                  100%
 Precision                        0%                    0%
 Average best-of-run fitness      1896.25               1953.057
 Average best-of-run R-square     0.95129456            0.99647004
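Tables 4.7, 4.9 and 4.10 report the average best-of-run R-square alongside fitness. The Python sketch below assumes that R-square here means the square of Pearson's correlation coefficient between model outputs and targets. The name `r_square` is illustrative.

```python
def r_square(predictions, targets):
    """Squared Pearson correlation between model outputs and targets.

    Undefined (division by zero) when either sequence is constant.
    """
    n = len(targets)
    mean_p = sum(predictions) / n
    mean_t = sum(targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predictions, targets))
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    return cov * cov / (var_p * var_t)


targets = [10, 98, 426, 1252, 2930, 5910, 10738, 18056, 28602, 43210]
print(r_square(targets, targets))                      # a perfect model
print(r_square([2 * t + 7 for t in targets], targets))  # wrong scale and offset, same R-square
```

Under this assumption, R-square ignores scale and offset. It therefore measures how well a model follows the shape of the data, whereas the fitness of equation (3.1b) also penalizes wrong constants.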

For the time series prediction problem, using an embedding dimension of 10 and a delay time of one, the sunspots series presented in Table 4.5 results in 90 fitness cases (see section 4.4 for more details). In this case, a wider selection range of 1000% was chosen, giving $f_{\max} = 90 \times 1000 = 90{,}000$. This experiment, with its two different approaches, is summarized in Table 4.10.

Table 4.10
General settings used in the sunspots prediction task with (SS*) and without (SS) random constants.

                                  SS*                   SS
 Number of runs                   100                   100
 Number of generations            5000                  5000
 Population size                  100                   100
 Number of fitness cases          90 (Table 4.5)        90 (Table 4.5)
 Function set                     4 (+ - * /)           4 (+ - * /)
 Terminal set                     a - j, ?              a - j
 Random constants array length    10                    --
 Random constants range           [-1, 1]               --
 Head length                      7                     7
 Number of genes                  3                     3
 Linking function                 +                     +
 Chromosome length                69                    45
 Mutation rate                    0.044                 0.044
 One-point recombination rate     0.3                   0.3
 Two-point recombination rate     0.3                   0.3
 Gene recombination rate          0.1                   0.1
 IS transposition rate            0.1                   0.1
 IS elements length               1,2,3                 1,2,3
 RIS transposition rate           0.1                   0.1
 RIS elements length              1,2,3                 1,2,3
 Gene transposition rate          0.1                   0.1
 Random constants mutation rate   0.01                  --
 Dc specific transposition rate   0.1                   --
 Dc specific IS elements length   1,2,3                 --
 Selection range                  1000%                 1000%
 Precision                        0%                    0%
 Average best-of-run fitness      86182.05              89009.66
 Average best-of-run R-square     0.706437              0.801144
