In this section we analyze two approaches to the problem of constant creation in symbolic regression by comparing the performance of two algorithms. The first manipulates random constants directly; the second has no such facility. The comparison is made on three problems: the first is an artificial problem of sequence induction requiring integer constants; the second is a problem of function finding requiring floating-point constants; and the third is a real-world time series prediction problem, also requiring floating-point constants.
For the sequence induction problem, the following test sequence was chosen:
a_{n} = 4n^{4} + 3n^{3} + 2n^{2} + n    (4.9)
where n ranges over the nonnegative integers. This sequence was chosen because it can be exactly solved by both algorithms and therefore provides an accurate measure of their performance in terms of success rate.
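As a quick check of the test sequence, the following minimal Python sketch evaluates the polynomial for the first ten positive integers. The function name `a` and the list `terms` are illustrative choices, not part of the original experiments.

```python
def a(n: int) -> int:
    """Term n of the test sequence a_n = 4n^4 + 3n^3 + 2n^2 + n."""
    return 4 * n**4 + 3 * n**3 + 2 * n**2 + n


# Terms for n = 1..10, the range used later as fitness cases.
terms = [a(n) for n in range(1, 11)]
print(terms)
```

Because every coefficient is a small integer, the target can in principle be matched exactly, which is what makes success rate a meaningful measure here.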
For the function finding problem, the following “V” shaped function was chosen:
y = 4.251a^{2} + ln(a^{2}) + 7.243e^{a}    (4.10)
where a is the independent variable and e is the irrational number 2.71828183. Problems of this kind cannot be exactly solved by evolutionary algorithms and, therefore, the performance of both approaches will be compared in terms of average best-of-run fitness and average best-of-run R-square.
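The target function can be written down directly; the sketch below is only an illustration, and the name `v_function` is a hypothetical choice. Note that the logarithmic term is undefined at a = 0, so the sketch rejects that input explicitly.

```python
import math


def v_function(a: float) -> float:
    """Evaluate y = 4.251*a^2 + ln(a^2) + 7.243*e^a for a nonzero a."""
    if a == 0.0:
        # ln(a^2) diverges at a = 0, so no target value exists there.
        raise ValueError("ln(a^2) is undefined at a = 0")
    return 4.251 * a**2 + math.log(a**2) + 7.243 * math.exp(a)


# The exponential term makes the function asymmetric around a = 0.
print(v_function(0.5), v_function(-0.5))
```

The two floating-point coefficients, 4.251 and 7.243, are the constants an evolved model would have to approximate.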
For the time series prediction task, 100 observations of the Wolfer sunspots series were used (Table 4.5), with an embedding dimension of 10 and a delay time of one (see section 4.4 for more details). Once again, the performance of both approaches will be compared in terms of average best-of-run fitness and R-square.
Table 4.5
Wolfer sunspots series (read by rows).
101   82   66   35   31    7   20   92  154  125
 85   68   38   23   10   24   83  132  131  118
 90   67   60   47   41   21   16    6    4    7
 14   34   45   43   48   42   28   10    8    2
  0    1    5   12   14   35   46   41   30   24
 16    7    4    2    8   17   36   50   62   67
 71   48   28    8   13   57  122  138  103   86
 63   37   24   11   15   40   62   98  124   96
 66   64   54   39   21    7    4   23   55   94
 96   77   59   44   47   30   16    7   37   74

For the sequence induction problem, the first 10 positive integers n and their corresponding terms were used as fitness cases (Table 4.6). The fitness function was based on the relative error, and fitness was evaluated by equation (3.1b). A selection range of 25% and maximum precision (0% error) were chosen, giving f_{max} = 250. This experiment, with its two different approaches, is summarized in Table 4.7.
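The relation between selection range, number of fitness cases, and f_{max} can be illustrated with a short Python sketch. Equation (3.1b) is not reproduced in this section, so the sketch assumes a relative-error fitness of the form: sum over fitness cases of (selection range minus the absolute relative error in percent). The function name and the handling of the precision threshold are assumptions made for illustration only.

```python
def relative_error_fitness(predicted, targets, selection_range=25.0, precision=0.0):
    """Assumed relative-error fitness: sum of (range - |100 * (P - T) / T|).

    Targets must be nonzero. Errors at or below `precision` count as zero.
    """
    fitness = 0.0
    for p, t in zip(predicted, targets):
        error = abs((p - t) / t * 100.0)  # relative error in percent
        if error <= precision:
            error = 0.0
        fitness += selection_range - error
    return fitness


# Targets of the sequence induction task for n = 1..10 (Table 4.6).
targets = [10, 98, 426, 1252, 2930, 5910, 10738, 18056, 28602, 43210]

# A perfect model reaches the maximum: 10 cases * 25 = 250.
print(relative_error_fitness(targets, targets))
```

Under this assumption the maximum fitness is simply the selection range multiplied by the number of fitness cases, which reproduces f_{max} = 250 for 10 cases and a range of 25%.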
Table 4.6
Set of fitness cases for the sequence induction task.

| n | a_{n} |
|---|---|
| 1 | 10 |
| 2 | 98 |
| 3 | 426 |
| 4 | 1252 |
| 5 | 2930 |
| 6 | 5910 |
| 7 | 10738 |
| 8 | 18056 |
| 9 | 28602 |
| 10 | 43210 |

Table 4.7
General settings used in the sequence induction problem with (SI*) and without (SI) random constants.

| | SI* | SI |
|---|---|---|
| Number of runs | 100 | 100 |
| Number of generations | 100 | 100 |
| Population size | 100 | 100 |
| Number of fitness cases | 10 (Table 4.6) | 10 (Table 4.6) |
| Function set | + - * / | + - * / |
| Terminal set | a, ? | a |
| Random constants array length | 10 | -- |
| Random constants range | {0, 1, 2, 3} | -- |
| Head length | 6 | 6 |
| Number of genes | 5 | 5 |
| Linking function | + | + |
| Chromosome length | 100 | 65 |
| Mutation rate | 0.044 | 0.044 |
| One-point recombination rate | 0.3 | 0.3 |
| Two-point recombination rate | 0.3 | 0.3 |
| Gene recombination rate | 0.1 | 0.1 |
| IS transposition rate | 0.1 | 0.1 |
| IS elements length | 1,2,3 | 1,2,3 |
| RIS transposition rate | 0.1 | 0.1 |
| RIS elements length | 1,2,3 | 1,2,3 |
| Gene transposition rate | 0.1 | 0.1 |
| Random constants mutation rate | 0.01 | -- |
| Dc-specific transposition rate | 0.1 | -- |
| Dc-specific IS elements length | 1,2,3 | -- |
| Selection range | 25% | 25% |
| Precision | 0% | 0% |
| Average best-of-run fitness | 195.308 | 249.982 |
| Average best-of-run R-square | 0.798698299 | 0.9999999996 |
| Success rate | 24% | 98% |

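The chromosome lengths of 100 and 65 listed in Table 4.7 can be reproduced from the head length and number of genes. The sketch below assumes the usual tail-length relation t = h(n - 1) + 1, where n is the largest function arity, and further assumes that each gene using random constants carries an extra Dc domain as long as the tail. Neither relation is stated in this section, and the function name is hypothetical.

```python
def chromosome_length(head, n_genes, max_arity=2, with_constants=False):
    """Chromosome length under the assumed head/tail/Dc layout."""
    tail = head * (max_arity - 1) + 1  # assumed tail-length relation
    gene = head + tail
    if with_constants:
        gene += tail  # assumption: Dc domain has the same length as the tail
    return n_genes * gene


print(chromosome_length(6, 5, with_constants=True))   # SI* settings
print(chromosome_length(6, 5, with_constants=False))  # SI settings
```

With a head length of 7 and three genes, the same relations give the lengths of 69 and 45 listed for the sunspots experiment in Table 4.10.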
For the “V” shaped function problem, a set of 20 random fitness cases chosen from the interval [-1, 1] was used (Table 4.8). Fitness was also evaluated by equation (3.1b), but in this case a selection range of 100% was used, giving f_{max} = 2000. This experiment, with its two different approaches, is summarized in Table 4.9.
Table 4.8
Set of fitness cases used in the “V” function problem.

| a | f(a) |
|---|---|
| -0.2639725157548 | 3.19498066265276 |
| 0.0578905532656938 | 1.99052001725998 |
| 0.334025290109634 | 8.39663703997286 |
| -0.236334577564462 | 3.07088976972825 |
| -0.855744382566804 | 5.87946763695703 |
| -0.0194437136332785 | -0.775326322328458 |
| -0.192134388183304 | 2.83470225774408 |
| 0.529307910124627 | 12.2154726642137 |
| -0.00788974118728459 | -2.49803983418635 |
| 0.438969804950631 | 10.4071734858808 |
| -0.107559292698039 | 2.09413635645908 |
| -0.274556994377163 | 3.23927278010839 |
| -0.0595333219604528 | 1.19701284767347 |
| 0.384492993958352 | 9.35580769189855 |
| -0.874923020736333 | 6.00642453001302 |
| -0.236546636250546 | 3.07189729043837 |
| -0.167875941704557 | 2.67440053130986 |
| 0.950682181822091 | 22.4819639844149 |
| 0.946979159577362 | 22.3750161187355 |
| 0.639339910059591 | 14.5701285332337 |

Table 4.9
General settings used in the “V” function problem with (V*) and without (V) random constants.

| | V* | V |
|---|---|---|
| Number of runs | 100 | 100 |
| Number of generations | 5000 | 5000 |
| Population size | 100 | 100 |
| Number of fitness cases | 20 (Table 4.8) | 20 (Table 4.8) |
| Function set | + - * / L E K ~ S C | + - * / L E K ~ S C |
| Terminal set | a, ? | a |
| Random constants array length | 10 | -- |
| Random constants range | [-1, 1] | -- |
| Head length | 6 | 6 |
| Number of genes | 5 | 5 |
| Linking function | + | + |
| Chromosome length | 100 | 65 |
| Mutation rate | 0.044 | 0.044 |
| One-point recombination rate | 0.3 | 0.3 |
| Two-point recombination rate | 0.3 | 0.3 |
| Gene recombination rate | 0.1 | 0.1 |
| IS transposition rate | 0.1 | 0.1 |
| IS elements length | 1,2,3 | 1,2,3 |
| RIS transposition rate | 0.1 | 0.1 |
| RIS elements length | 1,2,3 | 1,2,3 |
| Gene transposition rate | 0.1 | 0.1 |
| Random constants mutation rate | 0.01 | -- |
| Dc-specific transposition rate | 0.1 | -- |
| Dc-specific IS elements length | 1,2,3 | -- |
| Selection range | 100% | 100% |
| Precision | 0% | 0% |
| Average best-of-run fitness | 1896.25 | 1953.057 |
| Average best-of-run R-square | 0.95129456 | 0.99647004 |

For the time series prediction problem, using an embedding dimension of 10 and a delay time of one, the sunspots series presented in Table 4.5 results in 90 fitness cases (see section 4.4 for more details). In this case, a wider selection range of 1000% was chosen, giving f_{max} = 90,000. This experiment, with its two different approaches, is summarized in Table 4.10.
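The count of 90 fitness cases follows from how a series is turned into input/target pairs. Section 4.4 is not reproduced here, so the sketch below assumes a standard delay embedding: each fitness case takes `dimension` past observations spaced `delay` apart as inputs and the observation one delay step later as the target. The function name `embed` is hypothetical.

```python
def embed(series, dimension=10, delay=1):
    """Build (inputs, target) pairs from a series by delay embedding (assumed scheme)."""
    cases = []
    span = dimension * delay
    for start in range(len(series) - span):
        inputs = [series[start + k * delay] for k in range(dimension)]
        target = series[start + span]  # value one delay step after the last input
        cases.append((inputs, target))
    return cases


# First 12 observations of Table 4.5; they yield two fitness cases.
first_values = [101, 82, 66, 35, 31, 7, 20, 92, 154, 125, 85, 68]
cases = embed(first_values)
for inputs, target in cases:
    print(inputs, "->", target)
```

Under this scheme a series of 100 observations gives 100 - 10 = 90 fitness cases, and with a selection range of 1000% the maximum fitness is 1000 * 90 = 90,000.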
Table 4.10
General settings used in the sunspots prediction task with (SS*) and without (SS) random constants.

| | SS* | SS |
|---|---|---|
| Number of runs | 100 | 100 |
| Number of generations | 5000 | 5000 |
| Population size | 100 | 100 |
| Number of fitness cases | 90 (Table 4.5) | 90 (Table 4.5) |
| Function set | 4 (+ - * /) | 4 (+ - * /) |
| Terminal set | a - j, ? | a - j |
| Random constants array length | 10 | -- |
| Random constants range | [-1, 1] | -- |
| Head length | 7 | 7 |
| Number of genes | 3 | 3 |
| Linking function | + | + |
| Chromosome length | 69 | 45 |
| Mutation rate | 0.044 | 0.044 |
| One-point recombination rate | 0.3 | 0.3 |
| Two-point recombination rate | 0.3 | 0.3 |
| Gene recombination rate | 0.1 | 0.1 |
| IS transposition rate | 0.1 | 0.1 |
| IS elements length | 1,2,3 | 1,2,3 |
| RIS transposition rate | 0.1 | 0.1 |
| RIS elements length | 1,2,3 | 1,2,3 |
| Gene transposition rate | 0.1 | 0.1 |
| Random constants mutation rate | 0.01 | -- |
| Dc-specific transposition rate | 0.1 | -- |
| Dc-specific IS elements length | 1,2,3 | -- |
| Selection range | 1000% | 1000% |
| Precision | 0% | 0% |
| Average best-of-run fitness | 86182.05 | 89009.66 |
| Average best-of-run R-square | 0.706437 | 0.801144 |
