Tools for mining knowledge from data are crucial in a world where the volume of data is constantly increasing. The quantity is so large that finding the meaningful factors in the sea of data becomes a Herculean task, and new technologies have been developed to extract relevant knowledge from it. Gene expression programming is one of these emerging technologies and is well suited to separating the wheat from the chaff. In this section we illustrate this with a function finding problem in which nine out of 10 variables are meaningless.
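As a rough illustration of the problem setup, the sketch below builds a set of fitness cases with 10 input variables, of which only a influences the output. The sampling range, the random seed, and in particular `placeholder_target` are assumptions made for the example; they do not reproduce the test function or the data actually used in the experiment.

```python
import random
import string

# The ten input variables 'a' .. 'j'; only 'a' is meaningful.
VARIABLES = list(string.ascii_lowercase[:10])


def placeholder_target(a):
    # Stand-in only: NOT the test function used in the experiment.
    return a * a + a


def make_fitness_cases(target, n_cases=100, low=-1.0, high=1.0, seed=0):
    """Return a list of (inputs, expected_output) pairs.

    Every variable is sampled independently, but the expected output
    is computed from 'a' alone, so 'b' .. 'j' carry no information.
    """
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        inputs = {v: rng.uniform(low, high) for v in VARIABLES}
        cases.append((inputs, target(inputs["a"])))
    return cases


cases = make_fitness_cases(placeholder_target)
```

A learning algorithm receives all ten columns and has to discover by itself that only the first one matters.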
The test function is the already familiar function of section 4.1.1, with the difference that the meaningful parameter must be discovered among a total of 10 variables. The parameters used per run in this experiment are summarized in Table 4.4. As the high success rate (77%) shows, GEP was not overwhelmed by the quantity of irrelevant data and found its way very efficiently. The first perfect solution was found in generation 61 of run 0. Its chromosome is shown below (the sub-ETs are linked by addition):
01234567890120123456789012012345678901201234567890120123456789012 

*a*aahgadadcah*dgcfjcbd/gcgciijeegh+eeehbeddbfd*aadaabcecfgb 
(4.7) 
where a represents the meaningful variable and b through j represent the remaining meaningless variables. As its expression shows, this chromosome encodes a function equivalent to the target function (4.1).
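To make the decoding step concrete, here is a minimal sketch of how a multigenic chromosome can be evaluated when each gene is read breadth-first and the sub-ETs are linked by addition. It assumes genes of length 13 (head 6, tail 7), binary arithmetic operators, and a small hypothetical two-gene chromosome rather than chromosome (4.7) itself; the function names are illustrative and not taken from any particular GEP implementation.

```python
import operator

# Binary function symbols (assumed); every other symbol is a terminal.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}
GENE_LENGTH = 13  # head 6 + tail 7


def eval_gene(gene, env):
    """Evaluate one gene read breadth-first (Karva notation)."""
    # First pass: give each coding position the index of its first child.
    first_child = []
    next_free = 1  # next position not yet claimed as somebody's child
    pos = 0
    while pos < next_free:
        first_child.append(next_free)
        if gene[pos] in FUNCTIONS:
            next_free += 2
        pos += 1
    # next_free is now the length of the coding region; the rest is ignored.
    values = [0.0] * next_free
    # Second pass: children always sit to the right of their parent,
    # so evaluating right to left guarantees the arguments are ready.
    for pos in range(next_free - 1, -1, -1):
        sym = gene[pos]
        if sym in FUNCTIONS:
            left = values[first_child[pos]]
            right = values[first_child[pos] + 1]
            values[pos] = FUNCTIONS[sym](left, right)
        else:
            values[pos] = env[sym]
    return values[0]


def eval_chromosome(chromosome, env):
    """Sum the values of the sub-ETs (linking by addition)."""
    genes = [chromosome[i:i + GENE_LENGTH]
             for i in range(0, len(chromosome), GENE_LENGTH)]
    return sum(eval_gene(g, env) for g in genes)


# Hypothetical chromosome: gene 1 codes a*a, gene 2 codes a*(b/b).
chromosome = "*aabcdefghijb" + "*a/bbcdefghij"
env = {v: 2.0 for v in "abcdefghij"}
env["a"] = 3.0
print(eval_chromosome(chromosome, env))  # 12.0
```

In this toy chromosome the meaningless variables either fall in the noncoding region of a gene or cancel out (b/b, for nonzero b), which is how a chromosome full of irrelevant symbols can still express a function of a alone.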
Table 4.4
Settings used in the 10-dimensional data mining problem.

Number of runs                  100
Number of generations           1000
Population size                 50
Number of fitness cases         100
Function set                    + - * /
Terminal set                    a b c d e f g h i j
Head length                     6
Number of genes                 5
Linking function                +
Chromosome length               65
Mutation rate                   0.044
One-point recombination rate    0.3
Two-point recombination rate    0.3
Gene recombination rate         0.1
IS transposition rate           0.1
IS elements length              1,2,3
RIS transposition rate          0.1
RIS elements length             1,2,3
Gene transposition rate         0.1
Selection range                 100%
Precision                       0.01%
Success rate                    77%
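The chromosome length listed in Table 4.4 can be checked from the head length and the number of genes. The sketch below assumes the usual GEP relation between head and tail, t = h(n - 1) + 1, where n is the largest arity in the function set (2 for binary arithmetic operators); the function names are illustrative.

```python
def tail_length(head_length, max_arity):
    """Tail long enough to terminate any head: t = h * (n - 1) + 1."""
    return head_length * (max_arity - 1) + 1


def chromosome_length(head_length, max_arity, n_genes):
    """Total length of a chromosome made of equally sized genes."""
    gene_length = head_length + tail_length(head_length, max_arity)
    return n_genes * gene_length


# Head length 6, binary functions, 5 genes: 5 * (6 + 7) = 65.
print(chromosome_length(head_length=6, max_arity=2, n_genes=5))  # 65
```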
