Gene Expression Programming (GEP) is a
new evolutionary algorithm that evolves computer programs (they can take
many forms: mathematical expressions, neural networks, decision trees,
polynomial constructs, logical expressions, and so on). The
computer programs of GEP, irrespective of their complexity, are all
encoded in linear chromosomes. Then the linear chromosomes are expressed
or translated into expression trees (branched structures). Thus, in GEP, the genotype (the linear chromosomes)
and the phenotype (the expression trees) are different entities (both
structurally and functionally), and because of this apparently trivial
fact, this new evolutionary system can finally make a difference,
successfully assisting researchers in the design of robust and accurate
computer models.
As in nature, the linear chromosomes of GEP consist of the genetic material
that is passed on with modification to the next generation. This
means that in GEP all genetic modifications take place in the linear
chromosomes (much easier to do than in complex branched structures, as is
done in Genetic Programming, or GP), and it also means that only the
linear chromosomes are transmitted in the process of reproduction
(linear strings are much easier to replicate than complicated tree
structures).
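To give a rough idea of why modification is easier on a linear string, here is a minimal Python sketch of point mutation on a single gene. The head/tail layout (functions allowed only in the head, a tail of terminals whose length depends on the head length and the largest arity), the symbol sets, and the name `mutate` are assumptions made for illustration; none of them is specified in the text above.

```python
import random

FUNCTIONS = "+-*/"   # assumed function set, every function takes 2 arguments
TERMINALS = "ab"     # assumed terminal set
HEAD_LEN = 4         # assumed head length
MAX_ARITY = 2
# Tail length chosen so that any head can always be completed with terminals.
TAIL_LEN = HEAD_LEN * (MAX_ARITY - 1) + 1


def mutate(gene, rate, rng):
    """Return a copy of `gene` in which each position is replaced with probability `rate`."""
    out = []
    for i, symbol in enumerate(gene):
        if rng.random() < rate:
            # Head positions may hold any symbol; tail positions only terminals.
            pool = FUNCTIONS + TERMINALS if i < HEAD_LEN else TERMINALS
            symbol = rng.choice(pool)
        out.append(symbol)
    return "".join(out)


rng = random.Random(0)
parent = "+*a-" + "babab"          # head (4 symbols) + tail (5 symbols)
assert len(parent) == HEAD_LEN + TAIL_LEN
child = mutate(parent, rate=0.2, rng=rng)
```

Because the operator only swaps characters in a fixed-length string, the child always has the same length as the parent and its tail still contains only terminals, so no repair step is needed after mutation.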
And also as in nature, it's only during development that the information
encoded in the chromosomes is finally expressed into fully developed
computer programs or expression trees (ETs).
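The translation from chromosome to expression tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene is read left to right and fills the tree level by level, the function set is limited to three arithmetic operators, and the names `express` and `evaluate` are hypothetical helpers, not part of any GEP software.

```python
import operator
from collections import deque

# Assumed function set: symbol -> (number of arguments, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def express(gene):
    """Translate a linear gene into a nested-list expression tree.

    Each node is [symbol, child, child, ...]. Symbols are consumed left to
    right and attached level by level (breadth-first).
    """
    symbols = iter(gene)
    root = [next(symbols)]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [next(symbols)]
            node.append(child)
            queue.append(child)
    return root


def evaluate(tree, env):
    """Evaluate an expression tree given a mapping from terminals to values."""
    symbol, *children = tree
    if symbol in FUNCTIONS:
        _, fn = FUNCTIONS[symbol]
        return fn(*(evaluate(c, env) for c in children))
    return env[symbol]


gene = "+*a-babab"
tree = express(gene)          # encodes ((a - b) * b) + a
value = evaluate(tree, {"a": 3, "b": 2})   # (3 - 2) * 2 + 3 = 5
```

Note that only the first seven symbols of this gene end up in the tree; the last two are never read, which is why the chromosome and the program it encodes can differ in size.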
Expression trees are sophisticated computer programs that are usually
evolved to solve a particular
problem and are therefore selected according to their fitness at solving
that task. With time, populations of such computer programs
(encoded, of course, in linear chromosomes) will discover new traits
(thanks to genetic modification) and therefore will become better adapted to
the particular environment chosen for their breeding (this environment
obviously defines the problem at hand). This means that, given
enough time and a correctly set stage, a good solution to
the problem at hand will be discovered.
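Selection according to fitness can be illustrated with a short Python sketch. It assumes fitness-proportionate (roulette-wheel) selection with the best individual copied unchanged, which is one common scheme and is not stated in the text above; the chromosomes and their fitness scores below are made up purely for the example, and `select` is a hypothetical helper.

```python
import random


def select(population, scores, rng):
    """Pick the next generation's parents from `population`.

    The fittest individual is always kept; the remaining slots are drawn
    with probability proportional to each individual's fitness score.
    """
    best = population[scores.index(max(scores))]
    rest = rng.choices(population, weights=scores, k=len(population) - 1)
    return [best] + rest


rng = random.Random(42)
# Hypothetical linear chromosomes with hypothetical fitness scores.
population = ["+*a-babab", "*a+bababa", "-ab*aabab", "a+b-ababb"]
scores = [4.0, 1.0, 3.0, 2.0]

next_generation = select(population, scores, rng)
```

In a full run, the selected chromosomes would then be modified, expressed, and scored again, generation after generation.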
So, GEP is a full-fledged genotype/phenotype system, with the genotype
totally separated from the phenotype. In Genetic Programming, though, we
have a totally different scenario: genotype and phenotype are one
entangled mess, or what is more formally called a simple replicator
system. And the consequences of this are huge: the full-fledged
genotype/phenotype system of GEP surpasses the old GP system by a factor
of 100 to 60,000!
Learn More
This might all seem very confusing, especially if
you've forgotten your biology, but GEP is in truth very simple and can
be quickly understood.
For all the details about this new algorithm, see the seminal
GEP paper (published in the journal Complex Systems in 2001), where the
algorithm is fully described and applied to a varied set of modeling
problems. Or you can read the GEP online tutorial for a quick
introduction to this revolutionary algorithm. Other online tutorials are
also available for faster and more informal expositions.
For more advanced topics, all my GEP
papers are freely available online in both PDF and HTML formats. The
first edition of my GEP book is
also freely available online, but you should also check the
second edition, published by Springer, which was
substantially revised and extended with five new chapters, including a
chapter describing two new algorithms for decision tree induction with
GEP.
GEP Software
To see and understand how GEP really works, you can
download Gepsoft GeneXproTools 4.0.
The freely available Demo
is fully functional for a wide set of sample problems. And this means
two main things: First, you can visualize and monitor lots of
interesting things during evolution, including the evolutionary
dynamics, fitness distribution, program size distribution and variation,
instant curve fitting and target/model comparison, variable usage,
program structure (not only the expression trees but also the
corresponding code in Ada, C, C++, C#, Fortran, Java, JavaScript, MATLAB, Pascal,
Perl, PHP, Python, Visual Basic, VB.NET, and VHDL), model
generalization, and so on.
And second, with GeneXproTools you
can experiment with a lot of settings and see immediately how a
particular setting affects evolution. For example, you can change the
population size, the genetic operators, the fitness function, the
chromosome architecture (program size, number of genes, and linking
function), the function set (more than 250 built-in functions to choose
from), the learning algorithm, the random numerical constants, and the
rounding threshold. You can also experiment with parsimony pressure, explore
different modeling categories (function finding, classification, time
series prediction, and logic synthesis), change the seed structure,
simplify the evolved models, explore neutrality by adding neutral genes,
create your own fitness functions, design your own mathematical/logical
functions and then evolve models with them, and even create your own
grammars to generate code automatically from GEP genes in your favorite
programming languages.