Karva Notation and K-Expressions
Karva notation was developed specifically for Gene Expression Programming
(GEP) and consists of a universal way of compactly representing any mathematical or logical expression that can be represented as a tree
. Besides its compactness, this universal
representation is also linear, and this is a fundamental
characteristic for any system that has to breed mathematical
expressions to create new, more precise ones.
The linear structures of GEP are called chromosomes and each
chromosome contains one or more genes. And each gene is
associated with its own K-expression ("K" or "Kappa" is for
Karva notation). Genes and K-expressions are very easy to
decode. For example, the gene:
can be represented as a diagram or expression tree (ET):
The translation of the gene into the corresponding ET is straightforward: The first element in the gene (position 0) corresponds to the root of the ET; then, below that node, are attached as many nodes as there are arguments to that function (two, in this case); then these nodes are filled consecutively with the elements in the gene (in this case, positions 1 and 2), and so forth. The process is repeated until a line composed of only terminals is formed (in this case, the third line).
More formally, both the gene and ET above can be represented by the
Usually the genes evolved by GEP are more interesting than the gene presented
above, not only with
noncoding regions at their ends but also with more diverse
Consider another gene:
where "Q" represents the square root function. This gene has a head
(from position 0 through 7 and shown in blue) of length 8 and a tail (from position 8 through
16) of length 9. Note that the head contains both functions
(which may take 1, 2,..., n arguments) and
terminals (the variables and constants in a problem), whereas the tail contains exclusively
terminals (this structural organization is indeed the cornerstone of
GEP as it ensures that all the programs encoded in the genes are
syntactically correct). The translation of such genes is done exactly as
shown for gene (1), giving:
Note that, in this case, not all the elements in the gene were used to construct the ET, as the translation ends whenever a line containing only terminals is formed. In this
particular case, the gene ends at position 16 whereas the K-expression ends at position 9.
The beauty and elegance of the gene/ET system of Gene
Expression Programming by itself is rather amazing, but most
importantly it's that this is only the beginning and, indeed, this
elegant genotype/phenotype system is at the heart of systems
much more complex and also more efficient. In one such system the
chromosomes contain more than one gene, and they are obviously
called multigenic systems. In these systems, the chromosomes
consist of multiple genes, and each gene codes for a sub-ET or sub-program. After translation, the sub-ETs are linked by
special functions, the so called linking functions. These
functions link the sub-ETs together one after the other in an
orderly fashion. For example, the following chromosome composed of three genes (position 0 indicates the beginning of each gene):
codes for the three following sub-ETs:
Then the sub-ETs are afterwards linked by one of the available linking
functions. For instance, if the linking function were addition, then the following program would be
obtained (the linking function is shown in gray):
Note that the final program represented in the Figure above could be linearly encoded as
a single K-expression:
However, the use of multigenic chromosomes is more appropriate to evolve solutions to complex problems, for they permit the modular construction of complex, hierarchical structures, where each gene codes for a smaller and simpler building block. These smaller building blocks are separated from
one another and are therefore free to evolve independently, allowing
for the creation of concrete new units that might prove handy in a
So, what is the use of Karva notation and why is it so important?
The answer is that it allows adaptation or, in other words,
learning: you replicate the linear strings of GEP, change a few
bits, and you end up with new strings that encode slightly better
programs. This is the basis for learning and evolution, and the
plasticity of GEP genes with their noncoding regions allows the
totally unconstrained exploration of the solution space, therefore
increasing the odds of finding very good solutions to the problem at
hand. No other programming system is as unconstrained as the
genotype/phenotype GEP system in terms of the mechanisms used to
create genetic variation (the Genetic Programming system is
restricted to an inefficient tree-crossover and it's so complicated
structurally that it never even developed beyond a single tree
Furthermore, no other system has as much growth potential as GEP in
the sense that the basic GEP system can give rise to other, more
complex systems. We've already encountered here two GEP systems: the
unigenic system and the multigenic system. But both these systems
can be used as the starting point to create yet more complex
systems: from simple GEP systems that elegantly handle random
numerical constants to more sophisticated systems that can evolve
highly complex structures such as neural networks, decision trees,
automatically defined functions and polynomial networks. You can
learn all about these and other GEP systems in Ferreira's book
Gene Expression Programming: Mathematical Modeling by an Artificial
- Ferreira, C., 2001.
Gene Expression Programming: A New Adaptive Algorithm for
Solving Problems. Complex Systems, Vol. 13, issue 2: 87-129.
- Koza, J. R., 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection.
Cambridge, MA: MIT Press.
- Ferreira, C., 2006.
Gene Expression Programming:
Mathematical Modeling by an Artificial Intelligence. 2nd Edition, Springer-Verlag, Germany.
Last modified: August 27, 2007
Cite this as:
Ferreira, C. "Karva Notation and K-Expressions." From GEP
Tutorials: A Gepsoft Web Resource.