Proteins

ISBN: 9729589054

Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence

Proteins

Proteins are long, linear strings of 20 different amino acids, and they are the immediate expression of the genetic information stored in DNA. This means that the four-letter language of DNA is translated into the more complex 20-letter language of proteins. Obviously, there must be some kind of code (the genetic code) to translate the language of the four nucleotides into the language of the 20 amino acids. To specify each of the 20 amino acids there must be at least 20 DNA “words”. Pairs of nucleotides would give only 4² = 16 words, but triplets of nucleotides (codons) give 4³ = 64 different three-letter “words”. This is more than enough to code for the 20 amino acids and, in fact, most amino acids have multiple codons, since only three of the 64 codons code for the instruction “stop synthesis”, leaving 61 codons for the 20 amino acids. There is also a codon for a “start synthesis” instruction, but this codon also codes for methionine, one of the 20 amino acids found in proteins. The genetic code is virtually universal, meaning that all organisms on Earth, with very few exceptions, use the same codons to translate the language of their genes into proteins (the genetic code is shown in section 1.2.4, Figure 1.6).
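The counting argument can be checked directly. The following Python sketch finds the shortest word length over the four-letter DNA alphabet that yields at least 20 distinct words, then enumerates every word of that length. The names `min_word_length`, `NUCLEOTIDES`, and `codons` are illustrative only, not taken from the book.

```python
from itertools import product

NUCLEOTIDES = "TCAG"  # the four DNA bases
N_AMINO_ACIDS = 20    # amino acids found in proteins


def min_word_length(alphabet_size: int, n_symbols: int) -> int:
    """Smallest word length L such that alphabet_size ** L >= n_symbols."""
    length = 1
    while alphabet_size ** length < n_symbols:
        length += 1
    return length


# Number of distinct words of length 1, 2 and 3 over the DNA alphabet.
for length in (1, 2, 3):
    print(length, len(NUCLEOTIDES) ** length)  # 4, then 16 (< 20), then 64

codon_length = min_word_length(len(NUCLEOTIDES), N_AMINO_ACIDS)
codons = ["".join(bases) for bases in product(NUCLEOTIDES, repeat=codon_length)]
print(codon_length, len(codons))  # 3 64
```

Doublets fall short (16 < 20), so triplets are the shortest codons that work. The surplus of 64 over 20 is what leaves room for stop codons and for several codons per amino acid.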

Thus, the information for proteins is decoded one triplet at a time and expressed as a linear sequence of amino acids. Although the amino acid sequence of a protein reflects the sequence of the corresponding DNA molecule, the protein has a unique three-dimensional structure and exhibits unique properties. Because of the richer chemical alphabet of proteins, the linear strings of amino acids fold in particular ways, giving each protein its individual three-dimensional structure. This unique three-dimensional structure, or tertiary organization, together with the vast chemical repertoire of amino acids, allows proteins to play numerous roles, among them that of biological catalysts, or enzymes. In fact, proteins are the real workers of the cell.
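Decoding “one triplet at a time” can be sketched in Python as follows. This is a simplification under several assumptions. It reads the DNA letters directly, whereas in the cell the information is first transcribed into messenger RNA, which uses U in place of T. It expects an uppercase A/C/G/T string, and it takes the first ATG as the start codon. `CODE` and `translate` are hypothetical names chosen for illustration. The table is the standard genetic code written as 64 one-letter amino acid symbols, with `*` marking the stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1): one-letter amino acid
# symbols for the 64 codons in TCAG order, first base varying slowest.
# '*' marks the three stop codons.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Decode uppercase DNA one triplet at a time, from the first ATG
    (start codon, also methionine) up to the first stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon: nothing is synthesized
    protein = []
    for i in range(start, len(dna) - 2, 3):  # only complete triplets
        aa = CODE[dna[i:i + 3]]
        if aa == "*":  # "stop synthesis"
            break
        protein.append(aa)
    return "".join(protein)


print(translate("CCATGTTTGGCAAATGGTAAGC"))  # MFGKW
print(sum(aa == "*" for aa in CODE.values()))  # 3 stop codons
print(len(set(CODE.values()) - {"*"}))  # 20 amino acids
```

The last two lines reproduce the counts given in the text: three stop codons and 20 amino acids spread over the remaining 61 codons, so most amino acids are specified by more than one codon.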

Note again that, like RNA, proteins can function simultaneously as genotype and phenotype. Despite their richer functional diversity, however, such systems are equally constrained: any modification of the replicator is immediately reflected in its performance.

In theoretical terms, the differences between DNA, RNA, and proteins are very useful for understanding the fundamental differences between GAs, GP, and GEP. Both GAs and GP are simple replicator systems that use only one kind of entity: linear strings of 0’s and 1’s in the case of GAs, and complex ramified structures composed of several elements in the case of GP. Many believe that a simple “RNA world” existed in the early history of life, perhaps contemporary with a simple “protein world”. RNA and proteins somehow started working together, later also recruiting DNA. The complex DNA/protein system of life on Earth is the descendant of this evolutionary process.

It is striking that computer scientists, some 4 billion years after these events, took much the same steps as life on Earth, first inventing simple replicator systems and only later inventing sophisticated replicator/phenotype systems. The genetic algorithm invented by Holland in the 1960s (Holland 1975) is analogous to a simple RNA replicator with its linear chromosomes and limited functionality, whereas genetic programming, the algorithm popularized by Koza (1992), is analogous to a simple protein replicator with its richer functionality. Curiously enough, the deliberate attempts to create a genotype/phenotype system, despite trying very hard to emulate the DNA/protein system, fall far short of the desired leap forward (Banzhaf 1994, Ryan et al. 1998).

By contrast, I invented the full-fledged genotype/phenotype system of GEP in 1999 (Ferreira 2001), entirely unaware of all the hard work other researchers had done to create a genotype/phenotype system. In fact, I first heard of GP in Mitchell’s book (Mitchell 1996) and was so impressed that I tried to make a GP on my own. I suppose I simply applied what I knew from biochemistry and evolution, and therefore it never crossed my mind to make a system without an autonomous genome. Naturally, the complicated machinery of information metabolism was discarded, as it is irrelevant to a computer system whose rules are not dictated by chemistry. Thus, double-stranded chromosomes, RNA-like intermediates, and complicated genetic codes with complicated translation mechanisms did not make their way into GEP. I also knew that, for a genotype/phenotype machine to run smoothly, the genetic operators must not be constrained and must always produce valid structures. The result was the first truly functional genotype/phenotype system, one that can be implemented in any programming language, as nothing in the algorithm depends on the workings of a particular language.
