My JGAP page - Genetic programming and Symbolic Regression

JGAP (Java Genetic Algorithms Package) is a Java-based system for genetic algorithms and genetic programming.

For more about JGAP:

Symbolic Regression

2010-02-21: I blogged about this package in Symbolic regression (using genetic programming) with JGAP.

Download all Java and configuration files mentioned below: symbolic_regression.zip.

Here is the result of my experiments with Symbolic Regression using Genetic Programming in JGAP. I'm still learning both JGAP and genetic programming, so there are surely peculiarities in the files. As I learn more, new features will be added and old ones may be removed.

SymbolicRegression.java is the main program. As you may recognize, it is based on JGAP's example MathProblem.java but extended with some bells & whistles.

The program is compiled (on a Linux box) like this:
javac -Xlint:unchecked -classpath "jgap/jgap.jar:jgap/lib/log4j.jar:jgap/lib/xstream-1.2.2.jar:jgap/lib/commons-lang-2.1.jar:$CLASSPATH" SymbolicRegression.java
and run with:
java -server -Xmx1024m -Xss2M  -classpath "jgap/jgap.jar:jgap/lib/log4j.jar:jgap/lib/xstream-1.2.2.jar:jgap/lib/commons-lang-2.1.jar:$CLASSPATH" SymbolicRegression [config file]
For compiling, all the Java files below must be downloaded and placed in the same directory as SymbolicRegression.java. Here is my log4j.properties file.

See below for more about the configuration files.

Note: Most of these files were incorporated (in some cases with changes by Klaus Meffert) in the JGAP distribution, version 3.6 (directory examples/src/examples/gp/symbolicRegression).

Defined functions

Here are my defined functions. Some of these should be considered experimental, but may still be of some use. Please note that the code has not been tidied up.

Configuration files

One of the primary goals was to be able to state the problem and the data in a configuration file. Below are some examples. Please note that some of these are experimental (and use experimental parameters/operators), and they may not give any interesting or good results. More info about the data/problem is usually in the header of the file.

Some of the problems were first tested with Eureqa and were commented on in Eureqa: Equation discovery with genetic programming (a Google Translation of my original Swedish blog post Eureqa: equation discovery med genetisk programmering).

The configuration parameters

The configuration file consists of the following parameters. Here is a short explanation; the full story is in the code: SymbolicRegression.java. Most of the parameters have reasonable default values, taken from either MathProblem.java or GPConfiguration.

Supported functions

The program has support for the following functions from JGAP. The "main" type is double, so not all functions are applicable there (e.g. IfElse). However, for the ADF functions (defined by setting adf_arity to > 0) many more functions are supported. Please note that some of these are (very) experimental and may not even make sense in this context. Also, see my own functions defined in the Java files above.

Examples

Here are two small examples of the program, including the configuration file and a sample run.

Polynomial
Here is a simple example of a configuration file. It happens to be the same problem as the JGAP example MathProblem, the polynomial x^4 + x^3 + x^2 - x.
#
# Polynom x^4 + x^3 + x^2 - x
# The JGAP example
#
presentation: P(4) x^4 + x^3 + x^2 - x (the JGAP example)
num_input_variables: 1
variable_names: x y
functions: Add,Subtract,Multiply,Divide,Pow,Log,Sine
terminal_range: -10 10
max_init_depth: 4
population_size: 1000
max_crossover_depth: 8
num_evolutions: 800
max_nodes: 20
stop_criteria_fitness: 0.1
data
-2.378099   26.567495
4.153756   382.45743
2.6789956   75.23481
5.336802   986.33777
2.4132318   51.379707
-1.7993588   9.693933
3.9202332   307.8775
2.9227705   103.56364
-0.1422224   0.159982
4.9111285   719.39545
1.2542424   4.76668
1.5987749   11.577456
4.7125554   615.356
-1.1101999   2.493538
-1.7379236   8.631802
3.8303614   282.29697
5.158349   866.7222
3.6650343   239.42934
0.3196721   -0.17437163
-2.3650131   26.014963
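The layout of such a file can be read off the example above: `#` comment lines, `key: value` parameter lines, then a line containing only `data` followed by one row per line. As a minimal sketch of how a file in that shape could be read (the class `ConfigSketch` and its methods are hypothetical and are not the parser in SymbolicRegression.java; splitting rows on commas or whitespace is an assumption based on the two example files on this page):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the "key: value ... data ... rows" layout. */
class ConfigSketch {
    final Map<String, String> params = new LinkedHashMap<>();
    final List<double[]> rows = new ArrayList<>();

    static ConfigSketch parse(String text) {
        ConfigSketch cfg = new ConfigSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (!inData && line.equals("data")) {
                inData = true; // everything after this marker is a data row
                continue;
            }
            if (inData) {
                // Assumption: cells are separated by commas or whitespace.
                String[] cells = line.split("[,\\s]+");
                double[] row = new double[cells.length];
                for (int i = 0; i < cells.length; i++) {
                    row[i] = Double.parseDouble(cells[i]);
                }
                cfg.rows.add(row);
            } else {
                int colon = line.indexOf(':');
                if (colon < 0) {
                    throw new IllegalArgumentException("Expected 'key: value' but got: " + line);
                }
                // Split at the first colon only, so values may contain colons.
                cfg.params.put(line.substring(0, colon).trim(),
                               line.substring(colon + 1).trim());
            }
        }
        return cfg;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "# Fibonacci with 3 variables",
            "num_input_variables: 3",
            "variable_names: F1 F2 F3 F4",
            "functions: Multiply,Divide,Add,Subtract",
            "data",
            "1,1,2,3",
            "1,2,3,5");
        ConfigSketch cfg = parse(sample);
        System.out.println(cfg.params.get("variable_names")); // prints "F1 F2 F3 F4"
        System.out.println(cfg.rows.size());                  // prints 2
    }
}
```

The sketch keeps every parameter as a string; interpreting values such as `terminal_range: -10 10` is left to the caller.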
A simple run, slightly edited.
It was 20 data rows
Presentation: P(4) x^4 + x^3 + x^2 - x (the JGAP example)
output_variable: y (index: 1)
input variable: x
function1: &1 + &2
function1: &1 - &2
function1: &1 * &2
function1: /
function1: &1 ^ &2
function1: log &1
function1: sine &1
function1: 1.0
[19:52:57] INFO  GPGenotype - Creating initial population
[19:52:57] INFO  GPGenotype - Mem free: 10.5 MB
[19:52:57] INFO  GPPopulation - Prototype program set
[19:52:57] INFO  GPGenotype - Mem free after creating population: 10.5 MB
Creating initial population
Mem free: 10.5 MB
Evolving generation 1/800, memory free: 6.7 MB (time from start:  0,42s)
Best solution fitness: 968.56
Best solution: x ^ 4.0
Depth of chrom: 1
Correlation coefficient: 0.9995009838030151
Evolving generation 4/800, memory free: 11.4 MB (time from start:  0,84s)
Best solution fitness: 813.62
Best solution: ((4.0 + 4.0) + (log 4.0)) * ((x ^ 3.0) / (9.0 / x))
Depth of chrom: 3
Correlation coefficient: 0.999500983803015
Evolving generation 6/800, memory free: 6.7 MB (time from start:  1,07s)
Best solution fitness: 712.97
Best solution: ((5.0 * x) + ((x ^ 4.0) + x)) + x
Depth of chrom: 4
Correlation coefficient: 0.999781550858714
Evolving generation 7/800, memory free: 40.3 MB (time from start:  1,20s)
Best solution fitness: 582.77
Best solution: ((9.0 * x) + (x * 9.0)) - ((9.0 - x) - (x ^ 4.0))
Depth of chrom: 3
Correlation coefficient: 0.9965296891627895
Evolving generation 8/800, memory free: 24.9 MB (time from start:  1,32s)
Best solution fitness: 471.58
Best solution: (((x + 7.0) * (x * x)) * x) * (x / 9.0)
Depth of chrom: 4
Correlation coefficient: 0.9980245718601822
Evolving generation 11/800, memory free: 29.6 MB (time from start:  1,69s)
Best solution fitness: 364.73
Best solution: ((x * 8.0) + ((x ^ 4.0) + x)) - (4.0 - ((x * x) * x))
Depth of chrom: 4
Correlation coefficient: 0.9988207761993971
Evolving generation 16/800, memory free: 33.2 MB (time from start:  2,24s)
Best solution fitness: 317.84
Best solution: ((x * ((x * x) + (log 9.0))) * x) + (x * 9.0)
Depth of chrom: 5
Correlation coefficient: 0.9993672319814505
Evolving generation 17/800, memory free: 19.4 MB (time from start:  2,38s)
Best solution fitness: 169.76
Best solution: (x ^ 4.0) + (x * (x * x))
Depth of chrom: 3
Correlation coefficient: 0.9999752402614274
Evolving generation 22/800, memory free: 24.2 MB (time from start:  3,10s)
Best solution fitness: 136.21
Best solution: ((x * x) * x) + (x + (x * (x * (x * x))))
Depth of chrom: 5
Correlation coefficient: 0.9999485732269622
Evolving generation 23/800, memory free: 10.4 MB (time from start:  3,20s)
Best solution fitness: 3.7509724195622374E-4
Best solution: (x * ((((x * x) + x) * x) + x)) - x
Depth of chrom: 6
Correlation coefficient: 0.9999999999999939

Fitness stopping criteria (0.1) reached with fitness 3.7509724195622374E-4 at generation 23

All time best (from generation 23)
Evolving generation 23/800, memory free: 10.4 MB (time from start:  3,20s)
Best solution fitness: 3.7509724195622374E-4
Best solution: (x * ((((x * x) + x) * x) + x)) - x
Depth of chrom: 6
Correlation coefficient: 0.9999999999999939

Total time  3,20s
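The best solution from generation 23 is the target polynomial in nested form. Expanding by hand: ((x * x) + x) * x + x = x^3 + x^2 + x; multiplying by x gives x^4 + x^3 + x^2; subtracting x yields x^4 + x^3 + x^2 - x. The following small check compares the two forms numerically on a few of the x values from the data above (the class and method names are only illustrative and are not part of SymbolicRegression.java):

```java
/** Compares the evolved expression with the target polynomial. */
class PolynomialCheck {
    /** Target from the configuration header: x^4 + x^3 + x^2 - x. */
    static double target(double x) {
        return Math.pow(x, 4) + Math.pow(x, 3) + Math.pow(x, 2) - x;
    }

    /** Best solution as printed in the run, transcribed literally. */
    static double evolved(double x) {
        return (x * ((((x * x) + x) * x) + x)) - x;
    }

    /** Largest absolute difference between the two forms over the given inputs. */
    static double maxAbsDifference(double[] xs) {
        double max = 0.0;
        for (double x : xs) {
            max = Math.max(max, Math.abs(target(x) - evolved(x)));
        }
        return max;
    }

    public static void main(String[] args) {
        // A few x values taken from the data rows of the configuration file.
        double[] xs = {-2.378099, 4.153756, 2.6789956, 5.336802, -0.1422224};
        // Only floating-point rounding noise is expected here.
        System.out.println("max |target - evolved| = " + maxAbsDifference(xs));
    }
}
```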
Fibonacci series
Here is another example, the Fibonacci series as a time series. The objective is to find a symbolic regression for the fourth value (F4), which is solved quite fast.
#
# Fibonacci with 3 variables
# 
presentation: This is the Fibonacci series
return_type: DoubleClass
num_input_variables: 3
variable_names: F1 F2 F3 F4
functions: Multiply,Divide,Add,Subtract
terminal_range: -10 10
max_init_depth: 4
population_size: 20
max_crossover_depth: 8
num_evolutions: 100
max_nodes: 21
show_similiar: true
show_population: false
stop_criteria_fitness: 0
data
1,1,2,3
1,2,3,5
2,3,5,8
3,5,8,13
5,8,13,21
8,13,21,34
13,21,34,55
21,34,55,89
34,55,89,144
55,89,144,233
89,144,233,377
144,233,377,610
233,377,610,987
377,610,987,1597
610,987,1597,2584
987,1597,2584,4181
1597,2584,4181,6765
2584,4181,6765,10946
4181,6765,10946,17711
6765,10946,17711,28657
10946,17711,28657,46368
And a sample run:
Presentation: This is the Fibonacci series
output_variable: F4 (index: 3)
input variable: F1
input variable: F2
input variable: F3
function1: &1 * &2
function1: /
function1: &1 + &2
function1: &1 - &2
function1: -5.0
[19:56:31] INFO  GPGenotype - Creating initial population
[19:56:31] INFO  GPGenotype - Mem free: 7.5 MB
[19:56:31] INFO  GPPopulation - Prototype program set
[19:56:31] INFO  GPGenotype - Mem free after creating population: 7.5 MB
Creating initial population
Mem free: 7.5 MB
Evolving generation 1/100, memory free: 6.7 MB (time from start:  0,03s)
Best solution fitness: 17649.74
Best solution: (F3 - (-6.0 - F1)) + ((F2 - F2) - (-6.0 / F1))
Depth of chrom: 3
Correlation coefficient: 0.9999999838289084
Evolving generation 6/100, memory free: 6.0 MB (time from start:  0,13s)
Best solution fitness: 147.0
Best solution: 7.0 + (F2 + F3)
Depth of chrom: 2
Correlation coefficient: 1.0
Evolving generation 7/100, memory free: 5.8 MB (time from start:  0,14s)
Best solution fitness: 0.0
Best solution: F2 + F3
Depth of chrom: 1
Correlation coefficient: 1.0

Fitness stopping criteria (0.0) reached with fitness 0.0 at generation 7

All time best (from generation 7)
Evolving generation 7/100, memory free: 5.8 MB (time from start:  0,15s)
Best solution fitness: 0.0
Best solution: F2 + F3
Depth of chrom: 1
Correlation coefficient: 1.0

Total time  0,15s

All solutions with the best fitness (0.0):
F2 + F3 (1)
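The fitness values in this run are consistent with a sum of absolute errors over the data rows: the generation 6 candidate 7.0 + (F2 + F3) is off by exactly 7 on each of the 21 rows, and 21 * 7 = 147.0, the logged fitness. This is an inference from the log, not something stated on this page; the actual fitness function is in SymbolicRegression.java. A minimal sketch under that assumption (all names here are hypothetical):

```java
import java.util.function.ToDoubleFunction;

/** Recomputes two logged fitness values, assuming fitness = sum of |prediction - F4|. */
class FitnessSketch {
    /** Builds rows (F1, F2, F3, F4) of consecutive Fibonacci numbers, starting at 1,1,2,3. */
    static long[][] fibonacciRows(int count) {
        long[][] rows = new long[count][];
        long a = 1, b = 1;
        for (int i = 0; i < count; i++) {
            long c = a + b;
            long d = b + c;
            rows[i] = new long[] {a, b, c, d};
            a = b;
            b = c;
        }
        return rows;
    }

    /** Sum of absolute errors of a candidate against the last column (F4). */
    static double sumAbsError(long[][] rows, ToDoubleFunction<long[]> candidate) {
        double error = 0.0;
        for (long[] row : rows) {
            error += Math.abs(candidate.applyAsDouble(row) - row[3]);
        }
        return error;
    }

    public static void main(String[] args) {
        long[][] rows = fibonacciRows(21); // same 21 rows as in the configuration file
        // Generation 6 candidate: 7.0 + (F2 + F3)
        System.out.println(sumAbsError(rows, r -> 7.0 + (r[1] + r[2]))); // prints 147.0
        // Generation 7 candidate: F2 + F3
        System.out.println(sumAbsError(rows, r -> r[1] + r[2]));         // prints 0.0
    }
}
```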

Todo

Here are some TODOs, or things that would be nice to have.

See also

Mention/Usage

Here are some mentions or usage of my Symbolic Regression program:
This page was created by Hakan Kjellerstrand (hakank@bonetmail.com), homepage.