Using Genetic Programming to Generate Decision Tree

 

1.     Discussion. 2

2.     Financial Genetic Programming. 2

2.1.      Investment Decision Making Using FGP: A Case Study by Jin li and Edward P. K. Tsang (July 1999) 2

2.2.      Improving Technical Analysis Predictions: An Application of Genetic Programming by Jin Li and Edward P.K. Tsang (1999) 3

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1.      Discussion

 

In my opinion, genetic programming is all about classification, which is the same as decision tree. However, it is how you classify your decision tree that makes it into determining an optimal solution. FGP is a rule set genetic programming in the sense that it specifies the conditions and rules to make the decisions. In the following two papers, the FGP works with historical data and divided them into training and test sets for performance comparison. According to the authors, they found that FGP performs better than C4.5 but may be due to over-fitting of data. From the two papers, FGP straightaway determines the solution based on the rules whereas in conventional decision tree analysis, we present all the different possibilities and obtain the highest EMV. However, since it is a type of programming, more rule sets can be added and EMV values can be stored to find the maximum value to make this possible. To make this happen, we need a decision tree that is able to forecast future values based on the historical data and other factors. FGP has discussed about this but I don’t see how they use it in the following 2 papers.

 

FGP does work similarly to C4.5 (rule set) with standard genetic operators such as crossover, mutation and reproduction. The main differences are the selection strategy and the fitness rule. These papers have provided discussion on the fitness rule. But I have yet to find out the selection strategy used (tournament vs. roulette wheel).

 

 

2.      Financial Genetic Programming

2.1.   Investment Decision Making Using FGP: A Case Study by Jin li and Edward P. K. Tsang (July 1999)

 

Goal:

1)      FGP to recognize investment opportunities where a return of 2.2% or more can be achieved within 21 trading days (i.e. one month). Test the effectiveness of FGP when r and n take on different values.

2)      To compare FGP with other machine learning system. (C4.5)

 

Background of FGP:

1)      Based on GP, which belongs to the paradigm of genetic algorithms. In GA, a string represents a candidate solution while a tree in GP represents a candidate solution.

2)      Basic elements of FGP are conditions and recommendations.

3)      A fitness function is needed to evaluate the quality of each candidate solution with regard to the task to be performed (how good is a rule for prediction in our application?).

4)      In FGP, a candidate solution is represented by a genetic decision tree (GDT). Basic elements of GDTs are conditions and recommendations. A single condition comprises one financial indicator, such as 50-days moving average, one relational operator such as “greater than”, or “less than”, etc, and a threshold number. Conditions are combined in GDTs through logic operators such as “Or”, “And”, “Not”, and “If-Then-Else”.

 

 

PMV_50: Today’s price – the average price of the previous 50 trading days

Filter_5: Today’s price – the minimum price of the previous 5 trading days

Filter_63: Today’s price – the minimum price of the previous 63 trading days

TRB_5: Today’s price – the maximum price of the previous 5 trading days (based on the Trading Range Breakout rule)

 

Syntax used: Backus normal form (BNF) grammar.

Rule:

If today’s price is 18.45 or more below the average price of the last 50 days, then the goal can be achieved if we invest today (we call today a positive position).

Otherwise, if today’s price is no more than 19.48 above the maximum price of the previous 5 trading days and today’s price is less than 36.24 above the minimum price in the last 63 days, then it is negative.

 

FGP also uses standard genetic operators, including crossover, mutation (for diversification purpose) and reproduction. Selection strategies include roulette wheel and tournament. Tournament strategy is used in this paper.

FGP also constructs best GDT based on the fitness value. Uses Rate of Correctness (RC) where RC = (P + N)/T (P: number of positives and N: number or negatives and T: total number of predictions)

 

2.2.   Improving Technical Analysis Predictions: An Application of Genetic Programming by Jin Li and Edward P.K. Tsang (1999)

 

Proposed a GP approach that is capable of combining individual technical rules and adapt to the thresholds based on past data.

 

EDDIE (Evolutionary Dynamic Data Investment Evaluator) is a forecasting system to help investors to make the best use of the information available to them (Tsang et al. 1998). Such information may include technical rule indicators, individual company’s performance indicators, expert predictions, etc.

 

Financial forecasting using EDDIE:

1.      Allows potential invest to make hypotheses about the factors that are relevant to a forecast.

2.      Then tests those hypotheses using historical data and evolves, by way of natural selection, decision trees which aim to provide good return on investment (ROI).

 

Evolutionary computation is a standard term that encompasses a class of search, adaptation, and optimization techniques based on the principles of natural selection and evolution. These include genetic algorithms, evolution strategies, and evolutionary programming. (The author, Back, provides a review of these three paradigms.)

 

FGP:

  1. A candidate solution is represented by a genetic decision tree (GDT).
  2. Basic elements of GDTs are rules and forecast values, which corresponds to the functions and terminals in GP.
  3. Each GDT is evaluated based on the fitness value that reflects the quality of the GDT to solve the problem at hand. FGP maintains a set of GDTs called a population and works in iterations. For every iteration, FGP creates a new generation of population using standard genetic crossover, mutation, and reproduction operators.
  4. Given 2 GDTs (called parents), the crossover operator in FGP works as follows (usually used with high probability ~ 0.9):
    1. Randomly select a node within each parent GDTs as a crossover point,
    2. Exchange the subtrees rooted at the selected nodes to generate two children.
  5. Mutation works as follows to keep a population with sufficient diversification (usually used with low probability ~ 0.1):
    1. Randomly select a node within parent tree as the mutation point
    2. Generate a new tree with limited depth.
    3. Replace the subtree rooted at the selected node with the generated tree.
  6. Reproduction is used to evolve new generation with low probability as well (e.g. 0.1).
  7. Selection strategy: roulette wheel and tournament. Tournament is preferred based on the justifications presented by Miller & Goldberg 1995.

 

 

Some other papers:

  1. Bauer (1994) reported his genetic algorithm intelligent systems that aimed at finding tactic market-timing strategies.
  2. Allen & Karjalainen (1995) used genetic programming to identify profitable trading rules in the stock market.
  3. Chen & Yeh (1996) attempted to formalize the notion of unpredictability (as in the efficient market hypothesis) in terms of search intensity and the chance of success.

4.      Mahfoud & Mani (1996) presented a new genetic-algorithm-based system and applied it to the task of predicting the future performances of individual stocks.

  1. Neely et al (1997) and Oussaidene et al (1997) applied genetic programming to foreign exchange forecasting and reported some success.
  2. Butler (1997) reported a genetic programming based system for predicting whether a return of r or more could be achievable within the next n trading days in stock market.