Using
Genetic Programming to Generate Decision Tree
2. Financial Genetic Programming
2.1. Investment Decision Making Using
FGP: A Case Study by Jin li and Edward P. K. Tsang (July 1999)
In my opinion, genetic programming is all about
classification, which is the same as decision tree. However, it is how you
classify your decision tree that makes it into determining an optimal solution.
FGP is a rule set genetic programming in the sense that it specifies the
conditions and rules to make the decisions. In the following two papers, the
FGP works with historical data and divided them into training and test sets for
performance comparison. According to the authors, they found that FGP performs
better than C4.5 but may be due to over-fitting of data. From the two papers,
FGP straightaway determines the solution based on the rules
whereas in conventional decision tree analysis, we present all the different
possibilities and obtain the highest EMV. However, since it is a type of
programming, more rule sets can be added and EMV values can be stored to find
the maximum value to make this possible. To make this happen, we need a
decision tree that is able to forecast future values based on the historical
data and other factors. FGP has discussed about this but I don’t see how they
use it in the following 2 papers.
FGP does work similarly to C4.5 (rule set) with
standard genetic operators such as crossover, mutation and reproduction. The
main differences are the selection strategy and the fitness rule. These papers
have provided discussion on the fitness rule. But I have yet to find out the
selection strategy used (tournament vs. roulette wheel).
Goal:
1) FGP to recognize investment opportunities where a return of 2.2% or more can be achieved within 21 trading days (i.e. one month). Test the effectiveness of FGP when r and n take on different values.
2) To compare FGP with other machine learning system. (C4.5)
Background of FGP:
1) Based on GP, which belongs to the paradigm of genetic algorithms. In GA, a string represents a candidate solution while a tree in GP represents a candidate solution.
2) Basic elements of FGP are conditions and recommendations.
3) A fitness function is needed to evaluate the quality of each candidate solution with regard to the task to be performed (how good is a rule for prediction in our application?).
4) In FGP, a candidate solution is represented by a genetic decision tree (GDT). Basic elements of GDTs are conditions and recommendations. A single condition comprises one financial indicator, such as 50-days moving average, one relational operator such as “greater than”, or “less than”, etc, and a threshold number. Conditions are combined in GDTs through logic operators such as “Or”, “And”, “Not”, and “If-Then-Else”.

PMV_50: Today’s price – the average price of the previous 50 trading days
Filter_5: Today’s price – the minimum price of the previous 5 trading days
Filter_63: Today’s price – the minimum price of the previous 63 trading days
TRB_5: Today’s price – the maximum price of the previous 5 trading days (based on the Trading Range Breakout rule)
Syntax used: Backus normal form (BNF) grammar.
Rule:
If today’s price is 18.45 or more below the average price of the last 50 days, then the goal can be achieved if we invest today (we call today a positive position).
Otherwise, if today’s price is no more than 19.48 above the maximum price of the previous 5 trading days and today’s price is less than 36.24 above the minimum price in the last 63 days, then it is negative.
FGP also uses standard genetic operators, including crossover, mutation (for diversification purpose) and reproduction. Selection strategies include roulette wheel and tournament. Tournament strategy is used in this paper.
FGP also constructs best GDT based on the fitness value. Uses Rate of Correctness (RC) where RC = (P + N)/T (P: number of positives and N: number or negatives and T: total number of predictions)
Proposed a GP approach that is capable of combining individual technical rules and adapt to the thresholds based on past data.
EDDIE (Evolutionary Dynamic Data Investment Evaluator) is a forecasting system to help investors to make the best use of the information available to them (Tsang et al. 1998). Such information may include technical rule indicators, individual company’s performance indicators, expert predictions, etc.
Financial forecasting using EDDIE:
1. Allows potential invest to make hypotheses about the factors that are relevant to a forecast.
2. Then tests those hypotheses using historical data and evolves, by way of natural selection, decision trees which aim to provide good return on investment (ROI).
Evolutionary computation is a standard term that encompasses a class of search, adaptation, and optimization techniques based on the principles of natural selection and evolution. These include genetic algorithms, evolution strategies, and evolutionary programming. (The author, Back, provides a review of these three paradigms.)
FGP:
Some other papers:
4. Mahfoud & Mani (1996) presented a new genetic-algorithm-based system and applied it to the task of predicting the future performances of individual stocks.