C++ Neural Networks and Fuzzy Logic
by Valluru B. Rao
M&T Books, IDG Books Worldwide, Inc.
ISBN: 1558515526   Pub Date: 06/01/95

Summary

You explored the backpropagation algorithm further in this chapter, continuing the discussion begun in Chapter 7.

• A momentum term was added to the training law and was shown to result in much faster convergence in some cases (a sketch of such an update follows this list).
• A noise term was added to the inputs so that training could take place with random noise applied. This noise was made to decrease with the number of cycles, so that final-stage learning could be done in a noise-free environment.
• …Chapter 12. Further application of the simulator will be made in Chapter 14.
• Several applications of the backpropagation algorithm were outlined, showing the wide applicability of this algorithm.
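As a reminder of the momentum term recapped above, here is a minimal sketch of a weight update with momentum. The variable names are illustrative, not the simulator's actual members:

// Sketch: backpropagation weight update with a momentum term.
// beta is the learning constant, alpha the momentum parameter;
// names are illustrative, not taken from the book's simulator.
float update_weight(float weight, float error_term, float input,
                    float &prev_delta, float beta, float alpha) {
    float delta = beta * error_term * input   // gradient step
                + alpha * prev_delta;         // momentum: reuse last change
    prev_delta = delta;                       // remember for next cycle
    return weight + delta;
}

The momentum term keeps successive weight changes pointed in a consistent direction, which is what speeds up convergence in the cases noted above.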
Chapter 14
Application to Financial Forecasting

Introduction

In Chapters 7 and 13, the backpropagation simulator was developed. In this chapter, you will use the simulator to tackle a complex problem in financial forecasting. The application of neural networks to financial forecasting and modeling has been very popular over the last few years. Financial journals and magazines frequently mention the use of neural networks, and commercial tools and simulators are quite widespread. This chapter gives you an overview of the typical steps used in creating financial forecasting models. Many of the steps will be simplified, and so the results will not, unfortunately, be good enough for real-life application. However, this chapter should serve as an introduction to the field, with pointers to further reading and resources for those who want more detailed information.
Who Trades with Neural Networks

There has been a great deal of interest in neural networks on Wall Street. Bradford Lewis runs two Fidelity funds in part with the use of neural networks. LBS Capital Management (Peoria, Illinois) also manages part of its portfolio with neural networks. According to Barron's (February 27, 1995), LBS's $150 million fund has beaten the averages by three percentage points a year since 1992. Each weekend, the neural networks are retrained with the latest technical and fundamental data, including P/E ratios, earnings results, and interest rates. Another of LBS's models, however, has done worse than the S&P 500 for the past five years. In the book…
…publicly heard of. Clients who use neural networks usually don't want anyone else to know what they are doing, for fear of losing their competitive edge. Firms put in many person-years of engineering design, with a lot of CPU cycles, to achieve practical and profitable results. Let's look at the process:

Developing a Forecasting Model

There are many steps in building a forecasting model, as listed below.

1. Decide on what your target is and develop a neural network (following these steps) for each target.
2. Determine the time frame that you wish to forecast.
3. Gather information about the problem domain.
4. Gather the needed data and get a feel for each input's relationship to the target.
5. Process the data to highlight features for the network to discern.
6. Transform the data as appropriate.
7. Scale and bias the data for the network, as needed.
8. Reduce the dimensionality of the input data as much as possible.
9. Design a network architecture (topology, number of layers, size of layers, parameters, learning paradigm).
10. Go through the train/test/redesign loop for a network.
11. Eliminate correlated inputs as much as possible while in step 10.
12. Deploy your network on new data, test it, and refine it as necessary.

Once you develop a forecasting model, you must then integrate it into your trading system. A neural network can be designed to predict direction, or magnitude, or perhaps just turning points in a particular market. Avner Mandelman of Cereus Investments (Los Altos Hills, California) uses a long-range trained neural network to tell him when the market is making a top or bottom (Barron's, December 14, 1992). Now let's expand on the twelve aspects of model building:

The Target and the Timeframe

What should the output of your neural network forecast? Let's say you want to predict the stock market. Do you want to predict the S&P 500 itself? Or do you want to predict the direction of the S&P 500? You could also predict the volatility of the S&P 500 (useful if you're an options player). Further, like Mr. Mandelman, you may want to predict only tops and bottoms, say, for the Dow Jones Industrial Average. You need to decide on the market or markets and also on your specific objectives.

Another crucial decision is how far forward in time you want to predict. It is easier to create neural network models for longer-term predictions than for shorter-term ones. One reason is market noise: the seemingly random, chaotic variations you see at smaller and smaller timescale resolutions. Another reason is that the macroeconomic forces that fundamentally move markets over long periods move slowly; the U.S. dollar makes multiyear trends, shaped by the economic policy of governments around the world. For a given error tolerance, a one-year or one-month forecast will take less effort with a neural network than a one-day forecast will.
So far we've talked about the target and the timeframe. One other important aspect of model building is knowledge of the domain. If you want to create an effective predictive model of the weather, then you need to know, or be able to guess, the factors that influence weather. The same holds true for the stock market or any other financial market. To create a really tradable Treasury bond trading system, you need a good idea of what actually drives the market and what works; in other words, talk to a T-bond trader and encapsulate his domain expertise!
Once you know the factors that influence the target output, you can gather raw data. If you are predicting the S&P 500, you may consider Treasury yields, 3-month T-bill yields, and earnings as some of the factors. Once you have the data, you can do scatter plots to see if there is some correlation between each input and the target output. If you are not satisfied with a plot, you may consider a different input in its place.
Preprocessing the Data for the Network

Surprising as it may sound, you are most likely going to spend about 90% of your time as a neural network developer massaging and transforming data into a meaningful form for training your network. We defined three substeps in this area of preprocessing in our master list:

• Highlight features
• Transform
• Scale and bias

Highlighting Features in the Input Data

You should present the neural network, as much as possible, with an easy way to find patterns in your data. For time series data, like stock market prices over time, you may consider presenting quantities like rate of change and acceleration (the first and second derivatives of your input). Another way to highlight data is to magnify certain occurrences. For example, if you consider central bank intervention an important qualifier of foreign exchange rates, then you may include as an input to your network a value of 1 or 0 to indicate the presence or absence of central bank intervention. If you further consider the activity of the U.S. Federal Reserve to be important by itself, then you may wish to highlight that by separating it out as another 1/0 input. Using 1/0 coding to separate composite effects like this is called thermometer encoding.

There is a whole body of study of market behavior called technical analysis, from which you may also wish to derive technical studies of your data. There is a wide assortment of mathematical technical studies that you can perform on your data (see references), such as moving averages to smooth it. There are also pattern recognition studies you can use, like the "double-top" formation, which purportedly signals a high probability of a significant decline. To recognize such a pattern, you may wish to present a mathematical function that aids in identifying the double-top.

You may also want to de-emphasize unwanted noise in your input data. If you see a spike in your data, you can lessen its effect by passing the data through a moving average filter, for example. You should be careful about introducing excessive lag in the resulting data, though (see the sketch at the end of this section).

Transform the Data If Appropriate

For time series data, you may consider using a Fourier transform to move to the frequency-phase plane. This will uncover periodic, cyclic information if it exists. The Fourier transform decomposes the input discrete data series into a set of frequency spikes that measure the relevance of each frequency component. If the stock market indeed follows the so-called January effect, where prices typically make a run up, then you would expect a strong yearly component in the frequency spectrum. Mark Jurik suggests sampling data with intervals that catch different cycle periods, in his paper on neural network data preparation (see references).
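As a rough illustration of the feature-highlighting ideas above, here is a minimal sketch that computes rate of change and a simple moving average for a price series. The window length and the use of simple first differences are illustrative choices, not prescriptions from the text:

// Sketch: simple feature highlighting for a price time series.
// First differences approximate rate of change; applying the same
// routine twice approximates acceleration; a moving average filter
// smooths out spikes.
#include <vector>
#include <cstddef>

std::vector<double> rateOfChange(const std::vector<double> &p) {
    std::vector<double> roc(p.size(), 0.0);
    for (std::size_t t = 1; t < p.size(); ++t)
        roc[t] = p[t] - p[t - 1];          // first derivative estimate
    return roc;
}

std::vector<double> movingAverage(const std::vector<double> &p, std::size_t n) {
    std::vector<double> ma(p.size(), 0.0);
    double sum = 0.0;
    for (std::size_t t = 0; t < p.size(); ++t) {
        sum += p[t];
        if (t >= n) sum -= p[t - n];       // drop the value leaving the window
        ma[t] = sum / static_cast<double>(t < n ? t + 1 : n);
    }
    return ma;
}

// Acceleration is the rate of change of the rate of change:
//   std::vector<double> accel = rateOfChange(rateOfChange(prices));

Note that smoothing with a window of n periods introduces a lag of roughly n/2 periods, which is exactly the excessive-lag concern mentioned above.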
You can use other signal processing techniques, such as filtering. Besides the frequency domain, you can also consider moving to other spaces, for example with the wavelet transform. You may also analyze the chaotic component of the data with chaos measures. It's beyond the scope of this book to discuss these techniques (refer to the Resources section of this chapter for more information), but if you are developing short-term trading neural network systems, they may play a significant role in your preprocessing effort. All of these techniques provide new ways of looking at your data, for possible features to detect in other domains.
Neurons like to see data in a particular input range to be most effective. Presenting data like the S&P 500, which has varied from 200 to 550 over the years, will not be useful, since the middle layer of neurons has a sigmoid activation function that squashes large inputs to either 0 or +1. In other words, you should scale your data to a range that does not saturate, or overwhelm, the network's neurons. Scaling inputs to the range -1 to 1 or 0 to 1 is a good idea. By the same token, you should normalize the expected output values to the 0 to 1 sigmoidal range.

It is also important to pay attention to the number of input values in the data set that are close to zero. Since the weight change law is proportional to the input value, an input close to zero means that its weight will hardly participate in learning! To avoid such situations, you can add a constant bias to your data to move it closer to 0.5, where the neurons respond very well.
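The following is a minimal sketch of the kind of scaling and biasing described above: a min-max rescaling into a target interval such as [0.1, 0.9], which keeps values away from zero and away from the sigmoid's saturated extremes. The 0.1/0.9 bounds are a common rule of thumb, not a requirement stated in the text:

// Sketch: min-max scaling of a data series into [lo, hi].
// Using lo = 0.1, hi = 0.9 keeps inputs off zero and away from
// the saturated tails of the sigmoid activation function.
#include <vector>
#include <algorithm>

void scaleToRange(std::vector<double> &v, double lo = 0.1, double hi = 0.9) {
    auto [mn, mx] = std::minmax_element(v.begin(), v.end());
    double span = *mx - *mn;
    if (span == 0.0) return;               // constant series: nothing to scale
    for (double &x : v)
        x = lo + (hi - lo) * (x - *mn) / span;
}

Remember to apply the inverse mapping to the network's outputs when you want forecasts back in the original units.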
Reduce Dimensionality

You should try to eliminate inputs wherever possible. This will reduce the dimensionality of the problem and make it easier for your neural network to generalize. Suppose that you have three inputs, x, y, and z, and one output, o. Now suppose that you find that all of your inputs lie on a single plane. You could define new axes x' and y' for that plane and map your inputs to the new coordinates, reducing the number of inputs for your problem from three to two without any loss of information. This is illustrated in Figure 14.1.
Figure 14.1  Reducing dimensionality from three to two dimensions.
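As a sketch of the coordinate-mapping idea in Figure 14.1, the fragment below projects 3-D points known to lie on a plane onto two orthonormal axes within that plane. The basis construction via cross products is a standard technique; the function names are illustrative:

// Sketch: map 3-D inputs that lie on a plane onto 2-D coordinates.
// Given the plane's unit normal n, build orthonormal in-plane axes
// u and v, then express each point in (x', y') coordinates.
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 cross(const Vec3 &a, const Vec3 &b) {
    return { a[1]*b[2] - a[2]*b[1],
             a[2]*b[0] - a[0]*b[2],
             a[0]*b[1] - a[1]*b[0] };
}

double dot(const Vec3 &a, const Vec3 &b) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

Vec3 normalize(Vec3 a) {
    double len = std::sqrt(dot(a, a));
    for (double &c : a) c /= len;
    return a;
}

// Reduce (x, y, z) on the plane through 'origin' with unit normal 'n'
// to two coordinates (x', y') with no loss of information.
std::array<double, 2> toPlaneCoords(const Vec3 &p, const Vec3 &origin,
                                    const Vec3 &n) {
    Vec3 helper = std::abs(n[0]) < 0.9 ? Vec3{1, 0, 0} : Vec3{0, 1, 0};
    Vec3 u = normalize(cross(n, helper));   // first in-plane axis
    Vec3 v = cross(n, u);                   // second axis, orthogonal to u
    Vec3 d = { p[0]-origin[0], p[1]-origin[1], p[2]-origin[2] };
    return { dot(d, u), dot(d, v) };
}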
Generalization versus Memorization

If your overall goal is more than pattern classification, you need to track your network's ability to generalize. Not only should you look at the overall error for the training set that you define, but you should also set aside some examples as a test set (and not train with them), with which you can check whether the network is able to predict correctly. If the network responds poorly to your test set, you know that you have overtrained, or, you could say, the network has "memorized" the training patterns. If you look at the arbitrary curve-fitting analogy in Figure 14.2, you see curves for a generalized fit, labeled G, and an overfit, labeled O. In the case of the overfit, any data point outside the training data results in a highly erroneous prediction. Your test data will certainly show large errors in the case of an overfitted model.
Figure 14.2  General (G) versus overfitting (O) of data.

Another way to consider this issue is in terms of degrees of freedom (DOF). For the polynomial

    y = a0 + a1x + a2x^2 + ... + anx^n

the DOF equals the number of coefficients a0, a1, ..., an, which is n + 1. So for the equation of a line (y = a0 + a1x), the DOF is 2. The goal of achieving generalization can be restated as an objective to obtain the function with the least DOF that fits the data adequately. For neural network models, the larger the number of trainable weights (which is a function of the number of inputs and the architecture), the larger the DOF. Be careful about having too many (unimportant) inputs: you may find terrific results with your training data, but extremely poor results with your test data.
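One common way to catch memorization early, in keeping with the test-set advice above, is to watch the test-set error during training and stop when it starts rising even though the training error keeps falling; this is usually called early stopping. The sketch below wraps the idea in callables, since the training, evaluation, and weight-snapshot routines here are hypothetical stand-ins for whatever your simulator provides:

// Sketch: early stopping to guard against overfitting. The callables
// stand in for your simulator's training, test-set evaluation, and
// weight save/restore routines (hypothetical placeholders).
#include <functional>
#include <limits>

void trainWithEarlyStopping(std::function<void()> trainOneEpoch,
                            std::function<double()> testError,
                            std::function<void()> saveWeights,
                            std::function<void()> restoreWeights,
                            int maxEpochs, int patience) {
    double best = std::numeric_limits<double>::max();
    int bad = 0;
    for (int epoch = 0; epoch < maxEpochs; ++epoch) {
        trainOneEpoch();
        double err = testError();          // error on the held-out test set
        if (err < best) { best = err; saveWeights(); bad = 0; }
        else if (++bad >= patience) break; // test error stopped improving
    }
    restoreWeights();                      // keep the best generalizing weights
}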
Eliminate Correlated Inputs Where Possible

You have seen that getting to the minimum number of inputs for a given problem is important in terms of minimizing DOF and simplifying your model. Another way to reduce dimensionality is to look for correlated inputs and carefully eliminate the redundancy. For example, you may find that the Swiss franc and the German mark are highly correlated over a certain time period of interest, and you may wish to eliminate one of these inputs to reduce dimensionality. You have to be careful in this process, though: a seemingly redundant piece of information may actually be very important. Mark Jurik, of Jurik Consulting, suggests in his paper on data preprocessing that one of the best ways to determine whether an input is really needed is to construct neural network models with and without the input and choose the model with the better error on training and test data. Although very iterative, you can try eliminating as many inputs as possible this way and be assured that you haven't eliminated a variable that really made a difference. Another approach is sensitivity analysis, where you vary one input a little while holding all others constant and note the effect on the output; if the effect is small, you eliminate that input. This approach is flawed, because in the real world all the inputs are not constant. Jurik's approach is more time consuming, but will lead to a better model.
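To spot candidate pairs like the Swiss franc and German mark example above, you can compute a plain Pearson correlation coefficient between two input series. This sketch is one straightforward way to do it:

// Sketch: Pearson correlation coefficient between two input series.
// Values near +1 or -1 flag highly correlated (redundant) inputs.
#include <vector>
#include <cmath>

double correlation(const std::vector<double> &x, const std::vector<double> &y) {
    std::size_t n = x.size();              // assumes x and y have equal length
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sx += x[i]; sy += y[i];
        sxx += x[i]*x[i]; syy += y[i]*y[i]; sxy += x[i]*y[i];
    }
    double cov = sxy - sx*sy/n;            // scaled covariance
    double vx  = sxx - sx*sx/n;            // scaled variance of x
    double vy  = syy - sy*sy/n;            // scaled variance of y
    return cov / std::sqrt(vx*vy);
}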
The process of decorrelation, or eliminating correlated inputs, can also use a linear algebra technique called principal component analysis, which yields a minimum set of variables that contain the maximum information. For further information on principal component analysis, you should consult a statistics reference or research two methods of principal component analysis: the Karhunen-Loève transform and the Hotelling transform.
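As a small taste of principal component analysis for just two correlated inputs, the sketch below finds the principal direction from the 2x2 covariance matrix, which has a closed-form eigendecomposition. The data values are made up for illustration; a real application would use a statistics library:

// Sketch: principal component analysis for two inputs via the
// closed-form eigendecomposition of their 2x2 covariance matrix.
#include <vector>
#include <cmath>
#include <cstdio>

int main() {
    // Illustrative made-up series: two strongly correlated inputs.
    std::vector<double> x = {1.0, 2.1, 2.9, 4.2, 5.0};
    std::vector<double> y = {0.9, 2.0, 3.1, 3.9, 5.2};
    std::size_t n = x.size();
    double mx = 0, my = 0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;

    double a = 0, b = 0, c = 0;            // covariance matrix [[a,b],[b,c]]
    for (std::size_t i = 0; i < n; ++i) {
        a += (x[i]-mx)*(x[i]-mx);
        b += (x[i]-mx)*(y[i]-my);
        c += (y[i]-my)*(y[i]-my);
    }
    a /= n - 1; b /= n - 1; c /= n - 1;

    double mean = 0.5*(a + c), d = std::sqrt(0.25*(a-c)*(a-c) + b*b);
    double l1 = mean + d, l2 = mean - d;   // eigenvalues, l1 >= l2

    double v1 = b, v2 = l1 - a;            // eigenvector of larger eigenvalue
    double len = std::sqrt(v1*v1 + v2*v2);
    v1 /= len; v2 /= len;

    std::printf("variance captured by first component: %.1f%%\n",
                100.0 * l1 / (l1 + l2));
    std::printf("principal axis: (%.3f, %.3f)\n", v1, v2);
    // Projecting (x - mx, y - my) onto this axis gives one input
    // carrying most of the information of the original two.
    return 0;
}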
Design a Network Architecture

Now it's time to actually design the neural network. For the backpropagation feed-forward neural network we have designed, this means making choices such as the number of hidden layers, the size of each layer, and parameters like the learning constant beta, the momentum parameter alpha, and the noise factor.

Some of these parameters can be made to vary with the number of cycles executed, similar to the current implementation of noise. For example, you can start with a learning constant beta that is large and reduce it as learning progresses. This allows rapid initial learning at the beginning of the process and may speed up the overall simulation time.
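A minimal sketch of such a schedule, assuming an exponential decay (the decay form and the constants are illustrative, not from the text):

// Sketch: decay the learning constant beta as training proceeds.
// Exponential decay is one common choice; the rate 0.999 and the
// floor of 0.01 are arbitrary illustrative values.
#include <cmath>

double betaForCycle(int cycle, double beta0 = 0.5, double decay = 0.999,
                    double betaMin = 0.01) {
    double beta = beta0 * std::pow(decay, cycle);
    return beta < betaMin ? betaMin : beta;  // never fall below the floor
}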
Much of the process of determining the best parameters for a given application is trial and error. You need to spend a great deal of time evaluating different options to find the best fit for your problem; you may literally create hundreds, if not thousands, of networks, either manually or automatically, to search for the best solution. Many commercial neural network programs use genetic algorithms to help arrive at an optimum network automatically. A genetic algorithm builds possible solutions to a problem from a set of starting genes. Analogously to biological evolution, the algorithm combines genetic solutions with a predefined set of operators to create new generations of solutions, which survive or perish depending on their ability to solve the problem. The key benefit of genetic algorithms (GAs) is the ability to traverse an enormous search space for a possibly optimum solution. You could program a GA to search over the number of hidden layers and other network parameters, and gradually evolve a neural network solution. Some vendors use a GA only to assign the starting set of weights for the network, instead of randomizing the weights, to start you off near a good solution.
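Here is a deliberately tiny sketch of the GA idea applied to a single gene, the hidden layer size. fitnessOf is a toy stand-in for training a network of that size and scoring it on the test set, and the population size, mutation step, and generation count are arbitrary:

// Sketch: a toy genetic algorithm evolving one gene, the hidden
// layer size. fitnessOf() is a stand-in for training a network with
// that size and returning a score (higher is better).
#include <vector>
#include <algorithm>
#include <random>

double fitnessOf(int h) {                  // toy stand-in: pretend 16 is best
    double d = h - 16;
    return -d * d;
}

int evolveHiddenSize(int generations = 20, std::size_t popSize = 12) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> init(2, 64);
    std::uniform_int_distribution<int> step(-3, 3);
    std::vector<int> pop(popSize);
    for (int &g : pop) g = init(rng);      // random starting genes

    for (int gen = 0; gen < generations; ++gen) {
        // Rank by fitness: the fittest genes survive.
        std::sort(pop.begin(), pop.end(),
                  [](int a, int b) { return fitnessOf(a) > fitnessOf(b); });
        // Replace the worst half with mutated copies of the best half.
        for (std::size_t i = popSize / 2; i < popSize; ++i) {
            int child = pop[i - popSize / 2] + step(rng);   // mutate
            pop[i] = std::max(1, child);
        }
    }
    return pop.front();                    // best hidden layer size found
}

A real search would also include crossover between pairs of multi-gene solutions; this sketch keeps only selection and mutation to stay short.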
The Train/Test/Redesign Loop

Now let's review the steps:
1. Divide your data into a training set, a test set, and a blind test set. Use about 80% of your data records for your training set, 10% for your test set, and 10% for your blind test set.
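A minimal sketch of such an 80/10/10 split over a vector of records (the Record type and the sequential split are assumptions; the sequential order suits time series, where shuffling would leak future information):

// Sketch: split records 80/10/10 into training, test, and blind test
// sets. The split is sequential, which suits time series data.
#include <vector>

struct Record { /* your input fields and target value */ };

void splitData(const std::vector<Record> &all,
               std::vector<Record> &train,
               std::vector<Record> &test,
               std::vector<Record> &blind) {
    std::size_t nTrain = all.size() * 8 / 10;
    std::size_t nTest  = all.size() / 10;
    train.assign(all.begin(), all.begin() + nTrain);
    test.assign(all.begin() + nTrain, all.begin() + nTrain + nTest);
    blind.assign(all.begin() + nTrain + nTest, all.end());
}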
2. Train your network on the training data. When you have reached a satisfactory minimum error, save your weights and apply your trained network to the test data, noting the error. Now restart the process with the same network topology but a different set of initial weights, and see if you can achieve a better error on the training and test sets. The reasoning: you may have found a local minimum on your first attempt, and randomizing the initial weights will start you off toward a different, maybe better, solution.
3. Eliminate correlated inputs. At this point you may optionally try to eliminate correlated inputs, as mentioned before, by iteratively removing each input and noting the best error you can achieve on the training and test sets for each case. Choose the case that leads to the best error, and eliminate the input (if any) that achieved it. You can repeat this whole process to try to eliminate another input variable.
4. Redesign the network as needed, changing the architecture or other parameters, and repeat the train and test process to achieve a better result.
5. Deploy your network. You can now use the blind test data set to see how your optimized network performs. If the error is not satisfactory, then you need to re-enter the design phase or the train-and-test phase.
6. Retrain your network periodically, or whenever you have reason to think that you have new information relevant to the problem you are modeling. If you have a neural network that tries to predict the weekly change in the S&P 500, you will likely need to retrain it at least once a month, if not once a week. If you find that the network no longer generalizes well with the new information, you need to re-enter the design phase.

If this sounds like a lot of work, it is! Now let's try our luck at forecasting by going through a subset of the steps outlined: