C++ Neural Networks and Fuzzy Logic
by Valluru B. Rao
M&T Books, IDG Books Worldwide, Inc.
ISBN: 1558515526   Pub Date: 06/01/95

Summary

You explored the backpropagation algorithm further in this chapter, continuing the discussion begun in Chapter 7.

• A momentum term was added to the training law and was shown to result in much faster convergence in some cases (a sketch of such an update follows this list).
• A noise term was added to the inputs so that training could take place with random noise applied. This noise was made to decrease with the number of cycles, so that final-stage learning could be done in a noise-free environment.
• …Chapter 12. Further application of the simulator will be made in Chapter 14.
• Several applications of the backpropagation algorithm were outlined, showing the wide applicability of this algorithm.
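As a reminder of the momentum term recapped above, here is a minimal sketch of a weight update with momentum. The variable names are illustrative, not the simulator's actual members:

// Sketch: backpropagation weight update with a momentum term.
// beta is the learning constant, alpha the momentum parameter;
// names are illustrative, not taken from the book's simulator.
float update_weight(float weight, float error_term, float input,
                    float &prev_delta, float beta, float alpha) {
    float delta = beta * error_term * input   // gradient step
                + alpha * prev_delta;         // momentum: reuse last change
    prev_delta = delta;                       // remember for next cycle
    return weight + delta;
}

The momentum term keeps successive weight changes pointed in a consistent direction, which is what speeds up convergence in the cases noted above.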
Chapter 14
Application to Financial Forecasting

Introduction

In Chapters 7 and 13, the backpropagation simulator was developed. In this chapter, you will use the simulator to tackle a complex problem in financial forecasting. The application of neural networks to financial forecasting and modeling has been very popular over the last few years. Financial journals and magazines frequently mention the use of neural networks, and commercial tools and simulators are quite widespread. This chapter gives you an overview of the typical steps used in creating financial forecasting models. Many of the steps will be simplified, and so the results will not, unfortunately, be good enough for real-life application. However, this chapter should serve as an introduction to the field, with pointers to further reading and resources for those who want more detailed information.
Who Trades with Neural Networks

There has been a great deal of interest in neural networks on Wall Street. Bradford Lewis runs two Fidelity funds in part with the use of neural networks. LBS Capital Management (Peoria, Illinois) also manages part of its portfolio with neural networks. According to Barron's (February 27, 1995), LBS's $150 million fund has beaten the averages by three percentage points a year since 1992. Each weekend, the neural networks are retrained with the latest technical and fundamental data, including P/E ratios, earnings results, and interest rates. Another of LBS's models, however, has done worse than the S&P 500 for the past five years. In the book…
…publicly heard of. Clients who use neural networks usually don't want anyone else to know what they are doing, for fear of losing their competitive edge. Firms put in many person-years of engineering design, with a lot of CPU cycles, to achieve practical and profitable results. Let's look at the process:

Developing a Forecasting Model

There are many steps in building a forecasting model, as listed below.

1. Decide on what your target is and develop a neural network (following these steps) for each target.
2. Determine the time frame that you wish to forecast.
3. Gather information about the problem domain.
4. Gather the needed data and get a feel for each input's relationship to the target.
5. Process the data to highlight features for the network to discern.
6. Transform the data as appropriate.
7. Scale and bias the data for the network, as needed.
8. Reduce the dimensionality of the input data as much as possible.
9. Design a network architecture (topology, number of layers, size of layers, parameters, learning paradigm).
10. Go through the train/test/redesign loop for a network.
11. Eliminate correlated inputs as much as possible while in step 10.
12. Deploy your network on new data, test it, and refine it as necessary.

Once you develop a forecasting model, you must then integrate it into your trading system. A neural network can be designed to predict direction, or magnitude, or perhaps just turning points in a particular market. Avner Mandelman of Cereus Investments (Los Altos Hills, California) uses a long-range trained neural network to tell him when the market is making a top or bottom (Barron's, December 14, 1992). Now let's expand on the twelve aspects of model building:

The Target and the Timeframe

What should the output of your neural network forecast? Let's say you want to predict the stock market. Do you want to predict the S&P 500 itself? Or do you want to predict the direction of the S&P 500? You could also predict the volatility of the S&P 500 (useful if you're an options player). Further, like Mr. Mandelman, you may want to predict only tops and bottoms, say, for the Dow Jones Industrial Average. You need to decide on the market or markets and also on your specific objectives.

Another crucial decision is how far forward in time you want to predict. It is easier to create neural network models for longer-term predictions than for shorter-term ones. One reason is market noise: the seemingly random, chaotic variations you see at smaller and smaller timescale resolutions. Another reason is that the macroeconomic forces that fundamentally move markets over long periods move slowly; the U.S. dollar makes multiyear trends, shaped by the economic policy of governments around the world. For a given error tolerance, a one-year or one-month forecast will take less effort with a neural network than a one-day forecast will.
So far we've talked about the target and the timeframe. One other important aspect of model building is knowledge of the domain. If you want to create an effective predictive model of the weather, then you need to know, or be able to guess, the factors that influence weather. The same holds true for the stock market or any other financial market. To create a really tradable Treasury bond trading system, you need a good idea of what actually drives the market and what works; in other words, talk to a T-bond trader and encapsulate his domain expertise!
Once you know the factors that influence the target output, you can gather raw data. If you are predicting the S&P 500, you may consider Treasury yields, 3-month T-bill yields, and earnings as some of the factors. Once you have the data, you can do scatter plots to see if there is some correlation between each input and the target output. If you are not satisfied with a plot, you may consider a different input in its place.
Preprocessing the Data for the Network

Surprising as it may sound, you are most likely going to spend about 90% of your time as a neural network developer massaging and transforming data into a meaningful form for training your network. We defined three substeps in this area of preprocessing in our master list:

• Highlight features
• Transform
• Scale and bias

Highlighting Features in the Input Data

You should present the neural network, as much as possible, with an easy way to find patterns in your data. For time series data, like stock market prices over time, you may consider presenting quantities like rate of change and acceleration (the first and second derivatives of your input). Another way to highlight data is to magnify certain occurrences. For example, if you consider central bank intervention an important qualifier of foreign exchange rates, then you may include as an input to your network a value of 1 or 0 to indicate the presence or absence of central bank intervention. If you further consider the activity of the U.S. Federal Reserve to be important by itself, then you may wish to highlight that by separating it out as another 1/0 input. Using 1/0 coding to separate composite effects like this is called thermometer encoding.

There is a whole body of study of market behavior called technical analysis, from which you may also wish to derive technical studies of your data. There is a wide assortment of mathematical technical studies that you can perform on your data (see references), such as moving averages to smooth it. There are also pattern recognition studies you can use, like the "double-top" formation, which purportedly signals a high probability of a significant decline. To recognize such a pattern, you may wish to present a mathematical function that aids in identifying the double-top.

You may also want to de-emphasize unwanted noise in your input data. If you see a spike in your data, you can lessen its effect by passing the data through a moving average filter, for example. You should be careful about introducing excessive lag in the resulting data, though (see the sketch at the end of this section).

Transform the Data If Appropriate

For time series data, you may consider using a Fourier transform to move to the frequency-phase plane. This will uncover periodic, cyclic information if it exists. The Fourier transform decomposes the input discrete data series into a set of frequency spikes that measure the relevance of each frequency component. If the stock market indeed follows the so-called January effect, where prices typically make a run up, then you would expect a strong yearly component in the frequency spectrum. Mark Jurik suggests sampling data with intervals that catch different cycle periods, in his paper on neural network data preparation (see references).
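As a rough illustration of the feature-highlighting ideas above, here is a minimal sketch that computes rate of change and a simple moving average for a price series. The window length and the use of simple first differences are illustrative choices, not prescriptions from the text:

// Sketch: simple feature highlighting for a price time series.
// First differences approximate rate of change; applying the same
// routine twice approximates acceleration; a moving average filter
// smooths out spikes.
#include <vector>
#include <cstddef>

std::vector<double> rateOfChange(const std::vector<double> &p) {
    std::vector<double> roc(p.size(), 0.0);
    for (std::size_t t = 1; t < p.size(); ++t)
        roc[t] = p[t] - p[t - 1];          // first derivative estimate
    return roc;
}

std::vector<double> movingAverage(const std::vector<double> &p, std::size_t n) {
    std::vector<double> ma(p.size(), 0.0);
    double sum = 0.0;
    for (std::size_t t = 0; t < p.size(); ++t) {
        sum += p[t];
        if (t >= n) sum -= p[t - n];       // drop the value leaving the window
        ma[t] = sum / static_cast<double>(t < n ? t + 1 : n);
    }
    return ma;
}

// Acceleration is the rate of change of the rate of change:
//   std::vector<double> accel = rateOfChange(rateOfChange(prices));

Note that smoothing with a window of n periods introduces a lag of roughly n/2 periods, which is exactly the excessive-lag concern mentioned above.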
You can use other signal processing techniques, such as filtering. Besides the frequency domain, you can also consider moving to other spaces, for example with the wavelet transform. You may also analyze the chaotic component of the data with chaos measures. It's beyond the scope of this book to discuss these techniques (refer to the Resources section of this chapter for more information), but if you are developing short-term trading neural network systems, they may play a significant role in your preprocessing effort. All of these techniques provide new ways of looking at your data, for possible features to detect in other domains.
Neurons like to see data in a particular input range to be most effective. Presenting data like the S&P 500, which has varied from 200 to 550 over the years, will not be useful, since the middle layer of neurons has a sigmoid activation function that squashes large inputs to either 0 or +1. In other words, you should scale your data to a range that does not saturate, or overwhelm, the network's neurons. Scaling inputs to the range -1 to 1 or 0 to 1 is a good idea. By the same token, you should normalize the expected output values to the 0 to 1 sigmoidal range.

It is also important to pay attention to the number of input values in the data set that are close to zero. Since the weight change law is proportional to the input value, an input close to zero means that its weight will hardly participate in learning! To avoid such situations, you can add a constant bias to your data to move it closer to 0.5, where the neurons respond very well.
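The following is a minimal sketch of the kind of scaling and biasing described above: a min-max rescaling into a target interval such as [0.1, 0.9], which keeps values away from zero and away from the sigmoid's saturated extremes. The 0.1/0.9 bounds are a common rule of thumb, not a requirement stated in the text:

// Sketch: min-max scaling of a data series into [lo, hi].
// Using lo = 0.1, hi = 0.9 keeps inputs off zero and away from
// the saturated tails of the sigmoid activation function.
#include <vector>
#include <algorithm>

void scaleToRange(std::vector<double> &v, double lo = 0.1, double hi = 0.9) {
    auto [mn, mx] = std::minmax_element(v.begin(), v.end());
    double span = *mx - *mn;
    if (span == 0.0) return;               // constant series: nothing to scale
    for (double &x : v)
        x = lo + (hi - lo) * (x - *mn) / span;
}

Remember to apply the inverse mapping to the network's outputs when you want forecasts back in the original units.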
Reduce Dimensionality

You should try to eliminate inputs wherever possible. This will reduce the dimensionality of the problem and make it easier for your neural network to generalize. Suppose that you have three inputs, x, y, and z, and one output, o. Now suppose that you find that all of your inputs lie on a single plane. You could define new axes x' and y' for that plane and map your inputs to the new coordinates, reducing the number of inputs for your problem from three to two without any loss of information. This is illustrated in Figure 14.1.
Figure 14.1  Reducing dimensionality from three to two dimensions.
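As a sketch of the coordinate-mapping idea in Figure 14.1, the fragment below projects 3-D points known to lie on a plane onto two orthonormal axes within that plane. The basis construction via cross products is a standard technique; the function names are illustrative:

// Sketch: map 3-D inputs that lie on a plane onto 2-D coordinates.
// Given the plane's unit normal n, build orthonormal in-plane axes
// u and v, then express each point in (x', y') coordinates.
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 cross(const Vec3 &a, const Vec3 &b) {
    return { a[1]*b[2] - a[2]*b[1],
             a[2]*b[0] - a[0]*b[2],
             a[0]*b[1] - a[1]*b[0] };
}

double dot(const Vec3 &a, const Vec3 &b) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

Vec3 normalize(Vec3 a) {
    double len = std::sqrt(dot(a, a));
    for (double &c : a) c /= len;
    return a;
}

// Reduce (x, y, z) on the plane through 'origin' with unit normal 'n'
// to two coordinates (x', y') with no loss of information.
std::array<double, 2> toPlaneCoords(const Vec3 &p, const Vec3 &origin,
                                    const Vec3 &n) {
    Vec3 helper = std::abs(n[0]) < 0.9 ? Vec3{1, 0, 0} : Vec3{0, 1, 0};
    Vec3 u = normalize(cross(n, helper));   // first in-plane axis
    Vec3 v = cross(n, u);                   // second axis, orthogonal to u
    Vec3 d = { p[0]-origin[0], p[1]-origin[1], p[2]-origin[2] };
    return { dot(d, u), dot(d, v) };
}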
Generalization versus Memorization

If your overall goal is more than pattern classification, you need to track your network's ability to generalize. Not only should you look at the overall error for the training set that you define, but you should also set aside some examples as a test set (and not train with them), with which you can check whether the network is able to predict correctly. If the network responds poorly to your test set, you know that you have overtrained, or, you could say, the network has "memorized" the training patterns. If you look at the arbitrary curve-fitting analogy in Figure 14.2, you see curves for a generalized fit, labeled G, and an overfit, labeled O. In the case of the overfit, any data point outside the training data results in a highly erroneous prediction. Your test data will certainly show large errors in the case of an overfitted model.
Figure 14.2  General (G) versus overfitting (O) of data.

Another way to consider this issue is in terms of degrees of freedom (DOF). For the polynomial

    y = a0 + a1x + a2x^2 + ... + anx^n

the DOF equals the number of coefficients a0, a1, ..., an, which is n + 1. So for the equation of a line (y = a0 + a1x), the DOF is 2. The goal of achieving generalization can be restated as an objective to obtain the function with the least DOF that fits the data adequately. For neural network models, the larger the number of trainable weights (which is a function of the number of inputs and the architecture), the larger the DOF. Be careful about having too many (unimportant) inputs: you may find terrific results with your training data, but extremely poor results with your test data.
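One common way to catch memorization early, in keeping with the test-set advice above, is to watch the test-set error during training and stop when it starts rising even though the training error keeps falling; this is usually called early stopping. The sketch below wraps the idea in callables, since the training, evaluation, and weight-snapshot routines here are hypothetical stand-ins for whatever your simulator provides:

// Sketch: early stopping to guard against overfitting. The callables
// stand in for your simulator's training, test-set evaluation, and
// weight save/restore routines (hypothetical placeholders).
#include <functional>
#include <limits>

void trainWithEarlyStopping(std::function<void()> trainOneEpoch,
                            std::function<double()> testError,
                            std::function<void()> saveWeights,
                            std::function<void()> restoreWeights,
                            int maxEpochs, int patience) {
    double best = std::numeric_limits<double>::max();
    int bad = 0;
    for (int epoch = 0; epoch < maxEpochs; ++epoch) {
        trainOneEpoch();
        double err = testError();          // error on the held-out test set
        if (err < best) { best = err; saveWeights(); bad = 0; }
        else if (++bad >= patience) break; // test error stopped improving
    }
    restoreWeights();                      // keep the best generalizing weights
}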
Eliminate Correlated Inputs Where Possible

You have seen that getting to the minimum number of inputs for a given problem is important in terms of minimizing DOF and simplifying your model. Another way to reduce dimensionality is to look for correlated inputs and carefully eliminate the redundancy. For example, you may find that the Swiss franc and the German mark are highly correlated over a certain time period of interest, and you may wish to eliminate one of these inputs to reduce dimensionality. You have to be careful in this process, though: a seemingly redundant piece of information may actually be very important. Mark Jurik, of Jurik Consulting, suggests in his paper on data preprocessing that one of the best ways to determine whether an input is really needed is to construct neural network models with and without the input and choose the model with the better error on training and test data. Although very iterative, you can try eliminating as many inputs as possible this way and be assured that you haven't eliminated a variable that really made a difference. Another approach is sensitivity analysis, where you vary one input a little while holding all others constant and note the effect on the output; if the effect is small, you eliminate that input. This approach is flawed, because in the real world all the inputs are not constant. Jurik's approach is more time consuming, but will lead to a better model.
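To spot candidate pairs like the Swiss franc and German mark example above, you can compute a plain Pearson correlation coefficient between two input series. This sketch is one straightforward way to do it:

// Sketch: Pearson correlation coefficient between two input series.
// Values near +1 or -1 flag highly correlated (redundant) inputs.
#include <vector>
#include <cmath>

double correlation(const std::vector<double> &x, const std::vector<double> &y) {
    std::size_t n = x.size();              // assumes x and y have equal length
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sx += x[i]; sy += y[i];
        sxx += x[i]*x[i]; syy += y[i]*y[i]; sxy += x[i]*y[i];
    }
    double cov = sxy - sx*sy/n;            // scaled covariance
    double vx  = sxx - sx*sx/n;            // scaled variance of x
    double vy  = syy - sy*sy/n;            // scaled variance of y
    return cov / std::sqrt(vx*vy);
}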
The process of decorrelation, or eliminating correlated inputs, can also use a linear algebra technique called principal component analysis, which yields a minimum set of variables that contain the maximum information. For further information on principal component analysis, you should consult a statistics reference or research two methods of principal component analysis: the Karhunen-Loève transform and the Hotelling transform.
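As a small taste of principal component analysis for just two correlated inputs, the sketch below finds the principal direction from the 2x2 covariance matrix, which has a closed-form eigendecomposition. The data values are made up for illustration; a real application would use a statistics library:

// Sketch: principal component analysis for two inputs via the
// closed-form eigendecomposition of their 2x2 covariance matrix.
#include <vector>
#include <cmath>
#include <cstdio>

int main() {
    // Illustrative made-up series: two strongly correlated inputs.
    std::vector<double> x = {1.0, 2.1, 2.9, 4.2, 5.0};
    std::vector<double> y = {0.9, 2.0, 3.1, 3.9, 5.2};
    std::size_t n = x.size();
    double mx = 0, my = 0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n; my /= n;

    double a = 0, b = 0, c = 0;            // covariance matrix [[a,b],[b,c]]
    for (std::size_t i = 0; i < n; ++i) {
        a += (x[i]-mx)*(x[i]-mx);
        b += (x[i]-mx)*(y[i]-my);
        c += (y[i]-my)*(y[i]-my);
    }
    a /= n - 1; b /= n - 1; c /= n - 1;

    double mean = 0.5*(a + c), d = std::sqrt(0.25*(a-c)*(a-c) + b*b);
    double l1 = mean + d, l2 = mean - d;   // eigenvalues, l1 >= l2

    double v1 = b, v2 = l1 - a;            // eigenvector of larger eigenvalue
    double len = std::sqrt(v1*v1 + v2*v2);
    v1 /= len; v2 /= len;

    std::printf("variance captured by first component: %.1f%%\n",
                100.0 * l1 / (l1 + l2));
    std::printf("principal axis: (%.3f, %.3f)\n", v1, v2);
    // Projecting (x - mx, y - my) onto this axis gives one input
    // carrying most of the information of the original two.
    return 0;
}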
Design a Network Architecture

Now it's time to actually design the neural network. For the backpropagation feed-forward neural network we have designed, this means making choices such as the number of hidden layers, the size of each layer, and parameters like the learning constant beta, the momentum parameter alpha, and the noise factor.

Some of these parameters can be made to vary with the number of cycles executed, similar to the current implementation of noise. For example, you can start with a learning constant beta that is large and reduce it as learning progresses. This allows rapid initial learning at the beginning of the process and may speed up the overall simulation time.
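A minimal sketch of such a schedule, assuming an exponential decay (the decay form and the constants are illustrative, not from the text):

// Sketch: decay the learning constant beta as training proceeds.
// Exponential decay is one common choice; the rate 0.999 and the
// floor of 0.01 are arbitrary illustrative values.
#include <cmath>

double betaForCycle(int cycle, double beta0 = 0.5, double decay = 0.999,
                    double betaMin = 0.01) {
    double beta = beta0 * std::pow(decay, cycle);
    return beta < betaMin ? betaMin : beta;  // never fall below the floor
}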
Much of the process of determining the best parameters for a given application is trial and error. You need to spend a great deal of time evaluating different options to find the best fit for your problem; you may literally create hundreds, if not thousands, of networks, either manually or automatically, to search for the best solution. Many commercial neural network programs use genetic algorithms to help arrive at an optimum network automatically. A genetic algorithm builds possible solutions to a problem from a set of starting genes. Analogously to biological evolution, the algorithm combines genetic solutions with a predefined set of operators to create new generations of solutions, which survive or perish depending on their ability to solve the problem. The key benefit of genetic algorithms (GAs) is the ability to traverse an enormous search space for a possibly optimum solution. You could program a GA to search over the number of hidden layers and other network parameters, and gradually evolve a neural network solution. Some vendors use a GA only to assign the starting set of weights for the network, instead of randomizing the weights, to start you off near a good solution.
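Here is a deliberately tiny sketch of the GA idea applied to a single gene, the hidden layer size. fitnessOf is a toy stand-in for training a network of that size and scoring it on the test set, and the population size, mutation step, and generation count are arbitrary:

// Sketch: a toy genetic algorithm evolving one gene, the hidden
// layer size. fitnessOf() is a stand-in for training a network with
// that size and returning a score (higher is better).
#include <vector>
#include <algorithm>
#include <random>

double fitnessOf(int h) {                  // toy stand-in: pretend 16 is best
    double d = h - 16;
    return -d * d;
}

int evolveHiddenSize(int generations = 20, std::size_t popSize = 12) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> init(2, 64);
    std::uniform_int_distribution<int> step(-3, 3);
    std::vector<int> pop(popSize);
    for (int &g : pop) g = init(rng);      // random starting genes

    for (int gen = 0; gen < generations; ++gen) {
        // Rank by fitness: the fittest genes survive.
        std::sort(pop.begin(), pop.end(),
                  [](int a, int b) { return fitnessOf(a) > fitnessOf(b); });
        // Replace the worst half with mutated copies of the best half.
        for (std::size_t i = popSize / 2; i < popSize; ++i) {
            int child = pop[i - popSize / 2] + step(rng);   // mutate
            pop[i] = std::max(1, child);
        }
    }
    return pop.front();                    // best hidden layer size found
}

A real search would also include crossover between pairs of multi-gene solutions; this sketch keeps only selection and mutation to stay short.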
The Train/Test/Redesign Loop

Now let's review the steps:
1. Divide your data into a training set, a test set, and a blind test set. Use about 80% of your data records for your training set, 10% for your test set, and 10% for your blind test set.
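A minimal sketch of such an 80/10/10 split over a vector of records (the Record type and the sequential split are assumptions; the sequential order suits time series, where shuffling would leak future information):

// Sketch: split records 80/10/10 into training, test, and blind test
// sets. The split is sequential, which suits time series data.
#include <vector>

struct Record { /* your input fields and target value */ };

void splitData(const std::vector<Record> &all,
               std::vector<Record> &train,
               std::vector<Record> &test,
               std::vector<Record> &blind) {
    std::size_t nTrain = all.size() * 8 / 10;
    std::size_t nTest  = all.size() / 10;
    train.assign(all.begin(), all.begin() + nTrain);
    test.assign(all.begin() + nTrain, all.begin() + nTrain + nTest);
    blind.assign(all.begin() + nTrain + nTest, all.end());
}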
2. Train your network on the training data. When you have reached a satisfactory minimum error, save your weights and apply your trained network to the test data, noting the error. Now restart the process with the same network topology but a different set of initial weights, and see if you can achieve a better error on the training and test sets. The reasoning: you may have found a local minimum on your first attempt, and randomizing the initial weights will start you off toward a different, maybe better, solution.
3. Eliminate correlated inputs. At this point you may optionally try to eliminate correlated inputs, as mentioned before, by iteratively removing each input and noting the best error you can achieve on the training and test sets for each case. Choose the case that leads to the best error, and eliminate the input (if any) that achieved it. You can repeat this whole process to try to eliminate another input variable.
4. Redesign the network as needed, changing the architecture or other parameters, and repeat the train and test process to achieve a better result.
5. Deploy your network. You can now use the blind test data set to see how your optimized network performs. If the error is not satisfactory, then you need to re-enter the design phase or the train-and-test phase.
6. Retrain your network periodically, or whenever you have reason to think that you have new information relevant to the problem you are modeling. If you have a neural network that tries to predict the weekly change in the S&P 500, you will likely need to retrain it at least once a month, if not once a week. If you find that the network no longer generalizes well with the new information, you need to re-enter the design phase.

If this sounds like a lot of work, it is! Now let's try our luck at forecasting by going through a subset of the steps outlined: