C++ Neural Networks and Fuzzy Logic
by Valluru B. Rao
M&T Books, IDG Books Worldwide, Inc.
ISBN: 1558515526  Pub Date: 06/01/95
Table 5.9 Adjustment of Weights

step   w1   w2   a   b   activation   output   comment
1      0    0    1   1   0            0        desired output is 1; increment both w's
2      1    1    1   1   2            1        output is what it should be
3      1    1    1   0   1            1        output is what it should be
4      1    1    0   1   1            1        output is 1; it should be 0
5                                              subtract 1 from w2
6      1    0    0   1   0            0        output is what it should be
7      1    0    0   0   0            0        output is what it should be
8      1    0    1   1   1            1        output is what it should be
9      1    0    1   0   1            1        output is what it should be
Table 5.9 shows that the network weight vector changed from an initial vector (0, 0) to the final weight vector (1, 0) in eight iterations. This example is not of a network for pattern matching. If you think about it, you will realize that the network is designed to fire if the first digit in the pattern is a 1, and not otherwise. An analogy for this kind of problem is determining whether a given image contains a specific object in a specific part of the image, such as the dot that should occur in the letter i.

If the initial weights are chosen prudently, with some relevance to the problem, then the speed of operation can be increased, in the sense that convergence is achieved with fewer iterations than otherwise. Thus, encoding algorithms are important. We now present some of the encoding algorithms.

Initializing Weights for Autoassociative Networks

Consider a network that is to associate each input pattern with itself and that gets binary patterns as inputs. Make a bipolar mapping on the input pattern. That is, replace each 0 by −1. Call the mapped pattern the vector x, written as a column vector. You will get a square matrix of order equal to the size of x when you form the product x x^T. Obtain similar matrices for the other patterns you want the network to store. Add these matrices to give you the matrix of weights to be used initially, as we did in Chapter 4. This process can be described with the following equation:

W = Σ_i x_i x_i^T
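As a concrete illustration, here is a minimal C++ sketch of this initialization, assuming the binary patterns are stored as vectors of 0s and 1s; the function name and types are ours, not code from the book. For the heteroassociative case described next, the only change is that the outer product uses the associated pattern y in place of x (x[i] * y[j] instead of x[i] * x[j]).

#include <cstddef>
#include <vector>

using Pattern = std::vector<int>;
using Matrix  = std::vector<std::vector<int>>;

Matrix initAutoassociativeWeights(const std::vector<Pattern>& patterns) {
    const std::size_t n = patterns[0].size();
    Matrix W(n, std::vector<int>(n, 0));
    for (const Pattern& p : patterns) {
        std::vector<int> x(n);
        for (std::size_t i = 0; i < n; ++i)
            x[i] = (p[i] == 0) ? -1 : 1;          // bipolar mapping: replace each 0 by -1
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                W[i][j] += x[i] * x[j];           // add the outer product x x^T for this pattern
    }
    return W;                                     // initial weight matrix
}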
Weight Initialization for Heteroassociative Networks

Consider a network that is to associate one input pattern with another pattern and that gets binary patterns as inputs. Make a bipolar mapping on the input pattern. That is, replace each 0 by −1. Call the mapped pattern the vector x when written as a column vector. Get a similar bipolar mapping for the corresponding associated pattern. Call it y. You will get a matrix of size x by size y when you form the product x y^T. Obtain similar matrices for the other pattern pairs you want the network to store. Add these matrices to give you the matrix of weights to be used initially. The following equation restates this process:

W = Σ_i x_i y_i^T

On Center, Off Surround

One of the many interesting paradigms you encounter in neural network models and theory is the strategy of winner takes all. If one winner is to emerge from a crowd of neurons in a particular layer, there needs to be competition. Since in such a competition every neuron is for itself, lateral connections are needed to express this circumstance: the lateral connections from any neuron to the others should have a negative weight. Alternatively, the neuron with the highest activation is considered the winner, and only its weights are modified in the training process, leaving the weights of the others the same. Winner takes all means that only one neuron in that layer fires and the others do not. This can happen in a hidden layer or in the output layer.
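The winner-takes-all selection itself is simple to express in code. The following C++ fragment is a minimal sketch of the interpretation just given: the neuron with the highest activation fires, and all others are suppressed. The function name and the use of std::vector are our own illustration, and the fragment assumes a non-empty activation vector.

#include <algorithm>
#include <vector>

std::vector<int> winnerTakesAll(const std::vector<double>& activations) {
    std::vector<int> output(activations.size(), 0);            // every neuron off by default
    auto winner = std::max_element(activations.begin(), activations.end());
    output[winner - activations.begin()] = 1;                  // only the winner fires
    return output;
}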
In another situation, when a particular category of input is to be identified from among several groups of inputs, a subset of the neurons has to be dedicated to detecting that category. In this case, as far as such a subset of neurons is concerned, inhibition increases for distant neurons, whereas excitation increases for the neighboring ones. The phrase on center, off surround describes this phenomenon of distant inhibition and near excitation.

Weights

Weights are also prime components in a neural network, as they reflect, on the one hand, the memory stored by the network and, on the other, the basis for learning and training.
Inputs

You have seen that mutually orthogonal or almost orthogonal patterns are required as stable stored patterns for the Hopfield network, which we discussed before for pattern matching. Similar restrictions are found also with other neural networks. Sometimes it is not a restriction; rather, the purpose of the model makes a certain type of input natural. Certainly, in the context of pattern classification, binary input patterns make the problem setup simpler. Binary, bipolar, and analog signals are the varieties of inputs. Networks that accept analog signals as inputs are for continuous models, and those that require binary or bipolar inputs are for discrete models. Binary inputs can be fed to networks for continuous models, but analog signals cannot be input to networks for discrete models (unless they are fuzzified). With the input possibilities being discrete or analog, and the model possibilities being discrete or continuous, there are potentially four situations, but one of them, where analog inputs are considered for a discrete model, is untenable. An example of a continuous model is a network that is to adjust the angle by which the steering wheel of a truck is turned in order to back the truck into a parking space. If a network is supposed to recognize characters of the alphabet, a means of discretizing a character allows the use of a discrete model.

What are the types of inputs for problems like image processing or handwriting analysis? Remembering that artificial neurons, as processing elements, aggregate their inputs using connection weights, and that the output neuron uses a threshold function, you know that the inputs have to be numerical. A handwritten character can be superimposed on a grid, and the input can consist of the cells in each row of the grid where a part of the character is present. In other words, the input corresponding to one character will be a set of binary or gray-scale sequences, one sequence for each row of the grid. A 1 in a particular position in the sequence for a row shows that the corresponding pixel is present (black) in that part of the grid, while a 0 shows it is not. The size of the grid has to be big enough to accommodate the largest character under study, as well as the most complex features.
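To make the grid idea concrete, here is a minimal C++ sketch that flattens a character drawn on a grid into a binary input vector, one sequence of 0s and 1s per row. The '*' and '.' grid encoding, the function name, and the sample letter are assumptions made for illustration only.

#include <string>
#include <vector>

std::vector<int> gridToInput(const std::vector<std::string>& grid) {
    std::vector<int> input;
    for (const std::string& row : grid)              // one sequence per row of the grid
        for (char cell : row)
            input.push_back(cell == '*' ? 1 : 0);    // pixel present (black) -> 1, absent -> 0
    return input;
}

// Example: a crude letter i on a 5x5 grid, dot included
// std::vector<std::string> letterI = { "..*..", ".....", "..*..", "..*..", "..*.." };
// std::vector<int> x = gridToInput(letterI);        // 25 binary inputs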
Outputs

The output from some neural networks is a spatial pattern, such as a bit pattern; from some, a binary function value; and from others, an analog signal. The type of mapping intended for the inputs determines the type of outputs, naturally. The output could be a classification of the input data, or an association between patterns of the same dimension as the input. The threshold functions do the final mapping of the activations of the output neurons into the network outputs. But the outputs from a single cycle of operation of a neural network may not be the final outputs, since you would iterate the network through further cycles of operation until you see convergence. If convergence seems possible but is taking a great deal of time and effort, that is, if the network is too slow to learn, you may assign a tolerance level and settle for the network achieving near convergence.

The Threshold Function

The output of any neuron is the result of thresholding, if any, of its internal activation, which, in turn, is the weighted sum of the neuron's inputs. Thresholding is sometimes done for the sake of scaling down the activation and mapping it into a meaningful output for the problem, and sometimes for adding a bias. Thresholding (scaling) is important for multilayer networks to preserve a meaningful range across each layer's operations. The most often used threshold function is the sigmoid function. A step function, a ramp function, or just a linear function can also be used, as when you simply add the bias to the activation. The sigmoid function maps the activation into the interval [0, 1]. The equations for the different threshold functions just mentioned are given below.
More than one function goes by the name sigmoid function. They differ in their formulas and in their ranges. They all have a graph similar to a stretched letter s. We give below two such functions. The first is the hyperbolic tangent function, with values in (−1, 1). The second is the logistic function, with values between 0 and 1. You therefore choose the one that fits the range you want. The graph of the sigmoid logistic function is given in Figure 5.3.

1. f(x) = tanh(x) = (e^x − e^−x) / (e^x + e^−x)

2. f(x) = 1 / (1 + e^−x)

The first function can also be written as 1 − 2e^−x / (e^x + e^−x), after adding and also subtracting e^−x in the numerator and then simplifying. If you now multiply both the numerator and the denominator of the second term by e^x, you get 1 − 2 / (e^2x + 1). As x approaches −∞, this function goes to −1, and as x approaches +∞, it goes to +1. On the other hand, the second function, the sigmoid logistic function, goes to 0 as x approaches −∞, and to +1 as x approaches +∞. You can see this if you rewrite 1 / (1 + e^−x) as 1 − 1 / (1 + e^x), after manipulations similar to those above.

You can think of equation 1 as the bipolar equivalent of binary equation 2. Both functions have the same shape.
Figure 5.3 is the graph of the sigmoid logistic function (number 2 of the preceding list).

Figure 5.3 The sigmoid function.

The Step Function

The step function is also frequently used as a threshold function. The function is 0 to start with and remains so to the left of some threshold value θ. A jump to 1 occurs for values of the argument to the right of θ, and the function then remains at the level 1. In general, a step function can have a finite number of points at which jumps of equal or unequal size occur. When the jumps are equal and at many points, the graph will resemble a staircase. We are interested in a step function that goes from 0 to 1 in one step, as soon as the argument exceeds the threshold value θ. You could also use two values other than 0 and 1 to define the range of such a step function. A graph of the step function follows in Figure 5.4.

Figure 5.4 The step function.

Note: You can think of a sigmoid function as a fuzzy step function.

The Ramp Function

To describe the ramp function simply, first consider a step function that makes a jump from 0 to 1 at some point. Instead of letting it take a sudden jump at one point, let it gain in value gradually, along a straight line (hence the ramp), over a finite interval, from an initial 0 to a final 1. You can think of a ramp function as a piecewise linear approximation of a sigmoid. The graph of a ramp function is illustrated in Figure 5.5.

Figure 5.5 Graph of a ramp function.

Linear Function

A linear function is a simple one given by an equation of the form:

f(x) = αx + β

When α = 1, applying this threshold function amounts to simply adding a bias equal to β to the sum of the inputs.
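For reference, here are minimal C++ sketches of the threshold functions just discussed. The parameter names theta, alpha, and beta follow the text; the function signatures and the choice of a linear rise between two points a and b for the ramp are our own assumptions.

#include <cmath>

double sigmoidLogistic(double x) { return 1.0 / (1.0 + std::exp(-x)); }   // range (0, 1)
double sigmoidTanh(double x)     { return std::tanh(x); }                 // range (-1, 1)

double stepFn(double x, double theta) { return x > theta ? 1.0 : 0.0; }   // single jump at theta

// Ramp: 0 up to a, rises linearly to 1 over [a, b], then stays at 1
double rampFn(double x, double a, double b) {
    if (x <= a) return 0.0;
    if (x >= b) return 1.0;
    return (x - a) / (b - a);
}

double linearFn(double x, double alpha, double beta) { return alpha * x + beta; }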
Applications

As briefly indicated before, the areas of application generally include auto- and heteroassociation, pattern recognition, data compression, data completion, signal filtering, image processing, forecasting, handwriting recognition, and optimization. The type of connections in the network and the type of learning algorithm used must be chosen as appropriate to the application. For example, a network with lateral connections can do autoassociation, while a feed-forward type can do forecasting.

Some Neural Network Models

Adaline and Madaline

Adaline is the acronym for adaptive linear element, due to Bernard Widrow and Marcian Hoff. It is similar to a Perceptron. Inputs are real numbers in the interval [−1, +1], and learning is based on the criterion of minimizing the average squared error. Adaline has a high capacity to store patterns. Madaline stands for many Adalines and is a widely used neural network. It is composed of field A and field B neurons, and there is one connection from each field A neuron to each field B neuron. Figure 5.6 shows a diagram of the Madaline.

Figure 5.6 The Madaline model.
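As an illustration of the learning criterion just mentioned, here is a minimal C++ sketch of a single Widrow-Hoff (least-mean-squares) training step for an Adaline-like linear unit: the weights move in proportion to the error between the desired and the computed output. The function names and the learning-rate value are assumptions for illustration, not code from the book.

#include <vector>

double adalineOutput(const std::vector<double>& w, const std::vector<double>& x) {
    double sum = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) sum += w[i] * x[i];   // weighted sum of inputs
    return sum;
}

void adalineTrainStep(std::vector<double>& w, const std::vector<double>& x,
                      double desired, double eta = 0.1) {
    double error = desired - adalineOutput(w, x);                    // error against desired output
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] += eta * error * x[i];                                  // LMS weight update
}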
Backpropagation

The backpropagation training algorithm for feed-forward networks was developed by Paul Werbos, and later by Parker, and by Rumelhart and McClelland. This type of network configuration is the most common in use, due to its ease of training. It is estimated that over 80% of all neural network projects in development use backpropagation. In backpropagation, there are two phases in the learning cycle: one to propagate the input pattern through the network, and the other to adapt the output by changing the weights in the network. The error signals are backpropagated through the network to the hidden layer(s). The portion of the error signal that a hidden-layer neuron receives in this process is an estimate of that neuron's contribution to the output error. By adjusting the connection weights on this basis, the squared error, or some other metric, is reduced in each cycle and finally minimized, if possible.

Figure for Backpropagation Network

You will find in Figure 7.1 in Chapter 7 the layout of the nodes that represent the neurons in a feed-forward backpropagation network and the connections between them. For now, try your hand at drawing this layout based on the following description, and compare your drawing with Figure 7.1. There are three fields of neurons. The connections are forward, from each neuron in a layer to every neuron in the next layer. There are no lateral or recurrent connections. Labels on connections indicate weights. Keep in mind that the number of neurons is not necessarily the same in different layers, and this fact should be evident in the notation for the weights.

Bidirectional Associative Memory

Bidirectional Associative Memory (BAM) and other models described in this section were developed by Bart Kosko. BAM is a network with feedback connections from the output layer to the input layer. It associates a member of the set of input patterns with the closest member of the set of output patterns, and thus it does heteroassociation. The patterns can have binary or bipolar values. If all possible input patterns are known, the matrix of connection weights can be determined as the sum of matrices obtained by taking the matrix product of each input vector (as a column vector) with the transpose of its associated output vector (written as a row vector). The pattern obtained from the output layer in one cycle of operation is fed back to the input layer at the start of the next cycle. The process continues until the network stabilizes on all the input patterns. The stable state so achieved is described as resonance, a concept used in Adaptive Resonance Theory. A short code sketch of this recall cycle follows below.

You will find in Figure 8.1 in Chapter 8 the layout of the nodes that represent the neurons in a BAM network and the connections between them. There are two fields of neurons. The network is fully connected with forward and feedback connections. There are no lateral or recurrent connections.

Fuzzy Associative Memories are similar to Bidirectional Associative Memories, except that the association is established between fuzzy patterns. Chapter 9 deals with Fuzzy Associative Memories.
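Below is a minimal C++ sketch of the BAM recall cycle just described, assuming bipolar (+1/−1) patterns: the weight matrix is built as the sum of outer products of associated pattern pairs, and recall bounces a pattern between the two layers until nothing changes. The names, the initial state of the output field, and the tie-handling convention (keep the previous value when an activation is exactly zero) are our assumptions, not the book's code.

#include <cstddef>
#include <vector>

using Vec = std::vector<int>;
using Mat = std::vector<std::vector<int>>;

Mat buildWeights(const std::vector<Vec>& xs, const std::vector<Vec>& ys) {
    Mat W(xs[0].size(), Vec(ys[0].size(), 0));
    for (std::size_t p = 0; p < xs.size(); ++p)
        for (std::size_t i = 0; i < xs[p].size(); ++i)
            for (std::size_t j = 0; j < ys[p].size(); ++j)
                W[i][j] += xs[p][i] * ys[p][j];          // outer product, summed over pairs
    return W;
}

int sgn(int v, int prev) { return v > 0 ? 1 : (v < 0 ? -1 : prev); }   // keep old value on a tie

Vec recall(const Mat& W, Vec x) {
    Vec y(W[0].size(), 1);                               // arbitrary starting state for the B field
    for (bool changed = true; changed; ) {
        changed = false;
        for (std::size_t j = 0; j < y.size(); ++j) {     // forward pass: A field drives B field
            int act = 0;
            for (std::size_t i = 0; i < x.size(); ++i) act += x[i] * W[i][j];
            int v = sgn(act, y[j]);
            if (v != y[j]) { y[j] = v; changed = true; }
        }
        for (std::size_t i = 0; i < x.size(); ++i) {     // backward pass: feedback to A field
            int act = 0;
            for (std::size_t j = 0; j < y.size(); ++j) act += W[i][j] * y[j];
            int v = sgn(act, x[i]);
            if (v != x[i]) { x[i] = v; changed = true; }
        }
    }
    return y;                                            // stable associated pattern (resonance)
}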
Temporal Associative Memory

Another type of associative memory is temporal associative memory. Amari, a pioneer in the field of neural networks, constructed a Temporal Associative Memory model that has feedback connections between the input and output layers. The forte of this model is that it can store and retrieve spatiotemporal patterns. An example of a spatiotemporal pattern is a waveform of a speech segment.
Brain-State-in-a-Box

Introduced by James Anderson and others, this network differs from the single-layer, fully connected Hopfield network in that the Brain-State-in-a-Box uses what we call recurrent connections as well: each neuron has a connection to itself. With target patterns available, a modified Hebbian learning rule is used. The adjustment to a connection weight is proportional to the product of the desired output and the error in the computed output. You will see more on Hebbian learning in Chapter 6. This network is tolerant of noise, and it can accomplish pattern completion. Figure 5.7 shows a Brain-State-in-a-Box network.
Figure 5.7 A Brain-State-in-a-Box network.

What's in a Name?

More like, what's in the box? Suppose you find the following: there is a square box, and its corners are the locations where an entity can be. The entity is not at one of the corners, but at some point inside the box. The next position for the entity is determined by working out the change in each coordinate of the position, according to a weight matrix and a squashing function. This process is repeated until the entity settles down at some position. The choice of the weight matrix is such that when the entity reaches a corner of the square box, its position is stable and no more movement takes place. You would perhaps guess that the entity finally settles at the corner nearest to its initial position within the box. It is said that this kind of example is the reason for the name Brain-State-in-a-Box for the model. Its forte is that it represents linear transformations. Some type of association of patterns can be achieved with this model. If an incomplete pattern is associated with a completed pattern, it would be an example of autoassociation.
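The settling process described above can be sketched in a few lines of C++: each coordinate of the state is nudged according to the weight matrix and then squashed so it never leaves the box [−1, 1]. The feedback factor alpha and the function name are assumptions for illustration; repeated calls to bsbStep drive the state toward a stable corner.

#include <algorithm>
#include <vector>

std::vector<double> bsbStep(const std::vector<std::vector<double>>& W,
                            const std::vector<double>& state, double alpha = 0.2) {
    std::vector<double> next(state.size());
    for (std::size_t i = 0; i < state.size(); ++i) {
        double change = 0.0;
        for (std::size_t j = 0; j < state.size(); ++j)
            change += W[i][j] * state[j];                 // change in this coordinate, from W
        next[i] = std::clamp(state[i] + alpha * change,
                             -1.0, 1.0);                  // squash so the state stays in the box
    }
    return next;                                          // iterate until the state settles at a corner
}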
Counterpropagation

This is a neural network model, developed by Robert Hecht-Nielsen, that has one or two additional layers between the input and output layers. If there is one, the middle layer is a Grossberg layer with a bunch of outstars. In the other case, a Kohonen layer, or self-organizing layer, follows the input layer and is in turn followed by a Grossberg layer of outstars. The model has the distinction of considerably reducing training time. With this model, you gain a tool that works like a look-up table.
Neocognitron

Compared to all other neural network models, Fukushima's Neocognitron is more complex and ambitious. It demonstrates the advantages of a multilayered network. The Neocognitron is one of the best models for recognizing handwritten symbols. Many pairs of layers, called the S layer (for simple layer) and the C layer (for complex layer), are used. Within each S layer are several planes containing simple cells. Similarly, within each C layer there are an equal number of planes containing complex cells. The input layer does not have this arrangement and is like the input layer in any other neural network. Since the number of planes of simple cells and of complex cells within a pair of S and C layers is the same, these planes are paired, and the complex-plane cells process the outputs of the simple-plane cells. The simple cells are trained so that the response of a simple cell corresponds to a specific portion of the input image. If the same part of the image occurs with some distortion, in terms of scaling or rotation, a different set of simple cells responds to it. The output of a complex cell indicates that some simple cell it corresponds to has fired. While simple cells respond to what is in a contiguous region of the image, complex cells respond on the basis of a larger region. As the process continues to the output layer, the C-layer component of the output layer responds to the entire image presented at the beginning to the input layer.