C++ Neural Networks and Fuzzy Logic
by Valluru B. Rao
M&T Books, IDG Books Worldwide, Inc.
ISBN: 1558515526  Pub Date: 06/01/95
Table 5.9 Adjustment of Weights

step   w1   w2   a   b   activation   output   comment
1      0    0    1   1   0            0        desired output is 1; increment both w's
2      1    1    1   1   2            1        output is what it should be
3      1    1    1   0   1            1        output is what it should be
4      1    1    0   1   1            1        output is 1; it should be 0
5                                              subtract 1 from w2
6      1    0    0   1   0            0        output is what it should be
7      1    0    0   0   0            0        output is what it should be
8      1    0    1   1   1            1        output is what it should be
9      1    0    1   0   1            1        output is what it should be
Table 5.9 shows that the network weight vector changed from an initial vector (0, 0) to the final weight vector (1, 0) in eight iterations. This example is not of a network for pattern matching. If you think about it, you will realize that the network is designed to fire if the first digit in the pattern is a 1, and not otherwise. An analogy for this kind of problem is determining whether a given image contains a specific object in a specific part of the image, such as the dot that should occur in the letter i.

If the initial weights are chosen prudently, with some relevance to the problem, then the speed of operation can be increased, in the sense that convergence is achieved with fewer iterations than otherwise. Thus, encoding algorithms are important. We now present some of the encoding algorithms.

Initializing Weights for Autoassociative Networks

Consider a network that is to associate each input pattern with itself and that gets binary patterns as inputs. Make a bipolar mapping on the input pattern. That is, replace each 0 by −1. Call the mapped pattern the vector x, written as a column vector. You will get a square matrix of order equal to the size of x when you form the product x x^T. Obtain similar matrices for the other patterns you want the network to store. Add these matrices to give you the matrix of weights to be used initially, as we did in Chapter 4. This process can be described with the following equation:

W = Σ_i x_i x_i^T
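As a concrete illustration, here is a minimal C++ sketch of this initialization, assuming the binary patterns are stored as vectors of 0s and 1s; the function name and types are ours, not code from the book. For the heteroassociative case described next, the only change is that the outer product uses the associated pattern y in place of x (x[i] * y[j] instead of x[i] * x[j]).

#include <cstddef>
#include <vector>

using Pattern = std::vector<int>;
using Matrix  = std::vector<std::vector<int>>;

Matrix initAutoassociativeWeights(const std::vector<Pattern>& patterns) {
    const std::size_t n = patterns[0].size();
    Matrix W(n, std::vector<int>(n, 0));
    for (const Pattern& p : patterns) {
        std::vector<int> x(n);
        for (std::size_t i = 0; i < n; ++i)
            x[i] = (p[i] == 0) ? -1 : 1;          // bipolar mapping: replace each 0 by -1
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                W[i][j] += x[i] * x[j];           // add the outer product x x^T for this pattern
    }
    return W;                                     // initial weight matrix
}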
Weight Initialization for Heteroassociative Networks

Consider a network that is to associate one input pattern with another pattern and that gets binary patterns as inputs. Make a bipolar mapping on the input pattern. That is, replace each 0 by −1. Call the mapped pattern the vector x when written as a column vector. Get a similar bipolar mapping for the corresponding associated pattern. Call it y. You will get a matrix of size x by size y when you form the product x y^T. Obtain similar matrices for the other pattern pairs you want the network to store. Add these matrices to give you the matrix of weights to be used initially. The following equation restates this process:

W = Σ_i x_i y_i^T

On Center, Off Surround

One of the many interesting paradigms you encounter in neural network models and theory is the strategy of winner takes all. If one winner is to emerge from a crowd of neurons in a particular layer, there needs to be competition. Since in such a competition every neuron is for itself, lateral connections are needed to express this circumstance: the lateral connections from any neuron to the others should have a negative weight. Alternatively, the neuron with the highest activation is considered the winner, and only its weights are modified in the training process, leaving the weights of the others the same. Winner takes all means that only one neuron in that layer fires and the others do not. This can happen in a hidden layer or in the output layer.
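The winner-takes-all selection itself is simple to express in code. The following C++ fragment is a minimal sketch of the interpretation just given: the neuron with the highest activation fires, and all others are suppressed. The function name and the use of std::vector are our own illustration, and the fragment assumes a non-empty activation vector.

#include <algorithm>
#include <vector>

std::vector<int> winnerTakesAll(const std::vector<double>& activations) {
    std::vector<int> output(activations.size(), 0);            // every neuron off by default
    auto winner = std::max_element(activations.begin(), activations.end());
    output[winner - activations.begin()] = 1;                  // only the winner fires
    return output;
}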
In another situation, when a particular category of input is to be identified from among several groups of inputs, a subset of the neurons has to be dedicated to detecting that category. In this case, as far as such a subset of neurons is concerned, inhibition increases for distant neurons, whereas excitation increases for the neighboring ones. The phrase on center, off surround describes this phenomenon of distant inhibition and near excitation.

Weights

Weights are also prime components in a neural network, as they reflect, on the one hand, the memory stored by the network and, on the other, the basis for learning and training.
Inputs

You have seen that mutually orthogonal or almost orthogonal patterns are required as stable stored patterns for the Hopfield network, which we discussed before for pattern matching. Similar restrictions are found also with other neural networks. Sometimes it is not a restriction; rather, the purpose of the model makes a certain type of input natural. Certainly, in the context of pattern classification, binary input patterns make the problem setup simpler. Binary, bipolar, and analog signals are the varieties of inputs. Networks that accept analog signals as inputs are for continuous models, and those that require binary or bipolar inputs are for discrete models. Binary inputs can be fed to networks for continuous models, but analog signals cannot be input to networks for discrete models (unless they are fuzzified). With the input possibilities being discrete or analog, and the model possibilities being discrete or continuous, there are potentially four situations, but one of them, where analog inputs are considered for a discrete model, is untenable. An example of a continuous model is a network that is to adjust the angle by which the steering wheel of a truck is turned in order to back the truck into a parking space. If a network is supposed to recognize characters of the alphabet, a means of discretizing a character allows the use of a discrete model.

What are the types of inputs for problems like image processing or handwriting analysis? Remembering that artificial neurons, as processing elements, aggregate their inputs using connection weights, and that the output neuron uses a threshold function, you know that the inputs have to be numerical. A handwritten character can be superimposed on a grid, and the input can consist of the cells in each row of the grid where a part of the character is present. In other words, the input corresponding to one character will be a set of binary or gray-scale sequences, one sequence for each row of the grid. A 1 in a particular position in the sequence for a row shows that the corresponding pixel is present (black) in that part of the grid, while a 0 shows it is not. The size of the grid has to be big enough to accommodate the largest character under study, as well as the most complex features.
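To make the grid idea concrete, here is a minimal C++ sketch that flattens a character drawn on a grid into a binary input vector, one sequence of 0s and 1s per row. The '*' and '.' grid encoding, the function name, and the sample letter are assumptions made for illustration only.

#include <string>
#include <vector>

std::vector<int> gridToInput(const std::vector<std::string>& grid) {
    std::vector<int> input;
    for (const std::string& row : grid)              // one sequence per row of the grid
        for (char cell : row)
            input.push_back(cell == '*' ? 1 : 0);    // pixel present (black) -> 1, absent -> 0
    return input;
}

// Example: a crude letter i on a 5x5 grid, dot included
// std::vector<std::string> letterI = { "..*..", ".....", "..*..", "..*..", "..*.." };
// std::vector<int> x = gridToInput(letterI);        // 25 binary inputs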
Outputs

The output from some neural networks is a spatial pattern, such as a bit pattern; from some, a binary function value; and from others, an analog signal. The type of mapping intended for the inputs determines the type of outputs, naturally. The output could be a classification of the input data, or an association between patterns of the same dimension as the input. The threshold functions do the final mapping of the activations of the output neurons into the network outputs. But the outputs from a single cycle of operation of a neural network may not be the final outputs, since you would iterate the network through further cycles of operation until you see convergence. If convergence seems possible but is taking a great deal of time and effort, that is, if the network is too slow to learn, you may assign a tolerance level and settle for the network achieving near convergence.

The Threshold Function

The output of any neuron is the result of thresholding, if any, of its internal activation, which, in turn, is the weighted sum of the neuron's inputs. Thresholding is sometimes done for the sake of scaling down the activation and mapping it into a meaningful output for the problem, and sometimes for adding a bias. Thresholding (scaling) is important for multilayer networks to preserve a meaningful range across each layer's operations. The most often used threshold function is the sigmoid function. A step function, a ramp function, or just a linear function can also be used, as when you simply add the bias to the activation. The sigmoid function maps the activation into the interval [0, 1]. The equations for the different threshold functions just mentioned are given below.
More than one function goes by the name sigmoid function. They differ in their formulas and in their ranges. They all have a graph similar to a stretched letter s. We give below two such functions. The first is the hyperbolic tangent function, with values in (−1, 1). The second is the logistic function, with values between 0 and 1. You therefore choose the one that fits the range you want. The graph of the sigmoid logistic function is given in Figure 5.3.

1. f(x) = tanh(x) = (e^x − e^−x) / (e^x + e^−x)

2. f(x) = 1 / (1 + e^−x)

The first function can also be written as 1 − 2e^−x / (e^x + e^−x), after adding and also subtracting e^−x in the numerator and then simplifying. If you now multiply both the numerator and the denominator of the second term by e^x, you get 1 − 2 / (e^2x + 1). As x approaches −∞, this function goes to −1, and as x approaches +∞, it goes to +1. On the other hand, the second function, the sigmoid logistic function, goes to 0 as x approaches −∞, and to +1 as x approaches +∞. You can see this if you rewrite 1 / (1 + e^−x) as 1 − 1 / (1 + e^x), after manipulations similar to those above.

You can think of equation 1 as the bipolar equivalent of binary equation 2. Both functions have the same shape.
Figure 5.3 is the graph of the sigmoid logistic function (number 2 of the preceding list).

Figure 5.3 The sigmoid function.

The Step Function

The step function is also frequently used as a threshold function. The function is 0 to start with and remains so to the left of some threshold value θ. A jump to 1 occurs for values of the argument to the right of θ, and the function then remains at the level 1. In general, a step function can have a finite number of points at which jumps of equal or unequal size occur. When the jumps are equal and at many points, the graph will resemble a staircase. We are interested in a step function that goes from 0 to 1 in one step, as soon as the argument exceeds the threshold value θ. You could also use two values other than 0 and 1 to define the range of such a step function. A graph of the step function follows in Figure 5.4.

Figure 5.4 The step function.

Note: You can think of a sigmoid function as a fuzzy step function.

The Ramp Function

To describe the ramp function simply, first consider a step function that makes a jump from 0 to 1 at some point. Instead of letting it take a sudden jump at one point, let it gain in value gradually, along a straight line (hence the ramp), over a finite interval, from an initial 0 to a final 1. You can think of a ramp function as a piecewise linear approximation of a sigmoid. The graph of a ramp function is illustrated in Figure 5.5.

Figure 5.5 Graph of a ramp function.

Linear Function

A linear function is a simple one given by an equation of the form:

f(x) = αx + β

When α = 1, applying this threshold function amounts to simply adding a bias equal to β to the sum of the inputs.
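For reference, here are minimal C++ sketches of the threshold functions just discussed. The parameter names theta, alpha, and beta follow the text; the function signatures and the choice of a linear rise between two points a and b for the ramp are our own assumptions.

#include <cmath>

double sigmoidLogistic(double x) { return 1.0 / (1.0 + std::exp(-x)); }   // range (0, 1)
double sigmoidTanh(double x)     { return std::tanh(x); }                 // range (-1, 1)

double stepFn(double x, double theta) { return x > theta ? 1.0 : 0.0; }   // single jump at theta

// Ramp: 0 up to a, rises linearly to 1 over [a, b], then stays at 1
double rampFn(double x, double a, double b) {
    if (x <= a) return 0.0;
    if (x >= b) return 1.0;
    return (x - a) / (b - a);
}

double linearFn(double x, double alpha, double beta) { return alpha * x + beta; }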
Applications

As briefly indicated before, the areas of application generally include auto- and heteroassociation, pattern recognition, data compression, data completion, signal filtering, image processing, forecasting, handwriting recognition, and optimization. The type of connections in the network and the type of learning algorithm used must be chosen as appropriate to the application. For example, a network with lateral connections can do autoassociation, while a feed-forward type can do forecasting.

Some Neural Network Models

Adaline and Madaline

Adaline is the acronym for adaptive linear element, due to Bernard Widrow and Marcian Hoff. It is similar to a Perceptron. Inputs are real numbers in the interval [−1, +1], and learning is based on the criterion of minimizing the average squared error. Adaline has a high capacity to store patterns. Madaline stands for many Adalines and is a widely used neural network. It is composed of field A and field B neurons, and there is one connection from each field A neuron to each field B neuron. Figure 5.6 shows a diagram of the Madaline.

Figure 5.6 The Madaline model.
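As an illustration of the learning criterion just mentioned, here is a minimal C++ sketch of a single Widrow-Hoff (least-mean-squares) training step for an Adaline-like linear unit: the weights move in proportion to the error between the desired and the computed output. The function names and the learning-rate value are assumptions for illustration, not code from the book.

#include <vector>

double adalineOutput(const std::vector<double>& w, const std::vector<double>& x) {
    double sum = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) sum += w[i] * x[i];   // weighted sum of inputs
    return sum;
}

void adalineTrainStep(std::vector<double>& w, const std::vector<double>& x,
                      double desired, double eta = 0.1) {
    double error = desired - adalineOutput(w, x);                    // error against desired output
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] += eta * error * x[i];                                  // LMS weight update
}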
Backpropagation

The backpropagation training algorithm for feed-forward networks was developed by Paul Werbos, and later by Parker, and by Rumelhart and McClelland. This type of network configuration is the most common in use, due to its ease of training. It is estimated that over 80% of all neural network projects in development use backpropagation. In backpropagation, there are two phases in the learning cycle: one to propagate the input pattern through the network, and the other to adapt the output by changing the weights in the network. The error signals are backpropagated through the network to the hidden layer(s). The portion of the error signal that a hidden-layer neuron receives in this process is an estimate of that neuron's contribution to the output error. By adjusting the connection weights on this basis, the squared error, or some other metric, is reduced in each cycle and finally minimized, if possible.

Figure for Backpropagation Network

You will find in Figure 7.1 in Chapter 7 the layout of the nodes that represent the neurons in a feed-forward backpropagation network and the connections between them. For now, try your hand at drawing this layout based on the following description, and compare your drawing with Figure 7.1. There are three fields of neurons. The connections are forward, from each neuron in a layer to every neuron in the next layer. There are no lateral or recurrent connections. Labels on connections indicate weights. Keep in mind that the number of neurons is not necessarily the same in different layers, and this fact should be evident in the notation for the weights.

Bidirectional Associative Memory

Bidirectional Associative Memory (BAM) and other models described in this section were developed by Bart Kosko. BAM is a network with feedback connections from the output layer to the input layer. It associates a member of the set of input patterns with the closest member of the set of output patterns, and thus it does heteroassociation. The patterns can have binary or bipolar values. If all possible input patterns are known, the matrix of connection weights can be determined as the sum of matrices obtained by taking the matrix product of each input vector (as a column vector) with the transpose of its associated output vector (written as a row vector). The pattern obtained from the output layer in one cycle of operation is fed back to the input layer at the start of the next cycle. The process continues until the network stabilizes on all the input patterns. The stable state so achieved is described as resonance, a concept used in Adaptive Resonance Theory. A short code sketch of this recall cycle follows below.

You will find in Figure 8.1 in Chapter 8 the layout of the nodes that represent the neurons in a BAM network and the connections between them. There are two fields of neurons. The network is fully connected with forward and feedback connections. There are no lateral or recurrent connections.

Fuzzy Associative Memories are similar to Bidirectional Associative Memories, except that the association is established between fuzzy patterns. Chapter 9 deals with Fuzzy Associative Memories.
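Below is a minimal C++ sketch of the BAM recall cycle just described, assuming bipolar (+1/−1) patterns: the weight matrix is built as the sum of outer products of associated pattern pairs, and recall bounces a pattern between the two layers until nothing changes. The names, the initial state of the output field, and the tie-handling convention (keep the previous value when an activation is exactly zero) are our assumptions, not the book's code.

#include <cstddef>
#include <vector>

using Vec = std::vector<int>;
using Mat = std::vector<std::vector<int>>;

Mat buildWeights(const std::vector<Vec>& xs, const std::vector<Vec>& ys) {
    Mat W(xs[0].size(), Vec(ys[0].size(), 0));
    for (std::size_t p = 0; p < xs.size(); ++p)
        for (std::size_t i = 0; i < xs[p].size(); ++i)
            for (std::size_t j = 0; j < ys[p].size(); ++j)
                W[i][j] += xs[p][i] * ys[p][j];          // outer product, summed over pairs
    return W;
}

int sgn(int v, int prev) { return v > 0 ? 1 : (v < 0 ? -1 : prev); }   // keep old value on a tie

Vec recall(const Mat& W, Vec x) {
    Vec y(W[0].size(), 1);                               // arbitrary starting state for the B field
    for (bool changed = true; changed; ) {
        changed = false;
        for (std::size_t j = 0; j < y.size(); ++j) {     // forward pass: A field drives B field
            int act = 0;
            for (std::size_t i = 0; i < x.size(); ++i) act += x[i] * W[i][j];
            int v = sgn(act, y[j]);
            if (v != y[j]) { y[j] = v; changed = true; }
        }
        for (std::size_t i = 0; i < x.size(); ++i) {     // backward pass: feedback to A field
            int act = 0;
            for (std::size_t j = 0; j < y.size(); ++j) act += W[i][j] * y[j];
            int v = sgn(act, x[i]);
            if (v != x[i]) { x[i] = v; changed = true; }
        }
    }
    return y;                                            // stable associated pattern (resonance)
}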
Temporal Associative Memory

Another type of associative memory is temporal associative memory. Amari, a pioneer in the field of neural networks, constructed a Temporal Associative Memory model that has feedback connections between the input and output layers. The forte of this model is that it can store and retrieve spatiotemporal patterns. An example of a spatiotemporal pattern is a waveform of a speech segment.
Brain-State-in-a-Box

Introduced by James Anderson and others, this network differs from the single-layer, fully connected Hopfield network in that the Brain-State-in-a-Box uses what we call recurrent connections as well: each neuron has a connection to itself. With target patterns available, a modified Hebbian learning rule is used. The adjustment to a connection weight is proportional to the product of the desired output and the error in the computed output. You will see more on Hebbian learning in Chapter 6. This network is tolerant of noise, and it can accomplish pattern completion. Figure 5.7 shows a Brain-State-in-a-Box network.
Figure 5.7 A Brain-State-in-a-Box network.

What's in a Name?

More like, what's in the box? Suppose you find the following: there is a square box, and its corners are the locations where an entity can be. The entity is not at one of the corners, but at some point inside the box. The next position for the entity is determined by working out the change in each coordinate of the position, according to a weight matrix and a squashing function. This process is repeated until the entity settles down at some position. The choice of the weight matrix is such that when the entity reaches a corner of the square box, its position is stable and no more movement takes place. You would perhaps guess that the entity finally settles at the corner nearest to its initial position within the box. It is said that this kind of example is the reason for the name Brain-State-in-a-Box for the model. Its forte is that it represents linear transformations. Some type of association of patterns can be achieved with this model. If an incomplete pattern is associated with a completed pattern, it would be an example of autoassociation.
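The settling process described above can be sketched in a few lines of C++: each coordinate of the state is nudged according to the weight matrix and then squashed so it never leaves the box [−1, 1]. The feedback factor alpha and the function name are assumptions for illustration; repeated calls to bsbStep drive the state toward a stable corner.

#include <algorithm>
#include <vector>

std::vector<double> bsbStep(const std::vector<std::vector<double>>& W,
                            const std::vector<double>& state, double alpha = 0.2) {
    std::vector<double> next(state.size());
    for (std::size_t i = 0; i < state.size(); ++i) {
        double change = 0.0;
        for (std::size_t j = 0; j < state.size(); ++j)
            change += W[i][j] * state[j];                 // change in this coordinate, from W
        next[i] = std::clamp(state[i] + alpha * change,
                             -1.0, 1.0);                  // squash so the state stays in the box
    }
    return next;                                          // iterate until the state settles at a corner
}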
Counterpropagation

This is a neural network model, developed by Robert Hecht-Nielsen, that has one or two additional layers between the input and output layers. If there is one, the middle layer is a Grossberg layer with a bunch of outstars. In the other case, a Kohonen layer, or self-organizing layer, follows the input layer and is in turn followed by a Grossberg layer of outstars. The model has the distinction of considerably reducing training time. With this model, you gain a tool that works like a look-up table.
Neocognitron

Compared to all other neural network models, Fukushima's Neocognitron is more complex and ambitious. It demonstrates the advantages of a multilayered network. The Neocognitron is one of the best models for recognizing handwritten symbols. Many pairs of layers, called the S layer (for simple layer) and the C layer (for complex layer), are used. Within each S layer are several planes containing simple cells. Similarly, within each C layer there are an equal number of planes containing complex cells. The input layer does not have this arrangement and is like the input layer in any other neural network. Since the number of planes of simple cells and of complex cells within a pair of S and C layers is the same, these planes are paired, and the complex-plane cells process the outputs of the simple-plane cells. The simple cells are trained so that the response of a simple cell corresponds to a specific portion of the input image. If the same part of the image occurs with some distortion, in terms of scaling or rotation, a different set of simple cells responds to it. The output of a complex cell indicates that some simple cell it corresponds to has fired. While simple cells respond to what is in a contiguous region of the image, complex cells respond on the basis of a larger region. As the process continues to the output layer, the C-layer component of the output layer responds to the entire image presented at the beginning to the input layer.