Lecture Notes in Computer Science
3.2 Development of Labyrinthine Receptive Fields

If the correlations have the shape of a "Mexican hat" in (4) and (5), the boundaries between the sub-areas become labyrinthine after a sufficient number of trials. Thus, the model fails to develop simple receptive fields. Interestingly, if the value of λ is additionally increased to 1.0, the labyrinth almost always rotates along the ring. In the case of Fig. 2d, the visual receptive fields rotate clockwise along the downward direction in the column. Note that the third property described above still holds, i.e. the configuration of the sub-areas is similar between two adjacent cells on the ring and reversed between diagonally opposed cells. The rotation along the ring is a compromise between the labyrinthine configurations and the third property.

4 Discussion

If the correlations of spontaneous activities among same-type LGN cells were assumed to have the shape of a "Mexican hat", labyrinthine visual receptive fields emerged, presumably due to the inhibitory peripheries of the hat. However, if the correlations were assumed to be Gaussian, simple receptive fields often emerged. As a matter of fact, Gaussian-shaped correlations have been experimentally observed in the developing LGN of ferrets [10]. The same authors also showed theoretically that Gaussian correlations result in simple receptive fields only if the On and Off synaptic weights are constrained separately [10]. This separate constraint is, however, biologically less plausible than the joint constraint [14]. The present study showed that the Gaussian correlation plus the joint constraint results in simple receptive fields.

We assumed an intra-cortical interaction in which nearby cells on the ring mutually excite each other and diagonally opposed cells on the ring mutually inhibit each other.
These interactions could be due to excitatory and inhibitory synaptic connections, or otherwise to near diffusion of some excitatory substances and far diffusion of some inhibitory ones. The latter possibility is reminiscent of the chemical explanation behind Turing's reaction-diffusion equation [15], which also produces stripes and labyrinths [16].

The configurations of the receptive fields after a sufficient number of trials were either linearly translated along the ring, as shown in Fig. 2c, or rotated along it, as in Fig. 2d. These configurations could be interpreted as neural representations on the ring of translational and rotational motions, respectively.

References

1. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. 160, 106–154 (1962)
2. Marcelja, S.: Mathematical description of the responses of simple cortical cells. J. Opt. Soc. Am. 70, 1297–1300 (1980)
3. DeAngelis, G.C., Ohzawa, I., Freeman, R.D.: Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. J. Neurophysiol. 69, 1091–1117 (1993)
4. Hamada, T., Yamashima, M., Kato, K.: A ring model for spatiotemporal properties of simple cells in the visual cortex. Biol. Cyb. 77, 225–233 (1997)
5. Blakemore, C., Van Sluyters, R.C.: Innate and environmental factors in the development of the kitten's visual cortex. J. Physiol. 248, 663–716 (1975)
6. Chapman, B., Godecke, I.: Cortical cell orientation selectivity fails to develop in the absence of On-center retinal ganglion cell activity. J. Neurosci. 20, 1922–1930 (2000)
7. Sengpiel, F., Kind, P.C.: The role of activity in development of the visual system. Curr. Biol. 12, 818–826 (2002)
8. Miller, K.D.: A model for the development of simple cell receptive fields and the ordered arrangement of orientation columns through activity-dependent competition between on- and off-center inputs. J. Neurosci. 14, 409–441 (1994)
9. Hamada, T., Kato, K., Okada, K.: A model for development of Gabor-receptive fields in simple cortical cells. NeuroReport 7, 745–748 (1996)
10. Ohshiro, T., Weliky, M.: Simple fall-off pattern of correlated neural activity in the developing lateral geniculate nucleus. Nature Neurosci. 9, 1541–1548 (2006)
11. Mastronarde, D.N.: Correlated firing of retinal ganglion cells. Trends Neurosci. 12, 75–80 (1989)
12. Goodman, C.S., Shatz, C.J.: Developmental mechanisms that generate precise patterns of neuronal connectivity. Cell 72/Neuron 10(suppl.), 77–98 (1993)
13. Ahmed, B., Anderson, J.C., Douglas, R.J., Martin, K.A.C., Nelson, J.C.: Polyneuronal innervation of spiny stellate neurons in cat visual cortex. J. Comp. Neurol. 341, 39–49 (1994)
14. Willshaw, D.J., von der Malsburg, C.: How patterned neural connections can be set up by self-organization. Biol. Cyb. 58, 63–70 (1988)
15. Turing, A.M.: The chemical basis of morphogenesis. Phil. Trans. Royal Soc. London B 237, 37–72 (1952)
16. Shoji, H., Iwasa, Y.: Labyrinthine versus straight-striped patterns generated by two-dimensional Turing systems. J. Theor. Biol. 237, 104–116 (2005)
M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 228–237, 2008.
© Springer-Verlag Berlin Heidelberg 2008

Practical Recurrent Learning (PRL) in the Discrete Time Domain

Mohamad Faizal Bin Samsudin, Takeshi Hirose, and Katsunari Shibata

Department of Electrical and Electronic Engineering, Oita University,
700 Dannoharu, Oita 870-1192, Japan
shibata@cc.oita-u.ac.jp
recurrent neural networks, which requires computational cost and memory capacity of practical order O(n^2) [1]. The algorithm was formulated in the continuous time domain, and it was shown that a sequential NAND problem could be successfully learned by it. In this paper, the authors name this learning "Practical Recurrent Learning (PRL)", and the learning algorithm is simplified and converted to the discrete time domain for easy analysis. It is shown that a sequential EXOR problem and a 3-bit parity problem, as non-linearly-separable problems, can be learned by PRL, even though the learning performance is often quite inferior to that of BPTT, one of the most popular learning algorithms for recurrent neural networks. Furthermore, the learning process is observed and the character of PRL is shown.
Keywords: Practical Recurrent Learning (PRL), BPTT, Short-Term Memory.

1 Introduction

When we think of the higher functions in humans, such as logical thinking and conversation, it is easily noticed that memory plays an important role in these functions. Accordingly, the need for recurrent neural networks (RNNs) is expected to grow drastically in the near future as the demand for such higher functions increases. Conventionally, two popular learning algorithms for recurrent neural networks have been proposed: BPTT (Back Propagation Through Time) [2] and RTRL (Real Time Recurrent Learning) [3]. In BPTT, all the past states of the network are stored using O(nT) of memory, where n is the number of neurons and T is the present time step, and learning is done by tracing back to the past using this memory. The order of the computational cost is O(n^2 T). The traced-back time steps are often truncated at a constant number when T becomes large, but it is difficult to know the sufficient number of steps. On the other hand, in RTRL, the influence of each connection weight on the output of each neuron is kept in O(n^3) of memory, and the order of the computation of this influence is as large as O(n^4).
BPTT is not practical in the sense that learning must be done by tracing back to the past; even if special hardware is developed, iteration over the traceback is still necessary. RTRL is not practical in the sense that the required order O(n^3) in memory capacity and O(n^4) in computational cost are larger than O(n^2), which is the order of the number of connections in a neural network. Even if each connection has some memory, the memory on a connection should have a size that does not depend on the size of the neural network.

S. Hochreiter and J. Schmidhuber have proposed a special network architecture that has memory cells. In each memory cell, there is a linear unit with a fixed-weight self-connection that enables constant, non-vanishing error flow within the memory cell [4]. They used a variant of RTRL, and only O(n^2) of computational cost is required. However, the special structure is necessary, and the method cannot be applied to general recurrent neural networks. Therefore, a practical learning algorithm for general recurrent neural networks that needs O(n^2) or less memory and O(n^2) or less computational cost is strongly required. For this purpose, Practical Recurrent Learning (PRL) was proposed in the continuous time domain [1]. In this paper, PRL is simplified and converted to the discrete time domain for easy analysis, and its learning performance is compared to BPTT.
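The orders quoted above can be made concrete with a rough count of stored elements. The function below is only a back-of-envelope illustration of this sketch's own making; constant factors and per-element sizes are ignored.

```python
def memory_orders(n, T):
    # Rough element counts behind the orders quoted in the text
    # (constant factors ignored).
    return {
        "BPTT": n * T,    # all past activations: O(nT) memory
        "RTRL": n ** 3,   # influence of every weight on every unit: O(n^3)
        "PRL": n ** 2,    # a few scalar traces per connection: O(n^2)
    }

# For a moderate network and a long sequence, BPTT's memory grows with T,
# RTRL's cubic term dominates in n, and PRL stays at the order of the
# number of connections.
m = memory_orders(n=100, T=1000)
```

The comparison makes the paper's motivation visible: only the PRL row is independent of the sequence length and stays at the size of the weight matrix itself.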
2 Practical Recurrent Learning (PRL)

Here, PRL is explained using an Elman-type recurrent neural network, as shown in Fig. 1.
Fig. 1. An Elman-type recurrent neural network
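As a concrete illustration of the network in Fig. 1, the following sketch implements the forward pass of a small Elman-type network. The layer sizes and random weights are illustrative assumptions of this sketch, not values from the paper; the sigmoid is shifted to the (-0.5, 0.5) range used in the text below.

```python
import math
import random

def sigmoid(s):
    # Sigmoid whose value range is (-0.5, 0.5), as in the paper
    return 1.0 / (1.0 + math.exp(-s)) - 0.5

def forward(x, context, w_hid, w_out):
    # Each hidden neuron sees the external inputs plus the previous
    # hidden outputs (the Elman context); each output neuron sees the
    # current hidden outputs. Every neuron takes a weighted sum and
    # applies the non-linear function.
    full_in = x + context
    hidden = [sigmoid(sum(w * v for w, v in zip(row, full_in))) for row in w_hid]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, output

random.seed(0)
n_in, n_hid, n_out = 2, 3, 1  # illustrative sizes
w_hid = [[random.uniform(-1, 1) for _ in range(n_in + n_hid)] for _ in range(n_hid)]
w_out = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]

context = [0.0] * n_hid
for x in ([0.5, -0.5], [-0.5, 0.5]):  # a short input sequence
    context, y = forward(x, context, w_hid, w_out)
```

Feeding the hidden outputs back as part of the next input is what gives the network its short-term memory; the learning rules below decide how the weights of such a network are modified.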
2.1 PRL in the Continuous Time Domain

This section roughly describes PRL in the continuous time domain as proposed in [1]. The forward calculation is the same as in a conventional neural network: each hidden or output neuron computes the weighted sum of its inputs and then applies the non-linear function f to obtain its output. Here, a sigmoid function whose value range is from -0.5 to 0.5 is used. In the output layer, the error signal is calculated as

δ_j^(3)(t) = Tr_j − x_j^(3)(t),   (1)

where Tr_j is the training signal and x_j^(3) is the output of the jth output unit. Differing from regular BP, the derivative of the output function f'_j^(3) is not included. As in regular BP, the error signal δ_i^(2) in the hidden layer is calculated from the δ_j^(3) in the upper layer, as described by the following equations.
δ_i^(2) = Σ_j v_ji δ_j^(3)(t)   (2)

d/dt v_ji = ( w_ji^(3)(t) f'(S_j^(3)(t)) − v_ji ) |d/dt x_j^(3)(t)|   (3)

where w_ji^(3) is the connection weight from the ith hidden unit to the jth output unit, and S_j^(3) is the net value of the jth neuron in the output layer. f' is included in this equation because it is omitted from Eq. (1), so that f' is applied at the times when the output actually changes. Then, in order to modify the weights without tracing back to the past, the following information should be held:

(a) the latest outputs of the pre-synaptic neurons,
(b) the outputs of the pre-synaptic neurons that changed recently among all the inputs to the post-synaptic neuron,
(c) the outputs of the pre-synaptic neurons that caused the change of the post-synaptic neuron's output.

Corresponding to (a), (b) and (c), three variables p(t), q(t) and r(t), which hold the past information in different ways, are introduced; they are always updated according to the following differential equations.
τ d/dt p_ji(t) = −p_ji(t) + x_i(t) f'(S_j(t))   (4)

d/dt q_ji(t) = ( x_i(t) f'(S_j(t)) − q_ji(t) ) Σ_i' |d/dt x_i'(t)|   (5)

d/dt r_ji(t) = ( x_i(t) f'(S_j(t)) − r_ji(t) ) |d/dt x_j(t)|   (6)
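A forward-Euler discretization can illustrate how the three traces behave. In this sketch, the gating of q and r by activity-change magnitudes follows the equations as reconstructed here, and the signal values, time constant and step size are made-up illustrative numbers, not values from the paper.

```python
import math

def f_prime(s):
    # Derivative of the standard sigmoid; the -0.5 output shift used in
    # the paper leaves the derivative unchanged.
    sig = 1.0 / (1.0 + math.exp(-s))
    return sig * (1.0 - sig)

def step(p, q, r, x_i, s_j, dx_inputs_sum, dx_j, dt=0.01, tau=1.0):
    # One forward-Euler step of the three memory traces (Eqs. (4)-(6)).
    target = x_i * f_prime(s_j)
    p += dt * (-p + target) / tau            # (4): leaky integrator of x_i f'(S_j)
    q += dt * (target - q) * dx_inputs_sum   # (5): updated when any input changes
    r += dt * (target - r) * dx_j            # (6): updated when the output changes
    return p, q, r

p = q = r = 0.0
# While the post-synaptic output is constant (dx_j = 0) and no input is
# changing, p still drifts toward x_i f'(S_j), but q and r hold their
# previous values exactly.
for _ in range(100):
    p, q, r = step(p, q, r, x_i=0.4, s_j=0.2, dx_inputs_sum=0.0, dx_j=0.0)
```

The run makes the qualitative difference visible: p is a plain leaky trace of recent pre-synaptic activity, while q and r only move at the moments their respective gating signals are non-zero.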
Using these three variables, each connection weight is modified. The following equation is one example; the details can be seen in [1]:

dw_ji(t) = ( p_ji(t) + q_ji(t) + r_ji(t) ) δ_j(t)   (7)

Among the three variables, r_ji(t) is considered to be particularly important for learning problems that need past information from before a long time lag. Fig. 2 shows an example of the temporal change of the variable r_ji(t) with respect to the input signal x_i(t) and the output signal x_j(t). As shown in Fig. 2, the important character of r_ji(t) is that it holds information about the output of the pre-synaptic neuron that caused the change of the post-synaptic neuron's output. The variable ignores the inputs while the output does not change. Accordingly, the variable is expected to keep past and important information without tracing back to the past.
Fig. 2. An example of the variable r_ji(t) transition. From equation (11), the variable r_ji(t) integrates the value of the input x_i(t) when the output x_j(t) changes, and holds the information of the previous state when the output does not change.

2.2 PRL in the Discrete Time Domain

In order to make the analysis of PRL learning easy, the PRL learning method in the discrete time domain is introduced here. The learning method is similar to the conventional Back Propagation method in the sense that each connection weight is modified according to the product of the propagated error signal δ of the post-synaptic (upper) neuron and a signal that represents the output x of the pre-synaptic (lower) neuron. Furthermore, to keep the learning process simple, the conventional BP method is used for the connection weights between the hidden layer and the output layer, and the PRL learning method is used only between the input layer and the hidden layer. In the output layer, the error signal δ_j^(3) is calculated as
δ_j^(3)(t) = −∂E(t)/∂S_j^(3)(t) = −(∂E(t)/∂x_j^(3)(t)) · (∂x_j^(3)(t)/∂S_j^(3)(t)) = (Tr_j(t) − x_j^(3)(t)) f'(S_j^(3)(t)).   (8)

In the same way as in the conventional Back Propagation method, the modification of the connection weights is calculated by

Δw_ji^(3) = η δ_j^(3) x_i^(2).   (9)

Each neuron in the hidden layer is trained by PRL, and the signal δ_j^(2) is calculated as
δ_j^(2) = Σ_k w_kj^(3)(t) · δ_k^(3).   (10)

In the equation above, f'(t) is not multiplied in, unlike in the conventional BP method, because f' is included in the variable r_ji^(2)(t). Since the variable r_ji^(2)(t) takes in the input's value when the output changes, it is calculated as

r_ji^(2)(t) = (1 − |Δx_j^(2)(t)|) r_ji^(2)(t−1) + x_i(t) f'(S_j^(2)(t)) |Δx_j^(2)(t)|,   (11)

where Δx_j(t) = x_j(t) − x_j(t−1). Then, the modification of each connection weight in the hidden layer is calculated using only the variable r_ji^(2)(t) by

Δw_ji^(2) = η δ_j^(2) r_ji^(2)(t).   (12)
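A minimal sketch of this discrete-time hidden-layer update, assuming the |Δx| gating of Eq. (11); the numeric values, learning rate, and the degree of output change are illustrative assumptions, not values from the paper.

```python
import math

def f_prime(s):
    # Derivative of the standard sigmoid; the -0.5 output shift used in
    # the paper does not change the derivative.
    sig = 1.0 / (1.0 + math.exp(-s))
    return sig * (1.0 - sig)

def update_r(r_prev, x_i, s_j, dx_j):
    # Eq. (11)-style update: r takes in the input (scaled by f') in
    # proportion to how much the hidden output changed, and is simply
    # held when the output does not change.
    g = abs(dx_j)
    return (1.0 - g) * r_prev + x_i * f_prime(s_j) * g

def update_weight(w, delta_j, r, eta=0.1):
    # Eq. (12): the hidden weight is modified using only the variable r.
    return w + eta * delta_j * r

r0 = 0.3
r_hold = update_r(r0, x_i=0.5, s_j=0.1, dx_j=0.0)  # output unchanged: r is held
r_move = update_r(r0, x_i=0.5, s_j=0.1, dx_j=0.8)  # output changed: r re-filled
w_new = update_weight(0.2, delta_j=0.1, r=r_move)
```

When the hidden output does not change between steps, r passes through unchanged, which is exactly how PRL keeps older pre-synaptic information available for the weight update without storing any past network states.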