Lecture Notes in Computer Science
3.2 Development of Labyrinthine Receptive Fields

If the correlations have the shape of a "Mexican hat" in (4) and (5), the boundaries between the sub-areas become labyrinthine after a sufficient number of trials. Thus, the model fails to develop simple receptive fields. Interestingly, if the value of λ is additionally increased to 1.0, the labyrinth almost always rotates along the ring. In the case of Fig. 2d, the visual receptive fields rotate clockwise along the downward direction in the column. Note that the third property described above still holds, i.e. the configuration of the sub-areas is similar between two adjacent cells on the ring and reversed between diagonally opposed cells. The rotation along the ring is a compromise between the labyrinthine configurations and the third property.

4 Discussion

If the correlations of spontaneous activities among same-type LGN cells were assumed to have the shape of a "Mexican hat", labyrinthine visual receptive fields emerged, presumably due to the inhibitory peripheries of the hat. However, if the correlations were assumed to be Gaussian, simple receptive fields often emerged. As a matter of fact, Gaussian-shaped correlations have been experimentally observed in the developing LGN of ferrets [10]. The same authors also showed theoretically that Gaussian correlations result in simple receptive fields only if the On and Off synaptic weights are constrained separately [10]. This separate constraint is, however, biologically less plausible than the joint constraint [14]. The present study showed that the Gaussian correlation plus the joint constraint results in simple receptive fields.

We assumed an intra-cortical interaction in which nearby cells on the ring mutually excite each other and diagonally opposed cells on the ring mutually inhibit each other.
These interactions could be due to excitatory and inhibitory synaptic connections, or otherwise to near diffusion of some excitatory substances and far diffusion of some inhibitory ones. The latter possibility is reminiscent of the chemical explanation behind Turing's reaction-diffusion equation [15], which also produces stripes and labyrinths [16].

The configurations of the receptive fields after a sufficient number of trials were either linearly translated along the ring, as shown in Fig. 2c, or rotated along it, as in Fig. 2d. These configurations could be interpreted as neural representations on the ring of translational and rotational motions, respectively.

References

1. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. 160, 106–154 (1962)
2. Marcelja, S.: Mathematical description of the responses of simple cortical cells. J. Opt. Soc. Am. 70, 1297–1300 (1980)
3. DeAngelis, G.C., Ohzawa, I., Freeman, R.D.: Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. I. General characteristics and postnatal development. J. Neurophysiol. 69, 1091–1117 (1993)
4. Hamada, T., Yamashima, M., Kato, K.: A ring model for spatiotemporal properties of simple cells in the visual cortex. Biol. Cyb. 77, 225–233 (1997)
5. Blakemore, C., Van Sluyters, R.C.: Innate and environmental factors in the development of the kitten's visual cortex. J. Physiol. 248, 663–716 (1975)
6. Chapman, B., Godecke, I.: Cortical cell orientation selectivity fails to develop in the absence of On-center retinal ganglion cell activity. J. Neurosci. 20, 1922–1930 (2000)
7. Sengpiel, F., Kind, P.C.: The role of activity in development of the visual system. Curr. Biol. 12, 818–826 (2002)
8. Miller, K.D.: A model for the development of simple cell receptive fields and the ordered arrangement of orientation columns through activity-dependent competition between on- and off-center inputs. J. Neurosci. 14, 409–441 (1994)
9. Hamada, T., Kato, K., Okada, K.: A model for development of Gabor-receptive fields in simple cortical cells. NeuroReport 7, 745–748 (1996)
10. Ohshiro, T., Weliky, M.: Simple fall-off pattern of correlated neural activity in the developing lateral geniculate nucleus. Nature Neurosci. 9, 1541–1548 (2006)
11. Mastronarde, D.N.: Correlated firing of retinal ganglion cells. Trends Neurosci. 12, 75–80 (1989)
12. Goodman, C.S., Shatz, C.J.: Developmental mechanisms that generate precise patterns of neuronal connectivity. Cell 72/Neuron 10(suppl.), 77–98 (1993)
13. Ahmed, B., Anderson, J.C., Douglas, R.J., Martin, K.A.C., Nelson, J.C.: Polyneuronal innervation of spiny stellate neurons in cat visual cortex. J. Comp. Neurol. 341, 39–49 (1994)
14. Willshaw, D.J., von der Malsburg, C.: How patterned neural connections can be set up by self-organization. Biol. Cyb. 58, 63–70 (1988)
15. Turing, A.M.: The chemical basis of morphogenesis. Phil. Trans. Royal Soc. London B 237, 37–72 (1952)
16. Shoji, H., Iwasa, Y.: Labyrinthine versus straight-striped patterns generated by two-dimensional Turing systems. J. Theor. Biol. 237, 104–116 (2005)
M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 228–237, 2008.
© Springer-Verlag Berlin Heidelberg 2008

Practical Recurrent Learning (PRL) in the Discrete Time Domain

Mohamad Faizal Bin Samsudin, Takeshi Hirose, and Katsunari Shibata

Department of Electrical and Electronic Engineering, Oita University,
700 Dannoharu, Oita 870-1192, Japan
shibata@cc.oita-u.ac.jp
recurrent neural networks, which requires computational cost and memory capacity of practical order O(n^2) [1]. The algorithm was formulated in the continuous time domain, and it was shown that a sequential NAND problem could be successfully learned by it. In this paper, the authors name this learning "Practical Recurrent Learning (PRL)", and the learning algorithm is simplified and converted to the discrete time domain for easy analysis. It is shown that a sequential EXOR problem and a 3-bit parity problem, as non-linearly-separable problems, can be learned by PRL, even though the learning performance is often quite inferior to that of BPTT, one of the most popular learning algorithms for recurrent neural networks. Furthermore, the learning process is observed and the character of PRL is shown.
Keywords: Practical Recurrent Learning (PRL), BPTT, Short-Term Memory.

1 Introduction

When we think of the higher functions in humans, such as logical thinking and conversation, it is easily noticed that memory plays an important role in these functions. Accordingly, the need for recurrent neural networks (RNNs) is expected to grow drastically in the near future as the demand for such higher functions increases. Conventionally, two popular learning algorithms for recurrent neural networks have been proposed: BPTT (Back Propagation Through Time) [2] and RTRL (Real Time Recurrent Learning) [3]. In BPTT, all the past states of the network are stored using O(nT) of memory, where n is the number of neurons and T is the present time step, and learning is done by tracing back to the past using this memory. The order of the computational cost is O(n^2 T). The traced-back time steps are often truncated at a constant number when T becomes large, but it is difficult to know the sufficient number of steps. On the other hand, in RTRL, the influence of each connection weight on the output of each neuron is kept in O(n^3) of memory, and the order of the computation of this influence is as large as O(n^4).
BPTT is not practical in the sense that learning must be done by tracing back to the past; even if special hardware is developed, iteration over the traceback is still necessary. RTRL is not practical in the sense that the required order O(n^3) in memory capacity and O(n^4) in computational cost are larger than O(n^2), which is the order of the number of connections in a neural network. Even if each connection has some memory, the memory on a connection should have a size that does not depend on the size of the neural network.

S. Hochreiter and J. Schmidhuber have proposed a special network architecture that has memory cells. In each memory cell, there is a linear unit with a fixed-weight self-connection that enables constant, non-vanishing error flow within the memory cell [4]. They used a variant of RTRL, and only O(n^2) of computational cost is required. However, the special structure is necessary, and the method cannot be applied to general recurrent neural networks. Therefore, a practical learning algorithm for general recurrent neural networks that needs O(n^2) or less memory and O(n^2) or less computational cost is strongly required. For this purpose, Practical Recurrent Learning (PRL) was proposed in the continuous time domain [1]. In this paper, PRL is simplified and converted to the discrete time domain for easy analysis, and its learning performance is compared to BPTT.
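The orders quoted above can be made concrete with a rough count of stored elements. The function below is only a back-of-envelope illustration of this sketch's own making; constant factors and per-element sizes are ignored.

```python
def memory_orders(n, T):
    # Rough element counts behind the orders quoted in the text
    # (constant factors ignored).
    return {
        "BPTT": n * T,    # all past activations: O(nT) memory
        "RTRL": n ** 3,   # influence of every weight on every unit: O(n^3)
        "PRL": n ** 2,    # a few scalar traces per connection: O(n^2)
    }

# For a moderate network and a long sequence, BPTT's memory grows with T,
# RTRL's cubic term dominates in n, and PRL stays at the order of the
# number of connections.
m = memory_orders(n=100, T=1000)
```

The comparison makes the paper's motivation visible: only the PRL row is independent of the sequence length and stays at the size of the weight matrix itself.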
2 Practical Recurrent Learning (PRL)

Here, PRL is explained using an Elman-type recurrent neural network, as shown in Fig. 1.
Fig. 1. An Elman-type recurrent neural network
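As a concrete illustration of the network in Fig. 1, the following sketch implements the forward pass of a small Elman-type network. The layer sizes and random weights are illustrative assumptions of this sketch, not values from the paper; the sigmoid is shifted to the (-0.5, 0.5) range used in the text below.

```python
import math
import random

def sigmoid(s):
    # Sigmoid whose value range is (-0.5, 0.5), as in the paper
    return 1.0 / (1.0 + math.exp(-s)) - 0.5

def forward(x, context, w_hid, w_out):
    # Each hidden neuron sees the external inputs plus the previous
    # hidden outputs (the Elman context); each output neuron sees the
    # current hidden outputs. Every neuron takes a weighted sum and
    # applies the non-linear function.
    full_in = x + context
    hidden = [sigmoid(sum(w * v for w, v in zip(row, full_in))) for row in w_hid]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, output

random.seed(0)
n_in, n_hid, n_out = 2, 3, 1  # illustrative sizes
w_hid = [[random.uniform(-1, 1) for _ in range(n_in + n_hid)] for _ in range(n_hid)]
w_out = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]

context = [0.0] * n_hid
for x in ([0.5, -0.5], [-0.5, 0.5]):  # a short input sequence
    context, y = forward(x, context, w_hid, w_out)
```

Feeding the hidden outputs back as part of the next input is what gives the network its short-term memory; the learning rules below decide how the weights of such a network are modified.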
2.1 PRL in the Continuous Time Domain

This section roughly describes PRL in the continuous time domain as proposed in [1]. The forward calculation is the same as in a conventional neural network: each hidden or output neuron computes the weighted sum of its inputs and then applies the non-linear function f to obtain its output. Here, a sigmoid function whose value range is from -0.5 to 0.5 is used. In the output layer, the error signal is calculated as

δ_j^(3)(t) = Tr_j − x_j^(3)(t),   (1)

where Tr_j is the training signal and x_j^(3) is the output of the jth output unit. Differing from regular BP, the derivative of the output function f'_j^(3) is not included. As in regular BP, the error signal δ_i^(2) in the hidden layer is calculated from the δ_j^(3) in the upper layer, as described by the following equations.
δ_i^(2) = Σ_j v_ji δ_j^(3)(t)   (2)

d/dt v_ji = ( w_ji^(3)(t) f'(S_j^(3)(t)) − v_ji ) |d/dt x_j^(3)(t)|   (3)

where w_ji^(3) is the connection weight from the ith hidden unit to the jth output unit, and S_j^(3) is the net value of the jth neuron in the output layer. f' is included in this equation because it is omitted from Eq. (1), so that f' is applied at the times when the output actually changes. Then, in order to modify the weights without tracing back to the past, the following information should be held:

(a) the latest outputs of the pre-synaptic neurons,
(b) the outputs of the pre-synaptic neurons that changed recently among all the inputs to the post-synaptic neuron,
(c) the outputs of the pre-synaptic neurons that caused the change of the post-synaptic neuron's output.

Corresponding to (a), (b) and (c), three variables p(t), q(t) and r(t), which hold the past information in different ways, are introduced; they are always updated according to the following differential equations.
τ d/dt p_ji(t) = −p_ji(t) + x_i(t) f'(S_j(t))   (4)

d/dt q_ji(t) = ( x_i(t) f'(S_j(t)) − q_ji(t) ) Σ_i' |d/dt x_i'(t)|   (5)

d/dt r_ji(t) = ( x_i(t) f'(S_j(t)) − r_ji(t) ) |d/dt x_j(t)|   (6)
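A forward-Euler discretization can illustrate how the three traces behave. In this sketch, the gating of q and r by activity-change magnitudes follows the equations as reconstructed here, and the signal values, time constant and step size are made-up illustrative numbers, not values from the paper.

```python
import math

def f_prime(s):
    # Derivative of the standard sigmoid; the -0.5 output shift used in
    # the paper leaves the derivative unchanged.
    sig = 1.0 / (1.0 + math.exp(-s))
    return sig * (1.0 - sig)

def step(p, q, r, x_i, s_j, dx_inputs_sum, dx_j, dt=0.01, tau=1.0):
    # One forward-Euler step of the three memory traces (Eqs. (4)-(6)).
    target = x_i * f_prime(s_j)
    p += dt * (-p + target) / tau            # (4): leaky integrator of x_i f'(S_j)
    q += dt * (target - q) * dx_inputs_sum   # (5): updated when any input changes
    r += dt * (target - r) * dx_j            # (6): updated when the output changes
    return p, q, r

p = q = r = 0.0
# While the post-synaptic output is constant (dx_j = 0) and no input is
# changing, p still drifts toward x_i f'(S_j), but q and r hold their
# previous values exactly.
for _ in range(100):
    p, q, r = step(p, q, r, x_i=0.4, s_j=0.2, dx_inputs_sum=0.0, dx_j=0.0)
```

The run makes the qualitative difference visible: p is a plain leaky trace of recent pre-synaptic activity, while q and r only move at the moments their respective gating signals are non-zero.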
Using these three variables, each connection weight is modified. The following equation is one example; the details can be seen in [1]:

dw_ji(t) = ( p_ji(t) + q_ji(t) + r_ji(t) ) δ_j(t)   (7)

Among the three variables, r_ji(t) is considered to be particularly important for learning problems that need past information from before a long time lag. Fig. 2 shows an example of the temporal change of the variable r_ji(t) with respect to the input signal x_i(t) and the output signal x_j(t). As shown in Fig. 2, the important character of r_ji(t) is that it holds information about the output of the pre-synaptic neuron that caused the change of the post-synaptic neuron's output. The variable ignores the inputs while the output does not change. Accordingly, the variable is expected to keep past and important information without tracing back to the past.
Fig. 2. An example of the variable r_ji(t) transition. From equation (11), the variable r_ji(t) integrates the value of the input x_i(t) when the output x_j(t) changes, and holds the information of the previous state when the output does not change.

2.2 PRL in the Discrete Time Domain

In order to make the analysis of PRL learning easy, the PRL learning method in the discrete time domain is introduced here. The learning method is similar to the conventional Back Propagation method in the sense that each connection weight is modified according to the product of the propagated error signal δ of the post-synaptic (upper) neuron and a signal that represents the output x of the pre-synaptic (lower) neuron. Furthermore, to keep the learning process simple, the conventional BP method is used for the connection weights between the hidden layer and the output layer, and the PRL learning method is used only between the input layer and the hidden layer. In the output layer, the error signal δ_j^(3) is calculated as
δ_j^(3)(t) = −∂E(t)/∂S_j^(3)(t) = −(∂E(t)/∂x_j^(3)(t)) · (∂x_j^(3)(t)/∂S_j^(3)(t)) = (Tr_j(t) − x_j^(3)(t)) f'(S_j^(3)(t)).   (8)

In the same way as in the conventional Back Propagation method, the modification of the connection weights is calculated by

Δw_ji^(3) = η δ_j^(3) x_i^(2).   (9)

Each neuron in the hidden layer is trained by PRL, and the signal δ_j^(2) is calculated as
δ_j^(2) = Σ_k w_kj^(3)(t) · δ_k^(3).   (10)

In the equation above, f'(t) is not multiplied in, unlike in the conventional BP method, because f' is included in the variable r_ji^(2)(t). Since the variable r_ji^(2)(t) takes in the input's value when the output changes, it is calculated as

r_ji^(2)(t) = (1 − |Δx_j^(2)(t)|) r_ji^(2)(t−1) + x_i(t) f'(S_j^(2)(t)) |Δx_j^(2)(t)|,   (11)

where Δx_j(t) = x_j(t) − x_j(t−1). Then, the modification of each connection weight in the hidden layer is calculated using only the variable r_ji^(2)(t) by

Δw_ji^(2) = η δ_j^(2) r_ji^(2)(t).   (12)
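A minimal sketch of this discrete-time hidden-layer update, assuming the |Δx| gating of Eq. (11); the numeric values, learning rate, and the degree of output change are illustrative assumptions, not values from the paper.

```python
import math

def f_prime(s):
    # Derivative of the standard sigmoid; the -0.5 output shift used in
    # the paper does not change the derivative.
    sig = 1.0 / (1.0 + math.exp(-s))
    return sig * (1.0 - sig)

def update_r(r_prev, x_i, s_j, dx_j):
    # Eq. (11)-style update: r takes in the input (scaled by f') in
    # proportion to how much the hidden output changed, and is simply
    # held when the output does not change.
    g = abs(dx_j)
    return (1.0 - g) * r_prev + x_i * f_prime(s_j) * g

def update_weight(w, delta_j, r, eta=0.1):
    # Eq. (12): the hidden weight is modified using only the variable r.
    return w + eta * delta_j * r

r0 = 0.3
r_hold = update_r(r0, x_i=0.5, s_j=0.1, dx_j=0.0)  # output unchanged: r is held
r_move = update_r(r0, x_i=0.5, s_j=0.1, dx_j=0.8)  # output changed: r re-filled
w_new = update_weight(0.2, delta_j=0.1, r=r_move)
```

When the hidden output does not change between steps, r passes through unchanged, which is exactly how PRL keeps older pre-synaptic information available for the weight update without storing any past network states.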