Lecture Notes in Computer Science
Fig. 2. A: Method of generating an asymmetric neighborhood function by scaling the distance r_ic asymmetrically. The degree of asymmetry is parameterized by β. The distance of a node in the positive direction of the asymmetric unit vector k is scaled by 1/β; the distance in the negative direction is scaled by β. Therefore, the asymmetric function is described by h_β(r_ic) = 2/(β + 1/β) · h(r̃_ic), where r̃_ic is the scaled distance of node i from the winner c. B: An example of an asymmetric Gaussian function. C: An illustration of the improved algorithm for the asymmetric neighborhood function.

Next, we introduce an improved algorithm for the asymmetric neighborhood function. The asymmetry of the neighborhood function distorts the feature map, so that the density of units does not represent the probability density of the input data. Therefore, two novel procedures are introduced. The first procedure is an inversion operation on the direction of the asymmetric neighborhood function. As illustrated in Fig. 2C, the direction of the asymmetry is reversed after every time interval T, which is expected to average out the distortion in the feature map. Note that the interval T should be set to a value larger than the typical ordering time for the asymmetric neighborhood function. The second procedure is an operation that decreases the degree of asymmetry of the neighborhood function. When β = 1, the neighborhood function equals the original symmetric function. With this operation, β is decreased toward 1 at each time step, as illustrated in Fig. 2C. In our numerical simulations, we adopt a linearly decreasing function.

2.3
Numerical Simulations

In the following sections, we test the learning procedures of SOM on sample data to examine the performance of the ordering process. The sample data is
430 T. Aoki et al.
generated from a random variable with a uniform distribution. In the case of one-dimensional SOM, the distribution is uniform over the range [0, 1]. Here we use a Gaussian function as the original symmetric neighborhood function. The model parameters for SOM learning were set as follows: the total number of units N = 1000, the learning rate α = 0.05 (constant), and the neighborhood radius σ = 50. The asymmetry parameter is β = 1.5, and the asymmetric direction k is set to the positive direction of the array. The interval period T for flipping the asymmetric direction is 3000. In the case of two-dimensional SOM (2D → 2D map), the input data are taken uniformly from the square space [0, 1] × [0, 1]. The model parameters are the same as in the one-dimensional SOM, except that the total number of units is N = 900 (30 × 30) and σ = 5. The asymmetric direction k is taken along (1, 0), which can be chosen arbitrarily. In the following numerical simulations, we also confirmed that the results hold for other model parameters and other forms of the neighborhood function.

2.4 Topological Order and Distortion of the Feature Map

To examine the ordering process of SOM, let us consider two measures which characterize the properties of the feature map. One is the 'topological order' η, which quantifies the order of the reference vectors in the SOM array. The units of the SOM array should be arranged according to their reference vectors m_i.
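The asymmetric neighborhood function of Fig. 2 and the improved algorithm described above can be sketched as a minimal one-dimensional SOM learning loop. This is a hedged illustration, not the authors' code: the function and variable names are ours, and the default parameters are scaled down from the paper's values (N = 1000, σ = 50, T = 3000) to keep the sketch fast.

```python
import numpy as np

def asym_gaussian(d_signed, sigma, beta):
    """Asymmetric Gaussian neighborhood h_beta (cf. Fig. 2A).

    Distances on the positive side of the winner are scaled by 1/beta,
    those on the negative side by beta, and the result is normalized
    by the factor 2/(beta + 1/beta).
    """
    scale = np.where(d_signed >= 0, 1.0 / beta, beta)
    d_tilde = np.abs(d_signed) * scale
    return (2.0 / (beta + 1.0 / beta)) * np.exp(-d_tilde**2 / (2.0 * sigma**2))

def train_som_1d(n_units=100, steps=10000, alpha=0.05, sigma=5.0,
                 beta0=1.5, T=1000, improved=True, seed=0):
    """Sketch of 1-D SOM learning with the (improved) asymmetric function."""
    rng = np.random.default_rng(seed)
    m = rng.random(n_units)              # reference vectors, random init in [0, 1]
    direction = 1.0                      # asymmetric direction k (+1 or -1)
    beta = beta0
    for t in range(steps):
        x = rng.random()                 # uniform input on [0, 1]
        c = int(np.argmin(np.abs(m - x)))  # winner unit
        if improved:
            if t > 0 and t % T == 0:
                direction = -direction   # flip asymmetry every T steps (Fig. 2C)
            # linearly decrease beta toward 1 (the symmetric case)
            beta = max(1.0, beta0 - (beta0 - 1.0) * t / steps)
        d_signed = direction * (np.arange(n_units) - c)  # signed array distance
        h = asym_gaussian(d_signed, sigma, beta)
        m += alpha * h * (x - m)         # standard SOM update rule
    return m
```

Because the update is a convex pull toward inputs in [0, 1], the reference vectors remain in that range throughout training.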
When every unit satisfies the order-condition defined below, the feature map attains global ordering. However, a topological defect violates the global ordering, and the feature map is divided into fragments of ordering domains within which the units satisfy the order-condition. Therefore, the topological order η can be defined as the ratio of the maximum domain size to the total number of units N, given by

η ≡ max_l N_l / N,  (3)

where N_l is the size of domain l. In the case of one-dimensional SOM, the order-condition for the reference vectors of the units is defined as m_{i−1} ≤ m_i ≤ m_{i+1} or m_{i−1} ≥ m_i ≥ m_{i+1}. In the case of two-dimensional SOM, as referred to in the previous section, the order-condition is defined explicitly with the vector product a_{(i,j)} ≡ (m_{(i+1,j)} − m_{(i,j)}) × (m_{(i,j+1)} − m_{(i,j)}). Within an ordering domain, the vector products a_{(i,j)} of the units all have the same sign, because the reference vectors are arranged in the sample space in the order of the unit positions.

The other measure is the 'distortion' χ, which quantifies the distortion of the feature map. The asymmetry of the neighborhood function tends to distort the distribution of reference vectors, so that it differs from the correct probability density of the input vectors. For example, when the probability density of the input vectors is uniform, a non-uniform distribution of reference vectors is formed with an asymmetric neighborhood function. Hence, to measure this non-uniformity in the distribution of reference vectors, we define the distortion χ as the coefficient of variation of the size distribution of the units' Voronoi tessellation cells:

χ = √Var(Δ_i) / E(Δ_i),  (4)
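As an illustration, the two measures can be computed for a one-dimensional map as follows. This is a sketch with our own function names: the topological order of Eq. (3) is taken here as the longest monotone run of reference vectors (ties treated as breaks for simplicity), and the distortion of Eq. (4) as the coefficient of variation of the interior Voronoi cell sizes, where the Voronoi cell of a unit is the interval closer to its reference vector than to any other.

```python
import numpy as np

def topological_order_1d(m):
    """Topological order eta (Eq. 3): largest monotone domain over N."""
    n = len(m)
    if n < 2:
        return 1.0
    sign = np.sign(np.diff(m))           # +1 increasing, -1 decreasing, 0 tie
    best = run = 1
    for i in range(1, len(sign)):
        if sign[i] == sign[i - 1] and sign[i] != 0:
            run += 1
        else:
            run = 1
        best = max(best, run)
    return (best + 1) / n                # a run of k equal signs spans k+1 units

def distortion_1d(m):
    """Distortion chi (Eq. 4): CV of the sizes of interior Voronoi cells."""
    ms = np.sort(m)
    edges = (ms[:-1] + ms[1:]) / 2.0     # Voronoi boundaries between neighbors
    cells = np.diff(edges)               # interior cells only (edges excluded)
    return np.sqrt(np.var(cells)) / np.mean(cells)
```

A perfectly ordered, uniformly spaced map yields η = 1 and χ ≈ 0; a map with one central defect yields η ≈ 0.5.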
Ordering Process of SOMs Improved by Asymmetric Neighborhood Function 431
where Δ_i is the size of the Voronoi cell of unit i. To eliminate the boundary effect of the SOM algorithm, the Voronoi cells on the edges of the array are excluded. When the reference vectors are distributed uniformly, the distortion χ converges to 0. It should be noted that the evaluation of the Voronoi cells in two-dimensional SOM is time-consuming, so we approximate the size of a Voronoi cell by the magnitude of the vector product a_{(i,j)}. If the feature map is formed uniformly, this approximate value also converges to 0.

3 Results
3.1 One-Dimensional Case

In this section, we investigate the ordering process of SOM learning in the presence of a topological defect for the symmetric, asymmetric, and improved asymmetric cases of the neighborhood function. For this purpose, we use an initial condition in which a single topological defect appears at the center of the array. Because the density of input vectors is uniform, the desirable feature map is a linear arrangement of the SOM nodes. Figure 3A shows a typical time development of the reference vectors m_i. In the case of the symmetric neighborhood function, a single defect remains around the center of the array even after 10000 steps. In contrast, in the case of the asymmetric one, this defect moves out to the right, so that the reference vectors are ordered within 3000 steps. This phenomenon can also be confirmed in Fig. 3B, which shows the time dependence of the topological order η. In the case of the asymmetric neighborhood function, η rapidly converges to 1 (the completely ordered state) within 3000 steps, whereas the process of eliminating the last defect takes a large amount of time (∼18000 steps) for the symmetric one.

On the other hand, one problem arises in the feature map obtained with the asymmetric neighborhood function. After 10000 steps, the distribution of the reference vectors in the feature map develops an unusual bias (Fig. 3A). Figure 3C shows the time dependence of the distortion χ during learning. In the case of the symmetric neighborhood function, χ eventually converges to almost 0. This result indicates that the feature map obtained with the symmetric one has an almost uniform size distribution of Voronoi cells. In contrast, in the case of the asymmetric one, χ converges to a finite value (≠ 0). Although the asymmetric neighborhood function accelerates the ordering process of SOM learning, the resultant map becomes distorted, which makes it unusable in applications.

Therefore, the improved asymmetric method introduced in the Method section is applied. Using this improved algorithm, χ converges to almost 0, just as with the symmetric one (Fig. 3C). Furthermore, as shown in Fig. 3B, the improved algorithm preserves the faster order learning. Therefore, by utilizing the improved algorithm with the asymmetric neighborhood function, we obtain the full benefit of both fast order learning and an undistorted feature map.

To quantify the performance of the ordering process, let us define the 'ordering time' as the time at which η reaches 1. Figure 4A shows the ordering
Fig. 3. The asymmetric neighborhood function enhances the ordering process of SOM. A: A typical time development of the reference vectors m_i for the symmetric, asymmetric, and improved asymmetric neighborhood functions (snapshots at t = 0, 1000, 5000, 10000, and 20000). B: The time dependence of the topological order η. The standard deviations are denoted by error bars, which cannot be seen because they are smaller than the plotted symbols. C: The time dependence of the distortion χ.

time as a function of the total number of units N for both the improved asymmetric and symmetric cases of the neighborhood function. It is found that the ordering time scales roughly as N^3 and N^2 for the symmetric and improved asymmetric neighborhood functions, respectively. For a detailed discussion of the reduction of the ordering time, refer to Aoki and Aoyagi (2007) [9]. Figure 4B shows the dependence of the ordering time on the width of the neighborhood function, which indicates that the ordering time is proportional to (N/σ)^k, with k = 2 for the asymmetric and k = 3 for the symmetric neighborhood function. This result implies that the combined use of the asymmetric method and an annealing method for the width of the neighborhood function is even more effective.

3.2 Two-Dimensional Case

In this section, we investigate the effect of the asymmetric neighborhood function for two-dimensional SOM (2D → 2D map). Figure 5 shows that a similar fast
Fig. 4. Ordering time as a function of the total number of units N (log-log axes). The fitting function is described by Const. · N^γ. (Panel A legend: Symmetric, fit N^{2.989±0.002}; Improved asym., fit N^{1.917±0.005}. Panel B legend: Sym. and Asym. curves for σ = 8, 16, 32, 64, with an N^2 guide line.)
Fig. 5. A: A typical time development of reference vectors in a two-dimensional array of SOM for the cases of the symmetric, asymmetric, and improved asymmetric neighborhood functions (snapshots at t = 0, 1000, 5000, and 20000). B: The time dependence of the topological order η. C: The time dependence of the distortion χ.
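The two-dimensional order-condition based on the vector products a_(i,j) (Sect. 2.4) can be sketched as follows. This is our own illustration: the domain measure below is a simple majority-sign fraction, a proxy for the exact maximum-domain ratio of Eq. (3), which would additionally require connected-component labeling.

```python
import numpy as np

def vector_products_2d(m):
    """Cross products a_(i,j) for a 2-D map m of shape (H, W, 2).

    a_(i,j) = (m[i+1,j] - m[i,j]) x (m[i,j+1] - m[i,j]); within an ordered
    domain all a_(i,j) share the same sign.
    """
    u = m[1:, :-1, :] - m[:-1, :-1, :]   # step along the first array direction
    v = m[:-1, 1:, :] - m[:-1, :-1, :]   # step along the second array direction
    return u[..., 0] * v[..., 1] - u[..., 1] * v[..., 0]  # z of the cross product

def majority_sign_fraction(m):
    """Fraction of units whose a_(i,j) carries the majority sign -- a simple
    proxy for the topological order eta of Eq. (3)."""
    a = vector_products_2d(m)
    pos = np.count_nonzero(a > 0)
    neg = np.count_nonzero(a < 0)
    return max(pos, neg) / a.size
```

For an untwisted, regularly ordered grid all cross products are positive and the fraction equals 1.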
Fig. 6. Distribution of ordering times when the initial reference vectors are generated randomly, for the symmetric (left) and improved asymmetric (right) neighborhood functions. The white bin at the right of each graph indicates the population of failed trials that did not converge to the perfectly ordered state within 50000 steps.

ordering process can be realized with an asymmetric neighborhood function in two-dimensional SOM. The initial state has a global topological defect, in which the map is twisted at the center. In this situation, the conventional symmetric neighborhood function has difficulty correcting the twisted map: because of its local stability, this topological defect is never corrected, even with a huge number of learning iterations. The asymmetric neighborhood function is also effective in overcoming such a topological defect, as in the case of one-dimensional SOM. However, the same problem of a distorted map occurs. Therefore, by using the improved asymmetric neighborhood function, the feature map converges to the completely ordered map in much less time and without any distortion.

In the previous simulations, we considered a simple situation in which a single defect exists around the center of the feature map as the initial condition, in order to investigate the ordering process in the presence of a topological defect. However, when the initial reference vectors are set randomly, the total number of topological defects appearing in the map is not generally equal to one. Therefore, it is necessary to consider the statistical distribution of the ordering time, because the total number of topological defects and the convergence process generally depend on the initial conditions. Figure 6 shows the distribution of the ordering time when the initial reference vectors are randomly selected from the uniform distribution on [0, 1] × [0, 1].
In the case of the symmetric neighborhood function, some trials could not converge to the ordered state, being trapped in undesirable metastable states in which the topological defects are never rectified. Therefore, although a fast ordering process is observed in some successful cases (lucky initial conditions), the map formed with the symmetric function depends strongly on the initial conditions. In contrast, for the improved asymmetric neighborhood function, the distribution of the ordering time has a single sharp peak, and a well-ordered feature map is constructed stably without any tuning of the initial conditions.

4 Conclusion

In this paper, we discussed the learning process of the self-organizing map, especially in the presence of a topological defect. Interestingly, even in the presence
of the defect, we found that the asymmetry of the neighborhood function enables the system to accelerate the learning process. Compared with the conventional symmetric function, the convergence time of the learning process is roughly reduced from O(N^3) to O(N^2) in one-dimensional SOM (N is the total number of units). Furthermore, this acceleration with the asymmetric neighborhood function is also effective in the case of two-dimensional SOM (2D → 2D map). In contrast, the conventional symmetric function cannot rectify a twisted feature map even with a huge number of iteration steps, due to its local stability. These results suggest that the proposed method can be effective in more general cases of SOM, which is a subject for future study.

Acknowledgments

This work was supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Science, Sports, and Culture of Japan: Grant numbers 18047014, 18019019, and 18300079.

References

1. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)
2. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (London) 160(1), 106–154 (1962)
3. Hubel, D.H., Wiesel, T.N.: Sequence regularity and geometry of orientation columns in the monkey striate cortex. J. Comp. Neurol. 158(3), 267–294 (1974)
4. von der Malsburg, C.: Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 14(2), 85–100 (1973)
5. Takeuchi, A., Amari, S.: Formation of topographic maps and columnar microstructures in nerve fields. Biol. Cybern. 35(2), 63–72 (1979)
6. Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: stationary states, metastability and convergence rate. Biol. Cybern. 67(1), 35–45 (1992)
7. Geszti, T., Csabai, I., Cazokó, F., Szakács, T., Serneels, R., Vattay, G.: Dynamics of the Kohonen map. In: Statistical Mechanics of Neural Networks: Proceedings of the XIth Sitges Conference, pp. 341–349. Springer, New York (1990)
8. Der, R., Herrmann, M., Villmann, T.: Time behavior of topological ordering in self-organizing feature mapping. Biol. Cybern. 77(6), 419–427 (1997)
9.
Aoki, T., Aoyagi, T.: Self-organizing maps with asymmetric neighborhood function. Neural Comput. 19(9), 2515–2535 (2007)
A Characterization of Simple Recurrent Neural Networks with Two Hidden Units as a Language Recognizer

Azusa Iwata 1, Yoshihisa Shinozawa 1, and Akito Sakurai 1,2
1 Keio University, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
2 CREST, Japan Science and Technology Agency

Abstract. We give a necessary condition for a simple recurrent neural network with two sigmoidal hidden units to implement a recognizer of the formal language {a^n b^n | n > 0}, which is generated by the set of generating rules {S → aSb, S → ab}, and we show that by setting the parameters so as to conform to the condition we obtain a recognizer of the language. The condition implies the instability of the learning process reported in previous studies. The condition also implies, in contrast to its success in implementing the recognizer, the difficulty of obtaining a recognizer for more complicated languages.

1 Introduction

Pioneered by Elman [6], much research has been conducted on grammar learning by recurrent neural networks. A grammar is defined by generating or rewriting rules such as S → Sa and