Lecture Notes in Computer Science
Fig. 2. A: Method of generating an asymmetric neighborhood function by scaling the distance r_ic asymmetrically. The degree of asymmetry is parameterized by β. The distance of a node in the positive direction of the asymmetric unit vector k is scaled by 1/β; the distance in the negative direction is scaled by β. Therefore, the asymmetric function is described by h_β(r_ic) = 2/(β + 1/β) · h(r̃_ic), where r̃_ic is the scaled distance of node i from the winner c. B: An example of an asymmetric Gaussian function. C: An illustration of the improved algorithm for the asymmetric neighborhood function.

Next, we introduce an improved algorithm for the asymmetric neighborhood function. The asymmetry of the neighborhood function distorts the feature map, so that the density of units does not represent the probability density of the input data. Therefore, two novel procedures are introduced. The first procedure is an inversion operation on the direction of the asymmetric neighborhood function. As illustrated in Fig. 2C, the direction of the asymmetry is reversed after every time interval T, which is expected to average out the distortion in the feature map. Note that the interval T should be set to a value larger than the typical ordering time for the asymmetric neighborhood function. The second procedure is an operation that decreases the degree of asymmetry of the neighborhood function. When β = 1, the neighborhood function equals the original symmetric function. With this operation, β is decreased toward 1 at each time step, as illustrated in Fig. 2C. In our numerical simulations, we adopt a linearly decreasing function.

2.3
Numerical Simulations

In the following sections, we test the learning procedures of SOM on sample data to examine the performance of the ordering process. The sample data is
430 T. Aoki et al.
generated from a random variable with a uniform distribution. In the case of one-dimensional SOM, the distribution is uniform over the range [0, 1]. Here we use a Gaussian function as the original symmetric neighborhood function. The model parameters for SOM learning were set as follows: the total number of units N = 1000, the learning rate α = 0.05 (constant), and the neighborhood radius σ = 50. The asymmetry parameter is β = 1.5, and the asymmetric direction k is set to the positive direction of the array. The interval period T for flipping the asymmetric direction is 3000. In the case of two-dimensional SOM (2D → 2D map), the input data are taken uniformly from the square space [0, 1] × [0, 1]. The model parameters are the same as in the one-dimensional SOM, except that the total number of units is N = 900 (30 × 30) and σ = 5. The asymmetric direction k is taken along (1, 0), which can be chosen arbitrarily. In the following numerical simulations, we also confirmed that the results hold for other model parameters and other forms of the neighborhood function.

2.4 Topological Order and Distortion of the Feature Map

To examine the ordering process of SOM, let us consider two measures which characterize the properties of the feature map. One is the 'topological order' η, which quantifies the order of the reference vectors in the SOM array. The units of the SOM array should be arranged according to their reference vectors m_i.
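The asymmetric neighborhood function of Fig. 2 and the improved algorithm described above can be sketched as a minimal one-dimensional SOM learning loop. This is a hedged illustration, not the authors' code: the function and variable names are ours, and the default parameters are scaled down from the paper's values (N = 1000, σ = 50, T = 3000) to keep the sketch fast.

```python
import numpy as np

def asym_gaussian(d_signed, sigma, beta):
    """Asymmetric Gaussian neighborhood h_beta (cf. Fig. 2A).

    Distances on the positive side of the winner are scaled by 1/beta,
    those on the negative side by beta, and the result is normalized
    by the factor 2/(beta + 1/beta).
    """
    scale = np.where(d_signed >= 0, 1.0 / beta, beta)
    d_tilde = np.abs(d_signed) * scale
    return (2.0 / (beta + 1.0 / beta)) * np.exp(-d_tilde**2 / (2.0 * sigma**2))

def train_som_1d(n_units=100, steps=10000, alpha=0.05, sigma=5.0,
                 beta0=1.5, T=1000, improved=True, seed=0):
    """Sketch of 1-D SOM learning with the (improved) asymmetric function."""
    rng = np.random.default_rng(seed)
    m = rng.random(n_units)              # reference vectors, random init in [0, 1]
    direction = 1.0                      # asymmetric direction k (+1 or -1)
    beta = beta0
    for t in range(steps):
        x = rng.random()                 # uniform input on [0, 1]
        c = int(np.argmin(np.abs(m - x)))  # winner unit
        if improved:
            if t > 0 and t % T == 0:
                direction = -direction   # flip asymmetry every T steps (Fig. 2C)
            # linearly decrease beta toward 1 (the symmetric case)
            beta = max(1.0, beta0 - (beta0 - 1.0) * t / steps)
        d_signed = direction * (np.arange(n_units) - c)  # signed array distance
        h = asym_gaussian(d_signed, sigma, beta)
        m += alpha * h * (x - m)         # standard SOM update rule
    return m
```

Because the update is a convex pull toward inputs in [0, 1], the reference vectors remain in that range throughout training.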
When every unit satisfies the order-condition defined below, the feature map attains global ordering. However, a topological defect violates the global ordering, and the feature map is divided into fragments of ordering domains within which the units satisfy the order-condition. Therefore, the topological order η can be defined as the ratio of the maximum domain size to the total number of units N, given by

η ≡ max_l N_l / N,  (3)

where N_l is the size of domain l. In the case of one-dimensional SOM, the order-condition for the reference vectors of the units is defined as m_{i−1} ≤ m_i ≤ m_{i+1} or m_{i−1} ≥ m_i ≥ m_{i+1}. In the case of two-dimensional SOM, as referred to in the previous section, the order-condition is defined explicitly with the vector product a_{(i,j)} ≡ (m_{(i+1,j)} − m_{(i,j)}) × (m_{(i,j+1)} − m_{(i,j)}). Within an ordering domain, the vector products a_{(i,j)} of the units all have the same sign, because the reference vectors are arranged in the sample space in the order of the unit positions.

The other measure is the 'distortion' χ, which quantifies the distortion of the feature map. The asymmetry of the neighborhood function tends to distort the distribution of reference vectors, so that it differs from the correct probability density of the input vectors. For example, when the probability density of the input vectors is uniform, a non-uniform distribution of reference vectors is formed with an asymmetric neighborhood function. Hence, to measure this non-uniformity in the distribution of reference vectors, we define the distortion χ as the coefficient of variation of the size distribution of the units' Voronoi tessellation cells:

χ = √Var(Δ_i) / E(Δ_i),  (4)
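As an illustration, the two measures can be computed for a one-dimensional map as follows. This is a sketch with our own function names: the topological order of Eq. (3) is taken here as the longest monotone run of reference vectors (ties treated as breaks for simplicity), and the distortion of Eq. (4) as the coefficient of variation of the interior Voronoi cell sizes, where the Voronoi cell of a unit is the interval closer to its reference vector than to any other.

```python
import numpy as np

def topological_order_1d(m):
    """Topological order eta (Eq. 3): largest monotone domain over N."""
    n = len(m)
    if n < 2:
        return 1.0
    sign = np.sign(np.diff(m))           # +1 increasing, -1 decreasing, 0 tie
    best = run = 1
    for i in range(1, len(sign)):
        if sign[i] == sign[i - 1] and sign[i] != 0:
            run += 1
        else:
            run = 1
        best = max(best, run)
    return (best + 1) / n                # a run of k equal signs spans k+1 units

def distortion_1d(m):
    """Distortion chi (Eq. 4): CV of the sizes of interior Voronoi cells."""
    ms = np.sort(m)
    edges = (ms[:-1] + ms[1:]) / 2.0     # Voronoi boundaries between neighbors
    cells = np.diff(edges)               # interior cells only (edges excluded)
    return np.sqrt(np.var(cells)) / np.mean(cells)
```

A perfectly ordered, uniformly spaced map yields η = 1 and χ ≈ 0; a map with one central defect yields η ≈ 0.5.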
Ordering Process of SOMs Improved by Asymmetric Neighborhood Function 431
where Δ_i is the size of the Voronoi cell of unit i. To eliminate the boundary effect of the SOM algorithm, the Voronoi cells on the edges of the array are excluded. When the reference vectors are distributed uniformly, the distortion χ converges to 0. It should be noted that the evaluation of the Voronoi cells in two-dimensional SOM is time-consuming, so we approximate the size of a Voronoi cell by the magnitude of the vector product a_{(i,j)}. If the feature map is formed uniformly, this approximate value also converges to 0.

3 Results
3.1 One-Dimensional Case

In this section, we investigate the ordering process of SOM learning in the presence of a topological defect for the symmetric, asymmetric, and improved asymmetric cases of the neighborhood function. For this purpose, we use an initial condition in which a single topological defect appears at the center of the array. Because the density of input vectors is uniform, the desirable feature map is a linear arrangement of the SOM nodes. Figure 3A shows a typical time development of the reference vectors m_i. In the case of the symmetric neighborhood function, a single defect remains around the center of the array even after 10000 steps. In contrast, in the case of the asymmetric one, this defect moves out to the right, so that the reference vectors are ordered within 3000 steps. This phenomenon can also be confirmed in Fig. 3B, which shows the time dependence of the topological order η. In the case of the asymmetric neighborhood function, η rapidly converges to 1 (the completely ordered state) within 3000 steps, whereas the process of eliminating the last defect takes a large amount of time (∼18000 steps) for the symmetric one.

On the other hand, one problem arises in the feature map obtained with the asymmetric neighborhood function. After 10000 steps, the distribution of the reference vectors in the feature map develops an unusual bias (Fig. 3A). Figure 3C shows the time dependence of the distortion χ during learning. In the case of the symmetric neighborhood function, χ eventually converges to almost 0. This result indicates that the feature map obtained with the symmetric one has an almost uniform size distribution of Voronoi cells. In contrast, in the case of the asymmetric one, χ converges to a finite value (≠ 0). Although the asymmetric neighborhood function accelerates the ordering process of SOM learning, the resultant map becomes distorted, which makes it unusable in applications.

Therefore, the improved asymmetric method introduced in the Method section is applied. Using this improved algorithm, χ converges to almost 0, just as with the symmetric one (Fig. 3C). Furthermore, as shown in Fig. 3B, the improved algorithm preserves the faster order learning. Therefore, by utilizing the improved algorithm with the asymmetric neighborhood function, we obtain the full benefit of both fast order learning and an undistorted feature map.

To quantify the performance of the ordering process, let us define the 'ordering time' as the time at which η reaches 1. Figure 4A shows the ordering
Fig. 3. The asymmetric neighborhood function enhances the ordering process of SOM. A: A typical time development of the reference vectors m_i for the symmetric, asymmetric, and improved asymmetric neighborhood functions (snapshots at t = 0, 1000, 5000, 10000, and 20000). B: The time dependence of the topological order η. The standard deviations are denoted by error bars, which cannot be seen because they are smaller than the plotted symbols. C: The time dependence of the distortion χ.

time as a function of the total number of units N for both the improved asymmetric and symmetric cases of the neighborhood function. It is found that the ordering time scales roughly as N^3 and N^2 for the symmetric and improved asymmetric neighborhood functions, respectively. For a detailed discussion of the reduction of the ordering time, refer to Aoki and Aoyagi (2007) [9]. Figure 4B shows the dependence of the ordering time on the width of the neighborhood function, which indicates that the ordering time is proportional to (N/σ)^k, with k = 2 for the asymmetric and k = 3 for the symmetric neighborhood function. This result implies that the combined use of the asymmetric method and an annealing method for the width of the neighborhood function is even more effective.

3.2 Two-Dimensional Case

In this section, we investigate the effect of the asymmetric neighborhood function for two-dimensional SOM (2D → 2D map). Figure 5 shows that a similar fast
Fig. 4. Ordering time as a function of the total number of units N (log-log axes). The fitting function is described by Const. · N^γ. (Panel A legend: Symmetric, fit N^{2.989±0.002}; Improved asym., fit N^{1.917±0.005}. Panel B legend: Sym. and Asym. curves for σ = 8, 16, 32, 64, with an N^2 guide line.)
Fig. 5. A: A typical time development of reference vectors in a two-dimensional array of SOM for the cases of the symmetric, asymmetric, and improved asymmetric neighborhood functions (snapshots at t = 0, 1000, 5000, and 20000). B: The time dependence of the topological order η. C: The time dependence of the distortion χ.
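The two-dimensional order-condition based on the vector products a_(i,j) (Sect. 2.4) can be sketched as follows. This is our own illustration: the domain measure below is a simple majority-sign fraction, a proxy for the exact maximum-domain ratio of Eq. (3), which would additionally require connected-component labeling.

```python
import numpy as np

def vector_products_2d(m):
    """Cross products a_(i,j) for a 2-D map m of shape (H, W, 2).

    a_(i,j) = (m[i+1,j] - m[i,j]) x (m[i,j+1] - m[i,j]); within an ordered
    domain all a_(i,j) share the same sign.
    """
    u = m[1:, :-1, :] - m[:-1, :-1, :]   # step along the first array direction
    v = m[:-1, 1:, :] - m[:-1, :-1, :]   # step along the second array direction
    return u[..., 0] * v[..., 1] - u[..., 1] * v[..., 0]  # z of the cross product

def majority_sign_fraction(m):
    """Fraction of units whose a_(i,j) carries the majority sign -- a simple
    proxy for the topological order eta of Eq. (3)."""
    a = vector_products_2d(m)
    pos = np.count_nonzero(a > 0)
    neg = np.count_nonzero(a < 0)
    return max(pos, neg) / a.size
```

For an untwisted, regularly ordered grid all cross products are positive and the fraction equals 1.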
Fig. 6. Distribution of ordering times when the initial reference vectors are generated randomly, for the symmetric (left) and improved asymmetric (right) neighborhood functions. The white bin at the right of each graph indicates the population of failed trials that did not converge to the perfectly ordered state within 50000 steps.

ordering process can be realized with an asymmetric neighborhood function in two-dimensional SOM. The initial state has a global topological defect, in which the map is twisted at the center. In this situation, the conventional symmetric neighborhood function has difficulty correcting the twisted map: because of its local stability, this topological defect is never corrected, even with a huge number of learning iterations. The asymmetric neighborhood function is also effective in overcoming such a topological defect, as in the case of one-dimensional SOM. However, the same problem of a distorted map occurs. Therefore, by using the improved asymmetric neighborhood function, the feature map converges to the completely ordered map in much less time and without any distortion.

In the previous simulations, we considered a simple situation in which a single defect exists around the center of the feature map as the initial condition, in order to investigate the ordering process in the presence of a topological defect. However, when the initial reference vectors are set randomly, the total number of topological defects appearing in the map is not generally equal to one. Therefore, it is necessary to consider the statistical distribution of the ordering time, because the total number of topological defects and the convergence process generally depend on the initial conditions. Figure 6 shows the distribution of the ordering time when the initial reference vectors are randomly selected from the uniform distribution on [0, 1] × [0, 1].
In the case of the symmetric neighborhood function, some trials could not converge to the ordered state, being trapped in undesirable metastable states in which the topological defects are never rectified. Therefore, although a fast ordering process is observed in some successful cases (lucky initial conditions), the map formed with the symmetric function depends strongly on the initial conditions. In contrast, for the improved asymmetric neighborhood function, the distribution of the ordering time has a single sharp peak, and a well-ordered feature map is constructed stably without any tuning of the initial conditions.

4 Conclusion

In this paper, we discussed the learning process of the self-organizing map, especially in the presence of a topological defect. Interestingly, even in the presence
of the defect, we found that the asymmetry of the neighborhood function enables the system to accelerate the learning process. Compared with the conventional symmetric function, the convergence time of the learning process is roughly reduced from O(N^3) to O(N^2) in one-dimensional SOM (N is the total number of units). Furthermore, this acceleration with the asymmetric neighborhood function is also effective in the case of two-dimensional SOM (2D → 2D map). In contrast, the conventional symmetric function cannot rectify a twisted feature map even with a huge number of iteration steps, due to its local stability. These results suggest that the proposed method can be effective in more general cases of SOM, which is a subject for future study.

Acknowledgments

This work was supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Science, Sports, and Culture of Japan: Grant numbers 18047014, 18019019, and 18300079.

References

1. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43(1), 59–69 (1982)
2. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (London) 160(1), 106–154 (1962)
3. Hubel, D.H., Wiesel, T.N.: Sequence regularity and geometry of orientation columns in the monkey striate cortex. J. Comp. Neurol. 158(3), 267–294 (1974)
4. von der Malsburg, C.: Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 14(2), 85–100 (1973)
5. Takeuchi, A., Amari, S.: Formation of topographic maps and columnar microstructures in nerve fields. Biol. Cybern. 35(2), 63–72 (1979)
6. Erwin, E., Obermayer, K., Schulten, K.: Self-organizing maps: stationary states, metastability and convergence rate. Biol. Cybern. 67(1), 35–45 (1992)
7. Geszti, T., Csabai, I., Cazokó, F., Szakács, T., Serneels, R., Vattay, G.: Dynamics of the Kohonen map. In: Statistical Mechanics of Neural Networks: Proceedings of the XIth Sitges Conference, pp. 341–349. Springer, New York (1990)
8. Der, R., Herrmann, M., Villmann, T.: Time behavior of topological ordering in self-organizing feature mapping. Biol. Cybern. 77(6), 419–427 (1997)
9.
Aoki, T., Aoyagi, T.: Self-organizing maps with asymmetric neighborhood function. Neural Comput. 19(9), 2515–2535 (2007)
A Characterization of Simple Recurrent Neural Networks with Two Hidden Units as a Language Recognizer

Azusa Iwata 1, Yoshihisa Shinozawa 1, and Akito Sakurai 1,2
1 Keio University, Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
2 CREST, Japan Science and Technology Agency

Abstract. We give a necessary condition for a simple recurrent neural network with two sigmoidal hidden units to implement a recognizer of the formal language {a^n b^n | n > 0}, which is generated by the set of generating rules {S → aSb, S → ab}, and we show that by setting the parameters so as to conform to the condition we obtain a recognizer of the language. The condition implies the instability of the learning process reported in previous studies. The condition also implies, in contrast to its success in implementing the recognizer, the difficulty of obtaining a recognizer for more complicated languages.

1 Introduction

Pioneered by Elman [6], much research has been conducted on grammar learning by recurrent neural networks. A grammar is defined by generating or rewriting rules such as S → Sa and