Lecture Notes in Computer Science
Fig. 2. Affine Subspace in a computational unit: the input vector $x$, the relative input vector $\phi$, the orthogonal projection $\hat{x}$, and the projection error $\tilde{x}$.

vectors $b_h^{(i)}$, $h \in \{1, \cdots, H\}$. First of all, we define the orthogonal projection of an input vector $x$ onto the affine subspace of the $i$-th unit as

$$\hat{x}^{(i)} = \mu^{(i)} + \sum_{h=1}^{H} \left( \phi^{(i)T} b_h^{(i)} \right) b_h^{(i)}, \quad (1)$$

where $\phi^{(i)} = x - \mu^{(i)}$. Therefore the projection error is represented as

$$\tilde{x}^{(i)} = \phi^{(i)} - \sum_{h=1}^{H} \left( \phi^{(i)T} b_h^{(i)} \right) b_h^{(i)}. \quad (2)$$

Figure 2 shows a schematic interpretation of the orthogonal projection and the projection error of an input vector onto the affine subspace defined in a unit $i$. The AMSOM is a more general strategy than the ASSOM, in which each computational unit solely defines a subspace. To illustrate why this is so, let us consider a very simple case: suppose we are given the two clusters shown in Fig. 3(a). It
is not possible to use one-dimensional subspaces, that is, lines intersecting the origin $O$, to approximate the clusters. This is true even if the global mean is removed, so that the origin $O$ is translated to the centroid of the two clusters. However, two one-dimensional affine subspaces can easily approximate the clusters, as shown in Fig. 3(b), since the basis vectors are aligned in the direction that minimizes the projection error.

510 H. Kawano, H. Maeda, and N. Ikoma

Fig. 3. (a) Clusters in 2-dimensional space: an example of a case that cannot be separated without a mean value. (b) Two 1-dimensional affine subspaces to approximate and classify the clusters.

In the AMSOM, the input vectors are grouped into episodes in order to present them to the network as input sets. For pattern classification, an episode input is defined as a subset of the training data belonging to the same category. Assume that the number of input vectors in the subset is $E$; then an episode input $\omega_q$ in the class $q$ is denoted as $\omega_q = \{x_1, x_2, \cdots, x_E\}$, $\omega_q \subseteq \Omega_q$, where $\Omega_q$ is the set of training patterns belonging to the class $q$. The set of input vectors of an episode has to be recognized as one class, such that any member of this set, and even an arbitrary linear combination of them, should have the same winning unit. The training process in the AMSOM has the following steps:

(a) Winner lookup. The unit that gives the minimum projection error for an episode is selected. This unit is denoted as the winner, whose index is $c$. The decision criterion for the winner $c$ is represented as

$$c = \arg\min_i \sum_{e=1}^{E} \left\| \tilde{x}_e^{(i)} \right\|^2, \quad (3)$$

where $i \in \{1, \cdots, M\}$.
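The projection and winner lookup of Eqs. (1)–(3) can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the authors' code; the function names and the assumption that each unit stores its mean vector and an orthonormal basis matrix are ours.

```python
import numpy as np

def projection_error(x, mu, B):
    """Squared projection error of x onto the affine subspace through mu
    spanned by the rows of B (assumed orthonormal), Eqs. (1)-(2)."""
    phi = x - mu                  # relative input vector, phi = x - mu
    x_hat = mu + B.T @ (B @ phi)  # orthogonal projection, Eq. (1)
    x_tilde = x - x_hat           # projection error, Eq. (2)
    return float(x_tilde @ x_tilde)

def winner(episode, units):
    """Winner lookup, Eq. (3): the unit minimizing the summed squared
    projection error over all input vectors of the episode."""
    totals = [sum(projection_error(x, mu, B) for x in episode)
              for (mu, B) in units]
    return int(np.argmin(totals))
```

For example, a unit whose manifold is the horizontal line through the origin reproduces points on that line with zero error, so an episode lying near another unit's manifold selects that other unit as winner.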
(b) Learning. For each unit $i$, and for each $x_e$, update $\mu^{(i)}$:

$$\mu^{(i)}(t+1) = \mu^{(i)}(t) + \lambda_m(t)\, h_{ci}(t) \left( x_e - \mu^{(i)}(t) \right), \quad (4)$$

where $\lambda_m(t)$ is the learning rate for $\mu^{(i)}$ at learning epoch $t$, and $h_{ci}(t)$ is the neighborhood function at learning epoch $t$ with respect to the winner $c$. Both $\lambda_m(t)$ and $h_{ci}(t)$ are monotonically decreasing functions of $t$. In this paper, $\lambda_m(t) = \lambda_m^{ini} (\lambda_m^{fin}/\lambda_m^{ini})^{t/t_{max}}$ and $h_{ci}(t) = \exp(-|c-i|/\gamma(t))$, with $\gamma(t) = \gamma^{ini} (\gamma^{fin}/\gamma^{ini})^{t/t_{max}}$, are used. Then the basis vectors are updated as

$$b_h^{(i)}(t+1) = b_h^{(i)}(t) + \lambda_b(t)\, h_{ci}(t)\, \frac{\phi_e^{(i)}(t)^T b_h^{(i)}(t)}{\left\| \hat{\phi}_e^{(i)}(t) \right\| \left\| \phi_e^{(i)}(t) \right\|}\, \phi_e^{(i)}(t), \quad (5)$$

where $\phi_e^{(i)}(t)$ is the relative input vector in the manifold $i$ after updating the mean vector, represented by $\phi_e^{(i)}(t) = x_e - \mu^{(i)}(t+1)$; $\hat{\phi}_e^{(i)}(t)$ is the orthogonal projection of the relative input vector, represented by $\hat{\phi}_e^{(i)}(t) = \sum_{h=1}^{H} (\phi_e^{(i)}(t)^T b_h^{(i)}(t))\, b_h^{(i)}(t)$; and $\lambda_b(t)$ is the learning rate for the basis vectors, which is also a monotonically decreasing function of $t$. In this paper, $\lambda_b(t) = \lambda_b^{ini} (\lambda_b^{fin}/\lambda_b^{ini})^{t/t_{max}}$
is used. After the learning phase, a categorization phase is carried out to determine the class association of each unit: each unit is labeled with the class index for which it is selected as the winner most frequently when the input data used for learning are applied to the AMSOM again.

3 Self-Organizing Clustering with Nonlinear Varieties

3.1 Reproducing Kernels

Reproducing kernels are functions $k : \mathcal{X}^2 \to \mathbb{R}$ which, for all pattern sets

$$\{x_1, \cdots, x_l\} \subset \mathcal{X}, \quad (6)$$

give rise to positive matrices $K_{ij} := k(x_i, x_j)$. Here, $\mathcal{X}$ is some compact set in which the data resides, typically a subset of $\mathbb{R}^n$. In the field of the Support Vector Machine (SVM), reproducing kernels are often referred to as Mercer kernels. They provide an elegant way of dealing with nonlinear algorithms by reducing them to linear ones in some feature space $F$ nonlinearly related to the input space: using $k$ instead of a dot product in $\mathbb{R}^n$ corresponds to mapping the data into a possibly high-dimensional dot product space $F$ by a (usually nonlinear) map $\Phi : \mathbb{R}^n \to F$,

$$k(x, y) = (\Phi(x), \Phi(y)). \quad (7)$$
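Equation (7) can be checked numerically for a small case. The sketch below is ours, not from the paper: it uses the degree-2 non-homogeneous polynomial kernel on $\mathbb{R}^2$, whose explicit feature map is 6-dimensional, and verifies that the kernel value computed in input space coincides with the dot product in the feature space $F$.

```python
import numpy as np

def feature_map(x):
    """Explicit feature map for the degree-2 non-homogeneous polynomial
    kernel on R^2. The construction is standard; the name is ours."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2.0) * x1, np.sqrt(2.0) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2.0) * x1 * x2])

x = np.array([0.5, -1.0])
y = np.array([2.0, 3.0])
k_xy = (x @ y + 1.0) ** 2               # kernel evaluated in input space
f_xy = feature_map(x) @ feature_map(y)  # dot product in feature space F
assert np.isclose(k_xy, f_xy)           # Eq. (7): k(x, y) = (Phi(x), Phi(y))
```

The same identity is what lets the algorithms below work entirely with kernel evaluations, never forming $\Phi(x)$ explicitly.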
By virtue of this property, we shall call $\Phi$ a feature map associated with $k$. Any linear algorithm which can be carried out in terms of dot products can be made nonlinear by substituting an a priori chosen kernel. Examples of such algorithms include the potential function method [12], the SVM [7][8] and kernel PCA [9]. The price that one has to pay for this elegance, however, is that the solutions are only obtained as expansions in terms of the input patterns mapped into feature space. For instance, the normal vector of an SV hyperplane is expanded in terms of Support Vectors, just as the kernel PCA feature extractors are expressed in terms of training examples:

$$\Psi = \sum_{i=1}^{l} \alpha_i \Phi(x_i). \quad (8)$$

3.2 AMSOM in the Feature Space

The AMSOM in the high-dimensional feature space $F$ is considered. In the proposed method, the varieties defined by a computational unit $i$ in the competitive layer take the nonlinear form $M_i$
$= \{\Phi(x) \mid \Phi(x) = \Phi(\mu^{(i)}) + \sum_{h=1}^{H} \xi_h \Phi(b_h^{(i)})\}, \quad (9)$

where $\xi_h \in \mathbb{R}$. Given a training data set $\{x_1, \cdots, x_N\}$, the mean vector and the basis vectors in a unit $i$ are represented in the following form:

$$\Phi(\mu^{(i)}) = \sum_{l=1}^{N} \alpha_l^{(i)} \Phi(x_l), \quad (10)$$

$$\Phi(b_h^{(i)}) = \sum_{l=1}^{N} \beta_{hl}^{(i)} \Phi(x_l), \quad (11)$$

respectively. The coefficients $\alpha_l^{(i)}$ in Eq. (10) and $\beta_{hl}^{(i)}$ in Eq. (11) are adjusted by learning. The derivation of the training procedure in the proposed method is given as follows:

(a) Winner lookup. The norm of the orthogonal projection error onto the $i$-th affine subspace with respect to the present input $x_p$ is calculated as follows:

$$\left\| \Phi(\tilde{x}_p)^{(i)} \right\|^2 = k(x_p, x_p) + \sum_{h=1}^{H} P_h^{(i)2} + \sum_{l_1=1}^{N} \sum_{l_2=1}^{N} \alpha_{l_1}^{(i)} \alpha_{l_2}^{(i)} k(x_{l_1}, x_{l_2}) - 2 \sum_{l=1}^{N} \alpha_l^{(i)} k(x_p, x_l) + 2 \sum_{h=1}^{H} \sum_{l_1=1}^{N} \sum_{l_2=1}^{N} P_h^{(i)} \alpha_{l_1}^{(i)} \beta_{hl_2}^{(i)} k(x_{l_1}, x_{l_2}) - 2 \sum_{h=1}^{H} \sum_{l=1}^{N} P_h^{(i)} \beta_{hl}^{(i)} k(x_p, x_l), \quad (12)$$
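Eq. (12), together with the projection components $P_h^{(i)}$ defined in Eq. (13) below, depends on the data only through kernel evaluations. A minimal NumPy sketch of this computation follows; the naming is ours, with `K` the Gram matrix over the training set and `kp` the kernel values between the present input and the training set.

```python
import numpy as np

def proj_components(kp, K, alpha, beta):
    """P_h^(i) of Eq. (13). beta is (H, N), alpha is (N,),
    K[l1, l2] = k(x_l1, x_l2), kp[l] = k(x_p, x_l).
    Uses alpha^T K beta_h = beta_h^T K alpha (K is symmetric)."""
    return beta @ kp - beta @ K @ alpha

def kernel_proj_error(kpp, kp, K, alpha, beta):
    """Squared projection error ||Phi(x_tilde_p)^(i)||^2 of Eq. (12),
    where kpp = k(x_p, x_p)."""
    P = proj_components(kp, K, alpha, beta)
    return (kpp + P @ P + alpha @ K @ alpha - 2.0 * alpha @ kp
            + 2.0 * P @ (beta @ K @ alpha) - 2.0 * P @ (beta @ kp))
```

With the linear kernel $k(x, y) = x^T y$ this reduces to the input-space projection error of Eq. (2), which gives a quick sanity check of the expansion.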
where $P_h^{(i)}$ denotes the orthogonal projection component of the present input $x_p$ onto the basis $\Phi(b_h^{(i)})$; it is calculated by

$$P_h^{(i)} = \sum_{l=1}^{N} \beta_{hl}^{(i)} k(x_p, x_l) - \sum_{l_1=1}^{N} \sum_{l_2=1}^{N} \alpha_{l_1}^{(i)} \beta_{hl_2}^{(i)} k(x_{l_1}, x_{l_2}). \quad (13)$$

The reproducing kernels generally used are as follows:

$$k(x_s, x_t) = (x_s^T x_t)^d, \quad d \in \mathbb{N}, \quad (14)$$

$$k(x_s, x_t) = (x_s^T x_t + 1)^d, \quad d \in \mathbb{N}, \quad (15)$$

$$k(x_s, x_t) = \exp\left( -\frac{\|x_s - x_t\|^2}{2\sigma^2} \right), \quad \sigma \in \mathbb{R}, \quad (16)$$

where $\mathbb{N}$ and $\mathbb{R}$ are the set of natural numbers and the set of reals, respectively. Eqs. (14), (15) and (16) are referred to as homogeneous polynomial kernels, non-homogeneous polynomial kernels and Gaussian kernels, respectively. The winner for an episode input $\omega_q =$
$\{\Phi(x_1), \cdots, \Phi(x_E)\}$ is decided in the same manner as in the AMSOM, as follows:

$$c = \arg\min_i \sum_{e=1}^{E} \left\| \Phi(\tilde{x}_e)^{(i)} \right\|^2, \quad i \in \{1, \cdots, M\}. \quad (17)$$

(b) Learning. The learning rules for $\alpha_l^{(i)}$ and $\beta_{hl}^{(i)}$ are as follows:

$$\Delta\alpha_l^{(i)} = \begin{cases} -\alpha_l^{(i)}(t)\, \lambda_m(t)\, h_{ci}(t) & \text{for } l \neq e \\ -\alpha_l^{(i)}(t)\, \lambda_m(t)\, h_{ci}(t) + \lambda_m(t)\, h_{ci}(t) & \text{for } l = e \end{cases}, \quad (18)$$

$$\Delta\beta_{hl}^{(i)} = \begin{cases} -\alpha_l^{(i)}(t+1)\, \lambda_b(t)\, h_{ci}(t)\, T_h^{(i)}(t) & \text{for } l \neq e \\ -\alpha_l^{(i)}(t+1)\, \lambda_b(t)\, h_{ci}(t)\, T_h^{(i)}(t) + \lambda_b(t)\, h_{ci}(t)\, T_h^{(i)}(t) & \text{for } l = e \end{cases}, \quad (19)$$

where
Φ(φ (i)
e (t))
T Φ(b
(i) h (t)) ||Φ( ˆ φ (i) e (t))
||||Φ(φ (i)
e (t))
|| , (20) Φ( ˆ φ (i) e (t))) =
H h=1
N l=1
β (i)
hl k(x
e , x
l ) − N l 1 =1 N l 2 =1 α l 1 β kl 2 k(x l 1 , x l 2 ) 2 1 2 , (21)
||Φ(φ (i)
e )(t)
||= k(x e , x e ) −2 N l=1
α (i)
l k(x
e , x
l ) +
N l 1 =1 N l 2 =1 α (i) l 1 α (i)
l 2 k(x l 1 , x l 2 ) 1 2 , (22) 514 H. Kawano, H. Maeda, and N. Ikoma Φ(φ (i)
e (t))
T Φ(b
(i) h (t)) = N l=1
β hl k(x e , x
l ) − N l 1 =1 N l 2 =1 α (i) l 1 β (i)
hl 2 k(x l 1 , x l 2 ). (23) In Eqs.(18) and (19), λ m (t), λ b (t) and h ci (t) are the same parameters as mentioned in the AMSOM training process. After the learning phase, a categorization phase to determine the class association of each unit. The procedure of the categorization is the same manner as mentioned in previous section. 4 Experimental Results Data Distribution 1. To analyze the effect of reducing the dimensionality for the intersections of subspaces by the proposed method, the data as shown in Fig.4(a) is used. For this data, although a set of each class can be ap- proximated by 1-dimensional linear manifold in the input space R 2 , the
intersections of the subspaces could occur between class 1 and class 2, and between class 2 and class 3, even if the optimal linear manifold for each class is obtained. However, a linear manifold in the high-dimensional feature space, that is, a nonlinear manifold in the input space, can be expected to classify the given data through the dimensionality-reduction effect on the intersections of the subspaces.

As the result of the simulation, the given input data are classified correctly by the proposed method. Figures 4(b) and (c) show the decision regions constructed by the AMSOM and the proposed method, respectively. In this experiment, 3 units were used in the competitive layer. The class associations are class 1 to unit 1, class 2 to unit 2, and class 3 to unit 3, respectively. In this simulation, the experimental parameters are assigned as follows: the total number of epochs $t_{max} = 100$, $\lambda_m^{ini} = 0.1$, $\lambda_m^{fin} = 0.01$, $\lambda_b^{ini} = 0.1$, $\lambda_b^{fin} = 0.01$, $\gamma^{ini} = 3$, $\gamma^{fin} = 0.03$, common to both the AMSOM and the proposed method; $H = 1$ in the AMSOM; and $H = 2$, $k(x, y) = (x^T y + 1)^3$ in the proposed method. From the experiment, it was shown that the proposed method has the ability to reduce the dimensionality of the intersections of subspaces.

Data Distribution 2. To verify that the proposed method has the ability to describe nonlinear manifolds, the data shown in Fig. 5(a) are used. In this case it is impossible to describe each class by a linear manifold, that is, by a 1-dimensional affine subspace.

As the result of the simulation, the given input data are classified correctly by the proposed method. Figures 5(b) and (c) show the decision regions constructed by the AMSOM and the proposed method, respectively. In this experiment, 3 units were used in the competitive layer. The class associations are class 1 to unit 3, class 2 to unit 2, and class 3 to unit 1, respectively. In this simulation, the experimental parameters are assigned as follows: the
Fig. 4. (a) Training data used in the first experiment. (b) Decision regions learned by the AMSOM and (c) by the proposed method.
Fig. 5. (a) Training data used in the second experiment. (b) Decision regions learned by the AMSOM and (c) by the proposed method.

total number of epochs $t_{max} = 100$, $\lambda_m^{ini} = 0.1$, $\lambda_m^{fin} = 0.01$, $\lambda_b^{ini} = 0.1$, $\lambda_b^{fin} = 0.01$, $\gamma^{ini} = 3$, $\gamma^{fin} = 0.03$, common to both the AMSOM and the proposed method; $H = 1$ in the AMSOM; and $H = 2$, $k(x, y) = (x^T y)^2$ in the proposed method. From the experiment, it was shown that the proposed method has the ability to extract suitable nonlinear manifolds efficiently.

5 Conclusions

A clustering method with a map of nonlinear varieties was proposed as a new pattern classification method. The proposed method is easily extended from the AMSOM to a nonlinear method by applying the kernel method. The effectiveness of the proposed method was verified by the experiments. The proposed algorithm promises applications of the ASSOM in a wide area of practical problems.

References

1. Oja, E.: Subspace Methods of Pattern Recognition. Research Studies Press (1983)
2. Kohonen, T.: Emergence of Invariant-Feature Detectors in the Adaptive-Subspace Self-Organizing Map. Biol. Cybern. 75, 281–291 (1996)
3. Kohonen, T., Kaski, S., Lappalainen, H.: Self-Organized Formation of Various Invariant-Feature Filters in the Adaptive-Subspace SOM. Neural Comput. 9, 1321–1344 (1997)
4. Kohonen, T.: Self-Organizing Maps. Springer, Berlin, Heidelberg, New York (1995)
5. Liu, Z.Q.: Adaptive Subspace Self-Organizing Map and Its Application in Face Recognition. International Journal of Image and Graphics 2(4), 519–540 (2002)
6. Saitoh, S.: Theory of Reproducing Kernels and its Applications. Longman Scientific & Technical, Harlow, England (1988)
7. Cortes, C., Vapnik, V.: Support Vector Networks. Machine Learning 20, 273–297 (1995)
8. Vapnik, V.: The Nature of Statistical Learning Theory, 2nd edn. Springer, New York, Berlin, Heidelberg (1995)
9. Schölkopf, B., Smola, A.J., Müller, K.R.: Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Technical Report 44, Max-Planck-Institut für biologische Kybernetik (1996)
10. Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Müller, K.R.: Fisher Discriminant Analysis with Kernels. Neural Networks for Signal Processing IX, 41–48 (1999)
11. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A Training Algorithm for Optimal Margin Classifiers. In: Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144–152 (1992)
12. Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning. Automation and Remote Control 25, 821–837 (1964)
13. Schölkopf, B., Smola, A.J.: Learning with Kernels. The MIT Press, Cambridge (2002)