Lecture Notes in Computer Science
Fig. 2. Affine Subspace in a computational unit: the input vector $x$, the relative input vector $\phi$, the orthogonal projection $\hat{x}$, and the projection error $\tilde{x}$.

vectors $b_h^{(i)}$, $h \in \{1, \cdots, H\}$. First of all, we define the orthogonal projection of an input vector $x$ onto the affine subspace of the $i$-th unit as

$$\hat{x}^{(i)} = \mu^{(i)} + \sum_{h=1}^{H} \left( \phi^{(i)T} b_h^{(i)} \right) b_h^{(i)}, \quad (1)$$

where $\phi^{(i)} = x - \mu^{(i)}$. Therefore the projection error is represented as

$$\tilde{x}^{(i)} = \phi^{(i)} - \sum_{h=1}^{H} \left( \phi^{(i)T} b_h^{(i)} \right) b_h^{(i)}. \quad (2)$$

Figure 2 shows a schematic interpretation of the orthogonal projection and the projection error of an input vector onto the affine subspace defined in a unit $i$. The AMSOM is a more general strategy than the ASSOM, in which each computational unit solely defines a subspace. To illustrate why this is so, let us consider a very simple case: suppose we are given the two clusters shown in Fig. 3(a). It
is not possible to use one-dimensional subspaces, that is, lines intersecting the origin $O$, to approximate the clusters. This is true even if the global mean is removed, so that the origin $O$ is translated to the centroid of the two clusters. However, two one-dimensional affine subspaces can easily approximate the clusters, as shown in Fig. 3(b), since the basis vectors are aligned in the direction that minimizes the projection error.

510 H. Kawano, H. Maeda, and N. Ikoma

Fig. 3. (a) Clusters in 2-dimensional space: an example of a case that cannot be separated without a mean value. (b) Two 1-dimensional affine subspaces to approximate and classify the clusters.

In the AMSOM, the input vectors are grouped into episodes in order to present them to the network as input sets. For pattern classification, an episode input is defined as a subset of the training data belonging to the same category. Assume that the number of input vectors in the subset is $E$; then an episode input $\omega_q$ in the class $q$ is denoted as $\omega_q = \{x_1, x_2, \cdots, x_E\}$, $\omega_q \subseteq \Omega_q$, where $\Omega_q$ is the set of training patterns belonging to the class $q$. The set of input vectors of an episode has to be recognized as one class, such that any member of this set, and even an arbitrary linear combination of them, should have the same winning unit. The training process in the AMSOM has the following steps:

(a) Winner lookup. The unit that gives the minimum projection error for an episode is selected. This unit is denoted as the winner, whose index is $c$. The decision criterion for the winner $c$ is represented as

$$c = \arg\min_i \sum_{e=1}^{E} \left\| \tilde{x}_e^{(i)} \right\|^2, \quad (3)$$

where $i \in \{1, \cdots, M\}$.
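The projection and winner lookup of Eqs. (1)–(3) can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the authors' code; the function names and the assumption that each unit stores its mean vector and an orthonormal basis matrix are ours.

```python
import numpy as np

def projection_error(x, mu, B):
    """Squared projection error of x onto the affine subspace through mu
    spanned by the rows of B (assumed orthonormal), Eqs. (1)-(2)."""
    phi = x - mu                  # relative input vector, phi = x - mu
    x_hat = mu + B.T @ (B @ phi)  # orthogonal projection, Eq. (1)
    x_tilde = x - x_hat           # projection error, Eq. (2)
    return float(x_tilde @ x_tilde)

def winner(episode, units):
    """Winner lookup, Eq. (3): the unit minimizing the summed squared
    projection error over all input vectors of the episode."""
    totals = [sum(projection_error(x, mu, B) for x in episode)
              for (mu, B) in units]
    return int(np.argmin(totals))
```

For example, a unit whose manifold is the horizontal line through the origin reproduces points on that line with zero error, so an episode lying near another unit's manifold selects that other unit as winner.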
(b) Learning. For each unit $i$, and for each $x_e$, update $\mu^{(i)}$:

$$\mu^{(i)}(t+1) = \mu^{(i)}(t) + \lambda_m(t)\, h_{ci}(t) \left( x_e - \mu^{(i)}(t) \right), \quad (4)$$

where $\lambda_m(t)$ is the learning rate for $\mu^{(i)}$ at learning epoch $t$, and $h_{ci}(t)$ is the neighborhood function at learning epoch $t$ with respect to the winner $c$. Both $\lambda_m(t)$ and $h_{ci}(t)$ are monotonically decreasing functions of $t$. In this paper, $\lambda_m(t) = \lambda_m^{ini} (\lambda_m^{fin}/\lambda_m^{ini})^{t/t_{max}}$ and $h_{ci}(t) = \exp(-|c-i|/\gamma(t))$, with $\gamma(t) = \gamma^{ini} (\gamma^{fin}/\gamma^{ini})^{t/t_{max}}$, are used. Then the basis vectors are updated as

$$b_h^{(i)}(t+1) = b_h^{(i)}(t) + \lambda_b(t)\, h_{ci}(t)\, \frac{\phi_e^{(i)}(t)^T b_h^{(i)}(t)}{\left\| \hat{\phi}_e^{(i)}(t) \right\| \left\| \phi_e^{(i)}(t) \right\|}\, \phi_e^{(i)}(t), \quad (5)$$

where $\phi_e^{(i)}(t)$ is the relative input vector in the manifold $i$ after updating the mean vector, represented by $\phi_e^{(i)}(t) = x_e - \mu^{(i)}(t+1)$; $\hat{\phi}_e^{(i)}(t)$ is the orthogonal projection of the relative input vector, represented by $\hat{\phi}_e^{(i)}(t) = \sum_{h=1}^{H} (\phi_e^{(i)}(t)^T b_h^{(i)}(t))\, b_h^{(i)}(t)$; and $\lambda_b(t)$ is the learning rate for the basis vectors, which is also a monotonically decreasing function of $t$. In this paper, $\lambda_b(t) = \lambda_b^{ini} (\lambda_b^{fin}/\lambda_b^{ini})^{t/t_{max}}$
is used. After the learning phase, a categorization phase is carried out to determine the class association of each unit: each unit is labeled with the class index for which it is selected as the winner most frequently when the input data used for learning are applied to the AMSOM again.

3 Self-Organizing Clustering with Nonlinear Varieties

3.1 Reproducing Kernels

Reproducing kernels are functions $k : \mathcal{X}^2 \to \mathbb{R}$ which, for all pattern sets

$$\{x_1, \cdots, x_l\} \subset \mathcal{X}, \quad (6)$$

give rise to positive matrices $K_{ij} := k(x_i, x_j)$. Here, $\mathcal{X}$ is some compact set in which the data resides, typically a subset of $\mathbb{R}^n$. In the field of the Support Vector Machine (SVM), reproducing kernels are often referred to as Mercer kernels. They provide an elegant way of dealing with nonlinear algorithms by reducing them to linear ones in some feature space $F$ nonlinearly related to the input space: using $k$ instead of a dot product in $\mathbb{R}^n$ corresponds to mapping the data into a possibly high-dimensional dot product space $F$ by a (usually nonlinear) map $\Phi : \mathbb{R}^n \to F$,

$$k(x, y) = (\Phi(x), \Phi(y)). \quad (7)$$
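Equation (7) can be checked numerically for a small case. The sketch below is ours, not from the paper: it uses the degree-2 non-homogeneous polynomial kernel on $\mathbb{R}^2$, whose explicit feature map is 6-dimensional, and verifies that the kernel value computed in input space coincides with the dot product in the feature space $F$.

```python
import numpy as np

def feature_map(x):
    """Explicit feature map for the degree-2 non-homogeneous polynomial
    kernel on R^2. The construction is standard; the name is ours."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2.0) * x1, np.sqrt(2.0) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2.0) * x1 * x2])

x = np.array([0.5, -1.0])
y = np.array([2.0, 3.0])
k_xy = (x @ y + 1.0) ** 2               # kernel evaluated in input space
f_xy = feature_map(x) @ feature_map(y)  # dot product in feature space F
assert np.isclose(k_xy, f_xy)           # Eq. (7): k(x, y) = (Phi(x), Phi(y))
```

The same identity is what lets the algorithms below work entirely with kernel evaluations, never forming $\Phi(x)$ explicitly.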
By virtue of this property, we shall call $\Phi$ a feature map associated with $k$. Any linear algorithm which can be carried out in terms of dot products can be made nonlinear by substituting an a priori chosen kernel. Examples of such algorithms include the potential function method [12], the SVM [7][8] and kernel PCA [9]. The price that one has to pay for this elegance, however, is that the solutions are only obtained as expansions in terms of the input patterns mapped into feature space. For instance, the normal vector of an SV hyperplane is expanded in terms of Support Vectors, just as the kernel PCA feature extractors are expressed in terms of training examples:

$$\Psi = \sum_{i=1}^{l} \alpha_i \Phi(x_i). \quad (8)$$

3.2 AMSOM in the Feature Space

The AMSOM in the high-dimensional feature space $F$ is considered. In the proposed method, the varieties defined by a computational unit $i$ in the competitive layer take the nonlinear form $M_i$
$= \{\Phi(x) \mid \Phi(x) = \Phi(\mu^{(i)}) + \sum_{h=1}^{H} \xi_h \Phi(b_h^{(i)})\}, \quad (9)$

where $\xi_h \in \mathbb{R}$. Given a training data set $\{x_1, \cdots, x_N\}$, the mean vector and the basis vectors in a unit $i$ are represented in the following form:

$$\Phi(\mu^{(i)}) = \sum_{l=1}^{N} \alpha_l^{(i)} \Phi(x_l), \quad (10)$$

$$\Phi(b_h^{(i)}) = \sum_{l=1}^{N} \beta_{hl}^{(i)} \Phi(x_l), \quad (11)$$

respectively. The coefficients $\alpha_l^{(i)}$ in Eq. (10) and $\beta_{hl}^{(i)}$ in Eq. (11) are adjusted by learning. The derivation of the training procedure in the proposed method is given as follows:

(a) Winner lookup. The norm of the orthogonal projection error onto the $i$-th affine subspace with respect to the present input $x_p$ is calculated as follows:

$$\left\| \Phi(\tilde{x}_p)^{(i)} \right\|^2 = k(x_p, x_p) + \sum_{h=1}^{H} P_h^{(i)2} + \sum_{l_1=1}^{N} \sum_{l_2=1}^{N} \alpha_{l_1}^{(i)} \alpha_{l_2}^{(i)} k(x_{l_1}, x_{l_2}) - 2 \sum_{l=1}^{N} \alpha_l^{(i)} k(x_p, x_l) + 2 \sum_{h=1}^{H} \sum_{l_1=1}^{N} \sum_{l_2=1}^{N} P_h^{(i)} \alpha_{l_1}^{(i)} \beta_{hl_2}^{(i)} k(x_{l_1}, x_{l_2}) - 2 \sum_{h=1}^{H} \sum_{l=1}^{N} P_h^{(i)} \beta_{hl}^{(i)} k(x_p, x_l), \quad (12)$$
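Eq. (12), together with the projection components $P_h^{(i)}$ defined in Eq. (13) below, depends on the data only through kernel evaluations. A minimal NumPy sketch of this computation follows; the naming is ours, with `K` the Gram matrix over the training set and `kp` the kernel values between the present input and the training set.

```python
import numpy as np

def proj_components(kp, K, alpha, beta):
    """P_h^(i) of Eq. (13). beta is (H, N), alpha is (N,),
    K[l1, l2] = k(x_l1, x_l2), kp[l] = k(x_p, x_l).
    Uses alpha^T K beta_h = beta_h^T K alpha (K is symmetric)."""
    return beta @ kp - beta @ K @ alpha

def kernel_proj_error(kpp, kp, K, alpha, beta):
    """Squared projection error ||Phi(x_tilde_p)^(i)||^2 of Eq. (12),
    where kpp = k(x_p, x_p)."""
    P = proj_components(kp, K, alpha, beta)
    return (kpp + P @ P + alpha @ K @ alpha - 2.0 * alpha @ kp
            + 2.0 * P @ (beta @ K @ alpha) - 2.0 * P @ (beta @ kp))
```

With the linear kernel $k(x, y) = x^T y$ this reduces to the input-space projection error of Eq. (2), which gives a quick sanity check of the expansion.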
where $P_h^{(i)}$ denotes the orthogonal projection component of the present input $x_p$ onto the basis $\Phi(b_h^{(i)})$; it is calculated by

$$P_h^{(i)} = \sum_{l=1}^{N} \beta_{hl}^{(i)} k(x_p, x_l) - \sum_{l_1=1}^{N} \sum_{l_2=1}^{N} \alpha_{l_1}^{(i)} \beta_{hl_2}^{(i)} k(x_{l_1}, x_{l_2}). \quad (13)$$

The reproducing kernels generally used are as follows:

$$k(x_s, x_t) = (x_s^T x_t)^d, \quad d \in \mathbb{N}, \quad (14)$$

$$k(x_s, x_t) = (x_s^T x_t + 1)^d, \quad d \in \mathbb{N}, \quad (15)$$

$$k(x_s, x_t) = \exp\left( -\frac{\|x_s - x_t\|^2}{2\sigma^2} \right), \quad \sigma \in \mathbb{R}, \quad (16)$$

where $\mathbb{N}$ and $\mathbb{R}$ are the set of natural numbers and the set of reals, respectively. Eqs. (14), (15) and (16) are referred to as homogeneous polynomial kernels, non-homogeneous polynomial kernels and Gaussian kernels, respectively. The winner for an episode input $\omega_q =$
$\{\Phi(x_1), \cdots, \Phi(x_E)\}$ is decided in the same manner as in the AMSOM, as follows:

$$c = \arg\min_i \sum_{e=1}^{E} \left\| \Phi(\tilde{x}_e)^{(i)} \right\|^2, \quad i \in \{1, \cdots, M\}. \quad (17)$$

(b) Learning. The learning rules for $\alpha_l^{(i)}$ and $\beta_{hl}^{(i)}$ are as follows:

$$\Delta\alpha_l^{(i)} = \begin{cases} -\alpha_l^{(i)}(t)\, \lambda_m(t)\, h_{ci}(t) & \text{for } l \neq e \\ -\alpha_l^{(i)}(t)\, \lambda_m(t)\, h_{ci}(t) + \lambda_m(t)\, h_{ci}(t) & \text{for } l = e \end{cases}, \quad (18)$$

$$\Delta\beta_{hl}^{(i)} = \begin{cases} -\alpha_l^{(i)}(t+1)\, \lambda_b(t)\, h_{ci}(t)\, T_h^{(i)}(t) & \text{for } l \neq e \\ -\alpha_l^{(i)}(t+1)\, \lambda_b(t)\, h_{ci}(t)\, T_h^{(i)}(t) + \lambda_b(t)\, h_{ci}(t)\, T_h^{(i)}(t) & \text{for } l = e \end{cases}, \quad (19)$$

where
Φ(φ (i)
e (t))
T Φ(b
(i) h (t)) ||Φ( ˆ φ (i) e (t))
||||Φ(φ (i)
e (t))
|| , (20) Φ( ˆ φ (i) e (t))) =
H h=1
N l=1
β (i)
hl k(x
e , x
l ) − N l 1 =1 N l 2 =1 α l 1 β kl 2 k(x l 1 , x l 2 ) 2 1 2 , (21)
||Φ(φ (i)
e )(t)
||= k(x e , x e ) −2 N l=1
α (i)
l k(x
e , x
l ) +
N l 1 =1 N l 2 =1 α (i) l 1 α (i)
l 2 k(x l 1 , x l 2 ) 1 2 , (22) 514 H. Kawano, H. Maeda, and N. Ikoma Φ(φ (i)
e (t))
T Φ(b
(i) h (t)) = N l=1
β hl k(x e , x
l ) − N l 1 =1 N l 2 =1 α (i) l 1 β (i)
hl 2 k(x l 1 , x l 2 ). (23) In Eqs.(18) and (19), λ m (t), λ b (t) and h ci (t) are the same parameters as mentioned in the AMSOM training process. After the learning phase, a categorization phase to determine the class association of each unit. The procedure of the categorization is the same manner as mentioned in previous section. 4 Experimental Results Data Distribution 1. To analyze the effect of reducing the dimensionality for the intersections of subspaces by the proposed method, the data as shown in Fig.4(a) is used. For this data, although a set of each class can be ap- proximated by 1-dimensional linear manifold in the input space R 2 , the
intersections of the subspaces could occur between class 1 and class 2, and between class 2 and class 3, even if the optimal linear manifold for each class is obtained. However, a linear manifold in the high-dimensional feature space, that is, a nonlinear manifold in the input space, can be expected to classify the given data through the dimensionality-reduction effect on the intersections of the subspaces.

As the result of the simulation, the given input data are classified correctly by the proposed method. Figures 4(b) and (c) show the decision regions constructed by the AMSOM and the proposed method, respectively. In this experiment, 3 units were used in the competitive layer. The class associations are class 1 to unit 1, class 2 to unit 2, and class 3 to unit 3, respectively. In this simulation, the experimental parameters are assigned as follows: the total number of epochs $t_{max} = 100$, $\lambda_m^{ini} = 0.1$, $\lambda_m^{fin} = 0.01$, $\lambda_b^{ini} = 0.1$, $\lambda_b^{fin} = 0.01$, $\gamma^{ini} = 3$, $\gamma^{fin} = 0.03$, common to both the AMSOM and the proposed method; $H = 1$ in the AMSOM; and $H = 2$, $k(x, y) = (x^T y + 1)^3$ in the proposed method. From the experiment, it was shown that the proposed method has the ability to reduce the dimensionality of the intersections of subspaces.

Data Distribution 2. To verify that the proposed method has the ability to describe nonlinear manifolds, the data shown in Fig. 5(a) are used. In this case it is impossible to describe each class by a linear manifold, that is, by a 1-dimensional affine subspace.

As the result of the simulation, the given input data are classified correctly by the proposed method. Figures 5(b) and (c) show the decision regions constructed by the AMSOM and the proposed method, respectively. In this experiment, 3 units were used in the competitive layer. The class associations are class 1 to unit 3, class 2 to unit 2, and class 3 to unit 1, respectively. In this simulation, the experimental parameters are assigned as follows: the
Fig. 4. (a) Training data used in the first experiment. (b) Decision regions learned by the AMSOM and (c) by the proposed method.
Fig. 5. (a) Training data used in the second experiment. (b) Decision regions learned by the AMSOM and (c) by the proposed method.

total number of epochs $t_{max} = 100$, $\lambda_m^{ini} = 0.1$, $\lambda_m^{fin} = 0.01$, $\lambda_b^{ini} = 0.1$, $\lambda_b^{fin} = 0.01$, $\gamma^{ini} = 3$, $\gamma^{fin} = 0.03$, common to both the AMSOM and the proposed method; $H = 1$ in the AMSOM; and $H = 2$, $k(x, y) = (x^T y)^2$ in the proposed method. From the experiment, it was shown that the proposed method has the ability to extract suitable nonlinear manifolds efficiently.

5 Conclusions

A clustering method with a map of nonlinear varieties was proposed as a new pattern classification method. The proposed method is easily extended from the AMSOM to a nonlinear method by applying the kernel method. The effectiveness of the proposed method was verified by the experiments. The proposed algorithm promises applications of the ASSOM in a wide area of practical problems.

References

1. Oja, E.: Subspace Methods of Pattern Recognition. Research Studies Press (1983)
2. Kohonen, T.: Emergence of Invariant-Feature Detectors in the Adaptive-Subspace Self-Organizing Map. Biol. Cybern. 75, 281–291 (1996)
3. Kohonen, T., Kaski, S., Lappalainen, H.: Self-Organized Formation of Various Invariant-Feature Filters in the Adaptive-Subspace SOM. Neural Comput. 9, 1321–1344 (1997)
4. Kohonen, T.: Self-Organizing Maps. Springer, Berlin, Heidelberg, New York (1995)
5. Liu, Z.Q.: Adaptive Subspace Self-Organizing Map and Its Application in Face Recognition. International Journal of Image and Graphics 2(4), 519–540 (2002)
6. Saitoh, S.: Theory of Reproducing Kernels and its Applications. Longman Scientific & Technical, Harlow, England (1988)
7. Cortes, C., Vapnik, V.: Support Vector Networks. Machine Learning 20, 273–297 (1995)
8. Vapnik, V.: The Nature of Statistical Learning Theory, 2nd edn. Springer, New York, Berlin, Heidelberg (1995)
9. Schölkopf, B., Smola, A.J., Müller, K.R.: Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Technical Report 44, Max-Planck-Institut für biologische Kybernetik (1996)
10. Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Müller, K.R.: Fisher Discriminant Analysis with Kernels. Neural Networks for Signal Processing IX, 41–48 (1999)
11. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A Training Algorithm for Optimal Margin Classifiers. In: Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144–152 (1992)
12. Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical Foundations of the Potential Function Method in Pattern Recognition Learning. Automation and Remote Control 25, 821–837 (1964)
13. Schölkopf, B., Smola, A.J.: Learning with Kernels. The MIT Press, Cambridge (2002)