Lecture Notes in Computer Science

bet	78/88
Sana	16.12.2017
Hajmi	12.42 Mb.
	#22381

1 ... 74 75 76 77 78 79 80 81 ... 88

(n = 1, 2, 3 ...) Common kernel K (
Pattern Discovery for High-Dimensional Binary Datasets

t)

c(t)

image

Feature / Kernel

Class map

HOMS feature

φ

Power spectrum ,

Bispectrum,

Trispectrum ...

HOM feature

ψ

Autocorrelations

(n = 1, 2, 3 ...)

Common kernel

K(s

,

v)

> = <

Fig. 1. The LHOM(S) kernel uniﬁes the treatment of higher-order moment features

and higher-order moment spectra features of a signal

This corollary shows that LHOMS and LHOM kernels are proportional, which is

clear from the fact that LHOMS is a Fourier transform of LHOM as mentioned

in Eq. (4). Therefore, this kernel function introduces a uniﬁed view of the kernel-

based pattern recognition using moments and moment spectra, as shown in

Fig. 1.

Texture Classiﬁcation Experiments

5.1

Experimental Conditions

The SVM with the LHOM(S) kernel was tested in two-class texture classiﬁcation

problems. Image luminance was preprocessed to be scaled to the range of [0, 1].

The training sets were obtained from random positions of the original texture

images. All the experiments used 20 training examples per class, each sized

× 32 pixels. The kernel types used in the experiments were the following.

1. Polynomial kernel of Eq. (9), for (d = 1, . . . , 5).

2. RBF kernel kernel of Eq. (10), for (γ = 1, 10, 100).

3. LHOM kernel in their discrete versions [6] with isotropic Gaussian windows.

Order n and window width σ were varied as, (n = 2, 3, 4, 5) and (σ = 4, 8, 12).

When testing the SVM after training, images with tiled texture patterns were

used as shown in Figs. 2(c) and 3(c). The classiﬁcation rates were calculated for

the center regions of the test images allowing edge margins of 16 pixel width.

The output class maps will be presented as the linear output u in Eq. (8). No

post-smoothing of the class maps has been used.

5.2

Sinusoidal Wave with Harmonic Components

Two texture classes generated by a computer according to two formulas,

A

(x) = sin(ω

x) + sin(2ω

x) + sin(3ω

and

(x) = sin(ω

x) + sin(2ω

x + φ) + sin(3ω

x).

Comparison of LHOM Kernel and Conventional Kernels in SVM

857

(a)

(b)

(c)

Fig. 2. Computer generated textures including fundamental, second and third har-

monic frequency components. (a) Class 1, (b) Class 2, and (c) tiled collage for testing.

(a)

(b)

(c)

Fig. 3. Natural texture images selected from the Vision Texture collection. (a) Fabric17

texture, (b) Fabric18 texture and (c) tiled collage for testing.

Here, the fundamental frequency vector was set to ω

= [π/6, π/6]

. Phase shift

φ = π/2 in the second harmonic component of s

is the only diﬀerence in the

two textures. Although the two classes cannot be discriminated using the local

power spectrum, they should be easily classiﬁed using the phase information,

extracted by the features of higher orders.

The classiﬁcation results in Table 1 show that the training was successful for

RBF and LHOM kernels (n = 3, 4, 5) only as expected. The higher test rates for

the LHOMS kernels is partly due to the clear discrimination of the classes within

the feature space, indicated by the small number of support vectors without using

soft margins.

5.3

Fabric Texture

The textures in Fig. 3 were selected from the fabric textures of the Vision Texture

collection [9]. The results are shown in Table 2. Tendencies similar to the previous

experiment are observed showing the superiority of the LHOMS kernel. Fig. 5(a)

indicates that the unthresholded output of the SVM with RBF kernels have a

very small inter-class variance indicating the instability of the classiﬁer. Use of

soft margins resulted in a small improvement for the SVMs with LHOM kernels.

858

K. Kameyama

Table 1. Trainability, number of support vectors and the test rate of the SVMs using

various kernels for the computer-generated texture

84 %

8 (20%)

Yes

LHOM (n = 5)

82 %

3 (7.5%)

Yes

LHOM (n = 4)

83 %

2 (5%)

Yes

LHOM (n = 3)

-

No

LHOM (n = 2)

80 %

39 (97.5%)

Yes

RBF (

=10)

(

= 8)

(

= 8)

(

= 8)

(

= 8)

Polynomial

(d = 1, .., 5)

Test rate

SV number

(ratio to TS)

Trained ?

Kernel Type

(a)

(c)

(b)

(d)

Fig. 4. SVM linear outputs and classiﬁcation rates for the test image of the computer-

generated texture using various types of kernels. (a) RBF(80%), (b) LHOM (n = 3)

(83%), (c) LHOM (n = 4)(82%) and (d) LHOM (n = 5)(84%).

Comparison of LHOM Kernel and Conventional Kernels in SVM

859

Table 2. Trainability, number of support vectors and the test rate of the SVMs using

various kernels for the natural texture

67 % / 69 %

37 / 40

Yes

LHOM (n = 5)

75 % / 75 %

30 / 35

Yes

LHOM (n = 4)

73 % / 75 %

27 / 40

Yes

LHOM (n = 3)

69 % / 79 %

5 / 40

Yes

LHOM (n = 2)

65 % / 65 %

40 / 40

Yes

50 % / 50 %

40 / 40

Partly

(at d=1only)

Polynomial

(d = 1, .., 5)

Test rate

(No SM / Use SM)

SV number

(No SM / Use SM)

Trained ?

Kernel Type

RBF (

=10)

(

= 8)

(

= 8)

(

= 8)

(

= 8)

(a)

(c)

(b)

(d)

Fig. 5. SVM linear outputs and classiﬁcation rates for the test image of the natural

texture using various types of kernels. (a) RBF(65%), (b) LHOM (n = 2)(69%), (c)

LHOM (n = 3)(73%) and (d) LHOM (n = 4)(75 %).

860

K. Kameyama

5.4

Discussion

The test using the tiled images evaluates the combination of the raw classiﬁcation

rate and the spatial resolution. Further investigation separating the two factors

are necessary. When non-tiled (pure) test images of the fabric texture were used,

SVMs with kernels of RBF (γ = 10), LHOMS (σ = 8) for n = 2, 3, 4 and 5

achieved 90%, 95%, 90%, 95% and 80%, respectively. In [1], classiﬁers using

LHOM features of orders up to 3 is reported to have achieved test rates over

96% for 30 classes of natural textures. Although this work does not give a direct

comparison, the results supports the suitability of using LHOM(S) in natural

texture classiﬁcation,

Conclusion

Texture classiﬁcation using SVMs with LHOM kernel and other conventional

kernel functions were compared. It became clear that the SVMs with LHOM

kernels achieve better trainability and give stable response to the texture classes

when compared with those with conventional kernels. Also, the number of sup-

port vectors were lower which indicates a better class separability in the feature

space.

References

1. Kurita, T., Otsu, N.: Texture classiﬁcation by higher order local autocorrelation

features. In: Proc. of Asian Conf. on Computer Vision (ACCV 1993), pp. 175–178

(1993)

2. Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20(3), 273–297

(1995)

3. Vapnik, V.N.: The Nature of Statistical Learning Theory, 2nd edn. Springer, Hei-

delberg (2000)

4. Popovici, V., Thiran, J.P.: Higher order autocorrelations for pattern classiﬁcation.

In: Proceedings of International Conference on Image Processing 2001, pp. 724–727

(2001)

5. Popovici, V., Thiran, J.P.: Pattern recognition using higher-order local autocorrela-

tion coeﬃcients. In: Neural Networks for Signal Processing XII (NNSP), pp. 229–238

(2002)

6. Kameyama, K., Taga, K.: Texture classiﬁcation by support vector machines with

kernels for higher-order gabor ﬁltering. In: Proceedings of International Joint Con-

ference on Neural Networks 2004, vol. 4, pp. 3009–3014 (2004)

7. MacLaughlin, J.A., Raviv, J.: N-th autocorrelations in pattern recognition. Infor-

mation and Control 12(2), 121–142 (1968)

8. Umegaki, H.: Basic Information Mathematics - Development via Functional Analysis

- (in Japanese). Saiensu-sha (1993)

9. MIT Vision and Modeling Group: Vision texture (1995)

Pattern Discovery for High-Dimensional

Binary Datasets

Václav Snášel

, Pavel Moravec

, Dušan Húsek

, Alexander Frolov

Hana ˇ

Rezanková

, and Pavel Polyakov

Department of Computer Science, FEECS, VŠB – Technical University of Ostrava,

17. listopadu 15, 708 33 Ostrava-Poruba, Czech Republic

{pavel.moravec,vaclav.snasel}@vsb.cz

Institute of Computer Science, Dept. of Nonlinear Systems,

Academy of Sciences of the Czech Republic, Pod Vodárenskou vˇeží 2,

182 07 Prague, Czech Republic

dusan@cs.cas.cz

Institute of Higher Nervous Activity and Neurophysiology, Russian Academy of Sciences,

Butlerova 5a, 117 485 Moscow, Russia

aafrolov@mail.ru

Department of Statistics and Probability, University of Economics, Prague,

W. Churchill sq. 4, 130 67 Prague, Czech Republic

rezanka@vse.cz

Institute of Optical Neural Technologies, Russian Academy of Sciences,

Vavilova 44, 119 333 Moscow, Russia

pavel.mipt@mail.ru

Abstract. In this paper we compare the performance of several dimension reduc-

tion techniques which are used as a tool for feature extraction. The tested methods

include singular value decomposition, semi-discrete decomposition, non-negative

matrix factorization, novel neural network based algorithm for Boolean factor

analysis and two cluster analysis methods as well. So called bars problem is

used as the benchmark. Set of artiﬁcial signals generated as a Boolean sum of

given number of bars is analyzed by these methods. Resulting images show that

Boolean factor analysis is upmost suitable method for this kind of data.

1

Introduction

In order to perform object recognition (no matter which one) it is necessary to learn rep-

resentations of the underlying characteristic components. Such components correspond

to object-parts, or features. These data sets may comprise discrete attributes, such as

those from market basket analysis, information retrieval, and bioinformatics, as well

as continuous attributes such as those in scientiﬁc simulations, astrophysical measure-

ments, and sensor networks.

Feature extraction from high-dimensional data typically consists in correlation

analysis, clustering including ﬁnding efﬁcient representations for clustered data,

data classiﬁcation, and event association. The objective is discovering meaningful

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 861–872, 2008.

c Springer-Verlag Berlin Heidelberg 2008

862

V. Snášel et al.

information in data to be able to represent them appropriately and if possible in a space

of lower dimension.

The feature extraction if applied on binary datasets, addresses many research and

application ﬁelds, such as association rule mining [1], market basket analysis [2], dis-

covery of regulation patterns in DNA microarray experiments [3], etc. Many of these

problem areas have been described in tests of PROXIMUS framework (e.g. [4]).

The feature extraction methods can use different aspects of images as the features.

Such methods are either using a heuristics based on the known properties of the image

collection, or are fully automatic and may use the original image vectors as an input.

Here we will concentrate on the case of black and white pictures of bars combinations

represented as binary vectors, so the complex feature extraction methods are unneces-

sary. Values of the entries of this vector represent individual pixels i.e. 0 for white and

1 for black.

There are many attempts that could be used for this reason. This article reports

on an empirical investigation of the performance of some of them. For the sake of

simplicity we use the well-known bars problem (see e.g. [5]) where we try to iso-

late separate horizontal and vertical bars from images containing their combinations.

Here we will concentrate on the category which use dimension reduction techniques

for automatic feature extraction. We will compare results of the most up to date

procedures.

One of such methods is the singular value decomposition which was already many

times successfully used for automatic feature extraction. In case of bars collection (such

as our test data), the base vectors can be interpreted as images, describing some com-

mon characteristics of several input signals.

However singular value decomposition is not suitable for huge collections and is

computationally expensive, so other methods of dimension reduction were proposed.

Here we use the semi-discrete decomposition.

Because the data matrix does have all elements non-negative, we tried to apply a new

method, called non-negative matrix factorization as well.

The question, how does the brain form a useful representation of its environment,

was behind the development of neural network based methods for dimension reduction,

neural network based Boolean factor analysis [6,7] and optimal sparse coding network

developed by Földiák [5].

Here we applied the neural network based Boolean factor analysis developed by us

only, because we have not the optimal sparse coding network algorithm for our disposal

at this moment.

However for the ﬁrst view on image structures, we can apply traditional statistical

methods, mainly different algorithms for statistical cluster analysis.

Analysis of discrete data sets, however, generally leads to NP-complete/hard prob-

lems, especially when physically interpretable results in discrete spaces are desired.

The rest of this paper is organized as follows. The second section explains the di-

mension reduction method used in this study. Then in the section three we describe

experimental results, and ﬁnally in the section four we made some conclusions.

Pattern Discovery for High-Dimensional Binary Datasets

863

2

Dimension Reduction

We used four promising methods of dimension reduction for our comparison – Singular

Value Decomposition (SVD), Semi-Discrete Decomposition (SDD), Non-negative

Matrix Factorization (NMF) and Neural network Boolean Factor Analysis (NBFA). For

the ﬁrst analysis we used two statistical clustering methods, Hierarchical

Agglomerative Algorithm (HAA) and Two-Step Cluster Analysis (TSCA). All methods

are brieﬂy described bellow.

2.1

Singular Value Decomposition

The SVD [8] is an algebraic extension of classical vector model. It is similar to the PCA

method, which was originally used for the generation of eigenfaces in image retrieval.

Informally, SVD discovers signiﬁcant properties and represents the images as linear

combinations of the base vectors. Moreover, the base vectors are ordered according

to their signiﬁcance for the reconstructed image, which allows us to consider only the

ﬁrst k base vectors as important (the remaining ones are interpreted as "noise" and

discarded). Furthermore, SVD is often referred to as more successful in recall when

compared to querying whole image vectors [8].

Formally, we decompose the matrix of images A by SVD, calculating singular values

and singular vectors of A.

We have matrix A, which is an n

× m rank-r matrix (where m ≥ n without loss

of generality) and values σ

, . . . , σ

are calculated from eigenvalues of matrix AA

as σ

√

. Based on them, we can calculate column-orthonormal matrices U =

, . . . , u

)

and V = (v

, . . . , v

)

, where U

U = I

a V

V = I

, and a diagonal

matrix Σ = diag(σ

, . . . , σ

)

, where σ

> 0

for i

≤ r, σ

≥ σ

i+1

and σ

r+1

= . . . =

n

= 0

The decomposition

A = U ΣV

(1)

is called singular decomposition of matrix A and the numbers σ

, . . . , σ

are singular

values of the matrix A. Columns of U (or V ) are called left (or right) singular vectors

of matrix A.

Now we have a decomposition of the original matrix of images A. We get r nonzero

singular numbers, where r is the rank of the original matrix A. Because the singular values

usually fall quickly, we can take only k greatest singular values with the corresponding

singular vector coordinates and create a k-reduced singular decomposition of A.

Let us have k (0 < k < r) and singular value decomposition of A

A = U ΣV

≈ A

= (U

)

0 Σ

(2)

We call A

= U

a k-reduced singular value decomposition (rank-k SVD).

Instead of the A

matrix, a matrix of image vectors in reduced space D

= Σ

used in SVD as the representation of image collection. The image vectors (columns in

) are now represented as points in k-dimensional space (the feature-space). represent

the matrices U

, Σ

, V

864

V. Snášel et al.

Fig. 1. Rank-k SVD

Rank-k SVD is the best rank-k approximation of the original matrix A. This means

that any other decomposition will increase the approximation error, calculated as a sum

of squares (Frobenius norm) of error matrix B = A

−A

k

. However, it does not implicate

that we could not obtain better result with a different approximation.

2.2

Semi-discrete Decomposition

The SDD is one of other LSI methods, proposed recently for text retrieval in [9]. As

mentioned earlier, the rank-k SVD method (called truncated SVD by authors of semi-

discrete decomposition) produces dense matrices U and V , so the resulting required

storage may be even larger than the one needed by the original term-by-document

matrix A.

To improve the required storage size and query time, the semi-discrete decomposi-

tion was deﬁned as

≈ A

= X

(3)

where each coordinate of X

and Y

is constrained to have entries from the set

ϕ =

{−1, 0, 1}, and the matrix D

is a diagonal matrix with positive coordinates.

The SDD does not reproduce A exactly, even if k = n, but it uses very little storage

with respect to the observed accuracy of the approximation. A rank-k SDD (although

from mathematical standpoint it is a sum on rank-1 matrices) requires the storage of

k(m + n)

values from the set

{−1, 0, 1} and k scalars. The scalars need to be only

single precision because the algorithm is self-correcting. The SDD approximation is

formed iteratively.

The optimal choice of the triplets (x

, d

, y

)

for given k can be determined using

greedy algorithm, based on the residual R

= A

− A

−1

(where A

is a zero matrix).

2.3

Non-negative Matrix Factorization

The NMF [10] method calculates an approximation of the matrix A as a product of two

matrices, W and H. The matrices are usually pre-ﬁlled with random values (or H is

initialized to zero and W is randomly generated). During the calculation the values in

and H stay positive. The approximation of matrix A, matrix A

, can be calculated

as A

k

= W H

Pattern Discovery for High-Dimensional Binary Datasets

865

The original NMF method tries to minimize the Frobenius norm of the difference be-

tween A and A

using min

W,H

||V − W H||

F

as the criterion in the minimization problem.

Recently, a new method was proposed in [11], where the constrained least squares

problem min

j

{||V

− W H

− λ||H

} is the criterion in the minimization problem.

This approach is yields better results for sparse matrices.

Unlike in SVD, the base vectors are not ordered from the most general one and we

have to calculate the decomposition for each value of k separately.

Download 12.42 Mb.

Do'stlaringiz bilan baham:

1 ... 74 75 76 77 78 79 80 81 ... 88