Lecture Notes in Computer Science
Fig. 1. The LHOM(S) kernel unifies the treatment of higher-order moment features and higher-order moment spectra features of a signal. [Diagram: an image c(t) is mapped both to the HOMS features φ (bispectrum, trispectrum, ...) and to the HOM features ψ (n = 1, 2, 3, ...), which feed the common kernel K(s, v) = <φ_s, φ_v> = <ψ_s, ψ_v> and produce the class map.]

This corollary shows that the LHOMS and LHOM kernels are proportional, which is clear from the fact that the LHOMS is a Fourier transform of the LHOM, as mentioned in Eq. (4). This kernel function therefore introduces a unified view of kernel-based pattern recognition using moments and moment spectra, as shown in Fig. 1.
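The proportionality claim rests on the fact that an inner product is preserved, up to a constant, under the discrete Fourier transform (Parseval's relation). A minimal numerical check of this fact, with random vectors standing in for the LHOM feature ψ and its transform φ (the vectors are illustrative stand-ins, not actual LHOM features):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
psi_s = rng.standard_normal(n)  # stand-in for the LHOM feature of signal s
psi_v = rng.standard_normal(n)  # stand-in for the LHOM feature of signal v

# The Fourier transforms play the role of the LHOMS features.
phi_s = np.fft.fft(psi_s)
phi_v = np.fft.fft(psi_v)

k_hom = np.dot(psi_s, psi_v)             # <psi_s, psi_v>
k_homs = np.vdot(phi_s, phi_v).real / n  # <phi_s, phi_v>, rescaled by the DFT factor

# Parseval: the two kernel values agree up to the constant absorbed by 1/n.
assert np.isclose(k_hom, k_homs)
```

This mirrors the corollary: evaluating the kernel on moment features or on moment-spectra features gives proportional values, which is why both views can be unified as in Fig. 1.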
Comparison of LHOM Kernel and Conventional Kernels in SVM

5 Texture Classification Experiments

5.1 Experimental Conditions

The SVM with the LHOM(S) kernel was tested on two-class texture classification problems. Image luminance was preprocessed by scaling it to the range [0, 1]. The training sets were obtained from random positions of the original texture images. All the experiments used 20 training examples per class, each sized 32 × 32 pixels. The kernel types used in the experiments were the following.

1. Polynomial kernel of Eq. (9), for d = 1, . . . , 5.
2. RBF kernel of Eq. (10), for γ = 1, 10, 100.
3. LHOM kernels in their discrete versions [6] with isotropic Gaussian windows. The order n and window width σ were varied as n = 2, 3, 4, 5 and σ = 4, 8, 12.

When testing the SVM after training, images with tiled texture patterns were used, as shown in Figs. 2(c) and 3(c). The classification rates were calculated for the center regions of the test images, allowing edge margins 16 pixels wide. The output class maps are presented as the linear output u in Eq. (8). No post-smoothing of the class maps was used.

5.2 Sinusoidal Wave with Harmonic Components

Two texture classes were generated by computer according to the two formulas

s_A(x) = sin(ω_0^T x) + sin(2ω_0^T x) + sin(3ω_0^T x),
s_B(x) = sin(ω_0^T x) + sin(2ω_0^T x + φ) + sin(3ω_0^T x).
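A sketch of how these two classes can be synthesized (NumPy; the 32 × 32 patch size, the [0, 1] luminance scaling and ω_0 = [π/6, π/6]^T follow the text, while the pixel sampling grid is an assumption):

```python
import numpy as np

def texture(phase2, size=32):
    """Sum of the first three harmonics of w0 = [pi/6, pi/6]^T;
    phase2 shifts only the second harmonic."""
    w0 = np.array([np.pi / 6, np.pi / 6])
    yy, xx = np.mgrid[0:size, 0:size]
    t = w0[0] * xx + w0[1] * yy  # w0^T x over the pixel grid
    s = np.sin(t) + np.sin(2 * t + phase2) + np.sin(3 * t)
    # scale luminance to [0, 1], as in the preprocessing step
    return (s - s.min()) / (s.max() - s.min())

s_A = texture(0.0)        # class 1
s_B = texture(np.pi / 2)  # class 2: phase-shifted second harmonic
```

The two patches differ only in the phase of the second harmonic, which is exactly the information that a power-spectrum (second-order) feature discards.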
Fig. 2. Computer-generated textures including fundamental, second and third harmonic frequency components. (a) Class 1, (b) Class 2, and (c) tiled collage for testing.
Fig. 3. Natural texture images selected from the Vision Texture collection. (a) Fabric17 texture, (b) Fabric18 texture, and (c) tiled collage for testing.

Here, the fundamental frequency vector was set to ω_0 = [π/6, π/6]^T. The phase shift φ = π/2 in the second harmonic component of s_B is the only difference between the two textures. Although the two classes cannot be discriminated using the local power spectrum, they should be easily classified using the phase information extracted by the higher-order features. The classification results in Table 1 show that, as expected, training was successful only for the RBF and LHOM (n = 3, 4, 5) kernels. The higher test rates for the LHOMS kernels are partly due to the clear discrimination of the classes within the feature space, indicated by the small number of support vectors without using soft margins.

5.3 Fabric Texture

The textures in Fig. 3 were selected from the fabric textures of the Vision Texture collection [9]. The results are shown in Table 2. Tendencies similar to the previous experiment are observed, showing the superiority of the LHOMS kernel. Fig. 5(a) indicates that the unthresholded outputs of the SVM with the RBF kernel have a very small inter-class variance, indicating the instability of the classifier. Use of soft margins resulted in a small improvement for the SVMs with LHOM kernels.

K. Kameyama

Table 1. Trainability, number of support vectors and the test rate of the SVMs using various kernels for the computer-generated texture

Kernel Type                Trained?   Test rate   SV number (ratio to TS)
Polynomial (d = 1, .., 5)  No         -           -
RBF (γ = 10)               Yes        80 %        39 (97.5 %)
LHOM (n = 2) (σ = 8)       No         -           -
LHOM (n = 3) (σ = 8)       Yes        83 %        2 (5 %)
LHOM (n = 4) (σ = 8)       Yes        82 %        3 (7.5 %)
LHOM (n = 5) (σ = 8)       Yes        84 %        8 (20 %)

Fig. 4. SVM linear outputs and classification rates for the test image of the computer-generated texture using various types of kernels. (a) RBF (80 %), (b) LHOM (n = 3) (83 %), (c) LHOM (n = 4) (82 %) and (d) LHOM (n = 5) (84 %).
Table 2. Trainability, number of support vectors and the test rate of the SVMs using various kernels for the natural texture

Kernel Type                Trained?                Test rate (No SM / Use SM)   SV number (No SM / Use SM)
Polynomial (d = 1, .., 5)  Partly (at d = 1 only)  50 % / 50 %                  40 / 40
RBF (γ = 10)               Yes                     65 % / 65 %                  40 / 40
LHOM (n = 2) (σ = 8)       Yes                     69 % / 79 %                  5 / 40
LHOM (n = 3) (σ = 8)       Yes                     73 % / 75 %                  27 / 40
LHOM (n = 4) (σ = 8)       Yes                     75 % / 75 %                  30 / 35
LHOM (n = 5) (σ = 8)       Yes                     67 % / 69 %                  37 / 40

Fig. 5. SVM linear outputs and classification rates for the test image of the natural texture using various types of kernels. (a) RBF (65 %), (b) LHOM (n = 2) (69 %), (c) LHOM (n = 3) (73 %) and (d) LHOM (n = 4) (75 %).
5.4 Discussion

The test using the tiled images evaluates the combination of the raw classification rate and the spatial resolution. Further investigation separating the two factors is necessary. When non-tiled (pure) test images of the fabric texture were used, SVMs with the RBF (γ = 10) kernel and the LHOMS (σ = 8) kernels for n = 2, 3, 4 and 5 achieved 90 %, 95 %, 90 %, 95 % and 80 %, respectively. In [1], classifiers using LHOM features of orders up to 3 are reported to have achieved test rates over 96 % for 30 classes of natural textures. Although this is not a direct comparison, the results support the suitability of using LHOM(S) in natural texture classification.

6 Conclusion

Texture classification using SVMs with the LHOM kernel and other conventional kernel functions was compared. It became clear that SVMs with LHOM kernels achieve better trainability and give a more stable response to the texture classes than those with conventional kernels. Also, the number of support vectors was lower, which indicates a better class separability in the feature space.
References

1. Kurita, T., Otsu, N.: Texture classification by higher order local autocorrelation features. In: Proc. of Asian Conf. on Computer Vision (ACCV 1993), pp. 175–178 (1993)
2. Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20(3), 273–297 (1995)
3. Vapnik, V.N.: The Nature of Statistical Learning Theory, 2nd edn. Springer, Heidelberg (2000)
4. Popovici, V., Thiran, J.P.: Higher order autocorrelations for pattern classification. In: Proceedings of International Conference on Image Processing 2001, pp. 724–727 (2001)
5. Popovici, V., Thiran, J.P.: Pattern recognition using higher-order local autocorrelation coefficients. In: Neural Networks for Signal Processing XII (NNSP), pp. 229–238 (2002)
6. Kameyama, K., Taga, K.: Texture classification by support vector machines with kernels for higher-order Gabor filtering. In: Proceedings of International Joint Conference on Neural Networks 2004, vol. 4, pp. 3009–3014 (2004)
7. McLaughlin, J.A., Raviv, J.: Nth-order autocorrelations in pattern recognition. Information and Control 12(2), 121–142 (1968)
8. Umegaki, H.: Basic Information Mathematics: Development via Functional Analysis (in Japanese). Saiensu-sha (1993)
9. MIT Vision and Modeling Group: Vision texture (1995)

Pattern Discovery for High-Dimensional Binary Datasets

Václav Snášel 1, Pavel Moravec 1, Dušan Húsek 2, Alexander Frolov 3, Hana Řezanková 4, and Pavel Polyakov 5

1 Department of Computer Science, FEECS, VŠB – Technical University of Ostrava, 17. listopadu 15, 708 33 Ostrava-Poruba, Czech Republic
{pavel.moravec,vaclav.snasel}@vsb.cz
2 Institute of Computer Science, Dept. of Nonlinear Systems, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, 182 07 Prague, Czech Republic
dusan@cs.cas.cz
3 Institute of Higher Nervous Activity and Neurophysiology, Russian Academy of Sciences, Butlerova 5a, 117 485 Moscow, Russia
aafrolov@mail.ru
4 Department of Statistics and Probability, University of Economics, Prague, W. Churchill sq. 4, 130 67 Prague, Czech Republic
rezanka@vse.cz
5 Institute of Optical Neural Technologies, Russian Academy of Sciences, Vavilova 44, 119 333 Moscow, Russia
pavel.mipt@mail.ru

Abstract. In this paper we compare the performance of several dimension reduction techniques used as tools for feature extraction. The tested methods include singular value decomposition, semi-discrete decomposition, non-negative matrix factorization, a novel neural-network-based algorithm for Boolean factor analysis, and two cluster analysis methods as well. The so-called bars problem is used as the benchmark: a set of artificial signals generated as a Boolean sum of a given number of bars is analyzed by these methods.
The resulting images show that Boolean factor analysis is the most suitable method for this kind of data.

1 Introduction

In order to perform object recognition (no matter which kind) it is necessary to learn representations of the underlying characteristic components. Such components correspond to object parts, or features. These data sets may comprise discrete attributes, such as those from market basket analysis, information retrieval, and bioinformatics, as well as continuous attributes such as those in scientific simulations, astrophysical measurements, and sensor networks. Feature extraction from high-dimensional data typically consists of correlation analysis, clustering (including finding efficient representations for clustered data), data classification, and event association. The objective is discovering meaningful information in the data, so as to represent it appropriately and, if possible, in a space of lower dimension.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 861–872, 2008. © Springer-Verlag Berlin Heidelberg 2008

Feature extraction, when applied to binary datasets, addresses many research and application fields, such as association rule mining [1], market basket analysis [2], discovery of regulation patterns in DNA microarray experiments [3], etc. Many of these problem areas have been described in tests of the PROXIMUS framework (e.g. [4]).

The feature extraction methods can use different aspects of the images as features. Such methods either use a heuristic based on the known properties of the image collection, or are fully automatic and may use the original image vectors as input. Here we concentrate on the case of black-and-white pictures of bar combinations represented as binary vectors, so complex feature extraction methods are unnecessary. The values of the entries of such a vector represent individual pixels, i.e. 0 for white and 1 for black.
There are many approaches that could be used for this purpose. This article reports on an empirical investigation of the performance of some of them. For the sake of simplicity we use the well-known bars problem (see e.g. [5]), where we try to isolate separate horizontal and vertical bars from images containing their combinations. Here we concentrate on the category that uses dimension reduction techniques for automatic feature extraction, and we compare the results of the most up-to-date procedures.

One such method is the singular value decomposition, which has already been used successfully for automatic feature extraction many times. In the case of a bars collection (such as our test data), the base vectors can be interpreted as images describing some common characteristics of several input signals. However, singular value decomposition is not suitable for huge collections and is computationally expensive, so other methods of dimension reduction were proposed. Here we use the semi-discrete decomposition. Because all elements of the data matrix are non-negative, we tried to apply a newer method, called non-negative matrix factorization, as well.

The question of how the brain forms a useful representation of its environment was behind the development of neural-network-based methods for dimension reduction, developed by Földiák [5]. Here we applied only the neural network Boolean factor analysis developed by us, because we do not have the optimal sparse coding network algorithm at our disposal at this moment. For a first view of the image structures, however, we can apply traditional statistical methods, mainly different algorithms for statistical cluster analysis.

Analysis of discrete data sets, however, generally leads to NP-complete/hard problems, especially when physically interpretable results in discrete spaces are desired.

The rest of this paper is organized as follows. The second section explains the dimension reduction methods used in this study. In Section 3 we describe experimental results, and finally in Section 4 we draw some conclusions.
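The bars benchmark sketched above can be generated in a few lines (a sketch: the 8 × 8 image size, the per-bar probability and the dataset size are illustrative choices, not the settings of the experiments below):

```python
import numpy as np

rng = np.random.default_rng(42)

def bars_image(n=8, p=0.25):
    """One binary image: a Boolean sum (logical OR) of random bars."""
    img = np.zeros((n, n), dtype=bool)
    for i in range(n):
        if rng.random() < p:
            img[i, :] = True  # horizontal bar
        if rng.random() < p:
            img[:, i] = True  # vertical bar
    return img

# Dataset of binary vectors: one row per image, one column per pixel
# (1 for black, 0 for white, as described above).
data = np.stack([bars_image().ravel() for _ in range(200)]).astype(np.uint8)
```

Each row is then a high-dimensional binary observation whose hidden factors, the individual bars, the compared methods should recover.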
2 Dimension Reduction

We used four promising methods of dimension reduction for our comparison: Singular Value Decomposition (SVD), Semi-Discrete Decomposition (SDD), Non-negative Matrix Factorization (NMF) and Neural network Boolean Factor Analysis (NBFA). For a first analysis we used two statistical clustering methods, the Hierarchical Agglomerative Algorithm (HAA) and Two-Step Cluster Analysis (TSCA). All methods are briefly described below.

2.1 Singular Value Decomposition

The SVD [8] is an algebraic extension of the classical vector model. It is similar to the PCA method, which was originally used for the generation of eigenfaces in image retrieval. Informally, SVD discovers significant properties and represents the images as linear combinations of the base vectors. Moreover, the base vectors are ordered according to their significance for the reconstructed image, which allows us to consider only the first k base vectors as important (the remaining ones are interpreted as "noise" and discarded). Furthermore, SVD is often referred to as more successful in recall when compared to querying whole image vectors [8].

Formally, we decompose the matrix of images A by SVD, calculating the singular values and singular vectors of A. We have a matrix A, which is an n × m rank-r matrix (where m ≥ n without loss of generality), and the values σ_1, . . . , σ_r are calculated from the eigenvalues λ_i of the matrix AA^T as σ_i = √λ_i. Based on them, we can calculate the column-orthonormal matrices U = (u_1, . . . , u_n) and V = (v_1, . . . , v_n), where U^T U = I_n and V^T V = I_m, and a diagonal matrix Σ = diag(σ_1, . . . , σ_n), where σ_i > 0 for i ≤ r, σ_i ≥ σ_{i+1} and σ_{r+1} = . . . = σ_n = 0. The decomposition

A = U Σ V^T    (1)

is called the singular value decomposition of the matrix A, and the numbers σ_1, . . . , σ_r are the singular values of A. The columns of U (resp. V) are called the left (resp. right) singular vectors of A.

Now we have a decomposition of the original matrix of images A. We get r nonzero singular values, where r is the rank of the original matrix A. Because the singular values usually fall quickly, we can take only the k greatest singular values with the corresponding singular vector coordinates and create a k-reduced singular decomposition of A. Let us have k, 0 < k < r, and the singular value decomposition of A

A = U Σ V^T ≈ A_k = (U_k U_0) (Σ_k 0; 0 Σ_0) (V_k^T; V_0^T)    (2)

We call A_k = U_k Σ_k V_k^T the k-reduced singular value decomposition (rank-k SVD). Instead of the matrix A_k, the matrix of image vectors in the reduced space D_k = Σ_k V_k^T is used in SVD as the representation of the image collection. The image vectors (columns in D_k) are now represented as points in a k-dimensional space (the feature space).

Fig. 1. Rank-k SVD [diagram of the factorization into the matrices U_k, Σ_k, V_k^T]

Rank-k SVD is the best rank-k approximation of the original matrix A. This means that any other rank-k decomposition will increase the approximation error, calculated as the sum of squares (Frobenius norm) of the error matrix B = A − A_k; we could not obtain a better result with a different approximation.

2.2 Semi-discrete Decomposition

The SDD is another LSI method, proposed for text retrieval in [9]. As mentioned earlier, the rank-k SVD method (called truncated SVD by the authors of the semi-discrete decomposition) produces dense matrices U and V, so the resulting required storage may be even larger than that needed by the original term-by-document matrix A. To improve the required storage size and query time, the semi-discrete decomposition was defined as

A ≈ A_k = X_k D_k Y_k^T,    (3)

where each coordinate of the matrices X_k and Y_k is constrained to the set {−1, 0, 1}, and the matrix D_k is a diagonal matrix with positive coordinates.

The SDD does not reproduce A exactly, even if k = n, but it uses very little storage with respect to the observed accuracy of the approximation. A rank-k SDD (although from a mathematical standpoint it is a sum of rank-1 matrices) requires the storage of k(m + n) values from the set {−1, 0, 1} and k scalars. The scalars need to be only single precision because the algorithm is self-correcting. The SDD approximation is formed iteratively. The optimal choice of the triplets (x_i, d_i, y_i) for a given k can be determined using a greedy algorithm based on the residual R_k = A − A_{k−1} (where A_0 is a zero matrix).

2.3 Non-negative Matrix Factorization

The NMF [10] method calculates an approximation of the matrix A as a product of two matrices, W and H. The matrices are usually pre-filled with random values (or H is initialized to zero and W is randomly generated). During the calculation the values in W and H stay positive. The approximation of the matrix A, the matrix A_k, can then be calculated as A_k = W H.

The original NMF method tries to minimize the Frobenius norm of the difference between A and A_k, using

min_{W,H} ||A − W H||²_F.

Recently, a new method was proposed in [11], where the constrained least squares problem

min_{H_j} { ||A_j − W H_j||²₂ + λ||H_j||²₂ }

is the criterion in the minimization problem. This approach yields better results for sparse matrices. Unlike in SVD, the base vectors are not ordered from the most general one, and we have to calculate the decomposition for each value of k separately.
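The behavior of the decompositions above can be compared on a toy non-negative matrix (a sketch: rank-k SVD via NumPy, and NMF via the classic multiplicative updates rather than the constrained least squares of [11]; the matrix sizes, k and the iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((20, 12))  # non-negative data matrix
k = 4

# Rank-k SVD: keep only the k largest singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# NMF via multiplicative updates: W and H stay non-negative throughout.
W = rng.random((20, k))
H = rng.random((k, 12))
eps = 1e-9  # guard against division by zero
for _ in range(500):
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)
A_nmf = W @ H

err_svd = np.linalg.norm(A - A_svd)  # Frobenius norm of the error matrix B
err_nmf = np.linalg.norm(A - A_nmf)

# Rank-k SVD is the best rank-k approximation in the Frobenius norm,
# so a non-negative factorization of the same rank cannot beat it.
assert err_svd <= err_nmf + 1e-9
```

The price of the SVD's optimality is that its base vectors may contain negative entries, whereas the NMF factors remain directly interpretable as additive parts, which matters for binary data such as the bars images.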