Fig. 2. Progress of the (normalised) energy over iterations. The energy converges after 29 iterations. The algorithm requires eight consecutive iterations to detect the convergence and stop the segmentation process.

Fig. 3. Distribution (multi-dimensional colour histograms) inside (left) and outside (right) of the final level-set contour of the zebra test image, shown in the three-dimensional feature space spanned by the three colours red, green and blue. Larger and smaller blobs indicate larger and smaller histogram values, respectively. Only colours with a contribution greater than 1% are displayed.

constitutes the object to segment and the green and beige colouring of the surrounding steppe. Zebra images are common test images for texture-based segmentation algorithms. Here we show that even without a description of texture the segmentation task can be successfully accomplished. Figure 1 shows the image overlaid by the initial and final level-set contours of the segmentation
970 D. Weiler and J. Eggert

Fig. 4. Final contour of the llama test image from [1] achieved with the segmentation method proposed in this paper. The segmentation result shows an error rate of 1.28% misclassified pixels based on the error measurement and ground-truth data provided in [1].

Fig. 5. Final contour of exemplary test images from the database provided in [1]. The segmentation results show error rates of 1.63%, 0.72% and 1.43% misclassified pixels based on the error measurement and ground-truth data provided in [1] (from left to right). A preliminary evaluation of the proposed method with all 50 benchmark images (without special tuning to the database) resulted in an average error rate of 2.25%.

process. On the left, the initial level-set contour, a circle centred in the middle of the image with a radius of one fourth of the smallest image dimension, is displayed. This initial level-set contour is commonly used to express the expectation of an object, e.g. gained by a preprocessing stage prior to the segmentation framework that focuses on salient points, as in autonomous mobile robotics. Figure 1, right, displays the final level-set contour that is obtained after 37 iterations of (2). The evolution of the level-set function is stopped according to the development of the value of the energy functional (1). Figure 2 displays the progress of the values of the energy functional over iterations. For convenience, the values are normalised to the interval [0, 1]. After 29 iterations, the energy has converged to its minimum. The algorithm needs eight consecutive iterations to detect the convergence and stop the segmentation process.

Multi-dimensional Histogram-Based Image Segmentation 971
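A minimal sketch of how a region's multi-dimensional colour histogram (as visualised in Fig. 3) could be computed. The 8-bins-per-channel choice is an assumption for illustration; only the 1% threshold comes from the figure caption:

```python
import numpy as np

def region_colour_histogram(img, mask, bins=8, min_fraction=0.01):
    """3-D RGB histogram of the pixels selected by a region mask.

    img:  H x W x 3 uint8 RGB image
    mask: H x W boolean array (inside or outside the level-set contour)
    Bins contributing less than min_fraction of the pixels are zeroed,
    mirroring the 1% display threshold of Fig. 3.
    """
    pixels = img[mask].reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    hist /= max(hist.sum(), 1.0)       # normalise to a distribution
    hist[hist < min_fraction] = 0.0    # suppress negligible colours
    return hist
```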
Figure 3 displays the region descriptors for the inside and outside regions of the final level-set contour, ρ_1(f) and ρ_2(f), respectively. When the RGB colour space is used as the only feature, the region descriptors equal the colour distributions of the object and its surroundings. In Fig. 3, left, the distribution of the colours belonging to the zebra, which is mainly composed of black, white and shades of grey, can be observed, as the colours are grouped along the diagonal from black to white. The colour distribution of the outside, which mainly consists of a green and beige colouring, can be seen in Fig. 3, right, where the colours stay in the "greenish" corner of the colour space.

The second image is used in [1] to compare different state-of-the-art image segmentation methods. It was chosen to show the competitive results of the approach proposed in this paper. Figure 4 displays the final level-set contour of the segmentation process, as described in the preceding paragraph. With the ground-truth data and the error measurement provided in [1], we achieve an error rate of 1.28% misclassified pixels w.r.t. the number of initially unclassified pixels. This error rate is comparable to the average error rate of the best performing state-of-the-art image segmentation method, which is specified as 1.36% in [1]. In Fig. 5 we show segmentation results for additional exemplary test images from the database provided in [1]. The segmentation results show error rates of 1.63%, 0.72% and 1.43% misclassified pixels.

5 Conclusion

We have presented an approach for multi-dimensional histogram-based image segmentation that is embedded in a level-set framework for two-region segmentation. Contrary to standard level-set methods for image segmentation, we assumed that the features on which the segmentation is based are part of a single feature space.
In contrast to recent state-of-the-art image segmentation methods, we did not model the feature distributions with Gaussian Mixture Models, but applied multi-dimensional histogram-based feature models and showed that the proposed approach yields competitive results. Furthermore, no specific features (e.g. texture) were needed to achieve the presented results. A number of state-of-the-art image segmentation methods provide an alpha mask as segmentation result, which assigns each pixel in a probabilistic manner to the inside and outside regions, respectively. In a level-set framework, an alpha mask is not explicitly incorporated but can easily be extracted as a by-product by evaluating the p_i(x) of (5) as

    α(x) = p_2(x) / (p_1(x) + p_2(x)).
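A minimal sketch of this by-product computation, assuming the per-pixel region probabilities p_1 and p_2 (the p_i(x) of (5), not reproduced here) are available as arrays:

```python
import numpy as np

def alpha_mask(p1, p2, eps=1e-12):
    """alpha(x) = p2(x) / (p1(x) + p2(x)) computed per pixel.

    p1, p2: arrays of per-pixel probabilities for the two regions;
    eps guards against division by zero where both probabilities vanish.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return p2 / (p1 + p2 + eps)
```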
References

1. Rother, C., Kolmogorov, V., Blake, A.: "GrabCut": Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)
2. Boykov, Y.Y., Jolly, M.P.: Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In: Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 1, pp. 105–112 (2001)
3. Corel Corporation: Knockout User Guide (2002)
4. Chuang, Y.Y., Curless, B., Salesin, D., Szeliski, R.: A Bayesian approach to digital matting. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 264–271 (2001)
5. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
6. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42, 577–685 (1989)
7. Zhu, S.C., Yuille, A.L.: Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 18(9), 884–900 (1996)
8. Chan, T., Vese, L.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)
9. Kim, J., Fisher, J.W., Yezzi, A.J., Çetin, M., Willsky, A.S.: Nonparametric methods for image segmentation using information theory and curve evolution. In: International Conference on Image Processing, Rochester, New York, vol. 3, pp. 797–800 (2002)
10. Rousson, M., Deriche, R.: A variational framework for active and adaptative segmentation of vector valued images. In: IEEE Workshop on Motion and Video Computing, Orlando, Florida (2002)
11. Brox, T., Rousson, M., Deriche, R., Weickert, J.: Unsupervised segmentation incorporating colour, texture, and motion. Computer Analysis of Images and Patterns 2756, 353–360 (2003)
12. Grossberg, S., Hong, S.: A neural model of surface perception: Lightness, anchoring, and filling-in. Spatial Vision 19(2-4), 263–321 (2006)
13. Parzen, E.: On the estimation of a probability density function and mode. Annals of Mathematical Statistics 33, 1065–1076 (1962)
14. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International Journal of Computer Vision 1(4), 321–331 (1988)
15. Chan, T., Sandberg, B., Vese, L.: Active contours without edges for vector-valued images. J. Visual Communication and Image Representation 11(2), 130–141 (2000)

A Framework for Multi-view Gender Classification

Jing Li and Bao-Liang Lu

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Rd., Shanghai 200240, China
{jinglee,bllu}@sjtu.edu.cn

Abstract. This paper proposes a novel framework for dealing with multi-view gender classification problems and shows its feasibility on the CAS-PEAL database of face images. The framework consists of three stages. First, wavelet transform is used to intensify multi-scale edges and remove the effects of illumination and noise. Second, instead of the traditional Euclidean distance, the image Euclidean distance, which considers the spatial relationships between pixels, is used to measure the distance between images. Last, a two-layer support vector machine is proposed, which divides face images into different poses in the first layer, and then recognizes the gender with different support vector machines in the second layer.
Compared with traditional support vector machines and the min-max modular network with support vector machines, our method achieves higher classification accuracy and spends less training and test time.

1 Introduction

With the increasing requirements for advanced surveillance and monitoring systems, gender classification based on face images has received increasing attention in recent years [1,2,3,4,5,6,7]. Many approaches to the problem include three steps: preprocessing, feature extraction and pattern classification. The preprocessing step often includes geometric normalization, masking, and histogram equalization. Then the images are converted into vectors according to the gray levels of pixels. Feature extraction methods include shape or texture information extraction [5], and subspace transformations such as PCA, ICA, and LDA [8,9,10]. Pattern classification methods include k-nearest-neighbor, Fisher linear discriminant [9], neural networks [1,11,12,13], and support vector machines [2,4].

As mentioned above, one representative work is Moghaddam and Yang's RBF-kernel SVM method based on the gray levels of pixels [2], which achieved very good results on the FERET database. However, their work deals only with frontal face images. In real-world applications, we are often required to recognize gender based on face images with different poses. In the case of multi-view gender

To whom correspondence should be addressed. This work was partially supported by the National Natural Science Foundation of China under the grant NSFC 60473040, and the Microsoft Laboratory for Intelligent Computing and Intelligent Systems of Shanghai Jiao Tong University.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 973–982, 2008. © Springer-Verlag Berlin Heidelberg 2008
classification problems, Lian and Lu [5], and Luo and Lu [6] have used SVM and M^3-SVM on face images with different poses from the CAS-PEAL database [14]. However, they only used images with pose angles of less than 30 degrees, and their experiments were done on each pose separately.

In this paper, we propose a new framework for dealing with multi-view gender classification problems. The proposed framework has the following three advantages over existing approaches. a) We propose a multi-scale edge enhancement (MSEE) method to strengthen the edge information and eliminate the non-edge information of face images. The reason is that edges in images are often located at the boundaries of important image structures and reflect shapes; therefore edges are more important than non-edge areas, especially in gender classification. b) Almost all existing methods simply convert images into vectors according to the gray levels of pixels. However, spatial relationships between pixels are lost after this conversion. On the other hand, Wang et al. have proposed the Image Euclidean Distance (IMED) [15], which considers spatial relationships between pixels. They have also proved that IMED can be realized by simply applying a linear transformation to images before feeding them to the classification algorithm. In our framework, we apply this linear transformation to images before converting them into vectors. c) Although support vector machines have achieved very good performance in gender classification, their time complexity limits their use in large-scale applications. To reduce the training time and increase the classification accuracy, we propose a layered support vector machine (LSVM) that divides a complicated, large-scale problem into several easy subproblems, and then solves these subproblems with different SVM modules in different feature spaces.
2 Preprocessing

2.1 Multi-scale Edge Enhancement

Suppose ψ(t) is the derivative of a smoothing function θ(t). The wavelet transform of f(x) at scale s and position x is defined by

    W_s f(x) = f ∗ ψ_s(x)                                          (1)

where ψ_s(x) = (1/s) ψ(x/s) is the dilation of the basic wavelet ψ(x) by the scale factor s. Let s = 2^j (j ∈ Z, where Z is the set of integers); the wavelet transform is then called the dyadic WT. According to Mallat's algorithm [16], the dyadic WT of a digital signal can be calculated iteratively by convolution with two complementary filters, a low-pass filter H and a high-pass filter G, as illustrated in Fig. 1(a). The down-sampling step in Fig. 1(a) removes the redundancy of the signal representation. As a by-product, it separates f(x) into fragments and reduces the temporal resolution of the wavelet coefficients at increasing scales. To keep the continuity and the temporal resolution at different scales, we use the same sampling rate at all scales, which is achieved by interpolating the filter impulse responses of the previous scale, as illustrated in Fig. 1(b). This algorithm is called the algorithme à trous [17], and the detailed decomposition step is defined by

    S_{2^j} f(n) = H(z^{2^{j-1}}) ∗ S_{2^{j-1}} f(n)
    W_{2^j} f(n) = G(z^{2^{j-1}}) ∗ S_{2^{j-1}} f(n)               (2)

Fig. 1. Two implementations of the dyadic discrete wavelet transform: (a) Mallat's algorithm; (b) the algorithme à trous.

Fig. 2. Wavelet decomposition of a face image. The upper row, from left to right: the original image and the modulus images of (W_{2^j} f)_{1≤j≤6}. The lower row, from left to right: the multi-scale edge enhancement image and (S_{2^j} f)_{1≤j≤6}.

In this paper, we use a quadratic spline originally proposed in [18] as the prototype wavelet ψ(t). The Fourier transform of the quadratic spline is

    Ψ̂(ω) = iω ( sin(ω/4) / (ω/4) )^4                               (3)

where the symbol 'ˆ' denotes the discrete Fourier transform. The corresponding wavelet transform of a face image is shown in Fig. 2, from which we can see that in each decomposition step the edge information at the corresponding scale is removed from S_{2^j} f. At small scales, such as 2^1, M_{2^j} f contains not only the edge information but also much noise. At large scales, such as 2^5 and 2^6, the edge information reflected by M_{2^j} f is almost meaningless. On the other hand, S_{2^5} f and S_{2^6} f mainly reflect the effects of illumination and non-edge information, so eliminating them removes the effects of illumination and non-edge information.
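The decomposition step (2) can be sketched for a 1-D signal as follows. The low-pass filter h below is a smoothing filter commonly quoted for the quadratic spline of [18]; the difference filter g = [1, -1] is an illustrative stand-in for the exact high-pass G of the paper:

```python
import numpy as np

def upsample(filt, j):
    """Insert 2**j - 1 zeros between taps ('a trous' = with holes)."""
    out = np.zeros((len(filt) - 1) * 2**j + 1)
    out[::2**j] = filt
    return out

def atrous_decompose(f, levels, h, g):
    """Eq. (2): S_{2^j} f and W_{2^j} f obtained by filtering the previous
    smoothed signal with upsampled filters; no subsampling, so every
    scale keeps the original signal resolution."""
    S = [np.asarray(f, dtype=float)]
    W = []
    for j in range(levels):
        S_prev = S[-1]
        W.append(np.convolve(S_prev, upsample(g, j), mode="same"))
        S.append(np.convolve(S_prev, upsample(h, j), mode="same"))
    return S, W

# assumed illustrative filter pair (not the paper's exact G)
h = np.array([0.125, 0.375, 0.375, 0.125])
g = np.array([1.0, -1.0])
```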
Although the wavelet transform can be used to extract edges from M_{2^j} f, the calculation is time consuming and needs user-defined parameters such as thresholds at each scale. Extracted edges also need to be linked by some morphological operations. In this paper, our goal is to strengthen the effect of edges without too heavy a calculation. So we calculate the difference DS of S_{2^j} f at a small scale 2^{j1} and a large scale 2^{j2}:

    DS = S_{2^{j1}} f(x, y) − S_{2^{j2}} f(x, y)                    (4)

DS mainly contains the information of edges from scale 2^{j1+1} to scale 2^{j2}. Histogram equalization of DS increases the contrast and makes the edges clearer. We call the histogram-equalized image of DS the multi-scale edge-enhanced image. An example is shown in the bottom left corner of Fig. 2, from which we can see that the contour of the face is enhanced and the right part is much clearer than in the original image.
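Assuming the smoothed images S_{2^j} f from the dyadic decomposition are available as a list of arrays, the enhancement of (4) can be sketched as follows (the scale indices j1, j2 are free parameters here):

```python
import numpy as np

def histogram_equalise(img, levels=256):
    """Plain histogram equalisation of a 2-D grey-value image."""
    hist, edges = np.histogram(img.ravel(), bins=levels)
    cdf = hist.cumsum().astype(float)
    cdf = (levels - 1) * cdf / cdf[-1]          # map CDF onto [0, levels-1]
    idx = np.clip(np.digitize(img.ravel(), edges[:-1]) - 1, 0, levels - 1)
    return cdf[idx].reshape(img.shape)

def msee(S, j1=1, j2=5):
    """Eq. (4): DS = S_{2^j1} f - S_{2^j2} f keeps the edges between the
    two scales; histogram equalisation then raises their contrast."""
    DS = S[j1] - S[j2]
    return histogram_equalise(DS)
```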
2.2 Image Euclidean Distance (IMED)

The traditional Euclidean distance compares the gray values of two images pixel by pixel. It does not take into account the spatial relationships of pixels. If the images are not aligned well, the distance between them will be large even though they may be very similar. Unlike the traditional Euclidean distance, IMED takes the spatial relationships of pixels into account. Therefore, it is robust to small perturbations of images. IMED defines the distance between two images x, y as

    d^2(x, y) = (x − y)^T G (x − y) = Σ_{i,j=1}^{MN} g_ij (x_i − y_i)(x_j − y_j)     (5)

where g_ij is the metric coefficient indicating the spatial relationship between pixels P_i and P_j. In this paper, g_ij is defined by

    g_ij = f(|P_i − P_j|) = (1 / (2πσ^2)) exp(−|P_i − P_j|^2 / (2σ^2))              (6)

where σ is the width parameter, which is set to 1 in this paper, and |P_i − P_j| is the spatial distance between P_i and P_j on the image lattice.

IMED can be embedded in classification algorithms that are based on the Euclidean distance by applying the following linear transformation G^{1/2} to the original images x and y,

    u = G^{1/2} x,   v = G^{1/2} y                                                  (7)

Then calculating the IMED between x and y reduces to calculating the traditional Euclidean distance between u and v as follows:

    (x − y)^T G (x − y) = (x − y)^T G^{1/2} G^{1/2} (x − y) = (u − v)^T (u − v)     (8)
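A sketch of (6)-(8) for small images: build G from the Gaussian of pixel distances, take its symmetric square root, and transform each vectorised image. This dense implementation is only practical for small H x W, since G is (HW) x (HW):

```python
import numpy as np

def imed_matrix_sqrt(h, w, sigma=1.0):
    """G^{1/2} for an h x w pixel lattice, with g_ij from Eq. (6)."""
    ys, xs = np.mgrid[0:h, 0:w]
    P = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    G = np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    # symmetric PSD square root via eigendecomposition
    vals, vecs = np.linalg.eigh(G)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def imed_distance(x, y, G_half):
    """IMED via Eq. (8): Euclidean distance of u = G^{1/2}x, v = G^{1/2}y."""
    u, v = G_half @ x.ravel(), G_half @ y.ravel()
    return float(np.sqrt(((u - v) ** 2).sum()))
```

In a classification pipeline, `G_half` is computed once for the image size and applied to every training and test image as a preprocessing step, exactly as described above.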
As a result, embedding IMED in a classification algorithm amounts to simply applying the linear transformation G^{1/2} to the images before feeding them to the classification algorithm. From this point of view, we treat the transformation G^{1/2} as a preprocessing step before classification in this paper.

3 Layered Support Vector Machine

3.1 Algorithm

Given a training set of instance-label pairs (x_i, y_i), i = 1, ..., l, where x_i ∈ R^n and y_i ∈ {1, −1}, a support vector machine [19] requires the solution of the following optimization problem:

    min_{W,b,ξ}  (1/2) W^T W + C Σ_{i=1}^{l} ξ_i
    subject to   y_i (W^T φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0           (9)

Here the training vectors x_i are mapped into a higher dimensional space by the function φ. SVM then finds a linear separating hyperplane with the maximal margin in this higher dimensional space. C > 0 is the penalty parameter of the error term. Furthermore, a kernel function can be defined according to φ:

    K(x_i, x_j) ≡ φ(x_i)^T φ(x_j)                                   (10)

The great success of SVM should be attributed to the introduction of kernel functions. The nonlinear mapping of input vectors into a high-dimensional feature space turns a nonlinearly separable problem into a linearly separable one. The key point of using SVM for classification is to find a suitable feature space. But a single feature space may not be enough for large-scale problems, since the distributions of data are always complicated in real-world applications. These data may be classified more correctly in different feature spaces.

In many large-scale applications, the training data belong to different subproblems, and it is easier to divide the data into these subproblems than to divide them into categories with complicated hidden meanings. So we propose a layered support vector machine (LSVM). The first layer of the LSVM is a support vector machine that divides the problem into several subproblems, while the second layer has several SVMs to solve these subproblems individually. The SVMs in the second layer are independent, so they can have different feature spaces, like experts in different fields solving problems with different methods. When a test instance arrives, the first layer decides which subproblem it belongs to, and then it is classified by the corresponding 'expert' in the second layer. It is clear that the accuracy of the first-layer SVM will influence the final accuracy. So we emphasize that the proposed LSVM should be used only in circumstances where the original problem can be easily divided into different subproblems, which is not a strict requirement since many large-scale problems belong to different subproblems inherently.

3.2 Complexity Analysis

Theoretically, the LSVM can not only improve the classification accuracy, but also save training and test time. The time complexity of a standard SVM QP solver is O(M^3), where M denotes the number of training samples. A decomposition method [20] has a complexity of O(k(Mq + q^3)), where q is the size of the working set, which is often related to the number of support vectors, and k is the number of iterations. Of course, k is expected to increase as M and the number of support vectors increase. The time complexity of a traditional SVM can also be written as O(M^p), where p is between 2 and 3.

In the layered SVM, the training data set of the first-layer SVM is the same as that of a traditional SVM. But the former problem is easier than the latter, which means fewer support vectors and a smaller q, so the training time can be greatly reduced. The time complexity of one SVM in the second layer is O((M/K)^p), where K is the number of SVMs in the second layer, and we suppose for simplicity that the numbers of training data in the second-layer SVMs are equal. If we train the SVMs of the second layer in parallel, the total time complexity of the second layer is O((M/K)^p), which is much less than O(M^p). And if we train them serially, the total time complexity is O(K (M/K)^p), which is still less than that of a traditional SVM.

During the recognition phase, the main cost is calculating the kernel between the test vector and the support vectors in the high-dimensional input space. So the test time complexity of a traditional SVM is O(n), where n is the number of support vectors. In the layered SVM, the test instance is first fed into the first-layer SVM, with time complexity O(n_1), where n_1 is the number of support vectors of the first-layer SVM. Then the test instance is classified by one SVM in the second layer according to the output of the first layer, with time complexity O(n_{2,i}), where n_{2,i} is the number of support vectors of the i-th SVM in the second layer. The average test time complexity is O(n_1) + Σ_{i=1}^{K} O(n_{2,i})/K. Since n_1 and Σ_{i=1}^{K} O(n_{2,i})/K are much less than n, test time is saved compared to a traditional SVM.

4 Experimental Results

4.1 Experiment Setup

CAS-PEAL-R1 [14] is a large-scale face database that currently contains 21,832 images of 1,040 individuals (595 males and 445 females) in its 'pose' subdirectory. Each individual is asked to look upwards, forward, and downwards, respectively. In each pose, 7 images are obtained from left to right, as shown in Fig. 3. In our experiments, the images are scaled according to the eye coordinates and cropped so that only the face area is left. No masking template is used, because we think the outlines of the face are important for gender classification, and this information would be removed by a masking template. The final resolution is 60 × 48 pixels. We use 5460 images of 260 individuals as the test data set, while the
Fig. 3. Different poses of one individual in the CAS-PEAL-R1 database

rest, 16372 images of 780 individuals, serve as the training data set. All the images of 600 individuals in the training data set are divided into three groups: looking left (from 22° to 90° to the left), looking middle (between looking left at 22° and looking right at 22°), and looking right (from 22° to 90° to the right). The images of the remaining 180 individuals in the training data set are grouped into the fourth training data set. The detailed information on each data set is listed in Table 1.
4.2 Experiments on MSEE and IMED

We perform experiments on all the data sets with different preprocessing methods: histogram equalization on the original images, histogram equalization with IMED, MSEE, and MSEE with IMED, respectively. The nearest neighbor classifier and support vector machines are used as classifiers. The classification accuracies are shown in Table 2. From this table, we can see that higher classification accuracies are achieved when the poses of the training data set are the same as those of the test data set. In that case, IMED and MSEE achieve better classification accuracies than the original images, and MSEE with IMED achieves the best performance, whether combined with the nearest neighbor classifier or with SVMs.

4.3 Experiments on LSVM

Now we carry out experiments on layered SVMs. Since the outlines of faces in different poses are distinct, it is much easier to divide face images into different poses than into different genders. Other researchers have also reported high accuracies in pose classification with support vector machines [1,21]. So the first layer in our LSVM is a support vector machine that divides images into left, middle and right poses. The second layer contains three support vector machines for gender classification at the different poses.
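An illustrative sketch of this two-layer structure using scikit-learn's SVC (a LIBSVM wrapper) in place of the paper's LIBSVM setup; the class and parameter names here are assumptions, not the authors' code:

```python
import numpy as np
from sklearn.svm import SVC

class LayeredSVM:
    """First layer routes a face to a pose group (left/middle/right);
    the second layer holds one gender SVM per pose group."""

    def __init__(self, **svm_params):
        self.svm_params = svm_params
        self.pose_svm = SVC(**svm_params)   # layer 1: the 'pose divider'
        self.gender_svms = {}               # layer 2: per-pose experts

    def fit(self, X, gender, pose):
        self.pose_svm.fit(X, pose)
        for p in np.unique(pose):
            svm = SVC(**self.svm_params)
            svm.fit(X[pose == p], gender[pose == p])
            self.gender_svms[p] = svm
        return self

    def predict(self, X):
        routed = self.pose_svm.predict(X)
        y = np.empty(len(X), dtype=int)
        for p, svm in self.gender_svms.items():
            sel = routed == p
            if sel.any():
                y[sel] = svm.predict(X[sel])
        return y
```

In practice each second-layer SVM could be given its own kernel parameters, matching the "experts in different feature spaces" idea of Sect. 3.1.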
Table 1. Number of images in each data set

No.   Pose    Train                      Test
              Male   Female  Total       Male   Female  Total
1     left    2112   1714    3826        927    708     1635
2     mid     2538   2403    4941        1191   999     2190
3     right   2112   1714    3826        927    708     1635
4     all     2687   1092    3779        3045   2415    5460
All   all     9449   6923    16372       3045   2415    5460

Table 2. Classification accuracies (%) of the nearest neighbor classifier (NN) and support vector machines (SVMs) with different preprocessing methods

Train  Test   NN                                    SVM
              Orig.  IMED   MSEE   MSEE+IMED        Orig.  IMED   MSEE   MSEE+IMED
1      1      70.64  72.05  71.13  73.70            87.89  88.93  89.60  89.97
1      2      65.66  66.53  65.80  64.79            76.67  76.48  75.89  76.71
1      3      60.43  60.67  61.10  60.61            67.71  67.09  69.48  70.76
1      All    65.59  66.43  65.99  66.21            77.34  76.98  78.08  78.90
2      1      59.94  63.00  66.73  66.60            67.09  67.34  70.83  70.83
2      2      76.39  76.67  77.44  78.86            89.95  90.14  92.01  92.33
2      3      67.83  70.09  71.25  71.56            73.46  71.62  77.80  78.17
2      All    68.90  70.60  72.38  73.00            78.17  77.77  81.41  81.65
3      1      56.94  57.74  57.31  54.92            66.61  66.73  69.11  69.30
3      2      63.88  64.89  64.79  68.40            76.99  77.58  75.84  76.48
3      3      77.25  78.23  78.35  79.39            90.58  91.01  92.54  92.84
3      All    65.81  66.74  66.61  67.66            77.95  77.53  78.83  79.23
4      1      64.46  65.44  66.17  66.91            82.14  83.06  83.06  83.61
4      2      73.84  73.93  74.79  75.53            85.62  86.35  87.12  87.40
4      3      73.15  73.39  74.13  74.19            86.73  86.73  86.42  86.79
4      All    70.82  71.23  72.00  72.55            84.91  84.98  85.70  86.08
All    1      69.97  72.05  72.60  73.64            87.95  88.69  89.72  90.83
All    2      78.49  78.68  78.45  80.18            91.14  91.42  91.74  92.37
All    3      79.14  79.14  78.72  80.24            91.62  91.93  92.84  92.97
All    All    76.14  76.83  76.78  78.24            90.33  90.62  91.47  92.09
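A quick sanity check of the matched-pose observation in Sect. 4.2, using the MSEE+IMED SVM column of Table 2 for training sets 1-3 (values transcribed from the table):

```python
# rows: training set 1 (left), 2 (mid), 3 (right); keys: test set
acc = {
    1: {1: 89.97, 2: 76.71, 3: 70.76},
    2: {1: 70.83, 2: 92.33, 3: 78.17},
    3: {1: 69.30, 2: 76.48, 3: 92.84},
}
for train, row in acc.items():
    # the matched train/test pose should give the best accuracy in its row
    assert row[train] == max(row.values())
print("matched-pose accuracy dominates in every row")
```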
Table 3. Results of using SVM, M^3-SVM and LSVM on all the training data. For M^3-SVM and LSVM, the left 'Time' value refers to training in serial and the right value to training in parallel; the unit is seconds.

Preprocessing   SVM                      M^3-SVM                       LSVM
                Acc    nSV   Time        Acc    nSV   Time             Acc    nSV   Time
original        90.33  7887  28,554      91.06  7447  3,051 /   541    91.12  4814  2,131 / 1,119
IMED            90.62  6822  21,201      91.15  6128  2,997 /   520    91.03  4158  1,900 /   901
MSEE            91.47  6970  24,651      91.98  8473  3,843 /   693    92.14  6143  2,755 / 1,355
MSEE+IMED       92.09  7067  24,022      92.20  8777  3,164 /   556    92.44  4680  2,021 /   974
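From the 'Time' columns of Table 3, the average training-time ratio of LSVM to the traditional SVM can be checked directly (values transcribed from the table):

```python
svm_serial  = [28554, 21201, 24651, 24022]  # traditional SVM, seconds
lsvm_serial = [2131, 1900, 2755, 2021]      # LSVM, serial training
lsvm_par    = [1119, 901, 1355, 974]        # LSVM, parallel training

ratio_par = 100 * sum(lsvm_par) / sum(svm_serial)
ratio_ser = 100 * sum(lsvm_serial) / sum(svm_serial)
print(f"LSVM/SVM training time: {ratio_par:.1f}% parallel, {ratio_ser:.1f}% serial")
```

This reproduces the roughly 4.4% (parallel) and 9% (serial) figures quoted in the text.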
The RBF kernel is chosen for each SVM, and the parameters are chosen by five-fold cross validation on the training data set. The results are shown in Table 3. Traditional SVM and M^3-SVM are also used for comparison. In the M^3-SVM, we divide the images of each class into three poses, and then 9 SVMs are trained and combined according to the min-max combination rules [22]. All the experiments are performed on a 2.8 GHz Pentium 4 PC with 2 GB RAM, and LIBSVM [23] is used for the implementation of SVM.

We can see that LSVM achieves the best classification accuracy among the three methods. Furthermore, LSVM has the smallest number of support vectors, which leads to the shortest response time. Last, the training time of LSVM is comparable with that of M^3-SVM, and is much less than that of traditional SVM, whether training in parallel or serially. On average, the training time spent by LSVM is only 4.4% (in parallel) and 9.0% (in serial) of that used by the traditional SVM.

In the LSVM experiments, the 'pose divider' may misclassify some test images into the wrong poses. But we have found that these improperly divided images always have pose directions near 22°. Suppose an image X with a left pose direction a little larger than 22° is misclassified into the middle pose by the 'pose divider'. Since there are some images with left poses a little smaller than 22° in the training set of middle poses, these images can be helpful in classifying X into the correct gender.

5 Conclusions

In this paper we have proposed a multi-view gender classification framework which includes three steps. First, we use the multi-scale edge enhancement method on the original images to intensify edges and eliminate illumination and non-edge information. Then, the image Euclidean distance, which considers geometric relationships between pixels, is used to make the distance measure between images more reasonable. Last, a layered SVM, which divides face images into different poses in the first layer and then recognizes the gender with different support vector machines in the second layer, is proposed to increase the classification accuracy and reduce the training and test time. The experiments on the CAS-PEAL face database show the effectiveness of the proposed framework.

References

1. Gutta, S., Huang, J., Jonathon, P., Wechsler, H.: Mixture of experts for classification of gender, ethnic origin, and pose of human faces. IEEE Transactions on Neural Networks 11(4), 948–960 (2000)
2. Moghaddam, B., Yang, M.H.: Learning gender with support faces. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5), 707–711 (2002)
3. Khan, A.: Combination and optimization of classifiers in gender classification using genetic programming. International Journal of Knowledge-Based and Intelligent Engineering Systems 9(1), 1–11 (2005)
4. Lian, H.C., Lu, B.L., Takikawa, E., Hosoi, S.: Gender recognition using a min-max modular support vector machine. In: Wang, L., Chen, K., Ong, Y.S. (eds.) ICNC 2005. LNCS, vol. 3611, pp. 438–441. Springer, Heidelberg (2005)
5. Lian, H.C., Lu, B.L.: Multi-view gender classification using local binary patterns and support vector machines. In: Wang, J., Yi, Z., Żurada, J.M., Lu, B.-L., Yin, H. (eds.) ISNN 2006. LNCS, vol. 3972, pp. 202–209. Springer, Heidelberg (2006)
6. Luo, J., Lu, B.L.: Gender recognition using a min-max modular support vector machine with equal clustering. In: Wang, J., Yi, Z., Żurada, J.M., Lu, B.-L., Yin, H. (eds.) ISNN 2006. LNCS, vol. 3972, pp. 210–215. Springer, Heidelberg (2006)
7. Kim, H.C., Kim, D., Ghahramani, Z., Bang, S.Y.: Appearance-based gender classification with Gaussian processes. Pattern Recognition Letters 27(6), 618–626 (2006)
8. Balci, K., Atalay, V.: PCA for gender estimation: Which eigenvectors contribute. In: Proceedings of the Sixteenth International Conference on Pattern Recognition, vol. 3, pp. 363–366 (2002)
9. Jain, A., Huang, J.: Integrating independent components and linear discriminant analysis for gender classification. In: Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 159–163 (2004)
10. O'Toole, A.J., Deffenbacher, K.A., Valentin, D., McKee, K., Huff, D., Abdi, H.: The perception of face gender: The role of stimulus structure in recognition and classification. Memory and Cognition 26(1), 146–160 (1998)
11. Cottrell, G.W., Metcalfe, J.: EMPATH: face, emotion, and gender recognition using holons. In: Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems, pp. 564–571 (1990)
12. Edelman, B., Valentin, D., Abdi, H.: Sex classification of face areas: how well can a linear neural network predict human performance. Journal of Biological System 6(3), 241–264 (1998)
13. Golomb, B., Lawrence, D., Sejnowski, T.: SexNet: A neural network identifies sex from human faces. In: Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems, pp. 572–577 (1990)
14. Gao, W., Cao, B., Shan, S., Zhou, D., Zhang, X., Zhao, D.: The CAS-PEAL large-scale Chinese face database and baseline evaluations. Technical report of JDL (2004), http://www.jdl.ac.cn/peal/pealtr.pdf
15. Wang, L., Zhang, Y., Feng, J.: On the Euclidean distance of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1334–1339 (2005)
16. Mallat, S.: Multifrequency channel decompositions of images and wavelet models. IEEE Transactions on Acoustics, Speech, and Signal Processing 37(12), 2091–2110 (1989)
17. Cohen, A., Kovacevic, J.: Wavelets: the mathematical background. Proceedings of the IEEE 84(4), 514–522 (1996)
18. Mallat, S., Zhong, S.: Characterization of signals from multiscale edges. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(7), 710–732 (1992)
19. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
20. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning, pp. 185–208 (1999)
21. Huang, J., Shao, X., Wechsler, H.: Face pose discrimination using support vector machines (SVM). In: Fourteenth International Conference on Pattern Recognition, vol. 1, pp. 154–156 (1998)
22. Lu, B.L., Wang, K.A., Utiyama, M., Isahara, H.: A part-versus-part method for massively parallel training of support vector machines. In: 2004 IEEE International Joint Conference on Neural Networks, vol. 1, pp. 735–740 (2004)
23. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~