Fig. 2. Progress of the (normalised) energy over iterations. The energy converges after 29 iterations. The algorithm requires eight consecutive iterations to detect the convergence and stop the segmentation process.

Fig. 3. Distribution (multi-dimensional colour histograms) inside (left) and outside (right) of the final level-set contour of the zebra test image, shown in the three-dimensional feature space spanned by the three colours red, green and blue. Larger and smaller blobs indicate larger and smaller histogram values, respectively. Only colours with a contribution greater than 1% are displayed.

constitutes the object to segment and the green and beige colouring of the surrounding steppe. Zebra images are common test images for texture-based segmentation algorithms. Here we show that even without a description of texture the segmentation task can be successfully accomplished. Figure 1 shows the image overlaid by the initial and final level-set contours of the segmentation
970 D. Weiler and J. Eggert

Fig. 4. Final contour of the llama test image from [1] achieved with the segmentation method proposed in this paper. The segmentation result shows an error rate of 1.28% misclassified pixels based on the error measurement and ground-truth data provided in [1].

Fig. 5. Final contour of exemplary test images from the database provided in [1]. The segmentation results show error rates of 1.63%, 0.72% and 1.43% misclassified pixels based on the error measurement and ground-truth data provided in [1] (from left to right). A preliminary evaluation of the proposed method with all 50 benchmark images (without special tuning to the database) resulted in an average error rate of 2.25%.

process. On the left, the initial level-set contour, a circle centred in the middle of the image with a radius of one fourth of the smallest image dimension, is displayed. This initial level-set contour is commonly used to express the expectation of an object, e.g. gained by a preprocessing stage prior to the segmentation framework that focuses on salient points, as in autonomous mobile robotics. Figure 1, right, displays the final level-set contour that is obtained after 37 iterations of (2). The evolution of the level-set function is stopped according to the development of the value of the energy functional (1). Figure 2 displays the progress of the values of the energy functional over iterations. For convenience, the values are normalised to the interval [0, 1]. After 29 iterations, the energy has converged to its minimum. The algorithm needs eight consecutive iterations to detect the convergence and stop the segmentation process.

Multi-dimensional Histogram-Based Image Segmentation 971
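A minimal sketch of how a region's multi-dimensional colour histogram (as visualised in Fig. 3) could be computed. The 8-bins-per-channel choice is an assumption for illustration; only the 1% threshold comes from the figure caption:

```python
import numpy as np

def region_colour_histogram(img, mask, bins=8, min_fraction=0.01):
    """3-D RGB histogram of the pixels selected by a region mask.

    img:  H x W x 3 uint8 RGB image
    mask: H x W boolean array (inside or outside the level-set contour)
    Bins contributing less than min_fraction of the pixels are zeroed,
    mirroring the 1% display threshold of Fig. 3.
    """
    pixels = img[mask].reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    hist /= max(hist.sum(), 1.0)       # normalise to a distribution
    hist[hist < min_fraction] = 0.0    # suppress negligible colours
    return hist
```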
Figure 3 displays the region descriptors for the inside and outside regions of the final level-set contour, ρ_1(f) and ρ_2(f), respectively. When the RGB colour space is used as the only feature, the region descriptors equal the colour distributions of the object and its surroundings. In Fig. 3, left, the distribution of the colours belonging to the zebra, which is mainly composed of black, white and shades of grey, can be observed, as the colours are grouped along the diagonal from black to white. The colour distribution of the outside, which mainly consists of a green and beige colouring, can be seen in Fig. 3, right, where the colours stay in the "greenish" corner of the colour space.

The second image is used in [1] to compare different state-of-the-art image segmentation methods. It was chosen to show the competitive results of the approach proposed in this paper. Figure 4 displays the final level-set contour of the segmentation process, as described in the preceding paragraph. With the ground-truth data and the error measurement provided in [1], we achieve an error rate of 1.28% misclassified pixels w.r.t. the number of initially unclassified pixels. This error rate is comparable to the average error rate of the best performing state-of-the-art image segmentation method, which is specified as 1.36% in [1]. In Fig. 5 we show segmentation results for additional exemplary test images from the database provided in [1]. The segmentation results show error rates of 1.63%, 0.72% and 1.43% misclassified pixels.

5 Conclusion

We have presented an approach for multi-dimensional histogram-based image segmentation that is embedded in a level-set framework for two-region segmentation. Contrary to standard level-set methods for image segmentation, we assumed that the features on which the segmentation is based are part of a single feature space.
In contrast to recent state-of-the-art image segmentation methods, we did not model the feature distributions with Gaussian Mixture Models, but applied multi-dimensional histogram-based feature models and showed that the proposed approach yields competitive results. Furthermore, no specific features (e.g. texture) were needed to achieve the presented results. A number of state-of-the-art image segmentation methods provide an alpha mask as segmentation result, which assigns each pixel in a probabilistic manner to the inside and outside regions, respectively. In a level-set framework, an alpha mask is not explicitly incorporated but can easily be extracted as a by-product by evaluating the p_i(x) of (5) as

    α(x) = p_2(x) / (p_1(x) + p_2(x)).
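A minimal sketch of this by-product computation, assuming the per-pixel region probabilities p_1 and p_2 (the p_i(x) of (5), not reproduced here) are available as arrays:

```python
import numpy as np

def alpha_mask(p1, p2, eps=1e-12):
    """alpha(x) = p2(x) / (p1(x) + p2(x)) computed per pixel.

    p1, p2: arrays of per-pixel probabilities for the two regions;
    eps guards against division by zero where both probabilities vanish.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    return p2 / (p1 + p2 + eps)
```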
References

1. Rother, C., Kolmogorov, V., Blake, A.: "GrabCut": Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)
2. Boykov, Y.Y., Jolly, M.P.: Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In: Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 1, pp. 105–112 (2001)
3. Corel Corporation: Knockout User Guide (2002)
4. Chuang, Y.Y., Curless, B., Salesin, D., Szeliski, R.: A Bayesian approach to digital matting. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 264–271 (2001)
5. Osher, S., Sethian, J.A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79, 12–49 (1988)
6. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42, 577–685 (1989)
7. Zhu, S.C., Yuille, A.L.: Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 18(9), 884–900 (1996)
8. Chan, T., Vese, L.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)
9. Kim, J., Fisher, J.W., Yezzi, A.J., Çetin, M., Willsky, A.S.: Nonparametric methods for image segmentation using information theory and curve evolution. In: International Conference on Image Processing, Rochester, New York, vol. 3, pp. 797–800 (2002)
10. Rousson, M., Deriche, R.: A variational framework for active and adaptative segmentation of vector valued images. In: IEEE Workshop on Motion and Video Computing, Orlando, Florida (2002)
11. Brox, T., Rousson, M., Deriche, R., Weickert, J.: Unsupervised segmentation incorporating colour, texture, and motion. Computer Analysis of Images and Patterns 2756, 353–360 (2003)
12. Grossberg, S., Hong, S.: A neural model of surface perception: Lightness, anchoring, and filling-in. Spatial Vision 19(2-4), 263–321 (2006)
13. Parzen, E.: On the estimation of a probability density function and mode. Annals of Mathematical Statistics 33, 1065–1076 (1962)
14. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International Journal of Computer Vision 1(4), 321–331 (1988)
15. Chan, T., Sandberg, B., Vese, L.: Active contours without edges for vector-valued images. J. Visual Communication and Image Representation 11(2), 130–141 (2000)

A Framework for Multi-view Gender Classification

Jing Li and Bao-Liang Lu

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Rd., Shanghai 200240, China
{jinglee,bllu}@sjtu.edu.cn

Abstract. This paper proposes a novel framework for dealing with multi-view gender classification problems and shows its feasibility on the CAS-PEAL database of face images. The framework consists of three stages. First, wavelet transform is used to intensify multi-scale edges and remove the effects of illumination and noise. Second, instead of the traditional Euclidean distance, the image Euclidean distance, which considers the spatial relationships between pixels, is used to measure the distance between images. Last, a two-layer support vector machine is proposed, which divides face images into different poses in the first layer, and then recognizes the gender with different support vector machines in the second layer.
Compared with traditional support vector machines and the min-max modular network with support vector machines, our method achieves higher classification accuracy and spends less training and test time.

1 Introduction

With the increasing requirements for advanced surveillance and monitoring systems, gender classification based on face images has received increasing attention in recent years [1,2,3,4,5,6,7]. Many approaches to the problem include three steps: preprocessing, feature extraction and pattern classification. The preprocessing step often includes geometric normalization, masking, and histogram equalization. Then the images are converted into vectors according to the gray levels of pixels. Feature extraction methods include shape or texture information extraction [5], and subspace transformations such as PCA, ICA, and LDA [8,9,10]. Pattern classification methods include k-nearest-neighbor, Fisher linear discriminant [9], neural networks [1,11,12,13], and support vector machines [2,4].

As mentioned above, one representative work is Moghaddam and Yang's RBF-kernel SVM method based on the gray levels of pixels [2], which achieved very good results on the FERET database. However, their work deals only with frontal face images. In real-world applications, we are often required to recognize gender based on face images with different poses. In the case of multi-view gender

To whom correspondence should be addressed. This work was partially supported by the National Natural Science Foundation of China under the grant NSFC 60473040, and the Microsoft Laboratory for Intelligent Computing and Intelligent Systems of Shanghai Jiao Tong University.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 973–982, 2008. © Springer-Verlag Berlin Heidelberg 2008
classification problems, Lian and Lu [5], and Luo and Lu [6] have used SVM and M^3-SVM on face images with different poses from the CAS-PEAL database [14]. However, they only used images with pose angles of less than 30 degrees, and their experiments were done on each pose separately.

In this paper, we propose a new framework for dealing with multi-view gender classification problems. The proposed framework has the following three advantages over existing approaches. a) We propose a multi-scale edge enhancement (MSEE) method to strengthen the edge information and eliminate the non-edge information of face images. The reason is that edges in images are often located at the boundaries of important image structures and reflect shapes; therefore edges are more important than non-edge areas, especially in gender classification. b) Almost all existing methods simply convert images into vectors according to the gray levels of pixels. However, spatial relationships between pixels are lost after this conversion. On the other hand, Wang et al. have proposed the Image Euclidean Distance (IMED) [15], which considers spatial relationships between pixels. They have also proved that IMED can be realized by simply applying a linear transformation to images before feeding them to the classification algorithm. In our framework, we apply this linear transformation to images before converting them into vectors. c) Although support vector machines have achieved very good performance in gender classification, their time complexity limits their use in large-scale applications. To reduce the training time and increase the classification accuracy, we propose a layered support vector machine (LSVM) that divides a complicated, large-scale problem into several easy subproblems, and then solves these subproblems with different SVM modules in different feature spaces.
2 Preprocessing

2.1 Multi-scale Edge Enhancement

Suppose ψ(t) is the derivative of a smoothing function θ(t). The wavelet transform of f(x) at scale s and position x is defined by

    W_s f(x) = f ∗ ψ_s(x)                                          (1)

where ψ_s(x) = (1/s) ψ(x/s) is the dilation of the basic wavelet ψ(x) by the scale factor s. Let s = 2^j (j ∈ Z, where Z is the set of integers); the wavelet transform is then called the dyadic WT. According to Mallat's algorithm [16], the dyadic WT of a digital signal can be calculated iteratively by convolution with two complementary filters, a low-pass filter H and a high-pass filter G, as illustrated in Fig. 1(a). The down-sampling step in Fig. 1(a) removes the redundancy of the signal representation. As a by-product, it separates f(x) into fragments and reduces the temporal resolution of the wavelet coefficients at increasing scales. To keep the continuity and the temporal resolution at different scales, we use the same sampling rate at all scales, which is achieved by interpolating the filter impulse responses of the previous scale, as illustrated in Fig. 1(b). This algorithm is called the algorithme à trous [17], and the detailed decomposition step is defined by

    S_{2^j} f(n) = H(z^{2^{j-1}}) ∗ S_{2^{j-1}} f(n)
    W_{2^j} f(n) = G(z^{2^{j-1}}) ∗ S_{2^{j-1}} f(n)               (2)

Fig. 1. Two implementations of the dyadic discrete wavelet transform: (a) Mallat's algorithm; (b) the algorithme à trous.

Fig. 2. Wavelet decomposition of a face image. The upper row, from left to right: the original image and the modulus images of (W_{2^j} f)_{1≤j≤6}. The lower row, from left to right: the multi-scale edge enhancement image and (S_{2^j} f)_{1≤j≤6}.

In this paper, we use a quadratic spline originally proposed in [18] as the prototype wavelet ψ(t). The Fourier transform of the quadratic spline is

    Ψ̂(ω) = iω ( sin(ω/4) / (ω/4) )^4                               (3)

where the symbol 'ˆ' denotes the discrete Fourier transform. The corresponding wavelet transform of a face image is shown in Fig. 2, from which we can see that in each decomposition step the edge information at the corresponding scale is removed from S_{2^j} f. At small scales, such as 2^1, M_{2^j} f contains not only the edge information but also much noise. At large scales, such as 2^5 and 2^6, the edge information reflected by M_{2^j} f is almost meaningless. On the other hand, S_{2^5} f and S_{2^6} f mainly reflect the effects of illumination and non-edge information, so eliminating them removes the effects of illumination and non-edge information.
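The decomposition step (2) can be sketched for a 1-D signal as follows. The low-pass filter h below is a smoothing filter commonly quoted for the quadratic spline of [18]; the difference filter g = [1, -1] is an illustrative stand-in for the exact high-pass G of the paper:

```python
import numpy as np

def upsample(filt, j):
    """Insert 2**j - 1 zeros between taps ('a trous' = with holes)."""
    out = np.zeros((len(filt) - 1) * 2**j + 1)
    out[::2**j] = filt
    return out

def atrous_decompose(f, levels, h, g):
    """Eq. (2): S_{2^j} f and W_{2^j} f obtained by filtering the previous
    smoothed signal with upsampled filters; no subsampling, so every
    scale keeps the original signal resolution."""
    S = [np.asarray(f, dtype=float)]
    W = []
    for j in range(levels):
        S_prev = S[-1]
        W.append(np.convolve(S_prev, upsample(g, j), mode="same"))
        S.append(np.convolve(S_prev, upsample(h, j), mode="same"))
    return S, W

# assumed illustrative filter pair (not the paper's exact G)
h = np.array([0.125, 0.375, 0.375, 0.125])
g = np.array([1.0, -1.0])
```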
Although the wavelet transform can be used to extract edges from M_{2^j} f, the calculation is time consuming and needs user-defined parameters such as thresholds at each scale. Extracted edges also need to be linked by some morphological operations. In this paper, our goal is to strengthen the effect of edges without too heavy a calculation. So we calculate the difference DS of S_{2^j} f at a small scale 2^{j1} and a large scale 2^{j2}:

    DS = S_{2^{j1}} f(x, y) − S_{2^{j2}} f(x, y)                    (4)

DS mainly contains the information of edges from scale 2^{j1+1} to scale 2^{j2}. Histogram equalization of DS increases the contrast and makes the edges clearer. We call the histogram-equalized image of DS the multi-scale edge-enhanced image. An example is shown in the bottom left corner of Fig. 2, from which we can see that the contour of the face is enhanced and the right part is much clearer than in the original image.
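Assuming the smoothed images S_{2^j} f from the dyadic decomposition are available as a list of arrays, the enhancement of (4) can be sketched as follows (the scale indices j1, j2 are free parameters here):

```python
import numpy as np

def histogram_equalise(img, levels=256):
    """Plain histogram equalisation of a 2-D grey-value image."""
    hist, edges = np.histogram(img.ravel(), bins=levels)
    cdf = hist.cumsum().astype(float)
    cdf = (levels - 1) * cdf / cdf[-1]          # map CDF onto [0, levels-1]
    idx = np.clip(np.digitize(img.ravel(), edges[:-1]) - 1, 0, levels - 1)
    return cdf[idx].reshape(img.shape)

def msee(S, j1=1, j2=5):
    """Eq. (4): DS = S_{2^j1} f - S_{2^j2} f keeps the edges between the
    two scales; histogram equalisation then raises their contrast."""
    DS = S[j1] - S[j2]
    return histogram_equalise(DS)
```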
2.2 Image Euclidean Distance (IMED)

The traditional Euclidean distance compares the gray values of two images pixel by pixel. It does not take into account the spatial relationships of pixels. If the images are not aligned well, the distance between them will be large even though they may be very similar. Unlike the traditional Euclidean distance, IMED takes the spatial relationships of pixels into account. Therefore, it is robust to small perturbations of images. IMED defines the distance between two images x, y as

    d^2(x, y) = (x − y)^T G (x − y) = Σ_{i,j=1}^{MN} g_ij (x_i − y_i)(x_j − y_j)     (5)

where g_ij is the metric coefficient indicating the spatial relationship between pixels P_i and P_j. In this paper, g_ij is defined by

    g_ij = f(|P_i − P_j|) = (1 / (2πσ^2)) exp(−|P_i − P_j|^2 / (2σ^2))              (6)

where σ is the width parameter, which is set to 1 in this paper, and |P_i − P_j| is the spatial distance between P_i and P_j on the image lattice.

IMED can be embedded in classification algorithms that are based on the Euclidean distance by applying the following linear transformation G^{1/2} to the original images x and y,

    u = G^{1/2} x,   v = G^{1/2} y                                                  (7)

Then calculating the IMED between x and y reduces to calculating the traditional Euclidean distance between u and v as follows:

    (x − y)^T G (x − y) = (x − y)^T G^{1/2} G^{1/2} (x − y) = (u − v)^T (u − v)     (8)
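A sketch of (6)-(8) for small images: build G from the Gaussian of pixel distances, take its symmetric square root, and transform each vectorised image. This dense implementation is only practical for small H x W, since G is (HW) x (HW):

```python
import numpy as np

def imed_matrix_sqrt(h, w, sigma=1.0):
    """G^{1/2} for an h x w pixel lattice, with g_ij from Eq. (6)."""
    ys, xs = np.mgrid[0:h, 0:w]
    P = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    G = np.exp(-d2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    # symmetric PSD square root via eigendecomposition
    vals, vecs = np.linalg.eigh(G)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def imed_distance(x, y, G_half):
    """IMED via Eq. (8): Euclidean distance of u = G^{1/2}x, v = G^{1/2}y."""
    u, v = G_half @ x.ravel(), G_half @ y.ravel()
    return float(np.sqrt(((u - v) ** 2).sum()))
```

In a classification pipeline, `G_half` is computed once for the image size and applied to every training and test image as a preprocessing step, exactly as described above.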
As a result, embedding IMED in a classification algorithm amounts to simply applying the linear transformation G^{1/2} to the images before feeding them to the classification algorithm. From this point of view, we treat the transformation G^{1/2} as a preprocessing step before classification in this paper.

3 Layered Support Vector Machine

3.1 Algorithm

Given a training set of instance-label pairs (x_i, y_i), i = 1, ..., l, where x_i ∈ R^n and y_i ∈ {1, −1}, a support vector machine [19] requires the solution of the following optimization problem:

    min_{W,b,ξ}  (1/2) W^T W + C Σ_{i=1}^{l} ξ_i
    subject to   y_i (W^T φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0           (9)

Here the training vectors x_i are mapped into a higher dimensional space by the function φ. SVM then finds a linear separating hyperplane with the maximal margin in this higher dimensional space. C > 0 is the penalty parameter of the error term. Furthermore, a kernel function can be defined according to φ:

    K(x_i, x_j) ≡ φ(x_i)^T φ(x_j)                                   (10)

The great success of SVM should be attributed to the introduction of kernel functions. The nonlinear mapping of input vectors into a high-dimensional feature space turns a nonlinearly separable problem into a linearly separable one. The key point of using SVM for classification is to find a suitable feature space. But a single feature space may not be enough for large-scale problems, since the distributions of data are always complicated in real-world applications. These data may be classified more correctly in different feature spaces.

In many large-scale applications, the training data belong to different subproblems, and it is easier to divide the data into these subproblems than to divide them into categories with complicated hidden meanings. So we propose a layered support vector machine (LSVM). The first layer of the LSVM is a support vector machine that divides the problem into several subproblems, while the second layer has several SVMs to solve these subproblems individually. The SVMs in the second layer are independent, so they can have different feature spaces, like experts in different fields solving problems with different methods. When a test instance arrives, the first layer decides which subproblem it belongs to, and then it is classified by the corresponding 'expert' in the second layer. It is clear that the accuracy of the first-layer SVM will influence the final accuracy. So we emphasize that the proposed LSVM should be used only in circumstances where the original problem can be easily divided into different subproblems, which is not a strict requirement since many large-scale problems belong to different subproblems inherently.

3.2 Complexity Analysis

Theoretically, the LSVM can not only improve the classification accuracy, but also save training and test time. The time complexity of a standard SVM QP solver is O(M^3), where M denotes the number of training samples. A decomposition method [20] has a complexity of O(k(Mq + q^3)), where q is the size of the working set, which is often related to the number of support vectors, and k is the number of iterations. Of course, k is expected to increase as M and the number of support vectors increase. The time complexity of a traditional SVM can also be written as O(M^p), where p is between 2 and 3.

In the layered SVM, the training data set of the first-layer SVM is the same as that of a traditional SVM. But the former problem is easier than the latter, which means fewer support vectors and a smaller q, so the training time can be greatly reduced. The time complexity of one SVM in the second layer is O((M/K)^p), where K is the number of SVMs in the second layer, and we suppose for simplicity that the numbers of training data in the second-layer SVMs are equal. If we train the SVMs of the second layer in parallel, the total time complexity of the second layer is O((M/K)^p), which is much less than O(M^p). And if we train them serially, the total time complexity is O(K (M/K)^p), which is still less than that of a traditional SVM.

During the recognition phase, the main cost is calculating the kernel between the test vector and the support vectors in the high-dimensional input space. So the test time complexity of a traditional SVM is O(n), where n is the number of support vectors. In the layered SVM, the test instance is first fed into the first-layer SVM, with time complexity O(n_1), where n_1 is the number of support vectors of the first-layer SVM. Then the test instance is classified by one SVM in the second layer according to the output of the first layer, with time complexity O(n_{2,i}), where n_{2,i} is the number of support vectors of the i-th SVM in the second layer. The average test time complexity is O(n_1) + Σ_{i=1}^{K} O(n_{2,i})/K. Since n_1 and Σ_{i=1}^{K} O(n_{2,i})/K are much less than n, test time is saved compared to a traditional SVM.

4 Experimental Results

4.1 Experiment Setup

CAS-PEAL-R1 [14] is a large-scale face database that currently contains 21,832 images of 1,040 individuals (595 males and 445 females) in its 'pose' subdirectory. Each individual is asked to look upwards, forward, and downwards, respectively. In each pose, 7 images are obtained from left to right, as shown in Fig. 3. In our experiments, the images are scaled according to the eye coordinates and cropped so that only the face area is left. No masking template is used, because we think the outlines of the face are important for gender classification, and this information would be removed by a masking template. The final resolution is 60 × 48 pixels. We use 5460 images of 260 individuals as the test data set, while the
Fig. 3. Different poses of one individual in the CAS-PEAL-R1 database

rest, 16372 images of 780 individuals, serve as the training data set. All the images of 600 individuals in the training data set are divided into three groups: looking left (from 22° to 90° to the left), looking middle (between looking left at 22° and looking right at 22°), and looking right (from 22° to 90° to the right). The images of the remaining 180 individuals in the training data set are grouped into the fourth training data set. The detailed information on each data set is listed in Table 1.
4.2 Experiments on MSEE and IMED

We perform experiments on all the data sets with different preprocessing methods: histogram equalization on the original images, histogram equalization with IMED, MSEE, and MSEE with IMED, respectively. The nearest neighbor classifier and support vector machines are used as classifiers. The classification accuracies are shown in Table 2. From this table, we can see that higher classification accuracies are achieved when the poses of the training data set are the same as those of the test data set. In that case, IMED and MSEE achieve better classification accuracies than the original images, and MSEE with IMED achieves the best performance, whether combined with the nearest neighbor classifier or with SVMs.

4.3 Experiments on LSVM

Now we carry out experiments on layered SVMs. Since the outlines of faces in different poses are distinct, it is much easier to divide face images into different poses than into different genders. Other researchers have also reported high accuracies in pose classification with support vector machines [1,21]. So the first layer in our LSVM is a support vector machine that divides images into left, middle and right poses. The second layer contains three support vector machines for gender classification at the different poses.
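An illustrative sketch of this two-layer structure using scikit-learn's SVC (a LIBSVM wrapper) in place of the paper's LIBSVM setup; the class and parameter names here are assumptions, not the authors' code:

```python
import numpy as np
from sklearn.svm import SVC

class LayeredSVM:
    """First layer routes a face to a pose group (left/middle/right);
    the second layer holds one gender SVM per pose group."""

    def __init__(self, **svm_params):
        self.svm_params = svm_params
        self.pose_svm = SVC(**svm_params)   # layer 1: the 'pose divider'
        self.gender_svms = {}               # layer 2: per-pose experts

    def fit(self, X, gender, pose):
        self.pose_svm.fit(X, pose)
        for p in np.unique(pose):
            svm = SVC(**self.svm_params)
            svm.fit(X[pose == p], gender[pose == p])
            self.gender_svms[p] = svm
        return self

    def predict(self, X):
        routed = self.pose_svm.predict(X)
        y = np.empty(len(X), dtype=int)
        for p, svm in self.gender_svms.items():
            sel = routed == p
            if sel.any():
                y[sel] = svm.predict(X[sel])
        return y
```

In practice each second-layer SVM could be given its own kernel parameters, matching the "experts in different feature spaces" idea of Sect. 3.1.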
Table 1. Number of images in each data set

No.   Pose    Train                      Test
              Male   Female  Total       Male   Female  Total
1     left    2112   1714    3826        927    708     1635
2     mid     2538   2403    4941        1191   999     2190
3     right   2112   1714    3826        927    708     1635
4     all     2687   1092    3779        3045   2415    5460
All   all     9449   6923    16372       3045   2415    5460

Table 2. Classification accuracies (%) of the nearest neighbor classifier (NN) and support vector machines (SVMs) with different preprocessing methods

Train  Test   NN                                    SVM
              Orig.  IMED   MSEE   MSEE+IMED        Orig.  IMED   MSEE   MSEE+IMED
1      1      70.64  72.05  71.13  73.70            87.89  88.93  89.60  89.97
1      2      65.66  66.53  65.80  64.79            76.67  76.48  75.89  76.71
1      3      60.43  60.67  61.10  60.61            67.71  67.09  69.48  70.76
1      All    65.59  66.43  65.99  66.21            77.34  76.98  78.08  78.90
2      1      59.94  63.00  66.73  66.60            67.09  67.34  70.83  70.83
2      2      76.39  76.67  77.44  78.86            89.95  90.14  92.01  92.33
2      3      67.83  70.09  71.25  71.56            73.46  71.62  77.80  78.17
2      All    68.90  70.60  72.38  73.00            78.17  77.77  81.41  81.65
3      1      56.94  57.74  57.31  54.92            66.61  66.73  69.11  69.30
3      2      63.88  64.89  64.79  68.40            76.99  77.58  75.84  76.48
3      3      77.25  78.23  78.35  79.39            90.58  91.01  92.54  92.84
3      All    65.81  66.74  66.61  67.66            77.95  77.53  78.83  79.23
4      1      64.46  65.44  66.17  66.91            82.14  83.06  83.06  83.61
4      2      73.84  73.93  74.79  75.53            85.62  86.35  87.12  87.40
4      3      73.15  73.39  74.13  74.19            86.73  86.73  86.42  86.79
4      All    70.82  71.23  72.00  72.55            84.91  84.98  85.70  86.08
All    1      69.97  72.05  72.60  73.64            87.95  88.69  89.72  90.83
All    2      78.49  78.68  78.45  80.18            91.14  91.42  91.74  92.37
All    3      79.14  79.14  78.72  80.24            91.62  91.93  92.84  92.97
All    All    76.14  76.83  76.78  78.24            90.33  90.62  91.47  92.09
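A quick sanity check of the matched-pose observation in Sect. 4.2, using the MSEE+IMED SVM column of Table 2 for training sets 1-3 (values transcribed from the table):

```python
# rows: training set 1 (left), 2 (mid), 3 (right); keys: test set
acc = {
    1: {1: 89.97, 2: 76.71, 3: 70.76},
    2: {1: 70.83, 2: 92.33, 3: 78.17},
    3: {1: 69.30, 2: 76.48, 3: 92.84},
}
for train, row in acc.items():
    # the matched train/test pose should give the best accuracy in its row
    assert row[train] == max(row.values())
print("matched-pose accuracy dominates in every row")
```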
Table 3. Results of using SVM, M^3-SVM and LSVM on all the training data. For M^3-SVM and LSVM, the left 'Time' value refers to training in serial and the right value to training in parallel; the unit is seconds.

Preprocessing   SVM                      M^3-SVM                       LSVM
                Acc    nSV   Time        Acc    nSV   Time             Acc    nSV   Time
original        90.33  7887  28,554      91.06  7447  3,051 /   541    91.12  4814  2,131 / 1,119
IMED            90.62  6822  21,201      91.15  6128  2,997 /   520    91.03  4158  1,900 /   901
MSEE            91.47  6970  24,651      91.98  8473  3,843 /   693    92.14  6143  2,755 / 1,355
MSEE+IMED       92.09  7067  24,022      92.20  8777  3,164 /   556    92.44  4680  2,021 /   974
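From the 'Time' columns of Table 3, the average training-time ratio of LSVM to the traditional SVM can be checked directly (values transcribed from the table):

```python
svm_serial  = [28554, 21201, 24651, 24022]  # traditional SVM, seconds
lsvm_serial = [2131, 1900, 2755, 2021]      # LSVM, serial training
lsvm_par    = [1119, 901, 1355, 974]        # LSVM, parallel training

ratio_par = 100 * sum(lsvm_par) / sum(svm_serial)
ratio_ser = 100 * sum(lsvm_serial) / sum(svm_serial)
print(f"LSVM/SVM training time: {ratio_par:.1f}% parallel, {ratio_ser:.1f}% serial")
```

This reproduces the roughly 4.4% (parallel) and 9% (serial) figures quoted in the text.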
The RBF kernel is chosen for each SVM, and the parameters are chosen by five-fold cross validation on the training data set. The results are shown in Table 3. Traditional SVM and M^3-SVM are also used for comparison. In the M^3-SVM, we divide the images of each class into three poses, and then 9 SVMs are trained and combined according to the min-max combination rules [22]. All the experiments are performed on a 2.8 GHz Pentium 4 PC with 2 GB RAM, and LIBSVM [23] is used for the implementation of SVM.

We can see that LSVM achieves the best classification accuracy among the three methods. Furthermore, LSVM has the smallest number of support vectors, which leads to the shortest response time. Last, the training time of LSVM is comparable with that of M^3-SVM, and is much less than that of traditional SVM, whether training in parallel or serially. On average, the training time spent by LSVM is only 4.4% (in parallel) and 9.0% (in serial) of that used by the traditional SVM.

In the LSVM experiments, the 'pose divider' may misclassify some test images into the wrong poses. But we have found that these improperly divided images always have pose directions near 22°. Suppose an image X with a left pose direction a little larger than 22° is misclassified into the middle pose by the 'pose divider'. Since there are some images with left poses a little smaller than 22° in the training set of middle poses, these images can be helpful in classifying X into the correct gender.

5 Conclusions

In this paper we have proposed a multi-view gender classification framework which includes three steps. First, we use the multi-scale edge enhancement method on the original images to intensify edges and eliminate illumination and non-edge information. Then, the image Euclidean distance, which considers geometric relationships between pixels, is used to make the distance measure between images more reasonable. Last, a layered SVM, which divides face images into different poses in the first layer and then recognizes the gender with different support vector machines in the second layer, is proposed to increase the classification accuracy and reduce the training and test time. The experiments on the CAS-PEAL face database show the effectiveness of the proposed framework.

References

1. Gutta, S., Huang, J., Jonathon, P., Wechsler, H.: Mixture of experts for classification of gender, ethnic origin, and pose of human faces. IEEE Transactions on Neural Networks 11(4), 948–960 (2000)
2. Moghaddam, B., Yang, M.H.: Learning gender with support faces. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5), 707–711 (2002)
3. Khan, A.: Combination and optimization of classifiers in gender classification using genetic programming. International Journal of Knowledge-Based and Intelligent Engineering Systems 9(1), 1–11 (2005)
4. Lian, H.C., Lu, B.L., Takikawa, E., Hosoi, S.: Gender recognition using a min-max modular support vector machine. In: Wang, L., Chen, K., Ong, Y.S. (eds.) ICNC 2005. LNCS, vol. 3611, pp. 438–441. Springer, Heidelberg (2005)
5. Lian, H.C., Lu, B.L.: Multi-view gender classification using local binary patterns and support vector machines. In: Wang, J., Yi, Z., Żurada, J.M., Lu, B.-L., Yin, H. (eds.) ISNN 2006. LNCS, vol. 3972, pp. 202–209. Springer, Heidelberg (2006)
6. Luo, J., Lu, B.L.: Gender recognition using a min-max modular support vector machine with equal clustering. In: Wang, J., Yi, Z., Żurada, J.M., Lu, B.-L., Yin, H. (eds.) ISNN 2006. LNCS, vol. 3972, pp. 210–215. Springer, Heidelberg (2006)
7. Kim, H.C., Kim, D., Ghahramani, Z., Bang, S.Y.: Appearance-based gender classification with Gaussian processes. Pattern Recognition Letters 27(6), 618–626 (2006)
8. Balci, K., Atalay, V.: PCA for gender estimation: Which eigenvectors contribute. In: Proceedings of the Sixteenth International Conference on Pattern Recognition, vol. 3, pp. 363–366 (2002)
9. Jain, A., Huang, J.: Integrating independent components and linear discriminant analysis for gender classification. In: Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 159–163 (2004)
10. O'Toole, A.J., Deffenbacher, K.A., Valentin, D., McKee, K., Huff, D., Abdi, H.: The perception of face gender: The role of stimulus structure in recognition and classification. Memory and Cognition 26(1), 146–160 (1998)
11. Cottrell, G.W., Metcalfe, J.: EMPATH: face, emotion, and gender recognition using holons. In: Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems, pp. 564–571 (1990)
12. Edelman, B., Valentin, D., Abdi, H.: Sex classification of face areas: how well can a linear neural network predict human performance. Journal of Biological System 6(3), 241–264 (1998)
13. Golomb, B., Lawrence, D., Sejnowski, T.: SexNet: A neural network identifies sex from human faces. In: Proceedings of the 1990 Conference on Advances in Neural Information Processing Systems, pp. 572–577 (1990)
14. Gao, W., Cao, B., Shan, S., Zhou, D., Zhang, X., Zhao, D.: The CAS-PEAL large-scale Chinese face database and baseline evaluations. Technical report of JDL (2004), http://www.jdl.ac.cn/peal/pealtr.pdf
15. Wang, L., Zhang, Y., Feng, J.: On the Euclidean distance of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1334–1339 (2005)
16. Mallat, S.: Multifrequency channel decompositions of images and wavelet models. IEEE Transactions on Acoustics, Speech, and Signal Processing 37(12), 2091–2110 (1989)
17. Cohen, A., Kovacevic, J.: Wavelets: the mathematical background. Proceedings of the IEEE 84(4), 514–522 (1996)
18. Mallat, S., Zhong, S.: Characterization of signals from multiscale edges. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(7), 710–732 (1992)
19. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
20. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning, pp. 185–208 (1999)
21. Huang, J., Shao, X., Wechsler, H.: Face pose discrimination using support vector machines (SVM). In: Fourteenth International Conference on Pattern Recognition, vol. 1, pp. 154–156 (1998)
22. Lu, B.L., Wang, K.A., Utiyama, M., Isahara, H.: A part-versus-part method for massively parallel training of support vector machines. In: 2004 IEEE International Joint Conference on Neural Networks, vol. 1, pp. 735–740 (2004)
23. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~