Lecture Notes in Computer Science

bet	35/88
Sana	16.12.2017
Hajmi	12.42 Mb.
	#22381

1 ... 31 32 33 34 35 36 37 38 ... 88

Step 5)

We perform backtracking. That is, the lastly deleted double/single of

features is restored here with associated all components.

Step 6)

If single deletion process has not been finished then attempt to delete feat-

ure single at a time using step 4

to filter the unnecessary ones, otherwise go to step

. If

SC mentioned in subsection 2.1.2.3 is satisfied, then continue. Otherwise, go to step 5

and then step

Feature Subset Selection Using Constructive Neural Nets

379

Step 7)

Before going to select the final subset, we again check through the

existing features whether any irrelevant feature presents or not. The following steps

are taken to accomplish this task.

Delete the existing features from the network single in each step in

accordance with worst rank, and then again retrain.

ii)

Validate the existing features by using test set. For the next stage, save

the classification accuracy A

C

′′

, the responsible deleted feature, and all

its components.

iii)

If DF<(TF-1) then continue the deletion process. Otherwise, go to the

next step. Here, DF and TF refer to the number of deleted feature and

total number of feature, respectively.

iv)

Compare the values of A

′′

according to

CA

A

C

′′

If better A

C

′′

are available, then identify the higher rank of feature among

them. Otherwise, stop. Delete the higher ranked feature with

corresponding worst ranked ones from the subset that obtained at step 7-i.

After that, recall the components to the current network that was obtained

at step 7-ii and stop.

Step 8)

Finally, we achieve the relevant feature subset with compact network.

3 Experimental Analysis

This section evaluates CFSS’s performance on four well-known benchmark problems

such as Breast cancer (BCR), Diabetes (DBT), Glass (GLS), and Iris (IRS) problems

which are obtained from [13]. Input attributes and output classes of BCR are 9 and 2,

8 and 2 of DBT, 9 and 6 of GLS, 4 and 3 of IRS, respectively.

All datasets are partitioned into three sets: a training set, a validation set, and a test

set. The first 50% is used as training set for training the network, and the second 25%

as validation set to check the condition of training and stop it when the overall

training error goes high. The last 25% is used as test set to evaluate the generalization

performance of the network and is not seen by the network during the whole training

process.

In all experiments, two bias nodes with a fixed input +1 are connected to the

hidden layer and output layer. The learning rate

is set between [0.1,0.3] and the

weights are initialized to random values between [+1.0, -1.0]. Each experiment was

carried out 10 times and the presented results were the average of these 10 runs. The

all experiments were done by Pentium-IV, 3 GHz desktop personal computer.

3.1 Experimental Results

Table 1 shows the average results of CFSS where the number of selected features,

classification accuracies and so on are incorporated for BCR, DBT, GLS, and IRS

problems in before and after feature selection process. For clarification, in this table,

we measured the training error by the section 2.5 of [12], and classification accuracy

CA is the ratio of classified examples to the total examples of the particular data set.

380

M.M. Kabir, M. Shahjahan, and K. Murase

-20

-10

Attribute

ttr

ibute

ontr

ibution

(%)

Subset Size

Classif

ication

ccur

acy

(%)

Fig. 2. Contribution of attributes in breast

cancer problem for single run

Fig. 3. Performance of the network according to

the size of subset in diabetes problem for single

run

Table 1. Results of four problems such as BCR, DBT, GLS, and IRS. Numbers in () are the

standard deviations. Here, TERR, CA, FS, HN, ITN, and CTN refer to training error,

classification accuracy, feature selection, hidden neuron, iteration, and connection respectively.

Before FS

After FS

Name

Feature

TERR

(%)

Feature

TERR

(%)

HN# ITN# CTN#

BCR 9

(0.00)

3.71

(0.002)

98.34

(0.31)

3.2

(0.56)

3.76

(0.003)

98.57

(0.52)

2.60 249.4 18.12

DBT 8

(0.00)

16.69

(0.006)

75.04

(2.06)

2.5

(0.52)

14.37

(0.024)

76.13

(1.74)

2.66 271.3 16.63

GLS 9

(0.00)

26.44

(0.026)

64.71

(1.70)

4.1

(1.19)

26.56

(0.028)

66.62

(2.18)

3.3 1480.9 42.63

IRS 4

(0.00)

4.69

(0.013)

97.29

(0.00)

1.3

(0.48)

8.97

(0.022)

97.29

(0.00)

3.2

2317.1 19.96

Table 2. Results of computational time for

BCR, DBT, GLS, and IRS problems

Name Computational

Time

(Second)

BCR 22.10

DBT 30.07

GLS 22.30

IRS 7.90

Table 3. Results of connection reduction and

accuracy increase for BCR, DBT, GLS, and

IRS problems

Name Connection

Decrease (%)

Accuracy

Increase (%)

BCR 45.42

0.23

DBT 46.80

1.45

GLS 27.5

2.95

IRS 30.2

0.00

In each experiment, CFSS generates a subset having the minimum number of

relevant features is available. Thereupon, CFSS produces better CA with a smaller

number of hidden neurons HN. It is shown in Table 1 that, for example, in BCR, the

network easily selects a minimum number of relevant features subset i.e. 3.2 which

occurs 98.57% CA while the network used only 2.6 of hidden neurons to build the

minimal network structure. In contrast, in case of remaining problems like DBT,

GLS, and IRS, the results of those attributes are nearly similar as BCR except IRS.

This is because, for IRS problem, the network cannot produce better accuracy any

Feature Subset Selection Using Constructive Neural Nets

381

more after feature selection rather it able to generate 1.3 relevant features subset

among 4 attributes which is sufficient to exhibit the same network performance.

In addition, Fig. 2 exhibits the arrangement of attributes during training according

to their contribution in BCR. CFSS can easily delete the unnecessary ones and

generate the subset {6,7,1}. In contrast, Fig. 3 shows the relationship between

classification accuracy and subset size for DBT. We calculated the network

connection of the pruned network at the final stage according to section 2.3 and the

results are shown in Table 1. The computational time for completing the entire FSS

process is exhibited in Table 2. After that, we estimated the connection decrement of

the network corresponding to accuracy increment due to FSS as shown in Table 3

according to 2.3.1 and 2.3.2. We can see here the relation between the reduction of

the network connection, and the increment of accuracy.

3.2 Comparison with other Methods

In this section, we compare the results of CFSS to those obtained by other methods

NNFS and ICFS reported in [4] and [6], respectively. The results are summarized in

Tables 4-7. Prudence should be considered because different technique has been

involved in their methods for feature selection.

Table 4. Comparison on the number of

relevant features for BCR, DBT, GLS, and

IRS data sets

Name

CFSS NNFS ICFS

(M 1)

ICFS

(M 2)

BCR 3.2 2.70 5

DBT 2.5 2.03 2

GLS 4.1

5 4

IRS 1.3 -

- -

Table 5. Comparison on the average testing

CA (%) for BCR, DBT, GLS, and IRS data

sets

Name CFSS NNFS ICFS

(M 1)

ICFS

(M 2)

BCR 98.57 94.10 98.25 98.25

DBT 76.13 74.30 78.88 78.70

GLS

66.62 - 63.77

66.61

IRS 97.29 -

Table 6. Comparison on the average number

of hidden neurons for BCR, DBT, GLS, and

IRS data sets

Name CFSS NNFS ICFS

(M 1)

ICFS

(M 2)

BCR 2.60 12 33.55

42.05

DBT 2.66 12 8.15 21.45

GLS 3.3

- 62.5

53.95

IRS 3.2 -

- -

Table 7. Comparison on the average number

of connections for BCR, DBT, GLS, and IRS

data sets

Name CFSS NNFS ICFS

(M 1)

ICFS

(M 2)

BCR 18.12 70.4 270.4 338.5

DBT 16.63 62.36 42.75 130.7

GLS 42.63

756 599.4

IRS 19.96 -

Table 4 shows the discrimination capability of CFSS in which the dimensionality

of input layer is reduced for above-mentioned four problems. In case of BCR, the

result of CFSS is quite better in terms of ICFS’s two methods while it is not so with

NNFS. The result of DBT is average with comparing others. But, for the case of GLS,

the result is comparable or better with two methods of ICFS.

382

M.M. Kabir, M. Shahjahan, and K. Murase

The comparison on the average testing CA for all problems is shown in Table 5. It

is seen that, the results of BCR and GLS are better with NNFS and ICFS while DBT

with NNFS. The most important aspect however in our proposed method is, less

number of connections of final network due to less number of HN in NN architecture.

The results are shown in Table 6 and 7 where the number of HNs and connections of

CFSS are much less in comparison to other three methods.

Note that there are some missing data in Table 4-7, since IRS problem was not

tested by NNFS and ICFS, and GLS problem by NNFS.

4 Discussion

This paper presents a new combinatorial method for feature selection that generates

subset in minimal computation due to minimal size of hidden layer as well as input

layer. Constructive technique is used to achieve the compact size of hidden layer, and

a straightforward contribution measurement leads to achieve reduced input layer

showing better performance. Moreover, a composite combination of BE by means of

double and single elimination, backtracking, and validation helps CFSS to generate

subset proficiently.

The results shown in Table 1 exhibit that CFSS generates subset with a small

number of relevant features with producing better performance in four-benchmark

problems. The results of relevant subset generation and generalization performance

are better or comparable to other three methods as shown in Tables 4 and 5.

From the long period, the wrapper approaches for FSS are overlooked because of

the huge computation in processing. In CFSS, computational cost is much less. As

seen in Table 3, due to FSS, 37.48% of computational cost is reduced in the

advantage of 1.18% network accuracy for four problems in average.

The computational time for different problems to complete the entire process in

CFSS is shown in Table 2. We believe that these values are sufficiently low especially

for clinical field. The system can give the diagnostic result of the patient to the doctor

within a minute. Though the exact comparison is difficult, other methods such as

NNFS and ICFS may take 4-10 times more since the numbers of hidden neurons and

connections are much more as seen in Tables 6 and 7 respectively.

CFSS thus

provides minimal computation in feature subset selection.

In addition, during subset generation, we used to meet up the generated subset for

validation in each step. The reason is that, during BE we build a composite SC, which

eventually find out the local minima where the network training should be stopped.

Due to implementing such criterion, network produces significant performance and

thus no need to validate the generated subset finally. Furthermore, further checking

for irrelevant features in BE brings the completeness of CFSS.

In this study we applied CFSS to the datasets with smaller number of features up to

9. To get more relevant tests for real tasks, we intend to use CFSS with other datasets

having a larger number of features in future. The issue of extracting rules from NN is

always demandable to interpret the knowledge how it works. For this, a NN with

compactness is desirable. As CFSS can give support to fulfill the requirements, rule

extraction from NN is the further task in future efficiently.

Feature Subset Selection Using Constructive Neural Nets

383

5 Conclusion

This paper presents a new approach for feature subset selection based on contribution

of input attributes in NN. The combination of constructive, contribution, and

backward elimination carries the success of CFSS. Initially a basic constructive

algorithm is used to determine a minimal and optimal structure of NN. In the latter

part, one-by-one removal of input attributes is adopted that does not computationally

expensive. Finally, a backward elimination with new stopping criteria are used to

generate relevant feature subset efficiently. Moreover, to evaluate CFSS, we tested it

on four real-world problems such as breast cancer, diabetes, glass and iris problems.

Experimental results confirmed that, CFSS has a strong capability of feature selection,

and it can remove the irrelevant features from the network and generates feature

subset by producing compact network with minimal computational cost.

Acknowledgements

Supported by grants to KM from the Japanese Society for Promotion of Sciences, the

Yazaki Memorial Foundation for Science and Technology, and the University of

Fukui.

References

1. Liu, H., Tu, L.: Toward Integrating Feature Selection Algorithms for Classification and

Clustering. IEEE Transactions on Knowledge and Data Engineering 17(4), 491–502

(2005)

2. Dash, M., Liu, H.: Feature Selection for Classification. Intelligent Data Analysis - An

International Journal 1(3), 131–156 (1997)

3. Kohavi, R., John, G.H.: Wrapper for feature subset selection. Artificial Intelligence 97,

273–324 (1997)

4. Sateino, R., Liu, H.: Neural Network Feature Selector. IEEE Transactions on Neural

Networks 8 (1997)

5. Milna, L.: Feature Selection using Neural Networks with Contribution Measures. In: 8th

Australian Joint Conference on Artificial Intelligence, Canberra, November 27 (1995)

6. Guan, S., Liu, J., Qi, Y.: An incremental approach to Contribution-based Feature

Selection. Journal of Intelligence Systems 13(1) (2004)

7. Schuschel, D., Hsu, C.: A weight analysis-based wrapper approach to neural nets feature

subset selection. In: Tools with Artificial Intelligence: Proceedings of 10th IEEE

International Conference (1998)

8. Hsu, C., Huang, H., Schuschel, D.: The ANNIGMA-Wrapper Approach to Fast Feature

Selection for Neural Nets. IEEE Trans. on Systems, Man, and Cybernetics-Part B:

Cybernetics 32(2), 207–212 (2002)

9. Dunne, K., Cunningham, P., Azuaje, F.: Solutions to Instability Problems with Sequential

Wrapper-based Approaches to Feature Selection. Journal of Machine Learning Research

(2002)

384

M.M. Kabir, M. Shahjahan, and K. Murase

10. Phatak, D.S., Koren, I.: Connectivity and performance tradeoffs in the cascade correlation

learning architecture, Technical Report TR-92-CSE-27, ECE Department, UMASS,

Amherst (1994)

11. Rumelhart, D.E., McClelland, J.: Parallel Distributed Processing. MIT Press, Cambridge

(1986)

12. Prechelt, L.: PROBEN1-A set of neural network benchmark problems and benchmarking

rules, Technical Report 21/94, Faculty of Informatics, University of Karlsruhe, Germany

(1994)

13. newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning

Databases, Dept. of Information and Computer Sciences, University of California, Irvine

(1998), http://www.ics.uci.edu/~mlearn/MLRepository.html

Dynamic Link Matching between Feature

Columns for Diﬀerent Scale and Orientation

Yasuomi D. Sato

, Christian Wolﬀ

, Philipp Wolfrum

,

and Christoph von der Malsburg

1,2

Frankfurt Institute for Advanced Studies (FIAS),

Johann Wolfgang Goethe University, Max-von-Laue-Str. 1, 60438,

Frankfurt am Main, Germany

Computer Science Department, University of Southern California, LA,

90089-2520, USA

Abstract. Object recognition in the presence of changing scale and ori-

entation requires mechanisms to deal with the corresponding feature

transformations. Using Gabor wavelets as example, we approach this

problem in a correspondence-based setting. We present a mechanism for

ﬁnding feature-to-feature matches between corresponding points in pairs

of images taken at diﬀerent scale and/or orientation (leaving out for the

moment the problem of simultaneously ﬁnding point correspondences).

The mechanism is based on a macro-columnar cortical model and dy-

namic links. We present tests of the ability of ﬁnding the correct feature

transformation in spite of added noise.

Introduction

When trying to set two images of the same object or scene into correspondence

with each other, so that they can be compared in terms of similarity, it is nec-

essary to ﬁnd point-to-point correspondences in the presence of changes in scale

or orientation (see Fig. 1). It is also necessary to transform local features (unless

one chooses to work with features that are invariant to scale and orientation,

accepting the reduced information content of such features). Correspondence-

based object recognition systems [1,2,3,4] have so far mainly addressed the issue

of ﬁnding point-to-point correspondences, leaving local features unchanged in the

process [5,6]. In this paper, we propose a system that can not only transform fea-

tures for comparison purposes, but also recognize the transformation parameters

that best match two sets of local features, each one taken from one point in an

image. Our eventual aim will be to ﬁnd point correspondences and feature corre-

spondences simultaneously in one homogeneous dynamic link matching system,

but we here take point correspondences as given for the time being.

Both theoretical [7] and experimental [8] investigations are suggesting

2D-Gabor-based wavelets as features to be used in visual cortex. These are best

sampled in a log-polar manner [11]. This representation has been shown to be

particularly useful for face recognition [12,13], and due to its inherent symmetry

it is highly appropriate for implementing a transformation system for scale and

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 385–394, 2008.

c Springer-Verlag Berlin Heidelberg 2008

386

Y.D. Sato et al.

Fig. 1. When images of an object diﬀer in (a) scale or (b) orientation, correspondence

between them can only established if also the features extracted at a given point (e.g.,

at the dot in the middle of the images) are transformed during comparison

orientation. The work described here is based entirely on the macro-columnar

model of [14]. In computer simulations we demonstrate feature transformation

and transformation parameter recognition.

Concept of Scale and Rotation Invariance

Let there be two images, called I and M (for image and model). The Gabor-

based wavelet transform ¯

L

(L = I or M), has the form of a vector, called a

jet, whose components are deﬁned as convolutions of the image with a family of

Gabor functions:

J

L

k,l

, a, θ) =

I(x)

Q(θ)(x

− x) d

(1)

ψ(x) =

exp(

−

|x|

) exp(i x

· e

)

− exp(−

) ,

(2)

Q(θ) =

cos θ sin θ

−sin θ cos θ

(3)

Here a simultaneously represents the spatial frequency and controls the width

of a Gaussian envelope window. D represents the standard deviation of the

Gaussian. The individual components of the jet are indexed by n possible ori-

entations and (m + m

+ m

) possible spatial frequencies, which are θ = πk/n

∈ {0, 1, . . . , (n−1)}) and a = a

(0 < a

0

< 1, l

∈ {0, 1, . . . , (m+m

+m

2

−1)}).

Here parameters m

and m

ﬁx the number of scale steps by which the image

can be scaled up or down, respectively. m (> 1) is the number of scales in the

jets of the model domain.

The Gabor jet ¯

is deﬁned as the set

{ ¯

k,l

} of N

Gabor wavelet components

extracted from the position x

of the image. In order to test the ability to ﬁnd the

correct feature transformation described below, we add noise of strength σ

the Gabor wavelet components, ˜

k,l

= ¯

k,l

(1 + σ

ran

) where S

ran

are random

numbers between 0 and 1. Instead of the Gabor jet ¯

itself, we employ the

sum-normalized ˜

L

k,l

. Jets can be visualized in angular-eccentricity coordinates,

Dynamic Link Matching between Feature Columns

387

(a)

(b)

Fig. 2. Comparison of Gabor jets in the model domain M and the image domain I (left

and right, resp., in the two part-ﬁgures). Jets are visualized in log-polar coordinates:

The orientation θ of jets is arranged circularly while the spatial frequency l is set

radially. The jet J

of the model domain M is shown right while the jet J

from

the transformed image in the image domain I is shown left. Arrows indicate which

components of J

and J

are to be compared. (a) comparison of scaled jets and (b)

comparison of rotated jets.

in which orientation and spatial frequency of a jet are arranged circularly and

radially (as shown in Fig.2).

The components of jets that are taken from corresponding points in images I

and M may be arranged according to orientation (rows) and scale (columns):

⎛

⎜

⎝

· · · 0 J

0,m

−1

· · · J

0,m

−1+m

· · · 0

· · · 0 J

n

−1,m

−1

· · · J

−1,m

−1+m

· · · 0

⎞

⎟

⎠ ,

(4)

⎛

⎜

⎝

0,m

−1

· · · J

0,m

−1+m

−1,m

−1

· · · J

−1,m

−1+m

⎞

⎟

⎠ .

(5)

Let us assume the two images M and I to be compared have the exact same

structure, apart from being transformed relative to each other. Let us assume the

transformation conforms to the sample grid of jet components. Then there will be

pair-wise identities between components in Eqs.(4) and (5). If the jet in I is scaled

relative to the jet in M, the non-zero components of J

are shifted along the

horizontal (or, in Fig.2(a), radial) coordinate. If the image I, and correspondingly

the jet J

, is rotated, then jet components are circularly permuted along the

vertical axis in Eq.(4) (see Fig.2(b)). When comparing scaled and rotated jets,

the non-zero components of J

are shifted along both axes simultaneously. There

are model jet components of m diﬀerent scales, and to allow for m

steps of

scaling down the image I and m

steps of scaling it up. The jet in Eq.(4) is

padded on the left and right with m

and m

columns of zeros, respectively.

388

Y.D. Sato et al.

Fig. 3. A dynamic link matching network model of a pair of single macro-columns. In

the I and M domains, the network consists of, respectively, N

and N

feature units

(mini-columns). The units in the I and M domains represent components of the jets J

and J

. The Control domain C controls interconnections between the I and M macro-

columns, which are all-to-all at the beginning of a ν cycle. By comparing the two jets

and through competition of control units in C, the initial all-to-all interconnections are

dynamically reduced to a one-to-one interconnection.

Modelling a Columnar Network

In analogy to [14], we set up a dynamical system of variables, both for jet com-

ponents and for all possible correspondences between jet components. These

variables are interpreted as the activity of cortical mini-columns (here called

“units”, “feature units” in this case). The set of units corresponding to a jet,

either in I or in M, form a cortical macro-column (or short “column”) and in-

hibit each other, the strength of inhibition being controlled by a parameter ν,

which cyclically ramps up and falls down. In addition, there is a column of con-

trol units, forming the control domain C. Each control unit stands for one pair

of possible values of relative scale and orientation between I and M and can

evaluate the similarity in the activity of feature units under the corresponding

transformation. The whole network is schematically shown in Fig.3.

Each domain, or column in our simple case, consists of a set of unit activities

L

1

, . . . , P

} in the respective domain (L = I or M). N

= (m + m

+ m

)

× n

and N

= m

× n represent respectively the total number of units in the I and

M domains. The control unit’s activations are P

= (P

, . . . , P

)

, where

= (m

+ m

+ 1)

×n. The equation system for the activations in the domains

is given by

L

α

= f

) + κ

+ κ

− C

+ ξ

(t),

(α = 1, . . . , N

and L =

{M, I}\L), (6)

= f

) + κ

+ ξ

(t),

(γ = 1, . . . , N

(7)

Dynamic Link Matching between Feature Columns

389

Fig. 4. Time course of all activities in the domains I, M and C over a ν cycle for

two jets without relative rotation and scale, for a system with size m = 5, n = 8,

1

= 0 and m

= 4. The activity of units in the M and I columns is shown within

the bars on the right and the bottom of the panels, respectively, each of the eight sub-

blocks corresponding to one orientation, with individual components corresponding

to diﬀerent scales. The indices k and l at the top label the orientation and scale of

the I jet, respectively. The large matrix C contains, in the top version, as non-zero

(white) entries all allowed component-to-component correspondences between the two

jets (the empty, black, entries would correspond to forbidden relative scales outside the

range set by m

and m

). The matrix consists of 8

× 8 blocks, corresponding to the

× 8 possible correspondences between jet orientations, and each of these sub-matrices

contains m

×(m+m

) entries to make room for all possible scale correspondences.

Each control unit controls (and evaluates) a whole jet to jet correspondence with the

help of its entries in this matrix. The matrix in the lowest panel has active entries

corresponding to just one control unit (for identical scale and orientation). Active

units are white, silent units black. Unit activity is shown for three moments within

one ν cycle. At t = 3.4 ms, the system has already singled out the correct scale, but is

still totally undecided as to relative orientation. System parameters are κ

= κ

= 2,

= 5, C

= 0.99, σ = 0.015 and σ

= 0.1.

390

Y.D. Sato et al.

where ξ

α

(t) and ξ

(t) are Gaussian noise of strength σ

. C

∈ [0, 1] scales the

ratio between the Gabor jet J

and the total input E

from units in the other

domain, L , and κ

controls the strength of the projection L to L. κ

controls

the strength of the inﬂuence of the similarity F

on the control units. Here we

explain the other terms in the above equations. The function f

(

·) is:

) = a

− ν(t) max

β=1,...,N

} − (P

)

(8)

where a

= 25. The function ν(t) describes a cycle in time during which the

feedback is periodically modulated:

ν(t) =

(ν

max

− ν

min

)

−T

+ ν

min

t < T

+ T

)

max

+ T

t < T

+ T

relax

)

(9)

Here T

[ms] is the duration during which ν increases with t while T

(k =

1, 2, 3, . . .) are the periodic times at which the inhibition ν starts to increase.

relax

is a relaxation time after ν has been increased. ν

min

= 0.4, ν

max

= 1,

= 36.0 ms and T

relax

= T

/6 are set up in our study. Because of the increasing

ν(t) and the noise in Eqs.(6) and (7), only one minicolumn in each domain

remains active at the end of the cycle, while the others are deactivated, as shown

in Fig.4.

If we deﬁne the mean-free column activities as ˜

L

α

:= P

− (1/N

)

the interaction terms in Eqs. (6) and (7) are

β=1

α =1

αα

(10)

γ=1

γ =1

γγ

(11)

Here N

= n

×(m

+1) is the number of permitted transformations between

the two domains (n diﬀerent orientations and m

+ m

+ 1 possible scales). The

expression C

αα

designates a family of matrices, rotating and scaling one jet into

another. If we write β = (o, s) so that o identiﬁes the rotation and s the scaling,

then we can write the C matrices as tensor product: C

= A

⊗ B

, where A

is an n

× n matrix with a wrap-around shifted diagonal of entries 1 and zeros

otherwise, implementing rotation o of jet components at any orientation, and B

is an m

× (m + m

+ m

) matrix with a shifted diagonal of entries 1, and zeros

otherwise, implementing an up or down shift of jet components of any scale.

The C matrix implementing identity looks like the lowest panel in Fig. 4. The

function E of Eq. (10), transfers feature unit activity from one domain to the

other, and is a weighted superposition of all possible transformations, weighted

with the activity of the control units. The function of F in Eq. (11) evaluates

the similarity of the feature unit activity in one domain with that in the other

under mutual transformation.

Dynamic Link Matching between Feature Columns

391

Through the columnar dynamics of Eq. (6), the relative strengths of jet com-

ponents are expressed in the history of feature unit activities (the strongest

jet component, for instance, letting its unit stay on longest) so that the sum

of products in Eq. (11), integrated over time in Eq. (7), indeed expresses jet

similarities.

The time course of all unit activities in our network model during a ν cycle

is demonstrated in Fig.4. In this ﬁgure, the non-zero components of the jets

used in the I and M domains are identical without any rotation and scale. At

the beginning of the ν cycle, all units in the I, M and C domains are active,

as indicated in Fig.4 by the white jet-bars on the right and the bottom, and

indicated by all admissible correspondences in the matrix being on. The activity

state of the units in each domain is gradually reduced following Eqs.(6) – (11).

In the intermediate state at t = 3.4 ms, all control units for the wrong scale have

switched oﬀ, but the control units for all orientations are still on. At t = 12.0

ms, ﬁnally, only one control unit has survived, the one that correctly identiﬁes

the identity map. This state remains stable during the rest of the ν cycle.

Results

In this Section, we describe tests performed on the functionality of our macro-

columnar model by studying the robustness of matching to noise introduced into

the image jets (parameter σ

) and the dynamics of the units, Eqs. (6) and (7)

(parameter σ) and robustness to the admixture of input from the other domain,

Eq. (6), due to non-vanishing (1

− C

E

). A correct match is one in which the

surviving control unit corresponds to the correct transformation pair which the

I jet diﬀers from the M jet. Jet noise is to model diﬀerences between jets actually

extracted from natural images. Dynamic noise is to reﬂect signal ﬂuctuations to

be expected in units (minicolumns) composed of spiking neurons, and a non-

zero admixture parameter may be relevant in a system in which transfer of jet

information between the domains is desired.

For both of the experiments described below, we used Gabor jets extracted

from the center of real facial images. The transformed jets used for matching were

extracted from images that had been scaled and rotated relative to the center po-

sition by exactly the same factors and angles that are also used in the Gabor trans-

form. This ensures that there exists a perfect match between any two jets produced

from two diﬀerent rotated and scaled versions of the same face.

The ﬁrst experiment uses a single facial image. This experiment investigates

the inﬂuence of noise in the units (σ) and of the parameter C

. Using a pair

of the same jets, we obtain temporal averages of the correct matching over 10

ν cycles for each size (n = 4, 5, . . . , 9, 10, m = 4, m

= 0 and m

= m

− 1)

of the macro-column. From these temporal averages, we calculate the sampling

average over all n. The standard error is estimated using [σ

√

] where σ

standard deviation of the sampling average. N

is the sample size.

Fig.5(a) shows results for κ

= κ

= 1 and κ

= 5. For strong noise (σ =

0.015 to 0.02), the correct matching probabilities for C

= 0.98 to C

= 0.96

392

Y.D. Sato et al.

(a)

(b)

0.2

0.4

0.6

0.8

0.005

0.01

0.015

0.02

Probability Correct

=1.0

=0.98

=0.96

=0.95

=0.94

0.5

0.75

0.005

0.01

0.015

0.02

Probability Correct

C

E

=1.0

=0.98

=0.96

=0.95

=0.94

Fig. 5. Probability of correct match between units in the I and M domains. (a) κ

=

κ

= 1 and κ

= 5. (b) κ

= 2.3, κ

= 5 and κ

= 5.

0.2

0.4

0.6

0.8

0.05

0.1

0.15

0.2

Probability Correct

=0.000

Fig. 6. Probability of the correct matching for comparisons of two identical faces. 6

facial images are used. Our system is set with m = 4, m

= m

− 1, n = 8,

= κ

= 6.5, κ

= 5.0, C

= 0.98 and σ = 0.0.

take better values than the ones for C

= 1. Interestingly, for these low C

values, matching gets worse for weaker noise, collapsing in the case of σ = 0.

This eﬀect requires further investigation. However, for κ

= 2.3 and κ

= 5,

we have found that the correct matching probability takes higher values than

in case of Fig.5(a). In particular, our system model with higher noise σ = 0.015

even demonstrates perfect correct matching for C

= 0.98, independent of n.

Next, we have investigated the robustness of the matching against the second

type of noise (σ

), using 6 diﬀerent face types. Since the original image is resized

and/or rotated with m

+ m

+ 1 = 7 scales and n = 8 orientations, a total of 56

images could be employed in the I domain. Here our system is set with κ

= κ

6.5, κ

= 5.0 and C

= 0.98. For each face image, we take temporal averages

of the correct matching probability for each size-orientation of the image, in a

similar manner as described above. Averages and standard errors of the temporal

averages on 6 diﬀerent facial images can be plotted, in terms of σ

[Fig. 6]. The

result of this experiment is independent of σ as long as σ

0.02.

As a result of Fig. 6, as random noise in the jets is increased, the correct

matching probability smoothly decreases for one image, or it abruptly shifts

down to around 0 at a certain σ

for the other image. We have also obtained

Dynamic Link Matching between Feature Columns

393

perfect matching, independent of σ

when using the same but rotated or resized

face. The most interesting point that we would like to insist is that the probability

takes higher values than 87.74% as σ < 0.02 and σ

1

< 0.08 (see Fig.6). Therefore,

we can say that network model has a high recognition ability for scale and

orientation invariance, which is not dependent of diﬀerent facial types.

Discussion and Conclusion

The main purpose of this communication is to convey a simple concept. Much

more work needs to be done on the way to practical applications, for instance,

more experiments with features extracted from independently taken images of

diﬀerent scale and orientation to better bring out the strengths and weaknesses

of the approach. In addition to comparing and transforming local packages of

feature values (our jets), it will be necessary to also handle spatial maps, that

is, sets of point-to-point correspondences, a task we will approach next. Real

applications involve, of course, a continuous range of transformation parameters,

whereas we here had admitted only transformations from the same sample grid

used for deﬁning the family of wavelets in a jet. We hope to address this problem

by working with continuous superpositions of transformation matrices for the

neighboring transformation parameter values that straddle the correct value.

Our approach may be seen as using brute force, as it requires many separate

control units to sample the space of transformation parameters with enough

density. However, the same set of control units can be used in a whole region of

visual space, and in addition to that, for controlling point-to-point maps between

regions, as spatial maps and feature maps stand in one-to-one correspondence to

each other. The number of control units can be reduced even further if transfor-

mations are performed in sequence, by consecutive, separate layers of dynamic

links. Thus, the transformation from an arbitrary segment of primary visual cor-

tex to an invariant window could be done in the sequence translation – scaling –

rotation. In this case, the number of required control units would be the sum and

not the product of the number of samples for each individual transformation. A

disadvantage of that approach might be added diﬃculties in ﬁnding correspon-

dences between the domains. Further work is required to elucidate these issues.

Acknowledgements

This work was supported by the EU project Daisy, FP6-2005-015803 and by the

Hertie Foundation. We would like to thank C. Weber for help with preparing

the manuscript.

References

1. Anderson, C.H., Van Essen, D.C., Olshausen, B.A.: Directed visual attention and

the dynamic control of information ﬂow. In: Itti, L., Rees, G., Tsotsos, J.K. (eds.)

Neurobiology of attention, pp. 11–17. Academic Press/Elservier (2005)

2. Weber, C., Wermter, S.: A self-organizing map of sigma-pi units. Neurocomput-

ing 70, 2552–2560 (2007)

394

Y.D. Sato et al.

3. Wiskott, L., v.d. Malsburg, C.: Face Recognition by Dynamic Link Matching, ch. 4.

In: Sirosh, J., Miikkulainen, R., Choe, Y. (eds.) Lateral Interactions in the Cortex:

Structure and Function, vol. 4, Electronic book (1996)

4. Wolfrum, P., von der Malsburg, C.: What is the optimal architecture for visual

information routing? Neural Comput. 19 (2007)

5. Lades, M.: Invariant Object Recognition with Dynamical Links, Robust to Varia-

tions in Illumination. Ph.D. Thesis, Ruhr-Univ. Bochum (1995)

6. Maurer, T., von der Malsburg, C.: Learning Feature Transformations to Recognize

Faces Rotated in Depth. In: Fogelman-Souli´

e, F., Rault, J.C., Gallinari, P., Drey-

fus, G. (eds.) Proc. of the International Conference on Artiﬁcial Neural Networks

ICANN 1995, EC2 & Cie, p. 353 (1995)

7. Daugmann, J.G.: Uncertainty relation for resolution in space, spatial frequency,

and orientation optimized by two-dimensional visual cortical ﬁlters. J. Opt. Soc.

Am. A 2, 1160–1169 (1985)

8. Jones, J., Palmer, L.: An evaluation of the two-dimensional Gabor ﬁlter model of

simple receptive ﬁelds in cat striate cortex. J. Neurophysiol. 58, 1233–1258 (1987)

9. Pentland, A., Moghaddam, B., Starner, T.: View-based and modular eigenspaces

for face recognition. In: Proc. of the third IEEE Conference on Computer Vision

and Pattern Recognition, pp. 84–91 (1994)

10. Wiskott, L., Fellous, J.-M., Kr¨

uger, von der Malsburg, C.: Face Recognition by

Elastic Bunch Graph Matching. IEEE Trans. Pattern Anal. & Machine Intelli-

gence 19, 775–779 (1997)

11. Marcelja, S.: Mathematical description of the responses of simple cortical cells. J.

Optical Soc. Am. 70, 1297–1300 (1980)

12. Yue, X., Tjan, B.C., Biederman, I.: What makes faces special? Vision Res. 46,

3802–3811 (2006)

13. Okada, K., Steﬀens, J., Maurer, T., Hong, H., Elagin, E., Neven, H., von der

Malsburg, C.: The Bochum/USC Face Recognition System and How it Fared in

the FERET Phase III Test. In: Wechsler, H., Phillips, P.J., Bruce, V., Fogelman

Souli´

e, F., Huang, T.S. (eds.) Face Recognition: FromTheory to Applications, pp.

186–205. Springer, Heidelberg (1998)

14. L¨

ucke, J., von der Malsburg, C.: Rapid Correspondence Finding in Networks of

Cortical Columns. In: Kollias, S., Stafylopatis, A., Duch, W., Oja, E. (eds.) ICANN

2006. LNCS, vol. 4131, pp. 668–677. Springer, Heidelberg (2006)

Perturbational Neural Networks for Incremental

Learning in Virtual Learning System

Eiichi Inohira

, Hiromasa Oonishi

, and Hirokazu Yokoi

1

Kyushu Institute of Technology, Hibikino 2-4, 808-0196 Kitakyushu, Japan

{inohira,yokoi}@life.kyutech.ac.jp

Mitshbishi Heavy Industries, Ltd., Japan

Abstract. This paper presents a new type of neural networks, a pertur-

bational neural network to realize incremental learning in autonomous

humanoid robots. In our previous work, a virtual learning system has

been provided to realize exploring plausible behavior in a robot’s brain.

Neural networks can generate plausible behavior in unknown environ-

ment without time-consuming exploring. Although an autonomous robot

should grow step by step, conventional neural networks forget prior learn-

ing by training with new dataset. Proposed neural networks features

adding output in sub neural network to weights and thresholds in main

neural network. Incremental learning and high generalization capabil-

ity are realized by slightly changing a mapping of the main neural net-

work. We showed that the proposed neural networks realize incremental

learning without forgetting through numerical experiments with a two-

dimensional stair-climbing bipedal robot.

Introduction

Recently, humanoid robots such as ASIMO [1] dramatically develop in view of

hardware and are prospective for working just like a human. Although many

researchers have studied artiﬁcial intelligence for a long time, humanoid robots

have not so high autonomy as experts are not wanted. Humanoid robots should

accomplish missions by itself rather than experts give them solutions such as

models, algorithms, and programs.

Researchers have studied to realize a robot with learning ability through trial

and error. Such studies uses so-called soft computing techniques such as rein-

forcement learning [2] and central pattern generator (CPG) [3]. Learning of a

robot saves expert’s work to some degree but takes much time. These techniques

are less eﬃcient than humans. Humans instantly act depending on a situation

by using imagination and experience. Even if a human fails, a failure will serve

himself as experience. In particular, humans know characteristics of environment

and their behavior and simulate trial and error to explore plausible behavior in

their brains.

In our previous work, Yamashita and et al. [4] have proposed a bipedal walking

control system with virtual environment based on motion control mechanism of

primates. In this study, exploring plausible behavior by trial and error is carried

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 395–404, 2008.

c Springer-Verlag Berlin Heidelberg 2008

396

E. Inohira, H. Oonishi, and H. Yokoi

out in a robot’s brain. However, the problem is that exploring takes much time.

Kawano and et al. [5] have introduced learning of a relation of environmental

data and an optimized behavior into the previous work to save time. Generating

behavior becomes fast because a trained neural network (NN) immediately out-

puts plausible behavior due to its generalization capability, which is an ability

to obtain a desired output at an unknown input. However, a problem in this

previous work is that conventional neural networks cannot realize incremental

learning. It means that a trained NN forgets prior learning by training with new

dataset. Realizing incremental learning is indispensable to learn various behav-

iors step by step.

We presents a new type of NNs, perturbational neural networks (PNN) to

achieve incremental learning. PNN consists of main NN and sub NN. Main NN

learns with representative dataset and never changes after that. Sub NN learns

with new dataset as output error of main NN for the dataset is canceled. Sub NN

is connected with main NN as additional terms of weights and thresholds in main

NN. We guess that connections through weights and thresholds slightly changes

characteristics of main NN, which means that perturbation is given to mapping

of main NN, and that incremental learning does not sacriﬁce its generalization

capability.

A Virtual Learning System for a Biped Robot

We assume that a robot tries to achieve a task in an unknown environment. Vir-

tual learning is deﬁned as learning of virtual experience in virtual environment in

a robot’s brain. Virtual environment is generated from sensory information on real

environment around a robot and used for exploring behavior ﬁt for a task without

real action. Exploring such behavior in virtual environment is regarded as virtual

experience by trial and error or ingenuity. Virtual learning is to memorize a rela-

tion between environment and behavior under a particular task.

Beneﬁts of virtual learning are (1) to reduce risk of robot’s failure in the real

world such as falling and colliding and (2) to enable a robot to immediately act

and achieve learned tasks in similar environments. The former is involved in a

robot’s safety and physical wastage. The latter leads to saving work and time to

achieve a task.

We presented virtual learning system for a biped robot [4,5]. This virtual

learning system has virtual environment and NNs for learning as shown in Fig.1.

We assume that central pattern generator (CPG) generates motion of a biped

robot. CPG of a biped robot consists of neural oscillators corresponding to its

joints. Then behavior is described as CPG parameters rather than joint angles.

CPG parameters for controlling a robot are called CPG input. Optimized CPG

input for a plausible behavior is given by exploring in virtual environment.

Virtual learning is realized by NNs. A NN learned a mapping from environ-

mental data to optimized CPG input, i.e, a mapping from real environment to

robot’s behavior. NNs have generalization capability, which means that desired

output can be generated at an unknown input. When a NN learned a mapping

PNNs for Incremental Learning in Virtual Learning System

397

Exploring

optimized

motion

Does it serve

a purpose ?

CPG input

Download 12.42 Mb.

Do'stlaringiz bilan baham:

1 ... 31 32 33 34 35 36 37 38 ... 88