Lecture Notes in Computer Science
Fig. 5. (a) Schematic representation of the proposed filling-in model. (b) Simulation results of a computational V1 neuron using various lengths of horizontal bars.

A V1 neuron whose RF center is $x$ is referred to as $s_\theta(x, t)$. Presume that $I_\xi(x, t)$
is constant around $x$ and that $\sigma$ is a small value. We obtain

$$\frac{\partial}{\partial t} s_\theta(x, t) = \frac{\partial^2}{\partial \theta^2} g * \frac{\partial}{\partial t} I \qquad \left( \text{because } \frac{\partial^2}{\partial \theta^2} \frac{\partial}{\partial t} I = \frac{\partial}{\partial t} \frac{\partial^2}{\partial \theta^2} I \right) \tag{10}$$

$$= \tilde{\kappa}_{\eta\theta\theta} + \tilde{\mu}_{\xi\theta\theta} - \tilde{\kappa}_{\eta\eta\theta\theta} - \tilde{\mu}_{\xi\xi\theta\theta} + \lambda \frac{\partial^2}{\partial \eta^2} s_\theta. \tag{11}$$

Equation (11) indicates that a V1 neuron $s_\theta$ is affected by $\tilde{\kappa}_{\eta\theta\theta}$, $\tilde{\mu}_{\xi\theta\theta}$, and the other terms. We expect those terms to be the outputs of V2 neurons. Because of page limitations, details are not presented in this article. However, we found that the value of $\tilde{\mu}_{\xi\theta\theta}$ is the sum of four neurons selective to an angular difference of about 27° in V-shaped patterns or junctions, as illustrated in Fig. 4(b). In addition, $\tilde{\kappa}_{\eta\theta\theta}$ is the sum of neurons selective to the patterns in Fig. 4(c). We found angular selectivity in $\tilde{\kappa}_{\eta\eta\theta\theta}$
and $\tilde{\mu}_{\xi\xi\theta\theta}$. The fifth term of (11) represents intra-cortical interaction between V1 neurons connected by horizontal connections [12]. Results show that our V1 model neurons, $s_\theta(x, t)$, are affected by the output of V2 model neurons, which encode angular information of lines and which are in turn affected by V1 neurons through horizontal connections. Figure 5(a) depicts a schematic representation of our model as formulated by (11). Comparing Fig. 5(a) to Fig. 1(d), we conclude that our computational model is consistent with the physiological abstract model. However, explicit intra-cortical connections in V2 do not emerge in (11); this problem will be addressed in future work.

5 Numerical Simulations

First, numerical simulations of (9) are performed to investigate whether the expected filling-in pattern is obtained using Fig. 2(a) as the initial value of $I$. The parameter is $\lambda = 0.1$. Figure 2(d) is the steady state of $I$ (the filling-in pattern): the broken bar of Fig. 2(a) is completed, as expected.
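The steepest-descent evolution just described can be sketched numerically. The paper's functional (9) is not reproduced in this excerpt, so the sketch below substitutes the simplest member of that family, harmonic (Laplacian) diffusion restricted to the missing area B; the image size, mask, and iteration count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def fill_in(image, mask, lam=0.1, n_iter=5000):
    """Steepest-descent filling-in: pixels where mask is True evolve
    under a discrete Laplacian (harmonic inpainting) while known
    pixels are clamped. A stand-in for minimizing the paper's
    functional (9), which is not reproduced in this excerpt."""
    u = image.astype(float).copy()
    for _ in range(n_iter):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        u[mask] += lam * lap[mask]   # update only the missing area B
    return u

# A horizontal bar broken by a masked gap, analogous to Fig. 2(a):
img = np.zeros((16, 16))
img[7:9, 2:14] = 1.0                      # horizontal bar
mask = np.zeros_like(img, dtype=bool)
mask[:, 6:10] = True                      # vertical strip = area B
img[mask] = 0.0                           # break the bar
restored = fill_in(img, mask)             # bar rows recover inside B
```

The steady state interpolates the bar across the gap, which is the qualitative behavior reported for Fig. 2(d).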
950 S. Satoh and S. Usui

Fig. 6. (a) A baboon with graffiti. (b) Inpainted (restored) baboon using the proposed visual model. (c) Almost half of the visual information is missing. (d) Image restored by the proposed visual model.

Next, we evaluate the effectiveness of our visual model as a digital image inpainting (DII) algorithm. Results are shown in Fig. 6 and Fig. 7. The areas B of Fig. 6(a) and Fig. 6(c) are, respectively, black curvy lines drawn by an author and a checkered orange area. (Color results are available in the electronic version of this article.) Simulations for color images are executed as follows: decompose a color image into three (R, G, B) intensity channels, apply (9) to each color channel, and unify the three steady states into one image. The images restored by our visual model are shown in Figs. 6(b) and 6(d). We find that our visual model is effective as a DII algorithm. The situation portrayed in Fig. 6(c) is possible, for example, in the case of block loss caused by a packet drop during wireless transmission, gap padding for image magnification, and so on. We compare our model with the Spot Healing Brush Tool of Adobe® Photoshop® CS2 (options at default settings). Neither method repaired texture areas such as the baboon's fur, but our model restores strong edges, whereas the Photoshop tool
Computational Understanding and Modeling of Filling-In Process 951
Fig. 7. (a) The black rectangle is the area B to be filled in. (b) Result of the proposed visual model. (c) Result of Adobe® Photoshop® CS using default settings.

gives a blurred image. The reason our model is not applicable to textured areas is that the evaluation function (9) contains no texture information. Finally, we simulate (11) to investigate whether our model neuron $s_\theta$ reproduces the physiological result of Fig. 1(c). The widths of the horizontal bars are two pixels; the length varies from 0 to 14 pixels in steps of 1 pixel. Parameters are $\theta = \pi/2$ (not $\theta = 0$) and $\sigma = 1$, such that the neuron $s_\theta$ is selective to the horizontal bars. The receptive field of the simulated neuron overlaps the BS area B. Figure 5(b) illustrates the steady values of $s_\theta$. We find consistency between the physiological results and our model. One end of the bar appears from the BS area B, as in Fig. 1(b4), when the bar length is greater than 9 pixels. In this situation, the neuron $s_\theta$ implicitly performs orientation detection for a completed bar, as in Fig. 1(a4), through its intrinsic filling-in process. For that reason, $s_\theta$ shows a considerable increase in its activity when the bar length becomes greater than nine pixels.
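The per-channel color procedure described earlier (decompose into R, G, B channels, restore each, unify the steady states) can be sketched as follows. The inner solver here is a plain diffusion stand-in for minimizing (9), since the functional itself is not given in this excerpt; the image and mask are synthetic.

```python
import numpy as np

def diffuse(channel, mask, lam=0.1, n_iter=2000):
    # Gradient-descent smoothing of the masked area: a stand-in for
    # minimizing the evaluation function (9) on a single channel.
    u = channel.astype(float).copy()
    for _ in range(n_iter):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        u[mask] += lam * lap[mask]
    return u

def inpaint_rgb(image, mask):
    """Decompose an H x W x 3 color image into R, G, B channels,
    restore each channel independently, and unify the three steady
    states into one image, as described in the text."""
    return np.stack([diffuse(image[..., c], mask) for c in range(3)],
                    axis=-1)

rgb = np.ones((8, 8, 3)) * [1.0, 0.5, 0.0]   # uniform orange image
hole = np.zeros((8, 8), dtype=bool)
hole[3:5, 3:5] = True                         # missing block
rgb[hole] = 0.0
out = inpaint_rgb(rgb, hole)                  # hole refilled per channel
```

On this uniform test image each channel relaxes back to its surrounding constant, which is the behavior one would expect of any per-channel steady-state restorer.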
6 Summary
To solve the filling-in problem, we employed two physiological findings in a visual model and presented novel aspects of those findings: variable separation and adiabatic approximation. The results showed the physiological consistency and plausibility of our model, and we evaluated its effectiveness as an algorithm for digital image inpainting. As a basis of the computational modeling, standard regularization theory and the steepest descent method are used to expose the sort of problem our model solves or optimizes. Our visual model optimizes an evaluation function representing a priori knowledge of missing images. We obtained the desired patterns and neural responses for bar stimuli. However, we have not yet answered the following question: why is the adiabatic approximation between V1 and V2 suitable for the filling-in process? That remains an open problem.
We should develop an appropriate means for texture filling-in. We expect that new algorithms or visual models will be derived from theoretical aspects reflecting other neural properties in our fundamental functional. For example, a new functional including higher-order image properties would be effective for texture filling-in.

The functional E is defined by the authors from theoretical viewpoints. An exciting challenge will be the self-organization of E: because E represents a priori knowledge of various kinds of images, it should reflect and represent the statistical features of those images.

References

1. Kamitani, Y., Shimojo, S.: Manifestation of scotomas created by transcranial magnetic stimulation of the human visual cortex. Nature Neuroscience 2, 767–771 (1999)
2. Gerrits, H.J., Timmerman, G.J.: The filling-in process in patients with retinal scotoma. Vision Research 9, 439–442 (1969)
3. Gerrits, H.J., De Haan, B., Vendrik, A.J.: Experiments with retinal stabilized images. Vision Research 6, 427–440 (1966)
4. Komatsu, H.: The neural mechanisms of perceptual filling-in. Nature Neuroscience 7, 200–231 (2006)
5. Komatsu, H., Kinoshita, M., Murakami, I.: Neural responses in the retinotopic representation of the blind spot in the macaque V1 to stimuli for perceptual filling-in. J. Neuroscience 20, 9310–9319 (2000)
6. Matsumoto, M., Komatsu, H.: Neural responses in the macaque V1 to bar stimuli with various lengths presented on the blind spot. J. Neurophysiology 93, 2374–2387 (2005)
7. Hildreth, E.C.: The computation of the velocity field. Proc. R. Soc. Lond. B 221, 189–220 (1984)
8. Bertalmio, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. In: Proc. of SIGGRAPH 2000, pp. 417–424. ACM Press, New York (2000)
9. Rane, S.D., Sapiro, G., Bertalmio, M.: Structure and texture filling-in of missing image blocks in wireless transmission and compression applications. IEEE Trans. on Image Processing 12, 296–303 (2003)
10. Ito, M., Komatsu, H.: Representation of angles embedded within contour stimuli in area V2 of macaque monkeys. J. Neuroscience 24, 3313–3324 (2004)
11. Young, R.A., Lesperance, R.M., Meyer, W.W.: The Gaussian derivative model for spatial-temporal vision: I. Cortical model. Spatial Vision 14, 261–319 (2001)
12. Satoh, S., Usui, S.: Image reconstruction: another computational role of long-range horizontal connections in the primary visual cortex. Neural Computation (under review)
M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 953–962, 2008. © Springer-Verlag Berlin Heidelberg 2008

Biologically Motivated Face Selective Attention Model

Woong-Jae Won¹, Young-Min Jang², Sang-Woo Ban³, and Minho Lee²
¹ Dept. of Mechatronics, Intelligent Vehicle Research Team, Daegu Gyeongbuk Institute of Science and Technology, 711 Hosan-dong, Dalseo-Gu, Taegu 704-230, Korea. wwj@dgist.ac.kr
² School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sankyuk-Dong, Puk-Gu, Taegu 702-701, Korea. ymjang@ee.knu.ac.kr, mholee@knu.ac.kr
³ Dept. of Information and Communication Engineering, Dongguk University, 707 Seokjang-Dong, Gyeongju, Gyeongbuk 780-714, Korea. swban@dongguk.ac.kr

Abstract.
In this paper, we propose a face selective attention model based on biologically inspired visual selective attention for human faces. We consider radial frequency information and a skin color filter to localize candidate regions of a human face, reflecting the roles of the V4 and infero-temporal (IT) cells. Ellipse matching based on a symmetry axis is applied to check whether a candidate region contains a face contour feature. Finally, face detection is conducted by a face form perception model implemented by an auto-associative multi-layer perceptron (AAMLP) that mimics the roles of face selective cells in the IT area. Based on both the face-color preferable attention and the face-form perception mechanism, the proposed model shows plausible performance for localizing face candidates in real time.

Keywords: Face selective attention, biologically motivated selective attention, saliency map.

1 Introduction

Recently, the social development mechanism has been considered for Autonomous Mental Development (AMD) in the construction of more intelligent robots [1, 2, 3]. This becomes possible if robots can increase their own knowledge through interaction with the environment and with humans, as humans do. In order to embody an intelligent robot with the social development concept, we need to implement more human-like sensors, such as retina, electronic nose, touch, and acoustic sensors, in the machines. We also need to develop an intelligent model that can pay attention to interesting objects based on primitive sensory information. Furthermore, it is important that humans and the environment can share their knowledge in interactive ways [1, 2, 3]. In order to implement a truly human-like robot system, face detection is one of the most important functions for realizing the social development mechanism [1, 2, 3].
Human babies learn from their mother after focusing their eyes on the mother's face, and they come to feel emotions and acquire social functions through experience and learning. No conventional face attention system has shown comparable performance with the
system of a human being yet. Recently, biologically motivated approaches have been developed by L. Itti, T. Poggio, and C. Koch [4, 5, 6], and some research groups have developed human-like intelligent robots using these kinds of approaches [2, 3, 7]. An attention model was also introduced for face detection [8]. However, these models have not yet shown plausible results for the face attention problem in complex scenes.

In this paper, we propose a real-time face candidate localizer that simply imitates the function of the human visual pathway based on a biologically motivated selective attention mechanism, which can focus on a face preferentially and reject non-face areas, in order to implement a social developmental robot vision system. When a task is given to find a specific object, not only should the features for the saliency map (SM) in the bottom-up processing be biased as differently weighted color features, but the task-specific shape feature should also be fed back from the top-down process to reject non-interesting areas. If the specific task is to find a face, the skin color characteristic of human faces can be considered as the dominant feature in the selective attention model to intensify the face areas, and a face shape can be considered to reject non-face areas.

Thus, we simply consider color-biased information, namely a color filtered intensity, an R·G color opponent, and edge information of the R·G color opponent feature, for generating the preference for human face areas by intensifying the low-level features related to human faces in an SM. Moreover, in order to reject non-face areas among the selected face candidate areas, we consider elliptical face contour shape information based on the symmetry axis of a human face. Face inner form features are also considered in order to reflect more complicated face form information, which is implemented using an auto-associative multi-layer perceptron (AAMLP) model.
This paper is organized as follows. Section 2 describes the proposed face localization model using bottom-up processing with face color task-biased signals for the face candidate areas and face form perception. Experimental results follow in Section 3. Section 4 presents our conclusions and discussion.
2 Biologically Motivated Selective Attention Model for Localizing Human Face

When humans pay attention to a target object, the prefrontal cortex gives a competitive bias signal, related to the target object, to the infero-temporal (IT) and V4 areas [9]. Then, the IT and V4 areas generate target-object-dependent information, and this is transmitted
to the low-level processing part in order to make a filter for the areas that satisfy the target-object-dependent features. In the proposed model, therefore, we simply consider a skin bias color signal and elliptical face contour shape information as the face-specific top-down feedback bias signals for real-time operation. Moreover, we consider more complicated face inner form features.

We propose a face candidate localizer based on the biologically motivated bottom-up SM model, as shown in Fig. 1. The bottom-up SM can preferably focus on face candidate areas by a simple face-specific color bias filter using a face color filtered intensity, an R·G color opponent, and edge information of the R·G color opponent feature. Then, the candidate regions are checked for how well the localized areas match an elliptical shape based on a symmetry axis and how similar they are to trained face form features.

Fig. 1. (Legend) I: intensity, E: edge, RG: red-green opponent coding feature, BY: blue-yellow opponent coding feature, Ī: intensity feature map, Ē: edge feature map, C̄: color feature map, SM: saliency map, CSP: candidate salient point, SP: salient point.
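As a rough illustration of the face-color-biased front end just outlined, the sketch below masks intensity by a normalized-color skin test. The (rn, gn) threshold values are illustrative assumptions only; the paper obtains its ranges from a large set of natural face samples.

```python
import numpy as np

def skin_filtered_intensity(rgb):
    """Intensity masked by a skin-color range test on normalized
    color coordinates. The numeric ranges below are illustrative
    assumptions, not the ranges learned in the paper."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    s = r + g + b + 1e-6
    rn, gn = r / s, g / s                 # normalized color coding
    skin = (rn > 0.35) & (rn < 0.55) & (gn > 0.25) & (gn < 0.40)
    intensity = (r + g + b) / 3.0
    return np.where(skin, intensity, 0.0)  # bias toward skin regions

img = np.zeros((2, 2, 3))
img[0, 0] = [0.784, 0.471, 0.353]   # an assumed typical skin tone
img[1, 1] = [0.0, 0.0, 1.0]         # saturated blue, rejected
filtered = skin_filtered_intensity(img)
```

The filtered intensity passes only candidate skin pixels, so when it is summed into the SM together with the R·G opponent and edge features, face-colored areas are intensified.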
2.1 Face Color Biased Selective Attention

In the bottom-up processing, the intensity, edge, and color features are extracted in the retina. These features are transmitted to the visual cortex through the lateral geniculate nucleus (LGN). While those features are transmitted to the visual cortex, intensity, edge, and color feature maps are constructed using the on-set and off-surround mechanism of the LGN and the visual cortex, and those feature maps form a bottom-up SM model in the lateral intraparietal cortex (LIP) [10].

In order to implement a human-like visual attention function, we consider the simplified bottom-up SM model [11]. In our approach, we use the SM model that reflects the functions of the retina cells, the LGN, and the visual cortex. Since the retina cells can extract edge and intensity information as well as color opponency, we use these factors as the basic features of the SM model [10-12]. In order to provide the proposed model with a face color preference property, the skin color filtered intensity feature is considered together with the original intensity feature. According to the given task, those two intensity features are differently biased. For face-preferable attention, the skin color filtered intensity feature works as the dominant feature in generating an intensity feature map. The ranges of red (r), green (g), and blue (b) for skin
color filtering are obtained from a large number of natural sample face data, and the real color components R, G, B, Y are extracted using normalized color coding [10]. Considering the function of the LGN and the ganglion cells, we implement the on-center and off-surround operation by Gaussian pyramid images with different scales from the 0-th to the n-th level, whereby each level is made by sub-sampling by a factor of 2ⁿ; thus, it is able to construct four feature bases: the intensity (I), the edge (E), and the colors (RG and BY) [11, 12]. This reflects the non-uniform distribution of the retinotopic structure. Then, the center-surround mechanism is implemented in the model as the difference operation between the fine and coarse scales of the Gaussian pyramid images [11, 12]. Consequently, the three feature maps Ī, Ē
and C̄ can be obtained by the center-surround difference algorithm [11]. However, in this paper, we simply consider only the R·G color opponent features for the color feature map and the edge feature map, to intensify face areas as a bias signal. An SM is generated by the summation of these three feature maps. The salient areas are obtained by searching for a maximum local energy with a fixed window shifted pixel by pixel in the SM. After obtaining the candidate salient points for a human face, a proper scale for the obtained areas is computed using an entropy maximization approach [13].

2.2 Ellipse Fitting Based on Symmetry Axes

Fukushima's neural network models a symmetry axis extraction mechanism considering the human visual pathway. The model consists of a number of layers connected in a hierarchical manner: a contrast layer U_G, an edge-extracting layer of a simple type (U_S), an edge-extracting layer of a complex type (U_C), and a symmetry-axis-extracting layer (U_H) [14]. In Fukushima's model, the output of the cells in U_G,
which resembles the function of the ganglion cells, is sent to the orientation quantization layer U_S, which resembles the function of simple cells in the primary visual cortex. The output of layer U_S is fed to layer U_C, where a blurred version of the response of layer U_S is generated, resembling the function of complex cells in the primary visual cortex. Finally, the output of the cells of U_C is sent to the U_H layer, which resembles the function of hyper-complex cells, to analyze the symmetry axis [14].

In our model, we extract the symmetry axis for the face candidate areas selected by the simplified bottom-up face color preferable attention model. Thus, we can get the edge feature in each candidate face area for the U_G layer. Unlike Fukushima's model, we apply quantization of the edge feature, using the edge and its orientation in a face candidate area, to generate the orientation features for the U_S layer. The orientation features in the U_C layer are blurred using Eq. (1):

$$u_{Cm}(n, k) = \sum_{|v| < A_{Cm}} g_{Cm}(v) \cdot u_S(n + v, k), \quad (k = 0, 1, \ldots, K - 1),\ (m = 0, 1, \ldots, M - 1) \tag{1}$$
where K is the quantization level of orientation and $g_{Cm}$ is a Gaussian filter with a radius of $A_{Cm}$. However, we use M-level Gaussian pyramid images with a fixed $A_{Cm}$, instead of varying $A_{Cm}$, to reduce the computational load. After extracting the M levels of blurred orientation features in the U_C layer, the symmetry axis is extracted in the U_H layer using Eq. (2):

$$u_H(n, k) = \varphi\left[ \sum_{m=0}^{M-1} \sum_{\kappa=0}^{K-1} \beta_m \left\{ \gamma_\kappa \left( u_{Cm}(n_r, k + \kappa) + u_{Cm}(n_l, \bar{k} - \kappa) \right) - \delta_\kappa \left| u_{Cm}(n_r, k + \kappa) - u_{Cm}(n_l, \bar{k} - \kappa) \right| \right\} \right], \quad (k = 0, 1, \ldots, K/2 - 1) \tag{2}$$
where $\varphi(x) = \max(x, 0)$, and $\delta_\kappa$ and $\beta_m$ are positive parameters that determine how much asymmetry is allowed. $\bar{k}$ is the orientation feature opposite to the k-th orientation feature; if $\bar{k} = k$, then $\bar{k} = k + K/2 - \kappa$. $n$ is the pixel position at which the symmetry axis magnitude is obtained; that is, $n = (x, y)$, $n_r = (x_r, y_r)$, and $n_l = (x_l, y_l)$, in which $x_r$, $y_r$, $x_l$, and $y_l$ are given by Eq. (3):

$$x_r = x + a \cdot \cos(\alpha_k), \quad y_r = y + a \cdot \sin(\alpha_k), \quad x_l = x - a \cdot \cos(\alpha_k), \quad y_l = y - a \cdot \sin(\alpha_k) \tag{3}$$

where $\alpha_k = 2\pi k / K$, and $a$ is the distance from the current pixel position to the other pixel position to be compared for obtaining symmetry information.

Because the symmetry axes extracted in the U_H layer do not form a unique line, we need to find the main symmetry axis line. After finding the symmetry axis lines by searching in the U_H layer, the main axis with the maximum length is selected among the several symmetry axis lines. Fig. 2 shows an example result of each layer for symmetry axis extraction. Here, we set K = 16, $\gamma_\kappa$ = 1, $\delta_\kappa$ = 1.5, $\beta_m$ = 1, M = 3, and $a = 2 \times m$ to extract the symmetry axis for a face candidate area. Finally, we reject non-face areas by checking the length of the symmetry axis line and the degree of matching between the segmented face candidate area and the ellipse obtained from the symmetry axis and its orthogonal axis.
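The building blocks of this symmetry-axis stage can be sketched for a single scale and orientation pair. Because Eqs. (1)–(3) are only partially legible in this extraction, the sketch is a hedged reading: the blur radius and sigma are illustrative, while γ_κ = 1 and δ_κ = 1.5 follow the values quoted in the text.

```python
import numpy as np

def blur_orientation_channel(u_s_channel, radius=3, sigma=1.0):
    """Eq. (1), one scale m: blur one orientation channel of u_S with
    a truncated Gaussian g_Cm of radius A_Cm (separable over axes)."""
    v = np.arange(-radius + 1, radius)
    g = np.exp(-v**2 / (2.0 * sigma**2))
    g /= g.sum()
    out = u_s_channel.astype(float)
    for axis in (0, 1):
        out = np.apply_along_axis(
            lambda row: np.convolve(row, g, mode="same"), axis, out)
    return out

def mirror_positions(x, y, k, a, K=16):
    """Eq. (3): sampling positions n_r and n_l at distance a from the
    pixel (x, y), on either side of the candidate axis orientation
    alpha_k = 2*pi*k/K."""
    alpha = 2.0 * np.pi * k / K
    return ((x + a * np.cos(alpha), y + a * np.sin(alpha)),
            (x - a * np.cos(alpha), y - a * np.sin(alpha)))

def symmetry_response(resp_r, resp_l, gamma=1.0, delta=1.5):
    # One inner term of Eq. (2): reward matched responses at the
    # mirrored positions, penalize their difference, then rectify.
    return max(gamma * (resp_r + resp_l) - delta * abs(resp_r - resp_l),
               0.0)
```

A pixel on a true symmetry axis sees similar blurred responses at its two mirrored positions, so the rectified term stays positive; strongly asymmetric responses are driven to zero, which is how non-face contours get rejected.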