Lecture Notes in Computer Science
Fig. 5. (a) Schematic representation of the proposed filling-in model. (b) Simulation results of a computational V1 neuron using various lengths of horizontal bars.

A V1 neuron whose RF center is $x$ is referred to as $s_\theta(x, t)$. Presume that $I_\xi(x, t)$
is constant around $x$ and that $\sigma$ is a small value. We obtain

$$\frac{\partial}{\partial t} s_\theta(x, t) = \frac{\partial^2}{\partial \theta^2} g * \frac{\partial}{\partial t} I \qquad \left( \text{because } \frac{\partial^2}{\partial \theta^2} \frac{\partial}{\partial t} I = \frac{\partial}{\partial t} \frac{\partial^2}{\partial \theta^2} I \right) \tag{10}$$

$$= \tilde{\kappa}_{\eta\theta\theta} + \tilde{\mu}_{\xi\theta\theta} - \tilde{\kappa}_{\eta\eta\theta\theta} - \tilde{\mu}_{\xi\xi\theta\theta} + \lambda \frac{\partial^2}{\partial \eta^2} s_\theta. \tag{11}$$

Equation (11) indicates that a V1 neuron $s_\theta$ is affected by $\tilde{\kappa}_{\eta\theta\theta}$, $\tilde{\mu}_{\xi\theta\theta}$, and the other terms. We expect those terms to be the outputs of V2 neurons. Because of page limitations, details are not presented in this article. However, we found that the value of $\tilde{\mu}_{\xi\theta\theta}$ is the sum of four neurons selective to an angular difference of about 27° in V-shaped patterns or junctions, as illustrated in Fig. 4(b). In addition, $\tilde{\kappa}_{\eta\theta\theta}$ is the sum of neurons selective to the patterns in Fig. 4(c). We found angular selectivity in $\tilde{\kappa}_{\eta\eta\theta\theta}$
and $\tilde{\mu}_{\xi\xi\theta\theta}$. The fifth term of (11) represents intra-cortical interaction between V1 neurons connected by horizontal connections [12]. Results show that our V1 model neurons, $s_\theta(x, t)$, are affected by the output of V2 model neurons, which encode angular information of lines and which are in turn affected by V1 neurons through horizontal connections. Figure 5(a) depicts a schematic representation of our model as formulated by (11). Comparing Fig. 5(a) to Fig. 1(d), we conclude that our computational model is consistent with the physiological abstract model. However, explicit intra-cortical connections in V2 do not emerge in (11); this problem will be addressed in future work.

5 Numerical Simulations

First, numerical simulations of (9) are performed to investigate whether the expected filling-in pattern is obtained using Fig. 2(a) as the initial value of $I$. The parameter is $\lambda = 0.1$. Figure 2(d) is the steady state of $I$ (the filling-in pattern): the broken bar of Fig. 2(a) is completed, as expected.
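The steepest-descent evolution just described can be sketched numerically. The paper's functional (9) is not reproduced in this excerpt, so the sketch below substitutes the simplest member of that family, harmonic (Laplacian) diffusion restricted to the missing area B; the image size, mask, and iteration count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def fill_in(image, mask, lam=0.1, n_iter=5000):
    """Steepest-descent filling-in: pixels where mask is True evolve
    under a discrete Laplacian (harmonic inpainting) while known
    pixels are clamped. A stand-in for minimizing the paper's
    functional (9), which is not reproduced in this excerpt."""
    u = image.astype(float).copy()
    for _ in range(n_iter):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        u[mask] += lam * lap[mask]   # update only the missing area B
    return u

# A horizontal bar broken by a masked gap, analogous to Fig. 2(a):
img = np.zeros((16, 16))
img[7:9, 2:14] = 1.0                      # horizontal bar
mask = np.zeros_like(img, dtype=bool)
mask[:, 6:10] = True                      # vertical strip = area B
img[mask] = 0.0                           # break the bar
restored = fill_in(img, mask)             # bar rows recover inside B
```

The steady state interpolates the bar across the gap, which is the qualitative behavior reported for Fig. 2(d).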
950 S. Satoh and S. Usui

Fig. 6. (a) A baboon with graffiti. (b) Inpainted (restored) baboon using the proposed visual model. (c) Almost half of the visual information is missing. (d) Image restored by the proposed visual model.

Next, we evaluate the effectiveness of our visual model as a digital image inpainting (DII) algorithm. Results are shown in Fig. 6 and Fig. 7. The areas B of Fig. 6(a) and Fig. 6(c) are, respectively, black curvy lines drawn by an author and a checkered orange area. (Color results are available in the electronic version of this article.) Simulations for color images are executed as follows: decompose a color image into three (R, G, B) intensity channels, apply (9) to each color channel, and unify the three steady states into one image. The images restored by our visual model are shown in Figs. 6(b) and 6(d). We find that our visual model is effective as a DII algorithm. The situation portrayed in Fig. 6(c) is possible, for example, in the case of block loss caused by a packet drop during wireless transmission, gap padding for image magnification, and so on. We compare our model with the Spot Healing Brush Tool of Adobe® Photoshop® CS2 (options at default settings). Neither method repaired texture areas such as the baboon's fur, but our model restores strong edges, whereas the Photoshop tool
Computational Understanding and Modeling of Filling-In Process 951
Fig. 7. (a) The black rectangle is the area B to be filled in. (b) Result of the proposed visual model. (c) Result of Adobe® Photoshop® CS using default settings.

gives a blurred image. The reason our model is not applicable to textured areas is that the evaluation function (9) contains no texture information. Finally, we simulate (11) to investigate whether our model neuron $s_\theta$ reproduces the physiological result of Fig. 1(c). The widths of the horizontal bars are two pixels; the length varies from 0 to 14 pixels in steps of 1 pixel. Parameters are $\theta = \pi/2$ (not $\theta = 0$) and $\sigma = 1$, such that the neuron $s_\theta$ is selective to the horizontal bars. The receptive field of the simulated neuron overlaps the BS area B. Figure 5(b) illustrates the steady values of $s_\theta$. We find consistency between the physiological results and our model. One end of the bar appears from the BS area B, as in Fig. 1(b4), when the bar length is greater than 9 pixels. In this situation, the neuron $s_\theta$ implicitly performs orientation detection for a completed bar, as in Fig. 1(a4), through its intrinsic filling-in process. For that reason, $s_\theta$ shows a considerable increase in its activity when the bar length becomes greater than nine pixels.
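The per-channel color procedure described earlier (decompose into R, G, B channels, restore each, unify the steady states) can be sketched as follows. The inner solver here is a plain diffusion stand-in for minimizing (9), since the functional itself is not given in this excerpt; the image and mask are synthetic.

```python
import numpy as np

def diffuse(channel, mask, lam=0.1, n_iter=2000):
    # Gradient-descent smoothing of the masked area: a stand-in for
    # minimizing the evaluation function (9) on a single channel.
    u = channel.astype(float).copy()
    for _ in range(n_iter):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        u[mask] += lam * lap[mask]
    return u

def inpaint_rgb(image, mask):
    """Decompose an H x W x 3 color image into R, G, B channels,
    restore each channel independently, and unify the three steady
    states into one image, as described in the text."""
    return np.stack([diffuse(image[..., c], mask) for c in range(3)],
                    axis=-1)

rgb = np.ones((8, 8, 3)) * [1.0, 0.5, 0.0]   # uniform orange image
hole = np.zeros((8, 8), dtype=bool)
hole[3:5, 3:5] = True                         # missing block
rgb[hole] = 0.0
out = inpaint_rgb(rgb, hole)                  # hole refilled per channel
```

On this uniform test image each channel relaxes back to its surrounding constant, which is the behavior one would expect of any per-channel steady-state restorer.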
6 Summary
To solve the filling-in problem, we employed two physiological findings in a visual model and presented novel aspects of those findings: variable separation and adiabatic approximation. The results showed the physiological consistency and plausibility of our model, and we evaluated its effectiveness as an algorithm for digital image inpainting. As a basis of the computational modeling, standard regularization theory and the steepest descent method are used to expose the sort of problem our model solves or optimizes. Our visual model optimizes an evaluation function representing a priori knowledge of missing images. We obtained the desired patterns and neural responses for bar stimuli. However, we have not yet answered the following question: why is the adiabatic approximation between V1 and V2 suitable for the filling-in process? That remains an open problem.
We should develop an appropriate means for texture filling-in. We expect that new algorithms or visual models will be derived from theoretical aspects reflecting other neural properties in our fundamental functional. For example, a new functional including higher-order image properties would be effective for texture filling-in.

The functional E is defined by the authors from theoretical viewpoints. An exciting challenge will be the self-organization of E: because E represents a priori knowledge of various kinds of images, it should reflect and represent the statistical features of those images.

References

1. Kamitani, Y., Shimojo, S.: Manifestation of scotomas created by transcranial magnetic stimulation of the human visual cortex. Nature Neuroscience 2, 767–771 (1999)
2. Gerrits, H.J., Timmerman, G.J.: The filling-in process in patients with retinal scotoma. Vision Research 9, 439–442 (1969)
3. Gerrits, H.J., De Haan, B., Vendrik, A.J.: Experiments with retinal stabilized images. Vision Research 6, 427–440 (1966)
4. Komatsu, H.: The neural mechanisms of perceptual filling-in. Nature Neuroscience 7, 200–231 (2006)
5. Komatsu, H., Kinoshita, M., Murakami, I.: Neural responses in the retinotopic representation of the blind spot in the macaque V1 to stimuli for perceptual filling-in. J. Neuroscience 20, 9310–9319 (2000)
6. Matsumoto, M., Komatsu, H.: Neural responses in the macaque V1 to bar stimuli with various lengths presented on the blind spot. J. Neurophysiology 93, 2374–2387 (2005)
7. Hildreth, E.C.: The computation of the velocity field. Proc. R. Soc. Lond. B 221, 189–220 (1984)
8. Bertalmio, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. In: Proc. of SIGGRAPH 2000, pp. 417–424. ACM Press, New York (2000)
9. Rane, S.D., Sapiro, G., Bertalmio, M.: Structure and texture filling-in of missing image blocks in wireless transmission and compression applications. IEEE Trans. on Image Processing 12, 296–303 (2003)
10. Ito, M., Komatsu, H.: Representation of angles embedded within contour stimuli in area V2 of macaque monkeys. J. Neuroscience 24, 3313–3324 (2004)
11. Young, R.A., Lesperance, R.M., Meyer, W.W.: The Gaussian derivative model for spatial-temporal vision: I. Cortical model. Spatial Vision 14, 261–319 (2001)
12. Satoh, S., Usui, S.: Image reconstruction: another computational role of long-range horizontal connections in the primary visual cortex. Neural Computation (under review)
M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 953–962, 2008. © Springer-Verlag Berlin Heidelberg 2008

Biologically Motivated Face Selective Attention Model

Woong-Jae Won¹, Young-Min Jang², Sang-Woo Ban³, and Minho Lee²
¹ Dept. of Mechatronics, Intelligent Vehicle Research Team, Daegu Gyeongbuk Institute of Science and Technology, 711 Hosan-dong, Dalseo-Gu, Taegu 704-230, Korea. wwj@dgist.ac.kr
² School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sankyuk-Dong, Puk-Gu, Taegu 702-701, Korea. ymjang@ee.knu.ac.kr, mholee@knu.ac.kr
³ Dept. of Information and Communication Engineering, Dongguk University, 707 Seokjang-Dong, Gyeongju, Gyeongbuk 780-714, Korea. swban@dongguk.ac.kr

Abstract.
In this paper, we propose a face selective attention model based on biologically inspired visual selective attention for human faces. We consider radial frequency information and a skin color filter to localize candidate regions of a human face, reflecting the roles of the V4 and infero-temporal (IT) cells. Ellipse matching based on a symmetry axis is applied to check whether a candidate region contains a face contour feature. Finally, face detection is conducted by a face form perception model implemented by an auto-associative multi-layer perceptron (AAMLP) that mimics the roles of face selective cells in the IT area. Based on both the face-color preferable attention and the face-form perception mechanism, the proposed model shows plausible performance for localizing face candidates in real time.

Keywords: Face selective attention, biologically motivated selective attention, saliency map.

1 Introduction

Recently, the social development mechanism has been considered for Autonomous Mental Development (AMD) in the construction of more intelligent robots [1, 2, 3]. This becomes possible if robots can increase their own knowledge through interaction with the environment and with humans, as humans do. In order to embody an intelligent robot with the social development concept, we need to implement more human-like sensors, such as retina, electronic nose, touch, and acoustic sensors, in the machines. We also need to develop an intelligent model that can pay attention to interesting objects based on primitive sensory information. Furthermore, it is important that humans and the environment can share their knowledge in interactive ways [1, 2, 3]. In order to implement a truly human-like robot system, face detection is one of the most important functions for realizing the social development mechanism [1, 2, 3].
Human babies learn from their mother after focusing their eyes on the mother's face, and they come to feel emotions and acquire social functions through experience and learning. No conventional face attention system has shown comparable performance with the
system of a human being yet. Recently, biologically motivated approaches have been developed by L. Itti, T. Poggio, and C. Koch [4, 5, 6], and some research groups have developed human-like intelligent robots using these kinds of approaches [2, 3, 7]. An attention model was also introduced for face detection [8]. However, these models have not yet shown plausible results for the face attention problem in complex scenes.

In this paper, we propose a real-time face candidate localizer that simply imitates the function of the human visual pathway based on a biologically motivated selective attention mechanism, which can focus on a face preferentially and reject non-face areas, in order to implement a social developmental robot vision system. When a task is given to find a specific object, not only should the features for the saliency map (SM) in the bottom-up processing be biased as differently weighted color features, but the task-specific shape feature should also be fed back from the top-down process to reject non-interesting areas. If the specific task is to find a face, the skin color characteristic of human faces can be considered as the dominant feature in the selective attention model to intensify the face areas, and a face shape can be considered to reject non-face areas.

Thus, we simply consider color-biased information, namely a color filtered intensity, an R·G color opponent, and edge information of the R·G color opponent feature, for generating the preference for human face areas by intensifying the low-level features related to human faces in an SM. Moreover, in order to reject non-face areas among the selected face candidate areas, we consider elliptical face contour shape information based on the symmetry axis of a human face. Face inner form features are also considered in order to reflect more complicated face form information, which is implemented using an auto-associative multi-layer perceptron (AAMLP) model.
This paper is organized as follows. Section 2 describes the proposed face localization model using bottom-up processing with face color task-biased signals for the face candidate areas and face form perception. Experimental results follow in Section 3. Section 4 presents our conclusions and discussion.
2 Biologically Motivated Selective Attention Model for Localizing Human Face

When humans pay attention to a target object, the prefrontal cortex gives a competitive bias signal, related to the target object, to the infero-temporal (IT) and V4 areas [9]. Then, the IT and V4 areas generate target-object-dependent information, and this is transmitted
to the low-level processing part in order to make a filter for the areas that satisfy the target-object-dependent features. In the proposed model, therefore, we simply consider a skin bias color signal and elliptical face contour shape information as the face-specific top-down feedback bias signals for real-time operation. Moreover, we consider more complicated face inner form features.

We propose a face candidate localizer based on the biologically motivated bottom-up SM model, as shown in Fig. 1. The bottom-up SM can preferably focus on face candidate areas by a simple face-specific color bias filter using a face color filtered intensity, an R·G color opponent, and edge information of the R·G color opponent feature. Then, the candidate regions are checked for how well the localized areas match an elliptical shape based on a symmetry axis and how similar they are to trained face form features.

Fig. 1. (Legend) I: intensity, E: edge, RG: red-green opponent coding feature, BY: blue-yellow opponent coding feature, Ī: intensity feature map, Ē: edge feature map, C̄: color feature map, SM: saliency map, CSP: candidate salient point, SP: salient point.
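As a rough illustration of the face-color-biased front end just outlined, the sketch below masks intensity by a normalized-color skin test. The (rn, gn) threshold values are illustrative assumptions only; the paper obtains its ranges from a large set of natural face samples.

```python
import numpy as np

def skin_filtered_intensity(rgb):
    """Intensity masked by a skin-color range test on normalized
    color coordinates. The numeric ranges below are illustrative
    assumptions, not the ranges learned in the paper."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    s = r + g + b + 1e-6
    rn, gn = r / s, g / s                 # normalized color coding
    skin = (rn > 0.35) & (rn < 0.55) & (gn > 0.25) & (gn < 0.40)
    intensity = (r + g + b) / 3.0
    return np.where(skin, intensity, 0.0)  # bias toward skin regions

img = np.zeros((2, 2, 3))
img[0, 0] = [0.784, 0.471, 0.353]   # an assumed typical skin tone
img[1, 1] = [0.0, 0.0, 1.0]         # saturated blue, rejected
filtered = skin_filtered_intensity(img)
```

The filtered intensity passes only candidate skin pixels, so when it is summed into the SM together with the R·G opponent and edge features, face-colored areas are intensified.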
2.1 Face Color Biased Selective Attention

In the bottom-up processing, the intensity, edge, and color features are extracted in the retina. These features are transmitted to the visual cortex through the lateral geniculate nucleus (LGN). While those features are transmitted to the visual cortex, intensity, edge, and color feature maps are constructed using the on-set and off-surround mechanism of the LGN and the visual cortex, and those feature maps form a bottom-up SM model in the lateral intraparietal cortex (LIP) [10].

In order to implement a human-like visual attention function, we consider the simplified bottom-up SM model [11]. In our approach, we use the SM model that reflects the functions of the retina cells, the LGN, and the visual cortex. Since the retina cells can extract edge and intensity information as well as color opponency, we use these factors as the basic features of the SM model [10-12]. In order to provide the proposed model with a face color preference property, the skin color filtered intensity feature is considered together with the original intensity feature. According to the given task, those two intensity features are differently biased. For face-preferable attention, the skin color filtered intensity feature works as the dominant feature in generating an intensity feature map. The ranges of red (r), green (g), and blue (b) for skin
color filtering are obtained from a large number of natural sample face data, and the real color components R, G, B, Y are extracted using normalized color coding [10]. Considering the function of the LGN and the ganglion cells, we implement the on-center and off-surround operation by Gaussian pyramid images with different scales from the 0-th to the n-th level, whereby each level is made by sub-sampling by a factor of 2ⁿ; thus, it is able to construct four feature bases: the intensity (I), the edge (E), and the colors (RG and BY) [11, 12]. This reflects the non-uniform distribution of the retinotopic structure. Then, the center-surround mechanism is implemented in the model as the difference operation between the fine and coarse scales of the Gaussian pyramid images [11, 12]. Consequently, the three feature maps Ī, Ē
and C̄ can be obtained by the center-surround difference algorithm [11]. However, in this paper, we simply consider only the R·G color opponent features for the color feature map and the edge feature map, to intensify face areas as a bias signal. An SM is generated by the summation of these three feature maps. The salient areas are obtained by searching for a maximum local energy with a fixed window shifted pixel by pixel in the SM. After obtaining the candidate salient points for a human face, a proper scale for the obtained areas is computed using an entropy maximization approach [13].

2.2 Ellipse Fitting Based on Symmetry Axes

Fukushima's neural network models a symmetry axis extraction mechanism considering the human visual pathway. The model consists of a number of layers connected in a hierarchical manner: a contrast layer U_G, an edge-extracting layer of a simple type (U_S), an edge-extracting layer of a complex type (U_C), and a symmetry-axis-extracting layer (U_H) [14]. In Fukushima's model, the output of the cells in U_G,
which resembles the function of the ganglion cells, is sent to the orientation quantization layer U_S, which resembles the function of simple cells in the primary visual cortex. The output of layer U_S is fed to layer U_C, where a blurred version of the response of layer U_S is generated, resembling the function of complex cells in the primary visual cortex. Finally, the output of the cells of U_C is sent to the U_H layer, which resembles the function of hyper-complex cells, to analyze the symmetry axis [14].

In our model, we extract the symmetry axis for the face candidate areas selected by the simplified bottom-up face color preferable attention model. Thus, we can get the edge feature in each candidate face area for the U_G layer. Unlike Fukushima's model, we apply quantization of the edge feature, using the edge and its orientation in a face candidate area, to generate the orientation features for the U_S layer. The orientation features in the U_C layer are blurred using Eq. (1):

$$u_{Cm}(n, k) = \sum_{|v| < A_{Cm}} g_{Cm}(v) \cdot u_S(n + v, k), \quad (k = 0, 1, \ldots, K - 1),\ (m = 0, 1, \ldots, M - 1) \tag{1}$$
where K is the quantization level of orientation and $g_{Cm}$ is a Gaussian filter with a radius of $A_{Cm}$. However, we use M-level Gaussian pyramid images with a fixed $A_{Cm}$, instead of varying $A_{Cm}$, to reduce the computational load. After extracting the M levels of blurred orientation features in the U_C layer, the symmetry axis is extracted in the U_H layer using Eq. (2):

$$u_H(n, k) = \varphi\left[ \sum_{m=0}^{M-1} \sum_{\kappa=0}^{K-1} \beta_m \left\{ \gamma_\kappa \left( u_{Cm}(n_r, k + \kappa) + u_{Cm}(n_l, \bar{k} - \kappa) \right) - \delta_\kappa \left| u_{Cm}(n_r, k + \kappa) - u_{Cm}(n_l, \bar{k} - \kappa) \right| \right\} \right], \quad (k = 0, 1, \ldots, K/2 - 1) \tag{2}$$
where $\varphi(x) = \max(x, 0)$, and $\delta_\kappa$ and $\beta_m$ are positive parameters that determine how much asymmetry is allowed. $\bar{k}$ is the orientation feature opposite to the k-th orientation feature; if $\bar{k} = k$, then $\bar{k} = k + K/2 - \kappa$. $n$ is the pixel position at which the symmetry axis magnitude is obtained; that is, $n = (x, y)$, $n_r = (x_r, y_r)$, and $n_l = (x_l, y_l)$, in which $x_r$, $y_r$, $x_l$, and $y_l$ are given by Eq. (3):

$$x_r = x + a \cdot \cos(\alpha_k), \quad y_r = y + a \cdot \sin(\alpha_k), \quad x_l = x - a \cdot \cos(\alpha_k), \quad y_l = y - a \cdot \sin(\alpha_k) \tag{3}$$

where $\alpha_k = 2\pi k / K$, and $a$ is the distance from the current pixel position to the other pixel position to be compared for obtaining symmetry information.

Because the symmetry axes extracted in the U_H layer do not form a unique line, we need to find the main symmetry axis line. After finding the symmetry axis lines by searching in the U_H layer, the main axis with the maximum length is selected among the several symmetry axis lines. Fig. 2 shows an example result of each layer for symmetry axis extraction. Here, we set K = 16, $\gamma_\kappa$ = 1, $\delta_\kappa$ = 1.5, $\beta_m$ = 1, M = 3, and $a = 2 \times m$ to extract the symmetry axis for a face candidate area. Finally, we reject non-face areas by checking the length of the symmetry axis line and the degree of matching between the segmented face candidate area and the ellipse obtained from the symmetry axis and its orthogonal axis.
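The building blocks of this symmetry-axis stage can be sketched for a single scale and orientation pair. Because Eqs. (1)–(3) are only partially legible in this extraction, the sketch is a hedged reading: the blur radius and sigma are illustrative, while γ_κ = 1 and δ_κ = 1.5 follow the values quoted in the text.

```python
import numpy as np

def blur_orientation_channel(u_s_channel, radius=3, sigma=1.0):
    """Eq. (1), one scale m: blur one orientation channel of u_S with
    a truncated Gaussian g_Cm of radius A_Cm (separable over axes)."""
    v = np.arange(-radius + 1, radius)
    g = np.exp(-v**2 / (2.0 * sigma**2))
    g /= g.sum()
    out = u_s_channel.astype(float)
    for axis in (0, 1):
        out = np.apply_along_axis(
            lambda row: np.convolve(row, g, mode="same"), axis, out)
    return out

def mirror_positions(x, y, k, a, K=16):
    """Eq. (3): sampling positions n_r and n_l at distance a from the
    pixel (x, y), on either side of the candidate axis orientation
    alpha_k = 2*pi*k/K."""
    alpha = 2.0 * np.pi * k / K
    return ((x + a * np.cos(alpha), y + a * np.sin(alpha)),
            (x - a * np.cos(alpha), y - a * np.sin(alpha)))

def symmetry_response(resp_r, resp_l, gamma=1.0, delta=1.5):
    # One inner term of Eq. (2): reward matched responses at the
    # mirrored positions, penalize their difference, then rectify.
    return max(gamma * (resp_r + resp_l) - delta * abs(resp_r - resp_l),
               0.0)
```

A pixel on a true symmetry axis sees similar blurred responses at its two mirrored positions, so the rectified term stays positive; strongly asymmetric responses are driven to zero, which is how non-face contours get rejected.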