Lecture Notes in Computer Science



3 Numerical Results

The embedded vectors are set to analogue random vectors as follows:

$$ e_i^{(r)} = z_i^{(r)} \qquad (1 \le i \le N,\ 1 \le r \le L), \tag{12} $$

where $z_i^{(r)}$ $(1 \le i \le N,\ 1 \le r \le L)$ are zero-mean pseudo-random numbers between $-1$ and $+1$.

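For simplicity, the activation function of eq.(1) is assumed to be a piecewise-linear function, instead of the signum form previously used for the binary embedded vectors [25], and is set to

$$ s_i = f(\sigma_i) = \frac{1}{2}\Big[\big(1+\mathrm{sgn}(1-|\sigma_i|)\big)\,\sigma_i + \mathrm{sgn}(\sigma_i)\,\big(1-\mathrm{sgn}(1-|\sigma_i|)\big)\Big], \tag{13} $$

where $\mathrm{sgn}(\cdot)$ denotes the signum function defined by

$$ \mathrm{sgn}(x) = \begin{cases} -1 & (x<0) \\ 0 & (x=0) \\ +1 & (x>0). \end{cases} \tag{14} $$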
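As a minimal sketch (Python with NumPy), eqs.(12)-(14) can be written down directly. The uniform distribution assumed for the zero-mean pseudo-random numbers, and the names `sgn`, `f` and `embedded_vectors`, are illustrative assumptions rather than the paper's code; the `binary` switch reproduces the earlier binary case by applying the signum function.

```python
import numpy as np

def sgn(x):
    """Signum function of eq.(14): -1 for x < 0, 0 for x = 0, +1 for x > 0."""
    return np.sign(x)

def f(sigma):
    """Piecewise-linear activation of eq.(13): identity on [-1, 1], saturating at +/-1."""
    sigma = np.asarray(sigma, dtype=float)
    # weight of the linear branch: 1 for |sigma| < 1, 0.5 at |sigma| = 1, 0 outside
    inside = (1.0 + sgn(1.0 - np.abs(sigma))) / 2.0
    return inside * sigma + (1.0 - inside) * sgn(sigma)

def embedded_vectors(N, L, rng, binary=False):
    """Embedded vectors e_i^(r) as an (L, N) array, one vector per row.

    Assumes the zero-mean pseudo-random numbers are uniform on (-1, 1);
    binary=True applies sgn() to obtain the earlier binary case.
    """
    z = rng.uniform(-1.0, 1.0, size=(L, N))
    return sgn(z) if binary else z
```

Eq.(13) is simply a clip of $\sigma_i$ to $[-1, 1]$, so `np.clip(sigma, -1, 1)` gives the same values; the signum form is kept only to mirror the equation.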

192 M.

Nakagawa

The initial vector $s_i(0)$ $(1 \le i \le N)$ is set to

$$ s_i(0) = \begin{cases} -e_i^{(s)} & (1 \le i \le H_d) \\ +e_i^{(s)} & (H_d+1 \le i \le N), \end{cases} \tag{15} $$

where $e_i^{(s)}$ is a target pattern to be retrieved and $H_d$ is the Hamming distance between the initial vector $s_i(0)$ and the target vector $e_i^{(s)}$.
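Eq.(15) amounts to flipping the sign of the first $H_d$ components of the target. The hypothetical helpers below do this and also compute the directional cosine between $s_i(0)$ and $e_i^{(s)}$; for binary ($\pm 1$) vectors this cosine equals $1 - 2H_d/N$, so it vanishes at $H_d/N = 0.5$.

```python
import numpy as np

def initial_vector(target, Hd):
    """s_i(0) of eq.(15): the first Hd components of the target e^(s) change sign."""
    s0 = np.asarray(target, dtype=float).copy()
    s0[:Hd] = -s0[:Hd]
    return s0

def direction_cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```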

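The retrieval is regarded as successful if the overlap

$$ o^{(s)}(t) = \sum_{i=1}^{N} e_i^{\dagger(s)}\, s_i(t) \tag{16} $$

results in $\pm 1$ for $t \gg 1$, in which case the system may be in a steady state such that

$$ s_i(t+1) = s_i(t), \tag{17a} $$

$$ \sigma_i(t+1) = \sigma_i(t). \tag{17b} $$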
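The dual vectors $e_i^{\dagger(r)}$ are not defined in this section. Assuming they satisfy $\sum_i e_i^{\dagger(r)} e_i^{(s)} = \delta_{rs}$, as in orthogonal (pseudo-inverse) learning, they can be obtained from the Moore-Penrose pseudo-inverse, and eq.(16) together with the steady-state condition of eq.(17a) gives a simple success test. This is a sketch under that assumption; the function names and the tolerance are illustrative.

```python
import numpy as np

def dual_vectors(E):
    """Rows e^dagger(r) with sum_i e^dagger_i(r) e_i(s) = delta_rs.

    E has shape (L, N), one embedded vector per row, and is assumed to have
    full row rank (L <= N); the pseudo-inverse then yields the dual basis.
    """
    return np.linalg.pinv(E).T

def overlaps(E_dag, s):
    """o^(r)(t) = sum_i e^dagger_i(r) s_i(t) for every r, as in eq.(16)."""
    return E_dag @ s

def retrieval_succeeded(o_target, s_now, s_prev, tol=1e-6):
    """Success: overlap with the target is +/-1 and the state is stationary (eq.(17a))."""
    return bool(abs(abs(o_target) - 1.0) < tol and np.allclose(s_now, s_prev, atol=tol))
```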

To evaluate the retrieval ability of the present model, the success rate $S_r$ is defined as the fraction of successful retrievals over 1000 trials, each with a different set of embedded vectors $e_i^{(r)}$ $(1 \le i \le N,\ 1 \le r \le L)$.
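The success rate $S_r$ is a Monte Carlo estimate, and the storage capacity used in the Concluding Remarks is defined there as the area covered by the success-rate curve $S_r(L/N)$. The sketch below assumes a hypothetical callable `run_trial(rng)` that draws a fresh set of embedded vectors, runs the dynamics of eq.(10) (not reproduced here) and reports whether the retrieval succeeded; evaluating the area with the trapezoidal rule is likewise an assumption.

```python
import numpy as np

def success_rate(run_trial, n_trials=1000, seed=0):
    """S_r: fraction of successful retrievals over n_trials independent trials.

    run_trial(rng) is a hypothetical callable: it should draw a new set of
    embedded vectors, run the retrieval dynamics and return True on success.
    """
    rng = np.random.default_rng(seed)
    successes = sum(bool(run_trial(rng)) for _ in range(n_trials))
    return successes / n_trials

def storage_capacity(loading_rates, success_rates):
    """Area under the S_r(L/N) curve, evaluated with the trapezoidal rule."""
    x = np.asarray(loading_rates, dtype=float)
    y = np.asarray(success_rates, dtype=float)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
```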

To switch from the autocorrelation dynamics just after the initial state ($t \sim 1$) to the entropy based dynamics ($t \sim T_{max}$), the parameter $\alpha$ in eq.(10) was simply controlled by

$$ \alpha = \alpha_{max}\,\frac{t}{T_{max}} \qquad (0 \le t \le T_{max}), \tag{18} $$

where $T_{max}$ and $\alpha_{max}$ are the maximum number of update iterations according to eq.(10) and the maximum value of $\alpha$, respectively.

A Generalised Entropy Based Associative Model

193

Choosing $N = 200$, $\eta = 1$, $T_{max} = 25$, $L/N = 0.5$ and $\alpha_{max} = 1$, we first present an example of the dynamics of the overlaps in Figs. 1(a) and (b) (entropy based approach). Therein the cross symbols (×) and the open circles (o) represent, respectively, the success of retrievals, in which eqs.(5a) and (5b) are satisfied, and the entropy defined by eq.(2), for a retrieval process. In addition, the time dependence of the parameter $\alpha/\alpha_{max}$ defined by eq.(18) is depicted as dots (•). Fig. 1 confirms that, after a transient state, the complete association corresponding to eqs.(5a) and (5b) can be achieved.

Fig. 1. The time dependence of the overlaps $o^{(r)}(t)$ of the present entropy based model defined by eq.(10): (a) $H_d/N = 0.1$; (b) $H_d/N = 0.3$.

The dependence of the success rate $S_r$ on the loading rate $L/N$ is depicted in Figs. 2(a) and (b), for $H_d/N = 0.3$ and $N = 100$, for the entropy approach and the associatron, respectively. From these results, one may confirm the larger memory capacity of the presently proposed model defined by eq.(10) in comparison with the conventional autoassociation model defined by eq.(11). In practice, it is found that the present approach may achieve a higher memory capacity than the conventional autocorrelation strategy, for the analogue embedded vectors as well as for the previously studied binary case [15,16,25].

Fig. 2. The dependence of the success rate $S_r(L/N)$ on the loading rate $L/N$: (a) the present entropy based model defined by eq.(10) (memory capacity 0.9999); (b) the conventional associatron model defined by eq.(11) (memory capacity 0.0134). Here the Hamming distance is set to $H_d/N = 0.3$.

4 Concluding Remarks

To conclude this work, Fig. 3 shows the dependence of the storage capacity, defined as the area under success-rate curves such as those of Fig. 2, on the Hamming distance, for the analogue embedded vectors (Ana) as well as for the previous binary ones (Bin). OL and CL denote the orthogonal learning model and the autocorrelation learning model, respectively. There one may again see the great advantage of the present model, based on an entropy functional to be minimized, over the conventional quadratic form [12,13], even for the analogue embedded vectors. In fact, the present model shows a considerably larger storage capacity than the associatron over $H_d/N \le 0.5$. Memory retrieval for the associatron, which is based on quadratic Lyapunov functionals to be minimized, becomes troublesome near $H_d/N = 0.5$, as seen in Fig. 3, since the directional cosine between the initial vector and a target pattern eventually vanishes there. Remarkably, even in such a case, the present model attains a large memory capacity because of the higher-order correlations involved in eq.(10), as expected from Figs. 1 and 2, for the analogue vectors as well as for the binary ones previously investigated [15,16,25].

Fig. 3. The dependence of the storage capacity (memory capacity) on the Hamming distance $H_d/N$. The curves Entropy based Model (OL:Ana), (OL:Bin) and (CL:Bin) are for the entropy based approach with eq.(10) under orthogonal learning (OL) and autocorrelation learning (CL) [16,17], where Ana and Bin denote the analogue and the binary embedded vectors, respectively. In addition, Associatron (OL:Bin) is the associatron with orthogonal learning [13], and Associatron (OL:Wii=0:Bin) is the associatron with orthogonal learning under the condition $w_{ii} = 0$ [12].

In the present paper, we have proposed an entropy based association model in place of the conventional autocorrelation dynamics. The numerical results show that a large memory capacity may be achieved on the basis of the entropy approach. This advantage in the association property of the present model is considered to result from the fact that the present dynamics for updating the internal state, eq.(10), ensures that the entropy of eq.(2) is minimized under the conditions of eqs.(5a) and (5b), which correspond to the successful retrieval of a target pattern. In other words, the higher-order correlations in the presently proposed dynamics, eq.(10), which were ignored in the conventional approaches [1-11], were found to play an important role in improving the memory capacity, or the retrieval ability.

As a future problem, it seems worthwhile to incorporate chaotic dynamics into the present model by introducing a periodic activation function, such as a sinusoidal one, as a nonmonotonic activation function [14]. The entropy based approach [15] with chaos dynamics [14] is now in progress and will be reported elsewhere, together with the synergetic models [17-24], in the near future.

References

1. Anderson, J.A.: A Simple Neural Network Generating Interactive Memory. Mathematical Biosciences 14, 197–220 (1972)
2. Kohonen, T.: Correlation Matrix Memories. IEEE Transactions on Computers C-21, 353–359 (1972)
3. Nakano, K.: Associatron - a Model of Associative Memory. IEEE Trans. SMC-2, 381–388 (1972)
4. Amari, S.: Neural Theory of Association and Concept Formation. Biological Cybernetics 26, 175–185 (1977)
5. Amit, D.J., Gutfreund, H., Sompolinsky, H.: Storing Infinite Numbers of Patterns in a Spin-glass Model of Neural Networks. Physical Review Letters 55, 1530–1533 (1985)
6. Gardner, E.: Structure of Metastable States in the Hopfield Model. Journal of Physics A 19, L1047–L1052 (1986)
7. Kohonen, T., Ruohonen, M.M.: Representation of Associated Pairs by Matrix Operators. IEEE Transactions on Computers C-22, 701–702 (1973)
8. Amari, S., Maginu, K.: Statistical Neurodynamics of Associative Memory. Neural Networks 1, 63–73 (1988)
9. Morita, M.: Associative Memory with Nonmonotone Dynamics. Neural Networks 6, 115–126 (1993)
10. Yanai, H.-F., Amari, S.: Auto-associative Memory with Two-stage Dynamics of non-monotonic neurons. IEEE Transactions on Neural Networks 7, 803–815 (1996)
11. Shiino, M., Fukai, T.: Self-consistent Signal-to-noise Analysis of the Statistical Behaviour of Analogue Neural Networks and Enhancement of the Storage Capacity. Phys. Rev. E 48, 867 (1993)
12. Kanter, I., Sompolinsky, H.: Associative Recall of Memory without Errors. Phys. Rev. A 35, 380–392 (1987)
13. Personnaz, L., Guyon, I., Dreyfus, G.: Information Storage and Retrieval in Spin-Glass like Neural Networks. J. Phys. (Paris) Lett. 46, L-359 (1985)
14. Nakagawa, M.: Chaos and Fractals in Engineering, p. 944. World Scientific Inc., Singapore (1999)
15. Nakagawa, M.: Autoassociation Model based on Entropy Functionals. In: Proc. of NOLTA 2006, pp. 627–630 (2006)
16. Nakagawa, M.: Entropy based Associative Model. IEICE Trans. Fundamentals E89-A(4), 895–901 (2006)
17. Fuchs, A., Haken, H.: Pattern Recognition and Associative Memory as Dynamical Processes in a Synergetic System I. Biological Cybernetics 60, 17–22 (1988)
18. Fuchs, A., Haken, H.: Pattern Recognition and Associative Memory as Dynamical Processes in a Synergetic System II. Biological Cybernetics 60, 107–109 (1988)
19. Fuchs, A., Haken, H.: Dynamic Patterns in Complex Systems. In: Kelso, J.A.S., Mandell, A.J., Shlesinger, M.F. (eds.), World Scientific, Singapore (1988)
20. Haken, H.: Synergetic Computers and Cognition. Springer, Heidelberg (1991)
21. Nakagawa, M.: A Study of Association Model based on Synergetics. In: Proceedings of International Joint Conference on Neural Networks 1993, Nagoya, Japan, pp. 2367–2370 (1993)
22. Nakagawa, M.: A Synergetic Neural Network. IEICE Fundamentals E78-A, 412–423 (1995)
23. Nakagawa, M.: A Synergetic Neural Network with Crosscorrelation Dynamics. IEICE Fundamentals E80-A, 881–893 (1997)
24. Nakagawa, M.: A Circularly Connected Synergetic Neural Networks. IEICE Fundamentals E83-A, 881–893 (2000)
25. Nakagawa, M.: Entropy based Associative Model. In: Proceedings of ICONIP 2006, pp. 397–406. Springer, Heidelberg (2006)

The Detection of an Approaching Sound Source

Using Pulsed Neural Network

Kaname Iwasa, Takeshi Fujisumi, Mauricio Kugler, Susumu Kuroyanagi, Akira Iwata, Mikio Danno, and Masahiro Miyaji

Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya, 466-8555, Japan

kaname@mars.elcom.nitech.ac.jp

Toyota InfoTechnology Center, Co., Ltd,

6-6-20 Akasaka, Minato-ku, Tokyo, 107-0052, Japan

Toyota Motor Corporation,

1 Toyota-cho, Toyota, Aichi, 471-8572, Japan

Abstract. Current automobile safety systems based on video cameras and movement sensors fail when objects are out of the line of sight. This paper proposes a system based on pulsed neural networks that is able to detect whether a sound source is approaching a microphone or moving away from it. The system, based on pulsed neuron (PN) models, compares the sound level difference between consecutive instants of time in order to determine the relative movement of the source. Moreover, the combined level difference information of all frequency channels makes it possible to identify the type of the sound source. Experimental results show that, for three different vehicle sounds, the relative movement and the sound source type could be successfully identified.

1 Introduction

Driving safety is one of the major concerns of the automotive industry nowadays. Video cameras and movement sensors are used to improve the driver's perception of the environment surrounding the automobile [1][2]. These methods perform well when detecting objects (e.g., cars, bicycles, and people) that are in the line of sight of the sensor, but fail in the case of obstructions or blind spots. Moreover, using multiple cameras or sensors to cover blind spots increases the size and cost of the safety system.

Humans, in contrast, are able to perceive people and vehicles around them using the information provided by the auditory system [3]. If this ability could be reproduced by artificial devices, complementary safety systems for automobiles would become possible. Because of diffraction, sound waves can bend around objects and be detected even when the source is not in the direct line of sight.

A possible approach for processing temporal data is the use of Pulsed Neuron (PN) models [4]. This type of neuron deals with input signals in the form of pulse trains, using an internal membrane potential as a reference for generating pulses at its output. PN models can directly deal with temporal data and can be efficiently implemented in hardware, due to their simple structure. Furthermore, high processing speeds can be achieved, as PN model based methods are usually highly parallelizable.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 199–208, 2008.

© Springer-Verlag Berlin Heidelberg 2008

200

K. Iwasa et al.

A sound localization system based on pulsed neural networks has already been proposed in [5], and a sound source identification system, with a corresponding implementation on FPGA, was introduced in [6]. This paper focuses specifically on the relative moving direction of a sound emitting object, and proposes a method to detect, using a microphone, whether a sound source is approaching or moving away. The system, based on PN models, compares the sound level difference between consecutive instants of time in order to determine the relative movement of the source. Moreover, the proposed method also identifies the type of the sound source by processing its spectral information with a PN model based competitive learning pulsed neural network.

2 Pulsed Neuron Model

When processing time series data (e.g., sound), it is important to consider the

time relation and to have computationally inexpensive calculation procedures

to enable real-time processing. For these reasons, a PN model is used in this

research.

Fig. 1. Pulsed neuron model (input pulses, local membrane potentials $p_k(t)$, the inner potential of the neuron $I(t)$, and output pulses $o(t)$)

Figure 1 shows the structure of the PN model. When an input pulse $IN_k(t)$ reaches the $k$-th synapse, the local membrane potential $p_k(t)$ is increased by the value of the weight $w_k$. The local membrane potentials decay exponentially across time with a time constant $\tau$. The neuron's output $o(t)$ is given by

$$ o(t) = H\big(I(t) - \theta\big), \tag{1} $$

$$ I(t) = \sum_{k=1}^{n} p_k(t), \tag{2} $$

$$ p_k(t) = w_k\, IN_k(t) + p_k(t-1)\, e^{-1/\tau}, \tag{3} $$

where $n$ is the total number of inputs, $I(t)$ is the inner potential, $\theta$ is the threshold and $H(\cdot)$ is the unit step function. The PN model also has a refractory period $t_{ndti}$, during which the neuron is unable to fire, independently of the membrane potential.
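A discrete-time sketch of eqs.(1)-(3), including the refractory period, is given below. Time is measured in simulation steps, the decay per step is taken as $e^{-1/\tau}$, and $H(0)$ is taken as 1; these conventions and the name `simulate_pn` are assumptions for illustration.

```python
import numpy as np

def simulate_pn(inputs, w, tau, theta, refractory):
    """Discrete-time pulsed neuron following eqs.(1)-(3).

    inputs: (T, n) array of 0/1 input pulses IN_k(t)
    w: (n,) synaptic weights w_k
    tau: decay time constant, in time steps
    theta: firing threshold
    refractory: number of steps after a spike during which the neuron cannot fire
    """
    inputs = np.asarray(inputs, dtype=float)
    w = np.asarray(w, dtype=float)
    decay = np.exp(-1.0 / tau)
    p = np.zeros(inputs.shape[1])            # local membrane potentials p_k(t)
    out = np.zeros(inputs.shape[0], dtype=int)
    last_spike = -refractory - 1             # no spike before t = 0
    for t in range(inputs.shape[0]):
        p = w * inputs[t] + p * decay        # eq.(3)
        inner = p.sum()                      # eq.(2): inner potential I(t)
        if t - last_spike > refractory and inner >= theta:   # eq.(1) with H(0) = 1
            out[t] = 1
            last_spike = t
    return out
```

With a constant input pulse every step and a refractory period of two steps, the neuron fires at most once every three steps, independently of how high the potential rises.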

3 The Proposed System

The basic structure of the proposed system is shown in Fig. 2. This system consists of three main blocks: the frequency-pulse converter, the level difference extractor and the sound source classifier, of which the last two are based on PN models.

The relative movement (approaching or moving away) of the sound source is determined by the sound level variation. The system compares a signal level $x(t)$ from a microphone with the level at a previous time, $x(t-\Delta t)$. If $x(t) > x(t-\Delta t)$, the sound source is getting closer to the microphone; if $x(t) < x(t-\Delta t)$, it is moving away. After the level difference has been extracted, the outputs of the level difference extractors contain the spectral pattern of the input sound, which is then used for recognizing the type of the source.
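The comparison rule above can be illustrated on a broadband signal. This hypothetical sketch measures frame-wise RMS levels in dB and compares each level with the one $\Delta t$ earlier; the proposed system instead performs the comparison per frequency channel with PN models, so the code only illustrates the decision rule, and the frame length and tolerance are assumptions.

```python
import numpy as np

def frame_levels_db(x, frame_len):
    """RMS level in dB of consecutive, non-overlapping frames of the signal x."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(rms + 1e-12)      # small offset avoids log10(0) for silent frames

def relative_movement(levels, delay_frames, tol_db=0.0):
    """+1 (approaching) if x(t) > x(t - Δt), -1 (moving away) if x(t) < x(t - Δt), else 0."""
    if delay_frames < 1:
        raise ValueError("delay_frames must be at least 1")
    levels = np.asarray(levels, dtype=float)
    diff = levels[delay_frames:] - levels[:-delay_frames]
    return np.where(np.abs(diff) <= tol_db, 0, np.sign(diff)).astype(int)
```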

3.1 Filtering and Frequency-Pulse Converter

Initially, the input signal must be pre-processed and converted to a train of pulses. A bank of 4th-order band-pass filters decomposes the signal into 13 frequency channels equally spaced on a logarithmic scale from 500 Hz to 2 kHz. Each frequency channel is modified by the non-linear function shown in Eq.(4), and the envelope of the resulting signal is extracted by a 400 Hz low-pass filter. Finally, each output signal is independently converted to a pulse train whose rate is proportional to the amplitude of the signal.

Fig. 2. The structure of the recognition system (input signal → filter bank & frequency-pulse converter → one level difference extractor per frequency channel, each comparing $x(t)$ with the time-delayed $x(t-\Delta t)$ → sound source classifier → approaching detection & sound classification)

F (t) =

x(t)

x(t) ≥ 0

x(t)

x(t) < 0

(4)
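Two parts of this front end are easy to sketch: the 13 centre frequencies equally spaced on a logarithmic scale between 500 Hz and 2 kHz (assumed here to be geometrically spaced band centres), and the conversion of a non-negative envelope into a pulse train whose rate is proportional to its amplitude. The deterministic accumulate-and-fire converter below is one possible choice and is not claimed to be the authors' converter; the non-linearity of Eq.(4) and the 400 Hz envelope filter are not reproduced.

```python
import numpy as np

def channel_centres(f_lo=500.0, f_hi=2000.0, n_channels=13):
    """Centre frequencies equally spaced on a logarithmic scale (geometric spacing)."""
    return np.geomspace(f_lo, f_hi, n_channels)

def rate_code(envelope, gain=1.0):
    """Turn a non-negative envelope into a 0/1 pulse train whose rate is
    proportional to its amplitude (accumulate-and-fire, at most one pulse per step)."""
    acc = 0.0
    pulses = np.zeros(len(envelope), dtype=int)
    for t, amplitude in enumerate(envelope):
        acc += gain * max(float(amplitude), 0.0)   # negative values are treated as zero
        if acc >= 1.0:
            pulses[t] = 1
            acc -= 1.0
    return pulses
```

With `gain=1.0`, an envelope held at 0.25 produces one pulse every four steps, so doubling the amplitude doubles the pulse rate until it saturates at one pulse per step.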

3.2 Level Difference Extractor

Each pulse train generated by the frequency-pulse converter is fed independently into a Level Difference Extractor (LDE). The LDE, shown in Fig. 3, is composed of two parts, the Lateral Superior Olive (LSO) model and the Level Mapping Two (LM2) model [7]. In both the LSO and the LM2 models, each neuron works as in Eq.(3). The LSO is responsible for extracting the difference between the current and the delayed input itself, while the LM2 extracts the envelope of the complex firing pattern.

The pulse train of each frequency channel is fed into an LSO model. The potential $I^{LSO}_{i,f}(t)$ of the $i$-th LSO neuron in the $f$-th frequency channel is calculated as follows:

$$ I^{LSO}_{i,f}(t) = p^{N}_{i,f}(t) + p^{D}_{i,f}(t), \tag{5} $$

$$ p^{N}_{i,f}(t) = w^{N}_{i,f}\, IN_f(t) + p^{N}_{i,f}(t-1)\, e^{-1/\tau_{LSO}}, \tag{6} $$

$$ p^{D}_{i,f}(t) = w^{D}_{i,f}\, IN_f(t-\Delta t) + p^{D}_{i,f}(t-1)\, e^{-1/\tau_{LSO}}, \tag{7} $$

where $\tau_{LSO}$ is the time constant of the LSO neuron and the weights $w^{N}_{i,f}$ and $w^{D}_{i,f}$ are defined as:

N

i,f

⎧

⎪

⎨

⎪

⎩

0.0

i = 0

1.0

i > 0

−10

−b < i < 0

−10

−(K−i)

i ≤ −b

i,f

⎧

⎪

⎨

⎪

⎩

0.0

i = 0

1.0

i < 0

−10

0 < i < b

−10

K−i

i ≥ −b ,

(8)

where $\alpha$ and $\gamma$ are parameters for adjusting the weights, $K$ is the index of the last neuron on each side of the LSO (totalling $2K+1$ neurons, including the central neuron) and $b$ is the index of the last inner neuron on each side of the LSO. The inner neurons have current-input weights smaller than their delayed-input weights. They are used to make the features of the input level difference clear when the input level difference is small.
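Eqs.(5)-(7) describe a neuron that integrates the current pulse train through one weight and the pulse train delayed by $\Delta t$ through another. The sketch below simulates a single such LSO neuron for a given pair of weights; the weight values of eq.(8) are not reproduced, and the step-based time constant and the name `lso_potential` are assumptions.

```python
import numpy as np

def lso_potential(pulses, delay, w_now, w_delayed, tau):
    """Potential of one LSO neuron in one frequency channel, following eqs.(5)-(7).

    pulses: 0/1 pulse train IN_f(t) of the channel
    delay: Δt expressed in time steps
    w_now, w_delayed: weights on the current and on the delayed input
    tau: decay time constant, in time steps
    """
    pulses = np.asarray(pulses, dtype=float)
    decay = np.exp(-1.0 / tau)
    p_now = p_del = 0.0
    potential = np.zeros(len(pulses))
    for t in range(len(pulses)):
        delayed = pulses[t - delay] if t >= delay else 0.0
        p_now = w_now * pulses[t] + p_now * decay        # eq.(6): current input
        p_del = w_delayed * delayed + p_del * decay      # eq.(7): delayed input
        potential[t] = p_now + p_del                     # eq.(5)
    return potential
```

With a positive weight on the current input and a negative weight on the delayed one, a steady pulse train gives a constant potential once the delay has elapsed, while a rising pulse rate pushes the potential upwards.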

The larger the signal becomes, the more neurons fire in the LSO model. The LM2 stage then generates a clearer output by extracting the envelope of the firing pattern generated by the LSO. The potentials in the LM2 are calculated as follows:

LM 2

l,f

(t) = p

l,f

(t) + p

l,f

(t)

(9)

l,f

(t) = m

i,f

(t) + p

l,f

(t − 1)e

−

τLM2

(10)

l,f

(t) = −m

i,f

(t) + p

l,f

(t − 1)e

−

τLM2

(11)

where $\tau_{LM2}$ is the time constant of the LM2 neuron and $m_{i,f}(t)$ is the output of the $i$-th LSO neuron in the $f$-th frequency channel.

Fig. 3. Level difference extractor

3.3 Sound Source Classifier

The sound source classifier is based on the Competitive Learning Network using Pulsed Neurons (CONP) proposed in [5]. The basic structure of CONP is shown in Fig. 4.

The CONP is constructed from PN models. In the learning process of CONP, the neuron whose weights are most similar to the input (the winner neuron) should be chosen for learning in order to obtain a topological relation between inputs and outputs. However, when two or more neurons fire, it is difficult to decide which one is the winner, as their outputs are only pulses and not real values. To solve this, CONP has extra external units called control neurons. Based on the output of the Competitive Learning (CL) neurons, the control neurons' outputs increase or decrease the inner potential of all CL neurons, keeping the number of firing neurons equal to one. Controlling the inner potential is equivalent to controlling the threshold. Two types of control neurons are used in this work. The No-Firing Detection (NFD) neuron fires when no CL neuron fires, increasing their inner potential. Complementarily, the Multi-Firing Detection (MFD) neuron fires when two or more CL neurons fire at the same time, decreasing their inner potential [5].

The CL neurons are also controlled by another potential, named the input potential $p_{in}(t)$, and a gate threshold $\theta_{gate}$. The input potential is calculated as the sum of the inputs (with unitary weights), representing the rate of the input pulse train. When $p_{in}(t) < \theta_{gate}$, the CL neurons are not updated by the control neurons and become unable to fire, as the input train has too small a potential to be responsible for an output firing. Furthermore, the input potential of each CL neuron is decreased over time by a factor $\beta$, to follow rapid changes in the inner potential and improve its adjustment.


Fig. 4. Competitive Learning Network using Pulsed Neurons (CONP)

Considering all the described adjustments to the inner potential of CONP neurons, the output equation (1) of each CL neuron becomes:

$$ o(t) = H\Big(\sum_{k=1}^{n} p_k(t) - \theta + p_{nfd}(t) - p_{mfd}(t) - \beta \cdot p_{in}(t)\Big), \tag{12} $$

where $p_{nfd}(t)$ and $p_{mfd}(t)$ correspond, respectively, to the potentials generated by the NFD and MFD neurons' outputs, $p_{in}(t)$ is the input potential and $\beta$ $(0 \le \beta \le 1)$ is a parameter.
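A single time step of eq.(12), together with the gate and the NFD/MFD firing rules, can be sketched as follows. The control potentials are passed in as plain numbers here, whereas in CONP they are produced by the control neurons' own PN dynamics; the function names and the $H(0) = 1$ convention are assumptions.

```python
import numpy as np

def cl_outputs(syn_potential, theta, p_nfd, p_mfd, p_in, beta, theta_gate):
    """Firing (0/1) of the CL neurons at one time step, following eq.(12).

    syn_potential: summed local membrane potentials of each CL neuron
    p_nfd, p_mfd: potentials fed back by the NFD and MFD control neurons
    p_in: input potential (sum of the input pulses with unit weights)
    """
    syn_potential = np.asarray(syn_potential, dtype=float)
    if p_in < theta_gate:
        # Gate closed: the input train is too weak to cause an output firing.
        return np.zeros(syn_potential.shape, dtype=int)
    u = syn_potential - theta + p_nfd - p_mfd - beta * p_in
    return (u >= 0.0).astype(int)            # unit step H(.), with H(0) = 1

def control_neurons(outputs):
    """(NFD, MFD) firing: NFD when no CL neuron fires, MFD when two or more fire."""
    n_firing = int(np.sum(outputs))
    return int(n_firing == 0), int(n_firing >= 2)
```

In a full simulation, an NFD firing would raise `p_nfd` for the next steps (pushing all CL neurons towards firing) and an MFD firing would raise `p_mfd`, until exactly one CL neuron fires.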

4 Experimental Results

Three different sound sources were used in the experiments: “police car”, “ambulance” and “scooter”. The first two correspond to the alarm sounds of the vehicles, while the last corresponds to the engine sound of a scooter. All the signals were recorded from a static sound source. The moving sound source signals were generated by computer, with the sound intensity at each instant of time calculated as:

S(t) = 20S

log

d(t)

(13)

where $I$ is the sound intensity at the center position, and $d$ and $d(t)$ are, respectively, the distance between the sensor and the sound source at the center position and the distance at time $t$. All signals have a duration of 4.0 s, and the sound source is at the center position, closest to the sensor, at 2.0 s, as shown in Fig. 5.

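Fig. 5. Sound source movement in the experiments (the sound source moves past the microphone between 0.0 s and 4.0 s, passing the center position at 2.0 s; $d(t)$ is its distance from the microphone and $S(t)$ its intensity; the distance marked in the figure is 1 m)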
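Eq.(13) is only partially legible in this copy, so the sketch below does not reproduce it. Instead it assumes the standard free-field inverse-distance law (the level falls by $20\log_{10}(d(t)/d_c)$ dB) and a straight-line trajectory passing the microphone at a closest distance $d_c$ at 2.0 s. The closest distance of 1 m (the distance marked in Fig. 5), the speed of 10 m/s and all function names are illustrative assumptions.

```python
import numpy as np

def distance(t, d_c=1.0, t_c=2.0, speed=10.0):
    """Distance d(t) to a source moving on a straight line that passes closest to the
    microphone (distance d_c) at time t_c. The speed (m/s) is a hypothetical value."""
    return np.sqrt(d_c ** 2 + (speed * (np.asarray(t, dtype=float) - t_c)) ** 2)

def level_db(t, level_c=0.0, d_c=1.0, t_c=2.0, speed=10.0):
    """Level relative to the center position under the inverse-distance law."""
    return level_c - 20.0 * np.log10(distance(t, d_c, t_c, speed) / d_c)

def moving_source(x_static, fs, d_c=1.0, t_c=2.0, speed=10.0):
    """Scale a static recording so that its amplitude follows d_c / d(t)."""
    x_static = np.asarray(x_static, dtype=float)
    t = np.arange(len(x_static)) / fs
    return x_static * d_c / distance(t, d_c, t_c, speed)
```

Applied to a 4.0 s recording at 48 kHz, `moving_source` yields a signal whose level rises until 2.0 s and falls afterwards, matching the approaching and moving-away halves used in the experiments.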

Table 1. Parameters of each module used in the experiments

Input sound
Sampling frequency: 48 kHz
Quantization bits: 16 bits
Number of frequency channels: 13 channels
Delay time $\Delta t$: 0.4 s

Level difference extractor
Number of total LSO neurons $2K+1$: 51 units
Number of inner LSO neurons $2b+1$: 11 units
Number of output neurons: 48 units
Threshold $\theta_{LSO}/\theta_{LM2}$: 0.001 / 0.001
Time constant $\tau_{LSO}/\tau_{LM2}$: 0.1 s / 35.0 μs
Parameter $\alpha/\gamma$: 60 / 60

4.1 Level Difference Information Extraction

The level difference information was extracted as described in Section 3.2. The parameters used for signal acquisition, preprocessing and level difference extraction are shown in Table 1.

Figure 6 shows the output of the LDE model for the “police car” signal in four distinct time intervals. The x-axis corresponds to the index of the neurons in the LM2, representing the level difference information, and the y-axis corresponds to the frequency channels. The gray level intensity represents the rate of the output pulse train.

The firing pattern differs significantly between time intervals, especially when comparing the graphs of opposite relative movements. Although the LM2 could not successfully extract the envelope from the firing pattern of the signals corresponding to a sound source moving away from the sensor, the result is clear enough to distinguish it from an approaching sound source signal.

Figure 7 shows the firing patterns of each kind of sound for the approaching (interval 0.0 to 2.0 s) and moving-away (2.0 to 4.0 s) cases. As different frequency components present different firing information, it is possible to classify the sound source, as described in the next section.

Fig. 6. Level Difference Extractor output of the “police car” dataset

Fig. 7. Comparing the output of level difference information for each dataset

Table 2. Parameters of CONP used in the experiments

Competitive learning neurons
Number of inputs of CL neurons: 637 units
Number of CL neurons: 30 units
Threshold: $1.0 \times 10^{-4}$
Gating threshold $\theta_{gate}$: 150.0
Rate for input pulse frequency: 0.0629
Time constant: 0.1 s
Refractory period $t_{ndti}$: 10 ms
Learning coefficient: $2.0 \times 10^{-8}$
Learning iterations: 1000

Control neurons (NFD/MFD)
Time constant $\tau_{NFD}/\tau_{MFD}$: 0.5 ms / 1.0 ms
Threshold $\theta_{NFD}/\theta_{MFD}$: $-1.0 \times 10^{-3}$ / 2.0
Connection weight to each CL neuron: 16.0 / −16.0

Table 3. Results of sound recognition

(A = approaching, M = moving away)

Recognition Rate[%]

police car

ambulance

scooter

Input Sound

police

70.6

6.8

2.4

7.3

12.9

0.0

6.8

88.3

0.0

4.9

0.0

ambulance

1.1

4.2

82.8

9.9

2.0

0.0

3.8

0.2

7.3

86.3

0.0

2.4

scooter

0.0

5.7

0.0

94.3

0.0

1.9

0.3

5.4

0.0

92.4

4.2 Sound Source Classification

The firing patterns provided by all the level difference extractors are recognized by the CONP model described in Section 3.3. The CONP model was trained with the parameters shown in Table 2. Table 3 shows the accuracy of the CONP model for each dataset. The recognition rate is defined as the ratio between the number of firings of the neurons corresponding to the correct vehicle and relative movement and the total number of firings. The correct sound source and relative direction could be recognized with an average accuracy of 85.8%.
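The recognition rate is a ratio of firing counts, and the reported average follows from the correct-class entries of Table 3. Assuming those entries are 70.6, 88.3, 82.8, 86.3, 94.3 and 92.4% (police car, ambulance and scooter, each approaching and then moving away), their mean reproduces the 85.8% quoted above; `recognition_rate` is a hypothetical helper.

```python
import numpy as np

def recognition_rate(firing_counts, correct_index):
    """Percentage of CL-neuron firings attributed to the correct (vehicle, movement) class."""
    counts = np.asarray(firing_counts, dtype=float)
    return 100.0 * counts[correct_index] / counts.sum()

# Correct-class entries read from Table 3, in the order
# police car (A), police car (M), ambulance (A), ambulance (M), scooter (A), scooter (M).
diagonal = [70.6, 88.3, 82.8, 86.3, 94.3, 92.4]
average_accuracy = sum(diagonal) / len(diagonal)   # about 85.78 %, reported as 85.8 %
```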

The results for the “scooter” dataset present a better recognition rate than those for the “police car” and “ambulance” datasets. The reason is that the sound signal of the “scooter” dataset is constant over time, in contrast to the alarm sounds of the other two vehicles, which actually consist of two different, alternating sounds. Thus, the CONP model can be trained more efficiently with the “scooter” data than with the others, which would require more data to obtain a comparable accuracy.

5 Conclusions

This paper proposes a system for detecting an approaching sound source and classifying it using pulsed neural networks. The system extracts the level difference information from pulse trains corresponding to several frequency bands. The firing pattern is then classified by a CONP model, which identifies the type of the sound source and recognizes its relative movement.

The experimental results confirmed that the PN model based level difference extractor can successfully detect the relative movement (approaching or moving away) of a sound source. Using the firing pattern provided by the LDE, the sound source type and relative movement could be correctly classified with an average accuracy of 85.8%.

Future work includes the detection of the sound source position (its distance from the sensor) and the combination of the proposed system with a sound localization method. A hardware implementation of the proposed system using an FPGA device is also in progress.

Acknowledgment

This research is supported in part by a grant from the Hori Information Science

Promotion Foundation.

References

1. Gupte, S., Masoud, O., Martin, R.F.K., Papanikolopoulos, N.P.: Detection and Classification of Vehicles. IEEE Trans. ITS 3(1), 37–47 (2002)
2. Wang, C.-C., Thorpe, C., Thrun, S.: Online simultaneous localization and mapping with detection and tracking of moving objects: theory and results from a ground vehicle in crowded urban areas. In: Proceedings of ICRA 2003, pp. 842–849 (2003)
3. Pickles, J.O.: An Introduction to the Physiology of Hearing. Academic Press, London (1988)
4. Maass, W., Bishop, C.M.: Pulsed Neural Networks. MIT Press, Cambridge (1998)
5. Kuroyanagi, S., Iwata, A.: A Competitive Learning Pulsed Neural Network for Temporal Signals. In: Proceedings of ICONIP 2002, pp. 348–352 (2002)
6. Iwasa, K., Kuroyanagi, S., Iwata, A.: A Sound Localization and Recognition System using Pulsed Neural Networks on FPGA. In: Proceedings of the International Joint Conference on Neural Networks 2007 (to appear, August 2007)
7. Kuroyanagi, S., Iwata, A.: Auditory Pulse Neural Network Model to Extract the Inter-Aural Time and Level Difference for Sound Localization. IEICE Trans. Information and Systems E77-D(4), 466–474 (1994)
8. Kuroyanagi, S., Iwata, A.: Auditory Pulse Neural Network Model for Sound Localization -Mapping of the ITD and ILD-. IEICE J78-D2(2), 267–276 (1996) (in Japanese)

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 209–218, 2008.
