3 Numerical Results

The embedded vectors are set to analogue random vectors as follows:

e_i^(r) = z_i^(r)   (1 ≤ i ≤ N, 1 ≤ r ≤ L),   (12)

where the z_i^(r) (1 ≤ i ≤ N, 1 ≤ r ≤ L) are zero-mean pseudo-random numbers between -1 and +1. For simplicity, the activation function of eq. (1) is assumed to be a piecewise linear function instead of the signum form used previously for the binary embedded vectors [25], and is set to

s_i = f(u_i) = u_i (1 + sgn(1 - |u_i|))/2 + sgn(u_i) (1 - sgn(1 - |u_i|))/2,   (13)

where sgn(·) denotes the signum function defined by

sgn(x) = -1 (x < 0), 0 (x = 0), +1 (x > 0).   (14)
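As a quick illustration, the following minimal Python sketch implements eqs. (13) and (14); the internal-state symbol u_i follows the reconstruction above, and the vectorised form is a convenience of this sketch, not the author's code.

```python
import numpy as np

def sgn(x):
    """Signum function of eq. (14): -1, 0, +1 for negative, zero, positive."""
    return np.sign(x)

def f(u):
    """Piecewise linear activation of eq. (13): identity inside [-1, 1],
    hard-limited to sgn(u) outside."""
    u = np.asarray(u, dtype=float)
    inside = 0.5 * (1.0 + sgn(1.0 - np.abs(u)))   # 1 when |u| < 1
    outside = 0.5 * (1.0 - sgn(1.0 - np.abs(u)))  # 1 when |u| > 1
    return u * inside + sgn(u) * outside

# f(0.3) -> 0.3, f(2.5) -> 1.0, f(-1.7) -> -1.0
```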
The initial vector s_i(0) (1 ≤ i ≤ N) is set to

s_i(0) = -e_i^(s) (1 ≤ i ≤ H_d),   e_i^(s) (H_d + 1 ≤ i ≤ N),   (15)

where e_i^(s) is the target pattern to be retrieved and H_d is the Hamming distance between the initial vector s_i(0) and the target vector e_i^(s). Retrieval is regarded as successful if

m^(s)(t) = (1/N) Σ_{i=1}^{N} e_i^{†(s)} s_i(t) → ±1   (16)

for t ≫ 1, in which the system is in a steady state such that

s_i(t+1) = s_i(t),   (17a)
u_i(t+1) = u_i(t).   (17b)
To quantify the retrieval ability of the present model, the success rate S_r is defined as the fraction of successful retrievals over 1000 trials with different embedded vector sets e_i^(r) (1 ≤ i ≤ N, 1 ≤ r ≤ L). To interpolate from the autocorrelation dynamics just after the initial state (t ≈ 1) to the entropy based dynamics (t ≈ T_max), the parameter α in eq. (10) is simply controlled by

α = (t / T_max) α_max   (0 ≤ t ≤ T_max),   (18)

where T_max and α_max are the maximum values of the number of updating iterations according to eq. (10) and of α, respectively.
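The experimental protocol defined by eqs. (12), (15), (16) and (18) can be sketched in a few lines of Python. The update rule of eq. (10) is not reproduced in this section, so the sketch leaves it as a user-supplied function; the use of the directional cosine as the overlap normalisation for analogue patterns is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_patterns(L, N):
    """Analogue embedded vectors of eq. (12): zero-mean values in (-1, 1)."""
    return rng.uniform(-1.0, 1.0, size=(L, N))

def initial_state(target, H_d):
    """Initial vector of eq. (15): the first H_d components are sign-flipped."""
    s = target.copy()
    s[:H_d] *= -1.0
    return s

def overlap(target, s):
    """Overlap of eq. (16), here normalised as a directional cosine so that
    perfect retrieval gives +1 (or -1 for the reversed pattern)."""
    return float(target @ s) / (np.linalg.norm(target) * np.linalg.norm(s))

def alpha_schedule(t, T_max, alpha_max):
    """Linear control of the mixing parameter, eq. (18)."""
    return alpha_max * t / T_max

def success_rate(update, L, N, H_d, T_max, alpha_max, trials=1000):
    """Success rate S_r over independently drawn pattern sets; `update`
    stands in for one iteration of the dynamics of eq. (10)."""
    wins = 0
    for _ in range(trials):
        e = make_patterns(L, N)
        s = initial_state(e[0], H_d)
        for t in range(T_max):
            s = update(s, e, alpha_schedule(t, T_max, alpha_max))
        wins += abs(overlap(e[0], s)) > 0.999
    return wins / trials
```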
Choosing N = 200, η = 1, T_max = 25, L/N = 0.5 and α_max = 1, we first present an example of the dynamics of the overlaps in Figs. 1(a) and (b) (entropy based approach). Therein the cross symbols (×) and the open circles (o) represent the success of retrievals, in which eqs. (5a) and (5b) are satisfied, and the entropy defined by eq. (2), respectively, for a retrieval process. In addition, the time dependence of the parameter α/α_max defined by eq. (18) is depicted as dots (·). In Fig. 1, after a transient state, it is confirmed that the complete association corresponding to eqs. (5a) and (5b) is achieved.

Fig. 1. The time dependence of the overlaps m^(r) (plotted as ⟨o(n)⟩) of the present entropy based model defined by eq. (10): (a) H_d/N = 0.1; (b) H_d/N = 0.3

Next, the dependence of the success rate on the loading rate α = L/N is depicted in Figs. 2(a) and (b) for H_d/N = 0.3 and N = 100, for the entropy approach and the associatron, respectively. From these results, one may confirm the larger memory capacity of the presently proposed model defined by eq. (10) for the analogue embedded vectors.
Fig. 2. The dependence of the success rate S_r(L/N) on the loading rate α = L/N for H_d/N = 0.3: (a) the entropy based model defined by eq. (10) (memory capacity 0.9999); (b) the conventional associatron model defined by eq. (11) (memory capacity 0.0134)
4 Concluding Remarks

To conclude this work, we show the dependence of the storage capacity, defined as the area under the success rate curves such as those of Fig. 2, on the Hamming distance in Fig. 3 for the analogue embedded vectors (Ana) as well as the previous binary ones (Bin). In addition, OL and CL denote the orthogonal learning model and the autocorrelation learning model, respectively. Therein one may see again the great advantage of the present model, based on an entropy functional to be minimized rather than the conventional quadratic form [12,13], even in comparison with the conventional autoassociation model defined by eq. (11). In practice, it is found that the present approach may achieve a memory capacity beyond that of the conventional autocorrelation strategy, for the analogue embedded vectors as well as for the previously considered binary case [15,16,25].
Fig. 3. The dependence of the memory capacity on the Hamming distance H_d/N. Symbols m and n denote the entropy based approach with eq. (10) under the orthogonal learning (OL) and the autocorrelation learning (CL) [16,17], respectively; Ana and Bin denote the analogue and the binary embedded vectors. Symbol s denotes the associatron with the orthogonal learning [13], and symbol t the associatron with the orthogonal learning under the condition W_ii = 0 [12]

In fact, one may realize the considerably larger storage capacity of the present model in comparison with the associatron over H_d/N ≤ 0.5. The memory retrievals of the associatron, based on quadratic Lyapunov functionals to be minimized, become troublesome near H_d/N = 0.5, as seen in Fig. 3, since the directional cosine between the initial vector and a target pattern eventually vanishes there. Remarkably, even in such a case, the present model attains a remarkably large memory capacity because of the higher-order correlations involved in eq. (10), as expected from Figs. 1 and 2 for the analogue vectors as well as for the binary ones previously investigated [15,16,25].

In the present paper, we have proposed an entropy based association model in place of the conventional autocorrelation dynamics. From the numerical results, it was found that a large memory capacity may be achieved on the basis of the entropy approach. This advantage in the association property of the present model is considered to result from the fact that the present dynamics updating the internal state, eq. (10), ensures that the entropy, eq. (2), is minimized under the conditions eqs. (5a) and (5b), which correspond to the successful retrieval of a target pattern. In other words, the higher-order correlations in the presently proposed dynamics, eq. (10), which were ignored in the conventional approaches [1-11], were found to play an important role in improving the memory capacity, i.e., the retrieval ability.
As a future problem, it seems worthwhile to introduce chaotic dynamics into the present model by adopting a periodic activation function, such as a sinusoidal one, as a nonmonotonic activation function [14]. The entropy based approach [15] with chaos dynamics [14] is now in progress and will be reported elsewhere, together with the synergetic models [17-24], in the near future.

References

1. Anderson, J.A.: A Simple Neural Network Generating Interactive Memory. Mathematical Biosciences 14, 197–220 (1972)
2. Kohonen, T.: Correlation Matrix Memories. IEEE Transactions on Computers C-21, 353–359 (1972)
3. Nakano, K.: Associatron - a Model of Associative Memory. IEEE Trans. SMC-2, 381–388 (1972)
4. Amari, S.: Neural Theory of Association and Concept Formation. Biological Cybernetics 26, 175–185 (1977)
5. Amit, D.J., Gutfreund, H., Sompolinsky, H.: Storing Infinite Numbers of Patterns in a Spin-glass Model of Neural Networks. Physical Review Letters 55, 1530–1533 (1985)
6. Gardner, E.: Structure of Metastable States in the Hopfield Model. Journal of Physics A 19, L1047–L1052 (1986)
7. Kohonen, T., Ruohonen, M.M.: Representation of Associated Pairs by Matrix Operators. IEEE Transactions on Computers C-22, 701–702 (1973)
8. Amari, S., Maginu, K.: Statistical Neurodynamics of Associative Memory. Neural Networks 1, 63–73 (1988)
9. Morita, M.: Associative Memory with Nonmonotone Dynamics. Neural Networks 6, 115–126 (1993)
10. Yanai, H.-F., Amari, S.: Auto-associative Memory with Two-stage Dynamics of Non-monotonic Neurons. IEEE Transactions on Neural Networks 7, 803–815 (1996)
11. Shiino, M., Fukai, T.: Self-consistent Signal-to-noise Analysis of the Statistical Behaviour of Analog Neural Networks and Enhancement of the Storage Capacity. Physical Review E 48, 867 (1993)
12. Kanter, I., Sompolinsky, H.: Associative Recall of Memory without Errors. Physical Review A 35, 380–392 (1987)
13. Personnaz, L., Guyon, I., Dreyfus, G.: Information Storage and Retrieval in Spin-Glass like Neural Networks. J. Phys. (Paris) Lett. 46, L-359 (1985)
14. Nakagawa, M.: Chaos and Fractals in Engineering, p. 944. World Scientific, Singapore (1999)
15. Nakagawa, M.: Autoassociation Model based on Entropy Functionals. In: Proc. of NOLTA 2006, pp. 627–630 (2006)
16. Nakagawa, M.: Entropy based Associative Model. IEICE Trans. Fundamentals E89-A(4), 895–901 (2006)
17. Fuchs, A., Haken, H.: Pattern Recognition and Associative Memory as Dynamical Processes in a Synergetic System I. Biological Cybernetics 60, 17–22 (1988)
18. Fuchs, A., Haken, H.: Pattern Recognition and Associative Memory as Dynamical Processes in a Synergetic System II. Biological Cybernetics 60, 107–109 (1988)
19. Fuchs, A., Haken, H.: Dynamic Patterns in Complex Systems. In: Kelso, J.A.S., Mandell, A.J., Shlesinger, M.F. (eds.), World Scientific, Singapore (1988)
20. Haken, H.: Synergetic Computers and Cognition. Springer, Heidelberg (1991)
21. Nakagawa, M.: A Study of Association Model based on Synergetics. In: Proceedings of the International Joint Conference on Neural Networks 1993, Nagoya, Japan, pp. 2367–2370 (1993)
22. Nakagawa, M.: A Synergetic Neural Network. IEICE Fundamentals E78-A, 412–423 (1995)
23. Nakagawa, M.: A Synergetic Neural Network with Crosscorrelation Dynamics. IEICE Fundamentals E80-A, 881–893 (1997)
24. Nakagawa, M.: A Circularly Connected Synergetic Neural Network. IEICE Fundamentals E83-A, 881–893 (2000)
25. Nakagawa, M.: Entropy based Associative Model. In: Proceedings of ICONIP 2006, pp. 397–406. Springer, Heidelberg (2006)

The Detection of an Approaching Sound Source Using Pulsed Neural Network

Kaname Iwasa 1, Takeshi Fujisumi 1, Mauricio Kugler 1, Susumu Kuroyanagi 1, Akira Iwata 1, Mikio Danno 2, and Masahiro Miyaji 3

1 Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya, 466-8555, Japan, kaname@mars.elcom.nitech.ac.jp
2 Toyota InfoTechnology Center, Co., Ltd., 6-6-20 Akasaka, Minato-ku, Tokyo, 107-0052, Japan
3 Toyota Motor Corporation, 1 Toyota-cho, Toyota, Aichi, 471-8572, Japan

Abstract. Current automobile safety systems based on video cameras and movement sensors fail when objects are out of the line of sight. This paper proposes a system based on pulsed neural networks that detects whether a sound source is approaching a microphone or moving away from it. The system, based on PN models, compares the sound level difference between consecutive instants of time in order to determine the source's relative movement. Moreover, the combined level difference information of all frequency channels makes it possible to identify the type of the sound source. Experimental results show that, for the sounds of three different vehicles, the relative movement and the sound source type could be successfully identified.

1 Introduction

Driving safety is one of the major concerns of the automotive industry nowadays. Video cameras and movement sensors are used to improve the driver's perception of the environment surrounding the automobile [1][2]. These methods perform well when detecting objects (e.g., cars, bicycles, and people) that are in the line of sight of the sensor, but fail in the case of obstruction or dead angles. Moreover, the use of multiple cameras or sensors for handling dead angles increases the size and cost of the safety system.

Human beings, in contrast, are able to perceive people and vehicles around them from the information provided by the auditory system [3]. If this ability could be reproduced by artificial devices, complementary safety systems for automobiles would emerge. Because of diffraction, sound waves can contour objects and be detected even when the source is not in the direct line of sight.

A possible approach for processing temporal data is the use of Pulsed Neuron (PN) models [4]. This type of neuron deals with input signals in the form of pulse trains, using an internal membrane potential as a reference for generating pulses on its output. PN models can deal directly with temporal data and can be efficiently implemented in hardware due to their simple structure. Furthermore,
high processing speeds can be achieved, as PN model based methods are usually highly parallelizable.

A sound localization system based on pulsed neural networks has already been proposed in [5], and a sound source identification system, with a corresponding implementation on FPGA, was introduced in [6]. This paper focuses specifically on the relative moving direction of a sound emitting object, and proposes a method to detect whether a sound source is approaching or moving away from a microphone. The system, based on PN models, compares the sound level difference between consecutive instants of time in order to determine the relative movement. Moreover, the proposed method also identifies the type of the sound source by using a PN model based competitive learning pulsed neural network to process the spectral information.

2 Pulsed Neuron Model

When processing time series data (e.g., sound), it is important to consider the time relation and to have computationally inexpensive calculation procedures that enable real-time processing. For these reasons, a PN model is used in this research.

Fig. 1. Pulsed neuron model

Figure 1 shows the structure of the PN model. When an input pulse IN_k(t) reaches the k-th synapse, the local membrane potential p_k(t) is increased by the value of the weight w_k. The local membrane potentials decay exponentially with a time constant τ_k across time. The neuron's output o(t) is given by

o(t) = H(I(t) − θ),   (1)

I(t) = Σ_{k=1}^{n} p_k(t),   (2)

p_k(t) = w_k IN_k(t) + p_k(t−1) e^{−Δt/τ_k},   (3)
where n is the total number of inputs, I(t) is the inner potential, θ is the threshold and H(·) is the unit step function. The PN model also has a refractory period t_ndti, during which the neuron is unable to fire, independently of the membrane potential.
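A minimal discrete-time rendering of eqs. (1)-(3) in Python is sketched below; the time step dt, the vectorised bookkeeping and the way the refractory period is enforced are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

class PulsedNeuron:
    """Minimal discrete-time PN model of eqs. (1)-(3)."""
    def __init__(self, weights, tau, theta, dt=1.0 / 48000, t_ndti=0.0):
        self.w = np.asarray(weights, dtype=float)
        self.decay = np.exp(-dt / np.asarray(tau, dtype=float))  # e^(-dt/tau_k)
        self.theta = theta
        self.dt = dt
        self.t_ndti = t_ndti            # refractory period
        self.p = np.zeros_like(self.w)  # local membrane potentials p_k(t)
        self._rest = 0.0                # time left in the refractory period

    def step(self, pulses):
        """Advance one step; `pulses` is the 0/1 input vector IN_k(t)."""
        self.p = self.w * pulses + self.p * self.decay  # eq. (3)
        inner = self.p.sum()                            # eq. (2)
        if self._rest > 0.0:
            self._rest -= self.dt
            return 0
        if inner - self.theta >= 0.0:                   # eq. (1), H(I(t) - theta)
            self._rest = self.t_ndti
            return 1
        return 0
```

With dt equal to the 48 kHz sampling period used later in Table 1, one call to step() per sample advances the neuron in real-time order.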
3 The Proposed System

The basic structure of the proposed system is shown in Fig. 2. The system consists of three main blocks: the frequency-pulse converter, the level difference extractor and the sound source classifier, of which the last two are based on PN models.

The relative movement (approaching or moving away) of the sound source is determined from the sound level variation. The system compares a signal level x(t) from a microphone with the level at a previous time, x(t−Δt). If x(t) > x(t−Δt), the sound source is getting closer to the microphone; if x(t) < x(t−Δt), it is moving away. Once the level difference has been extracted, the outputs of the level difference extractors contain the spectral pattern of the input sound, which is then used to recognize the type of the source.

Fig. 2. The structure of the recognition system

3.1 Filtering and Frequency-Pulse Converter

Initially, the input signal must be pre-processed and converted to a train of pulses. A bank of 4th order band-pass filters decomposes the signal into 13 frequency channels equally spaced on a logarithmic scale from 500 Hz to 2 kHz. Each frequency channel is modified by the non-linear function shown in eq. (4), and the resulting signal's envelope is extracted by a 400 Hz low-pass filter. Finally, each output signal is independently converted to a pulse train whose rate is proportional to the amplitude of the signal.

F(t) = x(t)^{1/3} (x(t) ≥ 0),   (1/4) x(t)^{1/3} (x(t) < 0).   (4)
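The front end can be sketched as follows; the Butterworth filter design, the pulse-rate gain and the integrate-and-fire style rate coding are assumptions of this sketch rather than details given in the paper.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def frequency_pulse_converter(x, fs=48000, n_ch=13, f_lo=500.0, f_hi=2000.0,
                              rate_gain=1000.0):
    """Sketch of Sec. 3.1: log-spaced 4th order band-pass bank, the
    nonlinearity of eq. (4), a 400 Hz envelope filter, and rate coding.
    The filter design and rate_gain (pulses/s per unit amplitude) are
    assumptions, not values from the paper."""
    edges = np.geomspace(f_lo, f_hi, n_ch + 1)
    lp = butter(2, 400.0, btype="low", fs=fs, output="sos")
    pulses = np.zeros((n_ch, len(x)), dtype=np.int8)
    for c in range(n_ch):
        bp = butter(4, (edges[c], edges[c + 1]), btype="band", fs=fs, output="sos")
        y = sosfilt(bp, x)
        y = np.where(y >= 0.0, np.cbrt(y), 0.25 * np.cbrt(y))  # eq. (4)
        env = np.maximum(sosfilt(lp, y), 0.0)
        acc = 0.0
        for n, e in enumerate(env):   # integrate-and-fire rate coding
            acc += rate_gain * e / fs
            if acc >= 1.0:
                acc -= 1.0
                pulses[c, n] = 1
    return pulses
```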
3.2 Level Difference Extractor

Each pulse train generated by the frequency-pulse converter is input independently to a Level Difference Extractor (LDE). The LDE, shown in Fig. 3, is composed of two parts, the Lateral Superior Olive (LSO) model and the Level Mapping Two (LM2) model [7]. In the LSO and LM2 models, each neuron works according to eq. (3). The LSO is responsible for the level difference extraction itself, while the LM2 extracts the envelope of the complex firing pattern.

Each pulse train corresponding to a frequency channel is input to an LSO model. The inner potential of the i-th LSO neuron of the f-th channel, I^LSO_{i,f}(t), is calculated as follows:

I^LSO_{i,f}(t) = p^N_{i,f}(t) + p^B_{i,f}(t),   (5)

p^N_{i,f}(t) = w^N_{i,f} x_f(t) + p^N_{i,f}(t−1) e^{−Δt/τ_LSO},   (6)

p^B_{i,f}(t) = w^B_{i,f} x_f(t−Δt) + p^B_{i,f}(t−1) e^{−Δt/τ_LSO},   (7)

where τ_LSO is the time constant of the LSO neurons and the weights w^N_{i,f} and w^B_{i,f} are defined as

w^N_{i,f} = 0.0 (i = 0);  1.0 (i > 0);  −10^{i/γ} (−b < i < 0);  −10^{−(K−i)/α} (i ≤ −b),

w^B_{i,f} = 0.0 (i = 0);  1.0 (i < 0);  −10^{−i/γ} (0 < i < b);  −10^{−(K+i)/α} (i ≥ b),   (8)

where α and γ are parameters for adjusting the weights, K is the index of the last neuron on each side of the LSO (totalling 2K + 1 neurons, including the central neuron) and b is the index of the last inner neuron on each side of the LSO. The inner neurons have current-input weights smaller than their delayed-input weights; they are used to bring out the feature of the input level difference when that difference is small.

The larger the signal becomes, the more neurons fire in the LSO model. The LM2 stage then generates a clearer output, extracting the envelope of the firing pattern generated by the LSO. The potentials in the LM2 are calculated as follows:

I^LM2_{l,f}(t) = p^D_{l,f}(t) + p^S_{l,f}(t),   (9)

p^D_{l,f}(t) = m_{i,f}(t) + p^D_{l,f}(t−1) e^{−Δt/τ_LM2},   (10)

p^S_{l,f}(t) = −m_{i,f}(t) + p^S_{l,f}(t−1) e^{−Δt/τ_LM2},   (11)

where τ_LM2 is the time constant of the LM2 neurons and m_{i,f}(t) is the output of the i-th LSO neuron in the f-th frequency channel.

Fig. 3. Level difference extractor
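In Python, the per-channel LDE update might look as follows. The exact exponents of eq. (8) are reconstructed from a garbled source, w^B is taken as the mirror image of w^N, and the one-to-one mapping from LSO outputs to LM2 inputs is an assumption of this sketch (Table 1 below supplies K, b, α and the time constants).

```python
import numpy as np

def lso_weights(K=25, b=5, alpha=60.0, gamma=60.0):
    """Weight profiles of eq. (8) for the 2K+1 LSO neurons, indices i = -K..K.
    wB is the mirror image of wN, reflecting the symmetric roles of the
    current and delayed inputs; the exponents are a reconstruction."""
    i = np.arange(-K, K + 1).astype(float)
    wN = np.where(i > 0, 1.0,
         np.where(i == 0, 0.0,
         np.where(i > -b, -10.0 ** (i / gamma),         # inner inhibitory weights
                          -10.0 ** (-(K - i) / alpha))))  # outer inhibitory weights
    return wN, wN[::-1]

def lde_step(state, x_now, x_delayed, wN, wB,
             decay_lso, decay_lm2, theta=1e-3):
    """One discrete step of eqs. (5)-(7) and (9)-(11) for a single channel.
    state = (pN, pB, pD, pS); firing is a simple threshold on the potentials."""
    pN, pB, pD, pS = state
    pN = wN * x_now + pN * decay_lso            # eq. (6)
    pB = wB * x_delayed + pB * decay_lso        # eq. (7)
    m = ((pN + pB) >= theta).astype(float)      # LSO pulses from eq. (5)
    pD = m + pD * decay_lm2                     # eq. (10)
    pS = -m + pS * decay_lm2                    # eq. (11)
    lm2_pulses = ((pD + pS) >= theta).astype(float)  # eq. (9)
    return (pN, pB, pD, pS), lm2_pulses
```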
3.3 Sound Source Classifier

The sound source classifier is based on the Competitive Learning Network using Pulsed Neurons (CONP) proposed in [5]. The basic structure of CONP is shown in Fig. 4. The CONP is constructed from PN models. In the learning process of CONP, the neuron with the weights most similar to the input (the winner neuron) should be chosen for learning in order to obtain a topological relation between inputs and outputs. However, when two or more neurons fire, it is difficult to decide which one is the winner, as their outputs are only pulses, not real values.

To this end, CONP has extra external units called control neurons. Based on the output of the Competitive Learning (CL) neurons, the control neurons' outputs increase or decrease the inner potential of all CL neurons, keeping the number of firing neurons equal to one; controlling the inner potential is equivalent to controlling the threshold. Two types of control neurons are used in this work. The No-Firing Detection (NFD) neuron fires when no CL neuron fires, increasing their inner potential. Complementarily, the Multi-Firing Detection (MFD) neuron fires when two or more CL neurons fire at the same time, decreasing their inner potential [5].

The CL neurons are also controlled by another potential, named the input potential p_in(t), and a gate threshold θ_gate. The input potential is calculated as the sum of the inputs (with unitary weights), representing the rate of the input pulse train. When p_in(t) < θ_gate, the CL neurons are not updated by the control neurons and become unable to fire, as the input train has too small a potential to be responsible for an output firing. Furthermore, the input potential of each CL neuron is decreased over time by a factor β, to follow rapid changes of the inner potential and improve its adjustment.

Fig. 4. Competitive Learning Network using Pulsed Neurons (CONP)

Considering all the described adjustments of the inner potential of CONP neurons, the output equation (3) of each CL neuron becomes

o(t) = H( Σ_{k=1}^{n} p_k(t) − θ + p_nfd(t) − p_mfd(t) − β · p_in(t) ),   (12)

where p_nfd(t) and p_mfd(t) correspond, respectively, to the potentials generated by the NFD and MFD neurons' outputs, p_in(t) is the input potential and β (0 ≤ β ≤ 1) is a parameter.
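A compact sketch of this winner-selection mechanism follows; how p_nfd and p_mfd accumulate from the control pulses (connection weights ±16.0 in Table 2) is left out, and the array shapes are assumptions of this sketch.

```python
import numpy as np

def cl_outputs(p, theta, p_nfd, p_mfd, p_in, theta_gate, beta):
    """Firing rule of eq. (12) for all CL neurons at one time step.
    p: (n_neurons, n_inputs) array of local potentials p_k(t);
    p_nfd, p_mfd: potentials driven by the NFD/MFD control neurons;
    p_in: input potential (sum of inputs with unit weights)."""
    if p_in < theta_gate:      # gate: weak input trains cannot cause firing
        return np.zeros(p.shape[0], dtype=int)
    inner = p.sum(axis=1) - theta + p_nfd - p_mfd - beta * p_in
    return (inner >= 0.0).astype(int)   # unit step H(.)

def control_pulses(outputs):
    """NFD fires when no CL neuron fired, MFD when two or more fired."""
    fired = int(outputs.sum())
    return int(fired == 0), int(fired >= 2)
```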
4 Experimental Results

Three different sound sources were used in the experiments: "police car", "ambulance" and "scooter". The first two correspond to the alarm sounds of the vehicles, while the last corresponds to the engine sound of a scooter. All the signals were recorded from a static sound source. The moving sound source signals were generated by computer, with the sound intensity at each instant of time calculated as

S(t) = 20 S_b log_10 (d(t) / d_b),   (13)

where S_b is the sound intensity at the center position, and d_b and d(t) are, respectively, the distance between the sensor and the sound source at the center position and the distance at time t. All signals have a duration of 4.0 s and the sound source is normal to the sensor at 2.0 s, as shown in Fig. 5.
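For illustration, the pass-by geometry implied by Fig. 5 can be simulated as below; the source speed is an assumption, and eq. (13) is applied exactly as printed (note that, as printed, the level grows with distance; a physically decaying level would use d_b/d(t)).

```python
import numpy as np

def moving_source_level(t, d_b=1.0, S_b=1.0, v=10.0, t_pass=2.0):
    """Level factor of eq. (13) for a straight pass-by: the source is at
    distance d_b from the microphone at t = t_pass (cf. Fig. 5). The speed
    v (m/s) is an assumption of this sketch."""
    d = np.hypot(d_b, v * (t - t_pass))    # distance at time t
    return 20.0 * S_b * np.log10(d / d_b)  # eq. (13), as printed

t = np.linspace(0.0, 4.0, 5)
print(moving_source_level(t))  # level relative to the pass-by point
```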
Fig. 5. Sound source movement in the experiments (the source passes the point normal to the microphone, at distance d_b = 1 m, at 2.0 s, between 0.0 s and 4.0 s)

Table 1. Parameters of each module used in the experiments

Input Sound
  Sampling frequency: 48 kHz
  Quantization: 16 bits
  Number of frequency channels: 13 channels
  Delay time Δt: 0.4 s

Level Difference Extractor
  Number of total LSO neurons (2K + 1): 51 units
  Number of inner LSO neurons (2b + 1): 11 units
  Number of output neurons L: 48 units
  Thresholds θ_LSO / θ_LM2: 0.001 / 0.001
  Time constants τ_LSO / τ_LM2: 0.1 s / 35.0 μs
  Parameters α / γ: 60 / 60
4.1 Level Difference Information Extraction

The level difference information was extracted as described in Section 3.2. The parameters used for signal acquisition, preprocessing and level difference extraction are shown in Table 1.

Figure 6 shows the output of the LDE model for the "police car" signal in four distinct intervals of time. The x-axis corresponds to the index of the neurons in the LM2, representing the level difference information, and the y-axis corresponds to the frequency channels; the gray level intensity represents the rate of the output pulse train. The firing pattern differs significantly between the intervals of time, especially when comparing the plots of opposite relative movements. Although the LM2 could not fully extract the envelope from the firing pattern of the signals corresponding to a sound source moving away from the sensor, the result is clear enough to distinguish it from an approaching sound source signal.

Figure 7 shows the firing patterns of each kind of sound for the approaching (interval 0.0 ∼ 2.0 s) and moving away (2.0 ∼ 4.0 s) cases. As different frequency components present different firing information, it is possible to classify the sound source, as described in the next section.

Fig. 6. Level Difference Extractor output of the "police car" dataset

Fig. 7. Comparison of the level difference information output for each dataset
Table 2. Parameters of CONP used in the experiments

Competitive Learning Neurons
  Number of inputs of the CL neurons: 637 units
  Number of CL neurons: 30 units
  Threshold θ: 1.0 × 10^−4
  Gating threshold θ_gate: 150.0
  Rate for input pulse frequency β: 0.0629
  Time constant τ_p: 0.1 s
  Refractory period t_ndti: 10 ms
  Learning coefficient α: 2.0 × 10^−8
  Learning iterations: 1000

Control Neurons (NFD / MFD)
  Time constants τ_NFD / τ_MFD: 0.5 ms / 1.0 ms
  Thresholds θ_NFD / θ_MFD: −1.0 × 10^−3 / 2.0
  Connection weight to each CL neuron: 16.0 / −16.0

Table 3. Results of sound recognition (A = approaching, M = moving away); recognition rate [%]

Input sound       police car     ambulance      scooter
                  A      M       A      M       A      M
police       A    70.6   6.8     2.4    7.3     12.9   0.0
             M    6.8    88.3    0.0    4.9     0.0    0.0
ambulance    A    1.1    4.2     82.8   9.9     2.0    0.0
             M    3.8    0.2     7.3    86.3    0.0    2.4
scooter      A    0.0    0.0     5.7    0.0     94.3   0.0
             M    0.0    1.9     0.3    5.4     0.0    92.4
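As a sanity check, the 85.8% average accuracy quoted below is simply the mean of the diagonal entries of Table 3:

```python
import numpy as np

# Diagonal of Table 3: correct vehicle and correct relative movement.
diag = np.array([70.6, 88.3, 82.8, 86.3, 94.3, 92.4])
print(round(diag.mean(), 1))  # -> 85.8
```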
4.2 Sound Source Classification

The firing information patterns provided by all the level difference extractors are recognized by the CONP model described in Section 3.3. The CONP model was trained according to the parameters shown in Table 2.

Table 3 shows the accuracy of the CONP model for each dataset. The recognition rate is defined as the ratio between the number of neuron firings corresponding to the correct vehicle and relative movement and the total number of firings. The correct sound source and relative direction could be recognized with an average accuracy of 85.8%.

The "scooter" dataset presents a better recognition rate than the "police car" and "ambulance" datasets. The reason for this is that the sound signal of the "scooter" dataset is constant over time, in contrast to the alarm sounds of the other two vehicles, which actually correspond to two different, alternated sounds. Thus, the CONP model can be trained more efficiently with the "scooter" data than with the others, which would require more data in order to reach a comparable accuracy.

5 Conclusions

This paper proposed a system for detecting the approach of a sound source and classifying it using pulsed neural networks. The system extracts the level difference information from pulse trains corresponding to several frequency bands. The firing pattern is then classified by a CONP model, which identifies the type and recognizes the relative movement of the sound source.

The experimental results confirmed that the PN model based level difference extractor can successfully detect the relative movement (approaching or moving away) of a sound source. By using the firing pattern provided by the LDE, the sound source type and relative movement could be correctly classified with an average accuracy of 85.8%.

Future work includes the detection of the sound source position (its distance from the sensor) and the combination of the proposed system with a sound localization method. The hardware implementation of the proposed system using an FPGA device is also in progress.

Acknowledgment

This research is supported in part by a grant from the Hori Information Science Promotion Foundation.

References

1. Surendra, G., Osama, M., Robert, F.K.M., Nikolaos, P.P.: Detection and Classification of Vehicles. IEEE Trans. ITS 3(1), 37–47 (2002)
2. Chieh-Chi, W., Thorpe, C., Thrun, S.: Online Simultaneous Localization and Mapping with Detection and Tracking of Moving Objects: Theory and Results from a Ground Vehicle in Crowded Urban Areas. In: Proceedings of ICRA 2003, pp. 842–849 (2003)
3. Pickles, J.O.: An Introduction to the Physiology of Hearing. Academic Press, London (1988)
4. Maass, W., Bishop, C.M.: Pulsed Neural Networks. MIT Press, Cambridge (1998)
5. Kuroyanagi, S., Iwata, A.: A Competitive Learning Pulsed Neural Network for Temporal Signals. In: Proceedings of ICONIP 2002, pp. 348–352 (2002)
6. Iwasa, K., Kuroyanagi, S., Iwata, A.: A Sound Localization and Recognition System using Pulsed Neural Networks on FPGA. In: Proceedings of the International Joint Conference on Neural Networks 2007 (to appear, August 2007)
7. Kuroyanagi, S., Iwata, A.: Auditory Pulse Neural Network Model to Extract the Inter-Aural Time and Level Difference for Sound Localization. IEICE Trans. Information and Systems E77-D(4), 466–474 (1994)
8. Kuroyanagi, S., Iwata, A.: Auditory Pulse Neural Network Model for Sound Localization - Mapping of the ITD and ILD -. IEICE J78-D2(2), 267–276 (1996) (in Japanese)