5.2 The Testing Procedure

The recognition algorithm can be summarized by the following steps; a code sketch of steps 4–6 follows the list.

Step 1: The unknown speaker's speech is recorded first.
Step 2: The start and end points are detected, and the speech goes through the filtering process.
Step 3: Speech features are extracted from the speech signal and used to create the testing vector (acoustic vector) for that utterance.
Step 4: The testing vector is then fed into the vector quantizer.
Step 5: The vector quantizer uses the predefined knowledge to calculate the spectral distortion (distance) for each utterance, and the smallest distance value is selected.
Step 6: The smallest distance value is compared with a threshold value, and a decision is made whether the unknown speaker is recognized or not [6].
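A minimal sketch of steps 4–6, assuming squared-Euclidean spectral distortion; all function names, array shapes, and the threshold are illustrative assumptions, not the authors' code:

```python
import numpy as np

def avg_distortion(test_vectors, codebook):
    # test_vectors: (T, D) acoustic vectors of the unknown utterance.
    # codebook:     (M, D) VQ codewords of one enrolled speaker.
    # Average, over the utterance, of the squared distance to the nearest codeword.
    d2 = ((test_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def recognize(test_vectors, codebooks, threshold):
    # codebooks: dict mapping speaker_id -> (M, D) trained codebook.
    scores = {spk: avg_distortion(test_vectors, cb) for spk, cb in codebooks.items()}
    best = min(scores, key=scores.get)      # smallest spectral distortion (step 5)
    if scores[best] < threshold:            # threshold decision (step 6)
        return best, scores[best]
    return None, scores[best]               # unknown speaker rejected
```

Here the threshold would be tuned on held-out utterances; returning None corresponds to rejecting the unknown speaker.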
In this work, the utterances of several speakers are taken, and the data are divided into music (rock and melody) and sample voice data (Bengali and English). Each sample is used to train the vector quantizer, and then all the utterances are used for recognition (testing). The input of the VQ is obtained by frequency analysis of the given input utterances, and the detail of the VQ is specified by representing the input in matrix form. We have taken about 70 samples, for which the results are shown in percentages below.
Data Type          Correct Recognition   Inclusion   False Rejection
Music              86 %                  6 %         8 %
Speech (English)   92 %                  3 %         5 %
Speech (Bengali)   91 %                  4 %         5 %

7 Conclusion

This paper deals with an automatic speaker recognition system using vector quantization. There are two main modules: feature extraction and feature matching. The speaker-specific features are extracted using a Mel-Frequency Cepstrum Coefficient (MFCC) processor. A set of Mel-frequency cepstrum coefficients, called acoustic vectors, is obtained; these are the extracted features of the speakers. These acoustic vectors are then used in feature matching with the vector quantization technique. There are other techniques for feature matching in speaker recognition, such as the Hidden Markov Model (HMM) and Artificial Neural Networks (ANN); we have used VQ because its computational complexity is lower. Vector quantization is a typical feature matching technique in which a VQ codebook is generated from training data; test data are then matched against the trained data by searching for the nearest neighbor. The result is correct recognition of the speakers, where music and speech data (in both English and Bengali) are taken for the recognition process. The correct recognition rate is almost ninety percent, which compares favorably with the Hidden Markov Model (HMM) and Artificial Neural Networks (ANN), whose correct recognition is below ninety percent. Future work is to generate a VQ codebook with many predefined spectral vectors; it will then be possible to add many training samples to that codebook in a training session, but the main problem is that the network size and training time become prohibitively large with increasing data size. To overcome these limitations, a time-alignment technique can be applied, so that continuous speaker recognition becomes possible.
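The codebook generation mentioned above is not listed in the paper; a minimal sketch, assuming plain k-means clustering of one speaker's MFCC acoustic vectors (the LBG algorithm of [4] is a closely related splitting variant), could look like this:

```python
import numpy as np

def train_codebook(train_vectors, codebook_size=16, iters=20, seed=0):
    # train_vectors: (T, D) MFCC acoustic vectors from one speaker's utterances.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(train_vectors), codebook_size, replace=False)
    codebook = train_vectors[idx].astype(float)   # initialize from random vectors
    for _ in range(iters):
        # Assign each training vector to its nearest codeword ...
        d2 = ((train_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # ... then move each codeword to the centroid of its assigned vectors.
        for m in range(codebook_size):
            members = train_vectors[nearest == m]
            if len(members) > 0:
                codebook[m] = members.mean(axis=0)
    return codebook
```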
Acknowledgments. [...] and to KM from the Japanese Society for the Promotion of Science, the Yazaki Memorial Foundation for Science and Technology, and the University of Fukui.

References

1. Rabiner, L., Juang, B.H.: Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs (1993)
2. Rabiner, L., Schafer, R.W.: Digital Processing of Speech Signals. Prentice Hall, Englewood Cliffs (1978)
3. Davis, S.B., Mermelstein, P.: Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-28(4) (August 1980)
4. Linde, Y., Buzo, A., Gray, R.M.: An Algorithm for Vector Quantizer Design. IEEE Transactions on Communications 28, 84–95 (1980)
5. Furui, S.: Speaker-Independent Isolated Word Recognition Using Dynamic Features of Speech Spectrum. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-34(1), 52–59 (1986)
6. Furui, S.: An Overview of Speaker Recognition Technology. In: ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, pp. 1–9 (1994)
Modified Modulated Hebb-Oja Learning Rule: A Method for Biologically Plausible Principal Component Analysis

Marko Jankovic¹, Pablo Martinez¹, Zhe Chen², and Andrzej Cichocki¹
¹ Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Wako-shi, Saitama 351-0198, Japan
{elmarkoni,cia}@brain.riken.jp
² Neuroscience Statistics Research Laboratory, Department of Anesthesia and Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
zhechen@neurostat.mgh.harvard.edu
Abstract. This paper presents a method that performs principal component analysis. The method is based on the Time-Oriented Hierarchical Method applied to a recently proposed principal subspace analysis rule called the Modulated Hebb-Oja learning rule. Compared to some other well-known methods for principal component analysis, the proposed method has one feature that could be seen as desirable from the biological point of view: the learning rule for a synaptic efficacy does not need explicit information about the values of the other efficacies in order to make an individual efficacy modification. The simplicity of the “neural circuits” that perform global computations, and the fact that their number does not depend on the number of input and output neurons, can be seen as good features of the proposed method. The number of necessary global calculation circuits is one. Some similarity to a part of the frog retinal circuit is suggested as well.
Keywords: [...] submanifold, local learning rules.

1 Introduction

Neural networks provide a way for parallel on-line computation of principal component analysis (PCA) or principal subspace analysis (PSA). Due to their parallelism and adaptivity to input data, such algorithms and their implementations in neural networks are potentially useful in feature extraction and data compression. It is well known that the first step of any pattern recognition scheme, the representation of the objects from a usually large amount of raw data, requires some preprocessing and data compression; in that case, a minimal loss of information is a central objective. Principal subspace analysis (PSA) and principal component analysis (PCA) are powerful and popular tools used in statistical signal processing and data compression; their objective is to reduce the complexity of the input data while keeping as much of the input data information as possible. In recent years, various PCA and PSA learning algorithms have been proposed and mathematically investigated (see [1-5], [8-15], [17-28]). Most of the proposed algorithms are based on local Hebbian learning; due to this locality, it has been argued that these algorithms are biologically plausible. In [11], biologically inspired PSA methods, named the Modulated Hebbian (MH) and Modulated Hebb-Oja (MHO) learning rules, were introduced. The major objectives for the derivation of these methods were:

– to obtain a network whose learning rule for an individual synaptic efficacy requires the least possible amount of explicit information about the other synaptic efficacies, especially those related to other neurons;
– to minimize the neural hardware that is necessary for the implementation of the proposed learning rule.

In this paper a modification of the MHO algorithm, the MMHO algorithm, is analyzed. The MMHO algorithm performs PCA. The new algorithm is obtained by applying the time-oriented hierarchical method (TOHM) [14] to the MHO method. TOHM is a general method that transforms PSA methods into PCA methods; it is based on learning on an approximate principal Grassman/Stiefel submanifold [14]. Section 2 reviews the basic theory of the modulated Hebb (MH) and modulated Hebb-Oja (MHO) learning rules and suggests a computational circuit for them in the context of retinal processing. Section 3 is devoted to the introduction of the new MMHO learning rule. Section 4 contains some simulation results. Section 5 suggests a speculative role of the proposed method in early visual processing in part of the frog retinal circuit. Section 6 gives some conclusions.
2 Modulated Hebb Learning Rule and Computational Circuit

2.1 Modulated Hebbian Rule: The Case N = K

Let x ∈ ℜ^K denote the zero-mean input random vector, and let y = W^T x ∈ ℜ^N denote the output, where W ∈ ℜ^{K×N} denotes the synaptic weight matrix. It is assumed that N = K. In scalar form, for the kth output we have y_k = w_k^T x, where w_k ∈ ℜ^K denotes the kth column vector of W. The Modulated Hebbian (MH) learning rule was introduced in [11] and analyzed in [13][15]. Here we give a slightly different interpretation: the MH rule can be derived as a gradient descent learning rule for the minimization of the cost function

J(W) = E[ ( x^T x − y^T y )² ] = E[ ( x^T x − x^T W W^T x )² ],   (1)

which, under a Gaussian assumption on the input, can be expanded in terms of traces involving the covariance matrix C = E(xx^T) and CWW^T,
or, in compact notation,

J_LoPCA(W) = E{ ( ∑_{i=1}^{N} x_i² − ∑_{i=1}^{N} y_i² )² }.   (2)

This leads to the following gradient descent algorithm:

W(i+1) = W(i) + γ(i) ( ∑_{k=1}^{N} x_k²(i) − ∑_{k=1}^{N} y_k²(i) ) x(i) y(i)^T.   (3)
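As an illustration of rule (3), one update step can be sketched in NumPy; the learning rate and array shapes are assumptions of the sketch, not part of the paper:

```python
import numpy as np

def mh_step(W, x, gamma=0.001):
    # One step of rule (3). W: (K, N) weights with N = K; x: (K,) zero-mean input.
    y = W.T @ x                          # output y = W^T x
    modulation = x @ x - y @ y           # sum_k x_k^2 - sum_k y_k^2
    return W + gamma * modulation * np.outer(x, y)
```

A single scalar, the difference between total input and output power, modulates the plain Hebbian term x y^T.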
Notice that only an implicit constraint W^T W = I is present; no explicit constraint is imposed. As shown in [13], this algorithm, named Modulated Hebbian learning (MH), will lead toward a solution which ensures W^T W = I, although no additional terms that would keep the weight vectors orthonormal are added. Also, it is interesting to notice that the algorithm can be seen as a variant of a moving-threshold algorithm, of a form similar to the BCM learning rule [3], where the moving threshold is equal to the power contained in the input signals. From equation (3) we can see that the original Hebb learning rule, modulated by a single signal, is applied. The network structure of interest is depicted in Fig. 1. The proposed learning method possesses locality and homogeneity, which are desirable properties for implementing artificial neural networks in parallel hardware [21]. Compared to some other approaches (e.g. [19] and [26]), here we have a solution in which the update rule for an individual synaptic efficacy does not require information about the explicit values of the other synaptic efficacies. Also, the number of circuits that perform global calculations (in this case, summators) is two, regardless of the dimensions of the input and output vectors.

2.2 Modulated Hebb-Oja Rule: Case N < K

The MH learning rule is of low practical interest, since the output has the same dimensionality as the input (for instance, it can be used for statistical orthogonalization of a matrix). When the number of output neurons is lower than the number of input neurons, i.e. N < K, we can no longer count on orthonormality of W. In order to maintain orthonormality of the solution, some additional term in the learning rule is needed. One possibility is to use the approach recently proposed in [12]; however, in that case the equations become very complex. Another possibility is to use Oja's stabilizing term applied separately to each weight vector, which is done in [15]. Doing so, we have the following learning equation:

W(i+1) = W(i) + γ(i) [ ( x(i)^T x(i) − y(i)^T y(i) ) x(i) y(i)^T − W(i) diag( y(i) y(i)^T ) ].   (4)

It is called the Modulated Hebb-Oja (MHO) learning rule. This learning rule generates a solution W whose range is equal to the subspace spanned by the eigenvectors corresponding to the N largest eigenvalues of the input signal covariance matrix C = E(xx^T). The structure of the neural network of interest can also be represented by Fig. 1; the only difference is that some additional calculations at the output are necessary. In [15] it was shown that the matrix W(i) has bounded elements.

Fig. 1. MH(O) principal subspace network (a small empty circle at the end of an arrow denotes the square function)
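For illustration, one MHO update step (4) can be sketched in NumPy; the shapes, step size, and the broadcasting used for the W diag(yy^T) term are assumptions of this sketch, not the authors' implementation:

```python
import numpy as np

def mho_step(W, x, gamma=0.001):
    # One step of rule (4). W: (K, N) weights with N < K; x: (K,) zero-mean input.
    y = W.T @ x                                # network output y = W^T x
    modulation = x @ x - y @ y                 # input power minus output power
    hebb = modulation * np.outer(x, y)         # modulated Hebbian (family) term
    oja = W * (y * y)[None, :]                 # W diag(y y^T): per-column y_k^2 w_k
    return W + gamma * (hebb - oja)
```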
3 Introduction of the New PCA Learning Rule

Now it will be explained how the MHO algorithm can be transformed into a PCA algorithm. First we recall the definitions of the Grassman and Stiefel manifolds, which can be found in [7]. The Grassman manifold is defined as the space of matrices W ∈ O^{K×N} ⊂ ℜ^{K×N} (N ≤ K) such that W^T W = I, together with a cost function J: O^{K×N} → ℜ that satisfies J(W) = J(WQ) for any N×N orthogonal matrix Q. The Stiefel manifold is defined as the space of matrices W ∈ O^{K×N} ⊂ ℜ^{K×N} (N ≤ K) such that W^T W = I, together with a cost function J: O^{K×N} → ℜ.
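Note that the cost (2) depends on W only through WW^T (since ∑ y_i² = x^T WW^T x), so J(W) = J(WQ) for any orthogonal Q, which places the PSA cost on the Grassman manifold in the sense above. A small numerical check, with illustrative shapes and data:

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 5, 2
W = np.linalg.qr(rng.standard_normal((K, N)))[0]   # W with orthonormal columns
Q = np.linalg.qr(rng.standard_normal((N, N)))[0]   # random N x N orthogonal matrix
x = rng.standard_normal(K)

cost = lambda W: (x @ x - x @ W @ W.T @ x) ** 2    # one-sample version of (2)
print(np.isclose(cost(W), cost(W @ Q)))            # True: J(W) = J(WQ)
```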
Some learning algorithms can be seen as learning on the Stiefel manifold if the weight matrix is kept orthonormal automatically at each update step; such algorithms are known as strongly orthonormally constrained (SOC) algorithms (e.g. [9]). Here we use the recently proposed time-oriented hierarchical method (TOHM) (or, more generally, GTOHM [14]) in order to transform the PSA algorithm MHO into a PCA method. The method is based on the following idea: each neuron tries to do what is best for its family, and then what is best for itself. We shall call this idea “the family principle”. In other words, the algorithm consists of two parts: the first part is responsible for learning the family-desirable feature, and the second part is responsible for learning the individual-neuron-desirable feature. The second part is taken with a weight coefficient which is, by absolute value, smaller than 1. This means that some time-oriented hierarchy is made in the realization of the family and individual parts of the learning rules. In order to realize “the family principle”, we propose the following class of learning rules that can be used for parallel extraction of principal components, defined by the equation

ΔW_PCA = ΔW_PSA + D(i) IP_SPCAorSMCA,   (5)

where ΔW_PCA represents the modification of the weights, ΔW_PSA defines the family part of the learning rule (that is, PSA), IP_SPCAorSMCA represents the individual part of the learning rule (a single-unit PCA or MCA algorithm), and D(i) is a diagonal matrix whose diagonal elements can be functions of time (in the case of a homogeneous algorithm, D(i, i) = α).
In the most general case, all individual parts can be implemented as different learning rules. It is interesting to note that an individual part can pursue a minor component while the whole algorithm pursues principal components. The intuition is as follows: we use two time scales, so if α is sufficiently small, the term multiplied by α does not affect the PSA learning direction. When the algorithm reaches the principal subspace, the part of the algorithm that is multiplied by α will perform learning on the approximate Grassman/Stiefel principal submanifold (definitions are given in [14]) and will rotate the weight vectors toward the principal components. By implementation of the proposed principle, the new learning rule can be written in the form

w_k(i+1) = w_k(i) + γ(i) [ ( x(i)^T x(i) − y(i)^T y(i) ) y_k(i) x(i) − y_k²(i) w_k(i) ]
                  + α γ(i) [ ( x(i)^T x(i) − ∑_{j=1}^{k} y_j²(i) ) y_k(i) x(i) − y_k²(i) w_k(i) ],   k = 1, …, N.   (6)
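One step of (6) can be sketched as follows, reading the individual part as using the cumulative output power ∑_{j≤k} y_j²(i) as written above; the step sizes and shapes are illustrative assumptions:

```python
import numpy as np

def mmho_step(W, x, gamma=0.001, alpha=0.1):
    # One step of rule (6). W: (K, N); x: (K,) zero-mean input; |alpha| < 1.
    y = W.T @ x
    oja = W * (y * y)[None, :]                     # per-column y_k^2 w_k term
    # Family (PSA) part: MHO modulated by total output power.
    family = (x @ x - y @ y) * np.outer(x, y) - oja
    # Individual part: neuron k is modulated by the power of outputs 1..k only.
    individual = (x @ x - np.cumsum(y * y))[None, :] * np.outer(x, y) - oja
    return W + gamma * family + alpha * gamma * individual
```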
We can see that we actually have a system of equations that share the same family part of the learning rule, while all the individual parts of the learning rules are different. By implementation of the same method that was proposed in [15], it can be shown that the synaptic vectors have bounded norms under some reasonable assumptions. It can be proved that the stable points of the algorithm are the principal eigenvectors of the input signal covariance matrix. However, the proof is lengthy and will be omitted here. The sketch of the proof is as follows:

– the difference equations (6) can be related to their differential counterparts by implementation of stochastic approximation [16], [20];
– it is possible to show that all equations actually represent eigenvector equations for a set of matrices whose eigenvectors are equal to the eigenvectors of the input signal covariance matrix C = E(xx^T);
– if the weight matrix is of full rank for all t, then the columns of the synaptic matrix will converge toward some of the eigenvectors of the input covariance matrix;
– then, using the method proposed in [15], it is possible to prove that those eigenvectors will correspond to the principal eigenvectors of the input covariance matrix.
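As a hedged end-to-end illustration of the claimed stable points (a synthetic check under assumed parameters, not the simulation of Section 4), the mmho_step sketch above can be run on correlated Gaussian inputs and the learned columns compared with the top eigenvectors of the sample covariance:

```python
import numpy as np
# Assumes mmho_step from the sketch in Section 3.

rng = np.random.default_rng(0)
K, N, T = 6, 3, 30000
A = rng.standard_normal((K, K)) / np.sqrt(K)      # mixing kept near unit scale
X = rng.standard_normal((T, K)) @ A.T             # zero-mean correlated inputs

W = np.linalg.qr(rng.standard_normal((K, N)))[0]  # random orthonormal start
for x in X:
    W = mmho_step(W, x, gamma=0.002, alpha=0.2)

C = X.T @ X / T                                   # sample covariance
top = np.linalg.eigh(C)[1][:, ::-1][:, :N]        # top-N principal eigenvectors
Wn = W / np.linalg.norm(W, axis=0)
print(np.round(np.abs(top.T @ Wn), 2))            # alignment matrix: near identity
                                                  # (up to sign) once converged
```

The step sizes may need tuning for other data scales.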