Lecture Notes in Computer Science
Part I, LNCS 4984, pp. 93–101, 2008
Download 12.42 Mb. Pdf ko'rish
|
M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 93–101, 2008. c Springer-Verlag Berlin Heidelberg 2008
94 A. Ichiki and M. Shiino involvement of synaptic noise can be analyzed by the cavity method to derive the TAP equation in the thermodynamic limit, since the energy concept appears as an effective Hamiltonian in this limit. It is natural to consider such neural network models with fluctuating synaptic couplings, since the synaptic couplings in real biological systems are updated by learning rules and the time-sequence of the synaptic couplings may be stochastic under the influence of noisy external stimuli. Thus the study on such networks is required to understand the retrieval process of the realistic networks. The aim of this paper is two-fold: (i) we will investigate the networks with which realization of synaptic noise have the en- ergy concept to apply the cavity method to derive the TAP equations, (ii) we will show the TAP equations for the networks when the concept of the effective Hamiltonian appears. This paper is organized as the follows: in the next section, we will briefly review how the energy concept appears in the network with synaptic noise and derive the TAP equations and the order parameter equations by using the cavity method and the SCSNA [8]. Once the effective Hamiltonian is found, the replica method is also applicable to derive the order parameter equations. However in the present paper, to make clear the relationship between the TAP equations and the order parameter equations, we do not use the replica trick. In section 3, we will inves- tigate the cases where the energy concept appears in the networks with synap- tic noise. We will see that the TAP equations and the order parameter equa- tions for some models can be derived in the framework mentioned in section 2. We will also mention that some difficulties to derive the TAP equations arise in some models with other involvements of synaptic noise. In the last section, we will make discussions on the structure of the TAP equations for the network with temporally fluctuating synaptic noise and conclude this paper. 2 Brief Review on Effective Hamiltonian, TAP Equations and Order Parameter Equations In this section, we briefly review the cavity method becomes applicable to the network with fluctuating synaptic couplings [8]. Then we derive the TAP equa- tions and the order parameter equations self-consistently in the framework of the SCSNA. In this section, we deal with the following stochastic analog neural network of N neurons with temporally fluctuating synaptic noise (multiplicative noise): ˙x i = −φ (x
i ) +
j(=i) J ij (t)x j + η i (t),
(1) η i (t)η j (t ) = 2Dδ ij δ(t
− t ), (2)
where x i (i = 1, · · · , N) represents a state of the neuron at site i taking a continuous value, φ(x i ) is a potential of an arbitrary form which determines the probability distribution of x i in the case without the input j(=i) J ij x j , η i the
Langevin white noise with its noise intensity 2D and J ij (t) the synaptic coupling. TAP Equation for Associative Memory Neural Network Models 95 We note here that, in the case of associative memory neural network, the synaptic coupling J ij is usually defined by the well-known Hebb learning rule. However, in the present paper, we will deal with the coupling J ij fluctuating around the Hebb rule with a white noise: J ij (t) = ¯ J ij + ij (t), (3) ij (t) kl (t ) =
2 ˜ D N δ ik δ jl δ(t
− t ), (4)
where ¯ J ij is defined by the usual Hebb learning rule ¯ J ij ≡ 1 N p μ=1
ξ μ i ξ μ j with p = αN the number of patterns embedded in the network, ξ μ i
±1 is the μ th embedded pattern at neuron i, and ij (t) denotes the synaptic noise independent of η i
2 ˜ D/N .
Using Ito integral, we obtain the Fokker-Planck equation corresponding to the Langevin equation (1) as ∂P (t, x) ∂t = − N i=1 ∂ ∂x i ⎧ ⎨ ⎩ −φ (x i ) + j(=i) ¯ J ij x j − D + ˜ D ˆ
q ∂ ∂x i ⎫ ⎬ ⎭ P (t, x), (5) where ˆ q
1 N j(=i) x 2 j . Since the self-averaging property holds in the thermody- namic limit N → ∞, ˆq is identified as ˆ q = 1 N N i=1 x 2 i . (6) The order parameter ˆ q is obtained self-consistently in our framework as seen be- low. Supposing ˆ q is given, one can easily find the equilibrium probability density for the Fokker-Planck equation (5) as P N (x) = Z −1 exp ⎧ ⎨ ⎩ −β eff ⎛ ⎝ N i=1 φ(x i ) − i ¯ J ij x i x j ⎞ ⎠ ⎫ ⎬ ⎭ , (7)
where Z denotes the normalization constant and β −1 eff ≡ D + ˜
D ˆ q (8) plays the role of the effective temperature of the network. The temperature of the system is modified as a consequence of the multiplicative noise and it depends on the order parameter ˆ q. Notice here that the equilibrium distribution of the system becomes Gibbs distribution in the thermodynamic limit N → ∞. The equilib- rium solution for equation (5) in the finite N -body system indeed differs from the probability density (7). However, since 1 N
x 2 j − 1 N j x 2 j = O(1/
√ N ), the
difference between the probability densities in the finite N -body system P N and in the system in the thermodynamic system P N →∞ is P N →∞ −P N = O(1/ √ N ).
96 A. Ichiki and M. Shiino Thus one can conclude that the equilibrium density for equation (5) converges to the probability density (7) in the thermodynamic limit N → ∞. Since we have explicitly written down the equilibrium probability density (7) as a Gibbsian form, one can define the effective Hamiltonian of (sufficiently large) N -body system as H N
N i=1
φ(x i ) − i ¯ J ij x i x j . (9) Since we have found the effective Hamiltonian and the effective temperature, one can apply the usual cavity method [3] to this system and derive the TAP equation. According to the cavity method, we divide the Hamiltonian of N -body system (9) into that of (N −1)-body system and the part involving the state of i th neuron
as H N = φ(x i ) − h i x i + H
N −1 , where h i ≡ j(=i) ¯ J ij x j is the local field at site i and the Hamiltonian of (N −1)-
body system H N −1 is given as H N −1 ≡ j(=i)
φ(x j ) − j ¯ J jk x j x k . The
marginal probability density of x i and the local field h i is given as P N
i , h
i ) =
⎡ ⎣ j(=i) dx j ⎤ ⎦ δ ⎛ ⎝h i − j(=i) ¯ J ij x j ⎞ ⎠ P N (x) = ˜ Z −1 exp { −β
eff [φ(x
i ) − h i x i ] } P
N −1 (h i ), where ˜ Z is the normalization constant and P N −1 (h i ) denotes the probability density of the local field h i in the (N − 1)-body system defined as P N −1 (h i ) ≡ Z
−1 N −1 ⎡ ⎣ j(=i) dx j ⎤ ⎦ δ ⎛ ⎝h i − j(=i) ¯ J ij x j ⎞ ⎠ exp [−β eff H N −1 ] , where Z N −1 denotes the normalization constant. Since the local field is given as the summation of a sufficiently large number of random variables and their cross- correlations are expected to be O(1/ √ N ), one can expect that P N −1 (h i ) turns
out to be a Gaussian density in the thermodynamic limit N → ∞ according to the central limit theorem: P N −1 (h i ) = 1 √ 2πσ 2 exp − (h i − h i N
−1 ) 2 2σ 2 , where · N −1 represents the thermal average with respect to the (N − 1)-body probability density P N −1
2 is the variance of P N −1
i ), which is eval- uated later self-consistently in the framework of the SCSNA. Then taking the
TAP Equation for Associative Memory Neural Network Models 97 average of x i with respect to the marginal probability P N (x
, h i ) straightfor- wardly yields x i = F ( h i N
−1 ), (10) where F is a transfer function defined as F (y)
≡ dx x exp
−β eff φ(x) − yx − β eff σ 2 2 x 2 dx exp −β eff φ(x) − yx − β eff σ 2 2 x 2 . (11) Similarly h i N −1
h i N
−1 = h
i − β
eff σ 2 x i . Thus we have the pre-TAP equation x i = F ⎛ ⎝ j(=i) ¯ J ij x j − Γ Ons
x i ⎞ ⎠ , (12)
where Γ Ons
≡ β eff σ 2 . Since the concrete form of the transfer function F depends on the effective temperature β eff and the variance of the local field σ 2 , it is
necessary to obtain β eff and σ 2 to have the TAP equation [7,9]. Then we can apply the SCSNA to equation (12) to determine β eff and σ 2 self-
consistently. We assume that the only one condensed pattern {ξ 1 i } is retrieved. Using the overlap order parameter m μ ≡ 1 N N i=1 ξ μ i x i and the concept of the SCSNA [7,9], we have h i
1 i m 1 + ξ
μ i m μ + z
iμ + Γ
SCSNA x i , (13)
where ν ≥2 ξ ν i m ν = ξ μ i m μ + z
iμ + γ x
i , Γ
SCSNA ≡ γ − α and z iμ is a Gaus- sian random variable with zero mean. We will evaluate the overlap m μ self- consistently and obtain z iμ and γ. Substituting equation (13) into the pre-TAP equation (12) reads x i = F ξ 1 i m 1 + ξ μ i m μ + z
iμ + (Γ
SCSNA − Γ
Ons ) x
i and comparing this equation with equation (10) yields [7] Γ SCSNA
= Γ Ons
, since h
i N −1 is considered to be a Gaussian random variable which should not contain the Onsager reaction term. Noting that m μ = O(1/ √ N ) for μ ≥ 2, one can obtain the overlap for noncondensed patterns as m μ
1 N (1
− U) N j=1 ξ μ j F (ξ 1 j m 1 + z jμ ), (14) U ≡ 1 N N j=1 F (ξ 1 j m 1 + z jμ ), (15) 98 A. Ichiki and M. Shiino where F denotes the derivative of the transfer function F and the order expan- sion of F with respect to 1/ √ N has been applied to x i = F (ξ
1 i m 1 +z iμ +ξ μ i m μ ). Using equation (14) and the definitions of z iμ and γ, one finds γ = α 1 − U , z iμ = 1 N (1 − U)
ν(=1,μ) j(=i) ξ ν i ξ ν j F (ξ
1 j m 1 + z
jν ). Thus the variance of z iμ is evaluated as σ 2
= α (1 − U) 2 F 2 (ξm
1 + z)
ξ,z , (16) where · ξ,z represents the average over a random variables ξ = ±1 and the Gaussian variable z, and the self-averaging property has been used. Similarly one obtains the set of order parameter equations as m 1
1 + z)
ξ,z , (17) U = F (ξm 1 + z) ξ,z , (18) Γ Ons
= Γ SCSNA
= αU 1 − U . (19) For the present model, it is required to evaluate the order parameter ˆ q to deter- mine the form of the transfer function F . Since ˆ q is related to the susceptibility and, by definition of F (11), the order parameter U corresponds with the sus- ceptibility as U = β eff ( x
2 − x
2 ), one finds ˆ q =
U β eff + (1 − U) 2 α σ 2 z . (20) The set of equations (16), (17), (18), (19), (20) takes a closed form and one can determine the form of F self-consistently as well as the set of order parameters. Therefore substituting into the pre-TAP equation (12) the solutions β eff and
Γ Ons
that are self-consistently obtained within this framework yields the TAP equation. 3 Models with Effective Hamiltonian and Effective Temperature In this section, we examine the applicability of the concept shown in the previ- ous section for the network models with various types of involvement of synaptic noise. As is shown in the previous section, the network has the effective Hamilto- nian when the equilibrium density becomes Gibbsian one in the thermodynamic limit. Since the forms of the equilibrium probability densities are estimated by investigating the forms of the Fokker-Planck equations corresponding to the
TAP Equation for Associative Memory Neural Network Models 99 original models, we will see the concrete forms of the drift and diffusion coef- ficients of the Fokker-Planck equations in the thermodynamic limit. Once the Fokker-Planck equation for each model is obtained, the form of the equilibrium distribution is estimated and hence we can know whether the network has energy concept in the thermodynamic limit. We deal with the following models: the state of i th neuron x i obeys the dy- namics (1) and (2). The synaptic couplings J ij are given in Table 1: Table 1. case
J ij ( t) noise
(i) ¯ J ij + ij ( t) ij ( t) kl ( t ) = 2δ
ik δ jl δ(t − t )/ ˜βN (ii)
¯ J ij + i ( t) j ( t) i ( t) j ( t ) = 2/ ˜βNδ ij δ 1 /2 ( t − t ) (iii)
¯ J ij + i ( t)˜ j ( t) i ( t) j ( t ) = √ 2 / ˜βN
γ δ ij δ 1 /2 ( t − t )
˜ i ( t)˜ j ( t ) = √ 2 / ˆβN
1 −γ δ ij δ 1 /2 ( t − t ) i ( t)˜ j ( t ) = 0 (iv) ¯ J ij (1 +
ij ( t)) ij ( t) kl ( t ) = 2δ ik δ jl δ(t − t )/ ˜β (v)
¯ J ij (1 + j ( t)) i ( t) j ( t ) = 2δ ij δ(t − t )/ ˜β (vi) ¯ J ij (1 +
i ( t) j ( t)) i ( t) j ( t ) = 2/ ˜βδ ij δ 1 /2 ( t − t ) (vii) ¯ J ij (1 + i ( t)˜ j ( t)) i ( t) j ( t ) = √ 2 δ ij δ 1 /2 ( t − t )/ ˜βN γ ˜
( t)˜
j ( t ) = √ 2 δ ij δ 1 /2 ( t − t )/ ˆβN −γ i ( t)˜
j ( t ) = 0 (viii) ¯ J ij + j ( t) i ( t) j ( t ) = 2δ
ij δ(t − t )/ ˜βN (ix) ¯
ij + i ( t) i ( t) j ( t ) = 2δ
ij δ(t − t )/ ˜βN (x) ¯
ij (1 +
i ( t)) i ( t) j ( t ) = 2δ ij δ(t − t )/ ˜β (xi) ¯
ij + (
t) ( t) (t ) = 2δ(t − t )/ ˜βN (xii) ¯ J ij (1 + (
t)) ( t) (t ) = 2δ(t − t )/ ˜β Note that, in Table 1, · denotes the average with respect to the noise, β and ˜ β relate with the noise intensity of i and ˜
i respectively and γ arbitrary real number. The case (i) has been treated in the previous section. Using Ito integral, we can straightforwardly find the diffusion coefficient D (2) ij
q and M are defined as ˆ q ≡
x 2 i /N , M ≡ i x i / √ N and the local field is defined as h i
j(=i) ¯ J ij x j . The drift coefficients D (1)
i for all cases are same as D (1) i
−φ (x i ) + j(=i) ¯ J ij x j . Thus there exists the energy concept or the effective Hamiltonian for the mod- els (i)-(vii). In these models, the effective temperature 1/β eff = D (2) ii also exits. 100 A. Ichiki and M. Shiino Therefore the TAP equations and the order parameter equations for these models can be obtained in the framework described in the previous section. However, in the case (viii), the diffusion coefficient D (2)
ij contains non-zero off-diagonal ele- ments and hence the equilibrium distribution does not take the form of the Gibbs distribution. This difficulty occurs also in the cases (xi) and (xii). In the case (ix), the effective Hamiltonian and the effective temperature exist. However, in this case, evaluating the quantity M is required to determine the effective tem- perature. For the quantity M the self-averaging property does not hold and thus the effective temperature can not be found in the framework mentioned previ- ously. In the case (x), the local field h i itself is the thermally fluctuating random variable, and hence the dynamics for the local fields h i ’s are also required to estimate the effective temperature. Note that the equilibrium distribution for this model may not become the Gibbs distribution since the dynamics of the local fields h i ’s should be also considered. In the case (xi), there exist the off- diagonal elements in the diffusion coefficient and it depends on M , which can not be evaluated by the self-averaging property. In the case (xii), the off-diagonal elements on the diffusion coefficient also exist and it depends on the temporally fluctuating random variables h i ’s. Therefore deriving the TAP equations and the order parameter equations for the models (viii)-(xii) is the future problem. Table 2.
case D (2) ij energy concept (i) (1
ij exist
(ii) (1 /β + ˆq/ ˜β)δ ij exist
(iii) (1 /β + ˆq/ ˜βˆβ)δ ij exist
(iv) (1 /β + αˆq/ ˜β)δ ij exist
(v) (1 /β + αˆq/ ˜β)δ ij exist
(vi) (1 /β + αˆq/ ˜β)δ ij exist
(vii) (1 /β + αˆq/ ˜βˆβ)δ ij exist
(viii) δ ij /β + ˆq/ ˜β not exist (ix) (1 /β + M
2 / ˜β)δ
ij not exist (x) (1
2 i / ˜β)δ ij not exist (xi) δ
/β + M 2 / ˜β not exist (xii)
δ ij /β + h i h j / ˜β not exist 4 Conclusion In this paper we have derived the TAP equations and the order parameter equa- tions for the neural network models with synaptic noise. In the models presented TAP Equation for Associative Memory Neural Network Models 101
in this paper, the concept of energy or the Hamiltonian does not exist in the orig- inal finite N -body system. As we have seen in the previous section, the effective Hamiltonian and the effective temperature exist in the thermodynamic limit and then the TAP equations together with the order parameter equations are derived by the use of the cavity method and the SCSNA as shown in section 2 for the cases (i)-(vii). For the cases (viii)-(xii), however, some difficulties mentioned in the previous section arise. To derive the TAP equations and the order parameter equations for these cases remains as the future problem. One of the authors (A. I.) was supported by the 21st Century COE Program at Tokyo Tech ”Nanometer-Scale Quantum Physics” by the Ministry of Education, Culture, Sports, Science and Technology. References 1. Sherrington, D., Kirkpatrick, S.: Solvable Model of a Spin-Glass. Phys. Rev. Lett. 35, 1792–1796 (1975) 2. Amit, D.J., Geutfreund, H., Sompolinsky, H.: Storing Infinite Numbers of Patterns in a Spin-Glass Model of Neural Networks. Phys. Rev. Lett. 55, 1530–1533 (1985) 3. M´ ezard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond. World Scientific, Singapore (1987) 4. Thouless, D.J., Anderson, P.W., Palmer, R.G.: Solution of ’solvable model of a spin glass’. Philos. Mag. 35, 593–601 (1977) 5. Morita, T., Horiguchi, T.: Exactly solvable model of a spin glass. Solid. State. Comm. 19, 833–835 (1976) 6. Shiino, M., Fukai, T.: Self-consistent signal-to-noise analysis and its application to analogue neural networks with asymmetric connections. J. Phys. A: Math. Gen. 25, L375-L381 (1992) 7. Shiino, M., Yamana, M.: Statistical mechanics of stochastic neural networks: Re- lationship between the self-consistent signal-to-noise analysis. Thouless-Anderson- Palmer equation, and replica symmetric calculation approaches. Phys. Rev. E. 69, 011904-1-13 (2004) 8. Ichiki, A., Shiino, M.: Thouless-Anderson-Palmer equation for analog neural network with temporally fluctuating white synaptic noise. J. Phys. A: Math. Theor. 40, 9201– 9211 (2007) 9. Shiino, M., Fukai, T.: Self-consistent signal-to-noise analysis of the statistical be- havior of analog neural networks and enhancement of the storage capacity. Phys. Rev. E. 48, 867–897 (1993) Spike-Timing Dependent Plasticity in Recurrently Connected Networks with Fixed External Inputs Matthieu Gilson 1,2,3 , David B. Grayden 1,2,3 , J. Leo van Hemmen 4 ,
1,3 , and Anthony N. Burkitt 1,2,3 1
2 The Bionic Ear Institute, Melbourne, Australia 3 NICTA, The Victorian Research Lab, The University of Melbourne 4 Physik Department, die Technische Universit¨ at M¨ unchen, Germany mgilson@bionicear.org Abstract. This paper investigates spike-timing dependent plasticity (STDP) for recurrently connected weights in a network with fixed ex- ternal inputs (homogeneous Poisson pulse trains). We use a dynamical system to model the network activity and predict its asymptotic evolu- tion, which turns out to qualitatively depend on the learning parameters and the correlation structure of the inputs. Our predictions are supported by numerical simulations of Poisson neuron networks in general cases as well as for certain cases when using Integrate-And-Fire (IF) neurons. Keywords: Spiking neurons, neurodynamics, learning, STDP, dynami- cal system, recurrent network. 1 Introduction The hypothesis that changes in the efficacies of neuronal connections depend upon the correlations in timing of pre- and postsynaptic action potentials (or spikes) has received considerable experimental support [1]. Such a learning mech- anism is related to the notion of Hebbian learning [6] and can implement func- tional organisation in neural networks. The understanding of the underlying mechanisms behind learning in the brain is crucial both from a physiological level and for applications (eg. neural prostheses, robotics). Spike-timing dependent plasticity (STDP, [3]) is a fruitful candidate for this mechanism, that has been analysed in previous studies for feed-forward network architectures of Poisson neurons [7,8]. This theory was recently extended to the case of recurrently connected architectures, and applied to a fully connected network with no external input [2]. This paper pursues the development of this framework [2] based on the linear Poisson neuron model and additive STDP (as described in Sec. 2). The net- work activity is characterised by the firing rates of and the correlations between the neurons. The evolution of the network activity (or neural dynamics) is de- scribed by a dynamical system (as discussed in Sec. 3). It is analysed in terms of M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 102–111, 2008. c Springer-Verlag Berlin Heidelberg 2008
Spike-Timing Dependent Plasticity in Recurrently Connected Networks 103
equilibria (or fixed points) and stability in order to predict the asymptotic net- work behaviour when stimulated by steady external inputs (Sec. 4). 2 Models
2.1 Linear Poisson Neuron In this study, we use the linear Poisson neuron [7]. The neuronal output firing mechanism is approximated by an inhomogeneous Poisson process generated by the Poisson parameter ρ(t) (it determines the probability of spiking an action potential, considered as an instantaneous event); ρ(t) evolves according to the sum of all the synaptic contributions (cf. Fig. 1.A) ρ(t) = ν
0 + k J k (t) n (t − t k,n − d
k ) . (1) As in [7], the total synaptic influx is the temporal and spatial (over the synapses) summation of the post-synaptic potentials (PSP) modelled by the synaptic re- sponse kernel weighted by the synaptic efficacies J k (or weights; note that we only consider positive weights here). models the current injected into the post- synaptic neuron induced by one single spike (indexed by n) at time t k,n
at the k th synaptic connection, after the fixed delay d k . Finally, ν 0 is the spontaneous firing rate caused by background activity (identical for all the neurons). 2.2
Additive STDP Gerstner et al. [3] first proposed a framework to study additive STDP, where the change of synaptic weight J k induced by a sole pair of pre- and post-synaptic pulses at times (t k , t out ) is
ΔJ k ∝ w in + w
out + W (t
k − t
out ). (2) J k is the weight of the k th synapse. w in and w
out are the rate-based learning parameters (one contribution per pulse at the pulse time); W is the STDP learning window (cf. Fig. 1.B) so that a synaptic weight is potentiated for a given pulse pair if a presynaptic input precedes a postsynaptic spike (i.e. “takes part” in the output firing), and is depressed otherwise [3]. Usually, such changes are scaled by a learning rate η. 2.3
Link with the Physiology In our abstract model, the neurons are excited by external inputs, which convey “neural information” that is assumed to be encoded in their firing rate and correlation structures. The neurons also have a spontaneous activity that can be understood as a consequence of a background activity. There are thus two sources of stochasticity, which can be interpreted as follows: one is related to the external inputs (pulse trains described by their firing rate and correlation
104 M. Gilson et al. Fig. 1. A: Linear Poisson neuron model. The output pulse train S(t) (top) is generated using the inhomogeneous Poisson parameter ρ(t) (middle). Each pre-synaptic pulse at the k
th synapse (bottom) induces a variation of ρ determined by the post-synaptic re- sponse kernel , the synaptic weight J k (t) and the delay d k . B : STDP window function W (in arbitrary unit). Each side of the real axis is determined by a negative exponential: u → c x exp(
−|u|/τ x ) with the index x = D for the depression part (u > 0) and x = P for the potentiation part (u < 0), cf. [2, Sec. 5] for details and the parameter values. structures) and it can be linked to the variability observed in physiological data; the other is due to the additional impact of the background activity upon the generation of the spike output of each neuron (intrinsic stochasticity modelled by the Poisson process). 3 Theoretical Analysis 3.1 Characterisation of the Neural Activity We consider a network of N Poisson neurons (referred to as internal neurons) stimulated with M Poisson pulse trains (or external inputs), as shown in Fig. 2. The activity of the neural network can be described using the firing rates and the pairwise correlations, plus the weights. In the terminology of the theory of dynamical systems, these variables characterise the “state” of the network activity at each time t and their evolution is referred to as neural dynamics. Similar to [7,2], we consider the time-averaged firing rates ν i (t) (for the inter- nal neuron indexed by i; T is a given time period) ν i (t) 1 T t t −T S i (t ) dt (3) where S
i (t) is the spike-time series of the i th neuron and the brackets . . . denotes the ensemble averaging. Likewise for the coefficient correlations (time- average correlation function convoluted with the STDP window function W ): D W
(t) between the i th internal neuron and the k th external input D W
(t) + ∞ −∞ W (u)
1 T t t −T S i (t ) ˆ
S k (t + u) dt du (4)
and Q W ij (t) between the i th and j th internal neurons (with S i and S
j ) [2, Sec. 3]. Spike-Timing Dependent Plasticity in Recurrently Connected Networks 105
Fig. 2. Presentation of the network and the notation. The internal neurons (within the network) are indexed by i ∈ [1..N], and their output pulse trains are denoted S i (t) (it can be understood as a sum of “Dirac functions” at each spiking time [2, Sec. 2]). Likewise for the external input pulse trains ˆ S k
∈ [1..M]). The time-averaged firing rates are denoted by ν i (t), cf. Eq. 3; the correlation coefficients within the network by Q ij (t) (resp. D ik (t) between a neuron in the network and an external input; and ˆ Q
(t) between two external inputs), cf. Eq. 4. The weight of the connection from the k th external input onto the i th internal neuron is denoted K ik (t) (resp. J ij (t) from the j th internal neuron onto the i th internal neuron). The stimulation parameters (which represent the “information” carried in the external inputs) are determined by the time-averaged input spiking rates (ˆ ν k
defined similarly to ν i (t)) and their correlation coefficients ( ˆ Q W kl (t), defined simi- larly to D W ik
W ij (t)) [2, Sec. 3]. In this paper, we only consider stimulation parameters that are constant in time. 3.2
Learning Equations Learning equations can be derived from Eq. 2 as in Kempter et al. [7]. This re- quires the assumption that the internal pulse trains are statistically independent (this can be considered valid when the number of recurrent connection is large enough) and a small learning rate η. This leads to the matrix equation Eq. 10 for the weights between internal neurons (resp. to Eq. 9 for the input weights K). 3.3 Activation Dynamics In order to study the evolution of the weights described by Eqs. 9 and 10, we need to evaluate the neuron time-average firing rates (the vector ν(t)) and their time-average correlation coefficients (the matrices D W (t) and Q W (t)). Similar to Kempter et al. [7] and Burkitt et al. [2], we approximate the instantaneous firing rate S i (t) of the i th Poisson neuron by its expected inhomogeneous Pois- son parameter ρ(t) (cf. Eq. 1) and we neglect the impact of the short-time dynamics (the synaptic response kernel and the synaptic delays ˆ d ik and d ij ) by using time averaged variables (over a “long” period T ). We require that the 106 M. Gilson et al. learning occurs slowly compared to the activation mechanisms (cf. the neuron and synapse models) so that T is large compared to the time scale of these mech- anisms but small compared to the inverse of the learning parameter η −1 . This leads to the consistency matrix equations of the firing rates (Eq. 6) and of the correlation coefficients (Eqs. 7 and 8). See Burkitt et al. [2, Sec. 3] for details of the derivation. Note that the consistency equations of the correlation coefficients as defined in [2, Sec. 3 and 4] have been reformulated to express the usual covariance using the assumption that the correlations are quasi-constant in time [5] (this implies that ˆ Q
[2, Sec. 3] is actually equal to ˆ Q W ). Eqs. 7 and 8 express the impact of the connectivity (through the term [I − J] −1
and the cross covariances in terms of the input covariance ˆ C W ˆ C W ˆ Q W − W ˆν ˆν T , (5) where W
W (u) du evaluates the balance between the potentiation and the depression of our STDP rule. These equations are obtained by combining the equations in [2] with the firing-rate consistency equation Eq. 6. 3.4
Network Dynamical System Putting everything together, the network dynamics is described by ν = [I − J]
−1 ν 0 E + K ˆ ν (6) D W − W ν ˆν T = [I
− J] −1 K ˆ Q W − W ˆν ˆν T (7) Q W − W ν ν T = [I
− J] −1 K ˆ Q W − W ˆν ˆν T K T [I − J] −1T (8)
dK dt = Φ K w in E ˆ ν T + w out
ν ˆ E T + D W (9) dJ dt = Φ J w in E ν T + w out ν E
T + Q
W , (10) where E is the unit vector of N elements (likewise for ˆ E with M elements); Φ J is a projector on the space of N ×N matrices to which J belongs [2, Sec. 3]. The effect of Φ J is to nullify the coefficients corresponding to a missing connection in the network, viz. all the diagonal terms because the self-connection of a neuron onto itself is forbidden. More generally, such an projection operator can account for any network connectivity [2, Sec. 3]. Note that time has been rescaled, in order to remove η from these equations and for simplicity of notation the dependence on time t will be omitted in the rest of this paper. The matrix [I − J(t)] is assumed invertible at all time (the contrary would imply a diverging behaviour of the firing rates [2, Sec. 4]). 4 Recurrent Network with Fixed Input Weights We now examine the case of a recurrently connected network with fixed input weights K and learning on the recurrent weights J . Only Eqs. 6, 8 and 10 remain, Spike-Timing Dependent Plasticity in Recurrently Connected Networks 107
which simplifies the analysis. In the case of full recurrent connectivity, Φ J in Eq. 10 only nullifies the diagonal terms of the square matrix in its argument. 4.1
Analytical Predictions Homeostatic equilibrium. Similarly to [7,2], we derive the scalar equations of the mean firing rate ν av N −1 i ν i and weight J av (N (N
− 1)) −1 i=j J ij . It consists of neglecting the inhomogeneities of the firing rates and of the weights over the network, as well as of the connectivity, and we obtain ν av
ν 0 + (K ˆ ν) av 1 − (N − 1)J av (11) ˙ J av = w in + w out ν av + W + C ν 2 av , where (K ˆ ν) av
ν and C is defined as C (K ˆ C W K T ) av [ν 0 + (K ˆ ν) av ] 2 . (12) Eqs. 11 are thus the same equations as for the case with no input [2, Sec. 5], with ν 0 and W replaced by, resp., ν 0 + (K ˆ ν) av and W + C. It qualitatively implies the same dynamical behaviour and, provided that W + C < 0, the network exhibits a homeostatic equilibrium (when it is realisable, it requires in particular μ > 0), and the means of the firing rates and of the weights converge towards ν ∗ av = μ
− w in + w out
W + C (13)
J ∗ av = μ − ν 0 − (K ˆν)
av (N − 1)μ , If the correlation inputs are time-invariant functions and if the correlations are positive (i.e. more likely to fire in a synchronous way) and homogeneous among the correlated input pool, it follows that C is of the same sign as W . Therefore, the condition for stability reverts to W < 0 as in [7,2]. Structural dynamics. For the particular case of uncorrelated inputs (or more generally when K ˆ C W K T = 0), the fixed-point structure of the dynamical sys- tem is qualitatively the same as for the case of no external input [2, Sec. 5]: a homogeneous distribution for the firing rates and a continuous manifold of fixed points for the internal weights. In the case of “spatial” inhomogeneities over the input correlations, the net- work dynamics shows a different evolution. To illustrate and to compare this with the case of feed-forward connections, we consider the network configura- tion described in Fig. 3 and inspired by [7], where one input pool has correlation while the other pool has uncorrelated sources. In general, the equations that 108 M. Gilson et al. Fig. 3. Architecture of the simulated network. The inputs are divided into two sub- pools, each feeding half of the internal network with means K 1 and K
2 for the input weights from each subpool. Similarly to the recurrent weights J and the firing rates ˆ ν and ν, the inhomogeneities are neglected within each subpool and they are assumed to be all identical to their mean. The weights and delays are initially set with 10% random variation around a mean. determine the fixed points have no accurate solution. Yet, we can reduce the dimensionality by neglecting the variance within each subpool and make ap- proximations to evaluate the asymptotic distribution of the firing rates, which turns out to be bimodal. 4.2 Simulation Protocol and Results We simulated a network of Poisson neurons as described in Fig. 3 with random initial recurrent weights (uniformly distributed in a given interval, as well as all the synaptic delays). An issue with such simulations is to maintain positive internal weights during their homeostatic convergence because they individually diverge. Thus, all equilibria are not realisable depending on the initial distribu- tions of K and J and the weight bounds [7,2]. See [2, Sec. 5] for details about the simulation parameters. An interesting first case to test the analytical predictions consists of two pools of uncorrelated inputs, each feeding half of the network with distinct weights (K 1 ˆ ν 1 = K 2 ˆ ν 2 and ˆ C W = 0). Each half of the internal network thus has distinct firing rates initially and, as predicted, the outcome is a convergence of these firing-rates towards a uniform value (similar to the case of no external input [2, Sec. 5]). In the case of a full connected (for both the K and the J according to Fig. 3) network stimulated by one correlated input pool (short-time correlation inspired by [4] so that ˆ C W
and the internal weights also exhibit a homeostatic equilibrium. As shown in Fig. 4, the means over the network (thick solid lines) converge towards the predicted equilibrium values (dashed lines). Furthermore, the individual firing rates tend to stabilise and their distribution remains bimodal (the subpool #1 excited by correlated inputs fires at a lower rate eventually when W < 0). The recurrent weights individually diverge similarly to [2, Sec. 5] and reorganise so that the outgoing weights from the subpool #1 (see the means over each weight subpool
Spike-Timing Dependent Plasticity in Recurrently Connected Networks 109
0 10000
20000 10 15 20 25 time (sec) firing rates (Hz) #1 #1 #2 #2 0 10000 20000
0 0.005
0.01 0.015
0.02 0.025
0.03 time (sec) weights J21
J22 J12
J11 Fig. 4. Evolution of the firing rates (left) and of the recurrent weights (right) for N = 30 fully-connected Poisson neurons (cf. Fig.3) with short-time correlated inputs. The outcome is a quasi bimodal distribution of the firing rates (the grey bundle, the mean is the thick solid line) around the predicted homeostatic equilibrium (dashed line). The subgroup #1 that receives correlated inputs is more excited initially but fires at a lower rate at the end of the simulation (cf. the two thin black solid lines which represent the mean over each subpool). The internal weights individually diverge, while their mean (thick line) converges towards the predicted equilibrium value (dashed line). They reorganise themselves so that the weights outgoing from the subpool #2 (that receives uncorrelated inputs) become silent, while the ones from #1 are strengthened. Note that the homeostatic equilibrium is preserved even when some weights saturate. 0 10000
20000 15 20 25 time (sec) firing rates (Hz) #2 #1 #2 #1 Fig. 5. Evolution of the firing rates for a partially connected network of N = 75 neurons. Both the K and the J have 40% probability of connection with the same setup as the network in Fig. 4 (to preserve the total input synaptic strength). The mean firing rate (very thick line) still converges towards the predicted equilibrium value (dashed line) and the two subgroups (grey bundles, each mean represented by a thin black solid line) get separated similarly to the case of full connectivity. The internal weights (not shown) exhibit similar dynamics as for the case of full connectivity. J 11
21 in Fig. 4) are strengthened while the other ones almost become silent (see J 12
22 ). In other words, the subpool that receives correlated inputs takes the upper hand in the recurrent architecture.
110 M. Gilson et al. 0 10000
15 20 25 30 time (sec) firing rates (Hz) 0
10000 15 20 25 30 time (sec) firing rates (Hz) Fig. 6. Simulation of networks of IF neurons with partial random connectivity of 50%. The network qualitatively exhibits the expected behaviour in the case of uncorrelated inputs (left) and inputs with one correlated pool and one uncorrelated pool (right). In the case of partial connectivity for both the K and the J , the behaviour of the individual firing rates (cf. Fig. 5) still follows the predictions but they are more dispersed, their convergence is slower and the bimodal distribution is not always observed as clearly as in the case of full connectivity (in Fig. 5 the means of the two internal neuron subpools clearly remain separated though). The homeostatic equilibrium of the internal weights also holds and they individually diverge. The partial connectivity needs to be rich enough for the predictions of the mean variable to be accurate enough. First results with IF neurons comply with the analytical predictions remain valid even if the activation mechanisms are more complex (here with a connec- tivity of 50%, cf. Fig. 6). 5 Discussion and Future Work The analytical results presented here are preliminary, and further investigation is needed to gain better understanding of the interplay between the input cor- relation structure and STDP. Nevertheless, our results illustrate two points: STDP induces a stable activity in recurrent architectures similar to that for feed-forward ones (homeostatic regulation of the network activity under the con- dition W < 0); and the qualitative structure of the internal firing rates is mainly determined by the input correlation structure. Namely, a “poor” correlation structure (uncorrelated or delta-correlated inputs, so that ˆ C W
homogenisation of the firing activity. Finally, partial connectivity impacts upon the structure of the internal firing rates, but such networks still exhibit similar behaviour to fully connected ones. Preliminary results involving more complex patterns of K ˆ Q W
T suggest
more complex interplay between the input correlation structure and the equi- librium distribution of the network firing rates and weights. Such cases are under investigation and may constitute more “interesting” dynamic behaviour of the network from a cognitive modelling point of view, namely through the Spike-Timing Dependent Plasticity in Recurrently Connected Networks 111
relationship between the attractors of the network activity and the input struc- ture. The case of learning for both the input connections and the recurrent ones will also form part of a future study. Comparison with IF neurons suggests that the impact of the neuron activation mechanisms on the weight dynamics may not be significant. It can be linked to the separation of the time scales between them in the case of slow learning. Note that with our approximations the IF neurons are assumed to be in “linear input-output regime” (no bursting for instance). Acknowledgments The authors thank Iven Mareels, Chris Trengove, Sean Byrnes and Hamish Mef- fin for useful discussions that introduced significant improvements. MG is funded by two scholarships from The University of Melbourne and NICTA. ANB and DBG acknowledge funding from the Australian Research Council (ARC Discov- ery Projects #DP0453205 and #DP0664271) and The Bionic Ear Institute. References 1. Bi, G.Q., Poo, M.M.: Synaptic modification by correlated activity: Hebb’s postulate revisited. Annual Review of Neuroscience 24, 139–166 (2001) 2. Burkitt, A.N., Gilson, M., van Hemmen, J.L.: Spike-timing-dependent plasticity for neurons with recurrent connections. Biological Cybernetics 96, 533–546 (2007) 3. Gerstner, W., Kempter, R., van Hemmen, J.L., Wagner, H.: A neuronal learning rule for sub-millisecond temporal coding. Nature 383, 76–78 (1996) 4. Gutig, R., Aharonov, R., Rotter, S., Sompolinsky, H.: Learning input correlations through nonlinear temporally asymmetric hebbian plasticity. Journal of Neuro- science 23, 3697–3714 (2003) 5. Hawkes, A.G.: Point spectra of some mutually exciting point processes. Journal of the Royal Statistical Society Series B-Statistical Methodology 33, 438–443 (1971) 6. Hebb, D.O.: The organization of behavior: a neuropsychological theory. Wiley, Chichester (1949) 7. Kempter, R., Gerstner, W., van Hemmen, J.L.: Hebbian learning and spiking neu- rons. Physical Review E 59, 4498–4514 (1999) 8. van Rossum, M.C.W., Bi, G.Q., Turrigiano, G.G.: Stable hebbian learning from spike timing-dependent plasticity. Journal of Neuroscience 20, 8812–8821 (2000) A Comparative Study of Synchrony Measures for the Early Detection of Alzheimer’s Disease Based on EEG Justin Dauwels, Fran¸cois Vialatte, and Andrzej Cichocki RIKEN Brain Science Institute, Saitama, Japan justin@dauwels.com, {fvialatte,cia}@brain.riken.jp Abstract. It has repeatedly been reported in the medical literature that the EEG signals of Alzheimer’s disease (AD) patients are less syn- chronous than in age-matched control patients. This phenomenon, how- ever, does at present not allow to reliably predict AD at an early stage, so-called mild cognitive impairment (MCI), due to the large variabil- ity among patients. In recent years, many novel techniques to quan- tify EEG synchrony have been developed; some of them are believed to be more sensitive to abnormalities in EEG synchrony than tradi- tional measures such as the cross-correlation coefficient. In this paper, a wide variety of synchrony measures is investigated in the context of AD detection, including the cross-correlation coefficient, the mean-square and phase coherence function, Granger causality, the recently proposed corr-entropy coefficient and two novel extensions, phase synchrony in- dices derived from the Hilbert transform and time-frequency maps, information-theoretic divergence measures in time domain and time- frequency domain, state space based measures (in particular, non-linear interdependence measures and the S-estimator), and at last, the recently proposed stochastic-event synchrony measures. For the data set at hand, only two synchrony measures are able to convincingly distinguish MCI patients from age-matched control patients (p < 0.005), i.e., Granger causality (in particular, full-frequency directed transfer function) and stochastic event synchrony (in particular, the fraction of non-coincident activity). Combining those two measures with additional features may eventually yield a reliable diagnostic tool for MCI and AD. 1 Introduction Many studies have shown that the EEG signals of AD patients are generally less coherent than in age-matched control patients (see [1] for an in-depth review). It is noteworthy, however, that this effect is not always easily detectable: there tends to be a large variability among AD patients. This is especially the case for patients in the pre-symptomatic phase, commonly referred to as Mild Cognitive Impairment (MCI), during which neuronal degeneration is occurring prior to the clinical symptoms appearance. On the other hand, it is crucial to predict AD at M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 112–125, 2008. c Springer-Verlag Berlin Heidelberg 2008
A Comparative Study of Synchrony Measures for the Early Detection of AD 113
an early stage: medication that aims at delaying the effects of AD (and hence intend to improve the quality of life of AD patients) are the most effective if applied in the pre-symptomatic phase. In recent years, a large variety of measures has been proposed to quantify EEG synchrony (we refer to [2]–[5] for recent reviews on EEG synchrony measures); some of those measures are believed to be more sensitive to perturbations in EEG synchrony than classical indices as for example the cross-correlation coefficient or the coherence function. In this paper, we systematically investigate the state-of- the-art of measuring EEG synchrony with special focus on the detection of AD in its early stages. (A related study has been presented in [6,7] in the context of epilepsy.) We consider various synchrony measures, stemming from a wide spectrum of disciplines, such as physics, information theory, statistics, and signal processing. Our aim is to investigate which measures are the most suitable for detecting the effect of synchrony perturbations in MCI and AD patients; we also wish to better understand which aspects of synchrony are captured by the different measures, and how the measures are related to each other. This paper is structured as follows. In Section 2 we review the synchrony measures considered in this paper. In Section 3 those measures are applied to EEG data, in particular, for the purpose of detecting MCI; we describe the EEG data set, elaborate on various implementation issues, and present our results. At the end of the paper, we briefly relate our results to earlier work, and speculate about the neurophysiological interpretation of our results. 2 Synchrony Measures We briefly review the various families of synchrony measures investigated in this paper: cross-correlation coefficient and analogues in frequency and time-frequency domain, Granger causality, phase synchrony, state space based synchrony, information theoretic interdependence measures, and at last, stochastic-event synchrony measures, which we developed in recent work. 2.1
Cross-Correlation Coefficient The cross-correlation coefficient r is perhaps one of the most well-known mea- sures for (linear) interdependence between two signals x and y. If x and y are not linearly correlated, r is close to zero; on the other hand, if both signals are identical, then r = 1 [8]. 2.2
Coherence The coherence function quantifies linear correlations in frequency domain. One distinguishes the magnitude square coherence function c(f ) and the phase co- herence function φ(f ) [8]. 114 J. Dauwels, F. Vialatte, and A. Cichocki 2.3 Corr-Entropy Coefficient The corr-entropy coefficient r E is a recently proposed [9] non-linear extension of the correlation coefficient r; it is close to zero if x and y are independent (which is stronger than being uncorrelated). 2.4 Coh-Entropy and Wav-Entropy Coefficient One can define a non-linear magnitude square coherence function, which we will refer to as “coh-entropy” coefficient c E (f ); it is an extension of the corr-entropy coefficient to the frequency domain. The corr-entropy coefficient r E can also be extended to the time-frequency domain, by replacing the signals x and y in the definition of r E by their time-frequency (“wavelet”) transforms. In this paper, we use the complex Morlet wavelet, which is known to be well-suited for EEG signals [10]. The resulting measure is called “wav-entropy” coefficient w E (f ).
(To our knowledge, both c E (f ) and w E (f ) are novel). 2.5 Granger Causality Granger causality 1 refers to a family of synchrony measures that are derived from linear stochastic models of time series; as the above linear interdependence measures, they quantify to which extent different signals are linearly interde- pendent. Whereas the above linear interdependence measures are bivariate, i.e., they can only be applied to pairs of signals, Granger causality measures are multivariate, they can be applied to multiple signals simultaneously. Suppose that we are given n signals X 1 (k), X
2 (k), . . . , X n (k), each stemming from a different channel. We consider the multivariate autoregressive (MVAR) model:
X(k) = p =1 A(j)X(k − ) + E(k), (1) where X(k) = (X 1 (k), X
2 (k), . . . , X n (k))
T , p is the model order, the model coefficients A(j) are n × n matrices, and E(k) is a zero-mean Gaussian random vector of size n. In words: Each signal X i (k) is assumed to linearly depend on its own p past values and the p past values of the other signals X j (k). The deviation between X(k) and this linear dependence is modeled by the noise component E(k). Model (1) can also be cast in the form: E(k) = p
˜ A(j)X(k
− ), (2)
1 The Granger causality measures we consider here are implemented in the BioSig library, available from http://biosig.sourceforge.net/
A Comparative Study of Synchrony Measures for the Early Detection of AD 115
where ˜ A(0) = I (identity matrix) and ˜ A(j) = −A(j) for j > 0. One can transform (2) into the frequency domain (by applying the z-transform and by substituting z = e −2πiΔt , where 1/Δt is the sampling rate): X(f ) = ˜ A −1 (f )E(f ) = H(f )E(f ). (3)
The power spectrum matrix of the signal X(k) is determined as S(f ) = X(f )X(f ) ∗ = H(f )VH ∗ (f ),
(4) where V stands for the covariance matrix of E(k). The Granger causality mea- sures are defined in terms of coefficients of the matrices A, H, and S. Due to space limitations, only a short description of these methods is provided here, additional information can be found in existing literature (e.g., [4]). From these coefficients, two symmetric measures can be defined: – Granger coherence |K ij (f ) | ∈ [0, 1] describes the amount of in-phase com- ponents in signals i and j at the frequency f . – Partial coherence (PC) |C ij
| ∈ [0, 1] describes the amount of in-phase components in signals i and j at the frequency f when the influence (i.e., linear dependence) of the other signals is statistically removed. The following asymmetric (“directed”) Granger causality measures capture causal relations: – Directed transfer function (DTF) γ 2 ij
channel i stemming from channel j. – Full frequency directed transfer function (ffDTF) F 2
(f ) = |H ij (f ) | 2 f m j=1 |H ij (f ) | 2 ∈ [0, 1], (5) is a variation of γ 2 ij
– Partial directed coherence (PDC) |P ij (f ) | ∈ [0, 1] represents the fraction of outflow from channel j to channel i – Direct directed transfer function (dDTF) χ 2 ij
2 ij (f )C 2 ij (f ) is non-zero if the connection between channel i and j is causal (non-zero F 2 ij (f )) and direct (non-zero C 2 ij
2.6 Phase Synchrony Phase synchrony refers to the interdependence between the instantaneous phases φ x and φ y of two signals x and y; the instantaneous phases may be strongly synchronized even when the amplitudes of x and y are statistically independent. The instantaneous phase φ x of a signal x may be extracted as [11]: φ H x (k) = arg [x(k) + i˜ x(k)] ,
(6) 116 J. Dauwels, F. Vialatte, and A. Cichocki where ˜ x is the Hilbert transform of x. Alternatively, one can derive the instan- taneous phase from the time-frequency transform X(k, f ) of x: φ W x (k, f ) = arg[X(k, f )]. (7) The phase φ W x (k, f ) depends on the center frequency f of the applied wavelet. By appropriately scaling the wavelet, the instantaneous phase may be computed in the frequency range of interest. The phase synchrony index γ for two instantaneous phases φ x and φ y is defined as [11]: γ =
e i(nφ
x −mφ
y ) ∈ [0, 1], (8) where n and m are integers (usually n = 1 = m). We will use the notation γ H
W to indicate whether the instantaneous phases are computed by the Hilbert transform or time-frequency transform respectively. In this paper, we will consider two additional phase synchrony indices, i.e., the evolution map approach (EMA) and the instantaneous period approach (IPA) [12]. Due to space constraints, we will not describe those measures here, instead we refer the reader to [12] 2 ; additional information about phase synchrony can be found in [6]. 2.7
State Space Based Synchrony State space based synchrony (or “generalized synchronization”) evaluates syn- chrony by analyzing the interdependence between the signals in a state space reconstructed domain (see e.g., [7]). The central hypothesis behind this ap- proach is that the signals at hand are generated by some (unknown) deter- ministic, potentially high-dimensional, non-linear dynamical system. In order to reconstruct such system from a signal x, one considers delay vectors X(k) = (x(k), x(k − τ), . . . , x(k − (m − 1) τ)) T , where m is the embedding dimension and τ denotes the time lag. If τ and m are appropriately chosen, and the signals are indeed generated by a deterministic dynamical system (to a good approxi- mation), the delay vectors lie on a smooth manifold (“mapping”) in R m , apart from small stochastic fluctuations. The S-estimator [13], here denoted by S est
, is a state space based measure obtained by applying principal component analysis (PCA) to delay vectors 3 . We
also considered three measures of nonlinear interdependence, S k , H k , and N
k (see [6] for details 4 ).
Information-Theoretic Measures Several interdependence measures have been proposed that have their roots in information theory. Mutual Information I is perhaps the most well-known 2 Program code is available at www.agnld.uni-potsdam.de/%7Emros/dircnew.m 3 We used the S Toolbox downloadable from http://aperest.epfl.ch/docs/ software.htm 4 Software is available from http://www.vis.caltech.edu/~rodri/software.htm A Comparative Study of Synchrony Measures for the Early Detection of AD 117
information-theoretic interdependence measure; it quantifies the amount of in- formation the random variable Y contains about random variable X (and vice versa); it is always positive, and it vanishes when X and Y are statistically in- dependent. Recently, a sophisticated and effective technique to compute mutual information between time series was proposed [14]; we will use that method in this paper 5 . The method of [14] computes mutual information in time-domain; alternatively, this quantity may also be determined in time-frequency domain (denoted by I W ), more specifically, from normalized spectrograms [15,16] (see also [17,18]). We will also consider several information-theoretic measures that quantify the dissimilarity (or “distance”) between two random variables (or signals). In con- trast to the previously mentioned measures, those divergence measures vanish if the random variables (or signals) are identical ; moreover, they are not necessar- ily symmetric, and therefore, they can not be considered as distance measures in the strict sense. Divergences may be computed in time domain and time- frequency domain; in this paper, we will only compute the divergence measures in time-frequency domain, since the computation in time domain is far more involved. We consider the Kullback-Leibler divergence K, the R´ enyi divergence D α , the Jensen-Shannon divergence J , and the Jensen-R´ enyi divergence J α . Due
to space constraints, we will not review those divergence measures here; we refer the interested reader to [15,16]. 2.9 Stochastic Event Synchrony (SES) Stochastic event synchrony, an interdependence measure we developed in ear- lier work [19], describes the similarity between the time-frequency transforms of two signals x and y. As a first step, the time-frequency transform of each signal is approximated as a sum of (half-ellipsoid) basis functions, referred to as “bumps” (see Fig. 1 and [20]). The resulting bump models, representing the most prominent oscillatory activity, are then aligned (see Fig. 2): bumps in one time- frequency map may not be present in the other map (“non-coincident bumps”); other bumps are present in both maps (“coincident bumps”), but appear at slightly different positions on the maps. The black lines in Fig. 2 connect the centers of coincident bumps, and hence, visualize the offset in position between pairs of coincident bumps. Stochastic event synchrony consists of five parameters that quantify the alignment of two bump models: – ρ: fraction of non-coincident bumps, – δ
t and δ
f : average time and frequency offset respectively between coincident bumps, – s
t and s
f : variance of the time and frequency offset respectively between coincident bumps. The alignment of the two bump models (cf. Fig. 2 (right)) is obtained by iterative max-product message passing on a graphical model; the five SES parameters are determined from the resulting alignment by maximum a posteriori (MAP) 5 The program code (in C) is available at www.klab.caltech.edu/~kraskov/MILCA/ 118 J. Dauwels, F. Vialatte, and A. Cichocki Fig. 1. Bump modeling 00 200 400 600
800 5 10 15 20 25 30 Bump models of two EEG signals t f
200 400
600 800
5 10 15 20 25 30 Coincident bumps (ρ = 27%) t f Fig. 2. Coincident and non-coincident activity (“bumps”); (left) bump models of two signals; (right) coincident bumps; the black lines connect the centers of coincident bumps estimation [19]. The parameters ρ and s t are the most relevant for the present study, since they quantify the synchrony between bump models (and hence, the original time-frequency maps); low ρ and s t implies that the two time-frequency maps at hand are well synchronized. 3 Detection of EEG Synchrony Abnormalities in MCI Patients In the following section, we describe the EEG data we analyzed. In Section 3.2 we address certain technical issues related to the synchrony measures, and in Section 3.3, we present and discuss our results. 3.1 EEG Data
The EEG data 6 analyzed here have been analyzed in previous studies concern- ing early detection of AD [21]–[25]. They consist of rest eyes-closed EEG data 6 We are grateful to Prof. T. Musha for providing us the EEG data. A Comparative Study of Synchrony Measures for the Early Detection of AD 119
recorded from 21 sites on the scalp based on the 10–20 system. The sampling frequency was 200 Hz, and the signals were band pass filtered between 4 and 30Hz. The subjects comprised two study groups. The first consisted of a group of 25 patients who had complained of memory problems. These subjects were then diagnosed as suffering from MCI and subsequently developed mild AD. The criteria for inclusion into the MCI group were a mini mental state exam (MMSE) score above 24, the average score in the MCI group was 26 (SD of 1.8). The other group was a control set consisting of 56 age-matched, healthy subjects who had no memory or other cognitive impairments. The average MMSE of this control group was 28.5 (SD of 1.6). The ages of the two groups were 71.9 ± 10.2 and 71.7 ± 8.3, respectively. Pre-selection was conducted to ensure that the data were of a high quality, as determined by the presence of at least 20 sec. of artifact free data. Based on this requirement, the number of subjects in the two groups described above was reduced to 22 MCI patients and 38 control subjects. 3.2 Methods
In order to reduce the computational complexity, we aggregated the EEG signals into 5 zones (see Fig. 3); we computed the synchrony measures (except the S-estimator) from the averages of each zone. For all those measures except SES, we used the arithmetic average; in the case of SES, the bump models obtained from the 21 electrodes were clustered into 5 zones by means of the aggregation algorithm described in [20]. We evaluated the S-estimator between each pair of zones by applying PCA to the state space embedded EEG signals of both zones. We divided the EEG signals in segments of equal length L, and computed the synchrony measures by averaging over those segments. Since spontaneous EEG is usually highly non-stationary, and most synchrony measures are strictly speaking only applicable to stationary signals, the length L should be sufficiently small; on the other hand, in order to obtain reliable measures for synchrony, the length should be chosen sufficiently large. Consequently, it is not a priori clear how to choose the length L, and therefore, we decided to test several values, i.e., L = 1s, 5s, and 20s. In the case of Granger causality measures, one needs to specify the model order p. Similarly, for mutual information (in time domain) and the state space based measures, the embedding dimension m and the time lag τ needs to be chosen; the phase synchrony indices IPA and EMA involve a time delay τ . Since it is not obvious which parameter values amount to the best performance for detecting AD, we have tried a range of parameter settings, i.e., p = 1, 2,. . . , 10, and m = 1, 2,. . . , 10; the time delay was in each case set to τ = 1/30s, which is the period of the fastest oscillations in the EEG signals at hand. 3.3
Results and Discussion Our main results are summarized in Table 1, which shows the sensitivity of the synchrony measures for detecting MCI. Due to space constraints, the table only shows results for global synchrony, i.e., the synchrony measures were averaged |
ma'muriyatiga murojaat qiling