Lecture Notes in Computer Science
tation algorithm. While the proposed speed-up makes each iteration slower than the basic gradient update, the time to reach the error level 0.85 is greatly diminished.

  Method         Complexity       Seconds/Iter   Hours to E_O = 0.85
  Gradient       O(Nc + nc)       58             1.9
  Speed-up       O(Nc + nc)       110            0.22
  Natural Grad.  O(Nc + nc^2)     75             3.5
  Imputation     O(nd^2)          110000         64
  EM             O(Nc^2 + nc^3)   45000          58

574 T. Raiko, A. Ilin, and J. Karhunen

adapting them (marked as VB2). We initialized regularized PCA and VB1 using normal PCA learned with α = 0.625 and orthogonalized A, and VB2 using VB1. The parameter α was set to 2/3. Fig. 1 (right) shows the results. The performance of basic PCA starts to degrade during learning, especially using the proposed speed-up. Natural gradient diminishes this phenomenon, known as overlearning, but it is even more effective to use regularization. The best results were obtained using VB2: the final validation error E_V was 0.9180 and the training rms error E_O was 0.7826, which is naturally larger than the unregularized E_O = 0.7657.

6 Discussion

We studied a number of different methods for PCA with sparse data, and it turned out that a simple gradient descent approach worked best due to its minimal computational complexity per iteration. We could also speed it up more than ten times by using an approximated Newton's method. We found empirically that setting the parameter α = 2/3 seems to work well for our problem. It is left for future work to find out whether this generalizes to other problem settings. There are also many other ways to speed up the gradient descent algorithm. The natural gradient did not help here, but we expect that the conjugate gradient method would. The modification to the gradient proposed in this paper could be used together with the conjugate gradient speed-up. This will be another future research topic.

There are also other benefits in solving the PCA problem by gradient descent. Algorithms that minimize an explicit cost function are rather easy to extend. The case of variational Bayesian learning applied to PCA was considered in Section 4, but there are many other extensions of PCA, such as using non-Gaussianity, nonlinearity, mixture models, and dynamics.
The developed algorithms can prove useful in many applications such as bioinformatics, speech processing, and meteorology, in which large-scale datasets with missing values are very common. The required computational burden is linearly proportional to the number of measured values. Note also that the proposed techniques provide an analogue of confidence regions showing the reliability of estimated quantities.

Acknowledgments. This work was supported in part by the Academy of Finland under its Centers for Excellence in Research Program, and the IST Program of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the authors' views. We would like to thank Antti Honkela for useful comments.
Principal Component Analysis for Sparse High-Dimensional Data 575
Hierarchical Bayesian Inference of Brain Activity

Masa-aki Sato 1 and Taku Yoshioka 1,2

1 ATR Computational Neuroscience Laboratories, masa-aki@atr.jp
2 National Institute of Information and Communication Technology

Abstract. Magnetoencephalography (MEG) can measure brain activity with millisecond-order temporal resolution, but its spatial resolution is poor due to the ill-posed nature of the inverse problem of estimating source currents from the electromagnetic measurements. Therefore, prior information on the source currents is essential to solve the inverse problem.
We have proposed a new hierarchical Bayesian method to combine several sources of information. In our method, the variance of the source current at each source location is considered an unknown parameter and estimated from the observed MEG data and prior information by using the variational Bayes method. The fMRI information can be imposed as a prior distribution on the variance rather than as the variance itself, so that it gives a soft constraint on the variance. The hierarchical Bayesian method is shown to have better accuracy and spatial resolution than conventional linear inverse methods by evaluating the resolution curve. The proposed method also demonstrated good spatial and temporal resolution for estimating current activity in early visual areas evoked by a stimulus in a quadrant of the visual field.

1 Introduction

In recent years, there has been rapid progress in noninvasive neuroimaging measurement of the human brain. The functional organization of the human brain has been revealed by PET and functional magnetic resonance imaging (fMRI). However, these methods cannot reveal the detailed dynamics of information processing in the human brain, since they have poor temporal resolution due to slow hemodynamic responses to neural activity (Bandettini, 2000; Ogawa et al., 1990). On the other hand, magnetoencephalography (MEG) can measure brain activity with millisecond-order temporal resolution, but its spatial resolution is poor due to the ill-posed nature of the inverse problem of estimating source currents from the electromagnetic measurements (Hamalainen et al., 1993). Therefore, prior information on the source currents is essential to solve the inverse problem. One of the standard methods for the inverse problem is the dipole method (Hari, 1991; Mosher et al., 1992). It assumes that brain activity can be approximated by a small number of current dipoles.
Although this method gives good estimates when the number of active areas is small, it cannot capture the distributed brain activity underlying higher brain functions. On the other hand, a number of distributed source methods have been proposed to estimate distributed activity in the brain, such as the minimum norm method, the minimum L1-norm method, and others (Hamalainen et al., 1993). It has also been proposed to combine fMRI information with MEG data (Dale and Sereno, 1993; Ahlfors et al., 1999; Dale et al., 2000; Phillips et al., 2002). However, there are essential differences between fMRI and MEG due to their temporal resolution. The fMRI activity corresponds to an average over several thousands of MEG time series data points and may not correspond to MEG activity at some time points.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 576–585, 2008. © Springer-Verlag Berlin Heidelberg 2008

We have proposed a new hierarchical Bayesian method to combine several sources of information (Sato et al., 2004). In our method, the variance of the source current at each source location is considered an unknown parameter and estimated from the observed MEG data and prior information. The fMRI information can be imposed as prior information on the variance distribution rather than as the variance itself, so that it gives a soft constraint on the variance. Therefore, our method is capable of appropriately estimating the source current variance from the MEG data supplemented with the fMRI data, even if the fMRI data convey inaccurate information. Accordingly, our method is robust against inaccurate fMRI information. Because of the hierarchical prior, the estimation problem becomes nonlinear and cannot be solved analytically. Therefore, the approximate posterior distribution is calculated by using the variational Bayesian (VB) method (Attias, 1999; Sato, 2001). The resulting algorithm is an iterative procedure that converges quickly because the VB algorithm is a type of natural gradient method (Amari, 1998) that has an optimal local convergence property. The position and orientation of the cortical surface obtained from structural MRI can also be introduced as a hard constraint. In this article, we explain our hierarchical Bayesian method.
To evaluate the performance of the hierarchical Bayesian method, resolution curves were calculated by varying the numbers of model dipoles, simultaneously active dipoles, and MEG sensors. The results show the superiority of the hierarchical Bayesian method over conventional linear inverse methods. We also applied the hierarchical Bayesian method to visual experiments, in which subjects viewed a flickering stimulus in one of four quadrants of the visual field. The estimation results are consistent with known physiological findings and show the good spatial and temporal resolution of the hierarchical Bayesian method.

2 MEG Inverse Problem

When neural current activity occurs in the brain, it produces a magnetic field observed by MEG. The relationship between the magnetic field B = {B_m | m = 1 : M} measured by M sensors and the primary source current J = {J_n | n = 1 : N} in the brain is given by

  B = G · J,   (1)

where G = {G_{m,n} | m = 1 : M, n = 1 : N} is the lead field matrix. The lead field G_{m,n} represents the magnetic field B_m produced by the n-th unit dipole current. The above equation gives the forward model, and the inverse problem is to estimate the source current J from the observed magnetic field data B. The probabilistic model for the source currents can be constructed assuming Gaussian noise for the MEG sensors. Then, the probability distribution that the magnetic field B is observed for a given current J is given by

  P(B | J) ∝ exp( −(1/2) β (B − G · J)ᵀ · Σ_G · (B − G · J) ),   (2)

where (β Σ_G)⁻¹ denotes the covariance matrix of the sensor noise, Σ_G⁻¹ is the normalized covariance matrix satisfying Tr(Σ_G⁻¹) = M, and β⁻¹ is the average noise variance.

3 Hierarchical Bayesian Method

In the hierarchical Bayesian method, the variances of the currents are considered unknown parameters and estimated from the observed MEG data by introducing a hierarchical prior on the current variance. The fMRI information can be imposed as prior information on the variance distribution rather than as the variance itself, so that it gives a soft constraint on the variance. The spatial smoothness constraint, that neurons within a few millimeter radius tend to fire simultaneously due to neural interactions, can also be implemented as a hierarchical prior (Sato et al., 2004).

Hierarchical Prior. Let us suppose a time sequence of MEG data B_{1:T} ≡ {B(t) | t = 1 : T} is observed. The MEG inverse problem in this case is to estimate the primary source current J_{1:T} ≡ {J(t) | t = 1 : T} from the observed MEG data B_{1:T}. We assume a normal prior for the current:

  P_0(J_{1:T} | α) ∝ exp( −(1/2) Σ_{t=1}^{T} J(t)ᵀ · Σ_α · J(t) ),   (3)

where Σ_α is the diagonal matrix with diagonal elements α = {α_n | n = 1 : N}. We also assume that the current variance α⁻¹ does not change over the period T. The current inverse variance parameter α is estimated by introducing an ARD (Automatic Relevance Determination) hierarchical prior (Neal, 1996):

  P_0(α) = Π_{n=1}^{N} Γ(α_n | ᾱ_{0n}, γ_{0nα}),   (4)
  Γ(α | ᾱ, γ) ≡ α⁻¹ (αγ/ᾱ)^γ Γ(γ)⁻¹ e^{−αγ/ᾱ},

where Γ(α | ᾱ, γ) represents the Gamma distribution with mean ᾱ and degree of freedom γ, and Γ(γ) ≡ ∫_0^∞ dt t^{γ−1} e^{−t} is the Gamma function. When fMRI data is not available, we use a non-informative prior for the current inverse variance parameter α_n, i.e., γ_{0nα} = 0 and P_0(α_n) = α_n⁻¹. When fMRI data is available, the fMRI information is imposed as the prior for the inverse variance parameter α_n: the mean of the prior, ᾱ_{0n}, is assumed to be inversely proportional to the fMRI activity. The confidence parameter γ_{0nα} controls the reliability of the fMRI information.
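The generative side of this hierarchical model can be sketched as follows (a toy sketch with hypothetical sizes; the lead field is random rather than computed from head geometry, and Σ_G is taken as the identity). In NumPy's shape/scale parametrization, the Gamma distribution with mean ᾱ and degree of freedom γ has shape γ and scale ᾱ/γ:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, T = 30, 100, 50          # sensors, source locations, time points

# Lead field (hypothetical random values, standing in for head geometry)
G = rng.standard_normal((M, N))

# ARD hyperprior, Eq. (4): alpha_n ~ Gamma(mean=alpha_bar0, dof=gamma0)
alpha_bar0, gamma0 = 1.0, 2.0
alpha = rng.gamma(shape=gamma0, scale=alpha_bar0 / gamma0, size=N)

# Source currents, Eq. (3): J_n(t) ~ N(0, 1/alpha_n), i.i.d. over time
J = rng.standard_normal((N, T)) / np.sqrt(alpha)[:, None]

# Sensor data, Eqs. (1)-(2), with noise covariance (1/beta) * I
beta = 100.0
B = G @ J + rng.standard_normal((M, T)) / np.sqrt(beta)
```

Sources drawn with a large α_n have small variance and contribute little to B, which is exactly the mechanism the ARD prior exploits during inference.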
Variational Bayesian Method. The objective of Bayesian estimation is to calculate the posterior probability distribution of J for the observed data B (in the following, B_{1:T} and J_{1:T} are abbreviated as B and J, respectively, for notational simplicity):

  P(J | B) = ∫ dα P(J, α | B),
  P(J, α | B) = P(J, α, B) / P(B),
  P(J, α, B) = P(B | J) P_0(J | α) P_0(α),
  P(B) = ∫ dJ dα P(J, α, B).

The calculation of the marginal likelihood P(B) cannot be done analytically. In the VB method, the calculation of the joint posterior P(J, α | B) is reformulated as the maximization problem of the free energy. The free energy for a trial distribution Q(J, α) is defined by

  F(Q) = ∫ dJ dα Q(J, α) log( P(J, α, B) / Q(J, α) )
       = log P(B) − KL[ Q(J, α) || P(J, α | B) ].   (5)

Equation (5) implies that the maximization of the free energy F(Q) is equivalent to the minimization of the Kullback-Leibler distance (KL-distance) defined by

  KL[ Q(J, α) || P(J, α | B) ] ≡ ∫ dJ dα Q(J, α) log( Q(J, α) / P(J, α | B) ).

This measures the difference between the true joint posterior P(J, α | B) and the trial distribution Q(J, α). Since the KL-distance reaches its minimum at zero when the two distributions coincide, the joint posterior can be obtained by maximizing the free energy F(Q) with respect to the trial distribution Q. In addition, the maximum free energy gives the log-marginal likelihood log P(B). The optimization problem can be solved using a factorization approximation restricting the solution space (Attias, 1999; Sato, 2001):

  Q(J, α) = Q_J(J) Q_α(α).   (6)

Under the factorization assumption (6), the free energy can be written as

  F(Q) = ⟨log P(J, α, B)⟩_{J,α} − ⟨log Q_J(J)⟩_J − ⟨log Q_α(α)⟩_α
       = ⟨log P(B | J)⟩_J − KL[ Q_J(J) Q_α(α) || P_0(J | α) P_0(α) ],   (7)

where ⟨·⟩_J and ⟨·⟩_α represent the expectation values with respect to Q_J(J) and Q_α(α), respectively. The first term in the second equation of (7) corresponds to the negative of the expected reconstruction error. The second term (the KL-distance) measures the difference between the prior and the posterior and corresponds to the effective degrees of freedom that can be well specified from the observed data. Therefore, the negative of the free energy can be considered a regularized error function with a model complexity penalty term. The maximum free energy is obtained by alternately maximizing the free energy with respect to Q_J and Q_α. In the first step (J-step), the free energy F(Q) is maximized with respect to Q_J while Q_α is fixed. The solution is given by

  Q_J(J) ∝ exp( ⟨log P(J, α, B)⟩_α ).   (8)
In the second step (α-step), the free energy F(Q) is maximized with respect to Q_α while Q_J is fixed. The solution is given by

  Q_α(α) ∝ exp( ⟨log P(J, α, B)⟩_J ).   (9)

The above J- and α-steps are repeated until the free energy converges.

VB algorithm. The VB algorithm is summarized here. In the J-step, the inverse filter L(Σ_α⁻¹) is calculated using the estimated covariance matrix Σ_α⁻¹ from the previous iteration:

  L(Σ_α⁻¹) = Σ_α⁻¹ · Gᵀ · ( G · Σ_α⁻¹ · Gᵀ + Σ_G⁻¹ )⁻¹.   (10)

The expectation values of the current J and the noise variance β⁻¹ with respect to the posterior distribution are estimated using the inverse filter (10):

  J̄ = L(Σ_α⁻¹) · B,
  γ_β = (1/2) (M + N) T,
  γ_β β̄⁻¹ = (1/2) [ (B − G · J̄)ᵀ · Σ_G · (B − G · J̄) + J̄ᵀ · Σ_α · J̄ ].   (11)

In the α-step, the expectation values of the variance parameters α_n⁻¹ with respect to the posterior distribution are estimated as

  γ_nα ᾱ_n⁻¹ = γ_0nα ᾱ_{0n}⁻¹ + (β̄/2) Σ_{t=1}^{T} J̄_n(t)² + (T/2) ᾱ_n⁻¹ [ 1 − ( Σ_α⁻¹ · Gᵀ · Σ_B⁻¹ · G )_{n,n} ],   (12)

where Σ_B ≡ G · Σ_α⁻¹ · Gᵀ + Σ_G⁻¹ and γ_nα is given by γ_nα = γ_0nα + T/2.

4 Resolution Curve

We evaluated the performance of the hierarchical Bayesian method by calculating the resolution curve and compared it with the minimum norm (MN) method. The inverse filter L of the MN method can be obtained from Eq. (10) if the inverse variance parameter α_n is set to a given constant independent of the position n. Let us define the resolution matrix R by

  R = L · G.   (13)
Fig. 1. Resolution curves of the minimum norm method for different numbers of model dipoles (170, 262, 502, and 1242). The number of sensors is 251. The horizontal axis denotes the radius from the source position in m.

The (n, k) component of the resolution matrix, R_{n,k}, represents the n-th estimated current when a unit current dipole is applied at the k-th position without noise, i.e., J_k = 1 and J_l = 0 (l ≠ k). The resolution curve is defined as the averaged estimated current as a function of the distance r from the source position. It can be obtained by summing the estimated currents R_{n,k} at the n-th positions whose distance from the k-th position is in the range from r to r + dr, when a unit current dipole is applied at the k-th position. The averaged resolution curve is obtained by averaging the resolution curve over the k-th positions. If the estimation is perfect, the resolution curve at the origin, which is the estimation gain, should be one, and the resolution curve should be zero elsewhere. However, the estimation gain of a linear inverse method such as the MN method satisfies the constraint

  Σ_{n=1}^{N} G_n ≤ M,

where G_n denotes the estimation gain at the n-th position (Sato et al., 2004). This constraint implies that a linear inverse method cannot perfectly retrieve more current dipoles than the number of sensors M. To see the effect of this constraint, we calculated the resolution curve for the MN method by varying the number of model dipoles while the number of sensors M is fixed at 251 (Fig. 1). We assumed the model dipoles are placed evenly on a hemisphere. Fig. 1 shows that the MN method gives perfect estimation if the number of dipoles is less than M. On the other hand, the performance degrades as the number of dipoles increases beyond M. Although the above results were obtained using the MN method, similar results can be obtained for a class of linear inverse methods. This limitation is the main cause of the poor spatial resolution of the linear inverse methods. When several dipoles are simultaneously active, the estimated currents of a linear inverse method are the summation of the estimated currents for each dipole. Therefore, the resolution curve gives a complete description of the spatial resolution of the linear inverse methods.

Fig. 2. Resolution curves of the hierarchical Bayesian method with 4078/10442 model dipoles and 251/515 sensors. The number of active dipoles is 240 or 400. The horizontal axis denotes the radius from the source position in m.
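The resolution matrix and resolution curve for the MN filter can be computed as follows (a toy sketch: random lead field and random source positions rather than a hemisphere; Σ_G = I, so the constant-variance filter of Eq. (10) reduces to a Tikhonov form with `reg` playing the role of the constant noise-to-source variance ratio):

```python
import numpy as np

def mn_resolution_curve(G, positions, reg=1e-6, n_bins=20):
    """Resolution matrix R = L G, Eq. (13), and its distance-binned curve."""
    M, N = G.shape
    L = G.T @ np.linalg.inv(G @ G.T + reg * np.eye(M))   # MN inverse filter
    R = L @ G                                            # resolution matrix
    # pairwise distances between all source positions
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    edges = np.linspace(0.0, d.max() + 1e-12, n_bins + 1)
    which = np.digitize(d, edges) - 1                    # bin index of each (n, k) pair
    curve = np.zeros(n_bins)
    for b in range(n_bins):
        # sum R[n, k] over positions n in the distance bin, average over sources k
        curve[b] = np.where(which == b, R, 0.0).sum(axis=0).mean()
    return curve

# toy check: with fewer dipoles than sensors (N < M), the estimation gain
# (the bin containing r = 0) is close to one and the curve is near zero elsewhere
rng = np.random.default_rng(2)
M, N = 50, 30
positions = rng.random((N, 3))
G = rng.standard_normal((M, N))
curve = mn_resolution_curve(G, positions)
```

Rerunning the sketch with N larger than M reproduces the degradation discussed above: the gain at the origin drops below one once the dipoles outnumber the sensors.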
From a theoretical analysis (in preparation), the hierarchical Bayesian method can estimate dipole currents perfectly even when the number of model dipoles is larger than the number of sensors M. This is because the hierarchical Bayesian method effectively eliminates inactive dipoles from the estimation model by adjusting the estimation gains of these dipoles to zero. Nevertheless, the number of active dipoles constrains the performance of the hierarchical Bayesian method. The calculation of the resolution curves for the hierarchical Bayesian method is somewhat complicated, because the Bayesian inverse filters depend on the MEG data. To evaluate the performance in situations where multiple dipoles are active, we generated 240 or 400 active dipoles randomly on the hemisphere and calculated the corresponding MEG data where the 240 or 400 dipoles were simultaneously active. The Bayesian inverse filters were estimated using these simulated MEG data. Then, the resolution curves were calculated using the estimated Bayesian inverse filters for each active dipole, and they were averaged over all active dipoles. Fig. 2 shows the resolution curves for the hierarchical Bayesian method with 4078/10442 model dipoles and 251/515 MEG sensors. When the number of simultaneously active dipoles is less than the number of MEG sensors, almost perfect estimation is obtained regardless of the number of model dipoles. Therefore, the hierarchical Bayesian method can achieve much better spatial resolution than the conventional linear inverse methods. On the other hand, the performance is degraded if the number of simultaneously active dipoles is larger than the number of MEG sensors. The above results demonstrate the superiority of the hierarchical Bayesian method over the MN method.

5 Visual Experiments

We also applied the hierarchical Bayesian method to visual experiments, in which subjects viewed a flickering stimulus in one of four quadrants of the visual field. Red and green checkerboards of a pseudo-randomly selected quadrant were presented for 700 ms in one trial. fMRI experiments with the same quadrant stimuli were also done by adopting a conventional block design where the stimuli were presented for 15 seconds in a block. The global field power (sum of MEG signals of all sensors) recorded from subject RH induced by the upper right stimulus is shown in Fig. 3a. A strong peak was observed 93 ms after the stimulus onset. Cortical currents were estimated by applying the hierarchical Bayesian method to the averaged MEG data between 100 ms before and 400 ms after the stimulus onset. The fMRI activity t-values were used as a prior for the inverse

Fig. 3. Estimated current for the quadrant visual stimulus. (a) shows the global field power of the MEG signal. (b) shows the temporal patterns of averaged currents in V1, V2/3, and V4. (c-e) show spatial patterns of the current strength averaged over 20 ms time windows centered at 93 ms, 98 ms, and 134 ms.
variance parameters. As explained in the 'Hierarchical Prior' subsection, the mean of the prior was assumed to be ᾱ_{0n}⁻¹ = a_0 · t_f(n), where t_f(n) is the t-value at the n-th position and a_0 is a hyperparameter, set to 500 in this analysis. The estimated spatiotemporal brain activities are illustrated in Fig. 3. We identified 3 ROIs (Regions Of Interest) in V1, V2/3, and V4, and temporal patterns of the estimated currents were obtained by averaging the current within these ROIs. Fig. 3b shows that V1, V2/3, and V4 are successively activated and attain their peaks around 93 ms, 98 ms, and 134 ms, respectively. Figs. 3c-3e illustrate the spatial pattern of the current strength averaged over 20 ms time windows (centered at 93 ms, 98 ms, and 134 ms) in a flattened map format. The flattened map was made by cutting along the bottom of the calcarine sulcus. We can see strongly active regions in V1, V2/3, and V4 corresponding to their peak activities. The above results are consistent with known physiological findings and show the good spatial and temporal resolution of the hierarchical Bayesian method.

6 Conclusion

In this article, we have explained the hierarchical Bayesian method, which combines MEG and fMRI by using a hierarchical prior. We have shown the superiority of the hierarchical Bayesian method over conventional linear inverse methods by evaluating the resolution curve. We also applied the hierarchical Bayesian method to visual experiments, in which subjects viewed a flickering stimulus in one of four quadrants of the visual field. The estimation results are consistent with known physiological findings and show the good spatial and temporal resolution of the hierarchical Bayesian method. Currently, we are applying the hierarchical Bayesian method to brain-machine interfaces using noninvasive neuroimaging. In our approach, we first estimate the current activity in the brain. Then, the intention or the motion of the subject is estimated by using the current activity. This approach enables us to use physiological knowledge and gives us more insight into the mechanisms of human information processing.

Acknowledgement. This research was supported in part by NICT-KARC.

References

Ahlfors, S.P., Simpson, G.V., Dale, A.M., Belliveau, J.W., Liu, A.K., Korvenoja, A., Virtanen, J., Huotilainen, M., Tootell, R.B.H., Aronen, H.J., Ilmoniemi, R.J.: Spatiotemporal activity of a cortical network for processing visual motion revealed by MEG and fMRI. J. Neurophysiol. 82, 2545–2555 (1999)
Amari, S.: Natural gradient works efficiently in learning. Neural Computation 10, 251–276 (1998)
Attias, H.: Inferring parameters and structure of latent variable models by variational Bayes. In: Proc. 15th Conference on Uncertainty in Artificial Intelligence, pp. 21–30 (1999)
Bandettini, P.A.: The temporal resolution of functional MRI. In: Moonen, C.T.W., Bandettini, P.A. (eds.) Functional MRI, pp. 205–220. Springer, Heidelberg (2000)
Dale, A.M., Liu, A.K., Fischl, B.R., Buckner, R.L., Belliveau, J.W., Lewine, J.D., Halgren, E.: Dynamic statistical parametric mapping: Combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron 26, 55–67 (2000)
Dale, A.M., Sereno, M.I.: Improved localization of cortical activity by combining EEG and MEG with MRI cortical surface reconstruction: A linear approach. J. Cognit. Neurosci. 5, 162–176 (1993)
Hamalainen, M.S., Hari, R., Ilmoniemi, R.J., Knuutila, J., Lounasmaa, O.V.: Magnetoencephalography – theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev. Modern Phys. 65, 413–497 (1993)
Hari, R.: On brain's magnetic responses to sensory stimuli. J. Clinic. Neurophysiol. 8, 157–169 (1991)
Mosher, J.C., Lewis, P.S., Leahy, R.M.: Multiple dipole modelling and localization from spatio-temporal MEG data. IEEE Trans. Biomed. Eng. 39, 541–557 (1992)
Neal, R.M.: Bayesian Learning for Neural Networks. Springer, Heidelberg (1996)
Ogawa, S., Lee, T.-M., Kay, A.R., Tank, D.W.: Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc. Natl. Acad. Sci. USA 87, 9868–9872 (1990)
Phillips, C., Rugg, M.D., Friston, K.J.: Anatomically informed basis functions for EEG source localization: Combining functional and anatomical constraints. NeuroImage 16, 678–695 (2002)
Sato, M.: On-line model selection based on the variational Bayes. Neural Computation 13, 1649–1681 (2001)
Sato, M., Yoshioka, T., Kajihara, S., Toyama, K., Goda, N., Doya, K., Kawato, M.: Hierarchical Bayesian estimation for MEG inverse problem. NeuroImage 23, 806–826 (2004)
Neural Decoding of Movements: From Linear to Nonlinear Trajectory Models

Byron M. Yu 1,2, John P. Cunningham 1, Krishna V. Shenoy 1, and Maneesh Sahani 2

1 Dept. of Electrical Engineering and Neurosciences Program, Stanford University, Stanford, CA, USA
2 Gatsby Computational Neuroscience Unit, UCL, London, UK
{byronyu,jcunnin,shenoy}@stanford.edu, maneesh@gatsby.ucl.ac.uk

Abstract. To date, the neural decoding of time-evolving physical state – for example, the path of a foraging rat or arm movements – has been largely carried out using linear trajectory models, primarily due to their computational efficiency. The possibility of better capturing the statistics of the movements using nonlinear trajectory models, thereby yielding more accurate decoded trajectories, is enticing. However, nonlinear decoding usually carries a higher computational cost, which is an important consideration in real-time settings. In this paper, we present techniques for nonlinear decoding employing modal Gaussian approximations, expectation propagation, and Gaussian quadrature. We compare their decoding accuracy versus computation time tradeoffs based on high-dimensional simulated neural spike counts.

Keywords: Nonlinear dynamical models, nonlinear state estimation, neural decoding, neural prosthetics, expectation-propagation, Gaussian quadrature.

1 Introduction

We consider the problem of decoding time-evolving physical state from neural spike trains. Examples include decoding the path of a foraging rat from hippocampal neurons [1,2] and decoding the arm trajectory from motor cortical neurons [3,4,5,6,7,8]. Advances in this area have enabled the development of neural prosthetic devices, which seek to allow disabled patients to regain motor function through the use of prosthetic limbs, or computer cursors, that are controlled by neural activity [9,10,11,12,13,14,15]. Several of these prosthetic decoders, including population vectors [11] and linear filters [10,12,15], linearly map the observed neural activity to the estimate of physical state.
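A direct linear mapping of this kind can be fit by regularized least squares (an illustrative sketch, not the specific decoders of the cited work; the Poisson tuning model and all names are hypothetical):

```python
import numpy as np

def fit_linear_filter(Y, X, ridge=1e-3):
    """Least-squares linear map from spike counts to state (sketch).

    Y: (q, T) spike counts, X: (d, T) states; returns W with X ~ W @ Y.
    """
    q = Y.shape[0]
    return X @ Y.T @ np.linalg.inv(Y @ Y.T + ridge * np.eye(q))

# toy usage: 30 units with hypothetical log-linear tuning to a 2-D state
rng = np.random.default_rng(4)
d, n_units, T = 2, 30, 500
X = rng.standard_normal((d, T))                         # training states
rates = np.exp(0.5 * rng.standard_normal((n_units, d)) @ X)
Y = rng.poisson(rates)                                  # spike counts
W = fit_linear_filter(Y, X)
X_hat = W @ Y                                           # decoded states
```

The appeal is speed: decoding each time step is a single matrix-vector product, with no probabilistic model of the trajectory at all.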
Although these direct linear mappings are effective, recursive Bayesian decoders have been shown to provide more accurate trajectory estimates [1,6,7,16]. In addition, recursive Bayesian decoders provide confidence regions on the trajectory estimates and allow for nonlinear relationships between

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 586–595, 2008. © Springer-Verlag Berlin Heidelberg 2008
the neural activity and the physical state variables. Recursive Bayesian decoders are based on the specification of a probabilistic model comprising 1) a trajectory model, which describes how the physical state variables change from one time step to the next, and 2) an observation model, which describes how the observed neural activity relates to the time-evolving physical state.

The function of the trajectory model is to build into the decoder prior knowledge about the form of the trajectories. In the case of decoding arm movements, the trajectory model may reflect 1) the hard, physical constraints of the limb (for example, the elbow cannot bend backward), 2) the soft, control constraints imposed by neural mechanisms (for example, the arm is more likely to move smoothly than in a jerky motion), and 3) the physical surroundings of the person and his/her objectives in that environment. The degree to which the trajectory model captures the statistics of the actual movements directly affects the accuracy with which trajectories can be decoded from neural data [8].

The most commonly used trajectory models assume linear dynamics perturbed by Gaussian noise, which we refer to collectively as linear-Gaussian models. The family of linear-Gaussian models includes the random walk model [1,2,6], those with a constant [8] or time-varying [17,18] forcing term, those without a forcing term [7,16], those with a time-varying state transition matrix [19], and those with higher-order Markov dependencies [20]. Linear-Gaussian models have been successfully applied to decoding the path of a foraging rat [1,2], as well as arm trajectories in ellipse-tracing [6], pursuit-tracking [7,20,16], "pinball" [7,16], and center-out reach [8] tasks. Linear-Gaussian models are widely used primarily due to their computational efficiency, which is an important consideration for real-time decoding applications.
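Pairing the simplest member of this family, the random-walk trajectory model, with a linear-Gaussian observation model yields the standard Kalman-filter decoder. A minimal sketch (hypothetical sizes; real decoders fit C, R, and Q from training data, and real spike counts are non-negative integers, so the Gaussian observation model is itself an approximation):

```python
import numpy as np

def kalman_decode(Y, C, R, Q, x0, P0):
    """Causal decode under x_t = x_{t-1} + w_t, w_t ~ N(0, Q),
    with observations y_t = C x_t + q_t, q_t ~ N(0, R)."""
    d = x0.shape[0]
    X = []
    x, P = x0, P0
    for y in Y.T:                        # one time step per column of Y
        P = P + Q                        # predict (state matrix A = I)
        S = C @ P @ C.T + R              # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)   # Kalman gain
        x = x + K @ (y - C @ x)          # update mean
        P = (np.eye(d) - K @ C) @ P      # update covariance
        X.append(x)
    return np.array(X).T                 # (d, T) decoded trajectory

# toy usage: 2-D position decoded from 40 linearly tuned units
rng = np.random.default_rng(3)
d, n_units, T = 2, 40, 100
C = rng.standard_normal((n_units, d))
x_true = np.cumsum(0.1 * rng.standard_normal((d, T)), axis=1)
Y = C @ x_true + rng.standard_normal((n_units, T))
X_hat = kalman_decode(Y, C, np.eye(n_units), 0.01 * np.eye(d),
                      np.zeros(d), np.eye(d))
```

Each step costs one inverse of an n_units × n_units matrix (or less with standard tricks), which is the computational efficiency the text refers to.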
However, for particular types of movements, the family of linear-Gaussian models may be too restrictive and unable to capture salient properties of the observed movements [8]. We recently proposed a general approach to constructing trajectory models that can exhibit rather complex dynamical behaviors and whose decoder can be implemented to have the same running time (using a parallel implementation) as simpler trajectory models [8]. In particular, we demonstrated that a probabilistic mixture of linear-Gaussian trajectory models, each accurate within a limited regime of movement, can capture the salient properties of goal-directed reaches to multiple targets. This mixture model, which yielded more accurate decoded trajectories than a single linear-Gaussian model, can be viewed as a discrete approximation to a single, unified trajectory model with nonlinear dynamics.

An alternate approach is to decode using this single, unified nonlinear trajectory model without discretization. This makes the decoding problem more difficult since nonlinear transformations of parametric distributions are typically no longer easily parametrized. State estimation in nonlinear dynamical systems is a field of active research that has made substantial progress in recent years, including the application of numerical quadrature techniques to dynamical systems [21,22,23], the development of expectation-propagation (EP) [24] and its application to dynamical systems [25,26,27,28], and the improvement in the
computational efficiency of Monte Carlo techniques (e.g., [29,30,31]). However, these techniques have not been rigorously tested and compared in the context of neural decoding, which typically involves observations that are high-dimensional vectors of non-negative integers. In particular, the tradeoff between decoding accuracy and computational cost among different neural decoding algorithms has not been studied in detail. Knowing the accuracy-computational cost tradeoff is important for real-time applications, where one may need to select the most accurate algorithm given a computational budget or the least computationally intensive algorithm given a minimal acceptable decoding accuracy. This paper takes a step in this direction by comparing three particular deterministic Gaussian approximations. In Section 2, we first introduce the nonlinear dynamical model for neural spike counts and the decoding problem. Sections 3 and 4 detail the three deterministic Gaussian approximations that we focus on in this report: global Laplace, Gaussian quadrature-EP (GQ-EP), and Laplace propagation (LP). Finally, in Section 5, we compare the decoding accuracy versus computational cost of these three techniques.

2 Nonlinear Dynamical Model and Neural Decoding

In this report, we consider nonlinear dynamical models for neural spike counts of the following form:

    x_t | x_{t-1} ~ N(f(x_{t-1}), Q)            (1a)
    y_t^i | x_t ~ Poisson(λ_i(x_t) · Δ),        (1b)

where x_t ∈ R^{p×1} is a vector containing the physical state variables at time t = 1, ..., T, y_t^i ∈ {0, 1, 2, ...} is the corresponding observed spike count for neuron i = 1, ..., q taken in a time bin of width Δ, and Q ∈ R^{p×p} is a covariance matrix. The functions f : R^{p×1} → R^{p×1} and λ_i : R^{p×1} → R_+ are, in general, nonlinear. The initial state x_1 is Gaussian-distributed. For notational compactness, the spike counts for all q simultaneously-recorded neurons are assembled into a q × 1 vector y_t, whose ith element is y_t^i. Note that the elements of y_t are integer-valued and that, typically, q ≫ p. Equations (1a) and (1b) are referred to as the trajectory and observation models, respectively.

The task of neural decoding involves finding, at each timepoint t, the likely physical states x_t given the neural activity observed up to that time {y}_1^t. In other words, we seek to compute the filtered state posterior P(x_t | {y}_1^t) at each t. We previously showed how to estimate the filtered state posterior when f is a linear function [8]. Here, we consider how to compute P(x_t | {y}_1^t) when f is nonlinear.

The extended Kalman filter (EKF) is a commonly-used technique for nonlinear state estimation. Unfortunately, it cannot be directly applied to the current problem because the observation noise in (1b) is not additive Gaussian. Possible alternatives are the unscented Kalman filter (UKF) [21,22] and the closely-related quadrature Kalman filter (QKF) [23], both of which employ quadrature
techniques to approximate Gaussian integrals that are analytically intractable. While the UKF has been shown to outperform the EKF [21,22], the UKF requires making Gaussian approximations in the observation space. This property of the UKF is undesirable from the standpoint of the current problem because the observed spike counts are typically 0 or 1 (due to the use of relatively short binwidths Δ) and, therefore, distinctly non-Gaussian. As a result, the UKF yielded substantially lower decoding accuracy than the techniques presented in Sections 3 and 4 [28], which make Gaussian approximations only in the state space. While we have not yet tested the QKF, the number of quadrature points required grows geometrically with p + q, which quickly becomes impractical even for moderate values of p and q. Thus, we will no longer consider the UKF and QKF in the remainder of this paper.

The decoding techniques described in Sections 3 and 4 naturally yield the smoothed state posterior P(x_t | {y}_1^T), rather than the filtered state posterior P(x_t | {y}_1^t). Thus, we will focus on the smoothed state posterior in this work. However, the filtered state posterior at time t can be easily obtained by smoothing using only observations from timepoints 1, ..., t.

3 Global Laplace

The idea is to estimate the joint state posterior across the entire sequence (i.e., the global state posterior) as a Gaussian matched to the location and curvature of a mode of P({x}_1^T | {y}_1^T), as in Laplace's method [32]. The mode is defined as

    {x*}_1^T = argmax_{{x}_1^T} P({x}_1^T | {y}_1^T) = argmax_{{x}_1^T} L({x}_1^T),    (2)

where

    L({x}_1^T) = log P({x}_1^T, {y}_1^T)
               = log P(x_1) + Σ_{t=2}^T log P(x_t | x_{t-1}) + Σ_{t=1}^T Σ_{i=1}^q log P(y_t^i | x_t).    (3)

Using the known distributions (1), the gradients of L({x}_1^T) can be computed exactly and a local mode {x*}_1^T can be found by applying a gradient optimization technique. The global state posterior is then approximated as:

    P({x}_1^T | {y}_1^T) ≈ N({x*}_1^T, [−∇² L({x*}_1^T)]^{−1}).    (4)

4 Expectation Propagation

We briefly summarize here the application of EP [24] to dynamical models [25,26,27,28]. More details can be found in the cited references. The two primary distributions of interest here are the marginal P(x_t | {y}_1^T) and pairwise joint P(x_{t-1}, x_t | {y}_1^T) state posteriors. These distributions can be expressed in terms of forward α_t and backward β_t messages as follows:

    P(x_t | {y}_1^T) = α_t(x_t) β_t(x_t) / P({y}_1^T)    (5)

    P(x_{t-1}, x_t | {y}_1^T) = α_{t-1}(x_{t-1}) P(x_t | x_{t-1}) P(y_t | x_t) β_t(x_t) / P({y}_1^T),    (6)

where α_t(x_t) = P(x_t, {y}_1^t) and β_t(x_t) = P({y}_{t+1}^T | x_t). The messages α_t and β_t are typically approximated by an exponential family density; in this paper, we use an unnormalized Gaussian. These approximate messages are iteratively updated by matching the expected sufficient statistics¹ of the marginal posterior (5) with those of the pairwise joint posterior (6). The updates are usually performed sequentially via multiple forward-backward passes. During the forward pass, the α_t are updated while the β_t remain fixed:

    P(x_t | {y}_1^T) = ∫ α_{t-1}(x_{t-1}) P(x_t | x_{t-1}) P(y_t | x_t) β_t(x_t) / P({y}_1^T) dx_{t-1}    (7)
                     ≈ ∫ P̂(x_{t-1}, x_t) dx_{t-1}    (8)

    α_t(x_t) ∝ [∫ P̂(x_{t-1}, x_t) dx_{t-1}] / β_t(x_t),    (9)

where P̂(x_{t-1}, x_t) is an exponential family distribution whose expected sufficient statistics are matched to those of P(x_{t-1}, x_t | {y}_1^T). In this paper, P̂(x_{t-1}, x_t) is assumed to be Gaussian. The backward pass proceeds similarly, where the β_t are updated while the α_t remain fixed. The decoded trajectory is obtained by combining the messages α_t and β_t, as shown in (5), after completing the forward-backward passes. In Section 5, we investigate the accuracy-computational cost tradeoff of using different numbers of forward-backward iterations.

Although the expected sufficient statistics (or moments) of P(x_{t-1}, x_t | {y}_1^T) cannot typically be computed analytically for the nonlinear dynamical model (1), they can be approximated using Gaussian quadrature [26,28]. This EP-based decoder is referred to as GQ-EP. By applying the ideas of Laplace propagation (LP) [33], a closely-related decoder has been developed that uses a modal Gaussian approximation of P(x_{t-1}, x_t | {y}_1^T) rather than matching moments [27,28]. This technique, which uses the same message-passing scheme as GQ-EP, is referred to here as LP.

In practice, it is possible to encounter invalid message updates. For example, if the variance of x_t in the numerator is larger than that in the denominator in (9) due to approximation error in the choice of P̂, the update rule would assign α_t(x_t) a negative variance. A way around this problem is to simply skip that message update and hope that the update is no longer invalid during the next forward-backward iteration [34]. An alternative is to set β_t(x_t) = 1 in (7) and (9), which guarantees a valid update for α_t(x_t). This is referred to as the one-sided update and its implications for decoding accuracy and computation time are considered in Section 5.

¹ If the approximating distributions are assumed to be Gaussian, this is equivalent to matching the first two moments.

5 Results
We evaluated decoding accuracy versus computational cost of the techniques described in Sections 3 and 4. These performance comparisons were based on the model (1), where

    f(x) = (1 − k) · x + k · W · erf(x)        (10)
    λ_i(x) = log(1 + exp(c_i^T x + d_i))        (11)

with parameters W ∈ R^{p×p}, c_i ∈ R^{p×1}, and d_i ∈ R. The error function (erf) in (10) acts element-by-element on its argument. We have chosen the dynamics (10) of a fully-connected recurrent network due to its nonlinear nature; we make no claims in this work about its suitability for particular decoding applications, such as for rat paths or arm trajectories. Because recurrent networks are often used to directly model neural activity, it is important to emphasize that x is a vector of physical state variables to be decoded, not a vector of neural activity.

We generated 50 state trajectories, each with 50 time points, and corresponding spike counts from the model (1), where the model parameters were randomly chosen within a range that provided biologically realistic spike counts (typically, 0 or 1 spike in each bin). The time constant k ∈ R was set to 0.1. To understand how these algorithms scale with different numbers of physical state variables and observed neurons, we considered all pairings (p, q), where p ∈ {3, 10} and q ∈ {20, 100, 500}. For each pairing, we repeated the above procedure three times.

For the global Laplace decoder, the modal trajectory was found using Polack-Ribière conjugate gradients with quadratic/cubic line searches and Wolfe-Powell stopping criteria (minimize.m by Carl Rasmussen, available at http://www.kyb.tuebingen.mpg.de/bs/people/carl/code/minimize/). To stabilize GQ-EP, we used a modal Gaussian proposal distribution and the custom precision 3 quadrature rule with non-negative quadrature weights, as described in [28]. For both GQ-EP and LP, minimize.m was used to find a mode of P(x_{t-1}, x_t | {y}_1^T).

Fig. 1 illustrates the decoding accuracy versus computation time of the presented techniques. Decoding accuracy was measured by evaluating the marginal state posteriors P(x_t | {y}_1^T) at the actual trajectory. The higher the log probability, the more accurate the decoder.
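The generative model used in these experiments can be sketched as follows. This is a minimal illustration of (1) with the dynamics (10) and rates (11); the parameter scales (W, C, d, Q, Δ) are assumptions chosen here only to yield mostly 0-or-1 spike counts, not the paper's actual sampled ranges:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
_erf = np.vectorize(math.erf)  # elementwise error function (stdlib math.erf)

p, q, T = 3, 20, 50        # state dim, neurons, timepoints (smallest setting in the text)
k, delta = 0.1, 0.01       # k from the text; bin width delta is an assumption

W = 0.5 * rng.standard_normal((p, p))   # recurrent weights (assumed scale)
C = 0.3 * rng.standard_normal((q, p))   # row i is c_i^T (assumed scale)
d = rng.uniform(2.0, 4.0, size=q)       # offsets (assumed scale)
Q = 0.01 * np.eye(p)                    # trajectory noise covariance (assumed)

def f(x):
    # Eq. (10): fully-connected recurrent network dynamics
    return (1 - k) * x + k * (W @ _erf(x))

def rates(x):
    # Eq. (11): soft-rectified firing rate for each neuron
    return np.log1p(np.exp(C @ x + d))

x = np.zeros((T, p))
y = np.zeros((T, q), dtype=int)
x[0] = rng.standard_normal(p)           # Gaussian-distributed initial state
for t in range(T):
    if t > 0:
        x[t] = rng.multivariate_normal(f(x[t - 1]), Q)   # Eq. (1a)
    y[t] = rng.poisson(rates(x[t]) * delta)              # Eq. (1b)

print(x.shape, y.shape)  # (50, 3) (50, 20)
```

With these assumed scales, λ_i(x)·Δ is a few hundredths, so most bins contain 0 spikes and occasionally 1, matching the "biologically realistic" regime described above.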
Each panel corresponds to a different number of state variables and observed neurons. For GQ-EP (dotted line) and LP (solid line), we varied the number of forward-backward iterations between one and three; thus, there are three circles for each of these decoders. Across all panels, global Laplace required the least computation time and yielded state estimates as accurate as, or more accurate than, the other techniques. This is the key result of this report.

Fig. 1. Decoding accuracy versus computation time of global Laplace (no line), GQ-EP (dotted line), and LP (solid line). (a) p = 3, q = 20; (b) p = 3, q = 100; (c) p = 3, q = 500; (d) p = 10, q = 20; (e) p = 10, q = 100; (f) p = 10, q = 500. The circles and bars represent mean ± SEM. Variability in computation time is not represented on the plots because it was negligible. The computation times were obtained using a 2.2-GHz AMD Athlon 64 processor with 2 GB RAM running MATLAB R14. Note that the scale of the vertical axes (log probability) is not the same in each panel and that some error bars are too small to be seen.

We also implemented a basic particle smoother [35], where the number of particles (500 to 1500) was chosen such that its computation time was on the same order as those shown in Fig. 1 (results not shown). Although this particle smoother yielded substantially lower decoding accuracy than global Laplace, GQ-EP, and LP, the three deterministic techniques should be compared to more recently-developed Monte Carlo techniques, as described in Section 6.

Fig. 1 shows that all three techniques have computation times that scale well with the number of state variables p and neurons q. In particular, the required computation time typically scales sub-linearly with increases in p and far sub-linearly with increases in q. As q increases, the accuracies of the techniques become more similar (note that different panels have different vertical scales), and there is less advantage to performing multiple forward-backward iterations for GQ-EP and LP. The decoding accuracy and required computation time both typically increase with the number of iterations. In a few cases (e.g., GQ-EP in Fig. 1(b)), it is possible for the accuracy to decrease slightly when going from two to three iterations, presumably due to one-sided updates.
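The Laplace idea underlying the fastest of these decoders can be illustrated on a toy one-dimensional posterior. This stands in for the full trajectory posterior of Section 3; the prior and likelihood below are arbitrary illustrative choices (a standard-normal prior and a single Poisson observation with rate e^x), not the paper's model:

```python
import numpy as np

y = 2  # a single observed count (illustrative)

def log_post(x):
    # log N(x; 0, 1) + log Poisson(y; e^x), up to additive constants,
    # mimicking the sum-of-log-terms structure of Eq. (3) at one timepoint
    return -0.5 * x**2 + y * x - np.exp(x)

def grad(x):
    return -x + y - np.exp(x)

def hess(x):
    return -1.0 - np.exp(x)   # strictly negative: log posterior is concave

# Newton ascent to the mode; Laplace's method matches mode and curvature
x = 0.0
for _ in range(50):
    x -= grad(x) / hess(x)

mode = x
var = -1.0 / hess(mode)       # covariance = inverse negative Hessian, as in Eq. (4)
assert log_post(mode) >= max(log_post(mode - 0.1), log_post(mode + 0.1))
print(mode, var)
```

In the full decoder the same two steps apply, with the scalar x replaced by the stacked trajectory {x}_1^T and the scalar second derivative replaced by the (block-tridiagonal) Hessian of L.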
In theory, GQ-EP should require greater computation time than LP because it needs to perform the same modal Gaussian approximation, then use it as a proposal distribution for Gaussian quadrature. In practice, it is possible for LP
to be slower if it needs many one-sided updates (cf. Fig. 1(d)), since one-sided updates are used only when the usual update (9) fails. Furthermore, LP required greater computation time in Fig. 1(d) than in Fig. 1(e) due to the need for many more one-sided updates, despite having five times fewer neurons.

It was previously shown that {x*}_1^T is a local optimum of P({x}_1^T | {y}_1^T) (i.e., a solution of global Laplace) if and only if it is a fixed point of LP [33]. Because the modal Gaussian approximation matches local curvature up to second order, it can also be shown that the estimated covariances using global Laplace and LP are equal at {x*}_1^T [33]. Empirically, we found both statements to be true if few one-sided updates were required for LP. Due to these connections between global Laplace and LP, the accuracy of LP after three forward-backward iterations was similar to that of global Laplace in all panels in Fig. 1. Although LP may have computational savings compared to global Laplace in certain applications [33], we found that global Laplace was substantially faster for the particular graph structure described by (1).

6 Conclusion

We have presented three deterministic techniques for nonlinear state estimation (global Laplace, GQ-EP, LP) and compared their decoding accuracy versus computational cost in the context of neural decoding, involving high-dimensional observations of non-negative integers. This work can be extended in the following directions. First, the deterministic techniques presented here should be compared to recently-developed Monte Carlo techniques that have yielded increased accuracy and/or reduced computational cost compared to the basic particle filter/smoother in applications other than neural decoding [29]. Examples include the Gaussian particle filter [31], sigma-point particle filter [30], and embedded hidden Markov model [36]. Second, we have compared these decoders based on one particular nonlinear trajectory model (10). Other nonlinear trajectory models (e.g., a model describing primate arm movements [37]) should be tested to see whether the decoders have similar accuracy-computational cost tradeoffs as shown here.

Acknowledgments.
This work was supported by NIH-NINDS-CRCNS-R01, NDSEG Fellowship, NSF Graduate Research Fellowship, Gatsby Charitable Foundation, Michael Flynn Stanford Graduate Fellowship, Christopher Reeve Paralysis Foundation, Burroughs Wellcome Fund Career Award in the Biomedical Sciences, Stanford Center for Integrated Systems, NSF Center for Neuromorphic Systems Engineering at Caltech, Office of Naval Research, Sloan Foundation and Whitaker Foundation.

References

1. Brown, E.N., Frank, L.M., Tang, D., Quirk, M.C., Wilson, M.A.: A statistical paradigm for neural spike train decoding applied to position prediction from the ensemble firing patterns of rat hippocampal place cells. J. Neurosci. 18(18), 7411–7425 (1998)
2. Zhang, K., Ginzburg, I., McNaughton, B.L., Sejnowski, T.J.: Interpreting neuronal population activity by reconstruction: Unified framework with application to hippocampal place cells. J. Neurophysiol. 79, 1017–1044 (1998)
3. Wessberg, J., Stambaugh, C.R., Kralik, J.D., Beck, P.D., Laubach, M., Chapin, J.K., Kim, J., Biggs, J., Srinivasan, M.A., Nicolelis, M.A.L.: Real-time prediction of hand trajectory by ensembles of cortical neurons in primates. Nature 408(6810), 361–365 (2000)
4. Schwartz, A.B., Taylor, D.M., Tillery, S.I.H.: Extraction algorithms for cortical control of arm prosthetics. Curr. Opin. Neurobiol. 11, 701–707 (2001)
5. Serruya, M., Hatsopoulos, N., Fellows, M., Paninski, L., Donoghue, J.: Robustness of neuroprosthetic decoding algorithms. Biol. Cybern. 88(3), 219–228 (2003)
6. Brockwell, A.E., Rojas, A.L., Kass, R.E.: Recursive Bayesian decoding of motor cortical signals by particle filtering. J. Neurophysiol. 91(4), 1899–1907 (2004)
7. Wu, W., Black, M.J., Mumford, D., Gao, Y., Bienenstock, E., Donoghue, J.P.: Modeling and decoding motor cortical activity using a switching Kalman filter. IEEE Trans. Biomed. Eng. 51(6), 933–942 (2004)
8. Yu, B.M., Kemere, C., Santhanam, G., Afshar, A., Ryu, S.I., Meng, T.H., Sahani, M., Shenoy, K.V.: Mixture of trajectory models for neural decoding of goal-directed movements. J. Neurophysiol. 97, 3763–3780 (2007)
9. Chapin, J.K., Moxon, K.A., Markowitz, R.S., Nicolelis, M.A.L.: Real-time control of a robot arm using simultaneously recorded neurons in the motor cortex. Nat. Neurosci. 2, 664–670 (1999)
10. Serruya, M.D., Hatsopoulos, N.G., Paninski, L., Fellows, M.R., Donoghue, J.P.: Instant neural control of a movement signal. Nature 416, 141–142 (2002)
11. Taylor, D.M., Tillery, S.I.H., Schwartz, A.B.: Direct cortical control of 3D neuroprosthetic devices. Science 296, 1829–1832 (2002)
12. Carmena, J.M., Lebedev, M.A., Crist, R.E., O'Doherty, J.E., Santucci, D.M., Dimitrov, D.F., Patil, P.G., Henriquez, C.S., Nicolelis, M.A.L.: Learning to control a brain-machine interface for reaching and grasping by primates. PLoS Biology 1(2), 193–208 (2003)
13. Musallam, S., Corneil, B.D., Greger, B., Scherberger, H., Andersen, R.A.: Cognitive control signals for neural prosthetics. Science 305, 258–262 (2004)
14. Santhanam, G., Ryu, S.I., Yu, B.M., Afshar, A., Shenoy, K.V.: A high-performance brain-computer interface. Nature 442, 195–198 (2006)
15. Hochberg, L.R., Serruya, M.D., Friehs, G.M., Mukand, J.A., Saleh, M., Caplan, A.H., Branner, A., Chen, D., Penn, R.D., Donoghue, J.P.: Neuronal ensemble control of prosthetic devices by a human with tetraplegia. Nature 442, 164–171 (2006)
16. Wu, W., Gao, Y., Bienenstock, E., Donoghue, J.P., Black, M.J.: Bayesian population decoding of motor cortical activity using a Kalman filter. Neural Comput. 18(1), 80–118 (2006)
17. Kemere, C., Meng, T.: Optimal estimation of feed-forward-controlled linear systems. In: Proc. IEEE ICASSP, pp. 353–356 (2005)
18. Srinivasan, L., Eden, U.T., Willsky, A.S., Brown, E.N.: A state-space analysis for reconstruction of goal-directed movements using neural signals. Neural Comput. 18(10), 2465–2494 (2006)
19. Srinivasan, L., Brown, E.N.: A state-space framework for movement control to dynamic goals through brain-driven interfaces. IEEE Trans. Biomed. Eng. 54(3), 526–535 (2007)
20. Shoham, S., Paninski, L.M., Fellows, M.R., Hatsopoulos, N.G., Donoghue, J.P., Normann, R.A.: Statistical encoding model for a primary motor cortical brain-machine interface. IEEE Trans. Biomed. Eng. 52(7), 1313–1322 (2005)
21. Wan, E., van der Merwe, R.: The unscented Kalman filter. In: Haykin, S. (ed.) Kalman Filtering and Neural Networks. Wiley, Chichester (2001)
22. Julier, S., Uhlmann, J.: Unscented filtering and nonlinear estimation. Proceedings of the IEEE 92(3), 401–422 (2004)
23. Arasaratnam, I., Haykin, S., Elliott, R.: Discrete-time nonlinear filtering algorithms using Gauss-Hermite quadrature. Proceedings of the IEEE 95(5), 953–977 (2007)
24. Minka, T.: Expectation propagation for approximate Bayesian inference. In: Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 362–369 (2001)
25. Heskes, T., Zoeter, O.: Expectation propagation for approximate inference in dynamic Bayesian networks. In: Darwiche, A., Friedman, N. (eds.) Proceedings UAI-2002, pp. 216–223 (2002)
26. Zoeter, O., Ypma, A., Heskes, T.: Improved unscented Kalman smoothing for stock volatility estimation. In: Barros, A., Principe, J., Larsen, J., Adali, T., Douglas, S. (eds.) Proceedings of the IEEE Workshop on Machine Learning for Signal Processing (2004)
27. Ypma, A., Heskes, T.: Novel approximations for inference in nonlinear dynamical systems using expectation propagation. Neurocomputing 69, 85–99 (2005)
28. Yu, B.M., Shenoy, K.V., Sahani, M.: Expectation propagation for inference in non-linear dynamical models with Poisson observations. In: Proc. IEEE Nonlinear Statistical Signal Processing Workshop (2006)
29. Doucet, A., de Freitas, N., Gordon, N. (eds.): Sequential Monte Carlo Methods in Practice. Springer, Heidelberg (2001)
30. van der Merwe, R., Wan, E.: Sigma-point Kalman filters for probabilistic inference in dynamic state-space models. In: Proceedings of the Workshop on Advances in Machine Learning (2003)
31. Kotecha, J.H., Djuric, P.M.: Gaussian particle filtering. IEEE Transactions on Signal Processing 51(10), 2592–2601 (2003)
32. MacKay, D.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge (2003)
33. Smola, A., Vishwanathan, V., Eskin, E.: Laplace propagation. In: Thrun, S., Saul, L., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems, vol. 16. MIT Press, Cambridge (2004)
34. Minka, T., Lafferty, J.: Expectation-propagation for the generative aspect model. In: Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 352–359 (2002)
35. Doucet, A., Godsill, S., Andrieu, C.: On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 10(3), 197–208 (2000)
36. Neal, R.M., Beal, M.J., Roweis, S.T.: Inferring state sequences for non-linear systems with embedded hidden Markov models. In: Thrun, S., Saul, L., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems, vol. 16. MIT Press, Cambridge (2004)
37. Chan, S.S., Moran, D.W.: Computational model of a primate arm: from hand position to joint angles, joint torques and muscle forces. J. Neural Eng. 3, 327–337 (2006)