Lecture Notes in Computer Science
Estimating Internal Variables of a Decision Maker's Brain

Fig. 5. Subject distribution of the estimated meta-parameters: learning rate α, inverse temperature β, and discount rate γ. Left panel: distribution in α−γ space. Subjects LI, NN, and NT (indicated by the ellipsoid) were trapped in a locally optimal action sequence. Right panel: distribution in α−β space. Subjects BB and LI (indicated by the ellipsoid) reported that they could not find any confident strategy.
5 Conclusion

The theoretical framework of reinforcement learning for modeling behavioral decision making, together with the Bayesian method for estimating subjective internal variables, can be a powerful tool for analyzing both neural recordings (Samejima et al., 2005) and human imaging data (Daw et al., 2006; Pessiglione et al., 2006; Tanaka et al., 2006). In particular, tracking the meta-parameters of RL can capture behavioral tendencies of animal or human decision making. Recently, a correlation between anterior cingulate cortex activity and the learning rate under uncertain environmental change was reported using a Bayesian decision model with a temporally evolving learning-rate parameter (Behrens et al., 2007). Although not detailed in this paper, the Bayesian estimation framework also provides a way of objectively selecting the best model with reference to the given data. The combination of Bayesian model selection and hidden-variable estimation methods would contribute to a new understanding of the decision mechanisms of our brain through falsifiable hypotheses and objective experimental tests.
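The RL model class discussed above, with meta-parameters α (learning rate), β (inverse temperature), and γ (discount rate), can be sketched as a Q-learning agent with softmax action selection. This is only a minimal illustration, not the task or parameter values used in the studies cited; the two-choice task and its reward probabilities below are assumptions for the example.

```python
import math
import random

def softmax_policy(q, beta):
    """Action probabilities from Q-values via softmax with inverse temperature beta."""
    m = max(beta * v for v in q)                 # subtract max for numerical stability
    exps = [math.exp(beta * v - m) for v in q]
    z = sum(exps)
    return [e / z for e in exps]

def q_learning_step(q, a, r, q_next_max, alpha, gamma):
    """One Q-learning update with learning rate alpha and discount rate gamma."""
    q[a] += alpha * (r + gamma * q_next_max - q[a])
    return q

random.seed(0)
alpha, beta, gamma = 0.1, 2.0, 0.9   # meta-parameters (illustrative values)
q = [0.0, 0.0]                        # Q-values for a hypothetical two-choice task
p_reward = [0.2, 0.8]                 # hypothetical reward probabilities per action
for _ in range(1000):
    probs = softmax_policy(q, beta)
    a = 0 if random.random() < probs[0] else 1
    r = 1.0 if random.random() < p_reward[a] else 0.0
    # Bootstrapping on max(q) is a crude stand-in for the sequential tasks above.
    q = q_learning_step(q, a, r, max(q), alpha, gamma)
# After learning, the richer option should carry the higher Q-value.
```

Fitting α, β, and γ to observed choice sequences, as in the Bayesian estimation discussed above, amounts to inverting this generative model.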
References

1. Behrens, T.E., Woolrich, M.W., Walton, M.E., Rushworth, M.F.: Learning the value of information in an uncertain world. Nat. Neurosci. 10, 1214–1221 (2007)
2. Corrado, G., Doya, K.: Understanding neural coding through the model-based analysis of decision making. J. Neurosci. 27, 8178–8180 (2007)
3. Daw, N.D., O'Doherty, J.P., Dayan, P., Seymour, B., Dolan, R.J.: Cortical substrates for exploratory decisions in humans. Nature 441, 876–879 (2006)
4. Doucet, A., Freitas, N., Gordon, N.: Sequential Monte Carlo Methods in Practice. Springer, Heidelberg (2001)
5. Haruno, M., Kuroda, T., Doya, K., Toyama, K., Kimura, M., Samejima, K., Imamizu, H., Kawato, M.: A neural correlate of reward-based behavioral learning in caudate nucleus: a functional magnetic resonance imaging study of a stochastic decision task. J. Neurosci. 24, 1660–1665 (2004)
6. Pessiglione, M., Seymour, B., Flandin, G., Dolan, R.J., Frith, C.D.: Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature 442, 1042–1045 (2006)
7. Samejima, K., Doya, K., Ueda, Y., Kimura, M.: Advances in Neural Information Processing Systems, vol. 16. The MIT Press, Cambridge, MA (2004)
8. Samejima, K., Ueda, Y., Doya, K., Kimura, M.: Representation of action-specific reward values in the striatum. Science 310, 1337–1340 (2005)
9. Schultz, W., Dayan, P., Montague, P.R.: A neural substrate of prediction and reward. Science 275, 1593–1599 (1997)
10. Sutton, R.S., Barto, A.G.: Reinforcement Learning. The MIT Press, Cambridge (1998)
11. Tanaka, S.C., Samejima, K., Okada, G., Ueda, K., Okamoto, Y., Yamawaki, S., Doya, K.: Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics. Neural Netw. 19, 1233–1241 (2006)

Visual Tracking Achieved by Adaptive Sampling from Hierarchical and Parallel Predictions

Tomohiro Shibata 1, Takashi Bando 2, and Shin Ishii 1,3
1 tom@is.naist.jp
2 DENSO Corporation
3 Graduate School of Informatics, Kyoto University

Abstract. Because visual information is inherently ill-posed, the brain essentially needs prior knowledge, prediction, or hypotheses to acquire a meaningful solution. From a computational point of view, visual tracking is the real-time process of statistical spatiotemporal filtering of target states from an image stream, and incremental Bayesian computation is one of its most important devices. To make Bayesian computation of the posterior density of state variables tractable for arbitrary probability distributions, particle filters (PFs) have often been employed in real-time vision. In this paper, we briefly review incremental Bayesian computation and PFs for visual tracking, indicate drawbacks of PFs, and then propose our framework, in which hierarchical and parallel predictions are integrated by adaptive sampling to achieve an appropriate balance of tracking accuracy and robustness. Finally, we discuss the proposed model from the viewpoint of neuroscience.

1 Introduction

Because visual information is inherently ill-posed, the brain essentially needs prior knowledge, prediction, or hypotheses to acquire a meaningful solution. Prediction is also essential for real-time recognition and visual tracking. Due to the flood of visual data, examining all of the data is infeasible, and ignoring irrelevant data is essential. Primate foveal vision and oculomotor control can be viewed from this standpoint: high visual acuity is realized only by the narrow foveal region of the retina, so the visual axis has to be moved actively by oculomotor control. Computer vision, in particular real-time vision, faces the same computational problems discussed above, and attractive as well as feasible methods and applications have been developed based on particle filters (PFs) [4].
One of the key ideas of PFs is the importance sampling distribution, or proposal distribution, which can be viewed as prediction or attention that overcomes the computational problems discussed above. The aim of this paper is to propose a novel Bayesian visual tracking framework with hierarchically-modeled state variables for single-object tracking, and to discuss PFs and our framework from the viewpoint of neuroscience.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 604–613, 2008. © Springer-Verlag Berlin Heidelberg 2008
2 Adaptive Sampling from Hierarchical and Parallel Predictions

2.1 Incremental Bayes and Particle Filtering

Particle filtering [4] is an approach to performing Bayesian estimation of intractable posterior distributions from time-series signals with non-Gaussian noise, generalizing traditional Kalman filtering. This approach has been attracting attention in various research areas, including real-time visual processing (e.g., [5]). In clutter, there are usually several competing observations, and these cause the posterior to be multi-modal and therefore non-Gaussian.

In reality, using a large number of particles is not feasible, especially for real-time processing, and thus there is strong demand to reduce the number of particles. Reducing the number of particles, however, can sacrifice the accuracy and robustness of filtering, particularly when the dimension of the state variables is high. How to cope with this trade-off has been one of the most important computational issues in PFs, but there have been few efforts to reduce the dimension of the state variables in the context of PFs.

Making use of hierarchy in the state variables seems a natural solution to this problem. For example, in estimating the pose of a head from a video stream, involving state variables for the three-dimensional head pose and the two-dimensional positions of face features on the image, the high-dimensional state space can be divided into two groups by using the causality in their states, i.e., the head pose strongly affects the positions of the face features (cf. Fig. 1, excluding the dotted arrows). There are, however, two big problems in this setting. First, the estimation of the lower state variables depends strongly on the estimation of the higher state variables; in real applications, it often happens that the assumed generative relation from the higher state variable to the lower state variable is violated. Second, the assumed generative relation from the lower state variable to the input, typically image frames, can also be violated. These two problems lead to failure in the estimation.
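The generic PF recursion reviewed above (predict from the transition prior, weight by the likelihood, resample) can be sketched for a scalar toy model. The random-walk dynamics, noise levels, and particle count below are illustrative assumptions, not the models used later in this paper.

```python
import math
import random

def bootstrap_pf(observations, n_particles=500, q_std=0.5, r_std=1.0, seed=0):
    """Generic bootstrap particle filter for a scalar random-walk state observed
    in Gaussian noise: x_t = x_{t-1} + v_t, z_t = x_t + w_t. The proposal is the
    transition prior, as in the basic PF reviewed above."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: sample from the transition prior (the proposal distribution).
        particles = [x + rng.gauss(0.0, q_std) for x in particles]
        # Weight: Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((z - x) / r_std) ** 2) for x in particles]
        total = sum(weights) or 1.0
        weights = [w / total for w in weights]
        # Estimate: posterior mean.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample (multinomial) to avoid weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Track a slowly drifting state from noisy observations (synthetic data).
true_states = [0.1 * t for t in range(50)]
obs_rng = random.Random(1)
obs = [x + obs_rng.gauss(0.0, 1.0) for x in true_states]
est = bootstrap_pf(obs)
```

With 500 particles the one-dimensional filter tracks easily; the trade-off discussed above appears when the particle budget shrinks or the state dimension grows.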
2.2 Estimation of Hierarchically-Modeled State Variables

Here we present a novel framework for hierarchically-modeled state variables for single-object tracking. The intuition behind our approach is that the higher and lower layers each have their own dynamics, and mixing their predictions in the proposal distribution according to their reliability adds both robustness and accuracy to tracking with a smaller number of particles. We assume there are two continuous state vectors at time step t, denoted by a_t ∈ R^{N_a} and x_t ∈ R^{N_x}, which are hierarchically modeled as in Fig. 1. Our goal is then to estimate these unobservable states from an observation sequence z_{1:t}.

State Estimation. According to Bayes' rule, the joint posterior density p(a_t, x_t | z_{1:t}) is given by

p(a_t, x_t | z_{1:t}) ∝ p(z_t | a_t, x_t) p(a_t, x_t | z_{1:t−1}),
Fig. 1. A graphical model of the hierarchical and parallel dynamics. Right panels depict example physical representations processed in the layers.

where p(a_t, x_t | z_{1:t−1}) is the joint prior density and p(z_t | a_t, x_t) is the likelihood. The joint prior density p(a_t, x_t | z_{1:t−1}) is given by the previous joint posterior density p(a_{t−1}, x_{t−1} | z_{1:t−1}) and the state transition model p(a_t, x_t | a_{t−1}, x_{t−1}) as follows:

p(a_t, x_t | z_{1:t−1}) = ∫ p(a_t, x_t | a_{t−1}, x_{t−1}) p(a_{t−1}, x_{t−1} | z_{1:t−1}) da_{t−1} dx_{t−1}.

The state vectors a_t and x_t are assumed to be generated from the hierarchical model shown in Fig. 1. Furthermore, the dynamics model of a_t is assumed to be a temporal Markov chain, with conditional independence between a_{t−1} and x_t given a_t. Under these assumptions, the state transition model p(a_t, x_t | a_{t−1}, x_{t−1}) and the joint prior density p(a_t, x_t | z_{1:t−1}) can be represented as

p(a_t, x_t | a_{t−1}, x_{t−1}) = p(x_t | a_t) p(a_t | a_{t−1})

and

p(a_t, x_t | z_{1:t−1}) = p(x_t | a_t) p(a_t | z_{1:t−1}),

respectively. Then we can carry out hierarchical computation of the joint posterior distribution by PF as follows: first, the higher samples {a^{(i)}_t | i = 1, ..., N_a} are drawn from the prior density p(a_t | z_{1:t−1}). The lower samples {x^{(j)}_t | j = 1, ..., N_x} are then drawn from the prior density p(x_t | z_{1:t−1}), described as the proposal distribution p(x_t | a^{(i)}_t). Finally, the weights of the samples are given by the likelihood p(z_t | a_t, x_t).

Adaptive Mixing of Proposal Distribution. When applied to state estimation problems in the real world, the above proposal distribution can be contaminated by severe non-Gaussian noise and/or violations of the assumptions. Especially in the case of PF estimation with a small number of particles, the contaminated proposal distribution can fatally disturb the
estimation. In this study, we assume state transitions in the lower layer that are independent of the upper layer, represented by the dotted arrows in Fig. 1, and we enable robust and accurate estimation by adaptively determining the contribution from each hypothetical state transition. The state transition p(a_t, x_t | a_{t−1}, x_{t−1}) can be represented as

p(a_t, x_t | a_{t−1}, x_{t−1}) = p(x_t | a_t, x_{t−1}) p(a_t | a_{t−1}).   (1)

Here, the dynamics model of x_t is assumed to be represented as

p(x_t | a_t, x_{t−1}) = α_{a,t} p(x_t | a_t) + α_{x,t} p(x_t | x_{t−1}),   (2)

with α_{a,t} + α_{x,t} = 1. In our algorithm, p(x_t | a_t, x_{t−1}) is modeled as a mixture of the approximated prediction densities computed in the lower and higher layers, p(x_t | x_{t−1}) and p(x_t | a_t). Its mixture ratio α_t = {α_{a,t}, α_{x,t}}, representing the contribution of each layer, is determined by a method that enables interaction between the layers; we describe how the mixture ratio α_t is determined in the following subsection. From Eqs. (1) and (2), the joint prior density p(a_t, x_t | z_{1:t−1}) is given as

p(a_t, x_t | z_{1:t−1}) = p(x_t | a_t, z_{1:t−1}) p(a_t | z_{1:t−1}) = π(x_t | α_t) p(a_t | z_{1:t−1}),

where

π(x_t | α_t) = α_{a,t} p(x_t | a_t) + α_{x,t} p(x_t | z_{1:t−1})   (3)

is the adaptively mixed proposal distribution for x_t, based on the prediction densities p(x_t | a_t) and p(x_t | z_{1:t−1}).

Determination of α_t Using an On-line EM Algorithm. The mixture ratio α_t parameterizes π(x_t | α_t), and determining it by minimizing the KL divergence between the posterior density in the lower layer, p(x_t | a_t, z_{1:t}), and π(x_t | α_t) gives robust and accurate estimation. The determination of α_t is equivalent to determining the mixture ratio of a two-component mixture model, and we employ sequential maximum-likelihood (ML) estimation of the mixture ratio. In our method, the index variable for component selection becomes the latent variable, and therefore the sequential ML estimation is implemented by means of an on-line EM algorithm [11]. Resampling from the posterior density p(x_t | a_t, z_{1:t}), we obtain N_x samples x̃^{(i)}_t. Using the latent variable m = {m_a, m_x}, which indicates which prediction density, p(x_t | a_t) or p(x_t | z_{1:t−1}), is trusted, the on-line log likelihood can then be represented as

L(α_t) = η_t Σ_{τ=1}^{t} ( Π_{s=τ+1}^{t} λ_s ) Σ_{i=1}^{N_x} log π(x̃^{(i)}_τ | α_t) = η_t Σ_{τ=1}^{t} ( Π_{s=τ+1}^{t} λ_s ) Σ_{i=1}^{N_x} log Σ_m p(x̃^{(i)}_τ, m | α_τ),

where λ_s is a decay constant that reduces the adverse influence of earlier inaccurate estimation, and η_t = [ Σ_{τ=1}^{t} Π_{s=τ+1}^{t} λ_s ]^{−1} is a normalization constant that works as a learning coefficient. The optimal mixture ratio α*_t minimizing the KL divergence, which gives the optimal proposal distribution π(x_t | α*_t), can be calculated by maximization of the on-line log likelihood as follows:

α*_{m,t} = ⟨m⟩_t / Σ_{m′∈m} ⟨m′⟩_t,

where

p(x̃^{(i)}_t, m_a | α_t) = α_{a,t} p(x̃^{(i)}_t | a_t),
p(x̃^{(i)}_t, m_x | α_t) = α_{x,t} p(x̃^{(i)}_t | z_{1:t−1}),

and

⟨m⟩_t = η_t Σ_{τ=1}^{t} ( Π_{s=τ+1}^{t} λ_s ) Σ_{i=1}^{N_x} p(m | x̃^{(i)}_τ, α_τ),

p(m | x̃^{(i)}_t, α_t) = p(x̃^{(i)}_t, m | α_t) / Σ_{m′∈m} p(x̃^{(i)}_t, m′ | α_t).

Note that ⟨m⟩_t can be updated incrementally at each time step. The whole procedure is summarized below.

1. Estimation of the state variable x_t in the lower layer.
   – Obtain α_{a,t} N_x samples x^{(i)}_{a,t} from p(x_t | a_t).
   – Obtain α_{x,t} N_x samples x^{(i)}_{x,t} from p(x_t | z_{1:t−1}).
   – Obtain the expectation x̂_t and Std(x_t) of p(x_{n,t} | a_t, z_{1:t}) using the N_x mixture samples x^{(i)}_t, constituted by x^{(i)}_{x,t} and x^{(i)}_{a,t}.
   The above procedure is applied to each feature to obtain {x̂_{n,t}, Std(x_{n,t})}.
2. Estimation of the state variable a_t in the higher layer.
   – Obtain D_{a,n}(t) based on Std(x_{n,t}), and then estimate p(a_t | z_{1:t}).
3. Determination of the mixture ratio α_{t+1}.
   – Obtain N_x samples x̃^{(i)}_t from p(x_{n,t} | a_t, z_{1:t}).
   – Calculate α_{t+1} so as to maximize the on-line log likelihood.
   The above procedure is applied to each feature to obtain {α_{n,t+1}}.

Fig. 2. Hierarchical pose estimation algorithm
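A numerical sketch of the on-line EM update for α_t follows. The Gaussian densities stand in for the two prediction densities p(x_t | a_t) and p(x_t | z_{1:t−1}), and a constant decay λ and a single feature are assumed for brevity; these are illustrative assumptions, and the clipping of α_{a,t} used in the actual implementation (see Sect. 2.4) is omitted.

```python
import math

def gauss_pdf(x, mu, std):
    """Gaussian density, used here as a stand-in prediction density."""
    return math.exp(-0.5 * ((x - mu) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def update_mixture_ratio(samples, dens_a, dens_x, alpha, stats, lam=0.5):
    """One on-line EM step for the mixture ratio alpha = (alpha_a, alpha_x).
    `stats` holds the discounted sufficient statistics <m_a>, <m_x>;
    `lam` is the decay constant lambda."""
    alpha_a, alpha_x = alpha
    # E-step: responsibilities p(m | x~, alpha) for each resampled particle.
    resp_a = resp_x = 0.0
    for x in samples:
        pa = alpha_a * dens_a(x)      # joint with latent component m_a
        px = alpha_x * dens_x(x)      # joint with latent component m_x
        z = (pa + px) or 1.0
        resp_a += pa / z
        resp_x += px / z
    # M-step with discounting: decay old statistics, add the new ones.
    stats[0] = lam * stats[0] + resp_a
    stats[1] = lam * stats[1] + resp_x
    total = stats[0] + stats[1]
    return (stats[0] / total, stats[1] / total), stats

# Illustrative use: the resampled particles agree with the higher-layer
# prediction, so alpha_a should grow at the expense of alpha_x.
samples = [0.1, -0.2, 0.05, 0.15, -0.1]
dens_a = lambda x: gauss_pdf(x, 0.0, 0.5)   # stand-in for p(x_t | a_t)
dens_x = lambda x: gauss_pdf(x, 3.0, 0.5)   # stand-in for p(x_t | z_{1:t-1})
alpha, stats = (0.5, 0.5), [0.0, 0.0]
for _ in range(5):
    alpha, stats = update_mixture_ratio(samples, dens_a, dens_x, alpha, stats)
```

Because the statistics are decayed by λ before each addition, ⟨m⟩_t never has to be recomputed over the whole history, which is what makes the update on-line.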
2.3 Application to Pose Estimation of a Rigid Object

Here the proposed method is applied to a real problem: pose estimation of a rigid object (cf. Fig. 1). The algorithm is shown in Fig. 2. The N_f features on the image plane at time step t are denoted by x_{n,t} = (u_{n,t}, v_{n,t})^T, (n = 1, ..., N_f); the affine camera matrix that projects a 3D model of the object onto the image plane at time step t is, for simplicity, reshaped into a vector a_t = (a_{1,t}, ..., a_{8,t})^T; and the observed image at time step t is denoted by z_t. A linear dynamics is assumed in the higher layer and the object's pose is estimated by a Kalman filter, while tracking of the features is performed by a PF because of the severe non-Gaussian noise, e.g., occlusion, in the lower layer. To obtain the samples x^{(i)}_{n,t} from the mixture proposal distribution, we need the two prediction densities p(x_{n,t} | a_t) and p(x_{n,t} | z_{1:t−1}). The prediction density computed in the higher layer, p(x_{n,t} | a_t), which encodes the physical relationships between the features, is given by the affine projection of the 3D model of the rigid object.

2.4 Experiments

Simulations. The goal of this simulation is to estimate the posture of a rigid object, as in Fig. 3A, from an observation sequence of eight features. The rigid object was a hexahedral piece of size 20 (upper base) × 30 (lower base) × 20 (height) × 20 (depth) [mm], rotated 150 mm away from a pin-hole camera (focal length: 40 mm) and projected onto the image plane (pixel size: 0.1 × 0.1 [mm]) by a perspective projection process disturbed by Gaussian noise (mean: 0, standard deviation: 1 pixel). The four features at the back of the object were occluded by the object itself; an occluded feature was represented by assuming that the standard deviation of the measurement noise grows to 100 pixels from the non-occluded value of 1 pixel. The other four features at the front of the object were not occluded. The length of the observation sequence of the features was 625 frames, i.e., about 21 s.

In this simulation, we compared the performance of our proposed method with (i) a method with fixed α_t = {1, 0}, which is equivalent to simple hierarchical modeling in which the prediction density computed in the higher layer is trusted at every time step, and (ii) a method with fixed α_t = {0, 1}, which does not implement the mutual interaction between the layers. The decay constant of the on-line EM algorithm was set at λ_t = 0.5.

The estimated pose obtained with the adaptive proposal distribution and α_{a,t} at each time step are shown in Figs. 3B and 3C, respectively. Here, the object's pose θ_t = {θ_{X,t}, θ_{Y,t}, θ_{Z,t}} was calculated from the estimated a_t using an extended Kalman filter (EKF). In our implementation, the maximum and minimum values of α_{a,t} were limited to 0.8 and 0.2, respectively, to prevent the robustness from degenerating. As shown in Fig. 3B, our method achieved robust estimation against the occlusion. Concurrently with the robust estimation of the object's pose, appropriate determination of the mixture ratio was exhibited. For example, in the case of feature x_1, the prediction density computed in the higher layer was emphasized, and the feature was well predicted using the 3D object model during the period in which x_1 was occluded, because the observed feature position, contaminated by severe noise, depressed the confidence of the lower prediction density.

Fig. 3. B: Estimated pose. C: Time course of α_{a,t} determined by the on-line EM algorithm. Gray background represents frames in which the feature was occluded.

Real Experiments. To investigate the performance against the severe non-Gaussian noise existing in real environments, the proposed method was applied to head pose estimation of a driver from a real image sequence captured in a car. Face extraction/tracking from an image sequence is a well-studied problem because of its applicability to various areas, and several PF algorithms have been proposed. However, for accurate estimation of state variables lying in a high-dimensional space, e.g., a human face, some dimensionality-reduction techniques are required, especially for real-time processing. The proposed method is expected to enable more robust estimation in spite of limited computing resources by exploiting the hierarchy of the state variables.

The real image sequence was captured by a near-infrared camera behind the steering wheel, i.e., the captured images did not contain color information, and the image resolution was 640 × 480. In such a visual tracking task, the true observation process p(z_t | x_{n,t}) is unknown because the true positions of the face features are unobservable; hence, a model of the observation process is needed for calculating the particle weights w^{(i)}_{n,t}. In this study, we employed normalized correlation with a template as the approximate observation model. Although this observation model seems too simple for problems in real environments, it is sufficient for examining the efficiency of the proposal distribution. We employed the nose, eyes, canthi, eyebrows, and corners of the mouth as the face features. Using the 3D feature positions measured by 3D distance measuring equipment, we constructed the 3D face model. The proposed method was applied with 50 particles for each face feature, as in the simulation, and processed on a Pentium 4 (2.8 GHz) Windows 2000 PC with 1048 MB RAM. Our system processed one frame in 29.05 ms, and hence achieved real-time processing.
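The higher-layer estimate in both experiments relies on the standard Kalman predict/update recursion. The scalar sketch below illustrates that recursion for a random-walk state; the tracker itself applies it to the 8-dimensional affine vector a_t, and the process and observation variances here are illustrative assumptions.

```python
def kalman_step(x, p, z, q=0.01, r=1.0):
    """One predict/update cycle of a scalar Kalman filter: random-walk dynamics
    with process variance q, observation variance r."""
    # Predict: the random-walk model keeps the mean and inflates the variance.
    x_pred, p_pred = x, p + q
    # Update: blend prediction and measurement via the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

# A few noisy measurements of a state near 1.0 (illustrative data).
x, p = 0.0, 10.0            # vague initial belief
for z in [1.2, 0.9, 1.1, 1.0]:
    x, p = kalman_step(x, p, z)
# The estimate settles near the measurements while the variance shrinks.
```

Unlike the PF in the lower layer, this recursion is exact only under linear-Gaussian assumptions, which is why occlusion-prone feature tracking is delegated to the PF.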
Fig. 4. Mean estimation error when α_t was estimated by the on-line EM algorithm, fixed at α_t = {1, 0}, and fixed at α_t = {0, 1}. Error bars that would extend beyond the figure are shortened, with the error amount displayed on top of them.

Fig. 5. Tracked face features

Fig. 4 shows the estimation error of the head pose with respect to the true head pose of the driver measured by a gyro sensor in the car. Our adaptively mixed proposal distribution achieved robust head pose estimation, as in the simulation task. In Fig. 5, the estimated face features are depicted by "+" marks for various head poses of the driver; the variance of an estimated feature position is represented by the size of the "+" mark, i.e., the larger a "+" is, the higher the estimation confidence of the estimator.

3 Modeling Primates' Visual Tracking by Particle Filters

Here we note that our computer vision study and primate vision share the same computational problems and similar constraints. Namely, both need to perform real-time spatiotemporal filtering of visual data as robustly and accurately as possible with limited computing resources. Although there are huge numbers of neurons in the brain, their firing rates are very noisy and much slower than recent personal computers. We can visually track only around four to six objects simultaneously (e.g., [2]). These facts indicate limited attentional resources. As mentioned at the beginning of this paper, it is widely known that only the foveal region of the retina acquires high-resolution images in primates, and that humans usually make saccades mostly to the eyes, nose, lips, and contours when watching a human face. In other words, primates actively ignore irrelevant information in massive image inputs. Furthermore, many behavioral and computational studies report that the brain may compute Bayesian statistics (e.g., [8][10]). As we discussed, however, Bayesian computation is intractable in general, and particle filters (PFs) are an attractive and feasible solution to this problem, as they are flexible, easy to implement, and parallelizable. Importance sampling is analogous to efficient delivery of computing resources. As a whole, we conjecture that the primate brain may employ PFs for visual tracking. Although one of the major drawbacks of PFs is that a large number of particles, typically exponential in the dimension of the state variables, is required for accurate estimation, our proposed framework, in which adaptive sampling integrates hierarchical and parallel predictive distributions, can be a solution.

As demonstrated in Section 2 and in other work [1], adaptive importance sampling from multiple predictions can balance the accuracy and robustness of estimation with a restricted number of particles. Along this line, overt/covert smooth pursuit in primates could be a research target for investigating our conjecture. Based on the model of Shibata et al. [12], Kawawaki et al. investigated the human brain mechanism of overt/covert smooth pursuit in fMRI experiments and suggested that the activity of the anterior/superior lateral occipito-temporal cortex (a/sLOTC) was responsible for target motion prediction rather than motor commands for eye movements [7]. Note that the LOTC includes the homologue of the monkey medial superior temporal (MST) area responsible for visual motion processing (e.g., [9][6]). In their study, the mechanism for increasing a/sLOTC activity remained unclear. The increase in a/sLOTC activity was observed particularly when subjects pursued a blinking target covertly. This blink condition might induce two predictions, e.g., one emphasizing the observation and the other its belief, as proposed in [1], and require computational resources for adaptive sampling. Multiple predictions might be performed in other brain regions such as the frontal eye field (FEF), the inferior temporal (IT) area, and the fusiform face area (FFA). The FEF is known to be involved in smooth pursuit (e.g., [3]) and has reciprocal connections to the MST area (e.g., [13]), but how they work together is unclear. Visual tracking of general objects, rather than a small spot as a visual stimulus, requires a specific target representation and representations of distractors, including the background. Thus the IT, FFA, and other areas related to higher-order visual representations may make parallel predictions to deal with varying target appearance during tracking.
4 Conclusion

In this paper, we first introduced particle filters (PFs) as an approximate incremental Bayesian computation and pointed out their drawbacks. We then proposed a novel framework for visual tracking based on PFs as a solution to these drawbacks. The keys of the framework are: (1) the high-dimensional state space is decomposed into hierarchical and parallel predictors that treat state variables of lower dimension, and (2) their integration is achieved by adaptive sampling. The feasibility of our framework has been demonstrated by real as well as simulation studies. Finally, we pointed out the computational problems shared between PFs and human visual tracking, presented our conjecture that the primate brain employs PFs, and discussed its plausibility and perspectives for future investigations.

References

1. Bando, T., Shibata, T., Doya, K., Ishii, S.: Switching particle filters for efficient visual tracking. J. Robot. Auton. Syst. 54(10), 873 (2006)
2. Cavanagh, P., Alvarez, G.A.: Tracking multiple targets with multifocal attention. Trends Cogn. Sci. 9(7), 349–354 (2005)
3. Fukushima, K., Yamanobe, T., Shinmei, Y., Fukushima, J., Kurkin, S., Peterson, B.W.: Coding of smooth eye movements in three-dimensional space by frontal cortex. Nature 419, 157–162 (2002)
4. Gordon, N.J., Salmond, D.J., Smith, A.F.M.: Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F Radar Signal Process. 140, 107–113 (1993)
5. Isard, M., Blake, A.: Condensation - conditional density propagation for visual tracking. Int. J. Comput. Vis. 29(1), 5–28 (1998)
6. Kawano, K., Shidara, M., Watanabe, Y., Yamane, S.: Neural activity in cortical area MST of alert monkey during ocular following responses. J. Neurophysiol. 71(6), 2305–2324 (1994)
7. Kawawaki, D., Shibata, T., Goda, N., Doya, K., Kawato, M.: Anterior and superior lateral occipito-temporal cortex responsible for target motion prediction during overt and covert visual pursuit. Neurosci. Res. 54(2), 112
8. Knill, D.C., Pouget, A.: The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27(12) (2004)
9. Newsome, W.T., Wurtz, R.H., Komatsu, H.: Relation of cortical areas MT and MST to pursuit eye movements. II. Differentiation of retinal from extraretinal inputs. J. Neurophysiol. 60(2), 604–620 (1988)
10. Rao, R.P.N.: Neural models of Bayesian belief propagation. In: The Bayesian Brain: Probabilistic Approaches to Neural Coding. MIT Press, Cambridge (2006)
11. Sato, M., Ishii, S.: On-line EM algorithm for the normalized Gaussian network. Neural Comput. 12(2), 407–432 (2000)
12. Shibata, T., Tabata, H., Schaal, S., Kawato, M.: A model of smooth pursuit in primates based on learning the target dynamics. Neural Netw. 18(3), 213 (2005)
13. Tian, J.-R., Lynch, J.C.: Corticocortical input to the smooth and saccadic eye movement subregions of the frontal eye field in Cebus monkeys. J. Neurophysiol. 76(4), 2754–2771 (1996)

Bayesian System Identification of Molecular Cascades
Junichiro Yoshimoto 1,2 and Kenji Doya 1,2,3

1 Initial Research Project, Okinawa Institute of Science and Technology Corporation, 12-22 Suzaki, Uruma, Okinawa 904-2234, Japan
{jun-y,doya}@oist.jp
2 Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
3 ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, "Keihanna Science City", Kyoto 619-0288, Japan

Abstract. We present a Bayesian method for the system identification of molecular cascades in biological systems. The contribution of this study is a theoretical framework unifying three issues: 1) estimating the most likely parameters; 2) evaluating and visualizing the confidence of the estimated parameters; and 3) selecting the most likely structure of the molecular cascades from two or more alternatives. The usefulness of our method is demonstrated in several benchmark tests.

Keywords: Systems biology, biochemical kinetics, system identification, Bayesian inference, Markov chain Monte Carlo method.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 614–624, 2008. © Springer-Verlag Berlin Heidelberg 2008

1 Introduction

In recent years, the analysis of molecular cascades by mathematical models has contributed to the elucidation of intracellular mechanisms related to learning and memory [1,2]. In such modeling studies, the structure and parameters of the molecular cascades are selected based on the literature and databases.¹ However, if reliable information about a target molecular cascade cannot be obtained from those repositories, we must tune its structure and parameters so as to fit the model behavior to the available experimental data. The development of a theoretically sound and efficient system identification framework is crucial for making such models useful. In this article, we propose a Bayesian system identification framework for molecular cascades.

¹ DOQCS (http://doqcs.ncbs.res.in/) and BIOMODELS (http://biomodels.net/) are available, for example.

For a given set of experimental data, system identification can be separated into two inverse problems: parameter estimation and model selection. The most popular strategy for parameter estimation is to find a single set of parameters based on the least mean-square-error or maximum-likelihood criterion [3]. However, we should be aware that the estimated parameters might
suffer from an "over-fitting effect" because the available set of experimental data is often small and noisy. For evaluating the accuracy of the estimators, statistical methods based on the asymptotic theory [4] and Fisher information [5] were independently proposed. Still, we must pay attention to their practical limitations: a large number of data points is required for the former method, and the mathematical model should be at least locally linear for the latter. For model selection, Sugimoto et al. [6] proposed a method based on genetic programming. This provided a systematic way to construct a mathematical model, but the criterion for adjusting the tradeoff between data fitness and model complexity was somewhat heuristic. Below, we present a Bayesian formulation for the system identification of molecular cascades.

2 Problem Setting

In this study, we consider the dynamic behavior of molecular cascades that can be modeled as a well-stirred mixture of I molecular species {S_1, ..., S_I}, which chemically interact through J elementary reaction channels {R_1, ..., R_J} inside a compartment with a fixed volume Ω and constant temperature [7]. The dynamical state of this system is specified by a vector X(t) ≡ (X_1(t), ..., X_I(t)), where X_i(t) is the number of molecules of species S_i in the system at time t. If we assume that there is a sufficiently large number of molecules within the compartment, it is useful to take as the state vector the molar concentrations x(t) ≡ (x_1(t), ..., x_I(t)) obtained by the transformation x(t) = X(t)/(Ω N_A), where N_A ≈ 6.0 × 10^23 is Avogadro's constant. In this case, the system behavior can be well approximated by the deterministic ordinary differential equations (ODEs) [8]:

    ẋ_i(t) = Σ_{j=1}^{J} (ν'_{ij} − ν_{ij}) k_j Π_{l=1}^{I} x_l(t)^{ν_{lj}},   i = 1, ..., I,    (1)
where ẋ ≡ dx/dt denotes the time derivative (velocity) of a variable x, and ν_{ij} and ν'_{ij} are the stoichiometric coefficients of the molecular species S_i as a reactant and a product, respectively, in the reaction channel R_j. k_j is the rate constant of the reaction channel R_j. Equation (1) is called the law of mass action. To illustrate an example of this mathematical model, let us consider a simple interaction whose chemical equation is given by

    A + B ⇌ AB.    (2)

Here, the two rate constants associated with the forward and backward arrows are k_f and k_b, respectively. The dynamic behavior of this system is given by

    [ȦB] = −[Ȧ] = −[Ḃ] = k_f [A][B] − k_b [AB],    (3)

where the bracket [S_i] denotes the concentration of the molecular species S_i.

Enzymatic reactions, which frequently appear in biological signaling, can be modeled as a series of elementary reactions. In an enzymatic reaction that changes a substrate S into a product P by the catalysis of an enzyme E, for example, the enzyme-substrate complex ES is regarded as the intermediate product, and the overall reaction is modeled as

    E + S ⇌ ES → E + P,    (4)

where the reversible binding step has rate constants k_f (forward) and k_b (backward), and the catalytic step has rate constant k_cat.
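As a concrete check on the law of mass action (1), the sketch below builds the right-hand side for scheme (2) from its stoichiometric coefficients and verifies that it reproduces the hand-derived form (3). The rate constants and concentrations are illustrative, not values from the paper.

```python
import numpy as np

# Scheme (2): A + B <-> AB, species ordered (A, B, AB).
# nu[i, j]  : stoichiometric coefficient of species i as a reactant in channel j
# nu_p[i, j]: the same coefficient as a product (channel 0: forward, 1: backward)
nu   = np.array([[1, 0],
                 [1, 0],
                 [0, 1]])
nu_p = np.array([[0, 1],
                 [0, 1],
                 [1, 0]])
k = np.array([5.0, 0.01])               # illustrative (k_f, k_b)

def mass_action_rhs(x):
    """Eq (1): xdot_i = sum_j (nu'_ij - nu_ij) k_j prod_l x_l^nu_lj."""
    rates = k * np.prod(x[:, None] ** nu, axis=0)   # one reaction rate per channel
    return (nu_p - nu) @ rates

x = np.array([0.4, 0.3, 0.05])          # concentrations ([A], [B], [AB])
xdot = mass_action_rhs(x)

# Hand-derived form (3): [AB]' = -[A]' = -[B]' = k_f[A][B] - k_b[AB]
v = 5.0 * 0.4 * 0.3 - 0.01 * 0.05
assert np.allclose(xdot, [-v, -v, v])
```

The same construction works for any (ν, ν', k), which is what makes eq (1) convenient for assembling model candidates mechanically.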
We can derive a system of ODEs corresponding to the chemical equation (4) based on the law of mass action (1), but it is also common to use the approximation form called the Michaelis-Menten equation [8]:

    [Ṗ] = −[Ṡ] = k_cat [E_tot][S] / (K_M + [S]),    (5)

where [E_tot] ≡ [E] + [ES] is the total concentration of the molecular species including the enzyme E, and K_M ≡ (k_b + k_cat)/k_f is called the Michaelis-Menten constant.

When considering signaling communications among other compartments and biochemical experiments wherein drugs are injected, it is convenient to separate a control vector u ≡ (u_1, ..., u_I), which denotes I external effects on the system, from the state vector x. Let us consider the model (2), in which [B] can be arbitrarily manipulated by an experimenter. In this case, the control vector is u = ([B]), and the state vector is modified into x = ([A], [AB]). By summarizing the above discussion, the dynamic models of molecular cascades concerned in this study assume the general form

    ẋ(t) = f(x(t), u(t), q_1),   x(0) = q_0,    (6)

where q_0 ∈ R_+^I is a set of I initial values of the state variables^2, q_1 ∈ R_+^M is a set of M system parameters such as rate constants and Michaelis-Menten constants, and f: (x, u, q_1) ↦ ẋ is the function defining the dynamics.

We assume that a set of control inputs U ≡ {u(t) | t ∈ [0, T_f]} is designed before commencing an experiment, where T_f is the final time point of the experiment. Let O ⊂ {1, ..., I} be a set of indices such that the concentration of any species in {S_i | i ∈ O} can be observed through the experiment. Then, let t_1, ..., t_N ∈ [0, T_f] be the time points at which the observations are made in the experiment. Using these notations, the set of all the data observed in the experiment is defined by Y ≡ {y_i(t_n) | i ∈ O; n = 1, ..., N}, where y_i(t_n) is the concentration of the species S_i observed at time t_n.

The objective of this study is to identify the mathematical model (6) from a limited number of experimental data Y. This objective can be separated into two inverse problems. One problem involves fitting a parameter vector q ≡ (q_1, q_0) so as to minimize the residuals ε_i(t_n) = y_i(t_n) − x_i(t_n; q, U), where x_i(t_n; q, U) denotes the solution of the ODEs (6) at time t_n, which depends on the parameter vector q and the control inputs U. In other words, we intend to find q such that model (6) reproduces the observation data Y as closely as possible. This inverse problem is called parameter estimation. The second problem is to infer the most likely candidate among multiple options {C_1, ..., C_C} with different model structures (functional forms of f) from the observation data Y.

^2 We denote the set of all non-negative real values by R_+ in accordance with usage.
This problem is called model selection, and we call each C_c (c = 1, ..., C) a "model candidate" in this study.

3 Bayesian System Identification

In this section, we first present our Bayesian approach to the parameter estimation problem, and then extend the discussion to the model selection problem.

In statistical inference, the residuals {ε_i(t_n) | i ∈ O; n = 1, ..., N} are modeled as unpredictable noises. At first, we present a physical interpretation of the noises. Let us assume that each observation y_i(t_n) is obtained by sampling particles of the molecular species S_i from a small region with a fixed volume ω_i (0 < ω_i < Ω) inside the model compartment. Since the interior of the model compartment is well-stirred, the particles of S_i would be distributed by a spatial Poisson distribution that is independent of the states of the other molecular species. This implies that the number of particles of the sampled species, X_i^ω(t_n), would be distributed according to a binomial distribution with the total number of trials X_i(t_n) and success probability ω_i/Ω [9]. This distribution can also be well approximated by X_i^ω(t_n) i.i.d.∼ N(ω_i X_i(t_n)/Ω, ω_i X_i(t_n)/Ω)^3 under the assumption that X_i(t_n) ≫ 0. Using the two relationships y_i(t_n) = X_i^ω(t_n)/(ω_i N_A) and x_i(t_n) = X_i(t_n)/(Ω N_A), we have y_i(t_n) i.i.d.∼ N(x_i(t_n), x_i(t_n)/(ω_i N_A)), that is,

    ε_i(t_n) i.i.d.∼ N(0, x_i(t_n)/γ_i),    (7)

where γ_i ≡ ω_i N_A. In general, we cannot determine the exact values of γ ≡ (γ_i)_{i∈O} ∈ R_+^{|O|}; hence, they are regarded as latent variables to be estimated along with the parameter q via the Bayesian inference.

Now, we consider the Bayesian inference of the set of all the unknown variables θ ≡ (q, γ), i.e., the posterior distribution of θ. By combining (7) with (6), we have the conditional distribution of the observation data Y:

    p(Y|θ; U) = Π_{n=1}^{N} Π_{i∈O} N_1(y_i(t_n); x_i(t_n; q, U), x_i(t_n; q, U)/γ_i),    (8)

where
N_p(·; ·, ·) denotes a p-dimensional Gaussian density function^4. According to Bayes' theorem, the posterior distribution of θ is given by

    p(θ|Y; U) = p(Y|θ; U) p(θ; U) / p(Y; U) = p(Y|θ; U) p(θ; U) / ∫ p(Y|θ; U) p(θ; U) dθ,    (9)

where p(θ; U) is a prior distribution of θ defined before obtaining Y. In this study, we employ a prior distribution of the form

    p(θ; U) = Π_{i=1}^{I+M} L(q_i; ξ_0) Π_{i∈O} G(γ_i; α_0, β_0),    (10)

where q_i and γ_i are the i-th elements of the vectors q and γ, respectively; α_0, β_0, and ξ_0 are small positive constants; and L(·; ·) and G(·; ·, ·) denote the density functions of a log-Laplace distribution and a gamma distribution, respectively^5. By substituting (8) and (10) into (9), the posterior distribution is well-defined.

^3 N(μ, σ^2) denotes a Gaussian (or normal) distribution with mean μ and variance σ^2.
^4 N_p(x; m, S) ≡ (2π)^{−p/2} |S|^{−1/2} exp{−(x − m) S^{−1} (x − m)^T/2}. It should be noted that all vectors are supposed to be row vectors throughout this article.

The posterior distribution can be regarded as a degree of certainty of θ deduced from the outcome Y [10]. The most "likely" value of θ can be given by the maximum a posteriori (MAP) estimator θ_MAP ≡ argmax_θ p(θ|Y; U). This value is asymptotically equivalent to the solutions of the maximum likelihood method and a generalized least-mean-square method as the number of time points N approaches infinity. We must be aware, however, that the MAP estimator retains a variance of O(1/N) as long as N is finite. Accordingly, it is informative to evaluate a high-confidence region of θ. The Bayesian inference can be used for this purpose, as will be shown in Section 4.

Although we have considered the parameter estimation problem for a given model candidate so far, the denominator of (9), which is called the evidence, plays an important role in the model selection problem. To make the dependence on model candidates explicit, let us write the evidence p(Y; U) for each candidate C_c (c = 1, ..., C) as p(Y|C_c; U). The posterior distribution of C_c is given by p(C_c|Y; U) ∝ p(Y|C_c; U) if the prior distribution over {C_1, ..., C_C} is uniform. This implies that the model candidate C*, where C* = argmax_c p(Y|C_c; U), is the optimal solution in the sense of the MAP estimation. Accordingly, the main task in the Bayesian model selection reduces to the calculation of the evidence p(Y; U) for each model candidate C_c.

The major difficulty encountered in the Bayesian inference is that the intractable integration in the denominator of the final expression in (9) must be evaluated. For convenience of implementation, we adopt a Markov chain Monte Carlo technique [11] to approximate the posterior distribution and the evidence. Hereafter, we describe only the key essence of our approximation method.

The basic framework of our approximation is based on a Gibbs sampling method [11]. We suppose that it is possible to draw the realizations q and γ directly from the two conditional distributions with density functions p(q|γ, Y; U) and p(γ|q, Y; U), respectively. In this case, the following iterative procedure asymptotically generates a set of L (L ≫ 1) random samples Θ_{1:L} ≡ {θ^(l) ≡ (q^(l), γ^(l)) | l = 1, ..., L} distributed according to the density function p(θ|Y; U).

1. Initialize the value of γ^(0) appropriately and set l = 1.
2. Generate q^(l) from the distribution with the density p(q|γ^(l−1), Y; U).
3. Generate γ^(l) from the distribution with the density p(γ|q^(l), Y; U).
4. Increment l := l + 1 and go to Step 2.

Since we select a gamma distribution as the prior distribution of γ_i in this study, p(γ|q, Y; U) always has the form Π_{i∈O} G(γ_i; α, β_i(q)), where

    α = α_0 + N/2   and   β_i(q) = β_0 + (1/2) Σ_{n=1}^{N} (y_i(t_n) − x_i(t_n; q, U))^2 / x_i(t_n; q, U).    (11)
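Step 3 of the procedure is then a direct gamma draw via eq (11). The sketch below uses stand-in predictions and observations, and deliberately weak prior constants (α_0 = β_0 = 10^-4, unlike the β_0 = 1 used in Section 4), purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha0, beta0 = 1e-4, 1e-4             # illustrative weak prior constants
gamma_true = 1000.0                    # used only to synthesize the stand-in data
N = 51

# Stand-ins for model predictions x_i(t_n; q, U) and observations y_i(t_n),
# with noise following the observation model (7)
x_pred = np.linspace(0.1, 0.3, N)
y_obs = x_pred + rng.normal(0.0, np.sqrt(x_pred / gamma_true))

# Eq (11): conditional p(gamma_i | q, Y; U) = G(gamma_i; alpha, beta_i(q))
alpha = alpha0 + N / 2.0
beta = beta0 + 0.5 * np.sum((y_obs - x_pred) ** 2 / x_pred)

# numpy's gamma sampler is parameterized by shape and scale = 1/beta
gamma_draws = rng.gamma(alpha, 1.0 / beta, size=10000)
```

The conditional mean α/β lands on the order of the value that synthesized the data, so repeated Gibbs sweeps keep γ consistent with the current residuals.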
^5 L(x; ξ) ≡ exp{−|ln x|/ξ}/(2ξx) and G(γ; α, β) ≡ β^α γ^{α−1} e^{−βγ}/Γ(α), where Γ(α) ≡ ∫_0^∞ t^{α−1} e^{−t} dt is the gamma function.

This implies that it is possible to draw γ directly from p(γ|q, Y; U) using a gamma-distributed random number generator in Step 3. On the other hand, the
analytical calculation of p(q|γ, Y; U) is still intractable; however, we know that p(q|γ, Y; U) is proportional to p(Y|θ; U) p(θ; U) up to the normalization constant. Thus, we approximate Step 2 by a Metropolis sampling method [11]. Here, the transition distribution of the Markov chain is constructed based on an adaptive direction sampling method [12] in order to improve the efficiency of the sampling. Indeed, it is required to compute the solution of the ODEs, x_i(t; q^(l), U), explicitly in both Steps 2 and 3; however, this is not a significant obstacle because many numerical ODE solvers are currently available (see [13] for example).

After the procedure generates L random samples Θ_{1:L} distributed according to the density function p(θ|Y; U), the target function p(θ|Y; U) can be well reconstructed by the kernel density estimator [14]:

    p̃(θ|Y; U) = (1/L) Σ_{l=1}^{L} N_{D_θ}(θ; θ^(l), V);   V = (4 / ((D_θ + 2) L))^{2/(D_θ + 4)} Cov(Θ_{1:L}),    (12)

where D_θ is the dimensionality of the vector θ, and Cov(Θ_{1:L}) is the covariance matrix of the set of samples Θ_{1:L}.
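The estimator (12), together with the evidence approximation introduced next, can be exercised on a toy one-dimensional problem in which the role of p(Y|θ; U) p(θ; U) is played by an unnormalized Gaussian whose normalizer is known by construction, so the evidence estimate can be checked. All choices below (target, sample count, seed) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

Z_TRUE, MU, SD = 0.37, 2.0, 0.5
samples = rng.normal(MU, SD, size=(1000, 1))     # stand-in for Theta_{1:L}
L, D = samples.shape

def unnorm_posterior(theta):
    """Plays the role of p(Y|theta; U) p(theta; U); its integral is Z_TRUE."""
    return Z_TRUE * np.exp(-0.5 * ((theta[0] - MU) / SD) ** 2) / (SD * np.sqrt(2 * np.pi))

# Bandwidth matrix of eq (12): V = (4 / ((D+2) L))^(2/(D+4)) Cov(Theta_{1:L})
V = (4.0 / ((D + 2) * L)) ** (2.0 / (D + 4)) * np.atleast_2d(np.cov(samples, rowvar=False))
Vinv, detV = np.linalg.inv(V), np.linalg.det(V)

def p_tilde(theta):
    """Kernel density estimator (12) evaluated at a point theta of shape (D,)."""
    d = samples - theta
    quad = np.einsum('ij,jk,ik->i', d, Vinv, d)
    return np.mean(np.exp(-0.5 * quad)) / np.sqrt((2.0 * np.pi) ** D * detV)

# Evidence approximation: average of p(Y|theta)p(theta) / p_tilde(theta) over samples
evidence = float(np.mean([unnorm_posterior(th) / p_tilde(th) for th in samples]))
```

With these settings the ratio average lands close to Z_TRUE; in the paper the same average, computed from the Gibbs samples, supplies the evidence for each model candidate.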
Furthermore, assuming that p̃(θ|Y; U) is a good approximator of the true posterior distribution p(θ|Y; U), the importance sampling theory [11] states that the integration in the denominator of (9) (i.e., the evidence p(Y; U)) can be well approximated by

    p̃(Y; U) = (1/L) Σ_{l=1}^{L} p(Y|θ^(l); U) p(θ^(l); U) / p̃(θ^(l)|Y; U).    (13)

4 Numerical Demonstrations

In order to demonstrate its usefulness, we applied our Bayesian method to several benchmark problems. For all the benchmark problems, the set of constants for the prior distribution was fixed at α_0 = 10^−4, β_0 = 1, and ξ_0 = 1. The total number of Monte Carlo samples was set at L = 10^7, and every element in the initial sample θ^(0) was drawn from a uniform distribution within the range [0.99 × 10^−10, 1.01 × 10^−10].

In the first benchmark, we supposed a situation where the structure of the molecular cascades was fixed by the chemical equation (2), but [A] was unobservable at any time point. Instead of real experiments, the observations Y with the total number of time points N = 51 were synthetically generated by (3) using the parameters q_1 = (k_f, k_b) = (5.0, 0.01) and the initial state q_0. Noises on the observation processes were added according to the model (7) with the parameter γ_i = 1000. The circles and squares in Fig.1(a) show the generated observation Y.

For the observation data Y, we evaluated the posterior distribution of the set of all the unknown variables θ using the method described in Section 3. Figures 1(b-f) show the posterior distributions of the non-trivial unknown variables (k_f, k_b, [A](0), and γ_i for species B and AB, respectively), after marginalizing out any other variate. Here, the bold solid lines denote the true values used in generating the data Y, and the bold broken lines are the MAP estimators, which were approximately obtained by argmax_{l∈{1,...,L}} p(Y|θ^(l); U) p(θ^(l); U). The histograms indicate the distribution of the Monte Carlo samples Θ_{1:L}, and the curves surrounding the histograms denote the kernel density estimators (12).
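The sample-based MAP estimator just described, and the percentile-style credible interval used in Fig. 1, can be sketched on toy one-dimensional samples; the stand-in unnormalized posterior and all numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.normal(2.0, 0.5, size=100_000)    # stand-in for one axis of Theta_{1:L}

def unnorm_posterior(theta):
    """Toy stand-in for p(Y|theta; U) p(theta; U)."""
    return np.exp(-0.5 * ((theta - 2.0) / 0.5) ** 2)

# MAP estimator picked from the samples: argmax_l p(Y|theta^(l); U) p(theta^(l); U)
theta_map = samples[np.argmax(unnorm_posterior(samples))]

# 99% credible interval: the 0.5% and 99.5% percentiles along this axis
lo, hi = np.percentile(samples, [0.5, 99.5])
```

Unlike a covariance-based error bar, these percentiles need no Gaussian assumption on the posterior, which matters for skewed or correlated marginals.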
Fig. 1. Brief summaries of results in the first benchmark: (a) observation data Y ([B], [AB]) and their reconstruction by the model; (b)-(f) marginal posterior distributions of k_f, k_b, [A](0), γ_2 (species B), and γ_3 (species AB), each showing the sample histogram, the kernel density estimate, the true value, the MAP estimator, and the 0.5% and 99.5% percentiles.

Fig. 2. The joint posterior distributions of two out of the three parameters (k_f, k_b, [A](0)).

Though model (3) with the MAP estimators could reproduce the observation data Y well (see the solid and broken lines in Fig.1(a)), the estimators were slightly different from their true values because the total number of data was not very large. For this reason, the broadness of the posterior distribution provided useful information about the confidence of the inference. To evaluate the confidence more quantitatively, we defined the "99% credible interval" as the interval between the 0.5% and 99.5% percentiles along each axis of Θ_{1:L}. The dotted lines in Figs.1(b-f) show the 99% credible intervals. Since the posterior distribution was originally a joint probability density of all the unknown variables θ, it was also possible to visualize the highly confident parameter space in higher dimensions, which provided useful information about the dependence among the parameters. The three panels in Fig.2 show the joint posterior distribution of two out of the three parameters (k_f, k_b, and [A](0)), where the white circles denote the true values. From the panels, we can easily observe that these parameters have strong correlations between them. Interestingly, highly
Table 1. MAP estimators and 99% credible intervals in the second benchmark

Parameter        True   MAP    99% interval
Y_s (×10^−5)     7.00   7.00   [6.83, 7.19]
k_2 (×10^+6)     6.00   6.22   [5.77, 6.66]
k_syn (×10^−2)   1.68   1.97   [1.46, 3.97]
K_IB (×10^−2)    1.00   0.74   [0.28, 1.28]
Fig. 3. (a-b) Observation data Y and the state trajectories reconstructed in the second benchmark; and (c) the joint posterior distribution of (k_syn, K_IB).

confident spaces over (k_f, [A](0)) and (k_b, [A](0)) had hyperbolic shapes that reflected the bilinear form in the ODEs (3).

To demonstrate the applicability to more realistic molecular cascades, we adopted a simulator of a bioreactor system developed by [15] in the second benchmark. The mathematical model corresponding to this system was given by the following ODEs:

    [Ḃ] = (μ − q_in)[B];   [Ṡ] = q_in(c_in − [S]) − r_1 M_w [B];
    [Ṁ1] = r_1 − r_2 − μ[M1];   [Ṁ2] = r_2 − r_3 − μ[M2];   [Ṁ3] = r_syn − μ[M3];
    r_1 ≡ r_{1,max} [S] / (K_S + [S]);   r_2 ≡ k_2 [M3][M1] / (K_{M1} + [M1]);
    r_3 ≡ r_{3,max} [M2] / (K_{M2} + [M2]);   r_syn ≡ k_syn K_IB / (K_IB + [M2]);   μ = Y_s r_1,

where M_w, r_{1,max}, K_S, K_{M1}, r_{3,max}, and K_{M2} were assumed to be known constants, the same as those reported in [15]. u ≡ (c_in, q_in) were the control inputs in this simulator, and all the state variables x ≡ ([B], [S], [M1], [M2], [M3]) were observable. The observation time-series Y were downloaded from the website^6. The objective in this benchmark was to estimate the four unknown system parameters q_1 ≡ (Y_s, k_2, k_syn, K_IB).

As a result of evaluating the posterior distribution over all the elements of θ, the MAP estimators and 99% credible intervals of q_1 were obtained, and are shown in Table 1. Indeed, the observation noises in this benchmark were distributed according to ε_i(t_n) i.i.d.∼ N(0, (0.1 x_i(t_n))^2), which was different from our model (7). This inconsistency and the small amount of data led to MAP estimators that deviated slightly from the true values. However, it was notable that the model with the MAP estimators could reproduce the observation data Y fairly well, as shown in Figs.3(a-b), so that the credible intervals provide good information about the plausibility/implausibility of the

^6 http://sysbio.ist.uni-stuttgart.de/projects/benchmark
estimated parameters. In addition, we could find a nonlinear dependence between the two parameters (k_syn, K_IB) even in this problem, as shown in Fig.3(c).

Fig. 4. Brief summaries of results in the model selection problem: the upper panels show the observation data Y_1 and Y_2 ([S] and [P]) together with the trajectories reconstructed by the candidates C_1, C_2, and C_3; the bottom panels show the log-evidence, ln p̃(Y_1|C; U) = 373.41, 168.29, and 306.69, and ln p̃(Y_2|C; U) = 189.74, 246.92, and 246.16, for C_1, C_2, and C_3, respectively.

Finally, our Bayesian method was applied to a simple model selection problem. In this problem, we supposed that the observation data were obtained from the molecular cascades whose chemical equations were written as

    S_in → S (rate constant k = 1);   S + E ⇌ SE (k_f1, k_b1);   SE ⇌ P + E (k_f2, k_b2),    (14)

where x ≡ ([S], [P], [E], [SE]) was the state vector of our target compartment, and only two elements, [S] and [P], were observable. u ≡ ([S_in]) was an input to the compartment. We considered three model candidates depending on the reversibility of the reaction:

C_1: The chemical equation (14) was modeled by the law of mass action without any approximation. In this case, four system parameters (k_f1, k_b1, k_f2, and k_b2) were estimated.
C_2: It was assumed that k_b2 was sufficiently small so that the second part of model (14) was reduced to model (4). The estimated parameters were also reduced to (k_f1, k_b1, k_f2).
C_3: In addition to the assumption of C_2, the enzymatic reaction was approximated by the ODEs (5). In this case, the system parameters were further reduced to (K_M1, k_f2), where K_M1 ≡ (k_b1 + k_f2)/k_f1.

The goal of this benchmark is to select the most reasonable candidate from {C_1, C_2, C_3} for the given observation data Y. We prepared two sets of observation data, Y_1 and Y_2, that were generated by the model candidates C_1 and C_2, respectively. The circles in the first and second panels in Figs.4(a-b) show the points of data Y_1 and Y_2. Here, the curves associated with the circles denote the state trajectories
reconstructed by the three model candidates with their MAP estimators. Then, we evaluated the evidence p̃(Y_d|C_c; U) for d = 1, 2 and c = 1, ..., 3 by Eq.(13). The results are shown in the bottom panels in Figs.4(a-b), where the evidence is transformed into the natural logarithm to avoid numerical underflow. As shown in Fig.4(a), only C_1 could satisfactorily reproduce the observation data Y_1, while the others could not. Accordingly, the evidence for the model candidate C_1 was maximum. For the observation data Y_2, all the model candidates could reproduce them satisfactorily in a similar manner, as shown in Fig.4(b). In this case, the evidence for the simplest model candidate C_3 was nearly as high as that for C_2, from which the observation data Y_2 were generated. These results demonstrated that the evidence provided useful information for determining whether a structure of molecular cascades could be further simplified or not.

5 Conclusion

In this article, we presented a Bayesian method for the system identification of molecular cascades in biological systems. The advantage of this method is that it provides a theoretically sound framework for unifying parameter estimation and model selection. Due to the limited space, we have shown only small-scale benchmark results here. However, we have also confirmed that the precision of our MAP estimators was comparable to that of an existing good parameter estimation method in a medium-scale benchmark with 36 unknown system parameters [3]. The current limitations of our method are the difficulty of convergence diagnosis for the Monte Carlo sampling and the time-consuming computation for large-scale problems. We will overcome these limitations in our future work.

References

1. Bhalla, U.S., Iyengar, R.: Emergent properties of networks of biological signaling pathways. Science 283, 381–387 (1999)
2. Doi, T., et al.: Inositol 1,4,5-trisphosphate-dependent Ca2+ threshold dynamics detect spike timing in cerebellar Purkinje cells. The Journal of Neuroscience 25(4), 950–961 (2005)
3. Moles, C.G., et al.: Parameter estimation in biochemical pathways: A comparison of global optimization methods. Genome Research 13(11), 2467–2474 (2003)
4. Faller, D., et al.: Simulation methods for optimal experimental design in systems biology. Simulation 79, 717–725 (2003)
5. Banga, J.R., et al.: Computation of optimal identification experiments for nonlinear dynamic process models: A stochastic global optimization approach. Industrial & Engineering Chemistry Research 41, 2425–2430 (2002)
6. Sugimoto, M., et al.: Reverse engineering of biochemical equations from time-course data by means of genetic programming. BioSystems 80, 155–164 (2005)
7. Gillespie, D.T.: The chemical Langevin equation. Journal of Chemical Physics 113(1), 297–306 (2000)
8. Crampin, E.J., et al.: Mathematical and computational techniques to deduce complex biochemical reaction mechanisms. Progress in Biophysics & Molecular Biology 86, 77–112 (2004)
9. Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes, 2nd edn. Springer, Heidelberg (2003)
10. Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. John Wiley & Sons, Chichester (2000)
11. Andrieu, C., et al.: An introduction to MCMC for machine learning. Machine Learning 50(1-2), 5–43 (2003)
12. Gilks, W.R., et al.: Adaptive direction sampling. The Statistician 43(1), 179–189 (1994)
13. Hindmarsh, A.C., et al.: SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM Transactions on Mathematical Software 31, 363–396 (2005)
14. Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman and Hall, Boca Raton (1986)
15. Kremling, A., et al.: A benchmark for methods in reverse engineering and model discrimination: Problem formulation and solutions. Genome Research 14(9), 1773–1785 (2004)